Robust visual tracking: coupling 2D motion and 3D pose estimation

Contact: Éric Marchand, Patrick Bouthemy, François Chaumette

Creation date: July 1999

Description of the demonstration

Visual servoing needs image data as input to perform robotics tasks such as positioning, docking, or mobile target pursuit. This often requires tracking the 2D projection of the object of interest in the image sequence. To increase the versatility of visual servoing, objects cannot be assumed to carry landmarks. We proposed an original method for tracking, in an image sequence, complex objects that can be modeled approximately by a polyhedral shape. The approach relies on estimating the 2D image motion of the object as well as computing its 3D pose. The proposed method fulfills real-time constraints as well as reliability and robustness requirements.


This work was completed in collaboration with EDF (Électricité de France, Pôle industrie, division recherche et développement). Disassembly and monitoring tasks are very important in the nuclear power plant context, and automatic systems based on real-time visual feedback are being evaluated by EDF.

Scientific context

We have developed a method that combines an estimation of the 2D motion of the object with an estimation of its 3D pose. It provides fast and robust tracking of complex objects that can be approximately modeled by a polyhedral shape. First, an affine 2D motion model is estimated, using a robust statistical method, from normal displacements evaluated along the object shape contours with the ECM algorithm. Because the affine motion model does not always match the real displacement of the object, a second step is necessary: fitting the projection of the object shape to the intensity gradients of the image. This is achieved by iteratively minimizing a non-linear energy function with respect to the 3D pose parameters.
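The first step (robust estimation of an affine motion model from normal displacements) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the six-parameter layout of the affine model, the use of iteratively reweighted least squares with Tukey's biweight as the "robust statistical method", the scale estimate, and all function names. Each contour point contributes a single scalar constraint, namely the component of the affine displacement along the contour normal.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_affine_from_normal_displacements(points, normals, disps,
                                         iters=20, c=4.685):
    """Robustly fit theta = (a0..a5) of the assumed affine flow
        u(x, y) = a0 + a1*x + a2*y,   v(x, y) = a3 + a4*x + a5*y
    from normal displacements d = nx*u + ny*v measured at contour points.
    Uses IRLS with Tukey's biweight (an assumed choice of robust estimator).
    """
    # One linear constraint per contour point.
    rows = [[nx, nx * x, nx * y, ny, ny * x, ny * y]
            for (x, y), (nx, ny) in zip(points, normals)]
    weights = [1.0] * len(rows)
    theta = [0.0] * 6
    for _ in range(iters):
        # Weighted normal equations (J^T W J) theta = J^T W d.
        JTJ = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows))
                for j in range(6)] for i in range(6)]
        JTd = [sum(w * r[i] * d for w, r, d in zip(weights, rows, disps))
               for i in range(6)]
        theta = solve_linear(JTJ, JTd)
        residuals = [d - sum(ri * ti for ri, ti in zip(r, theta))
                     for r, d in zip(rows, disps)]
        # Robust scale from the (upper) median of absolute residuals.
        abs_res = sorted(abs(e) for e in residuals)
        scale = max(1.4826 * abs_res[len(abs_res) // 2], 1e-6)
        # Tukey biweight: residuals beyond c*scale get zero weight.
        weights = [(1.0 - (e / (c * scale)) ** 2) ** 2
                   if abs(e) < c * scale else 0.0
                   for e in residuals]
    return theta
```

Measurements corrupted by occlusions or shadows end up with zero weight and no longer influence the estimate. Note that contours sampled on a polygonal silhouette constrain all six parameters, whereas a smooth closed curve such as a circle leaves the motion tangent to the curve unobservable.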

The main advantages of this two-step method can be summarized as follows. The motion estimation step allows us to handle large displacements of the object and to avoid a prediction step. The result of this step is used to provide an appropriate initialization for the pose estimation. Our CAD-based tracking only requires a coarse calibration of the camera and a rough model of the object. Neither the 2D motion estimation nor the 3D pose estimation involves edge detection (we only consider gray-level images), and both are robust to partial occlusions of the object. Finally, the algorithm provides real-time tracking (currently 10 Hz). The efficiency of this method has been demonstrated through various experiments.
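The second step (fitting the projected model to the image gradients, starting from the pose suggested by the motion estimate) can be sketched in the same spirit. The page does not give the form of the energy function or of the minimizer, so the following are assumptions for illustration only: a pinhole camera with hypothetical intrinsics, a pose written as a translation plus three Euler angles, an energy equal to the negative mean gradient magnitude at sites sampled along the projected model edges, and a simple coordinate search standing in for the iterative minimization. The `grad_mag(u, v)` callable is a placeholder for a lookup into the gradient-magnitude map of the current image.

```python
import math


def rotation_matrix(rx, ry, rz):
    """R = Rz(rz) * Ry(ry) * Rx(rx), angles in radians."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def project(vertices, pose, focal=800.0, center=(320.0, 240.0)):
    """Pinhole projection of 3D model vertices for pose (tx, ty, tz, rx, ry, rz).
    Assumes the object stays in front of the camera (positive depth)."""
    tx, ty, tz, rx, ry, rz = pose
    R = rotation_matrix(rx, ry, rz)
    pts = []
    for X, Y, Z in vertices:
        xc = R[0][0] * X + R[0][1] * Y + R[0][2] * Z + tx
        yc = R[1][0] * X + R[1][1] * Y + R[1][2] * Z + ty
        zc = R[2][0] * X + R[2][1] * Y + R[2][2] * Z + tz
        pts.append((focal * xc / zc + center[0], focal * yc / zc + center[1]))
    return pts


def energy(pose, vertices, edges, grad_mag, samples=10):
    """Negative mean gradient magnitude over sites on the projected edges."""
    pts = project(vertices, pose)
    total, count = 0.0, 0
    for i, j in edges:
        (u0, v0), (u1, v1) = pts[i], pts[j]
        for k in range(samples + 1):
            t = k / samples
            total += grad_mag(u0 + t * (u1 - u0), v0 + t * (v1 - v0))
            count += 1
    return -total / count


def refine_pose(pose, vertices, edges, grad_mag,
                steps=(0.01, 0.01, 0.01, 0.02, 0.02, 0.02), rounds=30):
    """Coordinate search: try +/- one step on each pose parameter, keep any
    move that lowers the energy, and halve the steps when nothing improves."""
    pose, steps = list(pose), list(steps)
    best = energy(pose, vertices, edges, grad_mag)
    for _ in range(rounds):
        improved = False
        for i in range(6):
            for sign in (1.0, -1.0):
                trial = pose[:]
                trial[i] += sign * steps[i]
                e = energy(trial, vertices, edges, grad_mag)
                if e < best:
                    pose, best, improved = trial, e, True
                    break
        if not improved:
            steps = [s / 2.0 for s in steps]
    return pose, best
```

In this sketch the cost of one energy evaluation grows with the number of sampled sites, so a model with more edges makes every iteration more expensive. A local search of this kind only succeeds when the initial pose is already close to the solution, which is the role played by the motion estimation step.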


These objects were chosen because they are quite representative of the applications we are interested in. Most of the experiments reported here involve a nut as the object to be tracked. We chose this object because it is of interest to EDF.

Overview Full Sequence Comments
Mpeg Let us point out that tracking the nut silhouette in the image must deal with low intensity contrast, cast shadows, specular reflections, and so on. Moreover, the nut is not exactly polyhedral, since its ridges are not precisely defined physically. Furthermore, the camera calibration is not precisely known. Despite these difficulties, the proposed method has proven effective at tracking this object over long image sequences.

In this sequence, an additional difficulty is the multiple occlusions of the nut.

Mpeg The main difficulty here is the very large rotation around the x axis. Furthermore, the illumination conditions are not constant over the sequence.
Mpeg The nut is tracked within a highly textured environment during a visual servoing experiment.
Mpeg We have also evaluated our tracking algorithm on a more complex object: a serial port connector placed on a newspaper that forms a "cluttered" background. The CAD model of this object is more complex than that of the nut, but again we only built a rough, approximate CAD model. Here, we also have to deal with low intensity gradients, specularities, and contours that are not precisely defined.
The serial connector is successfully tracked over a 170-frame sequence. The camera performs a large displacement around the object. One face of the object appears while another disappears. Tracking is performed at 3 Hz (this lower rate is mainly due to the more complex CAD model of the object, which leads to a larger number of sites in the evaluation of the energy function).
Mpeg Here, a piece of wood is tracked over a 500-image sequence with fast displacements of the object. Faces of the object appear and disappear.
Mpeg A microcontroller is tracked during a visual servoing experiment.
The sequence features multiple occlusions of the target by various tools.


E. Marchand, P. Bouthemy, F. Chaumette. A 2D-3D model-based approach to real-time visual tracking. Image and Vision Computing, 19(13):941-955, November 2001.

E. Marchand, P. Bouthemy, F. Chaumette, V. Moreau. Robust real-time visual tracking using a 2D-3D model-based approach. IEEE Int. Conf. on Computer Vision, ICCV'99, Kerkira, Greece, September 1999.

Irisa - Inria - Copyright 2009 Lagadic Project