My PhD thesis, which will soon be defended, is titled "Real-Time Simultaneous Localization and Mapping Using Monocular Vision and Planar Structures". It was carried out under a partnership between IRISA, the national computer science laboratory (Rennes, France), and the R&D department of Orange (a major French telecom firm). My thesis supervisor is Dr. Eric Marchand (Lagadic, IRISA) and my industrial supervisor is Pascal Houlier (R&D engineer at Orange Labs). During this thesis, several conference papers were written and a software prototype was delivered to the firm.
Orange is interested in real-time augmented reality on mobile devices as a basis for new customer services. Augmented reality raises many industrial and scientific problems. This thesis focuses on the real-time estimation of camera motion in indoor and urban outdoor environments. It describes a solution based on planes, which are numerous in these kinds of environments. A low-cost camera is used and fused with MEMS inertial sensors. From initialization to tracking, a complete processing pipeline is presented, allowing a fully automatic motion estimation process and long-term use. Result videos shown on this page demonstrate the effectiveness of the method. The whole camera estimation process is composed of several parts, which are described below:
Before my thesis, I obtained a research master's degree in 'Complex Systems Simulation and Modelling' at the University of Calais (France) in 2005. The research internship done during this master's focused on genetic programming. The goal was to study and build a genetic programming system for a stack-based machine. Generated programs are injected directly into the Java virtual machine as bytecode. Using a stack-based machine strongly constrains program generation, but yields better structures and much faster programs than classical logical-tree-based programs. This internship concluded with an article at an international conference and a plugin for the JEB library.
This thesis builds on the idea of 'Monocular Simultaneous Localization and Mapping' as introduced by Dr. Davison (Imperial College London, U.K.). Because indoor and urban environments contain many planes, this thesis focuses on adapting monocular SLAM to use planes as geometric primitives. The method handles measurement uncertainties, thus providing robust estimation. Using planes improves the estimation by providing more information than points. It also reduces the computational cost by grouping map features that belong to the same plane.
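The filtering backbone of such a system is the classic EKF predict/update loop. The sketch below is a minimal, generic NumPy version under illustrative assumptions: the real state in the thesis holds the camera pose and plane parameters, while here a toy two-dimensional linear state stands in for it.

```python
import numpy as np

def ekf_predict(x, P, F, Q):
    # Propagate state and covariance through the (linearized) motion model F.
    return F @ x, F @ P @ F.T + Q

def ekf_update(x, P, z, H, R):
    # Fuse a measurement z with the predicted measurement H @ x.
    y = z - H @ x                        # innovation
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new

# Toy run: constant-position model, direct measurement of the state.
x, P = np.zeros(2), np.eye(2)
x, P = ekf_predict(x, P, np.eye(2), 0.01 * np.eye(2))
trace_before = np.trace(P)
x, P = ekf_update(x, P, np.array([1.0, 0.5]), np.eye(2), 0.1 * np.eye(2))
```

Each measurement update shrinks the state covariance, which is exactly why richer measurements (planes rather than points) help the estimation.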
Image region tracking is done using an enhanced version of the homography tracker described by Dr. Benhimane (INRIA, Sophia Antipolis). This tracker enables very fast and robust tracking of planar regions. The homography estimated by the region tracker is the input of our EKF-SLAM. Tracking quality is improved by using the SLAM measurement prediction to initialize the minimization, allowing larger movements between two images. The estimated homography matrix is also compared to its prediction in order to reject outliers and improve the robustness of the estimation.
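The outlier-rejection step can be pictured as a simple gate: the tracker's homography is compared against the SLAM-predicted one and discarded when the two disagree too much. The snippet below is only an illustrative sketch; the function name and the Frobenius-norm test with its `tol` threshold are assumptions, not the thesis implementation.

```python
import numpy as np

def homography_gate(H_est, H_pred, tol=0.1):
    # Homographies are defined up to scale: normalize before comparing.
    Hn_est = H_est / np.linalg.norm(H_est)
    Hn_pred = H_pred / np.linalg.norm(H_pred)
    # The scale factor may also flip the sign: keep the closer alignment.
    d = min(np.linalg.norm(Hn_est - Hn_pred),
            np.linalg.norm(Hn_est + Hn_pred))
    return d < tol                       # True -> accept the measurement
```

A measurement that is a pure rescaling of the prediction passes the gate (same projective transform), while a grossly different one is rejected.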
One difficulty of monocular EKF-SLAM is that the state must be represented by Gaussian random variables. Camera measurements are two-dimensional: one dimension (depth) is lost in the projection and must therefore be treated as an almost uniform random variable. It is thus impossible to introduce a new plane with a classical representation from a single camera measurement, without any prior knowledge about it. This thesis presents a solution that introduces a plane from a single view, using a dedicated representation, without any a priori knowledge about the plane.
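The thesis uses its own single-view plane parameterization; the snippet below only illustrates the underlying idea with the well-known analogous trick for points, inverse depth: a quantity that is hopelessly non-Gaussian in depth becomes Gaussian in rho = 1/d, and a single 2-sigma interval already covers everything from nearby points to infinity. All numbers are illustrative.

```python
# Unknown depth along a camera ray cannot be modeled as a Gaussian in depth,
# but it can in *inverse* depth rho = 1/d (illustrative stand-in for the
# thesis' dedicated single-view plane representation).

rho_mean, rho_sigma = 0.5, 0.25                  # Gaussian over rho, in 1/m
lo = rho_mean - 2 * rho_sigma                    # 2-sigma lower bound: 0.0
hi = rho_mean + 2 * rho_sigma                    # 2-sigma upper bound: 1.0

depth_near = 1.0 / hi                            # nearest depth covered: 1 m
# lo == 0 means the interval reaches infinite depth: even very distant
# points are compatible with a single-measurement initialization.
```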
Another problem is to know which parts of the image are planar. Which region should be chosen for tracking? A first solution, based on template recognition, is proposed. Because our template database contains only planar regions, recognizing one of these templates in the image stream is sufficient proof that the region is planar. A SIFT-based method with a hierarchical k-means is proposed.
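A hierarchical k-means over descriptors works like a vocabulary tree: training descriptors are clustered recursively, and a query descriptor descends the tree to retrieve candidate templates. The sketch below is a toy NumPy version under stated assumptions: 2-D vectors stand in for SIFT descriptors, and all function names and parameters are illustrative, not the thesis implementation.

```python
import numpy as np

def kmeans(X, k, iters=10, seed=0):
    # Plain Lloyd iterations with random initial centers.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def build_tree(X, ids, k=2, depth=2):
    # ids: NumPy array of template indices attached to the descriptors.
    if depth == 0 or len(X) <= k:
        return {"ids": list(ids)}                    # leaf: candidate templates
    centers, labels = kmeans(X, k)
    children = [build_tree(X[labels == j], ids[labels == j], k, depth - 1)
                for j in range(k)]
    return {"centers": centers, "children": children}

def query(tree, d):
    # Descend toward the nearest center at each level.
    while "centers" in tree:
        j = np.argmin(((tree["centers"] - d) ** 2).sum(-1))
        tree = tree["children"][j]
    return tree["ids"]
```

Because a query only compares against k centers per level, lookup cost is logarithmic in the database size, which is what makes the recognition step compatible with real time.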
The other solution is to extract, from the image sequence, regions whose transformation is consistent with that of a plane. The classical approach considers a point cloud: after a sufficient translation, points satisfying the same homography are extracted (e.g. using RANSAC), and the regions inside the contours built from these points are considered planar. This solution has a major drawback: a sufficient translation is necessary to extract planes, so a (significant) delay is unavoidable. This is not acceptable for real-time pose tracking, because the filter may then run for some time without any measurement. This thesis proposes a solution called "Simultaneous Localization and Planar Region Extraction". The depth of each point of the cloud is added to the SLAM state. A Delaunay triangulation of the point cloud is computed. At each new frame, the fundamental matrix between the first view and the current view is estimated (with LMedS). This fundamental matrix is used to compute a homography for each triangle, and this homography is an input of our SLAM method. When the depths of all points of a triangle are known accurately enough, the triangle is grouped with the triangles having the same plane parameters. The resulting region is then considered planar and used as described before. This way, the pose keeps being estimated even while plane extraction is not finished.
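The link between a plane and a homography is the standard plane-induced relation H = K (R + t n^T / d) K^{-1}, for a plane with unit normal n at distance d in the first camera frame and a relative motion (R, t). The NumPy sketch below builds such a homography from illustrative values and checks that a 3-D point lying on the plane projects consistently through it; all numbers are made up for the example.

```python
import numpy as np

# Homography induced by a plane (n, d) between two views related by (R, t):
#   H = K (R + t n^T / d) K^{-1}
K = np.array([[500., 0., 320.],
              [0., 500., 240.],
              [0., 0., 1.]])                      # illustrative intrinsics
th = np.deg2rad(5.0)
R = np.array([[np.cos(th), 0., np.sin(th)],
              [0., 1., 0.],
              [-np.sin(th), 0., np.cos(th)]])     # small rotation about y
t = np.array([[0.2], [0.0], [0.05]])              # relative translation
n = np.array([[0.], [0.], [1.]])                  # plane normal, first frame
d = 5.0                                           # plane distance, first frame

H = K @ (R + t @ n.T / d) @ np.linalg.inv(K)

# Check: a point on the plane (n.X = d) projects consistently through H.
X1 = np.array([1.0, 0.5, 5.0])
x1 = K @ X1; x1 /= x1[2]                          # projection in view 1
X2 = R @ X1 + t.ravel()
x2 = K @ X2; x2 /= x2[2]                          # projection in view 2
x2_pred = H @ x1; x2_pred /= x2_pred[2]           # prediction through H
```

This is the relation that lets a per-triangle homography act as a measurement of the triangle's plane parameters in the filter.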
Because image measurements are noisy, and because the image may contain no useful information at all, we studied the fusion of inertial sensor measurements. A lot of work was done to synchronize both measurement units and to calibrate their relative rotation. These sensors work at 100 Hz and provide both translational acceleration and rotational velocity, compensating for the camera's weaknesses. They give information about the scene scale factor, which cannot be recovered with a single camera alone without a model of the environment. Statistically, they reduce the estimation uncertainty, thus improving numerical stability and the measurement prediction: the whole method is made more robust by the inertial sensors.
All the previously proposed solutions, put together, give a robust, fast and long-term estimation pipeline for the camera motion.
Complete list of publications (with PostScript or PDF files if available)