Online Automatic Selection of Features for Visual Servoing Tasks

Submitted by Agnes COTTAIS on
Team
Date of the beginning of the PhD (if already known)
October 2020
Place
IRISA - Campus universitaire de Beaulieu, Rennes
Laboratory
IRISA - UMR 6074
Description of the subject

The quest for finding a good set of features for visual control and 3D Structure from Motion (SfM) is a classical problem in the visual servoing community, and it has attracted a large body of literature over the last decades. A number of approaches have been proposed over the years for exploiting local geometrical primitives, such as points or lines tracked on the image, as visual features to be measured/controlled. A comprehensive overview of these possibilities can be found in [1]. In parallel, another very successful line of research has considered the use of more ‘integral’ image descriptors able to encode the information contained in a region of interest on the image plane (e.g., enclosed by the contour of a tracked object). This usually makes extraction, matching and spatio-temporal tracking of the region of interest across multiple frames (relatively) easier, thus generally improving the robustness against image processing errors. Image moments of binary dense closed regions or of discrete sets of points [2] are a typical (and by now classical) example of integral features exploited for visually controlling the camera pose [3]–[5]. Further extensions of these ideas have dealt with, e.g., the use of particular kernels for evaluating the image moments [6], or the direct use of pixel intensities [7] by processing the whole acquired image. Image moments from dense regions [8] or discrete point clouds [9]–[11] have also been exploited as visual features for recovering the 3D structure of a planar scene via SfM schemes.
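To fix ideas, the (i+j)-th order moment of a discrete set of N image points is m_ij = Σ_k x_k^i y_k^j, and the centered moments μ_ij are obtained after subtracting the barycenter (see [2]). A minimal Python sketch (the point coordinates below are purely illustrative):

```python
import numpy as np

def moment(points, i, j):
    """(i+j)-th order image moment m_ij of a discrete point set.

    points: (N, 2) array of normalized image coordinates (x_k, y_k).
    """
    x, y = points[:, 0], points[:, 1]
    return np.sum(x**i * y**j)

def centered_moment(points, i, j):
    """Centered moment mu_ij, invariant to image translation."""
    m00 = moment(points, 0, 0)           # for a point cloud, m00 = N
    xg = moment(points, 1, 0) / m00      # barycenter x
    yg = moment(points, 0, 1) / m00      # barycenter y
    x, y = points[:, 0] - xg, points[:, 1] - yg
    return np.sum(x**i * y**j)

# Illustrative example: four coplanar points
pts = np.array([[0.1, 0.2], [0.3, 0.2], [0.3, 0.5], [0.1, 0.5]])
m00 = moment(pts, 0, 0)                  # 4 points
xg = moment(pts, 1, 0) / m00             # barycenter coordinates
yg = moment(pts, 0, 1) / m00
```

For a binary dense region the sums become integrals over the region, so m_00 measures the area, whereas for a point cloud it simply counts the points.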

Despite this state of the art, it is worth noting that the selection of a good set of image moments for 6-dof visual control or SfM is still an open problem. Ideally, one would like to find a unique set of visual features resulting in the ‘most linear’ control problem with the largest convergence domain, or in maximum observability (i.e., information gain) for a given camera displacement in the case of SfM tasks. However, so far, only local, partial (e.g., depending on the particular shape of the object) or heuristic results are available. For instance, [2], [6], [7] propose different combinations of image moments that can only guarantee local 6-dof stability of the servoing loop around the desired pose, with a basin of attraction to be heuristically determined case by case. As for the SfM case, the choice of which moments to exploit for allowing a converging estimation of the scene structure is also not straightforward. In [8], [11] the area a and barycenter coordinates of a dense region are successfully fed to an SfM scheme, based on the (intuitive) motivation that the same set is also the typical choice for controlling the camera translational motion in a servoing loop [2]. However, this intuition breaks down when considering moments of a discrete point cloud: in this case, the typical choice for controlling the camera translational motion is empirically shown in [10] not to provide enough information for a converging estimation of the scene structure. One could argue that the hope of finding a unique set of visual features optimal in all situations might eventually prove unrealistic, if not impossible, and that it may be more appropriate (and reasonable) to rely on an automatic and online selection of the best feature set (within a given class) tailored to the particular task at hand.
Motivated by these considerations, the goal of this PhD is to propose automatic strategies able to select online the ‘best’ set of image moments so as to optimize the performance of visual servoing schemes (for positioning tasks) or SfM schemes (for scene estimation tasks). An initial attempt in this direction was made in [11], which proposed an algorithm for locally optimizing online a linear combination of basic image moments in the context of SfM. The ideas discussed in [11] are a good starting point and show good performance in simulation tests, but are still very preliminary in terms of depth and thoroughness of the results. The PhD candidate will then ideally start from what was initially proposed in [11] and develop her/his research along the following lines: choice of suitable basis functions (or kernels) for automatically extracting the relevant information from the images; application to the case of visual servoing ([11] only considers the SfM case); exploitation of machine learning techniques; experimental validation on the robots available in the Rainbow team.
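One way to make the notion of a ‘best’ feature set concrete is to score each candidate linear combination of basic moments by how well-conditioned the resulting feature Jacobian (interaction matrix) is, and keep the best-scoring candidate. The sketch below only follows this spirit: the conditioning criterion, the synthetic interaction matrix L and the candidate weight matrices are all illustrative assumptions, not the actual formulation of [11]:

```python
import numpy as np

def score(J):
    """Illustrative selection criterion: inverse condition number of the
    feature Jacobian J (1 = perfectly conditioned, 0 = rank-deficient)."""
    s = np.linalg.svd(J, compute_uv=False)
    return s[-1] / s[0]

def select_best(candidates, jacobian_of):
    """Return the candidate weight matrix whose combined-feature Jacobian
    is best conditioned; `jacobian_of` maps weights to that Jacobian and
    must be supplied by the underlying moment model (hypothetical here)."""
    return max(candidates, key=lambda W: score(jacobian_of(W)))

# Toy usage with synthetic numbers (purely illustrative):
rng = np.random.default_rng(0)
L = rng.standard_normal((5, 6))   # stacked interaction matrix of 5 basic moments w.r.t. the 6-dof camera velocity
candidates = [rng.standard_normal((3, 5)) for _ in range(20)]  # 20 candidate ways to combine them into 3 features
best = select_best(candidates, lambda W: W @ L)  # Jacobian of s = W m is W L
```

Online, such a selection would be re-run as the observed moments (and hence L) evolve along the camera trajectory.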

Bibliography

1. F. Chaumette and S. Hutchinson, “Visual servo control, Part I: Basic approaches,” IEEE Robotics and Automation Magazine, vol. 13, no. 4, pp. 82–90, 2006.

2. O. Tahri and F. Chaumette, “Point-Based and Region-Based Image Moments for Visual Servoing of Planar Objects,” IEEE Trans. On Robotics, vol. 21, no. 6, pp. 1116–1127, 2005.

3. F. Hoffmann, T. Nierobisch, T. Seyffarth, and G. Rudolph, “Visual servoing with moments of SIFT features,” in IEEE Int. Conf. on Systems, Man and Cybernetics, 2006, pp. 4262–4267.

4. E. Bugarin and R. Kelly, “Direct visual servoing of planar manipulators using moments of planar targets,” in Robot Vision, A. Ude, Ed. INTECH, 2010.

5. C. Copot, C. Lazar, and A. Burlacu, “Predictive control of nonlinear visual servoing systems using image moments,” IET Control Theory & Applications, vol. 6, no. 10, pp. 1486–1496, 2012.

6. V. Kallem, M. Dewan, J. P. Swensen, G. D. Hager, and N. J. Cowan, “Kernel-based visual servoing,” in 2007 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2007, pp. 1975–1980.

7. M. Bakthavatchalam, O. Tahri, and F. Chaumette, “A Direct Dense Visual Servoing Approach using Photometric Moments,” IEEE Trans. on Robotics, vol. 34, no. 5, pp. 1226–1239, 2018.

8. P. Robuffo Giordano, A. De Luca, and G. Oriolo, “3D structure identification from image moments,” in 2008 IEEE Int. Conf. on Robotics and Automation, Pasadena, CA, may 2008, pp. 93–100.

9. A. De Luca, G. Oriolo, and P. Robuffo Giordano, “Feature depth observation for image-based visual servoing: Theory and experiments,” Int. Journal of Robotics Research, vol. 27, no. 10, pp. 1093–1116, 2008.

10. R. Spica, P. Robuffo Giordano, and F. Chaumette, “Experiments of plane estimation by active vision from point features and image moments,” in 2015 IEEE Int. Conf. on Robotics and Automation, Seattle, WA, May 2015, pp. 5521–5526.

11. P. Robuffo Giordano, R. Spica, and F. Chaumette, “Learning the Shape of Image Moments for Optimal 3D Structure Estimation,” in 2015 IEEE Int. Conf. on Robotics and Automation, Seattle, WA, May 2015, pp. 5990–5996.

Researchers

Lastname, Firstname
François Chaumette
Type of supervision
Director
Laboratory
UMR 6074, Inria

Lastname, Firstname
Paolo Robuffo Giordano
Type of supervision
Supervisor (optional)
Laboratory
UMR 6074, Inria
Contact(s)
Keywords
Robotics, visual servoing, SfM, learning