Visual Odometry / SLAM

The increasing demand for real-time high-precision Visual Odometry (VO) systems for navigation and localization on computationally constrained platforms has recently been driving research towards more versatile and scalable solutions. Over the years, I have explored the following extensions to pure single-camera motion estimation:
L Kneip, M Chli, and R Siegwart. Robust real-time visual odometry with a single camera and an IMU. In Proceedings of the British Machine Vision Conference (BMVC), 2011 (oral presentation, acc. rate ~8%).
In this paper we present a framework that, in addition to the camera, also takes short-term full 3D relative rotation information from an Inertial Measurement Unit (IMU) into account. This supports the geometric computation and reliably reconstructs the traversed trajectory, even in situations of increased dynamics. Similar sensor setups are available in all modern smartphones and micro aerial vehicles. The presented minimal geometric solutions do not suffer from geometric degeneracies, always return a unique solution from only two points, and thus increase the robustness and efficiency of the algorithm. The work also makes a case for purely geometric approaches, which do not rely on the validity of motion models or smoothness assumptions.
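
The practical benefit of a known rotation is easy to sketch: once the IMU provides the relative rotation R, each 2D-2D correspondence constrains the translation linearly through the epipolar constraint, so two correspondences suffice. The noise-free toy example below is my own illustration of this principle, not the paper's exact solver; in practice such a minimal solver is wrapped in a RANSAC scheme.

```python
import numpy as np

def two_point_translation(f1, f2, R):
    """Translation direction (up to scale and sign) from two bearing-vector
    correspondences f1[i] <-> f2[i], given the relative rotation R (e.g.
    obtained by short-term IMU integration).
    Epipolar constraint: f2^T [t]_x R f1 = 0  <=>  t . ((R f1) x f2) = 0,
    so t is orthogonal to both constraint normals n_i = (R f1_i) x f2_i."""
    n1 = np.cross(R @ f1[0], f2[0])
    n2 = np.cross(R @ f1[1], f2[1])
    t = np.cross(n1, n2)
    return t / np.linalg.norm(t)
```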
T Kazik, L Kneip, J Nikolic, M Pollefeys, and R Siegwart. Real-Time 6D Stereo Visual Odometry with Non-Overlapping Fields of View. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012 (acc. rate ~24%).

Y Wang and L Kneip. On scale initialization in non-overlapping multi-perspective visual odometry. In International Conference on Computer Vision Systems (ICVS), 2017 (best student paper award).

In contrast to a monocular approach, a stereo configuration not only renders motion estimation more robust, but also allows inference of metric scale. The field of view in a classical stereo setup is, however, limited, as both cameras are required to observe the same scene. The following method relaxes this constraint and allows the cameras to perceive different scenes while still operating in absolute scale. The algorithm runs in real time and employs information from two cameras with non-overlapping fields of view. The motion of the rig is estimated by first performing monocular visual odometry in each camera individually. The two motion estimates are then used to derive the absolute scale by enforcing the known rigid relative placement of the two cameras, a constraint similar to the one exploited in hand-eye calibration. The extended field of view is especially beneficial in poorly textured environments. The relevance of the approach is underlined by automotive vision system designers, who increasingly propose almost non-overlapping settings with cameras placed in the side mirrors of the car.
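
The scale-recovery constraint can be illustrated with a small noise-free sketch. Writing the incremental motion of camera 1 as T1 = (R1, l1·u1), with u1 the unit translation direction from monocular VO and l1 its unknown scale, and the known extrinsic transformation as X = (Rx, tx), the hand-eye relation T2 = X⁻¹ T1 X turns the two unknown scales into a small linear least-squares problem. Function and variable names are my own; this is a toy version of the idea, not the published algorithm.

```python
import numpy as np

def recover_scales(R1, u1, u2, Rx, tx):
    """Absolute scales (l1, l2) of the two monocular VO translations.
    Translation part of T2 = X^-1 T1 X:
        l2 * u2 = Rx^T (R1 tx + l1 * u1 - tx),
    i.e. a 3x2 linear least-squares problem in (l1, l2)."""
    A = np.column_stack([Rx.T @ u1, -u2])
    b = -(Rx.T @ (R1 - np.eye(3)) @ tx)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    return sol
```

Note that the system degenerates for (near-)pure translation (R1 close to the identity), which is precisely the scale-initialization issue investigated in the ICVS'17 paper above.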
C Forster, S Lynen, L Kneip, and D Scaramuzza. Collaborative Monocular SLAM with Multiple Micro Aerial Vehicles. In Proceedings of The IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2013.
C Forster, S Lynen, L Kneip, and R Siegwart. Collaborative Visual SLAM with multiple MAVs. In Workshop on Integration of Perception and Control for Resource-Limited, Highly Dynamic, Autonomous Systems (RSS), 2012.
In this work we extended our visual odometry pipeline with a collaborative structure-from-motion back-end. The main idea is to process the keyframes generated by multiple visual odometry nodes running on multiple cameras in parallel, and to merge them into one global map. By reusing the relative keyframe orientation information from the visual odometry nodes, the task of the collaborative structure-from-motion back-end is essentially reduced to global optimization, overlap detection, loop closure, and merging of submaps. The concurrent design allows all of these tasks to be handled in parallel. The system runs in real time, and it has been tested and applied to videos captured by a swarm of micro aerial vehicles.

Camera Pose Estimation

A major part of my research aims at novel solutions to fundamental geometric camera pose estimation problems. Below is a summary of the problems I have worked on. Please note that all of my algorithms are made available in the open-source library OpenGV, which can be downloaded from GitHub:
Paper: L Kneip and P Furgale. OpenGV: A Unified and Generalized Approach to Calibrated Geometric Vision. In Proceedings of The IEEE International Conference on Robotics and Automation (ICRA), 2014.
Some of the problems require the solution of algebraic geometry formulations, for which I use my own solver generator. It is called polyjam, and it is likewise available as an open-source package on GitHub:
Please visit the Software page for more details.
  • Minimal Absolute Pose

    My most cited work solves the Perspective-Three-Point (P3P) problem (also known as camera resectioning), which aims at determining the position and orientation of a camera in the world reference frame from three 2D-3D point correspondences. Most solutions attempt to first solve for the position of the points in the camera reference frame, and then compute the point-aligning transformation between the camera and the world frame. In contrast, I propose a novel closed-form solution to the P3P problem, which computes the aligning transformation directly in a single stage, without the intermediate derivation of the points in the camera frame. The resulting superior computational efficiency makes the solver particularly suitable for any RANSAC outlier-rejection step, which is always recommended before applying PnP or non-linear optimization for the final solution.

    L Kneip, D Scaramuzza, and R Siegwart. A Novel Parametrization of the Perspective-Three-Point Problem for a Direct Computation of Absolute Camera Position and Orientation. In Proceedings of The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011. (acc. rate: 22.5% posters).
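
For context, the classical two-stage pipeline ends with an absolute-orientation step (Kabsch/Horn) that aligns the recovered camera-frame points with their world coordinates; it is exactly this intermediate stage that the direct parametrization avoids. A minimal, illustrative sketch of that aligning step:

```python
import numpy as np

def align_points(P_world, P_cam):
    """Closed-form rigid transform (R, t) with P_cam ~ R @ P_world + t,
    computed via the Kabsch/Horn SVD method. This is the second stage of
    classical two-stage P3P solvers; the direct parametrization skips it."""
    cw, cc = P_world.mean(0), P_cam.mean(0)
    H = (P_world - cw).T @ (P_cam - cc)          # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                           # proper rotation
    t = cc - R @ cw
    return R, t
```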
  • n-Point Absolute Pose with Generalized Cameras

    More recently I have also been investigating the absolute pose problem with n point correspondences, notably in the context of generalized cameras. A generalized camera is an abstraction that allows the treatment of measurements corresponding to spatial rays that no longer intersect in a single point, and thus differ from those of most classical monocular cameras. A practically relevant example is given by an extrinsically calibrated multi-camera system (with cameras pointing in arbitrary directions). I developed a set of algorithms whose complexity is linear in the number of point correspondences. My most recent contribution, UPnP, is furthermore applicable to both central and non-central (generalized) cameras, and it computes the solution under a geometrically optimal error criterion.

    L Kneip, P Furgale, and R Siegwart. Using Multi-Camera Systems in Robotics: Efficient Solutions to the NPnP Problem. In Proceedings of The IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, 2013 (best computer-vision paper award finalist).

    L Kneip, H Li, and Y Seo. UPnP: An Optimal O(n) Solution to the Absolute Pose Problem with Universal Applicability. In Proceedings of The European Conference on Computer Vision (ECCV), 2014.

  • Minimal and Non-Minimal Rotation-Only Relative Pose Solvers for Central and Non-Central Camera Systems

    I also looked into the relative pose problem, and notably developed solutions that solve for the rotation independently of the translation. Besides a minimal solution for the central case, I also developed a suite of non-minimal iterative solvers for central and non-central camera systems. Especially in the non-central case, embedding the algorithm into a robust sampling scheme provides a very good trade-off between the number of employed point correspondences and computational efficiency. My WACV'16 contribution, the final step in this series, provides the first fully general solution to the generalized relative pose and scale problem, in which two generalized cameras are registered with respect to each other while being calibrated only up to an unknown relative scale factor. Treating scale-invariant view-graphs as virtual generalized cameras, this algorithm enables us to determine the similarity transformation between pairs of view-graphs directly from the original 2D-2D point correspondences (and thus avoids the use of noisy triangulated points). Important applications are given by loop closure in visual SLAM and hierarchical structure from motion.

    L Kneip, R Siegwart, and M Pollefeys. Finding the Exact Rotation Between Two Images Independently of the Translation. In Proceedings of The European Conference on Computer Vision (ECCV), 2012.

    L Kneip and S Lynen. Direct Optimization of Frame-to-Frame Rotation. In Proceedings of The International Conference on Computer Vision (ICCV), 2013.

    L Kneip and H Li. Efficient computation of Relative Pose for Multi-Camera Systems. In Proceedings of The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2014.

    L Kneip, C Sweeney, and R Hartley. The generalized relative pose and scale problem: View-graph fusion via 2D-2D registration. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), 2016 (accepted for publication).

    C Sweeney, L Kneip, T Höllerer, and M Turk. Computing Similarity Transformations from Only Image Correspondences. In Proceedings of The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
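
The rotation-only idea behind the central solvers above can be conveyed as an eigenvalue cost: with n_i = f2_i x (R f1_i), all epipolar normals are orthogonal to the (unknown) translation at the true rotation, so the smallest eigenvalue of M(R) = sum n_i n_i^T vanishes in the noise-free case. The sketch below only evaluates this cost on synthetic data; the papers optimize it directly over the rotation manifold.

```python
import numpy as np

def rotation_cost(R, f1, f2):
    """Smallest eigenvalue of M(R) = sum_i n_i n_i^T, n_i = f2_i x (R f1_i).
    At the true rotation all n_i are orthogonal to the unknown translation,
    so M(R) is rank-deficient and the cost vanishes (noise-free case)."""
    N = np.cross(f2, (R @ f1.T).T)   # one epipolar normal per correspondence
    M = N.T @ N
    return np.linalg.eigvalsh(M)[0]
```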

Rolling shutter camera calibration

Rolling Shutter (RS) cameras are used across a wide range of consumer electronic devices, from smartphones to high-end cameras. It is well known that if an RS camera moves, or the scene does, significant image distortions are introduced. The quality and even the success of structure from motion on rolling shutter images depend not only on the usual intrinsic parameters, such as focal length and distortion coefficients, but also on accurate modelling of the shutter timing. The current state-of-the-art technique for calibrating the shutter timing requires specialised hardware. We present a new method that only requires video of a known calibration pattern. Experimental results on over 60 real datasets show that our method is more accurate than the current state of the art. We further present the first relative pose algorithm for rolling shutter cameras.
L Oth, P Furgale, L Kneip, and R Siegwart. Rolling Shutter Camera Calibration. In Proceedings of The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.

Y Dai, H Li, and L Kneip. Rolling Shutter Camera Relative Pose: Generalized Epipolar Geometry. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
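
The shutter-timing model being calibrated is simple to state: rows are read out sequentially with a constant inter-line delay, so every image row is captured from a slightly different pose, which is what produces the characteristic distortions. A minimal sketch under a constant-velocity translation model (function names are my own):

```python
import numpy as np

def row_times(t_frame, n_rows, line_delay):
    """Per-row capture times under the standard rolling-shutter model:
    rows are exposed sequentially, offset by a constant inter-line delay."""
    return t_frame + np.arange(n_rows) * line_delay

def row_translations(p0, v, times):
    """Camera position at each row's capture time for a constant-velocity
    motion: the per-row pose differences are the source of RS distortion."""
    return p0 + times[:, None] * v
```

For a 720-row image with an inter-line delay of 30 microseconds, the last row is captured roughly 21.6 ms after the first, which is far from negligible for a fast-moving camera.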

Semi-dense registration

This work introduces a novel strategy for real-time monocular camera tracking over semi-dense depth maps. We employ a geometric iterative closest point technique instead of a photometric error criterion, which has the conceptual advantage of requiring neither isotropic enlargement of the employed semi-dense regions, nor pyramidal subsampling. We outline the detailed concepts leading to robustness and efficiency even for large frame-to-frame disparities, and demonstrate successful real-time processing over very large view-point changes and significantly corrupted semi-dense depth maps.
L Kneip, Y Zhou, and H Li. SDICP: Semi-Dense Tracking based on Iterative Closest Points. In Proceedings of The British Machine Vision Conference (BMVC), 2015.

Y Zhou, L Kneip, and H Li. Semi-dense Visual Odometry for RGB-D Cameras using Approximate Nearest Neighbour Fields. In Proceedings of the International Conference on Robotics and Automation (ICRA), 2017.
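
The geometric flavour of this tracking can be conveyed with a toy 2D analogue: alternate nearest-neighbour matching with a closed-form (Kabsch) alignment. This is plain rigid ICP on point sets, not the semi-dense pipeline itself, which works with reprojected semi-dense depth maps and more efficient matching structures.

```python
import numpy as np

def icp_2d(src, dst, iters=30):
    """Minimal rigid 2D ICP: match each (transformed) source point to its
    nearest destination point by brute force, then solve the aligning
    rotation/translation in closed form, and repeat."""
    R, t = np.eye(2), np.zeros(2)
    for _ in range(iters):
        cur = src @ R.T + t
        d = np.linalg.norm(cur[:, None] - dst[None], axis=2)
        m = dst[d.argmin(1)]                       # nearest-neighbour matches
        cs, cm = cur.mean(0), m.mean(0)
        H = (cur - cs).T @ (m - cm)
        U, _, Vt = np.linalg.svd(H)
        Rd = Vt.T @ U.T
        if np.linalg.det(Rd) < 0:                  # avoid reflections
            Vt[-1] *= -1
            Rd = Vt.T @ U.T
        R, t = Rd @ R, Rd @ t + (cm - Rd @ cs)
    return R, t
```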

MAV state estimation

At the beginning of my PhD I collaborated on fusing complementary sensory information in the context of filter-based position and velocity estimation for our sFly micro aerial vehicles. The filter inputs are given by inertial measurements and camera poses obtained from a visual SLAM algorithm, the latter, however, only up to an unknown scale factor. The absolute scale is then recovered in a loosely coupled centripetal-acceleration motion filter that also contains the scale as an additional variable in the state. We furthermore investigated extensions to handle delayed, dropout-susceptible camera pose measurements.
I put a particular emphasis on the initialization of such Kalman filters. The convergence behavior of the estimated scale factor depends largely on a good initial value. I therefore investigated deterministic ways of computing the scale and the gravity direction through short-term integration of inertial measurements.
S Weiss, M W Achtelik, S Lynen, M C Achtelik, L Kneip, M Chli, and R Siegwart. Monocular Vision for Long-Term MAV State-Estimation: A Compendium. Journal of Field Robotics, Vol. 30, No. 5, pp. 803-831, 2013.
F Bourgeois, L Kneip, S Weiss, and R Siegwart. Delay and dropout tolerant state estimation for MAVs. In O Khatib, V Kumar, and G Sukhatme, editors, Experimental Robotics, volume 79 of Springer Tracts in Advanced Robotics, pages 571–584. Springer, 2014.
L Kneip, S Weiss, and R Siegwart. Deterministic Initialization of Metric State Estimation Filters for Loosely-Coupled Monocular Vision-Inertial Systems. In Proceedings of The IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2011.
L Kneip, A Martinelli, S Weiss, D Scaramuzza, and R Siegwart. Closed-Form Solution for Absolute Scale Velocity Determination Combining Inertial Measurements and a Single Feature Correspondence. In Proceedings of The IEEE International Conference on Robotics and Automation (ICRA), 2011.
L Kneip, D Scaramuzza, and R Siegwart. On the Initialization of Statistical Optimum Filters with Application to Motion Estimation. In Proceedings of The IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2010.
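
The deterministic scale initialization can be sketched under strong simplifying assumptions (zero initial velocity, gravity-compensated, bias-free accelerometer samples): double-integrate the inertial measurements over a short window and compare the resulting metric displacement with the unscaled visual one. This toy version is my own illustration, not the published derivation.

```python
import numpy as np

def init_scale(acc, dt, visual_disp):
    """Scale initialization sketch: `acc` holds gravity-compensated
    accelerometer samples (N x 3, world frame), `dt` the sample period,
    `visual_disp` the unscaled displacement reported by visual SLAM over
    the same window. Returns metric / visual displacement ratio."""
    v = np.cumsum(acc * dt, axis=0)    # velocity (zero initial velocity)
    p = np.sum(v * dt, axis=0)         # metric displacement at window end
    return np.linalg.norm(p) / np.linalg.norm(visual_disp)
```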

Binaural sound localization based on head motion

Before I started my PhD, I investigated the application of a binaural model for artificial spatial sound localization. It is based on the assumption that the set of possible locations for a certain sound source perceived by two different microphones can be asymptotically approximated by a well-defined semi-cone in space. Based on time-difference-of-arrival measurements and movements of the corresponding interaural axis, we then obtain a powerful instrument for localizing a static sound source. For instance, it can be shown that a single, well-determined rotation of the interaural axis is sufficient to exactly yield the direction of an immobile sound source in the far field, up to an ambiguity, and independently of the speed of sound (and hence of the surrounding medium). Parallax motion unlocks essential information about the distance and the Cartesian coordinates of the sound source. Combining rotation and translation movements completely solves the localization problem. The described principle is strongly inspired by nature, where scientific studies have revealed that head rotations provide essential cues for sound localization, too.
L Kneip and C Baumann. Binaural Model for Artificial Spatial Sound Localization Based on Interaural Time Delays and Movements of the Interaural Axis. Journal of the Acoustical Society of America, Vol. 124, No. 5, pp. 3108-3119, 2008.
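
The far-field geometry is easy to state: a measured time difference of arrival tau between microphones a distance d apart constrains the source direction to a cone around the interaural axis with cos(theta) = c*tau/d, and a known rotation of the axis yields a second cone whose intersection pins the direction down to a two-fold ambiguity. A sketch under these far-field assumptions (function names are my own):

```python
import numpy as np

def cone_half_angle(tdoa, d, c=343.0):
    """Half-angle of the far-field 'cone of confusion' around the
    interaural axis: cos(theta) = c * tdoa / d."""
    return np.arccos(np.clip(c * tdoa / d, -1.0, 1.0))

def intersect_cones(a1, th1, a2, th2):
    """Intersect two far-field cones (unit axes a1, a2; half-angles th1,
    th2): solve u.a1 = cos th1, u.a2 = cos th2, |u| = 1. Returns the two
    candidate directions (the residual ambiguity after a single rotation
    of the interaural axis)."""
    g = a1 @ a2
    x, y = np.linalg.solve(np.array([[1.0, g], [g, 1.0]]),
                           [np.cos(th1), np.cos(th2)])
    w = x * a1 + y * a2                  # in-plane component
    n = np.cross(a1, a2)                 # out-of-plane direction
    z = np.sqrt(max((1.0 - w @ w) / (n @ n), 0.0))
    return w + z * n, w - z * n
```

Note that the cone angle, and hence the intersection, is taken here with a fixed speed of sound; the independence of the medium shown in the paper arises from the particular combination of measurements before and after the rotation.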