Structure from Motion (SfM) is inspired by our ability to learn about the 3D structure in the surrounding environment by moving through it. Given a sequence of images, we are able to simultaneously estimate both the 3D structure and the path the camera took.
SfM can be solved using either Factorization Method or Algebraic Method. Some of the drawbacks of the Factorization Method are: (1) It assumes that the 3D points are visible in all cameras. (2) It assumes Affine Cameras and therefore, there exists affine ambiguity in the solution in addition to similarity ambiguity.
In Algebraic Method, we assuming a projective camera and then solve for camera poses and 3D points using non-linear optimization for bundle adjustment. We first compute Fundamental Matrix and then use it to find out 4 possible pairs of R,T for each pair of cameras. For each pair of R,T, we then compute the linear estimate of 3D points which is further refined using bundle adjustment for minimizing reprojection errors between pairwise cameras with non-linear optimization method (Gauss-Newton). These 3D points can further be used to filter the best R,T, by enforcing the criteria that all the points should be ahead of camera planes.
Visit the GitHub repository for the implementation of both methods along with bundle adjustment.
Space carving of voxels for building 3d models
RGB Image and Corresponding Silhouette
3D Model built with a series of image silhouettes from various viewpoint.
Contours of an object from various viewpoints provide an immense amount of information about the object 3D geometry. This requires the ability to compute object silhouette from image plane projection, and then enforcing consistency across multiple views.
We start from an initial volume of voxels and then carve away the voxels that do not exist within the visual cone for the object silhouette in the image space. We do this for each camera and compute the intersection of all visual cones to eventually obtain a visual hull.
If we have enough viewpoints available and the corresponding camera poses known, then this visual hull will correspond to a good 3D model representation of the object.
LiDAR-based Ground-Plane Segmentation and Object Detection
fundamental matrix computation
Estimating Fundamental Matrix from point-correspondence between two images is a key component to solving multi-view geometry problems. In multi-view geometry, point-projections in each camera plane are related by this 3x3 matrix - projection of a 3D point in one camera plane corresponds to a line (epipolar line) in other camera planes. It allows us to triangulate 3D points using multiple-views and subsequently determine the 3D geometry.
One of the very well-known algorithms to compute fundamental matrix is called the 8-point algorithm. As the name suggests, it requires at least 8-point correspondences between images and uses linear least-squares to solve for the fundamental matrix using SVD. Since the points are in pixel-space, the magnitude of some elements in linear equation formulation [ Ax = 0 ] tends to be much higher than the others resulting in an average error between pixel and epipolar line is of order ~10px. This issue, however, can be resolved using a variant of the algorithm - normalized 8-point algorithm. This algorithm normalizes the points such that they are centered at the centroid with a mean-squared distance of 2 pixels.
Stereo Rectification is the process of making any two given images parallel such that the epipolar lines are aligned and parallel to the image horizontal axis. This requires generating two homography matrices H1, and H2 for left and right images respectively.
First step in the process is to compute a homography matrix, H2, which moves the right image epipole to infinity along the horizontal axis, thereby making the epipolar lines parallel. Next, the homography matrix, H1, is computed which transforms the left image such that the epipolar lines in left and right images are parallel.
Given both homography matrices, rectified images can be computed using inverse pixel mapping.
camera intrinsic calibration from a single image
Estimating camera intrinsic parameters is essential for any computer vision task. Usually, this entails using a calibration rig of known dimensions which contains a simple pattern like a checkerboard. By using at least 6 correspondences, we can obtain 12 constraints since each correspondence provides us with 2 constraints in x and y. This allows us to solve for extrinsic and intrinsic matrix together, which is a 3x4 matrix and has 11 degrees-of-freedom.
An easier but less accurate way of estimating intrinsic matrix is by making use of orthogonal planes and by leveraging our knowledge about the world. Vanishing points in the image plane define 2D projections of 3D points at infinity, and one can compute the 3D ray vector given the 2D coordinate of the vanishing point and camera intrinsic matrix, K. We utilize this property to get a constraint by each pair of vanishing points. Thus, with a minimum of 3 vanishing points, we get 3 constraints to solve the intrinsic matrix. By making (1) square pixels, and (2) zero-skew assumption, we reduce the degrees of freedom to 3, and can estimate the camera parameters.
Lane Detection and Tracking using Birds-Eye view
homography matrix estimation and inverse pixel mapping
Source Image which needs to be warped.
Warped Image using the Homography Matrix
Homography matrix (H) relates a set of pixel coordinates in source images with a different set of pixel coordinates in the destination image. It is a 3x3 matrix that transforms a source coordinate into a destination coordinate system. In order to find this matrix, we need to build up linear equations relating the two coordinate systems. Since each equation gives us two constraints (x and y), we require a minimum of 4 equations (or corresponding points). These overdetermined set of linear equations, Ax = 0, can then be solved using SVD subject to ||x||=1, which is also the reason that why the last element of 3x3 matrix , H, is equal to 1. This reduces the degree-of-freedom of matrix, H, to be 8.
Please visit the GitHub repository for code implementation of this homography matrix estimation given a set of source and destination points.