Since October 2016, I have been working on some research motivated by Engelmann et. al. from RTWH Aachen. Notably, the two papers Joint Object Pose Estimation and Shape Reconstruction in Urban Street Scenes Using 3D Shape Priors and SAMP: Shape and Motion Priors for 4D Vehicle Reconstruction were a great inspiration. My work has been to extend their approach to be faster, more robust, and work on-line. However, the fundamental ideas, particularly with respect to shape representation and optimization, are due to these works.
In this series of articles I intend to write up the step-by-step shape priors pipeline, before discussing my results and extensions of the algorithms detailed in these papers. The articles are intended for the enthusiast, with again, a more technical treatment coming in an upcoming scientific paper.
Introduction: What are Shape Priors?
‘Shape Priors’ is a fancy way of saying that our understanding of objects we can see is informed by our experiences with similar objects we’ve seen before. Humans have an intuitive understanding of the shape of objects. We can easily infer the overall shape of a lamp, or a car or an aeroplane, with a high degree of accuracy - even if we only observe a small piece of it from only one angle. In effect, we can predict the shape of an object from very limited observations, because we have an expectation about that object’s shape. Being able to guess the overall shape of an object from partial information is a very useful skill that allows us to more easily predict and interact with the world.
In contrast, machines have no inherent notion of what objects look like; in computer vision we often regard the world as simply a collection of textured points in space. It is impossible for ‘naive’ machines to predict the overall shapes of objects, because no expectation about shape is encoded into these collections of points. In order for machines to have the same degree of autonomy when interacting with the world, they must have some idea of ‘shape’.
Key to giving machines this ‘idea of shape’ is having a way to represent shapes so that computers can understand and manipulate them.
Representing Shapes with Signed Distance Functions
Our goal is to find a way to represent arbitrary 3D shapes. In the real world, at least at the visual level, shapes are made up of continuous compound surfaces. To a certain degree of simplification, this means that in order to describe an arbitrary shape we would need to store an infinite number of points; clearly this isn’t feasible with finite memory.
Instead of storing every point on the shape, we seek an approximation that is sufficiently good for our purposes. There are many ways to approximate shapes, including as simple point clouds, meshes, implicit surfaces, and others. All of these have their own particular applications, advantages and drawbacks.
A representation of paricular interest is called a Signed Distance Function (SDF). Signed distance functions are very useful because they combine a compact representation of shape with high fidelity and lightweight computations. The key idea is that an arbitrary shape can be implicitly represented by measuring the distances from the surface of that shape to some chosen, finite, set of points.
Computing the signed distance
Signed distance functions are relatively straightforward to understand and compute, but first let’s visualize them in 2D before we extend the idea to three dimensions. Let’s investigate how to represent a curve in 2D space using signed distance functions. First we must understand what is meant by signed distance.
Consider a plane divided into a grid of squares, with a curve passing through it. Label one side of the curve the ‘inside’ and one the ‘outside’ (for closed curves there is a rigorous idea of outside and inside, but for our purposes this is an arbitrary label).
The signed distance from the curve to a point is then simply the shortest euclidian distance between the point and the curve, signed positively if the point is on the outside and negatively if it’s on the inside. Importantly, if a point lies on the curve, the signed distance will be zero.
Signed distance functions
A signed distance function is a mapping from the position of a set of points in space, to their signed distance with respect to the curve we wish to represent. The curve can then be reconstructed as the zero level set of the signed distance function; that is, the set of points for which the signed distance function is 0.
We can compute these signed distances for arbitrary points relative to the curve, but we would like to avoid computing them everywhere; after all there are an infinite number of points to choose from. You can think of each point at which we compute the distance being like a sample on the shape of the curve; each distance contains some information about what the curve looks like. The more densely we sample the distances, the closer the zero level set of the SDF corresponds to the true curve.
Part of the power of signed distance functions comes from this abiliy to choose the best sampling for our needs. For example we can more densely sample areas of the curve that are somehow more ‘interesting’, or we can choose to sample the space near the curve randomly (which is like a monte-carlo approximation for the shape of the curve), or we can choose to sample in a regular grid.
Sampling on a regular grid gives us another advantage that leads to a more compact representation, which we discuss below.
SDFs and Interpolation
Suppose we sample the signed distance to the curve at each vertex of the squares of our regular grid;
We would like to compute a single value for each grid cell, to implicitly represent the signed distance function over the entire space. To do this we linearly interpolate the values at each vertex to the center of the cell.
We repeat this process over the entire space to get the interpolated SDF at every cell. This then gives us an implicit signed distance function;
Though only relatively few values are needed, they are in practice sufficient to accurately find the level set of the implied SDF, and hence the original curve. Several algorithms are available, including the famous marching squares, and raycasting. This decomposition allows us to encode arbitrary closed shapes in a compact, computationally tractable way.
A note on truncation
Grid cells that are relatively distant from the shape don’t really add any extra information when we reconstruct from the implicit SDF to the explicit shape representation. We therefore truncate the values of the signed distance function beyond a certain threshold, in both positive and negative directions. This means that in practice we only need to store the values of the SDF close to the curve, making the representation even more compact and improving the computation efficiency.
This gives rise to the Truncated Signed Distance Function, or TSDF. In every other respect, they function just the same as the SDF.
Extending to 3D
Being able to compute an SDF and recover a shape in 2D is well and good, but in machine perception we wish to be able to represent shapes in the 3D world. Fortunately, SDFs can be readily extended to the 3D case using voxels. Voxels divide 3D space into a uniform lattice.
We compute the signed distance from each vertex to the nearest point on the 3D surface of the shape, and interpolate in three rather than two dimensions. This results in an implicit SDF, with a signed distance value for each voxel.
Putting it Together
We now have a method for encoding 3D shapes as a signed distance function, a convenient and compact representation.
Here we see some CAD models of cars and their corresponding TSDFs (thanks to Engelmann et. al.). This is the first step in working with shape priors for 3D computer vision. We can easily convert between the explicit surface of the shape, and the implicit signed distance function. Furthermore, this compact numerical representation will allow us to analyze and manipulate the underlying shape in some very useful ways.
In part 2, we will discuss some of the ways we can decompose the shape representation even further, and find a common way to represent different shapes from the same class (in our case, road vehicles), using only a few parameters.