In part 1 of this series, we discussed using truncated signed distance functions to represent shapes. The next step in the pipeline is to find a way to parametrize a whole class of shapes in a unified way. This will allow us to later reason about that class collectively, instead of only working with particular shape instances.
Introduction: What is a shape embedding?
With TSDFs we have an accurate and efficient way of representing any shape - whether a car, a person, a building, or anything else. However, because they’re so general, TSDFs don’t allow us to reason about the common properties of the shapes of objects of a given class in a unified way. We’d like to find a way to constrain our shape representation even further, so that we can answer questions like ‘what is the car-like shape that best fits to the data we have observed’. We’d also like to generate and compare car-like shapes from just a few numbers.
Shape embeddings can be thought of as a parametrization of shapes based on their most salient features. For example; though the shapes of cars differ from one another in many ways, some of the biggest sources of difference are their lengths, heights, widths, whether or not they have a hatchback. We’d like to find a way to describe the shapes of all cars in terms of only a small number of their most salient properties. We’d like to ‘encode’ the shapes of all cars into only a small vector.
More formally, an embedding is a mapping from a high dimensional space onto a lower dimensional space. We would like to find such a mapping that is smooth and differentiable, i.e. one that, given smooth changes in the lower dimensional space, produces smooth changes in the higher dimensional space. We will see in a future post that this allows us to perform optimization in this lower dimensional space to efficiently approximate the shapes of objects.
Principle Component Analysis (PCA) is a classic data analysis technique, originating from mechanical engineering. I will omit the rigorous mathematical treatment of PCA here as the wikipedia article covers it nicely. There are many intuitive ways to think about what PCA does, however the one I find most useful is to consider the resulting principal components as an orthogonal basis for the vectors of the higher dimensional space, with each basis vector corresponding to some ‘axis of variability’ in the data. The first axis has the greatest variability, the second has the next greatest, and so on (some readers will note that this seems like an orthogonal linear transformation, and it is!).
A convenient way to compute the principal components is via Eigendecomposition, also known as Spectral Decomposition. Suppose we have some vectors from an dimensional space, . We compute the covariance matrix, over our vectors. has the nice property of being real valued and symmetric () by definition, and hence is always diagonalizable.
We may then factorize the covariance matrix as , where is an matrix whose columns consist of the eigenvectors of , and the diagonal matrix of corresponding squared eigenvalues. The vectors in then form the basis in the space of the PCA. Projecting the points in to the new basis then amounts to computing;
where is the mean over the samples in .
Note that this projection also results in an N-dimensional space, . In order to get a low dimensional mapping, , all we need to do now is discard the basis vectors corresponding to low variability, i.e. construct as an matrix consisting of the eigenvectors in corresponding only to the largest squared eigenvalues in . Hence , and we have acheived our dimensional reduction.
From TSDF to PCA
As a linear mapping, accomplishes our goal of finding a smooth, differentiable mapping to a lower dimensional space. Applying this mapping to our TSDF representation is straightforward. We need only unroll the 3-D TSDF to a 1-D vector representation. Which order we do this is unimportant, so long as we maintain the same ordering for each TSDF.
We then apply PCA to the resulting vector. Engelmann et. al. choose a 5-component . The result is illustrated below;
We now have a low-dimenional shape embedding, through a linear mapping. This will allow us to later deal with efficiently optimizing over shapes using only a few variables that encode the most salient aspects, rather than over the entire high dimensional TSDF representation. Coincidentally, we also compute a ‘mean car shape’, which will be useful to us later on in initializing our optimization.
Getting back to a TSDF
We would also like to be able to reconstruct the TSDF, and hence a 3-D representation of our shape. Fortunately, since is a linear mapping, reprojecting back into the TSDF space is also simple and computation. To get back to from we need only compute;
Of course, in performing the projection, information about the shape will be lost. An intuitive way of thinking about this is that discarding eigenvectors corresponds to an ‘averaging’ over the dimension that was discarded, and encoding only this average over the remaining eigenvectors. The results of perfoming PCA over TSDFs, reducing to five components, then reprojecting back to the shape is visualized in the following;
Clearly some detail is lost. However, we can make this tradeoff in a principled way, since the magnitude of the discarded eigenvalues corresponds to the variance of the discarded components. Now that we have a reasonable low-dimensional shape representation, we may perform all sorts of manipulations in this space, including qualitatively observing the influence of the various components.
While other approaches to creating embeddings for shapes exist, including recent machine-learning approaches with autoencoders and generative adversarial networks, in practice PCA is suitable for our immediate needs. In part 3, we will examine how to use the TSDF and the low dimensional embedding to solve an important class of problem in photometric computer vision, using a principled optimization approach.