We are co-organizing the 2nd International Workshop on Recovering 6D Object Pose, in conjunction with ICCV'15.
Object Detection and 3D Pose Estimation
Detecting poorly textured objects and estimating their 3D pose reliably is still a very challenging problem. We introduce a simple but powerful approach to computing descriptors for object views that efficiently capture both the object identity and 3D pose.
By contrast with previous manifold-based approaches, we can rely on the Euclidean distance to evaluate the similarity between descriptors, and therefore use scalable Nearest Neighbor search methods to efficiently handle a large number of objects under a large range of poses. To achieve this, we train a Convolutional Neural Network to compute these descriptors by enforcing simple similarity and dissimilarity constraints between the descriptors.
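To make the idea of similarity and dissimilarity constraints concrete, here is a minimal sketch in numpy. It uses a standard triplet hinge loss on squared Euclidean distances plus a pair term; the function names, margin value, and exact hinge form are illustrative, not the paper's precise formulation.

```python
import numpy as np

def triplet_loss(anchor, similar, dissimilar, margin=1.0):
    # Hinge on squared Euclidean distances: the dissimilar descriptor
    # (different object, or same object under a distant pose) must be
    # farther from the anchor than the similar one, by at least `margin`.
    d_pos = np.sum((anchor - similar) ** 2)
    d_neg = np.sum((anchor - dissimilar) ** 2)
    return max(0.0, d_pos - d_neg + margin)

def pair_loss(desc_a, desc_b):
    # Descriptors of the same object under (nearly) identical poses
    # should coincide, so their squared distance is penalized directly.
    return np.sum((desc_a - desc_b) ** 2)
```

Minimizing the sum of these terms over many sampled triplets and pairs is what shapes the descriptor space so that plain Euclidean distance becomes meaningful.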
We show that our constraints nicely untangle the images from different objects and different views into clusters that are not only well-separated but also structured as the corresponding sets of poses: the Euclidean distance between descriptors is large when the descriptors are from different objects, and directly related to the distance between the poses when the descriptors are from the same object. These important properties allow us to outperform state-of-the-art object view representations on challenging RGB and RGB-D data.
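Because the descriptors live in a plain Euclidean space, retrieval reduces to nearest-neighbor search over a database of template descriptors. The toy sketch below uses brute-force search; the database sizes, dimensions, and labels are made up for illustration, and a kd-tree or approximate-NN library can be swapped in for large databases.

```python
import numpy as np

# Toy database: one 16-D descriptor per (object, pose) template.
rng = np.random.default_rng(0)
db_descs = rng.normal(size=(100, 16))          # stored template descriptors
db_labels = [(i % 5, i) for i in range(100)]   # (object id, pose id)

def nearest_neighbor(query):
    # Brute-force Euclidean nearest neighbor; returns the matched
    # (object id, pose id) label and the distance to the match.
    dists = np.linalg.norm(db_descs - query, axis=1)
    i = int(np.argmin(dists))
    return db_labels[i], dists[i]

# A slightly perturbed copy of template 42 should retrieve template 42.
label, dist = nearest_neighbor(db_descs[42] + 0.01)
```

The retrieved label simultaneously identifies the object and gives a pose estimate, which is exactly what the descriptor structure above is designed to support.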
Descriptor Evolution during the optimization of the CNN
Code and Data

Code: This is the code for our CVPR'15 paper "Learning Descriptors for Object Recognition and 3D Pose Estimation". It is distributed in two packages, along with a readme file explaining the basics. Over time, as we receive feedback, we will provide more details. Meanwhile, if you have questions, just contact us.
The data used in the paper is essentially the LineMOD dataset created by Stefan Hinterstoisser. However, we have our own way to render the synthetic images with Blender and a median-inpainting-filter for the real-world Kinect depth data.
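The exact median-inpainting filter we used is in the code package; as a rough illustration of the idea, the sketch below fills missing (zero-valued) Kinect depth pixels with the median of valid pixels in a small window. Window size and iteration count are illustrative assumptions, not our exact settings.

```python
import numpy as np

def median_inpaint(depth, radius=2, iterations=3):
    # Illustrative sketch: fill zero-valued (missing) depth pixels with
    # the median of valid pixels in a (2*radius+1)^2 neighborhood.
    # Iterating lets filled values propagate into larger holes.
    depth = depth.astype(np.float64).copy()
    for _ in range(iterations):
        holes = np.argwhere(depth == 0)
        if holes.size == 0:
            break
        for y, x in holes:
            win = depth[max(0, y - radius):y + radius + 1,
                        max(0, x - radius):x + radius + 1]
            valid = win[win > 0]
            if valid.size:
                depth[y, x] = np.median(valid)
    return depth

# Example: a single missing pixel in a constant-depth patch.
patch = np.ones((5, 5))
patch[2, 2] = 0.0
filled = median_inpaint(patch)
```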
Thus, here we provide our version of the data for unmodified use with the code above:
ape, benchviseblue, bowl, cam, can, cat, cup, driller, duck, eggbox, glue, holepuncher, iron, lamp, phone, camOrientations.txt, camPositionsElAz.txt.
Additionally, you can download the Blender file we used to render the synthetic data.

Update (Oct 15th 2015):
As stated above, the data was taken from the LineMOD dataset, where the origin of each object was defined as a central point the object stands on (or, more precisely, the center of the marker board on which it was captured).
For cropping the training and test images, we defined the "center point" the camera looks at to be the point (0,0,5) (in cm, i.e., 5cm above the ground).
Recently, however, for further work, Wadim Kehl decided to go for a more practical scheme and centered all objects.
You can download the updated blender file and
the corresponding ground-truth poses for the real-world data sequences here. The poses archive contains a poseXXXX.txt file for each image, holding a homogeneous 4x4 transformation matrix that maps world coordinates to camera coordinates. The translation part of these matrices is in meters!
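For reference, applying one of these homogeneous 4x4 matrices to 3D points can be sketched as follows (the file can be read with, e.g., `np.loadtxt`; the example pose below is made up for illustration):

```python
import numpy as np

def world_to_camera(pose, points_world):
    # `pose`: homogeneous 4x4 world-to-camera matrix (translation in meters),
    # as stored in a poseXXXX.txt file.
    # `points_world`: (N, 3) array of 3D points in world coordinates.
    pts_h = np.hstack([points_world, np.ones((len(points_world), 1))])
    return (pose @ pts_h.T).T[:, :3]

# Example: a pure translation of 0.5 m along the camera z axis
# maps the world origin to 0.5 m in front of the camera.
pose = np.eye(4)
pose[2, 3] = 0.5
pts_cam = world_to_camera(pose, np.array([[0.0, 0.0, 0.0]]))
```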
Also, if you do not want to render the images yourself but want to use the images exactly as we cropped them, you can download the set of images cropped from the ground-truth locations and rescaled to 64x64.
TODO: New code to work with this data will follow.

Again, the readme file contains information about how to use this data with the code.