I’m still adjusting to part-time work this week, plus Oktoberfest is on, so I’m not making quite as much progress as I’d like. This should stabilize soon. I really want to get more writing out every week so that my blog isn’t full of these ‘weekly updates’ and nothing else. On the other hand, I’m a bit stuck because I don’t want to present anything without at least some kind of implementation or data.
The LSD Hallucinations article got good traction, about 800 people actually read it. A lot of traffic came from r/computervision and r/robotics, and the commentary there was surprisingly good. Hacker News, somewhat true to form, missed the point completely.
Papers I’m reading
A couple of new papers came out this week in time for NIPS:
Image-based Localization using LSTMs for structured feature correlation. A neat approach to feature-matching-based pose regression.
- Camera pose regression using an LSTM combined with a CNN, trained end-to-end.
- Basically ‘learns’ good features to use for localization, instead of using hand-tuned ones like SIFT.
- CNN for feature extraction, LSTM for feature correlation.
Video Object Segmentation Without Temporal Information. The title is slightly misleading: it’s a semi-supervised approach to object segmentation in which information actually is propagated between frames, just without the use of a temporal tracker.
- First frame has a foreground segmentation of the desired object.
- Two-stage process; train to segment foreground/background in general (‘this is an object’), then a few more iterations to get a particular object (‘this object in particular’).
- They do something called ‘semantic propagation’, which is the meat of the paper; I still need to dig into it.
I’m trying to go a bit deeper into what’s really possible with ML in robotics and computer vision. These are some of the first results that make me confident there’s something to this beyond single-image tasks after all.
I managed to get my hands on the LIDAR data accompanying the KITTI odometry dataset, so now I can do a few more things. First I’m going to check my approach against the naive approach, using the LIDAR estimates as ground-truth data.
I’m considering doing a ‘part one’ article on this since there’s already a bit to talk about.
- Compute the empirical error against ground truth using the depth image.
- Compute the empirical error against ground truth the ‘native’ TSDF way.
- Run the algorithm over just the LIDAR data.
Good response to the article; now I need to allocate some time to the math. I’ll leave this one aside for now.
Found a great tutorial for doing what I want with Blender - Lego Blocks might be a good test subject. Still, there’s a lot of code to write here.
- Write paper review as article.
Human Intent Prediction
My hypothesis is that velocities in the human pose (measured as arm extension and relative angle to the grasp point) should predict the acceleration of the table system; i.e., changes in the human’s pose indicate that the human wants to do something other than what’s happening now.
- Found some odd artifacts in the table’s motion: straight-line motion shows a lot of acceleration I wasn’t expecting. Investigating that.
- Once the straight-line issue is debugged, test the relative pose velocity assumption.
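A minimal sketch of that test, assuming time series of arm extension, relative angle, and table velocity are already extracted (the function name, sampling interval, and toy signals below are placeholders, not the real data):

```python
import numpy as np

def pose_velocity_predicts_acceleration(arm_ext, rel_angle, table_vel, dt=0.01):
    """Correlate human pose velocity with table acceleration.

    arm_ext, rel_angle, table_vel: 1-D arrays sampled at interval dt.
    Returns the correlation between the pose-velocity magnitude and
    the magnitude of the table's acceleration.
    """
    pose_vel = np.sqrt(np.gradient(arm_ext, dt) ** 2 +
                       np.gradient(rel_angle, dt) ** 2)
    table_acc = np.abs(np.gradient(table_vel, dt))
    return float(np.corrcoef(pose_vel, table_acc)[0, 1])

# toy example: the arm extends at the same moment the table speeds up,
# so pose velocity and table acceleration spike together
t = np.linspace(0, 1, 101)
arm = np.where(t < 0.5, 0.3, 0.6)    # arm extension steps at t = 0.5
vel = np.where(t < 0.5, 0.0, 0.2)    # table velocity steps at t = 0.5
print(pose_velocity_predicts_acceleration(arm, t * 0, vel))
```

If the hypothesis holds on the real recordings, this correlation should be high; the straight-line acceleration artifact would show up here as table-acceleration spikes with no matching pose-velocity spikes.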
Working 2 days this week.