Learning from Demonstration (on-going work)

  • This is an ongoing project examining robotic few-shot multi-task learning using a Learning From Demonstration + GAIL framework.
  • More details coming soon.

PersonLab: Person Pose Estimation and Instance Segmentation (ECCV '18)

  • This work presents a box-free bottom-up approach for the tasks of pose estimation and instance segmentation of people in multi-person images using an efficient single-shot model.
  • Posted to Arxiv. (link)

Learning Actionable Representations from Visual Observations (IROS '18)

  • An architecture for video-based action recognition using a novel latent prediction loss to constrain and improve latent representations.
  • CVPR workshop paper. (link)

Temporal Reasoning in Videos using Convolutional Gated Recurrent Units (CVPR Workshop '18)

  • A novel framework for learning robust visual features for training robotic agents. Building upon the TCN framework, we show that our learned representations are as performant as policies trained from true state representations.
  • Project website. (link). Posted to arxiv. (link)

Discovery of Semantic 3D Keypoints via End-to-end Geometric Reasoning

  • This work presents a semi-supervised method to recover semantically consistent 3D keypoints from weakly labeled RGB data.
  • Project website. (link). Posted to arxiv. (link)

Learning Robotic Manipulation of Granular Media (CoRL '17)

  • This paper examines robotic manipulation of granular media: we learn predictive models of granular media dynamics and use them to perform scooping and dumping actions.
  • Posted to arxiv. (link)
  • Submission video (link)

Human Pose (CVPR '17)

  • State-of-the-art RGB human pose on MSCOCO.
  • Posted to arxiv. (link)
  • Slides from an invited talk at RSS '17 Articulated Tracking Workshop (link)

Fluid Simulation (ICML '17)

  • A learning-based system for simulating fluids in real-time.
  • Posted to arxiv. (link)
  • Project page is now up. Code and data here.

Open Source Tools

  • I've written or contributed to many OSS tools over the years. This is a short list of some:
    • jtorch - Torch7 Utility Library for running models in OpenCL / C++
    • jcl - OpenCL Wrapper (to make OpenCL easier)
    • torchzlib - A utility library for zlib compression / decompression of Torch7 tensors
    • matlabnoise - Matlab procedural noise library
    • matlabobj - Matlab obj reader
    • torch7 - I'm a reasonably regular contributor to torch and its various packages
    • icp - A C++ Iterative Closest Point library (with Matlab interface)
    • ik - A very simple inverse kinematics library (in C++)
    • ModelFit - Off-line fitting portion of the hand-tracking paper below
    • jzmq - A ZeroMQ Utility Library (C++)
    • There are probably others... See my github page (link) for more details

PhD Thesis ('15)

Motion Capture (CVPR '15)

  • Joint work with the amazing folks over at MPII. (They definitely did the lion's share of the work on this project!)
  • State-of-the-art system for obtaining high quality motion capture in arbitrary scenes from a low number of cameras.
  • Accepted to CVPR '15. (pdf) (project page)

Body Tracking (CVPR '15)

  • Improved our own state-of-the-art (NIPS '14, ACCV '14) by introducing:
    • A novel cascaded architecture to help overcome the effects of MaxPooling.
    • A modified dropout that works better in the presence of spatially-coherent activations.
  • Accepted to CVPR '15 and posted on arxiv (pdf).
  • Here are the predictions for our model. Note: the MPII dataset analysis code has been verified against the code at MPII (so it's correct and fair). I'm supplying the validation set (which has ground-truth joint positions) so that you can make sure your own code matches ours. In particular, the MPII authors apply a head-box scale of 0.6 before calculating PCKh (which lowers the reported performance); a lot of people miss this because it's not mentioned in their paper.
  • We used Mechanical Turk to identify the MPII images that contain only a single person. The image list can be found here.
  • The Matlab array of our test and training set can be found here.
  • Here are the Matlab figures (from the paper) for FLIC and MPII: figures_flic_mpii.zip.
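
The PCKh note above can be made concrete with a small sketch. This is not the MPII evaluation code: it assumes the head size is the diagonal of the head box multiplied by the 0.6 scale mentioned above, and that a joint counts as correct when its error is within 0.5 of that head size (the usual PCKh@0.5 convention). The function names and the (x1, y1, x2, y2) box layout are made up for illustration.

```python
import math

HEAD_BOX_SCALE = 0.6  # head-box scale applied before computing PCKh
PCKH_ALPHA = 0.5      # assumed threshold fraction (PCKh@0.5)


def head_size(head_box):
    """Scaled diagonal of a head box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = head_box
    return HEAD_BOX_SCALE * math.hypot(x2 - x1, y2 - y1)


def pckh(predictions, ground_truth, head_boxes, alpha=PCKH_ALPHA):
    """Fraction of joints whose error is within alpha * head size.

    predictions, ground_truth: one list of (x, y) joints per person.
    head_boxes: one (x1, y1, x2, y2) box per person.
    """
    correct = 0
    total = 0
    for pred, gt, box in zip(predictions, ground_truth, head_boxes):
        threshold = alpha * head_size(box)
        for (px, py), (gx, gy) in zip(pred, gt):
            total += 1
            if math.hypot(px - gx, py - gy) <= threshold:
                correct += 1
    return correct / total if total else 0.0
```

Leaving out the 0.6 factor enlarges the threshold and inflates the score, which is why results computed without it are not comparable.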

Local Image Descriptor (Submitted Work)

  • Designed a SIFT-like descriptor, which outperforms the existing state of the art.
  • More details will be available soon!
  • Submitted to major conference (decision pending).

Body Tracking (NIPS '14)

  • Following ICLR '14, we substantially improved the architecture, incorporated the MRF into the ConvNet, and significantly outperformed the existing state of the art.
  • Accepted to NIPS '14 and posted on arxiv (pdf)
  • For comparison with our model we've released our LSP and FLIC predictions
  • We've also released the FLIC-plus dataset

Body Tracking (ACCV '14)

  • We investigated the use of motion features when training ConvNet architectures.
  • For ambiguous poses with poor image evidence (such as detecting the pose of camouflaged actors), we showed that motion flow features allow us to outperform state-of-the-art techniques.
  • Accepted to ACCV '14 and posted on arxiv (pdf)

Slow-feature Auto-encoder (NIPS '14 Workshop)

  • We presented a sparse auto-encoder architecture to make use of temporal coherence. This formulation enables pre-training on unlabeled video data (of which there is a massive abundance), to improve ConvNet performance.
  • Submitted to a major conference (decision pending); also accepted to the NIPS '14 workshop (pdf).

Body Tracking (ICLR '14)

  • A new architecture for human pose estimation using a ConvNet + MRF spatial model.
  • First paper to show that a variation of deep learning could outperform existing architectures for this task.
  • Accepted to ICLR '14 (pdf)

Hand Tracking (SIGGRAPH '14)

  • A novel method for real-time pose recovery of markerless complex articulable objects from a single depth image. We showed state-of-the-art results for real-time hand tracking.
  • Accepted to TOG and presented at SIGGRAPH'14 (pdf) (ppt)
  • Dataset is now public!
  • Offline fitting code is now public!

Distributed Locking Protocol (Summer Internship)

  • Worked at MongoDB Inc with the server kernel team (under Alberto Lerner).
  • I developed a new distributed lease protocol (for the sharding config server) using a heavily modified 2-phase commit with timeout mechanism.

Randomized Decision Forests (Early Hand-Tracking Research)

ARCADE (SIGGRAPH Realtime Live '12)

  • ARCADE was a system that allowed real-time video-based presentations that convey the illusion that presenters are directly manipulating holographic 3D objects with their hands.
  • Group project with the MIT Media Lab and NYU Media Research Lab: Jonathan Tompson, Ken Perlin, Murphy Stein, Charlie Hendee, Xiao Xiao, Hiroshi Ishii.
  • The content in the above video was presented at “SIGGRAPH '12 - Real-Time Live!”.

Mesh Decimation

PRenderer - A Pretty Renderer

  • DirectX 9c & OpenGL 4.2 deferred rendering engine with the following features:
    • Compact G-Buffer encoding
    • Light volume optimizations
    • Soft shadows using PSVSM
    • Screen space ambient occlusion
    • Motion blur
    • HDR Rendering pipeline
    • Depth of Field
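
As a rough illustration of the soft-shadow item above, and assuming PSVSM refers to parallel-split variance shadow maps, the core of a variance shadow map lookup is a Chebyshev upper bound on how much light reaches a receiver. The sketch below is plain Python rather than the engine's shader code; the function name and the min_variance clamp are assumptions made for the example.

```python
def vsm_visibility(mean_depth, mean_depth_sq, receiver_depth, min_variance=1e-5):
    """Chebyshev upper bound on the fraction of light reaching the receiver.

    mean_depth and mean_depth_sq are the filtered first and second depth
    moments read from the shadow map; receiver_depth is the depth of the
    shaded point as seen from the light.
    """
    # A receiver in front of the average occluder depth is treated as fully lit.
    if receiver_depth <= mean_depth:
        return 1.0
    # Variance from the two moments, clamped to avoid numerical problems.
    variance = max(mean_depth_sq - mean_depth * mean_depth, min_variance)
    d = receiver_depth - mean_depth
    return variance / (variance + d * d)
```

Because the moments can be blurred like any other texture, the bound changes smoothly across shadow edges, which is where the softness comes from.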

XNA 3.1 Game - Hungry Bee!

  • Take control of "Bumble the bee" and rescue all his bee-friends!
  • Custom 3D, impulse-based physics engine with RK4 integrator.
  • Vertex and Fragment Shaders for cel-shaded cartoon effects and post-processing.
  • Game settings and level design implemented with document-driven programming for easy editing without re-compilation.
  • Art and music assets come from various open-source game-dev resource sites.
SVN Repository & Source: code.google.com/p/hungrybee/
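
For readers unfamiliar with the RK4 integrator mentioned above, here is a minimal sketch of one classical fourth-order Runge-Kutta step in Python. It is not the game's code (which targets XNA); the state layout and the falling_body example are assumptions chosen to keep the sketch short.

```python
def rk4_step(f, t, y, dt):
    """Advance the state y (a list of floats) by one RK4 step of dy/dt = f(t, y)."""
    def offset(base, slope, scale):
        # base + scale * slope, element by element
        return [b + scale * s for b, s in zip(base, slope)]

    k1 = f(t, y)
    k2 = f(t + dt / 2, offset(y, k1, dt / 2))
    k3 = f(t + dt / 2, offset(y, k2, dt / 2))
    k4 = f(t + dt, offset(y, k3, dt))
    return [
        yi + dt / 6 * (a + 2 * b + 2 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    ]


def falling_body(t, state):
    """Example derivative: state is [height, velocity] under constant gravity."""
    _, velocity = state
    return [velocity, -9.81]
```

A physics loop would call rk4_step once per frame with the frame's time step, feeding the returned state back in on the next frame.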

OBB Tree Collision Detection

  • A physics engine to showcase a practical implementation of the paper: Gottschalk et al., "OBBTree: A Hierarchical Structure for Rapid Interference Detection" (SIGGRAPH '96).
  • RK4 integrator with full 3D RBO simulation implemented.
  • Performs offline covariance-based OBB fitting of general polygon soups, including convex hull generation for statistically optimal OBB axes.
  • Runtime collision detection of OBB nodes using Gottschalk's algorithm based on the separating axis theorem. O(log(n)) algorithm and optimized code suitable for realtime applications.
  • 3D models and background from 3rd party sources.
SVN Repository & Source: code.google.com/p/obbdetection/
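
The separating axis test between two OBB nodes can be sketched as follows. This is a straightforward Python version that checks the 15 candidate axes directly; it is not the project's optimized code and it covers only a single pair of boxes, not the tree traversal. The box representation (center, three orthonormal axes, three half-extents) and all names are assumptions made for the example.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def obbs_overlap(center_a, axes_a, half_a, center_b, axes_b, half_b, eps=1e-9):
    """Return True if two oriented boxes overlap.

    Each box is a center point, three orthonormal axes and three half-extents.
    The boxes are disjoint exactly when some candidate axis separates them.
    """
    t = tuple(b - a for a, b in zip(center_a, center_b))
    # 3 face normals of A, 3 of B, and the 9 pairwise edge cross products.
    candidates = list(axes_a) + list(axes_b)
    candidates += [cross(u, v) for u in axes_a for v in axes_b]
    for axis in candidates:
        if dot(axis, axis) < eps:
            continue  # parallel edges give a degenerate axis; skip it
        # Projected radius of each box onto the axis (no normalization is
        # needed because both sides of the comparison scale the same way).
        r_a = sum(h * abs(dot(u, axis)) for u, h in zip(axes_a, half_a))
        r_b = sum(h * abs(dot(v, axis)) for v, h in zip(axes_b, half_b))
        if abs(dot(t, axis)) > r_a + r_b:
            return False  # found a separating axis
    return True
```

In a tree, this test runs on a pair of nodes and the traversal only descends into children when it returns True.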

Cloth Simulation Engine

  • Deformable object engine based on the SIGGRAPH paper: “Large Steps in Cloth Simulation”.
  • Optimized Baraff and Witkin’s algorithm by simplifying shear and bend force calculations, while maintaining the ODE complexity for damped harmonic oscillations between cloth vertices.
  • Engine achieves good stability for unconstrained cloth simulations.
SVN Repository & Source: code.google.com/p/clothsim/
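
To illustrate what a damped harmonic oscillator between two cloth vertices looks like, here is a plain damped-spring force in Python. Note that this is a simple explicit spring-damper, not Baraff and Witkin's formulation and not the engine's simplified shear and bend forces; the function name and parameters are hypothetical.

```python
import math


def damped_spring_force(xi, xj, vi, vj, rest_length, stiffness, damping):
    """Force on vertex i from a damped spring connecting vertices i and j.

    xi, xj are positions and vi, vj velocities (sequences of equal length).
    The force on vertex j is the negative of the returned vector.
    """
    d = [a - b for a, b in zip(xi, xj)]
    length = math.sqrt(sum(c * c for c in d))
    if length == 0.0:
        return [0.0] * len(d)  # direction undefined; apply no force
    direction = [c / length for c in d]
    # Relative speed along the spring; positive when the spring is stretching.
    rel_speed = sum((a - b) * u for a, b, u in zip(vi, vj, direction))
    magnitude = -stiffness * (length - rest_length) - damping * rel_speed
    return [magnitude * u for u in direction]
```

Stiff springs like this force very small explicit time steps, which is the stability problem that motivates implicit integration in cloth simulation.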

3D Sound Rendering

  • Created a real-time sound renderer to synthesize 3D acoustic cues from 2D sources.
  • This work was loosely inspired by the paper: C.P. Brown et al., "A Structural Model for Binaural Sound Synthesis", IEEE Transactions on Speech and Audio Processing (1998).
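
One of the simplest 3D acoustic cues is the interaural time difference (ITD): a sound from the side reaches the far ear slightly later. The sketch below uses the classic Woodworth spherical-head approximation, which is a stand-in chosen for brevity and not the model from the paper above or the renderer's actual code. The head radius, speed of sound, sample rate and function names are all assumed values for illustration.

```python
import math


def woodworth_itd(azimuth_rad, head_radius=0.0875, speed_of_sound=343.0):
    """Interaural time difference in seconds for a spherical head.

    azimuth_rad is measured from straight ahead; the approximation is
    intended for |azimuth| <= pi / 2.
    """
    theta = abs(azimuth_rad)
    return (head_radius / speed_of_sound) * (theta + math.sin(theta))


def pan_with_itd(mono, azimuth_rad, sample_rate=44100):
    """Turn a mono sample list into (left, right) by delaying the far ear.

    Positive azimuth means the source is to the right, so the left ear
    receives the delayed copy. Both channels are padded to equal length.
    """
    delay = round(woodworth_itd(azimuth_rad) * sample_rate)
    delayed = [0.0] * delay + list(mono)
    direct = list(mono) + [0.0] * delay
    if azimuth_rad >= 0:
        return delayed, direct
    return direct, delayed
```

A fuller renderer would add level differences and head-shadow filtering on top of the delay; this sketch covers only the timing cue.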

Software Administrator at Epoch Microelectronics

  • Developed programs to aid circuit design and verification (lisp/skill):
    • Custom place-and-route tools
    • Simulation engine add-ons to verify circuit design performance
    • Database management
  • Server maintenance for use in batch-computing environments for simulation suites (Spectre, Columbus, etc).