Long-term Recurrent Convolutional Networks

This is the project page for Long-term Recurrent Convolutional Networks (LRCN), a class of models that unifies the state of the art in visual and sequence learning. LRCN was accepted as an oral presentation at CVPR 2015. See our arXiv report for details on our approach.
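As a rough illustration of how visual and sequence learning fit together in such a model, the sketch below passes each frame through a visual feature extractor and feeds the resulting features into an LSTM that emits a prediction at every time step. It is a toy NumPy sketch, not the Caffe implementation: the single linear-plus-ReLU layer stands in for a full convolutional network, and every name, shape, and weight here (`lrcn_forward`, `W_vis`, `W_lstm`, `W_out`, the random inputs) is a hypothetical placeholder chosen for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4*H, D+H); b has shape (4*H,)."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    o = sigmoid(z[2 * H:3 * H])    # output gate
    g = np.tanh(z[3 * H:4 * H])    # candidate cell update
    c = f * c_prev + i * g         # new cell state
    h = o * np.tanh(c)             # new hidden state
    return h, c


def lrcn_forward(frames, W_vis, W_lstm, b_lstm, W_out):
    """frames: (T, P) flattened frames. Returns (T, K) per-step probabilities."""
    H = b_lstm.shape[0] // 4
    h = np.zeros(H)
    c = np.zeros(H)
    probs = []
    for frame in frames:
        # Stand-in for the CNN: one linear layer followed by a ReLU (assumption).
        feat = np.maximum(0.0, W_vis @ frame)
        h, c = lstm_step(feat, h, c, W_lstm, b_lstm)
        logits = W_out @ h
        e = np.exp(logits - logits.max())  # numerically stable softmax
        probs.append(e / e.sum())
    return np.stack(probs)


# Toy sizes: T frames, P pixels, D visual features, H hidden units, K classes.
rng = np.random.default_rng(0)
T, P, D, H, K = 5, 12, 8, 6, 3
frames = rng.standard_normal((T, P))
W_vis = rng.standard_normal((D, P)) * 0.1
W_lstm = rng.standard_normal((4 * H, D + H)) * 0.1
b_lstm = np.zeros(4 * H)
W_out = rng.standard_normal((K, H)) * 0.1

probs = lrcn_forward(frames, W_vis, W_lstm, b_lstm, W_out)
print(probs.shape)
```

Each row of `probs` is a distribution over the `K` toy classes for one time step. One simple way to obtain a single clip-level prediction from such an output is to average the rows, for example `probs.mean(axis=0)`.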

Code

We have created a pull request to the official BVLC Caffe repository that adds support for RNNs and LSTMs and provides an example of training an LRCN model for image captioning on the COCO dataset. To use the code before it is merged into the official Caffe repository, check out the recurrent branch of Jeff Donahue's Caffe fork at git@github.com:jeffdonahue/caffe.git. Instructions for replicating the activity recognition experiments are available at Activity Recognition. We will update this page once the code is officially released and when code for video description becomes available.

Example Results

Video description (multiple sentences)

Contributors

This research was supported by the Berkeley vision group and BVLC. To cite LRCN with BibTeX, use:
@inproceedings{lrcn2014,
   Author = {Jeff Donahue and Lisa Anne Hendricks and Sergio Guadarrama
             and Marcus Rohrbach and Subhashini Venugopalan and Kate Saenko
             and Trevor Darrell},
   Title = {Long-term Recurrent Convolutional Networks
            for Visual Recognition and Description},
   Year  = {2015},
   Booktitle = {CVPR}
}