Deep Learning for Detecting Multiple Space-Time Action Tubes in Videos

Suman Saha, Gurkirt Singh, Michael Sapienza, Philip Torr and Fabio Cuzzolin

Abstract

In this work, we propose an approach to the spatiotemporal localisation (detection) and classification of multiple concurrent actions within temporally untrimmed videos. Our framework is composed of three stages. In stage 1, appearance and motion detection networks are employed to localise and score actions from colour images and optical flow. In stage 2, the appearance network detections are boosted by combining them with the motion detection scores, in proportion to their respective spatial overlap. In stage 3, sequences of detection boxes most likely to be associated with a single action instance, called action tubes, are constructed by solving two energy maximisation problems via dynamic programming. In the first pass, action paths spanning the whole video are built by linking detection boxes over time using their class-specific scores and their spatial overlap. In the second pass, temporal trimming is performed by ensuring label consistency across all constituent detection boxes. We demonstrate the performance of our algorithm on the challenging UCF101, J-HMDB-21 and LIRIS-HARL datasets, achieving new state-of-the-art results across the board and significantly increasing detection speed at test time.
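
The first pass of stage 3 can be illustrated with a small dynamic-programming sketch. This is not the authors' implementation: the energy used here (the sum of per-box class scores plus a weight `lam` times the overlap between consecutive boxes) is an assumption chosen to match the description above, and the names `iou`, `link_path` and `lam` are hypothetical. The sketch also assumes that every frame contains at least one detection box.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_path(boxes, scores, lam=1.0):
    """Pick one box per frame so that the assumed energy is maximal.

    boxes[t]  : list of boxes detected in frame t
    scores[t] : class-specific score of each box in frame t
    lam       : hypothetical weight of the spatial-overlap term
    Returns the index of the selected box for every frame.
    """
    best = list(scores[0])   # best energy of any path ending at each box
    back = []                # back-pointers, one list per frame transition
    for t in range(1, len(boxes)):
        new_best, pointers = [], []
        for c_box, c_score in zip(boxes[t], scores[t]):
            # Energy of reaching the current box from each box in the previous frame.
            cands = [best[j] + lam * iou(p_box, c_box)
                     for j, p_box in enumerate(boxes[t - 1])]
            j_star = max(range(len(cands)), key=cands.__getitem__)
            pointers.append(j_star)
            new_best.append(c_score + cands[j_star])
        best = new_best
        back.append(pointers)
    # Backtrack from the best final box to recover the whole path.
    path = [max(range(len(best)), key=best.__getitem__)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Two frames with two detections each (toy values, for illustration only).
boxes = [[(0, 0, 10, 10), (50, 50, 60, 60)],
         [(1, 1, 11, 11), (50, 50, 60, 60)]]
scores = [[0.9, 0.1], [0.5, 0.6]]
path_with_overlap = link_path(boxes, scores, lam=1.0)
path_scores_only = link_path(boxes, scores, lam=0.0)
```

With the overlap term switched on, the path stays on the spatially consistent pair of boxes; with `lam=0.0` it simply follows the highest score in each frame, which shows why the overlap term matters for linking.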

Session

Posters 1

Files

Extended Abstract (PDF, 2M)

Paper (PDF, 9M)

Supplemental Materials (ZIP, 14M)

DOI

10.5244/C.30.58
https://dx.doi.org/10.5244/C.30.58

Citation

Suman Saha, Gurkirt Singh, Michael Sapienza, Philip Torr and Fabio Cuzzolin. Deep Learning for Detecting Multiple Space-Time Action Tubes in Videos. In Richard C. Wilson, Edwin R. Hancock and William A. P. Smith, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 58.1-58.13. BMVA Press, September 2016.

Bibtex

        @inproceedings{BMVC2016_58,
        	title={Deep Learning for Detecting Multiple Space-Time Action Tubes in Videos},
        	author={Suman Saha and Gurkirt Singh and Michael Sapienza and Philip Torr and Fabio Cuzzolin},
        	year={2016},
        	month={September},
        	pages={58.1--58.13},
        	articleno={58},
        	numpages={13},
        	booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
        	publisher={BMVA Press},
        	editor={Richard C. Wilson and Edwin R. Hancock and William A. P. Smith},
        	doi={10.5244/C.30.58},
        	isbn={1-901725-59-6},
        	url={https://dx.doi.org/10.5244/C.30.58}
        }