Two-Stream SR-CNNs for Action Recognition in Videos

Yifan Wang, Jie Song, Limin Wang, Luc Van Gool and Otmar Hilliges

Abstract

Human action is a high-level concept in computer vision research, and understanding it may benefit from different semantics, such as human pose, interacting objects, and scene context. In this paper, we explicitly exploit semantic cues with the aid of existing object detectors for action recognition in videos, and thoroughly study their effect on recognition performance for different types of actions. Specifically, we propose a new deep architecture, called two-stream semantic region based CNNs (SR-CNNs), which incorporates object/human detection results into the action recognition framework. Our proposed architecture not only shares the strong modeling capacity of two-stream input augmentation, but also offers the flexibility to leverage semantic cues (e.g. scene, person, object) for action understanding. We perform experiments on the UCF101 dataset and demonstrate superior performance compared with the original two-stream CNNs. In addition, we systematically study how incorporating semantic cues affects recognition performance for different types of action classes, and offer insights for building more reasonable action benchmarks and more robust recognition algorithms.
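The abstract describes augmenting each of the two streams with semantic channels (scene, person, object) and combining their predictions. The Python sketch below illustrates only that score-fusion idea on toy numbers. It is not the authors' implementation. The real network computes each channel's class scores from CNN features of the whole frame and of the detected regions, which this sketch does not model. The class-wise max over object regions, the equal channel weights, the 1.0/1.5 spatial/temporal weighting, and every function name here are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Scores = List[float]  # one score per action class


def elementwise_max(vectors: Sequence[Scores]) -> Scores:
    """Class-wise max over a non-empty list of equally sized score vectors."""
    return [max(column) for column in zip(*vectors)]


def weighted_sum(vectors: Sequence[Scores], weights: Sequence[float]) -> Scores:
    """Class-wise weighted sum of equally sized score vectors."""
    out = [0.0] * len(vectors[0])
    for vec, w in zip(vectors, weights):
        for i, s in enumerate(vec):
            out[i] += w * s
    return out


def semantic_stream_scores(scene: Scores, person: Scores,
                           objects: Sequence[Scores]) -> Scores:
    """Combine the scene, person and object channels of one stream.

    Hypothetical: scores from several detected object regions are reduced
    with a class-wise max, and the channels are summed with equal weights.
    If no object was detected, only the scene and person channels contribute.
    """
    channels = [scene, person]
    if objects:
        channels.append(elementwise_max(objects))
    return weighted_sum(channels, [1.0] * len(channels))


def two_stream_scores(spatial: Scores, temporal: Scores,
                      w_spatial: float = 1.0, w_temporal: float = 1.5) -> Scores:
    """Weighted late fusion of the spatial and temporal streams (weights assumed)."""
    return weighted_sum([spatial, temporal], [w_spatial, w_temporal])


def predict(snippets: Sequence[Tuple[Scores, Scores]]) -> Tuple[int, Scores]:
    """Average fused scores over snippets and return (best class index, scores)."""
    fused = [two_stream_scores(sp, tp) for sp, tp in snippets]
    avg = weighted_sum(fused, [1.0 / len(fused)] * len(fused))
    best = max(range(len(avg)), key=avg.__getitem__)
    return best, avg


# Toy example: 3 action classes, a single snippet, made-up channel scores.
spatial = semantic_stream_scores(
    scene=[0.2, 0.5, 0.3],
    person=[0.1, 0.7, 0.2],
    objects=[[0.3, 0.1, 0.6], [0.2, 0.4, 0.1]],
)
temporal = semantic_stream_scores(
    scene=[0.3, 0.3, 0.4],
    person=[0.2, 0.6, 0.2],
    objects=[],  # no object detected in this snippet
)
label, scores = predict([(spatial, temporal)])
print(label, [round(s, 2) for s in scores])
```

With these toy numbers the fused scores are roughly 1.35, 2.95 and 2.0, so the second class (index 1) is predicted. Taking a max over object regions lets the single most informative object drive the object channel. This is one plausible design and is not confirmed by the abstract.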

Session

Posters 2

Files

Extended Abstract (PDF, 2M)
Paper (PDF, 3M)
Supplemental Materials (ZIP, 194K)

DOI

10.5244/C.30.108
https://dx.doi.org/10.5244/C.30.108

Citation

Yifan Wang, Jie Song, Limin Wang, Luc Van Gool and Otmar Hilliges. Two-Stream SR-CNNs for Action Recognition in Videos. In Richard C. Wilson, Edwin R. Hancock and William A. P. Smith, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 108.1-108.12. BMVA Press, September 2016.

Bibtex

        @inproceedings{BMVC2016_108,
        	title={Two-Stream SR-CNNs for Action Recognition in Videos},
        	author={Yifan Wang and Jie Song and Limin Wang and Luc Van Gool and Otmar Hilliges},
        	year={2016},
        	month={September},
        	pages={108.1--108.12},
        	articleno={108},
        	numpages={12},
        	booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
        	publisher={BMVA Press},
        	editor={Richard C. Wilson and Edwin R. Hancock and William A. P. Smith},
        	doi={10.5244/C.30.108},
        	isbn={1-901725-59-6},
        	url={https://dx.doi.org/10.5244/C.30.108}
        }