Learning temporal structures for human activity recognition
Tiantian Xu and Edward Wong
Abstract
We propose a hierarchical method for learning temporal structures for the recognition
of complex human activities or actions in videos. Low-level features (HOG, HOF, MBHx
and MBHy) are first computed from video snippets to form concatenated feature vectors.
A novel segmentation algorithm based on K-means clustering is then used to divide the
video into segments, with each segment corresponding to a sub-action with uniform motion
characteristics. Using the low-level features as inputs, a many-to-one encoder is trained
to extract generalized features for the snippets in each segment. A second many-to-one
encoder is then used to compute higher-level features from the generalized features. The
higher-level features from individual segments are then concatenated and used
to train a third many-to-one encoder to extract a high-level feature representation for the
entire video. The final descriptor is the concatenation of higher-level features from individual segments and the high-level feature for the entire video.
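The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper's segmentation algorithm is only described as "based on K-means clustering", so plain Lloyd's K-means followed by grouping of consecutive same-label snippets is assumed here, and simple mean pooling stands in for the trained many-to-one encoders. All function names, the toy features, and the choice of k are hypothetical.

```python
from itertools import groupby


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vec(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans_labels(features, k, iters=20):
    """Plain Lloyd's K-means; centroids start at evenly spaced snippets."""
    step = len(features) / k
    centroids = [list(features[int(i * step)]) for i in range(k)]
    labels = [0] * len(features)
    for _ in range(iters):
        # Assignment step: nearest centroid for every snippet.
        labels = [min(range(k), key=lambda c: sq_dist(f, centroids[c]))
                  for f in features]
        # Update step: recompute each centroid from its members.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean_vec(members)
    return labels


def temporal_segments(labels):
    """Turn runs of identical consecutive labels into (start, end) spans."""
    segments, start = [], 0
    for _, run in groupby(labels):
        length = len(list(run))
        segments.append((start, start + length))
        start += length
    return segments


def video_descriptor(features, k):
    """Concatenate per-segment features with one whole-video feature.

    Assumption: mean pooling replaces the trained many-to-one encoders.
    """
    segments = temporal_segments(kmeans_labels(features, k))
    segment_feats = [mean_vec(features[s:e]) for s, e in segments]
    video_feat = mean_vec(segment_feats)
    descriptor = [x for feat in segment_feats for x in feat] + video_feat
    return segments, descriptor


# Toy input: six snippets with 2-D features and two distinct motion regimes.
snippets = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
            [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]]
segments, descriptor = video_descriptor(snippets, k=2)
```

In this sketch the number of segments can exceed k when a cluster label recurs later in the video, since segments are runs of consecutive snippets rather than clusters. The descriptor length is the number of segments times the feature dimension, plus one more feature vector for the whole video.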
Session
Posters
Files
Paper (PDF)
DOI
10.5244/C.31.160
https://dx.doi.org/10.5244/C.31.160
Citation
Tiantian Xu and Edward Wong. Learning temporal structures for human activity recognition. In T.K. Kim, S. Zafeiriou, G. Brostow and K. Mikolajczyk, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 160.1-160.13. BMVA Press, September 2017.
Bibtex
@inproceedings{BMVC2017_160,
title={Learning temporal structures for human activity recognition},
author={Tiantian Xu and Edward Wong},
year={2017},
month={September},
pages={160.1--160.13},
articleno={160},
numpages={13},
booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
publisher={BMVA Press},
editor={Tae-Kyun Kim and Stefanos Zafeiriou and Gabriel Brostow and Krystian Mikolajczyk},
doi={10.5244/C.31.160},
isbn={1-901725-60-X},
url={https://dx.doi.org/10.5244/C.31.160}
}