Human Action Recognition Using A Multi-Modal Hybrid Deep Learning Model
Hany El-Ghaish, Mohamed Hussein and Amin Shoukry
Abstract
Human action recognition is a challenging problem, especially in the presence of
multiple actors and/or multiple scene views. In this paper, multi-modal integration and
a hybrid deep learning architecture are deployed in a unified action recognition model.
The model incorporates two main types of modalities, 3D skeletons and images, which
together capture the two main aspects of an action: body motion and body-part
shape.
Instead of a mere fusion of the two types of modalities, the proposed model
integrates them by focusing on specific parts of the body, whose locations are known
from the 3D skeleton data. The proposed model combines both Convolutional Neural
Networks (CNN) and Long Short-Term Memory (LSTM) deep learning architectures
into a hybrid one. The model is called MCL, for (M)ulti-Modal (C)NN + (L)STM.
MCL consists of two sub-models, CL1D and CL2D, that simultaneously extract the
spatial and temporal patterns of the two input modality types. Their decisions
are combined to achieve better accuracy. In order to show the efficiency of the MCL
model, its performance is evaluated on the large NTU-RGB+D dataset in two different
evaluation scenarios: cross-subject and cross-view. The obtained recognition rates, 74.2%
in cross-subject and 81.
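The abstract states that the decisions of the two sub-models are combined, but it does not say how. As a minimal sketch of one common option, weighted averaging of per-class probabilities (decision-level fusion), the following assumes each sub-model outputs one raw score per action class. The function names, the equal default weight, and the choice of averaging are illustrative assumptions, not the authors' method.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_decisions(skeleton_scores, image_scores, skeleton_weight=0.5):
    """Hypothetical late fusion: weighted average of class probabilities.

    skeleton_scores and image_scores are raw per-class scores from the
    skeleton-based and image-based sub-models (assumed interface).
    Returns the predicted class index and the fused probabilities.
    """
    if len(skeleton_scores) != len(image_scores):
        raise ValueError("both sub-models must score the same classes")
    p_skeleton = softmax(skeleton_scores)
    p_image = softmax(image_scores)
    w = skeleton_weight
    fused = [w * a + (1.0 - w) * b for a, b in zip(p_skeleton, p_image)]
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return predicted, fused


if __name__ == "__main__":
    # Toy scores for three action classes (made-up numbers).
    label, probabilities = fuse_decisions([2.0, 1.0, 0.0], [0.0, 3.0, 0.0])
    print(label, probabilities)
```

In this toy example the skeleton scores alone favor class 0, while the fused result favors class 1 because the image scores are far more confident. Other fusion rules, such as multiplying the probabilities or learning the weights, fit the same interface.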
Session
Posters
DOI
10.5244/C.31.84
https://dx.doi.org/10.5244/C.31.84
Citation
Hany El-Ghaish, Mohamed Hussein and Amin Shoukry. Human Action Recognition Using A Multi-Modal Hybrid Deep Learning Model. In T.K. Kim, S. Zafeiriou, G. Brostow and K. Mikolajczyk, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 84.1-84.13. BMVA Press, September 2017.
Bibtex
@inproceedings{BMVC2017_84,
title={Human Action Recognition Using A Multi-Modal Hybrid Deep Learning Model},
author={Hany El-Ghaish and Mohamed Hussein and Amin Shoukry},
year={2017},
month={September},
pages={84.1--84.13},
articleno={84},
numpages={13},
booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
publisher={BMVA Press},
editor={Tae-Kyun Kim and Stefanos Zafeiriou and Gabriel Brostow and Krystian Mikolajczyk},
doi={10.5244/C.31.84},
isbn={1-901725-60-X},
url={https://dx.doi.org/10.5244/C.31.84}
}