Human Action Recognition Using A Multi-Modal Hybrid Deep Learning Model

Hany El-Ghaish, Mohamed Hussein and Amin Shoukry

Abstract

Human action recognition is a challenging problem, especially in the presence of multiple actors and/or multiple scene views. In this paper, multi-modal integration and a hybrid deep learning architecture are deployed in a unified action recognition model. The model incorporates two main types of modalities, 3D skeletons and images, which together capture the two main aspects of an action: body motion and part shape. Instead of merely fusing the two types of modalities, the proposed model integrates them by focusing on specific parts of the body, whose locations are known from the 3D skeleton data. The proposed model combines Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) deep learning architectures into a hybrid one. The model is called MCL, for (M)ulti-Modal (C)NN + (L)STM. MCL consists of two sub-models, CL1D and CL2D, that simultaneously extract the spatial and temporal patterns for the two input modality types. Their decisions are combined to achieve better accuracy. To show the efficiency of the MCL model, its performance is evaluated on the large NTU-RGB+D dataset in two evaluation scenarios: cross-subject and cross-view. The obtained recognition rates, 74.2% in cross-subject and 81.

Session

Posters

DOI

10.5244/C.31.84
https://dx.doi.org/10.5244/C.31.84

Citation

Hany El-Ghaish, Mohamed Hussein and Amin Shoukry. Human Action Recognition Using A Multi-Modal Hybrid Deep Learning Model. In T.K. Kim, S. Zafeiriou, G. Brostow and K. Mikolajczyk, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 84.1-84.13. BMVA Press, September 2017.

Bibtex

            @inproceedings{BMVC2017_84,
                title={Human Action Recognition Using A Multi-Modal Hybrid Deep Learning Model},
                author={Hany El-Ghaish and Mohamed Hussein and Amin Shoukry},
                year={2017},
                month={September},
                pages={84.1--84.13},
                articleno={84},
                numpages={13},
                booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
                publisher={BMVA Press},
                editor={Tae-Kyun Kim and Stefanos Zafeiriou and Gabriel Brostow and Krystian Mikolajczyk},
                doi={10.5244/C.31.84},
                isbn={1-901725-60-X},
                url={https://dx.doi.org/10.5244/C.31.84}
            }