AST-Net: An Attribute-based Siamese Temporal Network for Real-Time Emotion Recognition

Shu-Hui Wang and Chiou-Ting Hsu

Abstract

Predicting continuous facial emotions is essential to many applications in human-computer interaction. In this paper, we focus on predicting two emotion dimensions, valence and arousal, to interpret dynamic yet subtle changes in facial emotion. We propose an Attribute-based Siamese Temporal Network (AST-Net), which includes a discrete emotion CNN model and a Stacked-LSTM, to incorporate both spatial facial attributes and long-term dynamics into the prediction. The discrete emotion CNN model aims to extract attribute-related but pose- and identity-invariant features, and the Stacked-LSTM characterizes the dynamic dependency along the temporal domain. Furthermore, to stabilize the training procedure and to derive a smoother and more reliable long-term prediction, we propose to jointly learn the model from two temporally shifted videos under the Siamese network architecture. Experimental results on the AVEC2012 dataset show that the proposed AST-Net not only runs in real time (40.1 frames per second) but also achieves state-of-the-art performance even when using the vision modality alone.
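
The abstract does not state how the two temporally shifted videos are formed, so the following is only an illustrative sketch of one plausible way to pair a clip with a shifted copy of itself for a Siamese setup. The function name `shifted_clip_pairs` and the parameters `clip_len` and `shift` are hypothetical and are not taken from the paper; integers stand in for video frames.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def shifted_clip_pairs(
    frames: Sequence[T], clip_len: int, shift: int
) -> List[Tuple[List[T], List[T]]]:
    """Pair each clip with a second clip that starts `shift` frames later.

    Assumption: clips are non-overlapping windows of `clip_len` frames, and
    only windows whose shifted copy fits inside the video are kept.
    """
    if clip_len <= 0 or shift <= 0:
        raise ValueError("clip_len and shift must be positive")

    pairs: List[Tuple[List[T], List[T]]] = []
    # Last start index for which the shifted clip still fits in the video.
    last_start = len(frames) - clip_len - shift
    for start in range(0, last_start + 1, clip_len):
        clip = list(frames[start:start + clip_len])
        shifted = list(frames[start + shift:start + shift + clip_len])
        pairs.append((clip, shifted))
    return pairs


# Toy usage: a 10-frame "video" whose frames are represented by their indices.
frames = list(range(10))
pairs = shifted_clip_pairs(frames, clip_len=4, shift=2)
for clip, shifted in pairs:
    print(clip, shifted)
```

In a Siamese arrangement, both clips of a pair would be passed through the same network with shared weights; how the two outputs are combined in the loss is described in the paper itself and is not reproduced here.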

Session

Posters

Files

Paper (PDF)
Supplementary (PDF)

DOI

10.5244/C.31.70
https://dx.doi.org/10.5244/C.31.70

Citation

Shu-Hui Wang and Chiou-Ting Hsu. AST-Net: An Attribute-based Siamese Temporal Network for Real-Time Emotion Recognition. In T.K. Kim, S. Zafeiriou, G. Brostow and K. Mikolajczyk, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 70.1-70.13. BMVA Press, September 2017.

Bibtex

            @inproceedings{BMVC2017_70,
                title={AST-Net: An Attribute-based Siamese Temporal Network for Real-Time Emotion Recognition},
                author={Shu-Hui Wang and Chiou-Ting Hsu},
                year={2017},
                month={September},
                pages={70.1-70.13},
                articleno={70},
                numpages={13},
                booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
                publisher={BMVA Press},
                editor={Tae-Kyun Kim and Stefanos Zafeiriou and Gabriel Brostow and Krystian Mikolajczyk},
                doi={10.5244/C.31.70},
                isbn={1-901725-60-X},
                url={https://dx.doi.org/10.5244/C.31.70}
            }