Convolutional aggregation of local evidence for large pose face alignment

Adrian Bulat and Yorgos Tzimiropoulos

Abstract

Methods for unconstrained face alignment must satisfy two requirements: they must not rely on accurate initialisation/face detection, and they should perform equally well across the whole spectrum of facial poses. To the best of our knowledge, no existing method meets these requirements to a satisfactory extent. In this paper, we propose Convolutional Aggregation of Local Evidence (CALE), a Convolutional Neural Network (CNN) architecture designed specifically to address both of them. To remove the requirement for accurate face detection, our system first performs facial part detection, providing confidence scores for the location of each of the facial landmarks (local evidence). Next, these score maps, along with early CNN features, are aggregated by our system through joint regression in order to refine the landmarks' locations. Besides playing the role of a graphical model, CNN regression is a key feature of our system, guiding the network to rely on context for predicting the location of occluded landmarks, which are typically encountered in very large poses. The whole system is trained end-to-end with intermediate supervision. When applied to AFLW-PIFA, the most challenging human face alignment test set to date, our method provides more than 50% gain in localisation accuracy when compared to other recently published methods for large pose face alignment. Going beyond human faces, we also demonstrate that CALE is effective in dealing with very large changes in shape and appearance, which are typically encountered in animal faces.
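
The two-stage data flow described in the abstract (per-landmark score maps from part detection, stacked with early CNN features as input to the regression stage) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: no network is trained or run here, the function names `stack_regressor_input` and `landmarks_from_maps` are hypothetical, and the array sizes and peak positions are made up for illustration. The sketch only shows how the regressor input might be assembled by channel-wise concatenation and how a landmark location can be read from a score map by taking its peak.

```python
import numpy as np


def stack_regressor_input(score_maps, early_features):
    """Concatenate detection score maps (K, H, W) with early features (C, H, W)
    along the channel axis, giving a (K + C, H, W) input for a regression stage.
    Assumption: both arrays share the same spatial size."""
    if score_maps.shape[1:] != early_features.shape[1:]:
        raise ValueError("score maps and features must share spatial size")
    return np.concatenate([score_maps, early_features], axis=0)


def landmarks_from_maps(maps):
    """Return a (K, 2) integer array of (x, y) peak locations, one per map."""
    k, h, w = maps.shape
    flat_idx = maps.reshape(k, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (h, w))
    return np.stack([xs, ys], axis=1)


# Hypothetical sizes: 3 landmarks, 4 early feature channels, 8x8 maps.
K, C, H, W = 3, 4, 8, 8
rng = np.random.default_rng(0)

# Stand-in for the detection stage output: low background scores
# with one clear peak per landmark at a made-up (x, y) position.
score_maps = rng.random((K, H, W)) * 0.1
peaks = [(2, 5), (6, 1), (4, 4)]
for k, (x, y) in enumerate(peaks):
    score_maps[k, y, x] = 1.0

# Stand-in for early CNN features at the same resolution.
early_features = rng.random((C, H, W))

regressor_input = stack_regressor_input(score_maps, early_features)
coords = landmarks_from_maps(score_maps)
```

In this sketch `regressor_input` has `K + C` channels, and `coords` recovers the planted peak positions. In a system like the one the abstract describes, the regression stage would produce refined maps from the stacked input, and the same peak-reading step could then be applied to those.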

Session

Posters 2

Files

Extended Abstract (PDF, 180K)

Paper (PDF, 9M)

DOI

10.5244/C.30.86
https://dx.doi.org/10.5244/C.30.86

Citation

Adrian Bulat and Yorgos Tzimiropoulos. Convolutional aggregation of local evidence for large pose face alignment. In Richard C. Wilson, Edwin R. Hancock and William A. P. Smith, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 86.1-86.12. BMVA Press, September 2016.

Bibtex

        @inproceedings{BMVC2016_86,
        	title={Convolutional aggregation of local evidence for large pose face alignment},
        	author={Adrian Bulat and Yorgos Tzimiropoulos},
        	year={2016},
        	month={September},
        	pages={86.1--86.12},
        	articleno={86},
        	numpages={12},
        	booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
        	publisher={BMVA Press},
        	editor={Richard C. Wilson and Edwin R. Hancock and William A. P. Smith},
        	doi={10.5244/C.30.86},
        	isbn={1-901725-59-6},
        	url={https://dx.doi.org/10.5244/C.30.86}
        }