Weakly-supervised Learning of Mid-level Features for Pedestrian Attribute Recognition and Localization

Yang Zhou, Kai Yu, Biao Leng, Zhang Zhang, Dangwei Li and Kaiqi Huang

Abstract

Most existing methods for pedestrian attribute recognition in video surveillance formulate the task as multi-label image classification, while attribute localization is usually disregarded due to low image quality and large variations in camera viewpoint and human pose. In this paper, we propose a weakly-supervised learning approach that performs multi-attribute classification and localization simultaneously, without requiring bounding-box annotations of attributes. First, a set of mid-level attribute features is discovered by a multi-scale attribute-aware module that receives the outputs of multiple inception layers in a deep Convolutional Neural Network (CNN), e.g., GoogLeNet, where a Flexible Spatial Pyramid Pooling (FSPP) operation is performed to acquire the activation maps of attribute features. Subsequently, attribute labels are predicted through a fully-connected layer that performs the regression between the response magnitudes in the activation maps and the image-level attribute annotations. Finally, the locations of pedestrian attributes are inferred by fusing the multiple activation maps, where the fusion weights are estimated as the correlation strengths between attributes and relevant mid-level features. To validate the proposed approach, extensive experiments are performed on the two largest pedestrian attribute datasets currently available, i.e., the PETA dataset [4] and the RAP dataset [10]. In comparison with other state-of-the-art methods, the approach achieves competitive performance on attribute classification. The additional capability of attribute localization is also evaluated.
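The three stages described in the abstract (pooling activation maps into response magnitudes, predicting attributes with a fully-connected layer, and fusing the maps to localize an attribute) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: global max pooling stands in for FSPP, the absolute fully-connected weights stand in for the correlation-based fusion weights, and all function names, shapes and the random inputs are hypothetical.

```python
import numpy as np


def pool_responses(activation_maps):
    """Reduce each mid-level activation map to one response magnitude.

    activation_maps has shape (K, H, W). Global max pooling is an
    assumption here; it is a simplified stand-in for FSPP.
    """
    k = activation_maps.shape[0]
    return activation_maps.reshape(k, -1).max(axis=1)


def predict_attributes(responses, fc_weights, fc_bias):
    """Fully-connected layer plus sigmoid: one score per attribute."""
    logits = fc_weights @ responses + fc_bias
    return 1.0 / (1.0 + np.exp(-logits))


def localize_attribute(activation_maps, fusion_weights):
    """Fuse K activation maps into one (H, W) heat map scaled to [0, 1]."""
    fused = np.tensordot(fusion_weights, activation_maps, axes=1)
    fused = np.maximum(fused, 0.0)  # keep positive evidence only
    peak = fused.max()
    return fused / peak if peak > 0 else fused


# Hypothetical sizes: 8 mid-level features, 6x3 maps, 5 attributes.
rng = np.random.default_rng(0)
maps = rng.random((8, 6, 3))
fc_weights = rng.normal(size=(5, 8))
fc_bias = np.zeros(5)

scores = predict_attributes(pool_responses(maps), fc_weights, fc_bias)
# Assumption: |FC weight| approximates attribute-feature correlation strength.
heatmap = localize_attribute(maps, np.abs(fc_weights[2]))
```

Only image-level attribute labels would be needed to train the fully-connected weights in such a setup, which is what makes the localization weakly supervised: the heat map falls out of the classifier rather than from bounding-box annotations.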

Session

Posters

Files

Paper (PDF)

Supplementary (PDF)

DOI

10.5244/C.31.69
https://dx.doi.org/10.5244/C.31.69

Citation

Yang Zhou, Kai Yu, Biao Leng, Zhang Zhang, Dangwei Li and Kaiqi Huang. Weakly-supervised Learning of Mid-level Features for Pedestrian Attribute Recognition and Localization. In T.K. Kim, S. Zafeiriou, G. Brostow and K. Mikolajczyk, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 69.1-69.12. BMVA Press, September 2017.

Bibtex

            @inproceedings{BMVC2017_69,
                title={Weakly-supervised Learning of Mid-level Features for Pedestrian Attribute Recognition and Localization},
                author={Yang Zhou and Kai Yu and Biao Leng and Zhang Zhang and Dangwei Li and Kaiqi Huang},
                year={2017},
                month={September},
                pages={69.1-69.12},
                articleno={69},
                numpages={12},
                booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
                publisher={BMVA Press},
                editor={Tae-Kyun Kim and Stefanos Zafeiriou and Gabriel Brostow and Krystian Mikolajczyk},
                doi={10.5244/C.31.69},
                isbn={1-901725-60-X},
                url={https://dx.doi.org/10.5244/C.31.69}
            }