Deeply Supervised 3D Recurrent FCN for Salient Object Detection in Videos
Trung-Nghia Le and Akihiro Sugimoto
Abstract
This paper presents a novel end-to-end 3D fully convolutional network for salient
object detection in videos. The proposed network applies 3D filters in the spatiotemporal
domain to directly learn spatial and temporal information jointly, yielding 3D deep features,
and maps these features to pixel-level saliency predictions, outputting saliency
voxels. In our network, we combine refinement at each layer with deep supervision
to detect salient object boundaries efficiently and accurately. The refinement module
recurrently enhances each feature map to incorporate contextual information. Applying
deeply supervised learning to the hidden layers, on the other hand, sharpens the details of the
intermediate saliency voxels, so the final saliency voxel is refined progressively.
Extensive experiments on publicly available benchmark datasets
confirm that our network outperforms state-of-the-art methods.
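
To make the architecture described in the abstract concrete, below is a minimal sketch of its three ingredients: 3D convolutions over a video clip, a recurrent refinement module applied to each feature map, and deeply supervised side outputs that emit a saliency voxel at every stage. This is an illustration written in PyTorch under our own assumptions; the module names (RefineBlock3D, DeeplySupervised3DFCN), channel sizes, and network depth are hypothetical and do not reproduce the authors' implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RefineBlock3D(nn.Module):
        """Recurrently refines a 3D feature map to absorb context (hypothetical)."""
        def __init__(self, channels, steps=2):
            super().__init__()
            self.steps = steps
            self.conv = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

        def forward(self, x):
            h = x
            for _ in range(self.steps):       # recurrence: the same weights are reused
                h = F.relu(self.conv(h) + x)  # residual update keeps the input signal
            return h

    class DeeplySupervised3DFCN(nn.Module):
        """Two-stage 3D FCN with a side-output head per stage (hypothetical)."""
        def __init__(self, in_ch=3, ch=(32, 64)):
            super().__init__()
            self.enc1 = nn.Conv3d(in_ch, ch[0], kernel_size=3, padding=1)
            self.enc2 = nn.Conv3d(ch[0], ch[1], kernel_size=3, padding=1)
            self.ref1 = RefineBlock3D(ch[0])
            self.ref2 = RefineBlock3D(ch[1])
            # 1x1x1 side-output heads give deep supervision at hidden layers
            self.side1 = nn.Conv3d(ch[0], 1, kernel_size=1)
            self.side2 = nn.Conv3d(ch[1], 1, kernel_size=1)

        def forward(self, clip):  # clip: (B, C, T, H, W)
            f1 = self.ref1(F.relu(self.enc1(clip)))
            f2 = self.ref2(F.relu(self.enc2(f1)))
            s1 = torch.sigmoid(self.side1(f1))  # intermediate saliency voxel
            s2 = torch.sigmoid(self.side2(f2))  # progressively refined voxel
            return s1, s2

    model = DeeplySupervised3DFCN()
    clip = torch.randn(1, 3, 8, 64, 64)  # one 8-frame RGB clip
    s1, s2 = model(clip)                 # saliency voxels, shape (1, 1, 8, 64, 64)

In training, both side outputs would be compared against the ground-truth saliency voxels (for example with binary cross-entropy), so the supervision signal reaches the hidden layers directly rather than only through the final output.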
Session
Posters
DOI
10.5244/C.31.38
https://dx.doi.org/10.5244/C.31.38
Citation
Trung-Nghia Le and Akihiro Sugimoto. Deeply Supervised 3D Recurrent FCN for Salient Object Detection in Videos. In T.K. Kim, S. Zafeiriou, G. Brostow and K. Mikolajczyk, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 38.1-38.13. BMVA Press, September 2017.
BibTeX
@inproceedings{BMVC2017_38,
title={Deeply Supervised 3D Recurrent FCN for Salient Object Detection in Videos},
author={Trung-Nghia Le and Akihiro Sugimoto},
year={2017},
month={September},
pages={38.1--38.13},
articleno={38},
numpages={13},
booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
publisher={BMVA Press},
editor={Tae-Kyun Kim and Stefanos Zafeiriou and Gabriel Brostow and Krystian Mikolajczyk},
doi={10.5244/C.31.38},
isbn={1-901725-60-X},
url={https://dx.doi.org/10.5244/C.31.38}
}