Action and Attention in First-person Vision

Kristen Grauman

Abstract

A traditional third-person camera passively watches the world, typically from a stationary position. In contrast, a first-person (wearable) camera is inherently linked to the ongoing experiences of its wearer. It encounters the visual world in the context of the wearer’s physical activity, behavior, and goals. This distinction has many intriguing implications for computer vision research, in topics ranging from fundamental visual recognition problems to high-level multimedia applications. Prof. Grauman will present her recent work in this space, driven by the notion that the camera wearer is an active participant in the visual observations received. First, she will show how to exploit egomotion when learning image representations. Cognitive science tells us that proper development of visual perception requires internalizing the link between “how I move” and “what I see”—yet today’s best recognition methods are deprived of this link, learning solely from bags of images downloaded from the Web. Prof. Grauman introduces a deep feature learning approach that embeds information not only from the video stream the observer sees, but also the motor actions the observer simultaneously makes. She will demonstrate the impact for recognition, including a scenario where features learned from ego-video on an autonomous car substantially improve large-scale scene recognition. Next, she will present her work exploring video summarization from the first-person perspective. Leveraging cues about ego-attention and interactions to infer a storyline, the work automatically detects the highlights in long videos. Prof. Grauman will show how hours of wearable camera data can be distilled to a succinct visual storyboard that is understandable in just moments, and examine the possibility of person- and scene-independent cues for heightened attention. Overall, whether considering action or attention, the first-person setting offers exciting new opportunities for large-scale visual learning.
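To make the egomotion idea concrete, the following is a minimal, hypothetical sketch (not the speaker's actual model) of learning features from pairs of video frames together with the motor action the wearer performed between them. The architecture, motion classes, and tensor sizes are illustrative assumptions only.

```python
# Hypothetical sketch: tie "what I see" to "how I move" by asking a shared
# encoder to support predicting the wearer's motion from a pair of frames.
import torch
import torch.nn as nn

class EgoFeatureNet(nn.Module):
    def __init__(self, num_motions=3, feat_dim=128):
        super().__init__()
        # Shared encoder applied to each frame of a pair.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 4 * 4, feat_dim), nn.ReLU(),
        )
        # Head that must recover the motor action from the feature pair.
        self.motion_head = nn.Linear(2 * feat_dim, num_motions)

    def forward(self, frame_t, frame_t1):
        f_t, f_t1 = self.encoder(frame_t), self.encoder(frame_t1)
        return self.motion_head(torch.cat([f_t, f_t1], dim=1))

# Toy batch: 64x64 frame pairs with an assumed motion label between them
# (e.g., 0 = forward, 1 = turn left, 2 = turn right).
model = EgoFeatureNet()
frames_t = torch.randn(8, 3, 64, 64)
frames_t1 = torch.randn(8, 3, 64, 64)
motions = torch.randint(0, 3, (8,))
loss = nn.CrossEntropyLoss()(model(frames_t, frames_t1), motions)
loss.backward()  # gradients shape the encoder around ego-motion cues
```

The encoder trained this way can then be reused as a feature extractor for a downstream recognition task, which is the spirit of the scene-recognition result mentioned above.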

Session

Keynote 2

Files

Paper (PDF, 161K)

DOI

10.5244/C.29.91
https://dx.doi.org/10.5244/C.29.91

Citation

Kristen Grauman. Action and Attention in First-person Vision. In Xianghua Xie, Mark W. Jones, and Gary K. L. Tam, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 91.1-91.1. BMVA Press, September 2015.

Bibtex

@inproceedings{BMVC2015_91,
	title={Action and Attention in First-person Vision},
	author={Kristen Grauman},
	year={2015},
	month={September},
	pages={91.1--91.1},
	articleno={91},
	numpages={1},
	booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
	publisher={BMVA Press},
	editor={Xianghua Xie and Mark W. Jones and Gary K. L. Tam},
	doi={10.5244/C.29.91},
	isbn={1-901725-53-7},
	url={https://dx.doi.org/10.5244/C.29.91}
}