BMVA 
The British Machine Vision Association and Society for Pattern Recognition 

BibTeX entry

@PHDTHESIS{201104Lubor_Ladicky,
  AUTHOR={Lubor Ladicky},
  TITLE={Global Structured Models towards Scene Understanding},
  SCHOOL={Oxford Brookes University},
  MONTH=Apr,
  YEAR=2011,
  URL={http://www.bmva.org/theses/2011/2011-ladicky.pdf},
}

Abstract

Many scene understanding tasks are formulated as a labelling problem that tries to assign a label to each pixel of an image. These discrete labels may vary depending on the task, for example they may correspond to different object classes such as car, grass or sky, or to depths or to intensity after denoising. These labelling problems are typically formulated as a pairwise Markov or Conditional Random Field, modelling the dependencies of labels of pairs of variables in the local neighbourhoods. However, these pairwise models are very restricted in their expressivity. They can not model rich natural statistics and induce desired complex structures in the output labelling. In this thesis we propose global structured formulations beyond pairwise models, showing that they are very useful in computer vision, furthermore that they can still be learnt and optimised efficiently. First we propose a model, which generalises existing approaches for semantic object class segmentation, formulated in terms of pixels, segments or groups of segments. The proposed method efficiently integrates the strengths of these different approaches, capturing discriminative information across different scales. Next we show how the standard approaches for the semantic object class segmentation problem can be improved by the inclusion of costs based on high level statistics, including object class co-occurrence, which capture knowledge of scene semantics, for example that motorbikes and cows are unlikely to occur together in an image. Then we propose a novel latent random field support vector machine for object detection with a convex MRF regularization and suggest a way to include this information in the object class segmentation formulation. Finally we propose a model that jointly estimates labellings of multiple domains over a product space of labels. We demonstrate the usefulness of this model on the problem of joint object class semantic segmentation and dense 3D stereo reconstruction and show that this approach significantly outperforms existing methods. We show that all proposed models can be optimised efficiently using powerful graph cut based move making algorithms.