Ph.D. Dissertation Defense: Ming Du
Wednesday, August 6, 2014
10:30 a.m. AVW 2328
For More Information:
301 405 3681 firstname.lastname@example.org
ANNOUNCEMENT: Ph.D. Dissertation Defense
Name: Ming Du
Committee:
Professor Rama Chellappa, Chair
Professor K. J. Ray Liu
Professor Min Wu
Professor Larry Davis
Professor David Jacobs, Dean's Representative
Date/Time: Wednesday, August 6, 2014 at 10:30 a.m.
Place: AVW 2328
Title: RECOGNITION OF FACES FROM SINGLE AND MULTI-VIEW VIDEOS
Face recognition has been an active research field for decades. In recent years, with videos playing an increasingly important role in everyday life, video-based face recognition has begun to attract considerable research interest. It opens up a wide range of potential applications, including TV/movie search and parsing, video surveillance, and access control. Preliminary results in this field suggest that by exploiting the abundant spatio-temporal information contained in videos, we can greatly improve the accuracy and robustness of a visual recognition system. On the other hand, as this research area is still in its infancy, developing an end-to-end face processing pipeline that can robustly detect, track, and recognize faces remains a challenging task. The goal of this dissertation is to study several of the related problems under different settings.
We first investigate the face association problem, in which one attempts to extract face tracks of multiple subjects while maintaining label consistency; this lays the foundation for the subsequent recognition stage. Traditional tracking algorithms have difficulty with this task, especially in the presence of challenging nuisance factors such as motion blur, low resolution, or intense camera motion. We argue that contextual features, in addition to face appearance itself, should play an important role in this case, and we propose principled methods to combine multiple features. More specifically, we rely on Conditional Random Fields and Max-Margin Markov Networks to infer labels for detected faces from different sources of evidence. Unlike many existing approaches, our algorithms work in an online mode and hence have a wider range of applications. We also address issues related to the proposed framework, such as parameter learning, inference, and the handling of false positives and false negatives.
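The CRF/Max-Margin Markov Network machinery is beyond the scope of a short example, but the core idea of combining face appearance with contextual cues into a single association score can be sketched as below. The features, weights, and the greedy matching rule are all illustrative stand-ins, not the learned models from the dissertation:

```python
import numpy as np

def associate(tracks, dets, w_app=0.7, w_ctx=0.3):
    """Toy one-frame face association: score each (track, detection) pair by
    a weighted sum of appearance similarity (cosine) and a contextual cue
    (spatial proximity), then greedily match highest-scoring pairs.
    The weights are hypothetical, not learned CRF/M3N parameters."""
    sim = np.zeros((len(tracks), len(dets)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(dets):
            app = float(np.dot(t["app"], d["app"]) /
                        (np.linalg.norm(t["app"]) * np.linalg.norm(d["app"])))
            ctx = float(np.exp(-np.linalg.norm(t["pos"] - d["pos"]) / 50.0))
            sim[i, j] = w_app * app + w_ctx * ctx
    pairs, s = [], sim.copy()
    for _ in range(min(len(tracks), len(dets))):
        i, j = np.unravel_index(np.argmax(s), s.shape)
        pairs.append((int(i), int(j)))
        s[i, :] = -np.inf   # each track and detection is used at most once
        s[:, j] = -np.inf
    return sorted(pairs)
```

A principled system would replace the greedy step with joint inference over all labels, which is precisely what the CRF/M3N formulation provides.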
We next propose a novel video-based face recognition framework that addresses the problem from two different aspects. To handle pose variations, we learn a structural-SVM-based detector that simultaneously localizes face fiducial points and estimates face pose; by adopting a different optimization criterion from existing algorithms, we are able to improve localization accuracy. To model face variations of other kinds, we use intra-personal/extra-personal dictionaries. Intra-personal/extra-personal modeling of human faces has been shown to work successfully in the Bayesian face recognition framework, and it has additional advantages in scalability and generalization, which are of critical importance to real-world applications. Combining intra-personal/extra-personal models with dictionary learning enables us to achieve state-of-the-art performance on unconstrained video data, even when the training data come from a different database.
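The intra-personal/extra-personal idea can be illustrated with a minimal sketch: the difference between two face feature vectors is labeled "same person" if it is reconstructed better by an intra-personal dictionary than by an extra-personal one. This is only a schematic; a least-squares fit stands in for the sparse coding used with learned dictionaries, and the dictionaries here are hypothetical toy matrices:

```python
import numpy as np

def same_person(diff, D_intra, D_extra):
    """Classify a feature-difference vector as intra-personal (same subject)
    or extra-personal (different subjects) by comparing reconstruction
    residuals under the two dictionaries."""
    def residual(D, x):
        coef, *_ = np.linalg.lstsq(D, x, rcond=None)
        return np.linalg.norm(x - D @ coef)
    return residual(D_intra, diff) < residual(D_extra, diff)
```

In practice the dictionaries would be learned from training data (e.g., with K-SVD-style dictionary learning) rather than fixed by hand, which is what gives the approach its scalability: new subjects need no retraining, since only difference vectors are modeled.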
Finally, we present an approach for video-based face recognition in camera networks. The focus is on handling pose variations by exploiting the strength of the multi-view camera network. However, rather than taking the typical approach of modeling these variations, which eventually requires explicit knowledge of pose parameters, we rely on a pose-robust feature that eliminates the need for pose estimation. The pose-robust feature is developed using Spherical Harmonic (SH) representation theory and is extracted from the surface texture map of a spherical model that approximates the subject's head. Feature vectors extracted from a video are modeled as an ensemble of instances of a probability distribution in a Reproducing Kernel Hilbert Space (RKHS). The ensemble similarity measure in the RKHS improves both the robustness and the accuracy of the recognition system. The proposed approach outperforms traditional algorithms on a multi-view video database collected using a camera network.
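One standard way to compare two ensembles of feature vectors as distributions in an RKHS is the (biased) squared maximum mean discrepancy between their kernel mean embeddings; the sketch below uses an RBF kernel to illustrate the general idea, though the dissertation's specific ensemble similarity measure may differ:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    # pairwise squared distances via ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y
    d2 = (np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=0.5):
    """Biased estimate of squared maximum mean discrepancy between the
    kernel mean embeddings of two feature ensembles: small values mean
    the two videos' feature distributions are close in the RKHS."""
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())
```

Because the comparison is between whole distributions of per-frame features rather than individual frames, outlier frames (e.g., momentary occlusions) have limited influence, which is one source of the robustness noted above.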