Ph.D. Dissertation Defense: Chris Reale

Thursday, August 3, 2017
1:30 p.m.
Room 2460, AVW Bldg.
Maria Hoo
301 405 3681
mch@umd.edu

ANNOUNCEMENT: Ph.D. Dissertation Defense

Name: Chris Reale

Committee members:

Professor Rama Chellappa, Chair

Professor Behtash Babadi

Professor Joseph JaJa

Professor Nasser Nasrabadi

Professor Larry Davis

Date/Time: Thursday, August 3, 2017 at 1:30PM to 3:30PM

Place: A.V. Williams Room 2460

Title: Multimodal Approaches to Computer Vision Problems

Abstract:

The goal of computer vision research is to automatically extract high-level information from images and videos. The vast majority of this research focuses specifically on visible light imagery. In this dissertation, we present approaches to computer vision problems that incorporate data obtained from alternative modalities including thermal infrared imagery, near-infrared imagery, and text. We consider approaches where other modalities are used in place of visible imagery as well as approaches that use other modalities to improve the performance of traditional computer vision algorithms. The bulk of this dissertation focuses on Heterogeneous Face Recognition (HFR). HFR is a variant of face recognition where the probe and gallery face images are obtained with different sensing modalities. We also present a method to incorporate text information into human activity recognition algorithms.

We first present a kernel task-driven coupled dictionary model to represent the data across multiple domains for thermal infrared HFR. We extend a linear coupled dictionary model to use the kernel method to process the signals in a high-dimensional space; this effectively enables the dictionaries to represent the data non-linearly in the original feature space. We further improve the model by making the dictionaries task-driven. This allows us to tune the dictionaries to perform well on the classification task at hand rather than on the standard reconstruction task. We show that our algorithms outperform standard coupled dictionaries on three datasets for thermal infrared to visible face recognition.

Next, we present a deep-learning-based approach to near-infrared (NIR) HFR. Most approaches to HFR involve modeling the relationship between corresponding images from the visible and sensing domains. Due to data constraints, this is typically done at the patch level and/or with shallow models to prevent overfitting. In this approach, rather than modeling local patches or using a simple model, we use a complex, deep model to learn the relationship between entire cross-modal face images. We describe a method based on a deep convolutional neural network that leverages a large visible image face dataset to prevent overfitting. We present experimental results on two benchmark datasets showing its effectiveness.

Third, we present a model order selection algorithm for deep neural networks. In recent years, deep learning has emerged as a dominant methodology in machine learning. While it has been shown to produce state-of-the-art results for a variety of applications, one aspect of deep networks that has not been extensively researched is how to determine the optimal network structure. This problem is generally solved by ad hoc methods. In this work we address a subproblem of this task: determining the breadth (number of nodes) of each layer. We show how to use group-sparsity-inducing regularization to automatically select these hyperparameters. We demonstrate our method by using it to reduce the size of networks while maintaining performance for our NIR HFR deep-learning algorithm. Additionally, we demonstrate the generality of our algorithm by applying it to image classification tasks.
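
As an illustration only: the abstract does not state which group-sparsity regularizer is used, so the sketch below assumes a common choice, a group-lasso penalty in which each group holds the incoming weights of one node. Driving a whole group to zero removes that node, which is how such a penalty can select layer breadth. The function names, the tolerance, and the toy weight matrix are hypothetical and are not taken from the dissertation.

```python
import numpy as np

def group_sparsity_penalty(W):
    """Group-lasso penalty: sum over nodes of the L2 norm of each node's weights.

    Assumption: W has shape (n_inputs, n_nodes), so column j holds the
    incoming weights of node j and forms one group.
    """
    return float(np.sqrt((W ** 2).sum(axis=0)).sum())

def active_nodes(W, tol=1e-6):
    """Indices of nodes whose weight group has not been driven to (near) zero."""
    norms = np.sqrt((W ** 2).sum(axis=0))
    return np.flatnonzero(norms > tol)

# Toy layer with three nodes; the middle node's weights are all zero.
W = np.array([[3.0, 0.0, 0.0],
              [4.0, 0.0, 1.0]])

penalty = group_sparsity_penalty(W)  # column norms are 5, 0, and 1
kept = active_nodes(W)               # nodes that would remain in the layer
```

In training, a term like this would be added to the task loss with a weighting factor; nodes whose groups shrink to zero can then be pruned, leaving a narrower layer.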

Finally, we present a method to improve activity recognition algorithms through the use of multitask learning and information extracted from a large text corpus. Current state-of-the-art deep learning approaches are limited by the size and scope of the data set they use to train the networks. We present a multitask learning approach to expand the training data set. Specifically, we train the neural networks to recognize objects in addition to activities. This allows us to expand our training set with large, publicly available object recognition data sets and thus use.

Audience: Graduate Faculty
