Abstract Compact representations are about taking a complicated data set and representing its elements using a very small number of bits, such that interesting things can be estimated from this highly compressed representation. Such tools are a central building block in the algorithmic toolkit for analysis of very large data sets, leading to orders of magnitude savings in storage space and running time. In this talk, we present simple methods to construct compact representations for estimating similarity. These schemes are simple and elegant enough to teach in an undergraduate class, yet powerful enough that they are actually used in practice in massive data sets, e.g. in eliminating near duplicates in search engines. The principles behind these constructions have important applications in other areas of theoretical computer science and I will sketch some of these connections.

Biography Moses Charikar is a Professor of Computer Science at Princeton University. He received his undergraduate degree from the Indian Institute of Technology, Bombay in 1995 and his Ph.D. from Stanford University in 2000. He joined Princeton in 2001 after spending a year at Google Research. He is a winner of the best paper award at FOCS 2003, as well as a Sloan Fellow. He is broadly interested in the design and analysis of algorithms, with an emphasis on approximation algorithms for NP-hard problems, embeddings of metric spaces and algorithmic techniques for massive data sets.

This Event is For: Graduate • Undergraduate • Faculty • Post-Docs