Information and Coding Theory Seminar  

Abstracts of talks

Fall 2005

2/2 Grigory Kabatiansky (IPPI, Moscow), Collusion secure digital fingerprinting: the case of two pirates revisited

Abstract: We consider the problem of digital fingerprinting codes in the simplest case of two "pirates". We calculate the rate achievable via the family of random linear codes and discuss upper bounds in order to estimate the capacity of the corresponding channel. A related problem of so-called tracing traitors codes will be also discussed.

2/9 Adrian Papamarcou (UMD), Writing on Dirty Paper: A Geometrical Approach

Abstract: It was shown by Costa (1981) that the capacity of the power- limited AWGN channel is not reduced by the presence of additive Gaussian interference known to the transmitter but not the receiver. The proof of that result was based on a previously known single-letter characterization of the capacity (due to Gel'fand and Pinsker) which involved an auxiliary variable; the iid realization of that variable was also instrumental in the random coding argument of the proof. In this talk, we briefly review the role played by n- dimensional hyperspheres in the representation and visualization of codes for the AWGN, and proceed to develop and interpret the Costa result using a geometrical approach that does not rely on auxiliary variables or distributions.

2/16 Jon Feldman (Columbia U.), Linear programming decoding

Abstract: In this talk we introduce the method of LP decoding, and show how the error rate of an LP decoder can be characterized as the mass of a certain "normal cone" under a probability distribution defined by the channel. We then apply LP decoding to low-density parity-check (LDPC) codes, and study the relationship between "polytope vertices" and "graph covers," (Koetter and Vontobel) unified under the notion of a "pseudocodeword." We then use a "dual witness" to prove a lower bound on the number of errors that can be corrected by an LP decoder, as long as the graph expands sufficiently.

(Joint work with David Karger, Tal Malkin, Rocco Servedio, Cliff Stein, and Martin Wainwright)

3/2 Olgica Milenkovic (U. Colorado, Boulder), Constrained and Error-Control Coding for DNA Computers

Abstract: The last century was marked by the birth of two outstanding scientific and engineering disciplines: silicon based computing and the theory and technology of genetic data analysis. The research field very likely to dominate the area of scientific computing in the foreseeable future is the merger of these two disciplines, leading to unprecedented possibilities for applications in varied areas of biology and engineering. The first steps toward this goal were made in 1994, when Leonard Adleman solved a quite unremarkable computational problem with an outstanding method. The computational problem considered by Adleman was a simple instant of the directed travelling salesmen problem. The technique used for solving the problem was a new technological paradigm, termed DNA computing. Adleman's experiment represents a landmark demonstration of data processing and communication on the level of biological molecules.
    Following Adleman's initial work, many other DNA computing schemes were developed, including solvers for the 3-SAT problem, DNA systems for analyzing chess games, DNA-based logical components, as well as an intelligent, interactive DNA "game machine", called MAYA. Other applications of DNA techniques include DNA-based storage, DNA biosensing, molecular signal processing and intra-cell disease diagnostics and treatment.
    One of the major obstacles to efficient DNA computing, storage and signal processing is the very low reliability of DNA hybridization operations. DNA computing experiments require the creation of a controlled environment that allows for a set of DNA oligonucleotide strands (codewords) to hybridize with their complements in an appropriate fashion. If the DNA codewords are not carefully chosen, unwanted, or non-selective, hybridization may occur. Even more detrimental is the fact that an oligonucleotide sequence may fold back onto itself, forming a secondary structure which prevents the sequence from participating in the computational process.
    In this talk, we describe some new DNA code design methods which meet the hybridization-constraint requirements. Several classes of such codes are cyclic, which is a particularly useful property in light of the fact that the presence of a cyclic structure reduces the complexity of testing DNA codes for secondary structure formation. Based on some simple properties of a dynamic-programming algorithm, known as Nussinov's method, which is an effective predictor of secondary structure given the sequence of bases in a DNA codeword, we identify some code design criteria that reduce the possibility of secondary structure formation in a codeword. These design criteria can be formulated in terms of the requirement that the Watson-Crick distance between a DNA codeword and a number of its shifts be larger than a given threshold or that the DNA codewords do not contain long subsequences and their reverse complements. In this talk we address both the issue of enumerating DNA sequences with such properties and the problem of practical DNA code construction under such constraints.
Joint work with Navin Kashyap from Queen’s University, Kingston, Canada.

3/9 Damianos Karakos (Johns Hopkins), An Introduction to Decision Trees

Classification (decision) trees are usually built in a supervised manner, where a set of observation-label pairs {(X,Y)} is provided. For each input observation X, a series of decisions is taken, as to which branch of the tree X will follow at each internal node. Once X reaches a leaf, a leaf-specific "label" hat{Y} is chosen as the output. The tree is designed to minimize (at least approximately) the expected loss between Y and hat{Y}. In this talk, we will present the fundamentals of decision trees, and we will describe various locally optimal practical procedures for growing them. We will also see some problems that have been successfully tackled using decision trees.

4/1 Tom Richardson (Flarion Technologies), Design and Modeling Issues for Mobile Wireless Data: An introduction to Flash OFDM
(CSHCN Colloquium)

Abstract: Recent experience in a deployment of Flash OFDM in the Raleigh area by Nextel has shown that Flash OFDM offers the best technological solution for mobile wireless data. The transition from wireless voice to wireless data necessitates a reconsideration of both system networking design and the wireless channel. This talk will introduce some of the key ideas that underpin and differentiate the Flash OFDM system.

4/6 Alexander Barg (UMD), Quantum Codes: An Introduction

Abstract: A tutorial on the basics of quantum algorithms and error correction.

4/13 Alexander Barg (UMD), Quantum Codes: An Introduction (continued)

4/20 Punarbasu Purkayastha (UMD), Weight distributions of LDPC codes

Abstract: This talk will mainly focus on how the asymptotic distance spectrum of regular LDPC codes can be estimated using saddle point approximation, and how these estimates can be used to provide bounds on the error rate under ML Decoding.

References: [1] David Burshtein, Gadi Miller, "Asymptotic Enumeration Methods for Analyzing LDPC Codes", IEEE Transactions on Information Theory, June 2004.
[2] Ohad Barak, David Burshtein, "Lower Bounds on the Spectrum and Error Rate of LDPC Code Ensembles"

4/27 Prakash Narayan (UMD), The Minimum Description Length (MDL) Principle

Abstract: Rissanen's Minimum Description Length (MDL) principle, which lies at the interface of information theory and statistics, states that the statistical information contained in data is best extracted in terms of a possibly short and invertible description (code) of the data. The statistical model (e.g., probability distribution) inferred from the data (as best fitting the data) is the one that leads to the shortest description, taking into account that the inferred model (e.g., probability distribution) itself must also be described. The entrails of the MDL principle will be joyously examined.

[1] I. Csiszar and P.C. Shields, Notes on Information Theory and Statistics, available in Foundations and Trends in Communications and Information Theory, S. Verdu, Ed., 2004. (The relevant material appears in Section 7 - pdf)
[2] A. Barron, J. Rissanen and B. Yu, "The Minimum Description Length Principle in Coding and Modeling",IEEE Trans IT, vol. 44, no. 6, pp. 2743-2760, Oct. 1998.

5/4 Martin Wainwright (UC Berkeley), Tree-reweighted message-passing algorithms in graphical models: Some theory and applications (CSPL seminar)

Abstract: Graphical models (e.g., factor graphs, Markov random fields) provide a flexible framework for capturing statistical dependencies among large collections of random variables, and are widely used in various fields, including statistical signal processing, coding and communication theory, and machine learning. Central to the wide-spread use of graphical models are distributed ``message-passing'' algorithms (e.g., sum-product, max-product), in which nodes perform local computations and communicate with their neighbors. The purpose of these algorithms is to solve various statistical inference problems (e.g., image denoising, error-control decoding) that arise in applying a graphical model. These methods are well-known to be exact for trees, but they are also widely applied to graphs with cycles, where they yield approximate answers.
    To motivate our work, we describe some potential pitfalls associated with applying the standard forms of message-passing to graphs with cycles (sum-product: multiple fixed points and instability; max-product: misleading outputs). We then describe a novel family of tree-reweighted algorithms, and present some theoretical guarantees on their behavior for graphs with cycles. The tree-reweighted sum-product algorithm has a unique fixed point for any problem, and it can be understood as computing an optimal upper bound on the log partition function. On the other hand, the tree-reweighted max-product algorithm has the desirable property of never deceiving the user; this behavior can be understood in terms of a particular linear programming relaxation of the mode-finding problem. We conclude with an overview of some applications of these modified algorithms. Talk based on joint work with: Jon Feldman, Tommi Jaakkola, Alan Willsky.

5/12 Pierre Moulin (UIUC) (CSPL seminar, note the unusual day), Communication games with side information

In this talk we consider some communication problems with channel uncertainty and attempt to derive the worst noise distribution within a prescribed class of distributions. For point-to-point communications this is a classical problem, for which elegant solutions have been derived over the last forty years.
    Motivated by problems such as data hiding and watermarking, we have recently been interested in extending this work to communication problems with side information. Optimal strategies for the communicator and the adversary (jammer) may then be derived by selecting an appropriate cost function and solving minmax optimization problems with respect to the variables of the game. The inner maximization problem (over the noise distribution) is infinite-dimensional but concave. For detection problems the worst noise distributions are strongly non-Gaussian; in some cases exotic solutions are obtained. We also quantify the benefits of randomized strategies for the communicator. This methodology is illustrated by applications to scalar and hexagonal quantization codes for data hiding.