Information Theory and Probability Estimation:
From Shannon to Shakespeare via Laplace, Good, Turing, Hardy, Ramanujan, and Fisher

Speaker: Dr. Alon Orlitsky, University of California, San Diego

Standard information-theoretic results show that data over small, typically binary, alphabets can be compressed to Shannon's entropy limit. Yet most practical sources, such as text, audio, and video, have essentially infinite support. Compressing such sources requires estimating the probabilities of unlikely, even unseen, events, a problem considered by Laplace. Of existing estimators, an ingenious if cryptic one derived by Good and Turing while deciphering the Enigma code works best, yet not optimally. Hardy and Ramanujan's celebrated results on the number of integer partitions yield an asymptotically optimal estimator that compresses arbitrary-alphabet data patterns to their entropy. The same approach generalizes Fisher's seminal work on estimating the number of butterfly species, and its extension to authenticating a poem purportedly written by The Bard. The talk covers these topics and is self-contained.
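To make the Good–Turing idea concrete: in its simplest form, the estimator assigns the unseen symbols a total probability equal to the fraction of the sample made up of symbols observed exactly once. The sketch below is an illustrative minimal version (not the talk's estimator, and omitting the smoothing refinements a practical implementation needs); the function name is ours.

```python
from collections import Counter

def good_turing_unseen_mass(sample):
    """Basic Good-Turing estimate of the total probability of unseen symbols:
    (number of distinct symbols seen exactly once) / (sample size)."""
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(sample)

# In "abracadabra", only 'c' and 'd' appear exactly once,
# so the unseen symbols get estimated mass 2/11.
mass = good_turing_unseen_mass("abracadabra")
```

Laplace's add-one rule, by contrast, would spread probability according to an assumed alphabet size; Good–Turing needs no such assumption, which is what makes it suited to the large-alphabet setting of the talk.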


Alon Orlitsky received B.Sc. degrees in Mathematics and Electrical Engineering from Ben Gurion University in 1980 and 1981, and M.Sc. and Ph.D. degrees in Electrical Engineering from Stanford University in 1982 and 1986.
From 1986 to 1996 he was with the Communications Analysis Research Department of Bell Laboratories. He spent the following year as a quantitative analyst at D.E. Shaw and Company, an investment firm in New York City. In 1997 he joined the University of California, San Diego, where he is currently a professor of Electrical and Computer Engineering and of Computer Science and Engineering.
Alon's research concerns information theory, learning, and speech recognition. He is a recipient of the 1981 ITT International Fellowship and of the 1992 IEEE W.R.G. Baker Paper Award, and a co-recipient of the 2006 IEEE Information Theory Paper Award.

Presented On: November 3, 2006
Video: QuickTime Streaming Video