We consider hypothesis testing using training data in the large alphabet regime (motivated by classification of natural language data such as emails). The key feature of this scenario is that the length of the observed string is of the same order as the size of the underlying alphabet (the words of the natural language). The traditional maximum-likelihood (ML) estimate is widely off the mark, since many alphabet elements are unseen in the training data. Universal methods are also inapplicable, since the data is not universally compressible (the redundancy of compression is nonzero in this regime).
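The claim that many symbols go unseen when the sample length matches the alphabet size can be checked with a small simulation (not part of the talk; a minimal sketch assuming a uniform distribution over the alphabet). For n = k uniform draws from a k-symbol alphabet, the expected unseen fraction is (1 - 1/k)^k, which approaches 1/e ≈ 0.37, so the plug-in ML estimate assigns zero probability to more than a third of the alphabet:

```python
import random

def unseen_fraction(k, n, trials=500, seed=0):
    """Estimate the expected fraction of a size-k alphabet that
    never appears among n i.i.d. uniform draws (Monte Carlo)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        seen = {rng.randrange(k) for _ in range(n)}
        total += (k - len(seen)) / k
    return total / trials

# With sample length equal to alphabet size (n = k), the expected
# unseen fraction is (1 - 1/k)^k, close to 1/e ~ 0.368.
print(round(unseen_fraction(k=1000, n=1000), 2))
```

Under a non-uniform (e.g. Zipfian) source the unseen fraction differs, but it remains bounded away from zero whenever n is on the order of k, which is the regime the abstract describes.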
Pramod Viswanath received the PhD degree in EECS from the University of California at Berkeley in 2000. He was a member of technical staff at Flarion Technologies until August 2001 before joining the ECE department at the University of Illinois, Urbana-Champaign. He is a recipient of the Eliahu Jury Award from the EECS department of UC Berkeley (2000), the Bernard Friedman Award from the Mathematics department of UC Berkeley (2000), and the NSF CAREER Award (2003). He is an associate editor of the IEEE Transactions on Information Theory for the period 2006-2008.
Presented on: April 12th, 2007