Thursday, October 15, 2009

Pattern Recognition in Speech and Natural Language Processing - Chapter 8

I've been reading Pattern Recognition in Speech and Natural Language Processing by Chou and Juang. I have to admit I wasn't enjoying it much and skimmed most of the first 7 chapters. That isn't because the book is bad; the first 7 chapters are mostly about speech processing, and my primary interest is in language processing.

That all changed when I got to chapter 8 and they started addressing hidden Markov models (HMMs). I found the "Topic Classification" section rather interesting, but was most interested in the section entitled "Unsupervised Topic Detection". In this part they discuss combining inverse document frequency with ordinary word frequency to measure how likely a term is to be an important concept in the document. They then use this measure to propose possible topics for the document. When they compared their results against human judges, they found very high (90%) relevance for the topics chosen this way. Pretty impressive as far as I am concerned.
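
To make the idea concrete, here is a minimal sketch in Python of scoring terms by word frequency times inverse document frequency and taking the top scorers as candidate topics. This is plain textbook TF-IDF, which is an assumption on my part and not necessarily the exact measure used in the chapter. The names `tokenize` and `candidate_topics`, the log-based IDF, and the toy documents are all made up for illustration.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]


def candidate_topics(docs, doc_index, top_n=3):
    """Rank the terms of docs[doc_index] by TF-IDF and return the top_n.

    Assumed scoring (standard TF-IDF, not taken from the book):
        tf(term)  = count of term in the document / document length
        idf(term) = log(number of documents / documents containing term)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    term_counts = Counter(tokenized[doc_index])
    doc_length = sum(term_counts.values())

    scores = {
        term: (count / doc_length) * math.log(n_docs / doc_freq[term])
        for term, count in term_counts.items()
    }
    # Highest score first; ties broken alphabetically so output is stable.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


docs = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
    "The stock market fell sharply.",
]

for term, score in candidate_topics(docs, doc_index=2):
    print(f"{term}: {score:.3f}")
```

A word like "the" appears in every document, so its IDF is log(1) = 0 and it never surfaces as a topic no matter how often it occurs. Terms that are frequent in one document but rare across the collection float to the top, which is the intuition behind using the two frequencies together.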

I'm always interested in unsupervised techniques.