Saturday, February 19, 2011

Changing reading material

I started reading "Pattern Recognition in Speech and Language Processing" by Wu Chou and Biing-Hwang Juang (http://books.google.com/books?id=M1OgYGlpJn8C) a few weeks ago.  There were a couple of passages that struck a chord with me:
"Hidden Markov modeling is a powerful statistical framework for time varying quasi-stationary process..." I found this to be a VERY succinct statement of what Hidden Markov models are best for.
Here is another set of quotes from the same chapter that I found interesting:
"...regardless of the practical effectiveness of HMM...it should not be taken as the true distribution form ..."
"...HMM is not going to achieve the minimum error rate implied in the true Bayes MAP decision.""This motivates effort of searching for other alternative criteria in classification design...MMI (maximum mutual information) and MDI (minimum discriminative information)..."
Though I've heard this before, I felt it was well stated here and important to remember.  Basically, I believe they are saying that HMMs are effective in practice, and this gives us the ILLUSION that they are the true distribution, but in fact they are not.  HMMs are not going to achieve the minimum error rate of the true MAP decision even if they achieve a good estimate.  Again, this is something that is easy to forget when you use them regularly.
Another quote:
"..without knowledge of the form of the class posterior probabilities required in the classical Bayes decision theory, classifier design by distribution estimation often does not lead to optimal performance."
"This motivates effort of searching for other alternative criteria in classification design...MMI (maximum mutual information) and MDI (minimum discriminative information)..."I especially liked this because it reminded me that HMM is distribution "estimation" and it linked together, for me, the reasoning for exploring MMI and MDI.  I've often wondered these other criteria are used and this passage made it clear to me why they are explored.

I ended up putting down "Pattern Recognition in Speech and Language Processing".  When scanning through the pattern recognition book mentioned below, I found myself losing interest in the Chou book and anxious to pick up the Bishop book.  So this week I started reading "Pattern Recognition and Machine Learning" by Bishop http://books.google.com/books?id=kTNoQgAACAAJ.  I am finding it easier to understand, mostly because the amount of new material that I haven't been exposed to isn't as dense.  I'm only about halfway through the first chapter, but the review is good for me.  I'm excited to get to the Neural Network parts because all my study of Neural Networks to date has been about building classifier networks.  I'm also interested in building a network that predicts an actual value.

I came across this paper this week as well: "An empirical comparison of supervised learning algorithms" by A. Niculescu-Mizil and R. Caruana http://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml06.pdf.  In it they compare the performance of:
- Boosted Trees
- Random Forests
- Bagged Trees
- Support Vector Machines
- Neural Networks
- k-Nearest Neighbors
- Boosted Stumps
- Decision Trees
- Logistic Regression
- Naive Bayes
I'm only familiar with the highlighted ones and am interested in looking into the others when I get a chance.  It was interesting that the paper said that Neural Networks seem to be the best choice for general-purpose machine learning, though many of the other techniques can perform better if you tune them to your problem.
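Just to get a feel for what a comparison like this looks like in practice, here is a quick scikit-learn sketch of my own (a toy setup on synthetic data, nothing like the paper's careful calibration, metrics, or datasets):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              BaggingClassifier)
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Synthetic binary classification problem just for illustration.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "boosted trees":  GradientBoostingClassifier(),
    "boosted stumps": GradientBoostingClassifier(max_depth=1),
    "random forest":  RandomForestClassifier(),
    "bagged trees":   BaggingClassifier(),          # bags decision trees by default
    "svm":            SVC(),
    "neural net":     MLPClassifier(max_iter=1000),
    "knn":            KNeighborsClassifier(),
    "decision tree":  DecisionTreeClassifier(),
    "logistic reg":   LogisticRegression(max_iter=1000),
    "naive bayes":    GaussianNB(),
}

# Train each model and report test accuracy on the held-out split.
for name, model in models.items():
    print(name, model.fit(X_train, y_train).score(X_test, y_test))
```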
I also stumbled across this article this week: "Natural Language Processing (almost) from Scratch" by Collobert et al. http://leon.bottou.org/morefiles/nlp.pdf. I've scanned through it quickly and hope to dig into it further when I have more time.

Wednesday, February 9, 2011

Book I recently read

I just finished a book last night that was given to me by my boss this year:
"I married a Travel Junkie - by Samuel Jay Keyser I thought it was a great book. Quick read but had me laughing at several spots and had a good message in it. The message I got out is was that we are all attempting to run from the mundane that creeps into our lives in various ways. The author's wife chose travel as a way to experience the new. I've personally known people do this by meeting as many new people as they possibly can. I've do this intellectually by learning new subjects all the time or more accurately by learning new details about subjects that I am interested in. When I was younger I did this by moving from job to job.

Thursday, February 3, 2011

BioInformatics lecture and other stuff I'm doing

Saw a very interesting article on online education today: http://www.bbc.co.uk/news/business-11735404
I think that online education is the way of the future. The article mentions how the textbook industry is struggling with this trend toward online and PDF versions. I think this is a good thing. In fact, I expect a lot of textbooks to go open source. For a long time I've been hearing people claim you don't make money by writing a book anymore; you do it for the reputation or for the joy of writing the book. I use this website frequently for textbooks: http://www.freebookcentre.net/ With sites like this, I don't think we are too far from having all textbooks open source.

I've been listening to this online lecture series: http://biochem218.stanford.edu/ I've been studying Machine Learning and Natural Language Processing for the past couple of years, so it's good to see another field where machine learning is used. I'm only on the second lecture so far. The first one was REALLY good on how many techniques are used in the medical field. The most interesting thing that I saw was how they use this thing called "Multiple Sequence Alignment". It seems very similar to semantic processing in natural language: they are looking for where else in a sequence a pattern they have found also matches. The purpose seems to be that if you find a binding site on DNA for a particular compound, you want to be able to search the string for other binding sites, that is, other sites that have the same sequence patterns. In Natural Language Processing this could be useful too: when you find a syntactic pattern, you might want to find other similar patterns to aid interpretation. Lecture two is mostly about how to do various literature searches.
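The "find where else this pattern occurs" idea is simple enough to sketch. Here is a toy Python version of the intuition as I understand it (real sequence alignment uses far more sophisticated scoring and dynamic programming; the sequence and motif below are made up):

```python
# Given a motif found at one binding site, scan the rest of the sequence for
# other near-matches, here scored by simple Hamming distance.
def find_near_matches(seq, motif, max_mismatch=1):
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatch:
            hits.append((i, window, mismatches))
    return hits

# Finds the exact match at 0, a one-mismatch site at 8, and another exact match at 14.
print(find_near_matches("ACGTTGACACGTAGACGTTC", "ACGTT"))
```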

I just finished "The World Jones Made" by Philip K. Dick last night. http://books.google.com/books?id=McAgAQAAIAAJ It was pretty good though I had to laugh at the way that he portrayed the surface of Venus as being more hospitable to life than Mars. I'm making a strong effort to read more fiction this year.

Monday I finished work on my implementation of a Multi Layer Neural Network in C#. I had gotten the Neural Network going last week with just one hidden layer. I am testing the network on XOR. James Mathews was nice enough, at this site: http://www.generation5.org/content/2001/xornet.asp, to publish some trial weights and how they change after the first iteration, so I have something to test against that I am sure doesn't have any mistakes. Over the weekend I fixed a bug that was caused by an error in my understanding: I wasn't aware that the bias of each node actually has a weight associated with it. This had the effect that the 1 XOR 1 test never really improved during my training. I then extended the code to have multiple hidden layers. Of course I broke the system again. Monday I realized that the problem was caused by a stupid coding error: when I updated the weights on the bias, I had replaced a += with a =, so the weights were being replaced instead of updated. Now that I have the Neural Network functioning, I'm starting to manipulate the data I want to test to get it into a form I can play with.
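For anyone curious, here is a small numpy sketch of the same kind of network (this is not my C# code, just the idea): a few sigmoid hidden units trained on XOR with backpropagation. Note that every layer has bias weights that get updated along with the other weights, which was the detail I originally missed.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer of 4 units; b1 and b2 are the bias weights for each layer.
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 1.0

for _ in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass (squared-error loss, sigmoid derivatives)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # adjust weights AND bias weights (update them, don't overwrite them)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

# For most random seeds this ends up close to [[0], [1], [1], [0]].
print(out.round(3))
```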

I'm reading Pattern Recognition in Speech and Language Processing right now http://books.google.com/books?id=M1OgYGlpJn8C (yea, I know it's not open source like I mentioned above). I'm not sure if this was a good book to pick to read now because it has a lot more speech stuff than I am usually interested in. However, until I get deep into the book I won't know if it will catch my interest, and at the very least it can reinforce much of the stuff I already know. I'm only about 25 pages in so far, so it is too early to tell much.

I saw a link on http://metaoptimize.com/qa today to a paper I want to read: "Unsupervised Semantic Parsing" http://alchemy.cs.washington.edu/papers/pdfs/poon-domingos09.pdf Hopefully I'll get some time to read it in the next few days. Also saw a link for "Topical Semantics of Twitter Links" http://rose.cs.ucla.edu/~cho/papers/WSDM11.pdf

Tuesday, November 16, 2010

Long Time.

Just noticed it's been a LONG time since I blogged. I need to get in the habit of doing this regularly.

Over the last year I've been reading a number of books and watching a lot of online lectures. I recently completed Machine Learning by Tom Mitchell and am close to finishing Pattern Classification by Duda, Hart and Stork.

I also listened to Andrew Ng's lectures on Machine Learning. I realized that my linear algebra was a bit rusty, so I also watched the Linear Algebra lectures at Khan Academy. I have also been working my way through the WSDM2010 conference videos, but I haven't found much of interest to me in these.



As usual, I have more books on my reading list than I can possibly read but I'll have to make another post to show the ones I'm planning on reading in the near future.

Sunday, November 1, 2009

Markov Logic

I watched a video lecture on Markov Logic at http://videolectures.net/icml08_domingos_ipk/ yesterday and was very impressed by the technology. The idea is that you apply first order logic to various Natural Language Processing tasks. However, first order logic is too constraining because things can only be either true or false. Specifically, if you get one piece of evidence that doesn't support your premise, then it is considered false. Markov Logic solves this issue by applying weights to the first order formulas. Then, when a contradicting piece of evidence is encountered, instead of making that formula false, it just lowers the probability of it being true.

In the video, Pedro Domingos explains how this can be used to reconcile the two statements "Smoking causes cancer." and "Friends have similar smoking habits." Either of these might be true in the majority of cases, but there might be a contradictory instance found for each. He also discusses Belief Propagation and how this can be very slow on VERY large networks, so he presents a Lifted Belief Propagation algorithm to reduce the network size being worked on. For learning he recommends a "Voted Perceptron" and gives an application example involving recognizing citations.
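To make sure I understood the weighted-formula idea, I worked through a tiny toy example of my own (this is just a sketch, not the alchemy system): one person, two ground atoms, one weighted formula, and probabilities computed by brute-force enumeration of possible worlds.

```python
import itertools, math

# Toy Markov Logic example: ground atoms (Smokes, Cancer) for one person and
# one weighted formula, w = 1.5 : Smokes(x) => Cancer(x).
# P(world) is proportional to exp(w * n), where n is the number of satisfied
# groundings of the formula in that world.
w = 1.5

def n_satisfied(smokes, cancer):
    # Smokes => Cancer is only violated when smokes is true and cancer is false.
    return 0 if (smokes and not cancer) else 1

worlds = list(itertools.product([False, True], repeat=2))   # (smokes, cancer)
weights = {wld: math.exp(w * n_satisfied(*wld)) for wld in worlds}

# Probability of cancer given smoking: a contradictory case (a smoker without
# cancer) is merely improbable, not impossible, because the formula is soft.
p_cancer_given_smokes = weights[(True, True)] / (weights[(True, True)] + weights[(True, False)])
print(p_cancer_given_smokes)   # about 0.82 rather than 1.0
```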

Domingos mentions how this could be also used in unsupervised co-reference resolution and Ontology induction. He includes a link to the "alchemy" system http://alchemy.cs.washington.edu/ they have developed which incorporates all of this logic.

The most obvious place where I think people would be interested in using this technology is in storing semantic information that is learned from text. My concern is that this might be too large of a network to handle even with lifted belief propagation, but it certainly is an area worth researching. My interest is in how this could be used to learn and encode grammar rules. I've been reading a bit about various HMM (http://acl.ldc.upenn.edu/acl2002/MAIN/pdfs/Main036.pdf) and Maximum Entropy systems (http://acl.ldc.upenn.edu/A/A00/A00-2018.pdf) for parsing lately, and what has struck me is that they mix statistics and automatic rule learning. I have always felt that the rule learning systems of the past were replaced with statistical models that, though they performed better, didn't recognize the advantages of the old rule based systems. These newer systems seem to encode the rules and learn patterns that override the basic statistics.

I think a technique like Markov Logic could be used to take this to the next level. Typical Max Ent systems iterate over the learning text and select the rules that result in the highest probability of matching the test data. A Markov Logic system could recognize each rule or contradiction but wouldn't have to throw out rules; it could just lower the probability of them being true. This seems like the best way to develop a grammar to me. Now if we can only figure out how to make it work on unsupervised data.

This is definitely an area I want to learn more about.

Thursday, October 15, 2009

Pattern Recognition in Speech and Language Processing - Chapter 8

I've been reading Pattern Recognition in Speech and Language Processing by Chou and Juang. I have to admit I wasn't really enjoying it much and skimmed much of the first 7 chapters. This wasn't so much because the book is bad as because the first 7 chapters are more about speech processing, and my primary interest is in language processing.

That all changed when I got to chapter 8 and they started addressing HMM. I found the "Topic Classification" section rather interesting but was most interested in the section entitled "Unsupervised Topic Detection". In this part they discuss using inverse document frequency and regular word frequency to create a measurement of how likely a term is to be an important concept in the document. They then use this to create possible topics for the document. When they compared their results to human judges, they found a very high (90%) relevance for the topics chosen this way. Pretty impressive as far as I am concerned.
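Here is a rough sketch of the flavor of that idea as I read it (my own paraphrase on made-up documents, not the chapter's exact method): score each term by term frequency times inverse document frequency and propose the top-scoring terms as candidate topics for a document.

```python
import math
from collections import Counter

docs = [
    "the markov model is trained on the speech data",
    "the speech corpus contains recorded speech from many speakers",
    "the parser builds a syntax tree for each sentence",
]

# Document frequency: how many documents each word appears in.
doc_freq = Counter()
for d in docs:
    doc_freq.update(set(d.split()))

def candidate_topics(doc, top_n=3):
    # Score = term frequency in this document * inverse document frequency.
    tf = Counter(doc.split())
    scores = {w: tf[w] * math.log(len(docs) / doc_freq[w]) for w in tf}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Common words like "the" score zero; distinctive terms float to the top.
print(candidate_topics(docs[1]))
```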

I'm always interested in unsupervised techniques.

Wednesday, May 21, 2008

Response to comment...

Roger asked "How do you intend to verify whether you have understood a text?"

Good question!

The short answer is to look at the logical representation of the text in memory. It would be even better if I had an interface to be able to ASK questions of this logical representation. For instance, if the following sentence is read: "The man slowly climbed the stairs.", then we should be able to ask if the man climbed, and the answer should be "yes". If we ask what the man climbed, then the answer should be "the stairs". Additionally, I would like to see the system learn that men can climb stairs and that this can be done at varying speeds (or at least slowly).
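A toy sketch of what I mean by querying the logical representation (the fact structure and helper names here are just hypothetical placeholders, not my actual design):

```python
# Store a parsed sentence as predicate-style facts and answer simple questions.
facts = [
    # climbed(man, stairs) with a manner modifier from "slowly"
    {"verb": "climbed", "subject": "man", "object": "stairs", "manner": "slowly"},
]

def did(subject, verb):
    """Did the subject perform the verb?"""
    return any(f["subject"] == subject and f["verb"] == verb for f in facts)

def what_did(subject, verb):
    """What did the subject perform the verb on?"""
    return [f["object"] for f in facts if f["subject"] == subject and f["verb"] == verb]

print(did("man", "climbed"))        # True  -> "yes"
print(what_did("man", "climbed"))   # ['stairs']
```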

Of course I have to learn to walk before I can fly...so there are more unanswered questions than answered ones right now. How will I query those facts? How will I represent the fact internally? How will I interpret the text into this internal representation? How will I parse the sentences before interpreting them? How will I recognize parts of speech? How will I learn those rules for recognition of parts of speech?

This is where I think I am now. Using the Porter Stemmer, I found how to recognize some words as verbs. Next I have to figure out how to recognize other parts of speech and what to do with the words I DON'T recognize. Maybe if it can get the list of words it doesn't understand down to a manageable level, it could ask a user for some information. Hopefully, by asking a user a few careful questions, it will be able to learn rules that will allow it to categorize large quantities of unknown words. One thing I DON'T want to do is use some sort of preexisting knowledge of what parts of speech words are. I also don't want to train my application on test data and then have it only have that level of understanding. I would prefer to be able to make an application that, programmed with a core set of rules, would be able to build up its own dictionary and continuously refine its understanding with every bit of text it encounters.
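A very rough sketch of the kind of suffix heuristic I have in mind (these are my own toy rules, not the Porter algorithm itself): guess a part of speech from word endings and collect the words that can't be guessed so a user could be asked about them later.

```python
# Crude suffix-based part-of-speech guesses; unknown words get queued for a user.
VERB_SUFFIXES = ("ed", "ing", "ize", "ise")
NOUN_SUFFIXES = ("tion", "ness", "ment", "ity")

def guess_pos(word):
    w = word.lower()
    if w.endswith(VERB_SUFFIXES):
        return "verb?"
    if w.endswith(NOUN_SUFFIXES):
        return "noun?"
    return None   # unknown -- ask the user about it later

unknown = []
for word in "the man slowly climbed the crumbling stairs".split():
    pos = guess_pos(word)
    if pos is None:
        unknown.append(word)
    print(word, pos)

print("ask the user about:", unknown)
```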