Thursday, February 3, 2011

BioInformatics lecture and other stuff I'm doing

Saw a very interesting article on online education today: http://www.bbc.co.uk/news/business-11735404
I think that online education is the way of the future. The article mentions how the textbook industry is struggling with the trend toward online and PDF versions. I think this is a good thing. In fact, I expect a lot of textbooks to go open source. For a long time I've been hearing people claim that you don't make money by writing a book anymore; you do it for the reputation or for the joy of writing the book. I use this website frequently for textbooks: http://www.freebookcentre.net/ With sites like this, I don't think we are too far from having all textbooks open source.

I've been listening to this online lecture: http://biochem218.stanford.edu/ I've been studying Machine Learning and Natural Language Processing for the past couple of years, so it's good to see another field where machine learning is used. I'm only on the second lecture so far. The first one was really good on how many of these techniques are used in the medical field. The most interesting thing I saw was how they use something called "Multiple Sequence Alignment". It seems very similar to semantic processing in natural language. They are looking for other places in a sequence where a pattern they have already found also matches. The purpose seems to be that if you find a binding site on DNA for a particular compound, you want to be able to search the rest of the string for other binding sites, meaning other sites that have the same sequence pattern. In natural language, this could be useful when you find a syntactic pattern and want to find other similar patterns to aid interpretation. Lecture two is mostly about how to do various literature searches.
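To make the binding-site idea concrete, here is a minimal Python sketch of searching a DNA string for other sites that match a known pattern, allowing a few mismatches. This is plain approximate pattern matching, not real multiple sequence alignment, and the function name `find_motif_sites`, the example sequence, and the motif are all made up for illustration.

```python
def find_motif_sites(sequence, motif, max_mismatches=0):
    """Return the start indices where motif matches sequence
    with at most max_mismatches differing letters."""
    hits = []
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        # Count the positions where the window and the motif disagree
        mismatches = sum(1 for a, b in zip(window, motif) if a != b)
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


# Hypothetical sequence and motif, chosen only to show the idea
dna = "ACGTTGACGTAACGATGACG"
print(find_motif_sites(dna, "TGACG"))                    # exact matches only
print(find_motif_sites(dna, "TGACG", max_mismatches=1))  # tolerate one mismatch
```

The same loop would work on a list of part-of-speech tags instead of a string of bases, which is roughly the natural language analogy.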

I just finished "The World Jones Made" by Philip K. Dick last night. http://books.google.com/books?id=McAgAQAAIAAJ It was pretty good, though I had to laugh at the way he portrayed the surface of Venus as being more hospitable to life than Mars. I'm making a strong effort to read more fiction this year.

Monday I finished work on my implementation of a multi-layer neural network in C#. I had gotten the neural network going last week with just one hidden layer. I am testing the network on XOR. James Mathews was nice enough to publish some trial weights at this site: http://www.generation5.org/content/2001/xornet.asp along with how they change after the first iteration, so I have something to test against that I am sure doesn't have any mistakes. Over the weekend I fixed a bug that was caused by an error in my understanding. I wasn't aware that the bias of each node actually has a weight associated with it. The effect was that the 1 XOR 1 test never really improved during training. I then extended the code to support multiple hidden layers. Of course I broke the system again. Monday I realized that the problem was caused by a stupid coding error. When I updated the weights on the bias, I had replaced a += with a =, so the weights were being replaced instead of updated. Now that I have the neural network functioning, I'm starting to manipulate the data I want to test to get it into a form I can play with.
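For anyone who wants to see the bias-weight detail spelled out, below is a rough Python sketch of one backpropagation step for a small sigmoid network (Python here, not the C# code described above, and not that implementation). The starting weights in `make_net` are arbitrary numbers picked for illustration, not the trial weights from the generation5 page, and the names `forward`, `train_step`, and `bias_weight` are assumptions of this sketch.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_net():
    # Arbitrary starting weights (assumption): 2 inputs -> 2 hidden -> 1 output
    return [
        [{"weights": [0.5, 0.4], "bias_weight": -0.8},
         {"weights": [0.9, 1.0], "bias_weight": 0.1}],
        [{"weights": [-1.2, 1.1], "bias_weight": -0.3}],
    ]


def forward(net, inputs):
    """Return the activations of every layer, starting with the inputs."""
    activations = [list(inputs)]
    for layer in net:
        prev = activations[-1]
        # Each node sums its weighted inputs plus bias_weight times a constant 1.0
        activations.append([
            sigmoid(sum(w * a for w, a in zip(node["weights"], prev))
                    + node["bias_weight"] * 1.0)
            for node in layer
        ])
    return activations


def train_step(net, inputs, target, rate=0.5):
    """One backpropagation update for a single training example."""
    activations = forward(net, inputs)
    # Output deltas: (target - output) times the sigmoid derivative
    deltas = [[(t - o) * o * (1.0 - o) for t, o in zip(target, activations[-1])]]
    # Hidden deltas, back to front, using the weights before they are updated
    for i in range(len(net) - 1, 0, -1):
        next_layer, next_deltas = net[i], deltas[0]
        deltas.insert(0, [
            out * (1.0 - out)
            * sum(n["weights"][j] * d for n, d in zip(next_layer, next_deltas))
            for j, out in enumerate(activations[i])
        ])
    for layer, layer_deltas, prev in zip(net, deltas, activations):
        for node, delta in zip(layer, layer_deltas):
            for j, a in enumerate(prev):
                node["weights"][j] += rate * delta * a
            # += accumulates the change; a plain = would overwrite the bias weight
            node["bias_weight"] += rate * delta * 1.0


net = make_net()
before = forward(net, [1, 1])[-1][0]
train_step(net, [1, 1], [0])  # 1 XOR 1 should be 0
after = forward(net, [1, 1])[-1][0]
print(before, after)
```

Adding another inner list to `make_net` gives a second hidden layer without touching `train_step`. Looping `train_step` over the four XOR cases for many epochs is the full training run; this sketch makes no promise about how quickly, or whether, these particular starting weights converge.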

I'm reading Pattern Recognition in Speech and Language Processing right now: http://books.google.com/books?id=M1OgYGlpJn8C (yeah, I know it's not open source like I mentioned above). I'm not sure if this was a good book to pick right now because it has a lot more speech material than I am usually interested in. However, until I get deep into the book I won't know if it will catch my interest, and at the very least it can reinforce much of what I already know. I'm only about 25 pages in, so it is too early to tell much.

I saw a link on http://metaoptimize.com/qa today to a paper I want to read: "Unsupervised Semantic Parsing" http://alchemy.cs.washington.edu/papers/pdfs/poon-domingos09.pdf Hopefully I'll get some time to read it in the next few days. Also saw a link for "Topical Semantics of Twitter Links" http://rose.cs.ucla.edu/~cho/papers/WSDM11.pdf
