To be or not to be

September 6, 2008

Keyword Extraction for Contextual Advertisement – WWW 2008

Filed under: Information Extraction, Research. Posted by tdas at 7:51 pm.

eBay Research Labs presented this interesting poster paper at the WWW 2008 conference, proposing a machine learning approach to extracting keywords from the Web for contextual advertising. To automatically present users with relevant and meaningful ads on a web page, a system first has to extract the keywords that best represent that page. This is a classic information extraction problem, in which the top k keywords have to be selected from a web page based on some criterion. The paper also deals with the problems of keyword ambiguity and intent. The authors used typical content-based features (TF score, title, phrase length, capitalization, position of keywords, etc.) together with eBay’s query log and category hierarchy to train a classifier. To address keyword ambiguity, the authors used the category information associated with each extracted keyword to infer the keyword's intent within the scope of a particular web page.
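The paper's exact feature definitions and trained classifier aren't reproduced in this post, so what follows is only a minimal sketch of the general idea. It assumes a logistic scoring function with made-up weights in place of the learned classifier, a toy stopword list, and a hypothetical `query_log` dictionary of phrase counts standing in for eBay's query log. For every one- and two-word candidate it computes content features (TF, title, phrase length, capitalization, position) and keeps the top k. The category-based disambiguation step is not modelled.

```python
import math
import re
from collections import Counter

# Hypothetical weights standing in for a trained classifier. The paper learns
# its model from labelled data; these numbers are invented for illustration.
WEIGHTS = {
    "tf": 2.0,
    "in_title": 1.5,
    "phrase_len": 0.3,
    "capitalized": 0.5,
    "early_position": 1.0,
    "query_log": 1.2,
}
BIAS = -2.0
# Tiny illustrative stopword list; candidates containing these words are skipped.
STOPWORDS = {"a", "an", "and", "at", "for", "in", "is", "of", "on", "the", "this", "to", "with"}


def tokenize(text):
    """Split text into word tokens, keeping the original capitalization."""
    return re.findall(r"[A-Za-z][A-Za-z0-9']*", text)


def candidates(tokens, max_len=2):
    """Yield (phrase, start_index) for every 1..max_len word n-gram without stopwords."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            phrase = tuple(tokens[i:i + n])
            if not any(w.lower() in STOPWORDS for w in phrase):
                yield phrase, i


def features(phrase, first_pos, count, max_count, n_tokens, title_words, query_log):
    """Content features in the spirit of the paper, plus a query-log frequency."""
    key = " ".join(w.lower() for w in phrase)
    return {
        "tf": count / max_count,                                   # normalized term frequency
        "in_title": float(all(w.lower() in title_words for w in phrase)),
        "phrase_len": float(len(phrase)),
        "capitalized": float(all(w[0].isupper() for w in phrase)),  # at first occurrence
        "early_position": 1.0 - first_pos / n_tokens,               # earlier scores higher
        "query_log": math.log1p(query_log.get(key, 0)),            # damped query count
    }


def score(feats):
    """Logistic score in (0, 1): confidence that the phrase is a keyword."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def top_k_keywords(title, body, query_log, k=3):
    """Return the k highest-scoring candidate phrases from a page."""
    tokens = tokenize(body)
    title_words = {w.lower() for w in tokenize(title)}
    counts, first_seen, surface = Counter(), {}, {}
    for phrase, pos in candidates(tokens):
        key = " ".join(w.lower() for w in phrase)
        counts[key] += 1
        if key not in first_seen:
            first_seen[key], surface[key] = pos, phrase
    if not counts:
        return []
    max_count = max(counts.values())
    scored = [
        (score(features(surface[key], first_seen[key], count, max_count,
                        len(tokens), title_words, query_log)), key)
        for key, count in counts.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by name
    return [key for _, key in scored[:k]]


page_title = "Canon EOS Camera Review"
page_body = ("The Canon EOS camera takes sharp photos. "
             "This camera is light and the Canon lens is fast.")
query_log = {"canon eos": 500, "camera": 800}  # hypothetical query counts
print(top_k_keywords(page_title, page_body, query_log))  # prints ['camera', 'canon eos', 'canon']
```

In a real system the weights would be learned from pages with labelled keywords, which is the role the trained classifier plays in the paper; the query-log feature is what lets "canon eos" outrank rarer phrases here.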

Comments:
+ A very simple and interesting paper on automatic keyword extraction techniques.
- Does not discuss phrases or how to deal with them.

July 23, 2008

Automatic HomePage Finding

For my Machine Learning course, I have to develop an intelligent system that can automatically identify official websites for search queries. If you are interested in the details, you can read more in my Project Proposal. I have to submit the completed project by August 5th, and I haven’t started implementing it yet. So I have decided to keep a journal of the next few days leading up to the project's completion. I will jot down all my ideas and updates on this blog. Without further ado, let's get the ball rolling…

Update: I successfully completed my project and handed it in on time :). For the mathematically inclined, I achieved a ten-fold cross-validation accuracy of 80.48%. For details, read the complete report: Automatic HomePage Identification.
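The post doesn't say which classifier or features the project used, so the sketch below only illustrates how a ten-fold cross-validation accuracy figure is computed. It assumes generic, hypothetical `train` and `predict` callables and plugs in a trivial majority-class baseline with made-up labels, not the project's actual model or data. It reports pooled accuracy (total correct over total examples), which equals the mean per-fold accuracy when all folds are the same size.

```python
import random
from collections import Counter


def k_fold_accuracy(X, y, train, predict, k=10, seed=0):
    """Pooled k-fold cross-validation accuracy: every example is tested exactly once."""
    n = len(X)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of examples")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [j for j in indices if j not in held_out]
        model = train([X[j] for j in train_idx], [y[j] for j in train_idx])
        correct += sum(predict(model, X[j]) == y[j] for j in test_idx)
    return correct / n


# Hypothetical baseline classifier: always predicts the majority training label.
def train_majority(features, labels):
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, x):
    return model


X = list(range(20))                    # placeholder feature vectors
y = ["official"] * 15 + ["other"] * 5  # hypothetical labels
print(k_fold_accuracy(X, y, train_majority, predict_majority))  # prints 0.75
```

A baseline like this is a useful sanity check: a real classifier's cross-validated accuracy is only meaningful relative to what always guessing the most common label would score.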

February 11, 2008

Information Extraction 101

Filed under: Information Extraction. Posted by tdas at 2:48 am.

Lately I have been fascinated by the topic of Information Extraction (IE) and its application to web data. More specifically, I am interested in extracting meaningful, structured information from unstructured text data (i.e., web pages), and the definition of IE seems to fit the bill. I found this lecture by Kamal Nigam on various Information Extraction techniques. For people interested in IE, this should be a great starting point.

Text Information Extraction – Kamal Nigam
