eBay Research Labs presented this interesting poster paper at the WWW 2008 conference, proposing a machine learning approach to extracting keywords from web pages for contextual advertising. To automatically present users with relevant and meaningful ads on a web page, a system has to extract the keywords that best represent that page. This is a classic information extraction problem, in which the top k keywords have to be selected from a web page according to some criterion. The paper also deals with the problem of keyword ambiguity and intent. The authors used typical content-based features (TF score, title, phrase length, capitalization, position of keywords, etc.) together with eBay's query log and category hierarchy to train a classifier. To address keyword ambiguity, the authors used the category information associated with each extracted keyword to understand the intent of the keyword in the scope of a particular web page.
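To make the content-based features a little more concrete, here is a rough Python sketch of computing a few of them (TF, title membership, capitalization, position) and ranking the top k candidates. This is not the authors' method: the paper trains a classifier and also uses eBay's query log and category hierarchy, neither of which is modeled here. The function names, the stopword list, and the hand-set `WEIGHTS` are all assumptions made up for illustration, and the sketch only scores single words.

```python
import re
from collections import Counter

# Hypothetical, hand-set weights; the paper learns a classifier instead.
WEIGHTS = {"tf": 5.0, "in_title": 1.0, "capitalized": 0.5, "position": 0.5}

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "and", "of", "to", "in", "with",
             "are", "is", "be", "can"}

WORD_RE = re.compile(r"[A-Za-z][A-Za-z0-9']*")


def candidate_features(title, body):
    """Compute simple content-based features for each candidate word."""
    tokens = WORD_RE.findall(body)
    title_words = {w.lower() for w in WORD_RE.findall(title)}
    counts = Counter(t.lower() for t in tokens)
    total = len(tokens) or 1  # avoid division by zero on empty pages

    features = {}
    for pos, tok in enumerate(tokens):
        key = tok.lower()
        if key in STOPWORDS:
            continue
        if key in features:
            # Seen before: only note whether any occurrence is capitalized.
            if tok[0].isupper():
                features[key]["capitalized"] = 1.0
            continue
        features[key] = {
            "tf": counts[key] / total,                    # term frequency
            "in_title": 1.0 if key in title_words else 0.0,
            "capitalized": 1.0 if tok[0].isupper() else 0.0,
            "position": 1.0 - pos / total,                # earlier is higher
        }
    return features


def top_k_keywords(title, body, k=3, weights=WEIGHTS):
    """Rank candidates by a weighted sum of features and return the top k."""
    feats = candidate_features(title, body)
    scores = {
        word: sum(weights[name] * value for name, value in f.items())
        for word, f in feats.items()
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


if __name__ == "__main__":
    page_title = "Vintage camera lenses"
    page_body = ("Vintage camera lenses are popular. A camera with a vintage "
                 "lens can be expensive. Collectors buy camera gear online.")
    print(top_k_keywords(page_title, page_body, k=3))
```

In a setup closer to the paper, the weighted sum would be replaced by a trained classifier, and the feature dictionary would be extended with query-log and category signals.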
Comments:
+ A very simple and interesting paper on automatic keyword extraction techniques.
- Does not discuss phrases or how to deal with them.