This poster paper was published at the recently concluded WWW 2008 conference in Beijing. The author suggests that spam blogs (splogs) form co-citation clusters because they share advertisement links with each other. The experiments were based on 691,674 blogs from the Japanese blogosphere, collected during a week in 2007. The author reports that high out-degree blogs are spam blogs 95% of the time. An iterative spam traversal algorithm extracts spam blogs through co-citation cluster analysis, starting from a seed set of known spam blogs. The seed set was generated automatically from high out-degree pages and pages containing adult and commercial keywords.
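To make the idea concrete, here is a rough Python sketch of how such a seed-and-expand traversal might look. It is not the author's implementation. The function names, the `degree_threshold` and `min_shared` parameters, and the rule "a blog joins the spam set when it shares at least `min_shared` link targets with known spam blogs" are all my own assumptions, used as a loose stand-in for the paper's co-citation clustering. I also assume the seed set is the union of the high out-degree pages and the keyword-matching pages.

```python
def seed_set(out_links, texts, degree_threshold, spam_keywords):
    """Pick seed blogs: high out-degree OR containing a spam keyword.

    out_links: dict mapping blog id -> set of link targets (assumed input format)
    texts: dict mapping blog id -> page text (assumed input format)
    degree_threshold: hypothetical cutoff; the paper does not state one
    """
    seeds = set()
    for blog, targets in out_links.items():
        text = texts.get(blog, "").lower()
        has_keyword = any(k.lower() in text for k in spam_keywords)
        if len(targets) >= degree_threshold or has_keyword:
            seeds.add(blog)
    return seeds


def expand_spam(out_links, seeds, min_shared=2, max_rounds=10):
    """Iteratively grow the spam set from the seeds.

    In each round, collect every target linked from a known spam blog,
    then add any other blog that links to at least `min_shared` of those
    targets. Stop when a round adds nothing or after `max_rounds`.
    """
    spam = set(seeds)
    for _ in range(max_rounds):
        spam_targets = set()
        for blog in spam:
            spam_targets |= out_links.get(blog, set())

        newly_found = {
            blog
            for blog, targets in out_links.items()
            if blog not in spam and len(targets & spam_targets) >= min_shared
        }
        if not newly_found:
            break
        spam |= newly_found
    return spam
```

Calling `seed_set(...)` and passing its result to `expand_spam(...)` gives the final set of suspected spam blogs. Raising `min_shared` makes the traversal more conservative, which matters because a legitimate blog can easily share one popular link with a splog.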
Comments:
* The author does not specify what counts as a high out-degree.
* The method seems simple and intuitive, but more exhaustive experiments would provide more insight.
