Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction

@article{Marco2013ClusteringAD,
  title={Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction},
  author={Antonio Di Marco and Roberto Navigli},
  journal={Computational Linguistics},
  year={2013},
  volume={39},
  pages={709--754},
  url={https://api.semanticscholar.org/CorpusID:1775181}
}
The key idea is to first acquire the various senses of an ambiguous query and then cluster the search results by their semantic similarity to the induced word senses; the approach outperforms both Web clustering engines and search engines.
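
The two-stage idea (induce senses, then assign results to them) can be illustrated with a minimal Python sketch. It assumes each induced sense is already available as a set of words and substitutes plain word overlap for the paper's semantic similarity measure; the function name `assign_to_senses` and the toy data are hypothetical.

```python
def assign_to_senses(snippets, senses):
    """Assign each snippet to the induced sense it overlaps with most.

    snippets: list of strings; senses: non-empty list of sets of lowercase words.
    Returns, per snippet, the index of the best sense, or None if the snippet
    shares no word with any sense.
    """
    assignments = []
    for snippet in snippets:
        words = set(snippet.lower().split())
        # Assumption: raw word overlap stands in for semantic similarity.
        scores = [len(words & sense) for sense in senses]
        best = max(range(len(senses)), key=lambda i: scores[i])
        assignments.append(best if scores[best] > 0 else None)
    return assignments


senses = [{"fruit", "pie", "juice"}, {"iphone", "mac", "store"}]
snippets = ["Apple pie recipe", "Apple iPhone store hours", "Apple Records history"]
print(assign_to_senses(snippets, senses))  # [0, 1, None]
```

Snippets that match no sense are left unassigned (`None`) here; a real system would need a policy for them, for example a catch-all cluster.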

Retrieving web search results using Max–Max soft clustering for Hindi query

This is the first attempt at fuzzy IR for Hindi-language queries, and experimental evaluations show promising results.
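
Max–Max is a graph-based soft clustering algorithm. The sketch below follows its usual description: each vertex is linked from its maximal-affinity neighbour(s) in a directed graph, every vertex starts as a root, each remaining root strips root status from its descendants, and each surviving root plus its descendants forms a cluster. The input format (a dict of weighted undirected edges) and the function name `maxmax` are assumptions, and the Hindi retrieval pipeline from the paper is not shown.

```python
from collections import defaultdict


def maxmax(edges):
    """Max-Max soft clustering of an undirected weighted graph.

    edges: {(u, v): weight}. Returns a list of (possibly overlapping) clusters.
    """
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w

    # Directed graph: v -> u whenever v is a maximal-affinity neighbour of u.
    children = defaultdict(set)
    for u, nbrs in adj.items():
        best = max(nbrs.values())
        for v, w in nbrs.items():
            if w == best:
                children[v].add(u)

    def descendants(v):
        # Vertices reachable from v in the directed graph, excluding v itself.
        seen, stack = set(), [v]
        while stack:
            for y in children[stack.pop()]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        seen.discard(v)
        return seen

    nodes = sorted(adj)  # assumes sortable vertex labels, for determinism
    root = {v: True for v in nodes}
    for v in nodes:
        if root[v]:
            for u in descendants(v):
                root[u] = False
    return [{v} | descendants(v) for v in nodes if root[v]]


edges = {("u", "v1"): 1, ("u", "v2"): 1, ("v1", "p"): 5, ("v2", "q"): 5}
print(sorted(sorted(c) for c in maxmax(edges)))
# [['p', 'u', 'v1'], ['q', 'u', 'v2']]
```

Because a vertex can be a descendant of more than one root, clusters may overlap: in the example, `u` has two equally strong neighbours and ends up in both clusters, which is what makes the method a soft clustering.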

Multilingual Word Sense Induction to Improve Web Search Result Clustering

Some preliminary ideas are given for applying the multilingual Word Sense Induction (WSI) method to Web search result clustering in order to improve the WSI results.

Neural Embedding Language Models in Semantic Clustering of Web Search Results

It is shown that, in the task of semantically clustering search results, prediction-based models slightly but stably outperform traditional count-based ones when trained on the same corpora.

Graph-Based Concept Clustering for Web Search Results

This paper proposes a method for clustering web search results with high clustering quality, using graph-based clustering over concepts extracted from an external knowledge source, and compares its clustering results with those of two well-known search result clustering techniques, Suffix Tree Clustering and Lingo.

PageRank-based Word Sense Induction within Web Search Results Clustering

The evaluation results show that PageRank-based sense induction achieves interesting results when compared to state-of-the-art content-based algorithms in the context of Web Search Results Clustering.
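
One plausible reading of PageRank-based sense induction is to rank the words of a query's co-occurrence graph and treat the top-ranked words as sense hubs. The sketch below implements only weighted PageRank by power iteration on an undirected graph; the graph format, the damping factor of 0.85, the fixed iteration count and the function name are assumptions, not details taken from the paper.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration on an undirected graph.

    edges: {(u, v): positive weight}. Returns {vertex: score}; scores sum to 1.
    """
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    n = len(adj)
    strength = {v: sum(nbrs.values()) for v, nbrs in adj.items()}
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iterations):
        # Each neighbour u passes on rank in proportion to the edge weight.
        rank = {
            v: (1 - damping) / n
            + damping * sum(rank[u] * w / strength[u] for u, w in nbrs.items())
            for v, nbrs in adj.items()
        }
    return rank


star = {("c", "a"): 1.0, ("c", "b"): 1.0, ("c", "d"): 1.0}
scores = pagerank(star)
print(max(scores, key=scores.get))  # c
```

On the star graph the centre `c` receives the highest score, and the scores sum to 1 because every vertex distributes all of its rank to its neighbours.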

A Relative Study on Search Results Clustering Algorithms - K-means, Suffix Tree and LINGO

A comparative analysis of three common search result clustering algorithms is carried out to study the performance of the web search engine, using multiple test collections and evaluation measures.

Web Search Results Clustering Using Frequent Termset Mining

This work acquires the senses of a query by means of a word sense induction method that identifies meanings as trees of closed frequent termsets, and clusters the search results based on their lexical and semantic intersection with the induced senses.
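
Closed frequent termsets can be illustrated on a toy scale by treating each snippet as a set of terms. The brute-force sketch below enumerates termsets level by level (stopping at the first size with no frequent termset, which the Apriori property justifies) and keeps those with no superset of equal support. It is exponential in the vocabulary size, so it is only a didactic stand-in for a real closed-itemset miner, and it does not build the trees of termsets used to represent senses; the function name is hypothetical.

```python
from itertools import combinations


def closed_frequent_termsets(transactions, min_support):
    """Return {termset: support} for all closed frequent termsets.

    transactions: list of sets of terms (e.g. one set per search snippet).
    A termset is closed if no proper superset has the same support.
    """
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            termset = frozenset(combo)
            support = sum(1 for t in transactions if termset <= t)
            if support >= min_support:
                frequent[termset] = support
                found = True
        if not found:
            break  # Apriori: no frequent k-sets means no frequent (k+1)-sets
    return {
        s: sup for s, sup in frequent.items()
        if not any(s < t and sup == frequent[t] for t in frequent)
    }


snippets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}]
print(closed_frequent_termsets(snippets, 2))
```

With a minimum support of 2, the closed termsets are {a} (support 3), {a, b} and {a, c} (support 2 each); {b} is not closed because {a, b} has the same support.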

A comparison of graph-based word sense induction clustering algorithms in a pseudoword evaluation framework

This work presents a self-sufficient pseudoword-based evaluation framework for graph-based WSI clustering algorithms, defining a new evaluation measure (top2) and a secondary clustering process (hyperclustering).

A Hybrid Approach for Web Search Result Clustering Based on Genetic Algorithm with K-Means

An efficient hybrid web search result clustering algorithm, referred to as G-K-M, is presented that combines K-means with a modified genetic algorithm; the proposed approach demonstrates significant advantages over traditional clustering.

A Novel Method for Clustering Web Search Results with Wikipedia Disambiguation Pages

A novel method is proposed that clusters the search results of an ambiguous query into topics constructed from Wikipedia disambiguation pages (WDP), together with a concept filtering method that removes semantically unrelated concepts from each topic.
...

Inducing Word Senses to Improve Web Search Result Clustering

This work first acquires the senses of a query by means of a graph-based clustering algorithm that exploits cycles in the co-occurrence graph of the query, then clusters the search results based on their semantic similarity to the induced word senses.
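
The co-occurrence graph that such graph-based WSI methods start from can be sketched as follows. Here two words co-occur when they appear in the same snippet, and edges are weighted with the Dice coefficient 2 * c(w1, w2) / (c(w1) + c(w2)); both choices, the `min_weight` pruning parameter and the function name are assumptions, and the cycle-based clustering step itself is not shown.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(snippets, min_weight=0.0):
    """Build a Dice-weighted word co-occurrence graph from short snippets.

    Two words co-occur when they appear in the same snippet.
    Returns {(w1, w2): dice} with w1 < w2 and dice >= min_weight.
    """
    word_count = Counter()
    pair_count = Counter()
    for snippet in snippets:
        words = sorted(set(snippet.lower().split()))
        word_count.update(words)
        pair_count.update(combinations(words, 2))  # pairs come out sorted
    graph = {}
    for (w1, w2), c in pair_count.items():
        dice = 2 * c / (word_count[w1] + word_count[w2])
        if dice >= min_weight:
            graph[(w1, w2)] = dice
    return graph


graph = cooccurrence_graph(["apple pie", "apple pie recipe", "apple iphone"])
print(graph[("apple", "pie")])  # 0.8
```

The edge ("apple", "pie") gets weight 2 * 2 / (3 + 2) = 0.8, since "apple" occurs in three snippets, "pie" in two, and they share two.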

An Unsupervised Approach to Cluster Web Search Results Based on Word Sense Communities

The clustering problem is reformulated as an unsupervised word sense discovery problem, and the modularity score of the discovered keyword community structure is used to measure whether the pages need to be clustered.
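
The modularity score is the standard Newman measure of how much denser a community structure is internally than would be expected by chance. A minimal sketch for an unweighted, undirected graph follows; how the paper thresholds the score to decide whether clustering is needed is not reproduced, and the function name is hypothetical.

```python
def modularity(edges, communities):
    """Newman modularity Q of a partition of an unweighted, undirected graph.

    edges: iterable of (u, v) pairs; communities: list of disjoint node sets.
    Uses Q = sum_c [ L_c / m - (d_c / (2m))^2 ], where L_c is the number of
    edges inside community c and d_c is the total degree of its nodes.
    """
    edges = list(edges)
    m = len(edges)
    label = {node: i for i, comm in enumerate(communities) for node in comm}
    inside = [0] * len(communities)
    degree = [0] * len(communities)
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            inside[label[u]] += 1
    return sum(inner / m - (d / (2 * m)) ** 2 for inner, d in zip(inside, degree))


# Two triangles joined by the single edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
print(round(modularity(edges, [{"a", "b", "c"}, {"d", "e", "f"}]), 3))  # 0.357
```

Here m = 7 and Q = 2 * (3/7 - (7/14)^2) = 5/14; putting every node into a single community gives Q = 0, so a clearly positive score signals a structure worth splitting.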

Graph-based Word Clustering using a Web Search Engine

An unsupervised word clustering algorithm called Newman clustering is proposed for efficiently identifying word clusters, based on a word similarity measure derived from web counts.

Web Search Clustering and Labeling with Hidden Topics

This article introduces a novel framework for clustering Web search results in Vietnamese that clusters and labels short snippets effectively in a topic-oriented manner, without needing to process whole Web pages.

Word Sense Induction & Disambiguation Using Hierarchical Random Graphs

The inferred hierarchical structures are applied to the problem of word sense disambiguation, where the method is shown to perform significantly better than traditional graph-based methods and agglomerative clustering, yielding improvements over state-of-the-art WSD systems based on sense induction.

Web document clustering: a feasibility demonstration

To satisfy the stringent requirements of the Web domain, an incremental, linear-time algorithm called Suffix Tree Clustering (STC) is introduced, which creates clusters based on phrases shared between documents; experiments show that STC is faster than standard clustering methods in this domain.
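
The core of STC can be approximated in a few lines: find phrases shared by at least two documents (base clusters), then merge base clusters whose document sets overlap by more than half relative to both clusters. This sketch enumerates word n-grams directly instead of building a suffix tree, so it is neither incremental nor linear time, and it skips STC's stopword handling and base-cluster scoring; the function names and parameters are hypothetical.

```python
from itertools import combinations


def base_clusters(docs, max_len=3, min_docs=2):
    """Map each phrase (up to max_len words) shared by >= min_docs documents
    to the set of indices of the documents containing it."""
    phrase_docs = {}
    for i, doc in enumerate(docs):
        words = doc.lower().split()
        for n in range(1, max_len + 1):
            for j in range(len(words) - n + 1):
                phrase_docs.setdefault(tuple(words[j:j + n]), set()).add(i)
    return {p: d for p, d in phrase_docs.items() if len(d) >= min_docs}


def merge_base_clusters(base, threshold=0.5):
    """Union base clusters whose overlap exceeds `threshold` relative to
    both clusters (via union-find); return the merged document sets."""
    sets = list(base.values())
    parent = list(range(len(sets)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in combinations(range(len(sets)), 2):
        shared = len(sets[a] & sets[b])
        if shared / len(sets[a]) > threshold and shared / len(sets[b]) > threshold:
            parent[find(a)] = find(b)
    merged = {}
    for i, docs in enumerate(sets):
        merged.setdefault(find(i), set()).update(docs)
    return list(merged.values())


docs = ["apple pie recipe", "apple pie dessert",
        "python code tutorial", "python code example"]
print(sorted(sorted(c) for c in merge_base_clusters(base_clusters(docs))))
# [[0, 1], [2, 3]]
```

The base clusters for "apple", "pie" and "apple pie" merge into documents {0, 1}, and those for "python", "code" and "python code" into {2, 3}. The pairwise merge is quadratic in the number of base clusters, which is one reason STC limits how many base clusters it keeps.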

Clustering Web Search Results with Maximum Spanning Trees

This work presents a novel method for clustering Web search results based on Word Sense Induction, which improves on classical search result clustering methods in terms of both clustering quality and degree of diversification.
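
A generic form of maximum-spanning-tree clustering, for instance over a word co-occurrence graph, can be sketched with Kruskal's algorithm: add edges from heaviest to lightest and stop once k components remain. This illustrates the general technique under the assumption that the number of clusters k is given; it is not necessarily the paper's exact procedure, and the function name is hypothetical.

```python
def mst_clusters(nodes, edges, k):
    """Split a weighted graph into k groups via a maximum spanning forest.

    edges: {(u, v): weight}. Kruskal's algorithm adds edges in descending
    weight order and stops once only k components remain; for a connected
    graph (ignoring ties) this equals building the maximum spanning tree
    and cutting its k - 1 lightest edges.
    """
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = len(parent)
    for (u, v), _ in sorted(edges.items(), key=lambda item: item[1], reverse=True):
        if components <= k:
            break
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    groups = {}
    for n in parent:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


edges = {("a", "b"): 0.9, ("c", "d"): 0.8, ("b", "c"): 0.1}
print(sorted(sorted(g) for g in mst_clusters(["a", "b", "c", "d"], edges, 2)))
# [['a', 'b'], ['c', 'd']]
```

With k = 2 the weak (b, c) edge is never added, so the two strongly connected pairs become separate groups.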

Wikipedia as Sense Inventory to Improve Diversity in Web Search Results

Wikipedia offers much better coverage of search results; the distribution of senses in search results can be estimated using Wikipedia's internal graph structure and the relative number of visits each sense receives in Wikipedia; and associating Web pages with Wikipedia senses using simple, efficient algorithms can produce modified rankings that cover 70% more Wikipedia senses than the original search engine rankings.

Word sense disambiguation in queries

A new approach to determining the senses of words in queries using WordNet is presented; on the 250 queries of the most recent TREC Robust track it achieves 100% applicability and 90% accuracy, and its retrieval effectiveness is 7% better than the best result reported in the literature.

Information retrieval using word senses: root sense tagging approach

This paper proposes a new method of using word senses in information retrieval: root sense tagging, which assigns coarse-grained WordNet word senses to query terms and document terms in an unsupervised way, using automatically constructed co-occurrence information.
...