Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction

Antonio Di Marco; Roberto Navigli

DOI:10.1162/COLI_a_00148
Corpus ID: 1775181

@article{Marco2013ClusteringAD,
  title={Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction},
  author={Antonio Di Marco and Roberto Navigli},
  journal={Computational Linguistics},
  year={2013},
  volume={39},
  pages={709-754},
  url={https://api.semanticscholar.org/CorpusID:1775181}
}

Published in International Conference on… 6 August 2013
Computer Science

The key to the approach is to first acquire the various senses of an ambiguous query and then cluster the search results based on their semantic similarity to the induced word senses; this approach outperforms both Web clustering and search engines.
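The second step of this summary, assigning each search result to the induced sense it is most similar to, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names (`tokenize`, `cluster_by_sense`), the hand-written sense word sets, and the use of plain word overlap as the similarity measure. It is not the authors' implementation, and the sense induction step itself is not shown.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase a snippet and split it into a set of word tokens."""
    return set(text.lower().split())


def cluster_by_sense(snippets, senses):
    """Assign each snippet to the sense whose word set it overlaps with most.

    `senses` maps a sense label to a set of words describing that sense
    (hypothetical stand-ins for induced senses). Snippets that share no
    word with any sense are placed in an "other" cluster.
    """
    clusters = defaultdict(list)
    for snippet in snippets:
        words = tokenize(snippet)
        best_label, best_score = "other", 0
        for label, sense_words in senses.items():
            # Similarity here is simply the number of shared words (an assumption).
            score = len(words & sense_words)
            if score > best_score:
                best_label, best_score = label, score
        clusters[best_label].append(snippet)
    return dict(clusters)


# Toy data for the ambiguous query "jaguar" (invented for this example).
senses = {
    "animal": {"cat", "feline", "wild", "habitat"},
    "car": {"car", "vehicle", "engine", "dealer"},
}
snippets = [
    "Jaguar car dealer near you",
    "The jaguar is a wild feline",
    "Jaguar official fan page",
]
clusters = cluster_by_sense(snippets, senses)
```

In this toy run the first snippet lands in the "car" cluster, the second in "animal", and the third, which matches neither sense, in "other". A real system would replace the overlap count with a semantic similarity measure and obtain the sense word sets from the induction step.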

158 Citations

Topics

Word Sense Induction; B-MST; Search Results Clustering; HyperLex; Web Search Result Clustering; MORESQUE; Query-senses; Induced Senses; SRC Systems; Web Search

Retrieving web search results using Max–Max soft clustering for Hindi query

Amita Jain; D. Tayal; Sudesh Yadav

Computer Science

International Journal of System Assurance…

2014

This is the first attempt at fuzzy IR for queries in the Hindi language; experimental evaluations show promising results.

Multilingual Word Sense Induction to Improve Web Search Result Clustering

Lorenzo Albano; D. Beneventano; S. Bergamaschi

Computer Science

Sistemi Evoluti per Basi di Dati

2015

Some preliminary ideas are given for applying a multilingual Word Sense Induction method to Web search result clustering in order to improve the WSI results.

Neural Embedding Language Models in Semantic Clustering of Web Search Results

Andrey Kutuzov; E. Kuzmenko

Computer Science

International Conference on Language Resources…

2016

It is shown that in the task of semantically clustering search results, prediction-based models slightly but stably outperform traditional count-based ones, with the same training corpora.

Graph-Based Concept Clustering for Web Search Results

S. Jinarat; C. Haruechaiyasak; A. Rungsawang

Computer Science

2015

This paper proposes a method to cluster Web search results with high clustering quality using graph-based clustering over concepts extracted from an external knowledge source, and compares the clustering results of this method with two well-known search results clustering techniques, Suffix Tree Clustering and Lingo.

PageRank-based Word Sense Induction within Web Search Results Clustering

Jose G. Moreno; G. Dias

Computer Science

IEEE/ACM Joint Conference on Digital Libraries

2014

The evaluation results show that PageRank-based sense induction achieves interesting results when compared to state-of-the-art content-based algorithms in the context of Web Search Results Clustering.

A Relative Study on Search Results Clustering Algorithms - K-means, Suffix Tree and LINGO

R. Mahalakshmi; V. Praba

Computer Science

2013

A comparative analysis of three common search results clustering algorithms is carried out to study the performance of the Web search engine using multiple test collections and evaluation measures.

Web Search Results Clustering Using Frequent Termset Mining

Marek Kozłowski

Computer Science

Pattern Recognition and Machine Intelligence

2015

This work acquires the senses of a query by means of a word sense induction method that identifies meanings as trees of closed frequent termsets, and clusters the search results based on their lexical and semantic intersection with the induced senses.

A comparison of graph-based word sense induction clustering algorithms in a pseudoword evaluation framework

F. M. Cecchini; Martin Riedl; E. Fersini; Chris Biemann

Computer Science

Language Resources and Evaluation

2018

A self-sufficient pseudoword-based evaluation framework for graph-based WSI clustering algorithms is presented, together with a new evaluation measure (top2) and a secondary clustering process (hyperclustering).

A HYBRID APPROACH FOR WEB SEARCH RESULT CLUSTERING BASED ON GENETIC ALGORITHM WITH K-MEANS

B. Al-Attar; A. Allami; A. Imeer; Y. F. Alasadi; Norwawi; Hawraa M. Kadhim

Computer Science

2021

An efficient hybrid Web search results clustering algorithm, referred to as G-K-M, is presented; it combines K-means with a modified genetic algorithm and demonstrates significant advantages over traditional clustering.

A Novel Method for Clustering Web Search Results with Wikipedia Disambiguation Pages

Zhi Huang; Zhendong Niu; Donglei Liu; W. Niu; Wei Wang

Computer Science

DASFAA Workshops

2015

A novel method is proposed to cluster the search results of an ambiguous query into topics constructed from Wikipedia disambiguation pages (WDP), along with a concept filtering method that removes semantically unrelated concepts from each topic.

Inducing Word Senses to Improve Web Search Result Clustering

Roberto Navigli; G. Crisafulli

Computer Science

Conference on Empirical Methods in Natural…

2010

This work first acquires the senses of a query by means of a graph-based clustering algorithm that exploits cycles in the co-occurrence graph of the query, then clusters the search results based on their semantic similarity to the induced word senses.

An Unsupervised Approach to Cluster Web Search Results Based on Word Sense Communities

Jiyang Chen; Osmar R. Zaiane; R. Goebel

Computer Science

2008 IEEE/WIC/ACM International Conference on Web…

2008

The clustering problem is reformalized as a word sense discovery problem and addressed with an unsupervised method, and the modularity score of the discovered keyword community structure is used to measure page clustering necessity.

Graph-based Word Clustering using a Web Search Engine

Y. Matsuo; Takeshi Sakaki; Koki Uchiyama; M. Ishizuka

Computer Science

Conference on Empirical Methods in Natural…

2006

An unsupervised algorithm for word clustering based on a word similarity measure by web counts, called Newman clustering, is proposed for efficiently identifying word clusters.

Web Search Clustering and Labeling with Hidden Topics

Cam-Tu Nguyen; X. Phan; S. Horiguchi; Thu-Trang Nguyen; Quang-Thuy Ha

Computer Science

TALIP

2009

This article introduces a novel framework for clustering Web search results in Vietnamese, which is able to cluster and label short snippets effectively in a topic-oriented manner without considering whole Web pages.

Word Sense Induction & Disambiguation Using Hierarchical Random Graphs

Ioannis P. Klapaftis; S. Manandhar

Computer Science, Linguistics

Conference on Empirical Methods in Natural…

2010

The inferred hierarchical structures are applied to the problem of word sense disambiguation, where it is shown that the method performs significantly better than traditional graph-based methods and agglomerative clustering, yielding improvements over state-of-the-art WSD systems based on sense induction.

Web document clustering: a feasibility demonstration

Oren Zamir; Oren Etzioni

Computer Science

Annual International ACM SIGIR Conference on…

1998

To satisfy the stringent requirements of the Web domain, an incremental, linear time algorithm called Suffix Tree Clustering (STC) is introduced which creates clusters based on phrases shared between documents, showing that STC is faster than standard clustering methods in this domain.

Clustering Web Search Results with Maximum Spanning Trees

Antonio Di Marco; Roberto Navigli

Computer Science

International Conference of the Italian…

2011

This work presents a novel method for clustering Web search results based on Word Sense Induction, which improves classical search result clustering methods in terms of both clustering quality and degree of diversification.

Wikipedia as Sense Inventory to Improve Diversity in Web Search Results

Celina Santamaría; Julio Gonzalo; J. Artiles

Computer Science

Annual Meeting of the Association for…

2010

Wikipedia has much better coverage of search results; the distribution of senses in search results can be estimated using the internal graph structure of Wikipedia and the relative number of visits received by each sense in Wikipedia; and associating Web pages with Wikipedia senses using simple and efficient algorithms can produce modified rankings that cover 70% more Wikipedia senses than the original search engine rankings.

Word sense disambiguation in queries

Shuang Liu; Clement T. Yu; W. Meng

Computer Science

International Conference on Information and…

2005

A new approach to determine the senses of words in queries by using WordNet is presented, which has 100% applicability and 90% accuracy on the most recent robust track of the TREC collection of 250 queries; its retrieval effectiveness is 7% better than the best reported result in the literature.

Information retrieval using word senses: root sense tagging approach

Sang-Bum Kim; Hee-Cheol Seo; Hae-Chang Rim

Computer Science

Annual International ACM SIGIR Conference on…

2004

This paper proposes a new method for using word senses in information retrieval: a root sense tagging method that assigns coarse-grained word senses defined in WordNet to query terms and document terms in an unsupervised way, using automatically constructed co-occurrence information.

118 References