Details
| Original language | English |
|---|---|
| Title of host publication | CIKM 2024 |
| Subtitle of host publication | Proceedings of the 33rd ACM International Conference on Information and Knowledge Management |
| Pages | 5323-5327 |
| Number of pages | 5 |
| ISBN (electronic) | 9798400704369 |
| Publication status | Published - 21 Oct 2024 |
| Event | 33rd ACM International Conference on Information and Knowledge Management, CIKM 2024, Boise, United States. Duration: 21 Oct 2024 → 25 Oct 2024 |
Abstract
As information retrieval systems continue to evolve, accurate evaluation and benchmarking of these systems become pivotal. Web search datasets, such as MS MARCO, primarily provide short keyword queries without accompanying intent or descriptions, posing a challenge in comprehending the underlying information need. This paper proposes an approach to augmenting such datasets with informative query descriptions, focusing on two prominent benchmark datasets: TREC-DL-21 and TREC-DL-22. Our methodology uses state-of-the-art LLMs to analyze and comprehend the implicit intent within individual queries from benchmark datasets. By extracting key semantic elements, we construct detailed and contextually rich descriptions for these queries. To validate the generated query descriptions, we employ crowdsourcing as a reliable means of obtaining diverse human perspectives on their accuracy and informativeness. These descriptions can be used as an evaluation set for tasks such as ranking and query rewriting.
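The sketch below is a minimal illustration of the kind of pipeline the abstract describes: prompting an LLM to spell out the information need behind a short keyword query. It is not the authors' released pipeline; it assumes the openai Python client (>= 1.0), and the model name, prompt wording, and example query are hypothetical.

```python
# Minimal sketch only, not the paper's actual pipeline: ask an LLM to
# describe the information need behind a short keyword query. Assumes
# the openai Python client and OPENAI_API_KEY set in the environment;
# model choice and prompt wording are hypothetical.
from openai import OpenAI

client = OpenAI()

def describe_query(query: str) -> str:
    """Return a short natural-language description of the query's intent."""
    response = client.chat.completions.create(
        model="gpt-4o",  # stand-in for the paper's "state-of-the-art LLMs"
        messages=[
            {
                "role": "system",
                "content": (
                    "Given a web search query, write one or two sentences "
                    "describing the user's underlying information need."
                ),
            },
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(describe_query("trec deep learning track passage ranking"))
```

In the paper's setting, descriptions generated this way are then validated by crowd workers for accuracy and informativeness before being used for evaluation.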
Keywords
- ad-hoc retrieval, data collection, diversity, intent dataset, ranking, user intents, web search
ASJC Scopus subject areas
- General Business, Management and Accounting
- General Decision Sciences
Cite this
Anand, A., Leonhardt, J., Venktesh, V., & Anand, A. (2024). Understanding the User. In CIKM 2024: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (pp. 5323-5327). https://doi.org/10.48550/arXiv.2408.17103
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Understanding the User
T2 - 33rd ACM International Conference on Information and Knowledge Management, CIKM 2024
AU - Anand, Abhijit
AU - Leonhardt, Jurek
AU - Venktesh, V.
AU - Anand, Avishek
N1 - Publisher Copyright: © 2024 Owner/Author.
PY - 2024/10/21
Y1 - 2024/10/21
N2 - As information retrieval systems continue to evolve, accurate evaluation and benchmarking of these systems become pivotal. Web search datasets, such as MS MARCO, primarily provide short keyword queries without accompanying intent or descriptions, posing a challenge in comprehending the underlying information need. This paper proposes an approach to augmenting such datasets with informative query descriptions, focusing on two prominent benchmark datasets: TREC-DL-21 and TREC-DL-22. Our methodology uses state-of-the-art LLMs to analyze and comprehend the implicit intent within individual queries from benchmark datasets. By extracting key semantic elements, we construct detailed and contextually rich descriptions for these queries. To validate the generated query descriptions, we employ crowdsourcing as a reliable means of obtaining diverse human perspectives on their accuracy and informativeness. These descriptions can be used as an evaluation set for tasks such as ranking and query rewriting.
AB - As information retrieval systems continue to evolve, accurate evaluation and benchmarking of these systems become pivotal. Web search datasets, such as MS MARCO, primarily provide short keyword queries without accompanying intent or descriptions, posing a challenge in comprehending the underlying information need. This paper proposes an approach to augmenting such datasets with informative query descriptions, focusing on two prominent benchmark datasets: TREC-DL-21 and TREC-DL-22. Our methodology uses state-of-the-art LLMs to analyze and comprehend the implicit intent within individual queries from benchmark datasets. By extracting key semantic elements, we construct detailed and contextually rich descriptions for these queries. To validate the generated query descriptions, we employ crowdsourcing as a reliable means of obtaining diverse human perspectives on their accuracy and informativeness. These descriptions can be used as an evaluation set for tasks such as ranking and query rewriting.
KW - ad-hoc retrieval
KW - data collection
KW - diversity
KW - intent dataset
KW - ranking
KW - user intents
KW - web search
UR - http://www.scopus.com/inward/record.url?scp=85210031597&partnerID=8YFLogxK
U2 - 10.48550/arXiv.2408.17103
DO - 10.48550/arXiv.2408.17103
M3 - Conference contribution
AN - SCOPUS:85210031597
SP - 5323
EP - 5327
BT - CIKM 2024
Y2 - 21 October 2024 through 25 October 2024
ER -