Enhancing human annotation: Leveraging large language models and efficient batch processing
- Oleg Zendel
- J. Shane Culpepper
- Falk Scholer
- Paul Thomas
Conference on Human Information Interaction and Retrieval (CHIIR) | Published by ACM Press
Large language models (LLMs) are capable of assessing document and query characteristics, including relevance, and are now being used for a variety of classification and labeling tasks. This study explores how to use LLMs to classify an information need, often represented as a user query. In particular, our goal is to classify the cognitive complexity of the search task for a given “backstory”. Using 180 TREC topics and backstories, we show that GPT-based LLMs agree with human experts as much as other human experts do. We also show that batching and ordering can significantly affect the accuracy of GPT-3.5, but rarely alter the quality of GPT-4 predictions. This study provides insights into the efficacy of large language models for annotation tasks normally completed by humans, and offers recommendations for other similar applications.
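To make the batching-and-ordering idea concrete, the sketch below shows one way to pack several backstories into a single prompt and optionally shuffle their order before asking an LLM for complexity labels. This is a minimal illustration, not the authors' code: the model names, the label set, the prompt wording, and the `classify_batch` helper are all assumptions for demonstration.

```python
# Minimal sketch of batched LLM classification with order shuffling.
# Assumes the openai Python package (>= 1.0) and an OPENAI_API_KEY in the
# environment. Labels and prompt wording are hypothetical, not from the paper.
import random
from openai import OpenAI

client = OpenAI()

LABELS = ["Remember", "Understand", "Analyze"]  # hypothetical complexity scale

def classify_batch(backstories, model="gpt-4", shuffle=False):
    """Ask the model to label a batch of backstories in one request."""
    items = list(enumerate(backstories, start=1))
    if shuffle:
        random.shuffle(items)  # vary presentation order to probe order effects
    # Keep the original item numbers so labels map back after shuffling.
    listing = "\n".join(f"{i}. {text}" for i, text in items)
    prompt = (
        "Classify the cognitive complexity of each search backstory below "
        f"using one of these labels: {', '.join(LABELS)}.\n"
        "Answer with one line per item in the form '<number>: <label>'.\n\n"
        + listing
    )
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # reduce output variance for annotation tasks
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

Running `classify_batch` repeatedly with `shuffle=True` and comparing the returned labels across runs (and across batch sizes) gives a simple probe of the batching and ordering sensitivity that the abstract reports for GPT-3.5 versus GPT-4.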