Data sets for spoken conversational search

Workshop on Barriers to Interactive IR Resources Re-use at the ACM SIGIR Conference on Human Information Interaction and Retrieval

There is increasing interest in spoken conversational search—multi-turn interactions with a search engine, spoken in natural language—but until recently there was little public data to support research.

We describe our experiences building two data sets for spoken conversational search: the Microsoft Information-Seeking Conversation set (“MISC”) and the Spoken Conversational Search set (“SCSdata”). Each data set contains recordings of spoken interactions between two people collaborating on web search tasks, but relatively small differences in protocol have led to observably different data. We discuss some consequences of these differences, and describe attempts to reproduce analyses from one set on the other.