Downloads
ORCAS: Open Resource for Click Analysis in Search
April 2024
ORCAS is a click-based dataset associated with the TREC Deep Learning Track. It covers 1.4 million of the TREC DL documents, linking them to 10 million distinct queries through 18 million click-based connections.
TREC Tip-of-the-Tongue Track
April 2024
Tip-of-the-tongue (ToT) known-item retrieval is defined as “an item identification task in which the searcher has previously experienced an item but cannot recall a reliable identifier” (i.e., “It’s on the tip of my tongue…”). The TREC ToT track aims to…
TREC Deep Learning Track
April 2024
The TREC Deep Learning Track studies information retrieval in a large training data regime. This is the case where the number of training queries with at least one positive label is at least in the tens of thousands, if not…
Deep Language Networks
September 2023
We view Large Language Models as stochastic language layers in a network, where the learnable parameters are the natural language prompts at each layer. We stack two such layers, feeding the output of one layer to the next. We call…
INTREPID
May 2023
INTREPID (INTeractive learning via REPresentatIon Discovery) is a library of interactive learning algorithms that learn a representation (or latent state) from observational data in order to complete their tasks.
Protein sequence models
November 2021
Codebase for generative modeling of protein sequence and structure, including code for CNNs and GNNs and custom data handling code.
Tip of the Tongue Known Item Retrieval Dataset for Movie Identification
August 2021
The Tip of the Tongue (ToT) dataset is from the paper “Tip of the Tongue Known-Item Retrieval: A Case Study in Movie Identification.” It comprises 758 question/answer pairs scraped from the website iRememberThisMovie.com between 2013 and 2018. These…
Python Reasoning Challenges
May 2021
A short Python Reasoning Challenge can replace an entire page of English describing a typical programming problem. The goal is to teach computers how to program. This OSS repository will contain a dataset of short Python challenges. Most of them…
Conformer-Kernel Model with Query Term Independence (TREC Deep Learning Quick Start)
March 2021
This is a quick start guide for the document ranking task in the TREC Deep Learning (TREC-DL) benchmark. If you are new to TREC-DL, then this repository may make it more convenient for you to download all the required datasets…