Tip of the Tongue Known Item Retrieval Dataset for Movie Identification
The Tip of the Tongue (ToT) dataset accompanies the paper Tip of the Tongue Known-Item Retrieval: A Case Study in Movie Identification. It consists of 758 question/answer pairs scraped from the website iRememberThisMovie.com between 2013 and 2018. Each pair contains a REQUEST, in which a user of the website describes a movie they have seen but whose title they have forgotten, and ANSWERS, in which other users of the website propose solutions to the request. We also attach Wikipedia/IMDb links for the films.

We annotate the text of the REQUESTS at the sentence level using a handcrafted set of codes. These codes identify trends in the data, such as mentions of release or viewing dates, characters or locations remembered from the film, and the circumstances surrounding the viewing of the film. A complete list of the codes (also available in Tables 1, 2, and 3 in Section 4.3 of our paper) is presented below.

Movie: Codes touching on the content of the movie
  Character: Describes a character
  Scene: Describes a scene
  Object: Describes a tangible object in a scene
  Location type: Describes a scene’s location type
  Plot summary: Describes the overall plot or premise
  Release date: Describes timeframe of movie release
  Visual style: Describes visual style (e.g., black and white, colour, CGI animation, etc.)
  Language: Describes the language spoken
  Regional Origin: Describes movie’s region of origin
  Specific location: Describes a scene’s specific location
  Quote/dialogue: Describes a quote from the movie
  Real person: Describes real person associated with movie
  Camera angle: Describes camera action
  Singular timeframe: Describes timeframe
  Multiple timeframe: Describes the passage of time in the movie
  Fictional person: Describes fictional person associated with movie (directly or indirectly)
  Actor nationality: Describes nationality or ethnicity associated with actor/actress
  Target audience: Describes movie’s target audience
  Compares music: Describes movie’s soundtrack
  Specific music: Describes specific song in the movie

Context: Codes touching on the context in which the movie was seen
  Temporal context: Describes when the movie was seen, either in absolute terms (e.g., around 2008) or relative terms (e.g., when I was a kid)
  Physical medium: References the physical medium associated with watching the movie (e.g., TV, theatre, VHS, etc.)
  Cross media: Describes exposure to movie through different media (e.g., trailer, DVD cover, poster, etc.)
  Contextual witness: Describes other people involved in the movie watching experience
  Physical location: Describes physical location where movie was watched
  Concurrent events: Describes events relevant to time period when movie was watched

The following categories do not contain sub-codes:
  Previous Search: Indicates that a previous attempt had been made to find the movie title
  Social: Indicates that the sentence is primarily a social nicety without content relating to the film
  Uncertainty: Indicates that the sentence contains language revealing uncertainty on the author’s part
  Opinion: Indicates that the sentence contains language conveying an opinion or judgement of the movie
  Emotion: Indicates that the sentence contains language conveying an emotion the movie made the author feel
  Relative Comparison: Indicates that the sentence contains language describing the movie using relative terms (such as comparisons to other movies, actors, locations, etc.)
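
For readers who want to work with the coding scheme programmatically, the hierarchy above could be represented roughly as follows. This is only a sketch: the names CODE_SCHEME, STANDALONE_CODES and category_of are hypothetical, and the label strings are copied from the list above, so they may not match the exact spelling used in the released annotation files.

```python
# Hypothetical in-memory representation of the coding scheme listed above.
# Label strings are taken from this README and may differ from the
# labels actually used in the dataset files.
CODE_SCHEME = {
    "Movie": [
        "Character", "Scene", "Object", "Location type", "Plot summary",
        "Release date", "Visual style", "Language", "Regional Origin",
        "Specific location", "Quote/dialogue", "Real person", "Camera angle",
        "Singular timeframe", "Multiple timeframe", "Fictional person",
        "Actor nationality", "Target audience", "Compares music",
        "Specific music",
    ],
    "Context": [
        "Temporal context", "Physical medium", "Cross media",
        "Contextual witness", "Physical location", "Concurrent events",
    ],
}

# Categories that have no sub-codes.
STANDALONE_CODES = [
    "Previous Search", "Social", "Uncertainty", "Opinion", "Emotion",
    "Relative Comparison",
]


def category_of(code: str) -> str:
    """Return the top-level category a code belongs to.

    Standalone codes are their own category. Raises KeyError for
    labels that are not part of the scheme.
    """
    for category, sub_codes in CODE_SCHEME.items():
        if code in sub_codes:
            return category
    if code in STANDALONE_CODES:
        return code
    raise KeyError(f"Unknown code: {code!r}")
```

A lookup such as category_of("Plot summary") returns "Movie", which makes it straightforward to aggregate sentence-level annotations by top-level category.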