The research community still lacks large-scale data for studying different aspects of search clarification.
All the datasets presented in this paper contain only queries from the en-US market.
MIMICS-Click and MIMICS-ClickExplore are based on user interactions in Bing. MIMICS-Manual is based on manual annotations of clarification panes by multiple trained annotators.
The authors create MIMICS, which consists of three datasets (a loading sketch follows this list):
MIMICS-Click: includes over 400k unique queries with their associated clarification panes.
MIMICS-ClickExplore: an exploration dataset that contains multiple clarification panes per query; it includes over 60k unique queries.
MIMICS-Manual: a smaller dataset with manual annotations for the clarifying questions, the candidate answer sets, and the landing result page after clicking on each individual candidate answer.
Only queries for which a clarification pane was rendered on the search engine result page (SERP) were kept.
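As a rough sketch of how these files might be loaded, the snippet below assumes a TSV distribution of the three datasets; the file names and column names (e.g., engagement_level) are assumptions and may differ from the actual release.

```python
# Minimal sketch of loading the MIMICS TSV files with pandas.
# File names and column names are assumptions and may need adjusting.
import pandas as pd

click = pd.read_csv("MIMICS-Click.tsv", sep="\t")           # ~414k queries, one pane each
explore = pd.read_csv("MIMICS-ClickExplore.tsv", sep="\t")  # multiple panes per query
manual = pd.read_csv("MIMICS-Manual.tsv", sep="\t")         # manual quality labels

# Share of MIMICS-Click clarifications with positive engagement
# (assumes a numeric "engagement_level" column where 0 means no clicks).
positive = (click["engagement_level"] > 0).mean()
print(f"{positive:.1%} of query-clarification pairs received positive engagement")
```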
The clarification panes were generated solely based on the submitted queries; therefore, they do not include session or personalized information. This resulted in 414,362 unique queries, each associated with exactly one clarification pane, of which 71,188 clarifications received positive clickthrough rates.
Although MIMICS-Click is an invaluable resource for learning to generate clarifications and related research problems, it does not allow researchers to study certain tasks, such as click bias in user interactions with clarification.
For MIMICS-ClickExplore, the authors used the top-m clarifications generated by their algorithms and presented them to different sets of users. User interactions with multiple clarification panes for the same query during the same time period enable comparison of these clarification panes. The resulting dataset contains 64,007 unique queries and 168,921 query-clarification pairs, of which 89,441 query-clarification pairs received positive engagement.
Note that the sampling strategies for MIMICS-Click and MIMICS-ClickExplore are different, which results in significantly more query-clarification pairs with low impressions in MIMICS-Click.
Clicks do not necessarily reflect all quality aspects; in addition, they can be biased for many reasons.
Step 1: the annotators were asked to skim and review a few pages of the search results returned by Bing for the query.
Step 2: each clarifying question is given a label of 2 (Good), 1 (Fair), or 0 (Bad); the candidate answers are not shown to the annotators at this stage.
Step 3: the annotators were asked to judge the overall quality of the candidate answer set (2, 1, or 0).
Step 4: the annotators labeled the quality of the landing SERP (the SERP shown after clicking an answer) for each individual candidate answer.
Note: when a generic template is shown instead of a clarifying question (i.e., "select one to refine your search"), the annotators are not asked to provide a question quality label. A sketch of summarizing these labels follows.
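The snippet below illustrates how the resulting manual labels could be summarized; the column names question_label and options_overall_label are assumptions about the released file, while the 2/1/0 scheme follows the steps above.

```python
# Sketch of summarizing the MIMICS-Manual quality labels.
# Column names ("question_label", "options_overall_label") are assumptions;
# labels follow the 2 (Good) / 1 (Fair) / 0 (Bad) scheme described above.
import pandas as pd

manual = pd.read_csv("MIMICS-Manual.tsv", sep="\t")

# Distribution of clarifying-question quality labels
# (rows with a generic template have no question label and are skipped).
print(manual["question_label"].value_counts(dropna=True))

# Panes whose clarifying question AND candidate answer set were both labeled Good.
good = manual[(manual["question_label"] == 2) & (manual["options_overall_label"] == 2)]
print(len(good), "clarification panes labeled Good on both aspects")
```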
MIMICS-Click and MIMICS-ClickExplore contain a three-level impression label per query-clarification pair.
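A minimal sketch of inspecting engagement by this impression label; the impression_level and engagement_level column names are assumptions about the released TSV.

```python
# Engagement statistics grouped by the three-level impression label.
# Column names are assumptions and may differ in the actual release.
import pandas as pd

explore = pd.read_csv("MIMICS-ClickExplore.tsv", sep="\t")
print(explore.groupby("impression_level")["engagement_level"].describe())
```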
The authors study user engagement and manual quality labels with respect to query length: the average engagement increases as the queries get longer. This is inconsistent with the manual annotations, which suggest that single-word queries have higher question quality, answer set quality, and landing page quality.
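One possible way to reproduce the query-length analysis, assuming query and engagement_level columns in the TSV:

```python
# Sketch of the query-length analysis described above: average engagement
# level grouped by the number of terms in the query. The "query" and
# "engagement_level" column names are assumptions about the released file.
import pandas as pd

click = pd.read_csv("MIMICS-Click.tsv", sep="\t")
click["query_len"] = click["query"].str.split().str.len()

avg_engagement = click.groupby("query_len")["engagement_level"].mean()
print(avg_engagement.head(10))  # expect the average to rise for longer queries
```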
MIMICS-Click and MIMICS-ClickExplore both contain the conditional click probability on each individual candidate answer. The entropy of this probability distribution shows how clicks are distributed across the candidate answers.
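A sketch of computing this click entropy per pane, H = -Σ p_i log2 p_i over the candidate answers' conditional click probabilities; the column names option_cctr_1 through option_cctr_5 are assumptions about how these probabilities are stored.

```python
# Per-pane click entropy over the conditional click probabilities of the
# candidate answers. Column names are assumptions about the released TSV.
import numpy as np
import pandas as pd

click = pd.read_csv("MIMICS-Click.tsv", sep="\t")
cctr_cols = [f"option_cctr_{i}" for i in range(1, 6)]

def click_entropy(row):
    p = row[cctr_cols].astype(float).to_numpy()
    p = p[p > 0]          # ignore answers that received no clicks (or missing slots)
    p = p / p.sum()       # renormalize in case the probabilities do not sum to 1
    return float(-(p * np.log2(p)).sum())

click["click_entropy"] = click.apply(click_entropy, axis=1)
# High entropy: clicks spread evenly across answers; low entropy: one dominant answer.
print(click["click_entropy"].describe())
```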