.. |new| raw:: html
New
.. _guide_participant:
Participant guide
=================
.. note:: This guide is being updated as it is being used. Please tell us
   what you think is missing. Our contact details can be found under :ref:`help`.
This guide is meant to be a practical guide to participating in the TREC OpenSearch competition. Since we deviate significantly from the typical TREC-style
evaluation setup that most participants are likely to be familiar with, we focus primarily on those differences.
Participating in the lab involves the following steps:

#. Read the `lab description `_ and the :ref:`key` below. See :ref:`help` whenever you need assistance.
#. Sign up:

   #. Sign up for the `TREC OpenSearch mailing list `_.
   #. `Register with the lab `_. You can do this at any moment.
   #. Sign and send the lab agreement form. You will receive a link to this form.
   #. Sign up for the individual sites (use cases) you want to obtain data for. You will receive a link by email to do so.

#. Implement your method as a client that can talk to the API. Examples are provided. See :ref:`method` below.
#. Run your client:

   #. The client you implement can use the train queries and historical clicks to learn.
   #. When a testing period starts (see `schedule `_), download the test queries and submit your test runs. The testing period lasts for several weeks, but there is no need (nor possibility) to update runs during it.

#. If you take part in TREC OpenSearch, see the `Guidelines `_ for the schedule, for information on writing TREC OpenSearch working notes papers, and for the conference.
We hope that all steps except 3 and 4 are self-explanatory. Below we detail
these two steps in Sections :ref:`method` and :ref:`running`, respectively.
Schedule
--------
The schedule of different test rounds can be found on the `Guidelines `_
page on our website.
.. _key:
Key Concepts
------------
First, some key concepts that you will need to be aware of, some of which may
come as a surprise. These points all surfaced in discussions with
participants. If you think something is missing or could be explained better
or in more detail, please let us know!
Please read the `lab description `_
for a general idea of what the lab is about.
Run
    In TREC OpenSearch terminology, a `run` is a ranking of documents for
    one query.
Frequent queries and offline processing
    We use frequent queries because these allow participants of the lab to
    prepare their runs offline. Since these queries are frequent, users
    are likely to issue them again, at which point a run from a participant
    is presented. The major advantage of this approach is that we do not
    require participants to respond to a query within a few milliseconds.
    The downside is that we only consider frequent (head) queries.
Train and test queries
    Train queries are there for you to train your system on; feedback is
    provided for these queries. Test queries, on the other hand, are there
    to evaluate your system. You cannot change your runs for test queries
    during a testing period, and you will not obtain feedback for them.
    Outcomes for test queries are computed per testing period, while
    outcomes for train queries are continuously updated.
No server required
    Participants need neither implement nor run a server for serving search
    results to users. That overhead would be a prohibitive burden; our design
    avoids it by using head queries, for which rankings can be pre-computed.
Feedback is *not* immediate
    Feedback comes from real users. That means that real users have to enter
    a query that is part of the lab into the search box on the site. They
    then have to click a link, and this click has to be fed back into the API.
    There is bound to be a significant delay between submitting a run and
    the feedback becoming available.
Feedback is noisy
    Feedback, such as clicks, cannot be used as if it were relevance
    judgments. Users click for many reasons. For instance, if a ranking
    shown is really bad, users may start clicking on all links in the ranking
    out of despair, in which case a click actually signals negative relevance.
Interleaving
    Your ranking may not be shown to users directly; it can be interleaved
    with the current production system of the site. This means that only
    about half of the documents shown to a user actually come from your
    ranking. The other half comes from the production ranking.

    This is generally done for two reasons: it allows pairwise comparisons
    between your ranking and the site's ranking, and it reduces the risk
    of showing bad rankings to users.
Simulations
    Besides real clicks from real users, we provide simulated clicks. While
    these defy the whole purpose of the living lab setup, they do provide a
    more constant stream than real clicks do, which may be useful for
    debugging. On the dashboard, simulations are marked with a robot symbol.
Evaluate multiple systems
    It is possible to evaluate multiple ranking algorithms (called `systems`)
    per participant. Traffic per query is evenly spread across participants,
    so if you upload runs for multiple systems, you will receive less feedback
    per run. During test periods, it is not possible to upload runs for
    multiple systems per test query.
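The *Interleaving* entry above says that your ranking may be mixed with the site's production ranking. This guide does not specify which interleaving method the lab uses, so the following is only an illustration: a minimal sketch of team-draft interleaving, one well-known method, in which the two rankings take turns contributing their highest-ranked document that is not shown yet. The function names `team_draft_interleave` and `credit_clicks` are hypothetical; `ranking_a` could be your run and `ranking_b` the production ranking.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, length, rng=random):
    """Interleave two rankings (lists of docids) with team-draft interleaving.

    Returns the interleaved list and, per position, the team ("A" or "B")
    that contributed the document, so that clicks can be credited later.
    """
    interleaved, teams = [], []
    picks_a = picks_b = 0
    available = len(set(ranking_a) | set(ranking_b))
    while len(interleaved) < min(length, available):
        # The team with fewer picks goes next; a coin flip breaks ties.
        pick_a = picks_a < picks_b or (picks_a == picks_b and rng.random() < 0.5)
        source = ranking_a if pick_a else ranking_b
        # The team's highest-ranked document that is not shown yet.
        doc = next((d for d in source if d not in interleaved), None)
        if doc is None:
            # This team has run out of new documents; the other team picks.
            pick_a = not pick_a
            source = ranking_a if pick_a else ranking_b
            doc = next(d for d in source if d not in interleaved)
        interleaved.append(doc)
        teams.append("A" if pick_a else "B")
        if pick_a:
            picks_a += 1
        else:
            picks_b += 1
    return interleaved, teams


def credit_clicks(teams, clicked_positions):
    """Count clicks per team, given the 0-based positions that were clicked."""
    wins = {"A": 0, "B": 0}
    for position in clicked_positions:
        wins[teams[position]] += 1
    return wins


interleaved, teams = team_draft_interleave(
    ["d1", "d2", "d3", "d4"], ["d3", "d1", "d5", "d6"], length=4,
    rng=random.Random(0))
print(interleaved, teams)
print(credit_clicks(teams, [0]))
```

Over many impressions, the ranking whose documents attract more clicks would be preferred; the outcome computation the lab actually uses may differ from this sketch.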
.. _scenarios:
Usage Scenarios
---------------
TREC OpenSearch 2017 focuses on academic search. Visit the `Sites `_ page on the website to learn more about
the participating academic search engines.
.. _method:
Implement a Client
------------------
We advise you to first familiarize yourself with the :ref:`api-participants`.
A client that talks to this API roughly takes the following logical steps:

#. Obtain queries.
#. For each query, obtain a doclist, a list of candidate documents.
#. For each document in these doclists, obtain the content of the document
   (if any; some use cases, such as Seznam, only provide feature vectors as
   part of the doclist).
#. Create runs, using your ranking algorithm.
#. Upload runs.
#. Wait a while to give users a chance to interact with your runs.
#. Download feedback.
#. Potentially update your runs and repeat from step 5.
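The steps above can be sketched as a loop. This is only an outline under stated assumptions: `api` stands for a hypothetical object whose methods `get_queries`, `get_doclist`, `put_run` and `get_feedback` wrap the HTTP calls of the API, `rank` is your own ranking function, step 3 (fetching document content) is left out, and the `runid` format is arbitrary.

```python
import time


def run_client(api, rank, rounds=1, wait_seconds=0.0):
    """Outline of the logical client steps; `api` and `rank` are supplied by you.

    `rank(qid, doclist, feedback)` must return a reordered doclist.
    """
    # Step 1: obtain queries.
    queries = api.get_queries()["queries"]
    # Step 2: obtain a doclist per query (step 3, document content, is skipped).
    runs = {query["qid"]: api.get_doclist(query["qid"]) for query in queries}
    feedback = {qid: [] for qid in runs}
    for round_no in range(rounds):
        for qid, run in runs.items():
            # Steps 4 and 8: (re)rank using whatever feedback we have so far.
            run["doclist"] = rank(qid, run["doclist"], feedback[qid])
            run["runid"] = "round-%d" % round_no
            # Step 5: upload the run.
            api.put_run(qid, run)
        # Step 6: give users time to interact with the runs.
        time.sleep(wait_seconds)
        # Step 7: download feedback for every query.
        for qid in runs:
            feedback[qid] = api.get_feedback(qid)["feedback"]
    return runs
```

In practice `wait_seconds` would be large, since feedback arrives with a significant delay.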
Examples that implement the above steps are included in the code repository
which can be found here: https://bitbucket.org/living-labs/ll-api/
What follows is a *very minimal* example of the above steps, but it should get
you up and running. While we used Python, there is no such requirement for you:
you are free to use any client that can communicate with our API.

Note that this really is a very basic example that is purely exploitative.
It sorts documents only by their click counts. While this may be a reasonable
baseline, it runs a high risk of getting stuck in local optima (unseen documents
never get a chance to be clicked). Moreover, this approach looks neither at the
content of documents nor at relevance signals (features). Therefore, it will
not generalize to unseen queries. Nevertheless, it illustrates how to
communicate with the TREC OpenSearch API.
After going through this example, you can get more information by
looking at the :ref:`api-participants`.
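Because a ranking based purely on click counts may never show unclicked documents high enough to be clicked, one simple mitigation is to move one randomly chosen unclicked document into a fixed position of the click-sorted ranking. The helper below is hypothetical and is neither part of the lab nor of its example code; its name `add_exploration` and the default `slot` are assumptions for illustration.

```python
import random


def add_exploration(ranked, clicks, slot=2, rng=random):
    """Move one random never-clicked document to position `slot` (0-based).

    `ranked` is a list of docids already sorted by clicks and `clicks` maps
    docid to click count; documents missing from `clicks` count as unclicked.
    """
    unclicked = [docid for docid in ranked if clicks.get(docid, 0) == 0]
    if not unclicked or slot >= len(ranked):
        return list(ranked)
    chosen = rng.choice(unclicked)
    result = [docid for docid in ranked if docid != chosen]
    result.insert(slot, chosen)
    return result


print(add_exploration(["a", "b", "c", "d"], {"a": 3, "b": 1}, slot=1,
                      rng=random.Random(0)))
```

Always exploring costs some ranking quality at that position; how much exploration is worthwhile is a design choice for your system.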
Initialize
~~~~~~~~~~
We start off with some imports and definitions. We import HTTPBasicAuth because
authentication is done via HTTP basic authentication: you supply your key
as the username and leave the password empty. Replace :code:`KEY` with your own participant key.
.. sourcecode:: python

    import requests
    import json
    import time
    import random
    import datetime  # needed for timestamp
    from requests.auth import HTTPBasicAuth

    HOST = "http://api.trec-open-search.org/api/v2"
    KEY = "ABCDEF123456"  # replace with your own participant key
    QUERYENDPOINT = "participant/query"
    DOCENDPOINT = "participant/doc"
    DOCLISTENDPOINT = "participant/doclist"
    RUNENDPOINT = "participant/run"
    FEEDBACKENDPOINT = "participant/feedback"
    HEADERS = {'content-type': 'application/json'}
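For reference, HTTP basic authentication sends the base64 encoding of `username:password` in the `Authorization` header. The standard-library sketch below shows what that header looks like for the placeholder key above with an empty password. `requests` builds an equivalent header for you when you pass `HTTPBasicAuth(KEY, '')`, so a client does not need this; the helper name `basic_auth_header` is hypothetical.

```python
import base64

KEY = "ABCDEF123456"  # placeholder participant key from the example above


def basic_auth_header(username, password=""):
    """Return the Authorization header value for HTTP basic authentication."""
    credentials = "%s:%s" % (username, password)
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return "Basic " + token


print(basic_auth_header(KEY))  # Basic QUJDREVGMTIzNDU2Og==
```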
Obtain Queries
~~~~~~~~~~~~~~
As a participant, you request frequently issued queries from a site in order to create
rankings for them. Frequently issued queries are likely to reoccur and
yield clicks in the future. With every request, supply your key
as the username via HTTP basic authentication, leaving the password empty.
See also :http:get:`/api/v2/participant/query`.
.. sourcecode:: python

    def get_queries():
        r = requests.get("/".join([HOST, QUERYENDPOINT]),
                         headers=HEADERS, auth=HTTPBasicAuth(KEY, ''))
        if r.status_code != requests.codes.ok:
            print(r.text)
            r.raise_for_status()
        return r.json()

    queries = get_queries()
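Train and test queries are returned by the same endpoint. Assuming each query object carries a `type` field with values such as `train` and `test` (please check the :ref:`api-participants` for the exact fields), a small hypothetical helper can group query ids by type, for instance to treat test queries separately during a testing period. The example data is made up.

```python
from collections import defaultdict


def split_queries_by_type(queries_response):
    """Group query ids by their (assumed) `type` field."""
    groups = defaultdict(list)
    for query in queries_response["queries"]:
        # Queries without a `type` field end up under "unknown".
        groups[query.get("type", "unknown")].append(query["qid"])
    return dict(groups)


example = {"queries": [{"qid": "q1", "type": "train"},
                       {"qid": "q2", "type": "test"},
                       {"qid": "q3", "type": "train"}]}
print(split_queries_by_type(example))  # {'train': ['q1', 'q3'], 'test': ['q2']}
```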
Obtain Doclists
~~~~~~~~~~~~~~~
A site has an unranked list of candidate documents for every query. The :code:`get_doclist` function retrieves the list of documents for one query from the server. The doclists for all queries are then stored in the `runs` dictionary, keyed by query id.
See also :http:get:`/api/v2/participant/doclist/(qid)`.
.. sourcecode:: python

    def get_doclist(qid):
        r = requests.get("/".join([HOST, DOCLISTENDPOINT, qid]),
                         headers=HEADERS, auth=HTTPBasicAuth(KEY, ''))
        if r.status_code != requests.codes.ok:
            print(r.text)
            r.raise_for_status()
        return r.json()

    # Candidate documents per query id; these objects become our runs.
    runs = {}
    for query in queries["queries"]:
        qid = query["qid"]
        runs[qid] = get_doclist(qid)
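Fetching the content of documents is not part of this minimal example. Since the same document may occur in the doclists of several queries, a client that does fetch content may want to request each document only once. A minimal sketch with a hypothetical helper that collects the unique document ids from a `runs` dictionary shaped like the one built above; each id could then be passed to :http:get:`/api/v2/participant/doc/(docid)`.

```python
def unique_docids(runs):
    """Return every docid in the doclists of `runs`, each exactly once.

    Queries are visited in sorted order so the result is deterministic.
    """
    seen = set()
    ordered = []
    for qid in sorted(runs):
        for doc in runs[qid]["doclist"]:
            if doc["docid"] not in seen:
                seen.add(doc["docid"])
                ordered.append(doc["docid"])
    return ordered


example_runs = {"q2": {"doclist": [{"docid": "d3"}, {"docid": "d1"}]},
                "q1": {"doclist": [{"docid": "d1"}, {"docid": "d2"}]}}
print(unique_docids(example_runs))  # ['d1', 'd2', 'd3']
```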
Obtain Feedback and Update Runs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you combine the code above with the following code, the result is a minimal TREC OpenSearch participant. It uploads a ranking to the server that is based purely on the number of clicks each document has received. The content of the documents, which can be retrieved using the `doc` endpoint (:http:get:`/api/v2/participant/doc/(docid)`), is not taken into account.

A loop makes sweeps over all queries. For every query, it asks for feedback, updates the ranking and uploads it. The uploaded object is a modified version of the doclist object that was received from the site earlier and stored in :code:`runs`: its `doclist` is reordered according to the new ranking, and a `runid` field is added. The `runid` is mandatory, but purely used for your own bookkeeping. In this case, the `runid` is the timestamp of the current update sweep, so it can later be used to identify when a given ranking was uploaded.

See also :http:get:`/api/v2/participant/feedback/(qid)` and :http:put:`/api/v2/participant/run/(qid)`.
.. sourcecode:: python

    def get_feedback(qid):
        r = requests.get("/".join([HOST, FEEDBACKENDPOINT, qid]),
                         headers=HEADERS, auth=HTTPBasicAuth(KEY, ''))
        time.sleep(random.random())
        if r.status_code != requests.codes.ok:
            print(r.text)
            r.raise_for_status()
        return r.json()

    while True:
        # Refresh timestamp when new update of all query rankings
        # is started
        timestamp = datetime.datetime.now().isoformat()
        for query in queries["queries"]:
            qid = query["qid"]
            feedbacks = get_feedback(qid)
            # Count clicks per candidate document, starting from zero.
            clicks = {doc['docid']: 0 for doc in runs[qid]['doclist']}
            for feedback in feedbacks['feedback']:
                for doc in feedback["doclist"]:
                    if doc["clicked"] and doc["docid"] in clicks:
                        clicks[doc["docid"]] += 1
            # Reorder the doclist by descending click count.
            runs[qid]['doclist'] = [{'docid': docid}
                                    for docid, _ in
                                    sorted(clicks.items(),
                                           key=lambda x: x[1],
                                           reverse=True)]
            runs[qid]['runid'] = timestamp
            r = requests.put("/".join([HOST, RUNENDPOINT, qid]),
                             data=json.dumps(runs[qid]),
                             headers=HEADERS, auth=HTTPBasicAuth(KEY, ''))
            if r.status_code != requests.codes.ok:
                print(r.text)
                r.raise_for_status()
            time.sleep(random.random())
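The counting and sorting inside the loop can also be pulled out into a pure function, which lets you test your ranking offline before touching the live API. A minimal sketch; `rank_by_clicks` is a hypothetical name, and it reads the same `feedback`, `doclist`, `docid` and `clicked` fields as the loop above.

```python
def rank_by_clicks(run, feedbacks, runid):
    """Return a copy of `run` with its doclist sorted by click count.

    `run` is a doclist object as returned by the doclist endpoint and
    `feedbacks` an object as returned by the feedback endpoint. Documents
    with equal counts keep their original order because sorting is stable.
    """
    clicks = {doc["docid"]: 0 for doc in run["doclist"]}
    for feedback in feedbacks["feedback"]:
        for doc in feedback["doclist"]:
            # Ignore clicks on documents that are not among our candidates.
            if doc.get("clicked") and doc["docid"] in clicks:
                clicks[doc["docid"]] += 1
    new_run = dict(run)  # shallow copy, so the input object is left untouched
    new_run["doclist"] = [{"docid": docid} for docid in
                          sorted(clicks, key=clicks.get, reverse=True)]
    new_run["runid"] = runid
    return new_run


run = {"qid": "q1", "doclist": [{"docid": "a"}, {"docid": "b"}, {"docid": "c"}]}
feedbacks = {"feedback": [{"doclist": [{"docid": "c", "clicked": True},
                                       {"docid": "b", "clicked": False}]}]}
print([doc["docid"] for doc in rank_by_clicks(run, feedbacks, "r1")["doclist"]])
# ['c', 'a', 'b']
```

Inside the loop above, `runs[qid] = rank_by_clicks(runs[qid], feedbacks, timestamp)` would produce the same run object before it is uploaded.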
.. _running:
Running a Client
----------------
Once you have implemented your ranking algorithm as a client that communicates
with our API, you can run your client during the whole training period. After
that, you will have the chance to download the test queries, for which you can
then upload your runs. You will have 24 hours after downloading the test
queries to do so. After these 24 hours, the API will start evaluating your runs
using live data, and from that point on participants can no longer update
their rankings.
Review the :ref:`api-participants` for more information.
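The 24-hour window described above is simple date arithmetic. A minimal sketch with hypothetical helper names, assuming both timestamps come from the same clock and time zone (the authoritative deadline is the one enforced by the API, whose clock may differ from yours) and assuming uploads are accepted strictly before the 24 hours have passed.

```python
from datetime import datetime, timedelta

# Window for uploading test runs after downloading the test queries.
SUBMISSION_WINDOW = timedelta(hours=24)


def submission_deadline(downloaded_at):
    """Return the moment the 24-hour upload window closes."""
    return downloaded_at + SUBMISSION_WINDOW


def can_still_upload(downloaded_at, now):
    """True while `now` lies strictly before the deadline."""
    return now < submission_deadline(downloaded_at)


downloaded_at = datetime(2017, 6, 1, 12, 0)  # made-up example timestamp
print(submission_deadline(downloaded_at))    # 2017-06-02 12:00:00
```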
.. _help:
Getting Help
------------
We do our best to run everything smoothly, but given that this is the first
year and the first lab of its kind, you may hit some bumps.
Please let us know if you have any problems.
- `File an issue `_ if
  you think something is wrong with the API.
- Sign up for the `TREC OpenSearch mailing list `_.
- Write an email to the organizers: trec-os-organizers (at) googlegroups.com
If you report issues or ask questions, please provide as many details as you can:

- Which API endpoint were you calling?
- What was the response?
- What was the HTTP status?
- Was there a stack trace? Please send it along.
- (How) can you reproduce the problem?
If you are contacting the organizers, it is fine to share a full
HTTP request to the API, including your API key. However, please do not share
this key publicly, for example on the mailing list.
Citation
--------
If you use the API, please refer to `this paper `_: ::

    @inproceedings{balog2016overview,
      title={Overview of the trec 2016 open search track},
      author={Balog, Krisztian and Schuth, Anne and Tavakolpoursaleh, N and Schaer, P and Chuang, PY and Wu, J and Giles, CL},
      booktitle={Proceedings of the Twenty-Fifth Text REtrieval Conference (TREC 2016). NIST},
      year={2016}
    }