treeherder/docs/accessing_data.md

4.2 KiB

Accessing Treeherder's data

Treeherder's data can be accessed via:

REST API

Treeherder provides a REST API which can be used to query for all the push, job, and performance data it stores internally. For a browsable interface, see:

https://treeherder.mozilla.org/docs/

Python Client

We provide a library, called treeherder-client, to simplify interacting with the REST API. It is maintained inside the Treeherder repository, but you can install your own copy from PyPI using pip:

pip install treeherder-client

It will install a module called thclient that you can access, for example:

from thclient import TreeherderClient

By default the production Treeherder API will be used, however this can be overridden by passing a server_url argument to the TreeherderClient constructor:

# Treeherder production
client = TreeherderClient()

# Treeherder stage
client = TreeherderClient(server_url='https://treeherder.allizom.org')

# Local vagrant instance
client = TreeherderClient(server_url='http://localhost:8000')

The Python client has some convenience methods to query the Treeherder API.

Here's a simple example which prints the start timestamp of all the jobs associated with the last 10 pushes on mozilla-central:

from thclient import TreeherderClient

client = TreeherderClient()

pushes = client.get_pushes('mozilla-central') # gets last 10 by default
for pushes in pushes:
    jobs = client.get_jobs('mozilla-central', push_id=pushes['id'])
    for job in jobs:
        print job['start_timestamp']

When using the Python client, don't forget to set up logging in the caller so that any API error messages are output, like so:

import logging

logging.basicConfig()

For verbose output, pass level=logging.DEBUG to basicConfig().

User Agents

When interacting with Treeherder's API, you must set an appropriate User Agent header (rather than relying on the defaults of your language/library) so that we can more easily track API feature usage, as well as accidental abuse. Default scripting User Agents will receive an HTTP 403 response (see bug 1230222 for more details).

If you are using the Python Client, an appropriate User Agent is set for you. When using the Python requests library, the User Agent can be set like so:

r = requests.get(url, headers={'User-Agent': ...})

GraphQL API

This API is a work in progress. A browsable interface is available at:

https://treeherder.mozilla.org/graphql

Redash

Mozilla's Redash instance at https://sql.telemetry.mozilla.org is configured to use Treeherder's read-only MySQL RDS replica as a data source. Users with LDAP credentials can find Treeherder's data under the Treeherder data source and cross-reference it with other data sets available there.

ActiveData

ActiveData imports Treeherder's production data into its Elasticsearch cluster. See the getting started with ActiveData guide for more details.

Direct database access

If there are any use-cases that are not possible via one of the above, we can provide read-only access to Treeherder's production MySQL RDS replica. Please file an infrastructure bug requesting that someone from the Treeherder team grant access to the read-only replica.

Alternatively if write access is required, we can create a temporary RDS instance from a production database snapshot.