Mirror of https://github.com/mozilla/data-docs.git
Creating Your Own Dataset to Query in re:dash
- Create a Spark notebook that performs the transformations you need, either on raw data (using the Dataset API) or on Parquet data.
- Write the results to an S3 location, usually telemetry-parquet/user/$YOUR_DATASET/v$VERSION_NUMBER/submission_date=$YESTERDAY/ . This partitions the dataset by submission_date, so each daily run writes its output to a new location in S3. Do NOT also include submission_date as a column in the Parquet file! A column name cannot also be the name of a partition.
- Using this template, open a bug to load the dataset in Presto with the following attributes:
- Assigned to :robotblake
- Title: "Add Dataset to Presto"
- Content: Location of the dataset and the desired table name
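
The output step above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the notebook's required form: the helper names `output_path` and `write_partition` are hypothetical, the `s3://` scheme and the `YYYYMMDD` date format are guesses that the text above does not specify, and `df` is assumed to be a PySpark DataFrame produced by your transformations.

```python
from datetime import date, timedelta


def output_path(dataset, version, run_date, bucket="telemetry-parquet"):
    """Build the partition directory for the day before run_date.

    Assumptions (not stated in the text above): the s3:// scheme and
    the YYYYMMDD formatting of the submission_date value.
    """
    yesterday = run_date - timedelta(days=1)
    return "s3://{}/user/{}/v{}/submission_date={}/".format(
        bucket, dataset, version, yesterday.strftime("%Y%m%d")
    )


def write_partition(df, path):
    """Write df (assumed to be a pyspark.sql.DataFrame) as Parquet to path.

    The submission_date column is dropped first, because the value is
    already encoded in the directory name and a column cannot share its
    name with a partition. drop() is a no-op if the column is absent.
    """
    df.drop("submission_date").write.mode("overwrite").parquet(path)


# Example: a job running on 2017-03-02 writes the partition for 2017-03-01.
path = output_path("my_dataset", 1, date(2017, 3, 2))
print(path)
# s3://telemetry-parquet/user/my_dataset/v1/submission_date=20170301/
```

In a notebook you would then call `write_partition(df, path)` on the transformed DataFrame. Using overwrite mode here is a design choice that makes re-running a given day replace that day's partition rather than append to it; adjust it if your job should behave differently.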