Commit graph

492 commits

Author SHA1 Message Date
kik-kik f579e95fd9 bug(1854406): Addressing installation problems affecting consumers of this repo. (#277)
* updated setup.py, explicitly specifying dependency versions, and removed the use_scm_version flag

* Tweaked CI configuration to comment out any potential publishing steps

* trying to fix CI build

* fixing linting error

* Tweaked CI configuration to comment out any potential publishing steps
2023-11-09 11:04:15 +01:00
Jeff Klukas 4ebd638bea Final deprecation/archive docs (#275)
After this is merged, we can set the repo as archived.
2019-12-27 13:08:28 -05:00
William Lachance 4eab35735a Add deprecation notice to docs 2019-11-06 15:27:19 -05:00
William Lachance 49f6f3a7cf Remove python 2.7 circle test 2019-11-06 15:27:19 -05:00
William Lachance ad3e2d4692 Fix flake8 nit 2019-11-06 15:27:19 -05:00
William Lachance b3372a8542 Attempt to fix circle build 2019-11-06 15:27:19 -05:00
William Lachance 3a1f932844 Add a deprecation notice when people try to import this library 2019-11-06 15:27:19 -05:00
Tim D. Smith 09ddf1ec7d Document that callables receive sanitized values (#243)
Explain that replacing `key="a-value"` with `key=lambda x: x
in ("a-value", "b-value")` will not work.
2018-12-06 11:31:00 -04:00
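The note above depends on string conditions being sanitized before comparison, while callables see the stored, already-sanitized dimension value. Below is a minimal sketch, assuming sanitization replaces every character outside `[a-zA-Z0-9_.]` with an underscore. That exact rule is an assumption, and `sanitize_dimension` and `matches` are hypothetical helpers, not moztelemetry's API.

```python
import re
from typing import Callable, Union

def sanitize_dimension(value: str) -> str:
    # Assumed rule: any character outside [a-zA-Z0-9_.] becomes "_".
    return re.sub(r"[^a-zA-Z0-9_.]", "_", value)

def matches(stored: str, condition: Union[str, Callable[[str], bool]]) -> bool:
    # `stored` is the dimension value as it exists in S3, i.e. already sanitized.
    if callable(condition):
        # Callables receive the sanitized value as-is.
        return condition(stored)
    # String conditions are sanitized first, so callers can pass the raw value.
    return stored == sanitize_dimension(condition)

stored = sanitize_dimension("a-value")  # "a_value"
print(matches(stored, "a-value"))                               # True
print(matches(stored, lambda x: x in ("a-value", "b-value")))   # False
print(matches(stored, lambda x: x in ("a_value", "b_value")))   # True
```

A callable therefore has to compare against the sanitized spellings.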
Tim Smith 2f030ed5bc Use threads instead of processes in Dataset.summaries
Dataset.summaries uses a concurrent.futures.ProcessPoolExecutor to fetch multiple files from S3 at once.
ProcessPoolExecutor uses multiprocessing underneath, which defaults to using fork() on Unix.
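Using fork() in a process that already has threads is prone to deadlocks, since locks held by other threads at fork time are copied into the child in a locked state: https://codewithoutrules.com/2018/09/04/python-multiprocessing/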

This is a possible source of observed deadlocks during calls to Dataset.records.

Using threads should not be a performance regression since the operation we're parallelizing over is network-bound,
not CPU-bound, so there should not be much contention for the GIL.
2018-11-28 12:53:14 -05:00
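Here is a sketch of the thread-based approach using `concurrent.futures.ThreadPoolExecutor`. `fetch_summary` is a hypothetical stand-in for an S3 request. It only sleeps to simulate network latency, so any speedup comes from overlapping waits rather than parallel CPU work, which matches the GIL argument above.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List

def fetch_summary(key: str) -> dict:
    # Hypothetical stand-in for an S3 call. It is I/O-bound, so the thread
    # releases the GIL while waiting (here, while sleeping).
    time.sleep(0.1)
    return {"key": key, "size": len(key)}

def summaries(keys: List[str], max_workers: int = 8) -> List[dict]:
    # Threads share the parent's memory and never call fork(), which sidesteps
    # the fork-related deadlocks described above.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() returns results in the same order as the input keys.
        return list(executor.map(fetch_summary, keys))

if __name__ == "__main__":
    keys = [f"prefix/file-{i}" for i in range(8)]
    start = time.perf_counter()
    results = summaries(keys)
    # Expect a total close to one sleep (~0.1s), not eight (~0.8s).
    print(len(results), f"{time.perf_counter() - start:.2f}s")
```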
Jeff Klukas fb68074459 Get test suite working with latest moto 2018-11-06 09:18:57 -05:00
Jeff Klukas 5f7c4fe9c2 Small fixes for docs-deploy CI job 2018-11-06 09:18:57 -05:00
Jeff Klukas 679b89bd26 Fix some errors in API docs and link to DTMO 2018-11-02 11:14:58 -04:00
Jeff Klukas 3b50362eb7 Test docs and deploy to gh-pages branch
We should definitely be testing docs as part of CI to make sure they build,
which is addressed here. But this change also explores what it could look
like to publish docs to GitHub Pages rather than ReadTheDocs, so that we can
avoid the additional developer friction of understanding that service and
maintaining user permissions there.

The gh-pages docs are live at:
http://mozilla.github.io/python_moztelemetry/

A few features I notice missing that ReadTheDocs provides:

- hosting multiple versions of the docs, though we don't look to be using this
- download links for PDF, HTML, and Epub
- "Edit on GitHub" links; the gh-pages rendered version links to the content
  on the gh-pages branch rather than on master

The above features are nice to have. On the other hand, RTD is a fairly
Python-specific tool, so if we value hosting API docs for our projects in
general, the technique here is more transferable.

This PR is mostly intended to provoke discussion.
I'm totally fine if we decide to close this.
2018-11-02 09:18:51 -04:00
William Lachance e25b34f884 Use python3 for readthedocs (#238) 2018-11-01 10:30:20 -04:00
Jeff Klukas 197684236f Bug 1341566 Regenerate protobuf classes
Also adds documentation on how to regenerate the classes in the future.
2018-10-31 10:32:22 -04:00
Mark Reid 79db9acf9c Update docstring to clarify SparkContext vs. SparkSession (#236)
* Update docstring to clarify SparkContext vs. SparkSession

* Ignore new pycodestyle W504 rule

It's in pycodestyle's default ignore list since it's mutually
exclusive with the existing W503 rule.
2018-10-31 10:01:42 -03:00
Jeff Klukas dc66033b18 Pass expected argument to --cov
`./bin/test tests/heka/` was running all tests rather than just those in the
target directory because `--cov` consumed the directory as its own argument,
treating it as the source to measure code coverage for.
2018-09-20 16:49:12 -04:00
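This behavior comes from `--cov` accepting an optional value. The sketch below reproduces it with `argparse`. It assumes that pytest-cov declares `--cov` with `nargs="?"`, which is our understanding of how that option is defined. The `moztelemetry` coverage source is illustrative.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="test")
    # Optional value: a bare "--cov" turns coverage on, while
    # "--cov SOURCE" names the package to measure.
    parser.add_argument("--cov", nargs="?", const=True, default=None)
    # Any remaining positional arguments select which tests to run.
    parser.add_argument("paths", nargs="*")
    return parser

parser = build_parser()

# The directory is taken as the value of --cov, leaving no test path.
print(parser.parse_args(["--cov", "tests/heka/"]))
# Namespace(cov='tests/heka/', paths=[])

# Passing the coverage source explicitly keeps the directory as a test path.
print(parser.parse_args(["--cov=moztelemetry", "tests/heka/"]))
# Namespace(cov='moztelemetry', paths=['tests/heka/'])
```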
Jeff Klukas 4e4a94675f Pin boto3 version for tests due to moto incompatibility 2018-09-14 10:13:40 -04:00
Jeff Klukas 9075ec4775 Use a named docker volume for tox cache
This should avoid the problem of files being created with root
ownership on Linux hosts.
2018-09-14 10:13:40 -04:00
Frank Bertsch 61fddfd863 Increase pandas version (#225)
Spark 2.3 requires Pandas 0.19.2 or later for some operations.
See bug 1484715
2018-08-24 14:05:28 -05:00
Erin 6323d1cadf Bug 1429902 - Dataset.dataframe method (#227)
Allow loading a Dataset that uses `select` directly into a Dataframe
2018-08-22 13:20:43 -07:00
Jeff Klukas 939ed4de26 Deploy a universal wheel for both Python 2 and Python 3 2018-08-17 15:42:05 -04:00
Anthony Miyaguchi 915b964fc7 Remove python bytecode in source directories 2018-08-17 14:19:28 -04:00
Jeff Klukas 8acb376e37 Copy source into image rather than mount in container
@acmiyaguchi reported that with the source mounted into the container,
cache files were written to the local file system that then
couldn't be removed without sudo on an Ubuntu host.

This change should make sure all cache files are written inside the
container so they don't hit the local filesystem.
2018-08-17 14:19:28 -04:00
Jeff Klukas 9bf1383ec4 Document a bit about testing outside docker 2018-08-17 14:19:28 -04:00
Jeff Klukas 4d6280c04f Bug 1477808 Move from TravisCI to CircleCI 2.0 2018-08-17 14:19:28 -04:00
Jeff Klukas c9bc94dcb2 Add Dataset max_concurrency parameter
On CircleCI's docker infrastructure, cpu_count returns 32,
even though the container is limited to 2 virtual CPUs;
this spawned so many processes that tests timed out.

We set max_concurrency low for tests so they can complete quickly
on CircleCI.
2018-08-17 14:19:28 -04:00
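The fix amounts to capping the worker count derived from `os.cpu_count()`, because that call reports the host's CPUs rather than the container's CPU quota. Below is a sketch based on that reading. `worker_count` is a hypothetical helper and does not reflect the actual `Dataset` signature.

```python
import os
from typing import Optional

def worker_count(max_concurrency: Optional[int] = None) -> int:
    # os.cpu_count() reports the logical CPUs visible to the OS (e.g. 32 on a
    # large CI host) and ignores container CPU quotas, so it can overshoot.
    detected = os.cpu_count() or 1
    if max_concurrency is None:
        return detected
    # Apply the explicit cap, but always allow at least one worker.
    return max(1, min(detected, max_concurrency))

# In tests, keep concurrency low regardless of what the host reports.
print(worker_count(max_concurrency=2))  # never more than 2
```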
Jeff Klukas d4c8319878 Explicitly mark spark_context fixture as session-scoped
We also alter some quotes solely for consistency.
2018-08-17 14:19:28 -04:00
Jeff Klukas 3a8a6fbf07 Fix some nondeterministic tests
Sorting was not always stable when collecting dataframes.
2018-08-17 14:19:28 -04:00
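Row order from a distributed dataframe is not guaranteed, and sorting on a column that has ties still leaves the tied rows in arbitrary order. One common remedy, which may differ from what this commit actually changed, is to sort both sides on every column before comparing. The sketch uses plain tuples as a stand-in for collected rows, and the helper names are hypothetical.

```python
from typing import List, Tuple

Row = Tuple[str, int]

def normalized(rows: List[Row]) -> List[Row]:
    # Sorting on all columns gives a total order, so ties cannot reorder
    # depending on which partition returned its rows first.
    return sorted(rows)

def assert_same_rows(actual: List[Row], expected: List[Row]) -> None:
    assert normalized(actual) == normalized(expected), (actual, expected)

# Two runs of the same query may return rows in different orders.
run_a = [("b", 2), ("a", 1), ("c", 3)]
run_b = [("a", 1), ("c", 3), ("b", 2)]
assert_same_rows(run_a, run_b)  # passes regardless of order
```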
Anthony Miyaguchi 7436a76a83 Reorganize `Dataset.record` for clarity (#222) 2018-08-16 09:40:44 -07:00
Erin 410198b1f7 Bug 1463877 - partition strategy (#217) 2018-08-06 11:17:20 -07:00
Daniel Thorn 4148c5f64d Fix .travis.yml to only deploy from python 3.6 (#214) 2018-07-12 12:09:58 -07:00
Daniel Thorn a38e07ebd3 Add support for python3 (#208)
and make python3 default in docker
2018-07-12 09:39:45 -07:00
pyup-bot 51edf77b7c Update sphinx-rtd-theme from 0.2.4 to 0.4.0 2018-06-08 11:36:00 -04:00
William Lachance 5175938186 Bug 1463885 - Disrecommend get_one_ping_per_client (#212)
* Update docs to more clearly mark deprecated methods as... deprecated

* Bug 1463885 - Disrecommend get_one_ping_per_client even more
2018-06-08 09:43:18 -03:00
Anthony Miyaguchi 8d3813c321 Drop invalid content fields 2018-05-25 10:43:11 -07:00
Anthony Miyaguchi 543ada4266 Add test for string decoding; fix unreachable code 2018-05-25 10:43:11 -07:00
Anthony Miyaguchi 3e48c04a51 Bug 1447851 - Add content decoding for landfill
Documents from landfill will be decoded directly into their string
representation. The logic for _parse_heka_record is generally
unnecessary because fields are not extracted when dumped to landfill.
2018-05-25 10:43:11 -07:00
Wesley Dawson 238d9cbb71 WIP Add support for landfill messages 2018-05-25 10:43:11 -07:00
William Lachance 2a1765fbfc Remove hbase reference in docs (#186) 2018-04-27 14:58:40 -05:00
Frank Bertsch e99644f41a Change deploy acct to frank (#203) 2018-04-02 13:19:20 -07:00
Daniel Thorn d9c90375e4 Port client sampling to python (#202) 2018-04-02 10:09:33 -07:00
Frank Bertsch 215f9bef1c Remove aurora scalar definitions (#200) 2018-03-08 09:27:24 -06:00
Rob Hudson d77eb9c121 Ignore current bare except statements
These will need to be fixed at some point but shouldn't block this
change.
2018-02-12 11:24:34 -08:00
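Bare `except:` clauses are worth fixing eventually because they also catch `BaseException` subclasses such as `SystemExit` and `KeyboardInterrupt`, which `except Exception:` lets propagate. pycodestyle reports a bare except as E722. A per-line `# noqa: E722` is one way to silence existing occurrences, although the commit does not say whether it used that or a config-level ignore. The function names below are illustrative.

```python
def swallow_bare() -> str:
    try:
        raise SystemExit(1)
    # A bare except also traps SystemExit, which is rarely intended.
    except:  # noqa: E722
        return "caught"

def swallow_exception() -> str:
    try:
        raise SystemExit(1)
    except Exception:
        # SystemExit derives from BaseException, not Exception,
        # so this handler is skipped and the exit propagates.
        return "caught"

print(swallow_bare())  # caught
try:
    swallow_exception()
except SystemExit:
    print("SystemExit propagated")
```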
Rob Hudson 4d74d853ce Bug 1376905 - Run flake8 when testing locally 2018-02-12 11:24:34 -08:00
Alessio Placitelli 069a938c47 Use parse_scalars.py instead of custom code (#194)
* Use parse_scalars.py instead of custom code

This additionally removes the REQUIRED_FIELDS and
OPTIONAL_FIELDS dictionaries: these checks would
be performed by the parse_scalars.py library with
|strict_type_checks=True|. However, for server side
computation, we're usually disabling this to be
backward compatible with older registry formats.

* Make the updater script refresh all the dependencies
2018-02-09 14:16:13 +01:00
Teon L Brooks fe22cc096a FIX typo 2018-02-07 10:20:13 -05:00
Anthony Miyaguchi a9f74c76cc Add test for extracted fields that are nested (#189)
Flattened fields are remapped into the proper place in the payload. Add
a test for nested fields.
2018-01-03 14:31:03 -08:00
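Here is a sketch of what remapping flattened fields into the payload can look like. It assumes flattened field names use dots as path separators (e.g. `environment.build.version`). The helpers `set_nested` and `remap_fields` and the field names are hypothetical and are not taken from the parser.

```python
from typing import Any, Dict

def set_nested(payload: Dict[str, Any], dotted_key: str, value: Any) -> None:
    # Walk, creating as needed, the intermediate dicts for every segment but the last.
    *parents, leaf = dotted_key.split(".")
    node = payload
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value

def remap_fields(payload: Dict[str, Any], fields: Dict[str, Any]) -> Dict[str, Any]:
    # Put each extracted (flattened) field back where it belongs in the payload.
    for key, value in fields.items():
        set_nested(payload, key, value)
    return payload

doc = remap_fields({"meta": {}}, {"environment.build.version": "57.0", "meta.host": "x"})
print(doc)
# {'meta': {'host': 'x'}, 'environment': {'build': {'version': '57.0'}}}
```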
Ryan Leake 88588aba1c Bug 1419761 - Rename "histogram_tools.py" to "parse_histograms.py" (#188)
* Bug 1419761 - Rename "histogram_tools.py" to "parse_histograms.py"

Rename histogram_tools and update_histogram_tools
Replace instances of "histogram_tools" with "parse_histograms"
Fixes bug 1419761

* Bug 1419761 - Rename "histogram_tools.py" to "parse_histograms.py"

- Retain update_parse_histograms permissions
- Add note of file name change to README

* Change file permissions of update_parse_histograms
2017-11-24 14:41:08 -04:00
Daniel Thorn df30af85f8 Bug 1414582 - Print warning for "sample" (#187)
Print a warning when using the "sample" functionality of Dataset
Fixes Bug 1414582
2017-11-10 17:19:03 -04:00