* Updated setup.py to explicitly specify dependency versions and removed the `use_scm_version` flag (a sketch of the pinning pattern follows this list)
* Tweaked CI configuration to comment out any potential publishing steps
* Trying to fix the CI build
* Fixing a linting error
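As a rough illustration of the pinning pattern described above (the package names and version pins below are placeholders, not the actual pins from this change):

```python
# Illustrative setup.py sketch: dependencies pinned explicitly and the
# version declared directly instead of via setuptools_scm's use_scm_version.
from setuptools import setup, find_packages

setup(
    name="python_moztelemetry",
    version="0.0.0",            # explicit version, no use_scm_version
    packages=find_packages(),
    install_requires=[
        "requests==2.20.0",     # placeholder pin
        "boto3==1.9.0",         # placeholder pin
    ],
)
```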
Dataset.summaries uses a concurrent.futures.ProcessPoolExecutor to fetch multiple files from S3 at once.
ProcessPoolExecutor uses multiprocessing underneath, which defaults to using fork() on Unix.
Using fork() is dangerous and prone to deadlocks: https://codewithoutrules.com/2018/09/04/python-multiprocessing/
This is a possible source of observed deadlocks during calls to Dataset.records.
Using threads should not be a performance regression since the operation we're parallelizing over is network-bound,
not CPU-bound, so there should not be much contention for the GIL.
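As a minimal sketch of the idea (the names below are illustrative, not the actual implementation):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_summary(key):
    # Stand-in for an S3 GET. While a thread is blocked on the network it
    # releases the GIL, so threads overlap the waits without the deadlock
    # hazards of fork()-based process pools.
    time.sleep(0.1)
    return "summary for %s" % key

keys = ["telemetry/%d" % i for i in range(20)]
with ThreadPoolExecutor(max_workers=10) as pool:
    summaries = list(pool.map(fetch_summary, keys))
```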
We should definitely be testing the docs as part of CI to make sure they build,
which is addressed here. But this change also explores what it could look
like to publish docs to GitHub Pages rather than ReadTheDocs, so that we can
avoid the additional developer friction of understanding that service and
maintaining user permissions there.
The gh-pages docs are live at:
http://mozilla.github.io/python_moztelemetry/
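For reference, a publish step along these lines can be a small script like the following (a sketch only; the tool choice, paths, and flags are assumptions, not necessarily what this PR does):

```python
# Hypothetical publish step: build the Sphinx docs, then push the rendered
# HTML to the gh-pages branch with ghp-import.
import subprocess

subprocess.check_call(["sphinx-build", "-b", "html", "docs", "docs/_build/html"])
# -n adds a .nojekyll file; -p pushes the updated gh-pages branch to origin.
subprocess.check_call(["ghp-import", "-n", "-p", "docs/_build/html"])
```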
A few ReadTheDocs features I notice are missing here:
- hosting multiple versions of the docs, though we don't appear to be using this
- download links for PDF, HTML, and Epub
- "Edit on GitHub" links; the gh-pages rendered version links to the content
on the gh-pages branch rather than on master
The above features are indeed nice to have. RTD is also a fairly
Python-specific tool, so if we do value hosting API docs for our projects,
the technique here is a bit more transferable.
This PR is mostly intended to provoke discussion.
I'm totally fine if we decide to close this.
* Update docstring to clarify SparkContext vs. SparkSession
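For context, in Spark 2.x a SparkSession wraps a SparkContext, and APIs that still expect the older object can pull it off the session (standard PySpark, shown here only to illustrate the distinction):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # Spark 2.x entry point
sc = spark.sparkContext                     # the underlying SparkContext
```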
* Ignore new pycodestyle W504 rule
It's in pycodestyle's default ignore list since it's mutually
exclusive with the existing W503 rule.
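The mutual exclusivity is easy to see with a wrapped expression, which has to break either before or after the operator:

```python
first_value, second_value = 1, 2

# W504 flags a break *after* a binary operator:
total = (first_value +
         second_value)

# W503 flags a break *before* one; any wrapped expression must trip one of
# the two, which is why pycodestyle ignores both by default.
total = (first_value
         + second_value)
```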
`./bin/test tests/heka/` was running all tests rather than just those in the
target directory because `--cov` was eating the directory, treating it as the
argument naming where we should measure code coverage.
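A sketch of the ambiguity (the coverage target below is illustrative): pytest-cov's `--cov` option takes an optional value, so a bare `--cov` swallows the next argument:

```python
import pytest

# "tests/heka/" is consumed as the value of --cov, so no test path remains
# and pytest falls back to collecting the whole suite:
pytest.main(["--cov", "tests/heka/"])

# The unambiguous form names the coverage target explicitly, leaving the
# directory as a positional test path:
pytest.main(["--cov=moztelemetry", "tests/heka/"])
```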
@acmiyaguchi reported that with the source mounted into the container,
cache files were written to the local filesystem and then
couldn't be removed without sudo on an Ubuntu host.
This change should make sure all cache files are written inside the
container so they don't hit the local filesystem.
On CircleCI's Docker infrastructure, cpu_count returns 32
even though the container is limited to 2 virtual CPUs;
this spawned far more processes than the container could service,
leading to test timeouts.
We set max_concurrency low for tests so they can complete quickly
on CircleCI.
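A sketch of the pattern (the cap value and names are illustrative):

```python
import multiprocessing

# On CircleCI's Docker executors cpu_count() reports the host's cores
# (e.g. 32), not the 2 vCPUs the container actually gets, so derive the
# worker count from an explicit cap rather than trusting cpu_count() alone.
MAX_CONCURRENCY = 4

workers = min(multiprocessing.cpu_count(), MAX_CONCURRENCY)
```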
Documents from landfill will be decoded directly into their string
representation. The logic in `_parse_heka_record` is generally
unnecessary because fields are not extracted when dumped to landfill.
* Use parse_scalars.py instead of custom code
This additionally removes the REQUIRED_FIELDS and
OPTIONAL_FIELDS dictionaries: these checks would
be performed by the parse_scalars.py library with
`strict_type_checks=True`. However, for server-side
computation we usually disable this to remain
backward compatible with older registry formats.
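A rough sketch of the intended usage (the entry-point name and file path are assumptions about parse_scalars.py, not verified against its API):

```python
# Hypothetical usage: load scalar definitions with strict type checks
# disabled so that older registry formats still parse on the server side.
from parse_scalars import load_scalars  # assumed entry point

scalars = load_scalars("Scalars.yaml", strict_type_checks=False)
```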
* Make the updater script refresh all the dependencies