add example test_settings so I do not have to dig through old machines next time

2016-06-05 00:47:08 -04:00 · 2016-06-05 00:47:08 -04:00 · c595fc75f8
--- a/.editorconfig
+++ b/.editorconfig
@ -0,0 +1,36 @@
+# EditorConfig helps developers define and maintain consistent
+# coding styles between different editors and IDEs
+# editorconfig.org
+
+root = true
+
+[*]
+end_of_line = lf
+charset = utf-8
+trim_trailing_whitespace = true
+insert_final_newline = true
+indent_style = space
+indent_size = 4
+
+[*.js]
+indent_style = tab
+indent_size = 4
+
+[*.json]
+indent_style = tab
+indent_size = 4
+
+[*.hbs]
+indent_style = space
+indent_size = 4
+
+[*.css]
+indent_style = tab
+indent_size = 4
+
+[*.html]
+indent_style = tab
+indent_size = 4
+
+[*.{diff,md}]
+trim_trailing_whitespace = false
--- a/README.md
+++ b/README.md
@ -29,7 +29,7 @@ then install requirements:
    cd Bugzilla-ETL
    pip install -r requirements.txt

-**WARNING: ```pip install Bugzilla-ETL``` does not work** - I have been unable to get Pip to install resource files consistently across platforms and Python versions.
+**WARNING: `pip install Bugzilla-ETL` does not work** - I have been unable to get Pip to install resource files consistently across platforms and Python versions.

 Installation with PyPy
 ----------------------
@ -49,45 +49,45 @@ Despite my Windows example, the equivalent must be done in Linux.
 Setup
 -----

-You must prepare a ```settings.json``` file to reference the resources,
+You must prepare a `settings.json` file to reference the resources,
 and its filename must be provided as an argument in the command line.
 Examples of settings files can be found in [resources/settings](resources/settings)

 ### Inter-Run State ###

 Bugzilla-ETL keeps local run state in the form of two files:
-```first_run_time``` and ```last_run_time```.  These are both parameters
-in the ``settings.json``` file.
+`first_run_time` and `last_run_time`.  These are both parameters
+in the `settings.json` file.

-  * ```first_run_time``` is written only if it does not exist, and triggers a full ETL refresh.  Delete this file if you want to create a new ES index and start ETL from the beginning.
-  * ```last_run_time``` is recorded whenever there has been a successful ETL.  This file will not exist until the initial full ETL has completed successfully.  Deleteing this file should have no net effect, other than making the program work harder then it should.
+  * `first_run_time` is written only if it does not exist, and triggers a full ETL refresh.  Delete this file if you want to create a new ES index and start ETL from the beginning.
+  * `last_run_time` is recorded whenever there has been a successful ETL.  This file will not exist until the initial full ETL has completed successfully.  Deleting this file should have no net effect, other than making the program work harder than it should.

 ### Alias Analysis ###

 You will require an alias file that matches the various email addresses that users have over time.  This analysis is necessary for proper CC list history and patch review history.  [More on alias analysis](https://wiki.mozilla.org/Auto-tools/Projects/PublicES#Alias_Analysis).

-  * Make an ```alias_analysis_settings.json``` file.  Which can be the same main ETL settings.json file.
-  * The ```param.alias_file.key``` can be ```null```, or set to a AES256 key of your choice.
+  * Make an `alias_analysis_settings.json` file, which can be the same as the main ETL `settings.json` file.
+  * The `param.alias_file.key` can be `null`, or set to an AES256 key of your choice.
  * Run [alias_analysis.py](https://github.com/klahnakoski/Bugzilla-ETL/blob/master/resources/scripts/alias_analysis.bat)


 Running bz_etl.py
 ------------------

-Asuming your ```settings.json``` file is in ```~/Bugzilla_ETL```:
+Assuming your `settings.json` file is in `~/Bugzilla_ETL`:

    cd ~/Bugzilla_ETL

    pypy bzETL\bz_etl.py --settings=settings.json

-Use ```--help``` for more options, and see [example command line script](resources/scripts/bz_etl.bat)
+Use `--help` for more options, and see [example command line script](resources/scripts/bz_etl.bat)

 Got it working?
 ---------------

 The initial ETL will take over two hours.  If you want something
-quicker to confirm your configuration is correct, use ```--reset
--quick``` arguments on the command line.   This will limit ETL
+quicker to confirm your configuration is correct, use `--reset
+--quick` arguments on the command line.   This will limit ETL
 to the first 1000, and last 1000 bugs.

    cd ~/Bugzilla_ETL
@ -97,7 +97,7 @@ Using Cron
 ----------

 Bugzilla-ETL is meant to be triggered by cron; usually every 10 minutes.
-Bugzilla-ETL limits itself to only one instance *per ```settings.json```
+Bugzilla-ETL limits itself to only one instance *per `settings.json`
 file*:  That way, if more than one instance is accidentally run, the
 subsequent instances will do no work and shut down cleanly.

@ -109,8 +109,8 @@ The Git clone will include test code.  You can run those tests, but you must...
  * Have MySQL installed (no Bugzilla schema required)
  * Have timezone database installed ([instructions](./tests/resources/mySQL/README.md))
  * Have an ElasticSearch (v 0.90) cluster to hold the test results
-  * A complete ```test_settings.json``` file to point to the resources ([example](./resources/settings/test_settings.json))
-  * Use pypy for 4x the speed: ```pypy .\tests\test_etl.py --settings=test_settings.json```
+  * A complete `test_settings.json` file to point to the resources ([example](./resources/settings/test_settings.json))
+  * Use pypy for 4x the speed: `pypy .\tests\test_etl.py --settings=test_settings.json`

 Upgrades
 --------
@ -124,8 +124,8 @@ There may be enhancements from time to time.  To get them
 After upgrading the code, you may want to trigger a full ETL.  To do this,
 you may either

-1.  run ```bz_etl.py``` with the ```--reset--``` flag directly, or
-2.  remove the ```first_run_time``` file (and the next cron event will tigger a full ETL)
+1.  run `bz_etl.py` with the `--reset` flag directly, or
+2.  remove the `first_run_time` file (and the next cron event will trigger a full ETL)


 More on ElasticSearch
@ -133,5 +133,5 @@ More on ElasticSearch

 If you are new to ElasticSearch, I recommend using [ElasticSearch Head](https://github.com/mobz/elasticsearch-head)
 for getting cluster status, current schema definitions, viewing individual
-records, and more.  Clone it off of GitHub, and open the ```index.html``` file
+records, and more.  Clone it off of GitHub, and open the `index.html` file
 in your browser.  Here are some alternate [instructions](http://mobz.github.io/elasticsearch-head/).
--- a/tests/resources/config/test_settings.json
+++ b/tests/resources/config/test_settings.json
@ -0,0 +1,164 @@
+{
+	"production_es": {
+		"description": "pointer to es with known good results",
+		"host": "http://elasticsearch7.metrics.scl3.mozilla.com",
+		"port": "9200",
+		"index": "bugs",
+		"type": "bug_version",
+		"debug": true
+	},
+	"public_bugs_reference": {
+		"description": "pointer to es with known good *public* results",
+		"filename": "./tests/resources/public_bugs_reference_es.json"
+	},
+	"public_comments_reference": {
+		"description": "pointer to es with known good public comments",
+		"filename": "./tests/resources/public_comments_reference_es.json"
+	},
+	"private_bugs_reference": {
+		"description": "pointer to es with known good results",
+		"filename": "./tests/resources/private_bugs_reference_es.json"
+	},
+	"private_comments_reference": {
+		"description": "pointer to es with known good private comments",
+		"filename": "./tests/resources/private_comments_reference_es.json"
+	},
+	"candidate": {
+		"description": "pointer to es with test results",
+		"filename": "./tests/results/test_results.json",
+		"host": "http://localhost",
+		"port": "9200",
+		"index": "test_bugs",
+		"type": "bug_version"
+	},
+	"fake": {
+		//FOR TESTING JSON CREATION, NO NEED FOR REAL ES
+		"bugs": {
+			"filename": "./tests/results/test_bugs.json"
+		},
+		"comments": {
+			"filename": "./tests/results/test_comments.json"
+		}
+	},
+	"real": {
+		//FOR TESTING INCREMENTAL ETL (AND GENERAL INTERACTION WITH A REAL ES)
+		"bugs": {
+			"host": "http://localhost",
+			"port": "9200",
+			"index": "test_bugs",
+			"type": "bug_version",
+			"schema_file": "./resources/json/bug_version.json",
+			"debug": true
+		},
+		"comments": {
+			"host": "http://localhost",
+			"port": "9200",
+			"index": "test_comments",
+			"type": "bug_version",
+			"schema_file": "./resources/json/bug_comments.json",
+			"debug": true
+		}
+	},
+	"param": {
+		"increment": 10000,
+		"bugs": [
+			384,
+			1108,
+			1045,
+			1046,
+			1157,
+			1877,
+			1865,
+			1869,
+			2586,
+			3140,
+			6810,
+			9622,
+			10575,
+			11040,
+			12911,
+			67742,
+			96421,
+			123203,
+			178960,
+			367518,
+			457765,
+			458397,
+			471427,
+			544327,
+			547727,
+			643420,
+			692436,
+			726635,
+			813650
+			// 1165765 VERY LONG short_desc
+			// 1007019 does not have bug_status, or component, or product
+			// 372836 (REVIEW FLAGS TEST)
+			// 13534 (REVIEW MOVES TO OTHER PERSON)
+			// 393845  added blocking1.9+ twice
+			// 671185 *many* review requests
+			// 937428 whitespace after comma in user story, complex diff
+			// 248970 another cutoff review request
+		],
+		"alias_increment": 1000000,
+		"alias_file": {
+			"path": "./resources/json/bugzilla_aliases.json",
+			"comment": "key is only meant to keep the aliases out of clear text.  Aliases are public as per https://www.mozilla.org/en-US/privacy/policies/websites/",
+			"key": "Some+SHA512+key++++++++++++++++++++++++++++="
+		},
+		"temp_dir": "./tests/resources",
+		"errors": "./tests/results/errors",
+		"allow_private_bugs": true,
+		"last_run_time": "./tests/results/last_run_time.txt",
+		"first_run_time": "./tests/results/first_run_time.txt",
+		"look_back": 3600000 // 1 hour
+	},
+	"bugzilla": {
+		"filename": "./tests/resources/sql/small_bugzilla.sql",
+		"preamble": "from https://github.com/klahnakoski/Bugzilla-ETL",
+		"host": "localhost",
+		"port": 3306,
+		"schema": "test_bugzilla",
+		"expires_on": 1372867005000,
+		"debug": false,
+		"username": "username",
+		"password": "password"
+	},
+	"bugzilla2": {
+		"host": "klahnakoski-es.corp.tor1.mozilla.com",
+		"port": 3306,
+		"schema": "bugzilla2",
+		"expires_on": 1372867005000,
+		"debug": true,
+		"username": "username",
+		"password": "password"
+	},
+	"constants": {
+		"pyLibrary.env.http.default_headers": {
+			"Referer": "https://wiki.mozilla.org/BMO/ElasticSearch"
+		}
+	},
+	"debug": {
+		"profile": false,
+		"trace": false,
+		"log": [
+			{
+				"class": "logging.handlers.RotatingFileHandler",
+				"filename": "./tests/results/logs/test_etl.log",
+				"maxBytes": 10000000,
+				"backupCount": 200,
+				"encoding": "utf8"
+			},
+			{
+				"log_type": "stream",
+				"stream": "sys.stdout"
+			},
+			{
+				"log_type": "elasticsearch",
+				"host": "http://localhost",
+				"index": "debug",
+				"type": "bz_etl"
+			}
+		]
+	}
+}
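
Note on the `//` comments in `test_settings.json`: they are not standard JSON, so Python's built-in `json` module will reject the file as written. The project's own settings loader presumably tolerates them, but that is not shown in this commit. The following is a minimal standalone sketch, not the project's implementation, of one way to strip such comments before parsing. The helper names `strip_line_comments` and `load_settings` are hypothetical, and the sample text is a cut-down imitation of the file above. The stripper tracks whether it is inside a string literal so that values such as `"http://localhost"` are left intact.

```python
import json


def strip_line_comments(text):
    """Remove // comments that appear outside of JSON string literals."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False          # character after a backslash is literal
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False        # closing quote
            i += 1
        elif ch == '"':
            in_string = True             # opening quote
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line; the newline itself is kept.
            while i < len(text) and text[i] != "\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def load_settings(text):
    """Parse settings text that may contain // comments (hypothetical helper)."""
    return json.loads(strip_line_comments(text))


# Cut-down imitation of test_settings.json, for illustration only.
sample = '''
{
    "candidate": {
        "host": "http://localhost",  // the URL must survive stripping
        "port": "9200"
    },
    "param": {
        "bugs": [
            384,
            1108
            // 1165765 VERY LONG short_desc
        ],
        "look_back": 3600000 //1hour
    }
}
'''

settings = load_settings(sample)
print(settings["candidate"]["host"])
print(settings["param"]["bugs"])
```

Only `//` line comments are handled here, since those are the only kind that appear in the file; block comments would need extra handling.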