inferno/doc/election.rst

123 строки
5.0 KiB
ReStructuredText

Example 2 - Campaign Finance
============================
In this example we'll introduce a few more key Inferno concepts:
* Inferno rules with **multiple keysets**
* **field_transforms**: input data casting
* **parts_preprocess**: a pre-map hook
* **parts_postprocess**: a post-reduce hook
* **column_mappings**: rename columns post-reduce
Inferno Rule
------------
An Inferno rule to query the 2012 presidential campaign finance data.
(``inferno/example_rules/election.py``):
.. literalinclude:: ../inferno/example_rules/election.py
:emphasize-lines: 44-49, 58, 61, 66, 69
Input
-----
Make sure `Disco <http://discoproject.org/>`_ is running::
diana@ubuntu:~$ disco start
Master ubuntu:8989 started
Get the 2012 presidential campaign finance
`data <ftp://ftp.fec.gov/FEC/Presidential_Map/2012/P00000001/P00000001-ALL.zip>`_
(from the `FEC <http://www.fec.gov/disclosurep/PDownload.do>`_)::
diana@ubuntu:~$ head -4 P00000001-ALL.txt
cmte_id,cand_id,cand_nm,contbr_nm,contbr_city,contbr_st,contbr_zip,contbr_employer,contbr_occupation,contb_receipt_amt...
C00410118,"P20002978","Bachmann, Michelle","HARVEY, WILLIAM","MOBILE","AL","366010290","RETIRED","RETIRED",250...
C00410118,"P20002978","Bachmann, Michelle","HARVEY, WILLIAM","MOBILE","AL","366010290","RETIRED","RETIRED",50...
C00410118,"P20002978","Bachmann, Michelle","BLEVINS, DARONDA","PIGGOTT","AR","724548253","NONE","RETIRED",250...
Place the input data in `Disco's Distributed Filesystem <http://discoproject.org/doc/disco/howto/ddfs.html>`_ (DDFS)::
diana@ubuntu:~$ ddfs chunk gov:chunk:presidential_campaign_finance:2012-03-19 ./P00000001-ALL.txt
created: disco://localhost/ddfs/vol0/blob/1c/P00000001-ALL_txt-0$533-86a6d-ec842
Verify that the data is in DDFS::
diana@ubuntu:~$ ddfs xcat gov:chunk:presidential_campaign_finance:2012-03-19 | head -3
C00410118,"P20002978","Bachmann, Michelle","HARVEY, WILLIAM","MOBILE","AL","366010290","RETIRED","RETIRED",250...
C00410118,"P20002978","Bachmann, Michelle","HARVEY, WILLIAM","MOBILE","AL","366010290","RETIRED","RETIRED",50...
C00410118,"P20002978","Bachmann, Michelle","BLEVINS, DARONDA","PIGGOTT","AR","724548253","NONE","RETIRED",250...
Contributions by Candidate
--------------------------
Run the contributions by candidate job::
diana@ubuntu:~$ inferno -i election.presidential_2012.by_candidate
2012-03-19 Processing tags: ['gov:chunk:presidential_campaign_finance']
2012-03-19 Started job presidential_2012@533:87210:81a1b processing 1 blobs
2012-03-19 Done waiting for job presidential_2012@533:87210:81a1b
2012-03-19 Finished job presidential_2012@533:87210:81a1b
The output in CSV::
candidate,count,amount
gingrich newt,27740,9271750.98
obama barack,292400,81057578.81
paul ron,87697,15435762.37
romney mitt,58420,55427338.84
santorum rick,9382,3351439.54
The output as a table:
+----------------+---------+-----------------+
| Candidate | Count | Amount |
+================+=========+=================+
| Obama Barack | 292,400 | $ 81,057,578.81 |
+----------------+---------+-----------------+
| Romney Mitt | 58,420 | $ 55,427,338.84 |
+----------------+---------+-----------------+
| Paul Ron | 87,697 | $ 15,435,762.37 |
+----------------+---------+-----------------+
| Gingrich Newt | 27,740 | $ 9,271,750.98 |
+----------------+---------+-----------------+
| Santorum Rick | 9,382 | $ 3,351,439.54 |
+----------------+---------+-----------------+
Contributions by Occupation
---------------------------
Run the contributions by occupation job::
diana@ubuntu:~$ inferno -i election.presidential_2012.by_occupation > occupations.csv
2012-03-19 Processing tags: ['gov:chunk:presidential_campaign_finance']
2012-03-19 Started job presidential_2012@533:87782:c7c98 processing 1 blobs
2012-03-19 Done waiting for job presidential_2012@533:87782:c7c98
2012-03-19 Finished job presidential_2012@533:87782:c7c98
The output::
diana@ubuntu:~$ grep retired occupations.csv
retired,gingrich newt,8810,2279602.27
retired,obama barack,74465,15086766.92
retired,paul ron,9373,1800563.88
retired,romney mitt,12798,6483596.24
retired,santorum rick,1752,421952.98
The output as a table:
+------------+---------------+--------+-----------------+
| Occupation | Candidate | Count | Amount |
+============+===============+========+=================+
| Retired | Obama Barack | 74,465 | $ 15,086,766.92 |
+------------+---------------+--------+-----------------+
| Retired | Romney Mitt | 12,798 | $ 6,483,596.24 |
+------------+---------------+--------+-----------------+
| Retired | Gingrich Newt | 8,810 | $ 2,279,602.27 |
+------------+---------------+--------+-----------------+
| Retired | Paul Ron | 9,373 | $ 1,800,563.88 |
+------------+---------------+--------+-----------------+
| Retired | Santorum Rick | 1,752 | $ 421,952.98 |
+------------+---------------+--------+-----------------+