Mirror of https://github.com/mozilla/inferno.git

This commit is contained in:
Parent: 6707f2c745
Commit: d935982359
@@ -1,7 +1,7 @@
 Example 1 - Count Last Names
 ============================
 
-The canonical map/reduce example: count the occurrences of words in a
+The canonical map/reduce example: **count** the occurrences of words in a
 document. In this case, we'll count the occurrences of last names in a data
 file containing lines of json.
 
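The counting this example performs can be sketched in plain Python. This is an illustrative stand-in, not Inferno's implementation; the sample records and the ``last`` field name are assumptions based on the keyset shown later in the document.

```python
import json
from collections import Counter

def count_last_names(lines):
    """Count occurrences of each last name in an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        record = json.loads(line)
        counts[record['last']] += 1
    return counts

# Hypothetical sample data shaped like the users file described above.
data = [
    '{"first": "William", "last": "Harvey"}',
    '{"first": "Anne", "last": "Harvey"}',
    '{"first": "Alice", "last": "Smith"}',
]
print(count_last_names(data))  # Counter({'Harvey': 2, 'Smith': 1})
```

Inferno distributes the same group-and-count across a Disco cluster instead of a single process.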
@@ -41,7 +41,7 @@ In this case, we'll be tagging our data file as **example:chunk:users**.
    :scale: 75 %
    :alt: tag_name -> [blob1, blob2, blob3]
 
-Make sure `disco <http://discoproject.org/>`_ is running::
+Make sure `Disco <http://discoproject.org/>`_ is running::
 
     diana@ubuntu:~$ disco start
     Master ubuntu:8989 started
@@ -77,9 +77,9 @@ the next.
 The input step of an Inferno map/reduce job is responsible for parsing and
 readying the input data for the map step.
 
-If you're using Inferno's built in keyset map/reduce functionality, this
-step mostly amounts to transforming your CSV or JSON input into python
-dictionaries.
+If you're using Inferno's built in **keyset** map/reduce functionality,
+this step mostly amounts to transforming your CSV or JSON input into
+python dictionaries.
 
 The default Inferno input reader is **chunk_csv_keyset_stream**, which is
 intended for CSV data that was placed in DDFS using the ``ddfs chunk``
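The transformation this step performs can be sketched as follows. This is a simplified stand-in for the **chunk_json_keyset_stream** reader, not its actual implementation; skipping blank and unparseable lines is an assumption made here for illustration.

```python
import json

def json_keyset_stream(stream):
    """Yield one python dictionary per line of JSON input,
    skipping blank or unparseable lines (a simplifying assumption)."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except ValueError:
            continue

lines = ['{"first": "William", "last": "Harvey"}', '', 'not json']
print(list(json_keyset_stream(lines)))
# [{'first': 'William', 'last': 'Harvey'}]
```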
@@ -89,6 +89,18 @@ the next.
 **map_input_stream** to use the **chunk_json_keyset_stream** reader in
 your Inferno rule instead.
 
+.. code-block:: python
+    :emphasize-lines: 3,4
+
+    InfernoRule(
+        name='last_names_json',
+        source_tags=['example:chunk:users'],
+        map_input_stream=chunk_json_keyset_stream,
+        parts_preprocess=[count],
+        key_parts=['last'],
+        value_parts=['count'],
+    )
+
 Example data transition during the **input** step:
 
 .. image:: input.png
@@ -109,6 +121,18 @@ the next.
 relevant key and value parts by declaring **key_parts** and **value_parts**
 in your Inferno rule.
 
+.. code-block:: python
+    :emphasize-lines: 6,7
+
+    InfernoRule(
+        name='last_names_json',
+        source_tags=['example:chunk:users'],
+        map_input_stream=chunk_json_keyset_stream,
+        parts_preprocess=[count],
+        key_parts=['last'],
+        value_parts=['count'],
+    )
+
 Example data transition during the **map** step:
 
 .. image:: map.png
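Conceptually, the map step turns each input dictionary into a pair of key parts and value parts according to the rule's **key_parts** and **value_parts** declarations. A minimal sketch of that idea (not Inferno's actual keyset map; the record shown is hypothetical, assuming a preprocess step has already added ``count=1``):

```python
def keyset_map(record, key_parts, value_parts):
    """Build a (keys, values) pair from the declared parts of one record."""
    keys = [record[k] for k in key_parts]
    values = [record[v] for v in value_parts]
    return keys, values

# Hypothetical record after a parts_preprocess step has set count=1.
record = {'first': 'William', 'last': 'Harvey', 'count': 1}
print(keyset_map(record, key_parts=['last'], value_parts=['count']))
# (['Harvey'], [1])
```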
@@ -126,7 +150,7 @@ the next.
 Inferno's default **reduce_function** is the **keyset_reduce**. It will sum
 the value parts yielded by the map step, grouped by the key parts.
 
-In this example, we're only summing one value: the ``count``. You can
+In this example, we're only summing one value (the ``count``). You can
 define and sum many value parts, as you'll see :doc:`here </election>` in
 the next example.
 
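The summing behaviour described here can be sketched as a group-and-sum over the map output. This illustrates the idea only; it is not the **keyset_reduce** implementation, and the input pairs are hypothetical map-step output.

```python
from collections import defaultdict

def sum_by_key(mapped_pairs):
    """Sum value parts element-wise, grouped by key parts."""
    totals = defaultdict(lambda: None)
    for keys, values in mapped_pairs:
        key = tuple(keys)
        if totals[key] is None:
            totals[key] = list(values)
        else:
            # Element-wise sum so multiple value parts are all accumulated.
            totals[key] = [a + b for a, b in zip(totals[key], values)]
    return dict(totals)

pairs = [(['Harvey'], [1]), (['Smith'], [1]), (['Harvey'], [1])]
print(sum_by_key(pairs))  # {('Harvey',): [2], ('Smith',): [1]}
```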
@@ -159,10 +183,10 @@ the next.
    :scale: 60 %
    :alt: reduce -> output
 
-Example Rule
+Inferno Rule
 ------------
 
-The inferno map/reduce rule (inferno/example_rules/names.py)::
+The Inferno map/reduce rule (``inferno/example_rules/names.py``)::
 
     from inferno.lib.rule import chunk_json_keyset_stream
     from inferno.lib.rule import InfernoRule
@@ -4,14 +4,14 @@ Example 2 - Campaign Finance
 Rule
 ----
 
-The inferno map/reduce rule (inferno/example_rules/election.py):
+The Inferno map/reduce rule (``inferno/example_rules/election.py``):
 
 .. literalinclude:: ../inferno/example_rules/election.py
 
 Input
 -----
 
-Make sure `disco <http://discoproject.org/>`_ is running::
+Make sure `Disco <http://discoproject.org/>`_ is running::
 
     diana@ubuntu:~$ disco start
     Master ubuntu:8989 started
@@ -29,7 +29,7 @@ Place the input data in `disco's distributed filesystem <http://discoproject.org
     diana@ubuntu:~$ ddfs chunk gov:chunk:presidential_campaign_finance:2012-03-19 ./P00000001-ALL.txt
     created: disco://localhost/ddfs/vol0/blob/1c/P00000001-ALL_txt-0$533-86a6d-ec842
 
-Verify that the data is in DDFS::
+Verify that the data is in DDFS'::
 
     diana@ubuntu:~$ ddfs xcat gov:chunk:presidential_campaign_finance:2012-03-19 | head -3
     C00410118,"P20002978","Bachmann, Michelle","HARVEY, WILLIAM","MOBILE","AL","366010290","RETIRED","RETIRED",250...