d1ana 2012-03-29 12:44:40 -04:00
Parent ad51a71ba6
Commit 5e7165803d
2 changed files with 16 additions and 6 deletions

@@ -46,12 +46,12 @@ Make sure `disco <http://discoproject.org/>`_ is running::
diana@ubuntu:~$ disco start
Master ubuntu:8989 started
-Toss the input data into DDFS::
+Place the input data in DDFS::
diana@ubuntu:~$ ddfs chunk example:chunk:users ./data.txt
created: disco://localhost/ddfs/vol0/blob/99/data_txt-0$533-406a9-e50
-Verify that the data is in ddfs::
+Verify that the data is in DDFS::
diana@ubuntu:~$ ddfs xcat example:chunk:users | head -2
{"first":"Homer", "last":"Simpson"}
@@ -120,7 +120,17 @@ the next.
**Reduce**
-Example data transition during the **reduce** step:
+The reduce step of an Inferno map/reduce job is responsible for summarizing
+the results of your map/reduce query.
+Inferno's default **reduce_function** is the **keyset_reduce**. It will sum
+the value parts yielded by the map step, grouped by the key parts.
+In this example, we're only summing one value: the ``count``. You can
+define and sum many value parts, as you'll see :doc:`here </election>` in
+the next example.
+Example data transition during the **reduce** step:
.. image:: reduce.png
:height: 600px
@@ -135,7 +145,7 @@ the next.
defaults to the **keyset_result** processor which simply uses a CSV writer
to print the results from the reduce step to standard out.
-Other common ``result_processor`` use cases include: populating a cache,
+Other common result processor use cases include: populating a cache,
persisting to a database, writing back to
`DDFS <http://discoproject.org/doc/howto/ddfs.html>`_ or
`DiscoDB <http://discoproject.org/doc/contrib/discodb/discodb.html>`_, etc.
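The reduce behaviour described in the added text above (sum the value parts, grouped by the key parts) can be sketched in plain Python. This is only an illustration of the idea, not Inferno's ``keyset_reduce`` implementation: the function name ``sum_by_key``, its parameters, and the sample records (including the "Flanders" row) are hypothetical.

```python
from collections import defaultdict


def sum_by_key(records, key_parts, value_parts):
    """Sum each value part across records that share the same key parts.

    Hypothetical helper for illustration; it is not part of Inferno.
    """
    totals = defaultdict(lambda: [0] * len(value_parts))
    for record in records:
        # The key is the tuple of the record's key-part values.
        key = tuple(record[name] for name in key_parts)
        for i, name in enumerate(value_parts):
            totals[key][i] += record[name]
    # Convert the running sums back into {value_part: total} dicts.
    return {key: dict(zip(value_parts, sums)) for key, sums in totals.items()}


# Made-up records in the shape a map step might yield: one key part
# ("last") and one value part ("count").
mapped = [
    {"last": "Simpson", "count": 1},
    {"last": "Simpson", "count": 1},
    {"last": "Flanders", "count": 1},
]

result = sum_by_key(mapped, key_parts=["last"], value_parts=["count"])
```

Passing more names in ``value_parts`` sums several values per key in one pass, which mirrors the "many value parts" case the added text mentions.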


@@ -24,12 +24,12 @@ The 2012 presidential campaign finance data (from the `FEC <http://www.fec.gov/d
C00410118,"P20002978","Bachmann, Michelle","HARVEY, WILLIAM","MOBILE","AL","366010290","RETIRED","RETIRED",50...
C00410118,"P20002978","Bachmann, Michelle","BLEVINS, DARONDA","PIGGOTT","AR","724548253","NONE","RETIRED",250...
-Toss the input data into `disco's distributed filesystem <http://discoproject.org/doc/howto/ddfs.html>`_ (ddfs)::
+Place the input data in `disco's distributed filesystem <http://discoproject.org/doc/howto/ddfs.html>`_ (DDFS)::
diana@ubuntu:~$ ddfs chunk gov:chunk:presidential_campaign_finance:2012-03-19 ./P00000001-ALL.txt
created: disco://localhost/ddfs/vol0/blob/1c/P00000001-ALL_txt-0$533-86a6d-ec842
-Verify that the data is in ddfs::
+Verify that the data is in DDFS::
diana@ubuntu:~$ ddfs xcat gov:chunk:presidential_campaign_finance:2012-03-19 | head -3
C00410118,"P20002978","Bachmann, Michelle","HARVEY, WILLIAM","MOBILE","AL","366010290","RETIRED","RETIRED",250...