mirror of https://github.com/mozilla/inferno.git
Parent d7b62b97b3
Commit 9c9dceb5c9
@@ -11,29 +11,29 @@ Input
 Here's our input data::
 
 diana@ubuntu:~$ cat data.txt
-{"first_name":"Homer", "last_name":"Simpson"}
-{"first_name":"Manjula", "last_name":"Nahasapeemapetilon"}
-{"first_name":"Herbert", "last_name":"Powell"}
-{"first_name":"Ruth", "last_name":"Powell"}
-{"first_name":"Bart", "last_name":"Simpson"}
-{"first_name":"Apu", "last_name":"Nahasapeemapetilon"}
-{"first_name":"Marge", "last_name":"Simpson"}
-{"first_name":"Janey", "last_name":"Powell"}
-{"first_name":"Maggie", "last_name":"Simpson"}
-{"first_name":"Sanjay", "last_name":"Nahasapeemapetilon"}
-{"first_name":"Lisa", "last_name":"Simpson"}
-{"first_name":"Maggie", "last_name":"Términos"}
+{"first":"Homer", "last":"Simpson"}
+{"first":"Manjula", "last":"Nahasapeemapetilon"}
+{"first":"Herbert", "last":"Powell"}
+{"first":"Ruth", "last":"Powell"}
+{"first":"Bart", "last":"Simpson"}
+{"first":"Apu", "last":"Nahasapeemapetilon"}
+{"first":"Marge", "last":"Simpson"}
+{"first":"Janey", "last":"Powell"}
+{"first":"Maggie", "last":"Simpson"}
+{"first":"Sanjay", "last":"Nahasapeemapetilon"}
+{"first":"Lisa", "last":"Simpson"}
+{"first":"Maggie", "last":"Términos"}
 
 DDFS
 ----
 
 The first step is to place this file in
 `Disco's Distributed Filesystem <http://discoproject.org/doc/howto/ddfs.html>`_ (DDFS).
-Once placed in DDFS, this file is referred to by Disco as a **blob**.
+Once placed in DDFS, a file is referred to by Disco as a **blob**.
 DDFS is a tag-based filesystem. Instead of organizing files into directories,
 you **tag** a collection of blobs with a **tag_name** for lookup later.
 
-I this case, we'll be tagging our data file as **example:chunk:users**.
+In this case, we'll be tagging our data file as **example:chunk:users**.
 
 .. image:: tag_blobs.png
 :height: 300px
@@ -54,8 +54,8 @@ Toss the input data into DDFS::
 Verify that the data is in ddfs::
 
 diana@ubuntu:~$ ddfs xcat example:chunk:users | head -2
-{"first_name":"Homer", "last_name":"Simpson"}
-{"first_name":"Manjula", "last_name":"Nahasapeemapetilon"}
+{"first":"Homer", "last":"Simpson"}
+{"first":"Manjula", "last":"Nahasapeemapetilon"}
 
 Inferno Rule
 ------------
@@ -145,7 +145,7 @@ The inferno map/reduce rule (inferno/example_rules/names.py)::
 source_tags=['example:chunk:users'],
 map_input_stream=chunk_json_keyset_stream,
 parts_preprocess=[count],
-key_parts=['last_name'],
+key_parts=['last'],
 value_parts=['count'],
 ),
 ]
@@ -163,7 +163,7 @@ Run the last name counting map/reduce job::
 
 The output::
 
-last_name,count
+last,count
 Nahasapeemapetilon,3
 Powell,3
 Simpson,5

@@ -13,22 +13,23 @@ valid JSON, etc.
 **people.json**
 ::
 
-{"first_name":"Homer", "last_name":"Simpson"}
-{"first_name":"Manjula", "last_name":"Nahasapeemapetilon"}
-{"first_name":"Herbert", "last_name":"Powell"}
-{"first_name":"Ruth", "last_name":"Powell"}
-{"first_name":"Bart", "last_name":"Simpson"}
-{"first_name":"Apu", "last_name":"Nahasapeemapetilon"}
-{"first_name":"Marge", "last_name":"Simpson"}
-{"first_name":"Janey", "last_name":"Powell"}
-{"first_name":"Maggie", "last_name":"Simpson"}
-{"first_name":"Sanjay", "last_name":"Nahasapeemapetilon"}
-{"first_name":"Lisa", "last_name":"Simpson"}
-{"first_name":"Maggie", "last_name":"Términos"}
+{"first":"Homer", "last":"Simpson"}
+{"first":"Manjula", "last":"Nahasapeemapetilon"}
+{"first":"Herbert", "last":"Powell"}
+{"first":"Ruth", "last":"Powell"}
+{"first":"Bart", "last":"Simpson"}
+{"first":"Apu", "last":"Nahasapeemapetilon"}
+{"first":"Marge", "last":"Simpson"}
+{"first":"Janey", "last":"Powell"}
+{"first":"Maggie", "last":"Simpson"}
+{"first":"Sanjay", "last":"Nahasapeemapetilon"}
+{"first":"Lisa", "last":"Simpson"}
+{"first":"Maggie", "last":"Términos"}
 
 **people.csv**
 ::
 
+first,last
 Homer,Simpson
 Manjula,Nahasapeemapetilon
 Herbert,Powell
@@ -81,7 +82,7 @@ Here's what a similar query using Inferno would look like:
 name='last_names_csv',
 source_tags=['example:chunk:users'],
 parts_preprocess=[count],
-key_parts=['last_name'],
+key_parts=['last'],
 value_parts=['count'],
 )
 
@@ -89,7 +90,7 @@ Here's what a similar query using Inferno would look like:
 
 diana@ubuntu:~$ inferno -i names.last_names_csv
 
-last_name,count
+last,count
 Nahasapeemapetilon,3
 Powell,3
 Simpson,5
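
To illustrate what the renamed ``last`` key means for the example job, here is a minimal sketch in plain Python that groups a few of the sample records by ``last`` and emits a ``last,count`` CSV. It uses only the standard library and does not call Inferno or Disco. The helper names ``count_last_names`` and ``to_csv`` are hypothetical, and the records are a five-line subset of the input shown in the diff, so the counts differ from the documented job output.

```python
import json
from collections import Counter

# A five-record subset of the example input, using the renamed keys.
SAMPLE_LINES = [
    '{"first":"Homer", "last":"Simpson"}',
    '{"first":"Manjula", "last":"Nahasapeemapetilon"}',
    '{"first":"Herbert", "last":"Powell"}',
    '{"first":"Ruth", "last":"Powell"}',
    '{"first":"Bart", "last":"Simpson"}',
]


def count_last_names(lines, key="last"):
    """Count records per value of `key` (hypothetical helper, not Inferno API)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        counts[record[key]] += 1
    return counts


def to_csv(counts, key="last"):
    """Render the counts as CSV text with a `<key>,count` header, sorted by key."""
    rows = ["%s,count" % key]
    rows.extend("%s,%d" % (name, n) for name, n in sorted(counts.items()))
    return "\n".join(rows)


if __name__ == "__main__":
    print(to_csv(count_last_names(SAMPLE_LINES)))
```

Passing ``key="last_name"`` against these records would raise ``KeyError``, which is why the rule's ``key_parts`` and the data had to be renamed together in this commit.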