d1ana 2012-03-29 11:31:47 -04:00
Parent 9fc84de183
Commit 06e23b2c9c
1 changed file with 9 additions and 9 deletions


@@ -77,17 +77,17 @@ the next.
 The input step of an Inferno map/reduce job is responsible for parsing and
 readying the input data for the map step.
-The default Inferno input reader is **chunk_csv_keyset_stream**, meaning
-that it can automatically stream lines from a CSV file that is placed in
-DDFS using the ``ddfs chunk`` command.
-If you're using Inferno's built in keyset map/reduce functionality, this
-step mostly amounts to transforming your CSV or JSON input into python
-dictionaries.
-If the input is lines of JSON, you could set the **map_input_stream** to
-use the **chunk_json_keyset_stream** reader instead.
+The default Inferno input reader is **chunk_csv_keyset_stream**, which is
+intended for CSV data that was placed in DDFS using the ``ddfs chunk``
+command.
+Either way, if you're using Inferno's built in keyset map/reduce
+functionality, this step mostly amounts to transforming your CSV/JSON
+input, etc into python dictionaries for further processing in the
+**map** step.
+If the input data is lines of JSON, you would set the
+**map_input_stream** to use the **chunk_json_keyset_stream** reader
+instead.
 Example data transition during the **input** step::
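
For context, the reader swap described in this hunk is a one-line change on
an Inferno rule. The sketch below is illustrative and not part of this
commit: it assumes ``InfernoRule`` and ``chunk_json_keyset_stream`` are
importable from ``inferno.lib.rule`` as in Inferno's README, and the rule
name, source tag, and keyset parts are hypothetical placeholders::

    # Illustrative sketch, not code from this commit. Assumes Inferno's
    # documented rule interface; the rule name, DDFS tag, and key/value
    # parts below are placeholders.
    from inferno.lib.rule import InfernoRule, chunk_json_keyset_stream

    def count(parts, params):
        # Tag every incoming dictionary with a count of 1 so the reduce
        # step can sum occurrences per key.
        parts['count'] = 1
        yield parts

    RULES = [
        InfernoRule(
            name='count_last_names',              # hypothetical rule name
            source_tags=['example:chunk:users'],  # hypothetical DDFS tag
            # The input is lines of JSON, so swap in the JSON reader in
            # place of the default chunk_csv_keyset_stream.
            map_input_stream=chunk_json_keyset_stream,
            parts_preprocess=[count],
            key_parts=['last_name'],
            value_parts=['count'],
        ),
    ]

The JSON input itself would first be pushed into DDFS with something like
``ddfs chunk example:chunk:users ./users.json``, where the tag and file
path are again placeholders.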