azure-data-lake-store-python

Microsoft Azure Data Lake Store Filesystem Library for Python

Перейти к файлу

chdevala 122936d574 Update History.rst releasing version 0.0.17		2017-09-21 17:51:01 -07:00
.github	Address PR feedback.	2017-07-06 14:00:13 -07:00
azure	Merge pull request #197 from chdevala/master	2017-09-21 17:47:40 -07:00
docs	Bug fixes	2017-01-17 15:40:55 -08:00
samples	Merge from dev.	2017-05-30 19:17:21 -07:00
tests	Merge pull request #197 from chdevala/master	2017-09-21 17:47:40 -07:00
.coveragerc	Merge First release into master.	2016-11-21 15:34:40 -08:00
.gitignore	added .idea directory to gitignore	2017-07-27 13:50:08 -07:00
.travis.yml	fix chmod	2017-09-05 18:13:26 -07:00
GUIDANCE.md	Revising README.rst	2017-08-01 18:46:40 -07:00
HISTORY.rst	Update History.rst	2017-09-21 17:51:01 -07:00
License.txt	Merge First release into master.	2016-11-21 15:34:40 -08:00
MANIFEST.in	New packaging system	2017-04-19 13:30:02 -07:00
Milestones.md	Merge First release into master.	2016-11-21 15:34:40 -08:00
README.rst	Merge pull request #196 from chdevala/master	2017-09-21 17:17:54 -07:00
appveyor.yml	Enable auto pypi deployment	2017-01-18 12:34:50 -08:00
azure-data-lake-store-python.pyproj	Merge Version 0.0.5 from dev	2017-03-03 13:09:26 -08:00
azure-data-lake-store-python.pyproj.user	Update to version 0.0.6	2017-03-22 14:07:59 -07:00
azure-data-lake-store-python.sln	Merge First release into master.	2016-11-21 15:34:40 -08:00
azure_bdist_wheel.py	New packaging system	2017-04-19 13:30:02 -07:00
dev_requirements.txt	Merge First release into master.	2016-11-21 15:34:40 -08:00
setup.cfg	New packaging system	2017-04-19 13:30:02 -07:00
setup.py	New packaging system	2017-04-19 13:30:02 -07:00

README.rst

Microsoft Azure Data Lake Store Filesystem Library for Python
=============================================================

.. image:: https://travis-ci.org/Azure/azure-data-lake-store-python.svg?branch=dev
    :target: https://travis-ci.org/Azure/azure-data-lake-store-python
.. image:: https://coveralls.io/repos/github/Azure/azure-data-lake-store-python/badge.svg?branch=master
    :target: https://coveralls.io/github/Azure/azure-data-lake-store-python?branch=master

This project is the Python filesystem library for Azure Data Lake Store.

INSTALLATION
============

To install from source instead of pip (for local testing and development):

.. code-block:: bash

    > pip install -r dev_requirements.txt
    > python setup.py develop

Usage: Sample Code
==================

To play with the code, here is a starting point:

.. code-block:: python

    from azure.datalake.store import core, lib, multithread
    token = lib.auth(tenant_id, username, password)
    adl = core.AzureDLFileSystem(token, store_name=store_name)

    # typical operations
    adl.ls('')
    adl.ls('tmp/', detail=True)
    adl.ls('tmp/', detail=True, invalidate_cache=True)
    adl.cat('littlefile')
    adl.head('gdelt20150827.csv')

    # file-like object
    with adl.open('gdelt20150827.csv', blocksize=2**20) as f:
        print(f.readline())
        print(f.readline())
        print(f.readline())
        # could have passed f to any function requiring a file object:
        # pandas.read_csv(f)

    with adl.open('anewfile', 'wb') as f:
        # data is written on flush/close, or when buffer is bigger than
        # blocksize
        f.write(b'important data')

    adl.du('anewfile')

    # recursively download the whole directory tree with 5 threads and
    # 16MB chunks
    multithread.ADLDownloader(adl, "", 'my_temp_dir', 5, 2**24)

Progress can be tracked using a callback function in the form `track(current, total)`
When passed, this will keep track of transferred bytes and be called on each complete chunk.

Here's an example using the Azure CLI progress controller as the `progress_callback`:

.. code-block:: python

    from cli.core.application import APPLICATION

    def _update_progress(current, total):
        hook = APPLICATION.get_progress_controller(det=True)
        hook.add(message='Alive', value=current, total_val=total)
        if total == current:
            hook.end()

    ...
    ADLUploader(client, destination_path, source_path, thread_count, overwrite=overwrite,
            chunksize=chunk_size,
            buffersize=buffer_size,
            blocksize=block_size,
            progress_callback=_update_progress)

This will output a progress bar to the stdout:

.. code-block:: bash

    Alive[#########################                                       ]  40.0881%
    
    Finished[#############################################################]  100.0000%

Usage: Command Line Sample
==========================

To interact with the API at a higher-level, you can use the provided
command-line interface in "samples/cli.py". You will need to set
the appropriate environment variables as described above to connect to the
Azure Data Lake Store. Below is a simple sample, with more details beyond.


.. code-block:: bash

    python samples\cli.py ls -l

Execute the program without arguments to access documentation.

To start the CLI in interactive mode, run "python samples/cli.py"
and then type "help" to see all available commands (similiar to Unix utilities):

.. code-block:: bash

    > python samples/cli.py
    azure> help

    Documented commands (type help <topic>):
    ========================================
    cat    chmod  close  du      get   help  ls     mv   quit  rmdir  touch
    chgrp  chown  df     exists  head  info  mkdir  put  rm    tail

    azure>


While still in interactive mode, you can run "ls -l" to list the entries in the
home directory ("help ls" will show the command's usage details). If you're not
familiar with the Unix/Linux "ls" command, the columns represent 1) permissions,
2) file owner, 3) file group, 4) file size, 5-7) file's modification time, and
8) file name.

.. code-block:: bash

    > python samples/cli.py
    azure> ls -l
    drwxrwx--- 0123abcd 0123abcd         0 Aug 02 12:44 azure1
    -rwxrwx--- 0123abcd 0123abcd   1048576 Jul 25 18:33 abc.csv
    -r-xr-xr-x 0123abcd 0123abcd        36 Jul 22 18:32 xyz.csv
    drwxrwx--- 0123abcd 0123abcd         0 Aug 03 13:46 tmp
    azure> ls -l --human-readable
    drwxrwx--- 0123abcd 0123abcd   0B Aug 02 12:44 azure1
    -rwxrwx--- 0123abcd 0123abcd   1M Jul 25 18:33 abc.csv
    -r-xr-xr-x 0123abcd 0123abcd  36B Jul 22 18:32 xyz.csv
    drwxrwx--- 0123abcd 0123abcd   0B Aug 03 13:46 tmp
    azure>


To download a remote file, run "get remote-file [local-file]". The second
argument, "local-file", is optional. If not provided, the local file will be
named after the remote file minus the directory path.

.. code-block:: bash

    > python samples/cli.py
    azure> ls -l
    drwxrwx--- 0123abcd 0123abcd         0 Aug 02 12:44 azure1
    -rwxrwx--- 0123abcd 0123abcd   1048576 Jul 25 18:33 abc.csv
    -r-xr-xr-x 0123abcd 0123abcd        36 Jul 22 18:32 xyz.csv
    drwxrwx--- 0123abcd 0123abcd         0 Aug 03 13:46 tmp
    azure> get xyz.csv
    2016-08-04 18:57:48,603 - ADLFS - DEBUG - Creating empty file xyz.csv
    2016-08-04 18:57:48,604 - ADLFS - DEBUG - Fetch: xyz.csv, 0-36
    2016-08-04 18:57:49,726 - ADLFS - DEBUG - Downloaded to xyz.csv, byte offset 0
    2016-08-04 18:57:49,734 - ADLFS - DEBUG - File downloaded (xyz.csv -> xyz.csv)
    azure>


It is also possible to run in command-line mode, allowing any available command
to be executed separately without remaining in the interpreter.

For example, listing the entries in the home directory:

.. code-block:: bash

    > python samples/cli.py ls -l
    drwxrwx--- 0123abcd 0123abcd         0 Aug 02 12:44 azure1
    -rwxrwx--- 0123abcd 0123abcd   1048576 Jul 25 18:33 abc.csv
    -r-xr-xr-x 0123abcd 0123abcd        36 Jul 22 18:32 xyz.csv
    drwxrwx--- 0123abcd 0123abcd         0 Aug 03 13:46 tmp
    >


Also, downloading a remote file:

.. code-block:: bash

    > python samples/cli.py get xyz.csv
    2016-08-04 18:57:48,603 - ADLFS - DEBUG - Creating empty file xyz.csv
    2016-08-04 18:57:48,604 - ADLFS - DEBUG - Fetch: xyz.csv, 0-36
    2016-08-04 18:57:49,726 - ADLFS - DEBUG - Downloaded to xyz.csv, byte offset 0
    2016-08-04 18:57:49,734 - ADLFS - DEBUG - File downloaded (xyz.csv -> xyz.csv)
    >

Tests
=====

For detailed documentation about our test framework, please visit the 
`tests folder <https://github.com/ro-joowan/azure-data-lake-store-python/tree/master/tests>`__.

Need Help?
==========

Be sure to check out the Microsoft Azure `Developer Forums on Stack Overflow <http://go.microsoft.com/fwlink/?LinkId=234489>`__
if you have trouble with the provided code. Most questions are tagged `azure and python <https://stackoverflow.com/questions/tagged/azure+python>`__.


Contribute Code or Provide Feedback
===================================

If you would like to become an active contributor to this project please
follow the instructions provided in `Microsoft Azure Projects Contribution Guidelines <http://azure.github.io/guidelines/>`__. 
Furthermore, check out `GUIDANCE.md <https://github.com/Azure/azure-data-lake-store-python/blob/master/GUIDANCE.md>`__ 
for specific information related to this project.

If you encounter any bugs with the library please file an issue in the
`Issues <https://github.com/Azure/azure-data-lake-store-python/issues>`__
section of the project.


Code of Conduct
===============
This project has adopted the `Microsoft Open Source Code of Conduct <https://opensource.microsoft.com/codeofconduct/>`__. 
For more information see the `Code of Conduct FAQ <https://opensource.microsoft.com/codeofconduct/faq/>`__ or contact 
`opencode@microsoft.com <mailto:opencode@microsoft.com>`__ with any additional questions or comments.