added additional credential provider

2019-07-29 12:55:13 +02:00 · 2019-07-29 12:55:13 +02:00 · d35c19732b
--- a/README.md
+++ b/README.md
@ -14,16 +14,16 @@ For more information see the [Code of Conduct FAQ](https://opensource.microsoft.
 contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

 # What is it?
-Workspace is here to manage all your connection string and datasets in a central location, a *resources.yaml* at the root of your git repository without any credentials. The provided library will enable you to retrieve corresponding credentials from multiple sources (e.g. Azure Key Vault, Amazon Key Management Service, Python KeyRing, environment variables, ...).
+_Workspace_ was created to manage all your authoring-time service connection strings and *dataset* references in a *resources.yaml*,
+usually located at the root of your git repository.
+The provided libraries aim to simplify access by automating authentication and integrating naturally with service-specific SDKs.
+Furthermore, we aim for tooling support (e.g. listing storage accounts in VS Code).

-Some motivation: considering real-life projects multiple personas (e.g. devs, data scientists, data engineers) are collaborating and sometimes roles overlap. Each persona has a set of tools that are tailored toward the role (e.g. VS Code to devs, AzureML Workspace/Azure Databricks for data scientists, ...). Today we use git to at least move source code artifiacts between them, but each toolset/environment has it's own notion of service and data connections (or more broadly resources). 
-And that's where this project comes in. We define a common location and semantic in a file assumed to be located in the root of your git repository called *resources.yaml*. One complication in the story are credentials, which we definitely don't want to put into our beloved git repository. 
-
-This project provides a set of tools in multiple languages (Python, JavaScript and C# to start with), which aims to offer parsing, credential and convenience support to the respective language users.
-
-Thus Python users will get easy access functions for data (e.g. from an Azure Storage Blob to a Pandas data frame) vs C# will get download support to enable unit test scenarios.
-
-As we go along we're actively working with toolset owners (e.g. VSCode extensions) to enable support for *resources.yaml*. 
+# Features
+* Simplify authentication
+* Enable resource sharing between notebooks and team members
+* Improve service-specific SDK discoverability
+* Organize resources using arbitrary hierarchies

 # How to get started
 Put your resources into *resources.yaml* (see sample below).
@ -53,6 +53,31 @@ var ws = new workspace.Workspace()
 ws['csv1'].download()
 ```

+# Examples
+## Data Science
+Larger projects require multiple notebooks; dataset specifications and authentication can be shared between them.
+
+## Development (C#)
+Unit tests often use larger files stored in an Azure Storage account; these can be looked up using this tool.
+In general, cloud resource access is simplified because authentication is performed as the currently logged-in user.
+
+# The loooonger story
+Some motivation: in real-life projects, multiple personas (e.g. devs, data scientists, data engineers) collaborate and their roles sometimes overlap. Each persona has a set of tools tailored to the role (e.g. VS Code for devs, AzureML Workspace/Azure Databricks for data scientists, ...). Today we use git to at least move source code artifacts between them, but each toolset/environment has its own notion of service and data connections (or, more broadly, resources).
+And that's where this project comes in. We define a common location and semantics in a file called *resources.yaml*, assumed to be located at the root of your git repository. One complication in the story is credentials, which we definitely don't want to put into our beloved git repository.
+
+This project provides a set of tools in multiple languages (Python, JavaScript and C# to start with), which aims to offer parsing, credential and convenience support to the respective language users.
+
+Thus Python users will get easy access functions for data (e.g. from an Azure Storage Blob to a pandas DataFrame), while C# users will get download support to enable unit test scenarios.
+
+As we go along we're actively working with toolset owners (e.g. VSCode extensions) to enable support for *resources.yaml*. 
+
+# Development principles
+*
+
+## Python
+* Service SDK libraries are imported at time of use (e.g. inside `resource.get_client()`)
+
+
 # Development

 # Python
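The *Python* development principle in the README (service SDKs are imported at time of use) can be sketched as follows. This is a minimal illustration, not pyworkspace's actual `Resource.get_client()`, whose body is not part of this diff; `LazyClientResource` and its `sdk_module`/`factory` parameters are hypothetical, and the standard-library `json` module stands in for a service SDK.

```python
import importlib
from typing import Any, Optional


class LazyClientResource:
    """Hypothetical resource that defers importing its SDK until first use."""

    def __init__(self, sdk_module: str, factory: str):
        self.sdk_module = sdk_module  # e.g. "sqlalchemy" in the real package
        self.factory = factory        # name of a callable in that module
        self._client: Optional[Any] = None

    def get_client(self) -> Any:
        if self._client is None:
            # deferred import: an uninstalled SDK only fails for resources
            # that are actually used, not when resources.yaml is loaded
            module = importlib.import_module(self.sdk_module)
            self._client = getattr(module, self.factory)()
        # later calls reuse the cached client
        return self._client


decoder = LazyClientResource("json", "JSONDecoder").get_client()
print(decoder.decode('{"a": 1}'))  # {'a': 1}

# constructing a resource for a missing SDK is harmless until get_client()
unused = LazyClientResource("some_uninstalled_sdk", "Client")
```

Keeping the import inside the method is what allows `install_requires` in *setup.py* to stay at just `pyyaml`, while SDKs such as `sqlalchemy` and `keyring` only appear in `extras_require`.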
--- a/python/pyworkspace/base.py
+++ b/python/pyworkspace/base.py
@ -46,13 +46,18 @@ class Resource(yaml.YAMLObject):
        if hasattr(self, 'credentialstore'):
            return [self.credentialstore]
        else:
-            from .credentialprovider import CredentialProvider, KeyRingCredentialProvider, EnvironmentCredentialProvider
+            from .credentialprovider import CredentialProvider, EnvironmentCredentialProvider
+            from .python.keyring import KeyRingCredentialProvider
+            from .python.jupyterlab_credentialstore import JupyterLabCredentialStore

            return [ # *self.get_workspace().get_all_of_type(CredentialProvider),
+                    JupyterLabCredentialStore(),
                    KeyRingCredentialProvider(),
                    EnvironmentCredentialProvider()]
                
    def get_name(self) -> str:
+        # allow the name to be overridden by the parent
+        # return getattr(self, 'name', self._Workspace__name)
        return self._Workspace__name
    
    def get_path(self) -> List[str]:
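With this change, a resource without an explicit `credentialstore` falls back to the providers returned above, in the order JupyterLab credential store, keyring, environment. How `Resource` consumes that list is not shown in this diff. A minimal sketch, assuming each provider's `get_secret` returns `None` on a miss and the first non-`None` answer wins, could look like this; `DictProvider`, `EnvProvider`, `SecretNotFound` and `resolve_secret` are hypothetical names.

```python
import os
from typing import Iterable, Mapping, Optional


class CredentialProvider:
    """Stand-in for pyworkspace's CredentialProvider base (assumed interface)."""

    def get_secret(self, key: str, **kwargs) -> Optional[str]:
        raise NotImplementedError


class DictProvider(CredentialProvider):
    """Hypothetical in-memory store standing in for JupyterLab/keyring."""

    def __init__(self, secrets: Mapping[str, str]):
        self.secrets = dict(secrets)

    def get_secret(self, key, **kwargs):
        return self.secrets.get(key)


class EnvProvider(CredentialProvider):
    def get_secret(self, key, **kwargs):
        # None when the variable is unset, i.e. "not found here"
        return os.environ.get(key)


class SecretNotFound(KeyError):
    pass


def resolve_secret(key: str, providers: Iterable[CredentialProvider]) -> str:
    # list order is priority order: the first non-None answer wins
    for provider in providers:
        value = provider.get_secret(key)
        if value is not None:
            return value
    raise SecretNotFound(key)


os.environ["DEMO_DB_PASSWORD"] = "from-env"
chain = [DictProvider({"DEMO_TOKEN": "from-store"}), EnvProvider()]
print(resolve_secret("DEMO_TOKEN", chain))        # from-store
print(resolve_secret("DEMO_DB_PASSWORD", chain))  # from-env
```

Under this assumption, listing `JupyterLabCredentialStore()` first means a secret stored there shadows an environment variable of the same name.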
--- a/python/pyworkspace/credentialprovider.py
+++ b/python/pyworkspace/credentialprovider.py
@ -11,19 +11,6 @@ class AzureCredentialProvider(Resource):
        # return dbutils.notebook.entry_point.getDbutils().notebook().getContext().adlsAadToken().get() # 


-class KeyRingCredentialProvider(CredentialProvider):
-    yaml_tag = u'!python.keyring'
-    def get_secret(self, key, **kwargs):
-        try:
-            # conditional import
-            import keyring
-
-            # TODO: unclear if pyworkspace is good value here
-            # see https://pypi.org/project/keyring/#api-interface
-            return keyring.get_password('pyworkspace', key)
-        except:
-            return None
-
 class EnvironmentCredentialProvider(CredentialProvider):
    yaml_tag = u'!env'
    def get_secret(self, key, **kwargs):
--- a/python/pyworkspace/python/__init__.py
+++ b/python/pyworkspace/python/__init__.py
@ -0,0 +1,3 @@
+from .sqlalchemy import *
+from .keyring import *
+from .jupyterlab_credentialstore import *
--- a/python/pyworkspace/python/jupyterlab_credentialstore.py
+++ b/python/pyworkspace/python/jupyterlab_credentialstore.py
@ -0,0 +1,12 @@
+from ..credentialprovider import CredentialProvider
+
+# https://towardsdatascience.com/the-jupyterlab-credential-store-9cc3a0b9356 
+class JupyterLabCredentialStore(CredentialProvider):
+    yaml_tag = u'!python.jupyter.credentialstore'
+    def get_secret(self, key, **kwargs):
+        try:
+            import kernel_connector as kc
+
+            return kc.get_credential(key)
+        except Exception:
+            return None
--- a/python/pyworkspace/python/keyring.py
+++ b/python/pyworkspace/python/keyring.py
@ -0,0 +1,14 @@
+from ..credentialprovider import CredentialProvider
+
+class KeyRingCredentialProvider(CredentialProvider):
+    yaml_tag = u'!python.keyring'
+    def get_secret(self, key, **kwargs):
+        try:
+            # conditional import
+            import keyring
+
+            # TODO: unclear if pyworkspace is good value here
+            # see https://pypi.org/project/keyring/#api-interface
+            return keyring.get_password('pyworkspace', key)
+        except Exception:
+            return None
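Both new providers wrap the conditional import and the lookup in a single `try`/`except`, so any failure (missing package, missing backend configuration, or a bug) turns into `None`. One possible refinement, sketched below with the hypothetical `OptionalBackendProvider`, is to treat only `ImportError` as "not found" so that a misconfigured backend still surfaces. `os.getenv` stands in for `keyring.get_password` or `kc.get_credential` purely so the example runs anywhere.

```python
import importlib
import os
from typing import Optional


class OptionalBackendProvider:
    """Hypothetical provider whose backend package may not be installed."""

    def __init__(self, module_name: str, function_name: str):
        self.module_name = module_name      # e.g. "keyring"
        self.function_name = function_name  # callable taking the secret key

    def get_secret(self, key: str, **kwargs) -> Optional[str]:
        try:
            backend = importlib.import_module(self.module_name)
        except ImportError:
            # package not installed: report "not found" so the next
            # provider in the chain gets a chance
            return None
        # errors raised by an installed backend propagate instead of
        # being silently converted to None
        return getattr(backend, self.function_name)(key)


os.environ["DEMO_SECRET"] = "s3cr3t"
print(OptionalBackendProvider("os", "getenv").get_secret("DEMO_SECRET"))  # s3cr3t
print(OptionalBackendProvider("not_installed_backend", "get").get_secret("DEMO_SECRET"))  # None
```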
--- a/python/pyworkspace/python/sqlalchemy.py
+++ b/python/pyworkspace/python/sqlalchemy.py
@ -0,0 +1,40 @@
+from ..base import KeyNotFoundException, Resource
+import sys
+
+class SQLAlchemy(Resource):
+    yaml_tag = u'!python.sql.connection'
+
+    # TODO: is this generic enough to port beyond python? e.g. ADO.Net?
+    def __init__(self, drivername, username, host, port, database, query):
+        self.drivername = drivername
+        self.username = username
+        self.host = host
+        self.port = port
+        self.database = database
+        self.query = query
+
+    def get_client_lazy(self):
+        import sqlalchemy as sql
+
+        try:
+            password = self.get_secret()
+        except KeyNotFoundException as ex:
+            print("Warning: {}. Continuing without.".format(ex), file=sys.stderr)
+            password = None
+            pass
+
+        url = sql.engine.url.URL(**self.get_params(), password=password)
+        return sql.create_engine(url)
+
+# TODO: is this too far?
+class SQLAlchemyQuery(Resource):
+    yaml_tag = u'!python.sql.query'
+
+    def __init__(self, query, datasource):
+        self.datasource = datasource
+        self.query = query
+
+    def to_pandas_dataframe(self):
+        import pandas.io.sql as psql
+
+        return psql.read_sql(self.query, con=self.datasource.get_client())
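`SQLAlchemy.__init__` lists every URL field, but PyYAML constructs `YAMLObject` subclasses without calling `__init__`: it creates the instance with `__new__` and copies the YAML mapping into its attributes. That is why the test's *resources.yaml* can specify only `drivername`; the omitted fields are simply absent. `get_params()` is not shown in this diff, so the helper below is a hypothetical sketch of how absent fields could be tolerated.

```python
from types import SimpleNamespace
from typing import Any, Dict

# the fields SQLAlchemy.__init__ accepts (password is resolved separately)
URL_FIELDS = ("drivername", "username", "host", "port", "database", "query")


def connection_params(resource: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for get_params(): absent fields become None."""
    return {field: getattr(resource, field, None) for field in URL_FIELDS}


# mimics what PyYAML produces for the test's datasource: only drivername set
datasource = SimpleNamespace(drivername="sqlite")
print(connection_params(datasource))
# {'drivername': 'sqlite', 'username': None, 'host': None, 'port': None, 'database': None, 'query': None}
```

With SQLAlchemy 1.3 these keyword arguments match `sqlalchemy.engine.url.URL(...)` as used in `get_client_lazy()`; SQLAlchemy 1.4 and later replace direct construction with the `URL.create(...)` factory. With no `database`, the `sqlite` driver opens an in-memory database.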
--- a/python/pyworkspace/workspace.py
+++ b/python/pyworkspace/workspace.py
@ -9,6 +9,7 @@ from .azure.cognitiveservice import *
 from .base import *
 from .datasource import *
 from .credentialprovider import *
+from .python import *

 class Workspace:
    def __init__(self, path: str=None, content: str=None):
@ -66,6 +67,26 @@ class Workspace:
        # setup root links to avoid back reference to credential provider
        self.visit(setup_links)

+    def visit_resource(self,
+                       action: Callable[[yaml.YAMLObject, List[str], str], Any],
+                       path: List[str],
+                       node: Any,
+                       name: str) -> List:
+
+        ret = []
+
+        # execute action for the resource
+        v = action(node, path, name)
+        if v is not None:
+            ret.append(v)
+
+        # recurse into yaml objects to support nested data defs
+        for k, n in node.__dict__.items():
+            if isinstance(n, yaml.YAMLObject):
+                ret.extend(self.visit_resource(action, [*path, name], node=n, name=k))
+
+        return ret
+
    def visit(self,
              action: Callable[[yaml.YAMLObject, List[str], str], Any],
              path: List[str] = [],
@ -78,9 +99,7 @@ class Workspace:
            if isinstance(n, dict):
                ret.extend(self.visit(action, [*path, k], n))
            elif isinstance(n, yaml.YAMLObject):
-                v = action(n, path, k) 
-                if v is not None:
-                    ret.append(v)
+                ret.extend(self.visit_resource(action, path, node=n, name=k))

        return ret
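The new `visit_resource` lets `visit` descend into resources nested inside other resources, such as the `datasource` connection inside `query1` in the SQL test's *resources.yaml*. Below is a self-contained sketch of the same traversal; the `Node` class is a hypothetical stand-in for `yaml.YAMLObject` so it runs without PyYAML.

```python
from typing import Any, Callable, Dict, List, Optional

Action = Callable[[Any, List[str], str], Any]


class Node:
    """Hypothetical stand-in for a yaml.YAMLObject resource."""

    def __init__(self, **fields: Any):
        self.__dict__.update(fields)


def visit_resource(action: Action, path: List[str], node: Node, name: str) -> List[Any]:
    ret = []
    v = action(node, path, name)
    if v is not None:
        ret.append(v)
    # nested resources live in the object's attributes, keyed by attribute name
    for k, child in vars(node).items():
        if isinstance(child, Node):
            ret.extend(visit_resource(action, [*path, name], child, k))
    return ret


def visit(action: Action, tree: Dict[str, Any], path: Optional[List[str]] = None) -> List[Any]:
    path = [] if path is None else path
    ret = []
    for k, n in tree.items():
        if isinstance(n, dict):
            # plain mappings are hierarchy levels, not resources
            ret.extend(visit(action, n, [*path, k]))
        elif isinstance(n, Node):
            ret.extend(visit_resource(action, path, n, k))
    return ret


tree = {
    "query1": Node(query="SELECT * FROM table1", datasource=Node(drivername="sqlite")),
    "team": {"csv1": Node(kind="csv")},
}
print(visit(lambda node, path, name: "/".join([*path, name]), tree))
# ['query1', 'query1/datasource', 'team/csv1']
```

Because every recursive call builds a new list with `[*path, name]`, the `path: List[str] = []` default in the original `visit` is never mutated and is therefore harmless; the sketch uses `None` only as the more conventional idiom.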

--- a/python/requirements.txt
+++ b/python/requirements.txt
--- a/python/setup.py
+++ b/python/setup.py
@ -10,8 +10,8 @@ with open(path.join(here, 'README.md'), encoding='utf-8') as f:
    long_description = f.read()

 setup(name='pyworkspace', 
-      version='0.1'
-      description='Manages your cloud resources across multiple tooling environments.',
+      version='0.1',
+      description='Manages your cloud resources across multiple executing environments.',
      url='http://github.com/Microsoft/Workspace',
      author='Markus Cozowicz',
      author_email='marcozo@microsoft.com',
@ -31,6 +31,9 @@ setup(name='pyworkspace',
      python_requires='>=3',
      install_requires=['pyyaml'],
      extras_require={ 
-        'test': ['azureml-dataprep[pandas]', 'azure-keyvault'],
+        'test': ['azureml-dataprep[pandas]', 
+                 'azure-keyvault', 
+                 'sqlalchemy', 
+                 'keyring'],
      },
      packages=find_packages())
--- a/python/tests/azure/graph/resources.yaml
+++ b/python/tests/azure/graph/resources.yaml
@ -0,0 +1,8 @@
+sub1:
+  !azure.subscription
+  subscriptionid: 03909a66-bef8-4d52-8e9a-a346604e0902
+
+  # scenario
+  # be on Azure Databricks
+  # - get ARM authentication token using magic call
+  # - find 
--- a/python/tests/python/sql/resources.yaml
+++ b/python/tests/python/sql/resources.yaml
@ -0,0 +1,19 @@
+
+query1:
+  !python.sql.query
+  # sometimes I need a name here to look up credentials.
+  # instead of assuming a fixed name, support whatever and just query for 'connection' type
+  datasource: 
+    !python.sql.connection
+    # in-memory database
+    drivername: sqlite
+    # from https://docs.sqlalchemy.org/en/13/core/engines.html
+    # drivername – the name of the database backend. This name will correspond to a module in sqlalchemy/databases or a third party plug-in.
+    # username – The user name.
+    # password – database password.
+    # host – The name of the host.
+    # port – The port number.
+    # database – The database name.
+    # query – A dictionary of options to be passed to the dialect and/or the DBAPI upon connect.
+  query: SELECT * FROM table1
+
--- a/python/tests/python/sql/test_alchemy.py
+++ b/python/tests/python/sql/test_alchemy.py
@ -0,0 +1,23 @@
+import pyworkspace
+import pytest
+import os
+
+@pytest.fixture
+def test_subdir(monkeypatch):
+    # change to tests/ subdir so we can resolve the yaml (cwd is restored after the test)
+    monkeypatch.chdir(os.path.dirname(os.path.abspath(__file__)))
+
+def test_sql_alchemy(test_subdir):
+    ws = pyworkspace.Workspace()
+
+    # os.remove('deleteme_test_alchemy.db')
+
+    query1 = ws['query1']
+
+    # test fixture setup
+    engine = query1.datasource.get_client()
+    engine.execute("CREATE TABLE table1(col1 VARCHAR(255))")
+    engine.execute("INSERT INTO table1 VALUES('abc')")
+
+    # actual usage
+    assert query1.to_pandas_dataframe().iloc[0, 0] == 'abc'