added additional credential provider

This commit is contained in:
Markus Cozowicz 2019-07-29 12:55:13 +02:00
Parent 52d017af73
Commit d35c19732b
13 changed files with 187 additions and 29 deletions

View file

@ -14,16 +14,16 @@ For more information see the [Code of Conduct FAQ](https://opensource.microsoft.
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
# What is it?
Workspace is here to manage all your connection strings and datasets in a central location, a *resources.yaml* at the root of your git repository without any credentials. The provided library will enable you to retrieve corresponding credentials from multiple sources (e.g. Azure Key Vault, Amazon Key Management Service, Python KeyRing, environment variables, ...).
_Workspace_ was created to manage all your authoring-time service connection strings and *dataset* references in a *resources.yaml*,
usually located at the root of your git repository.
The provided libraries aim to simplify access by automating authentication and integrating naturally with service-specific SDKs.
Furthermore, we aim for tooling support (e.g. listing storage accounts in VSCode).
Some motivation: in real-life projects, multiple personas (e.g. devs, data scientists, data engineers) collaborate, and sometimes roles overlap. Each persona has a set of tools tailored to the role (e.g. VS Code for devs, AzureML Workspace/Azure Databricks for data scientists, ...). Today we use git to at least move source code artifacts between them, but each toolset/environment has its own notion of service and data connections (or, more broadly, resources).
And that's where this project comes in. We define a common location and semantics in a file called *resources.yaml*, assumed to be located in the root of your git repository. One complication in the story is credentials, which we definitely don't want to put into our beloved git repository.
This project provides a set of tools in multiple languages (Python, JavaScript and C# to start with), which aim to offer parsing, credential and convenience support to the respective language users.
Thus Python users will get easy access functions for data (e.g. from an Azure Storage Blob to a Pandas data frame), while C# users will get download support to enable unit test scenarios.
As we go along, we're actively working with toolset owners (e.g. VSCode extensions) to enable support for *resources.yaml*.
# Features
* Simplify authentication
* Enable resource sharing between Notebooks and team members
* Improve service-specific SDK discoverability
* Organize resources using arbitrary hierarchies
# How to get started
Put your resources into *resources.yaml* (see sample below).
@ -53,6 +53,31 @@ var ws = new workspace.Workspace()
ws['csv1'].download()
```
# Examples
## Data Science
Larger projects require multiple notebooks, so the dataset specification and authentication are shared between notebooks.
## Development (C#)
Unit tests often use larger files stored in an Azure Storage account; they can be looked up using this tool.
In general, access to cloud resources is simplified because authentication is performed as the currently logged-in user.
# The loooonger story
Some motivation: in real-life projects, multiple personas (e.g. devs, data scientists, data engineers) collaborate, and sometimes roles overlap. Each persona has a set of tools tailored to the role (e.g. VS Code for devs, AzureML Workspace/Azure Databricks for data scientists, ...). Today we use git to at least move source code artifacts between them, but each toolset/environment has its own notion of service and data connections (or, more broadly, resources).
And that's where this project comes in. We define a common location and semantics in a file called *resources.yaml*, assumed to be located in the root of your git repository. One complication in the story is credentials, which we definitely don't want to put into our beloved git repository.
This project provides a set of tools in multiple languages (Python, JavaScript and C# to start with), which aim to offer parsing, credential and convenience support to the respective language users.
Thus Python users will get easy access functions for data (e.g. from an Azure Storage Blob to a Pandas data frame), while C# users will get download support to enable unit test scenarios.
As we go along, we're actively working with toolset owners (e.g. VSCode extensions) to enable support for *resources.yaml*.
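The credential handling described above can be pictured as a chain of providers that are tried in order until one returns a secret. The following is only a minimal sketch under assumptions: `DictCredentialProvider` and `resolve_secret` are hypothetical names used for illustration, and the body of `EnvironmentCredentialProvider.get_secret` is a guess at the behaviour, not the library's actual code.

```python
import os
from typing import Dict, List, Optional


class CredentialProvider:
    """Base class: return the secret for `key`, or None if it is unavailable."""

    def get_secret(self, key: str, **kwargs) -> Optional[str]:
        raise NotImplementedError


class DictCredentialProvider(CredentialProvider):
    """Hypothetical in-memory provider standing in for a real secret store."""

    def __init__(self, secrets: Dict[str, str]):
        self.secrets = secrets

    def get_secret(self, key: str, **kwargs) -> Optional[str]:
        return self.secrets.get(key)


class EnvironmentCredentialProvider(CredentialProvider):
    """Assumed behaviour: look the key up as an environment variable."""

    def get_secret(self, key: str, **kwargs) -> Optional[str]:
        return os.environ.get(key)


def resolve_secret(key: str, providers: List[CredentialProvider]) -> Optional[str]:
    """Ask each provider in order and return the first secret that is not None."""
    for provider in providers:
        secret = provider.get_secret(key)
        if secret is not None:
            return secret
    return None
```

Ordering the list from most specific to most general (for example a notebook store first, environment variables last) lets a local override win without touching *resources.yaml*.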
# Development principles
*
## Python
* Service SDK libraries are imported at time of usage (e.g. resource.get_client())
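
The principle above could look roughly like the sketch below: the SDK module is imported only when the client is first requested, so users who never touch a resource type do not need its SDK installed. `LazyResource` and its `module_name` parameter are hypothetical, and the standard-library `json` module stands in for a real service SDK.

```python
import importlib


class LazyResource:
    """Hypothetical resource that defers importing its SDK until first use."""

    def __init__(self, module_name: str):
        self.module_name = module_name
        self._client = None  # nothing is imported at construction time

    def get_client(self):
        # import the SDK module on first access and cache it afterwards
        if self._client is None:
            self._client = importlib.import_module(self.module_name)
        return self._client


resource = LazyResource("json")  # "json" stands in for a service SDK
client = resource.get_client()
```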
# Development
# Python

View file

@ -46,13 +46,18 @@ class Resource(yaml.YAMLObject):
if hasattr(self, 'credentialstore'):
return [self.credentialstore]
else:
from .credentialprovider import CredentialProvider, KeyRingCredentialProvider, EnvironmentCredentialProvider
from .credentialprovider import CredentialProvider, EnvironmentCredentialProvider
from .python.keyring import KeyRingCredentialProvider
from .python.jupyterlab_credentialstore import JupyterLabCredentialStore
return [ # *self.get_workspace().get_all_of_type(CredentialProvider),
JupyterLabCredentialStore(),
KeyRingCredentialProvider(),
EnvironmentCredentialProvider()]
def get_name(self) -> str:
# allow the name to be overwritten from the parent
# return getattr(self, 'name', self._Workspace__name)
return self._Workspace__name
def get_path(self) -> List[str]:

View file

@ -11,19 +11,6 @@ class AzureCredentialProvider(Resource):
# return dbutils.notebook.entry_point.getDbutils().notebook().getContext().adlsAadToken().get() #
class KeyRingCredentialProvider(CredentialProvider):
yaml_tag = u'!python.keyring'
def get_secret(self, key, **kwargs):
try:
# conditional import
import keyring
# TODO: unclear if pyworkspace is good value here
# see https://pypi.org/project/keyring/#api-interface
return keyring.get_password('pyworkspace', key)
except:
return None
class EnvironmentCredentialProvider(CredentialProvider):
yaml_tag = u'!env'
def get_secret(self, key, **kwargs):

View file

@ -0,0 +1,3 @@
from .sqlalchemy import *
from .keyring import *
from .jupyterlab_credentialstore import *

View file

@ -0,0 +1,12 @@
from ..credentialprovider import CredentialProvider

# https://towardsdatascience.com/the-jupyterlab-credential-store-9cc3a0b9356
class JupyterLabCredentialStore(CredentialProvider):
    yaml_tag = u'!python.jupyter.credentialstore'

    def get_secret(self, key, **kwargs):
        try:
            import kernel_connector as kc
            return kc.get_credential(key)
        except:
            return None

View file

@ -0,0 +1,14 @@
from ..credentialprovider import CredentialProvider

class KeyRingCredentialProvider(CredentialProvider):
    yaml_tag = u'!python.keyring'

    def get_secret(self, key, **kwargs):
        try:
            # conditional import
            import keyring
            # TODO: unclear if pyworkspace is good value here
            # see https://pypi.org/project/keyring/#api-interface
            return keyring.get_password('pyworkspace', key)
        except:
            return None

View file

@ -0,0 +1,40 @@
from ..base import KeyNotFoundException, Resource
import sys

class SQLAlchemy(Resource):
    yaml_tag = u'!python.sql.connection'

    # TODO: is this generic enough to port beyond python? e.g. ADO.Net?
    def __init__(self, drivername, username, host, port, database, query):
        self.drivername = drivername
        self.username = username
        self.host = host
        self.port = port
        self.database = database
        self.query = query

    def get_client_lazy(self):
        import sqlalchemy as sql
        try:
            password = self.get_secret()
        except KeyNotFoundException as ex:
            print("Warning: {}. Continuing without.".format(ex), file=sys.stderr)
            password = None
            pass
        url = sql.engine.url.URL(**self.get_params(), password=password)
        return sql.create_engine(url)

# TODO: is this too far?
class SQLAlchemyQuery(Resource):
    yaml_tag = u'!python.sql.query'

    def __init__(self, query, datasource):
        self.datasource = datasource
        self.query = query

    def to_pandas_dataframe(self):
        import pandas.io.sql as psql
        return psql.read_sql(self.query, con=self.datasource.get_client())

View file

@ -9,6 +9,7 @@ from .azure.cognitiveservice import *
from .base import *
from .datasource import *
from .credentialprovider import *
from .python import *
class Workspace:
def __init__(self, path: str=None, content: str=None):
@ -66,6 +67,26 @@ class Workspace:
# setup root links to avoid back reference to credential provider
self.visit(setup_links)
    def visit_resource(self,
                       action: Callable[[yaml.YAMLObject, List[str], str], Any],
                       path: List[str],
                       node: Any,
                       name: str) -> List:
        ret = []
        # execute action for the resource
        v = action(node, path, name)
        if v is not None:
            ret.append(v)
        # recurse into yaml objects to support nested data defs
        for k, n in node.__dict__.items():
            if isinstance(n, yaml.YAMLObject):
                ret.extend(self.visit_resource(action, [*path, name], node=n, name=k))
        return ret
def visit(self,
action: Callable[[yaml.YAMLObject, List[str], str], Any],
path: List[str] = [],
@ -78,9 +99,7 @@ class Workspace:
if isinstance(n, dict):
ret.extend(self.visit(action, [*path, k], n))
elif isinstance(n, yaml.YAMLObject):
v = action(n, path, k)
if v is not None:
ret.append(v)
ret.extend(self.visit_resource(action, path, node=n, name=k))
return ret
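
The traversal in this hunk can be tried in isolation with the sketch below. It is an approximation under assumptions: `Node` is a hypothetical stand-in for `yaml.YAMLObject` so the example runs without PyYAML, the methods are written as free functions, and the `tree` argument of `visit` is a guess at the part of the signature the hunk does not show.

```python
from typing import Any, Callable, Dict, List, Optional


class Node:
    """Hypothetical stand-in for yaml.YAMLObject."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)


Action = Callable[[Node, List[str], str], Any]


def visit_resource(action: Action, path: List[str], node: Node, name: str) -> List:
    ret = []
    # run the action on the resource itself
    v = action(node, path, name)
    if v is not None:
        ret.append(v)
    # recurse into nested resources, extending the path with the parent's name
    for k, n in node.__dict__.items():
        if isinstance(n, Node):
            ret.extend(visit_resource(action, [*path, name], node=n, name=k))
    return ret


def visit(action: Action, tree: Dict[str, Any], path: Optional[List[str]] = None) -> List:
    path = [] if path is None else path  # avoids a shared mutable default
    ret = []
    for k, n in tree.items():
        if isinstance(n, dict):
            ret.extend(visit(action, n, [*path, k]))
        elif isinstance(n, Node):
            ret.extend(visit_resource(action, path, node=n, name=k))
    return ret


tree = {"group": {"query1": Node(query="SELECT 1",
                                 datasource=Node(drivername="sqlite"))}}
names = visit(lambda node, path, name: "/".join([*path, name]), tree)
# names == ["group/query1", "group/query1/datasource"]
```

The nested `datasource` is reached through its parent resource, which is what allows a query definition to embed its connection.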

View file

View file

@ -10,8 +10,8 @@ with open(path.join(here, 'README.md'), encoding='utf-8') as f:
long_description = f.read()
setup(name='pyworkspace',
version='0.1'
description='Manages your cloud resources across multiple tooling environments.',
version='0.1',
description='Manages your cloud resources across multiple executing environments.',
url='http://github.com/Microsoft/Workspace',
author='Markus Cozowicz',
author_email='marcozo@microsoft.com',
@ -31,6 +31,9 @@ setup(name='pyworkspace',
python_requires='>=3',
install_requires=['pyyaml'],
extras_require={
'test': ['azureml-dataprep[pandas]', 'azure-keyvault'],
'test': ['azureml-dataprep[pandas]',
'azure-keyvault',
'sqlalchemy',
'keyring'],
},
packages=find_packages())

View file

@ -0,0 +1,8 @@
sub1:
  !azure.subscription
  suscriptionid: 03909a66-bef8-4d52-8e9a-a346604e0902

# scenario
# be on Azure Databricks
# - get ARM authentication token using magic call
# - find

View file

@ -0,0 +1,19 @@
query1:
  !python.sql.query
  # sometimes I need a name here to lookup credentials.
  # instead of assuming a fixed name, support whatever and just query for 'connection' type
  datasource:
    !python.sql.connection
    # in-memory database
    drivername: sqlite
    # from https://docs.sqlalchemy.org/en/13/core/engines.html
    # drivername – the name of the database backend. This name will correspond to a module in sqlalchemy/databases or a third party plug-in.
    # username – The user name.
    # password – database password.
    # host – The name of the host.
    # port – The port number.
    # database – The database name.
    # query – A dictionary of options to be passed to the dialect and/or the DBAPI upon connect.
  query: SELECT * FROM table1

View file

@ -0,0 +1,23 @@
import pyworkspace
import pytest
import os

@pytest.fixture
def test_subdir():
    # change to tests/ subdir so we can resolve the yaml
    os.chdir(os.path.dirname(os.path.abspath(__file__)))

def test_sql_alchemy(test_subdir):
    ws = pyworkspace.Workspace()
    # os.remove('deleteme_test_alchemy.db')
    query1 = ws['query1']

    # test fixture setup
    engine = query1.datasource.get_client()
    engine.execute("CREATE TABLE table1(col1 VARCHAR(255))")
    engine.execute("INSERT INTO table1 VALUES('abc')")

    # actual usage
    assert query1.to_pandas_dataframe().iloc[0][0] == 'abc'