Update doc links
This commit is contained in:
Parent: 1f398c1c8d
Commit: 7343b9cfa5
Binary file not shown.
After | Width: | Height: | Size: 126 KiB |
@@ -1,8 +1,6 @@
# Frequently Asked Questions

This section covers frequently asked questions and will grow over time. We appreciate your feedback, which helps us continuously improve the toolkit!

[[_TOC_]]

### How do I join the Microsoft organization on GitHub?

This documentation helps you to connect your personal GitHub account with the Microsoft organization. You will be able to access Microsoft-internal GitHub repositories after successfully connecting them.
@@ -10,9 +8,9 @@ This documentation helps you to connect your personal GitHub with the Microsoft
2. Save the reset codes so that you can reset the two-factor authentication later.
3. [Here](https://repos.opensource.microsoft.com/) you can link your GitHub account to the Microsoft organization. You have to authorize Microsoft to access your GitHub account (e.g. Microsoft will ask whether your GitHub account uses two-factor authentication).
4. After successfully linking the accounts, you can join the GitHub organization "microsoft" in the "Available Microsoft GitHub organizations" section.<br>
- ![Available Organizations](../.attachments/github-orga.jpg)
+ ![Available Organizations](.attachments/github-orga.jpg)
5. Your GitHub profile should now list "microsoft" among your organizations.<br>
- ![GitHub Profile](../.attachments/github-orga-ms.png)
+ ![GitHub Profile](.attachments/github-orga-ms.png)
6. You can now access the [Verseagility GitHub repository](https://github.com/microsoft/verseagility).

### How do I add support for a new language?
@@ -31,7 +31,7 @@ Following technical framework is recommended and will help you to succeed:
## Software Architecture

The toolkit is based on the following architecture. It is possible to run the toolkit locally, yet we highly recommend using the Microsoft Azure ecosystem.

- ![Architecture diagram](./.attachments/architecture-versagility-v1.png)
+ ![Architecture diagram](./.attachments/architecture-verseagility-v1-2.png)

## Verseagility-Setup

The step-by-step approach for getting started with the NLP toolkit can be found [here](docs/Verseagility-Setup.md). It helps you to set up your environment, guides you through the training process and gives you advice for potential customizations.
@@ -1,6 +1,6 @@
# Setup of your Demo WebApp

This documentation helps you to deploy your own demo of the Verseagility toolkit to an Azure WebApp.

[[_TOC_]]

## Approaches

There are different approaches to setting up a demo environment.
@@ -92,7 +92,7 @@ For updating purpose after your first deployment, change `$create` to `$FALSE`.
In order to protect your demo website from external users and to avoid misuse, we recommend activating Azure Active Directory (AAD) authentication. The following steps help you to register your WebApp in the AAD tenant you are working in. Authentication methods like a public Microsoft/Outlook.com account also work, in case you want to make it usable for a client.

1. Go to the resource group where you have just deployed the demo app and enter the App Service resource. In this case, we click on "_nlp-demo-app_".
- ![AAD Setup](../../.attachments/demo-aad-app.png)
+ ![AAD Setup](../.attachments/demo-aad-app.png)

2. Click on _"Authentication / Authorization"_ in the left menu.
@@ -103,12 +103,12 @@ In order to protect your demo website from external users and to avoid misuse, w
5. Click on _"Azure Active Directory"_ below to enter the configuration environment.

6. Select _"Express"_ and click on _"Azure AD App"_ to search for your application.
- ![AAD Setup](../../.attachments/demo-aad-choice.png)
+ ![AAD Setup](../.attachments/demo-aad-choice.png)

- 7. Search for your app name, choose the respective result and click _"OK"_. ![AAD Setup](../../.attachments/demo-aad-reg.png)
+ 7. Search for your app name, choose the respective result and click _"OK"_. ![AAD Setup](../.attachments/demo-aad-reg.png)

8. Back on the previous page of step 6, click _"OK"_ again.

- 9. You should now be back at the starting point, where your configuration should look like the one below. Click _"Save"_ to confirm your settings. ![AAD Setup](../../.attachments/demo-aad-mainpage.png)
+ 9. You should now be back at the starting point, where your configuration should look like the one below. Click _"Save"_ to confirm your settings. ![AAD Setup](../.attachments/demo-aad-mainpage.png)

- 10. After a couple of minutes, when accessing your web demo, you should be redirected to an AAD login page. ![AAD Setup](../../.attachments/demo-aad-login.png)
+ 10. After a couple of minutes, when accessing your web demo, you should be redirected to an AAD login page. ![AAD Setup](../.attachments/demo-aad-login.png)
@@ -1,7 +1,7 @@
# Demo Guide

This documentation serves as a guideline on how to best present the Verseagility demo.

[[_TOC_]]

The demo/toolkit VERSEAGILITY serves to demonstrate end-to-end capabilities for Natural Language Processing use cases. The following features are currently covered in the standard version:
- Text Classification
@@ -43,4 +43,4 @@ To enable a good demo experience, we recommend you to type some text on the fly
| __Body__ | Text: Hallo! Vor einigen jahren habe ich in einem PC- Geschäft meinen PC gekauft. Nach einiger Zeit ist auf dem Monitor unten rechts einen Text erschienen: Windows aktivieren. Im Geschäft wurde das System wieder aktiviert. Das Geschäft ist aber schon zu. Pleite oder so. Keine Ahnung. Wie kann ich nun das System Aktivieren? Ich habe ja meinen PC legal gekauft. Microsoft und Bill Gates ich bitte euch um Hilfe! | In the pre-Chromium Edge, I had a customize (and, IIRC, pinned) list of quick links in the New Tab page. This the row of 7 icon-decorated boxes. I kept them in a specific order for the purpose of muscle memory. I just got updated to chromium, and while the conversion did maintain my links, it rearranged them in an order that seems to make no sense (not alphabetical, although it may have reverted to most-used). Who moved my cheese? Can I get them back in the order that my muscles remember? |

### Visual Documentation

- ![Demo Preview](../../.attachments/demo-preview.PNG)
+ ![Demo Preview](../.attachments/demo-preview.PNG)
@@ -1,11 +1,11 @@
# Data Preparation Pipeline

This documentation helps you to understand the data preparation pipeline, describes how to bring your own data into the system, and covers the required file structure.

[[_TOC_]]

The purpose of the pipeline is twofold: 1) It automatically processes supported documents that are put into the "data" container of the created Azure Storage. 2) It creates a REST endpoint that can be used to convert documents (like images or PDFs) into text.

- 1. If you put files in the storage account "data" container, the files are processed and put in Cosmos DB following the standardized output format. For testing purposes you may use this [Sample CSV-file](../../.attachments/sample.csv) or a [Sample JSON-file](../../.attachments/sample.json). Have a look at the standard JSON structure supported by Verseagility:
+ 1. If you put files in the storage account "data" container, the files are processed and put in Cosmos DB following the standardized output format. For testing purposes you may use this [Sample CSV-file](../.attachments/sample.csv) or a [Sample JSON-file](../.attachments/sample.json). Have a look at the standard JSON structure supported by Verseagility:
```json
{
    "question": {
@@ -29,11 +29,11 @@ The purpose of the pipeline is twofold: 1) It automatically processes supported
}
```

- ![Data Container](../../.attachments/data_container.png)
+ ![Data Container](../.attachments/data_container.png)

What happens in the background is that the "processor-function" gets triggered. This function takes the newly added blob and parses it based on the blob's content/MIME type. To tell the function how a certain document has to be handled, you must write your own parser; parsers are stored in the /parsers directory. By default, three parsers are already implemented (CSV, JSON and default). If you take a look at e.g. the CSV parser, you see that the job of the parser is to map the content of the CSV file to an object called "OutputSchema". OutputSchema is the document schema that will be used to store the document in Cosmos DB (of course, you can also adapt the schema to fit your needs). For example, if your CSV file contains a list of documents with the ID in the first column (index 0), the number of views for this document in the second column and the URI to the document in the third column, the parser iterates over all rows and maps the columns to the right OutputSchema properties:

- ![Document Mapping](../../.attachments/mapping.png)
+ ![Document Mapping](../.attachments/mapping.png)

Note: If you want to link attachments to the documents in the CSV file, create a container in the storage account named after the document ID. The pre-implemented parser will search the storage for any container that has the same ID as the document. If it finds one, it links all blobs in the container as attachments to the document.
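A parser along the lines described above can be sketched as follows. This is a minimal illustration: the field names and the `parse_csv` helper are hypothetical stand-ins, not the toolkit's actual OutputSchema or parser code.

```python
import csv
import io

def parse_csv(blob_text):
    """Map each CSV row (id, views, uri) to an OutputSchema-like dict."""
    documents = []
    for row in csv.reader(io.StringIO(blob_text)):
        documents.append({
            "id": row[0],          # first column (index 0): document ID
            "views": int(row[1]),  # second column: number of views
            "url": row[2],         # third column: URI to the document
        })
    return documents

sample = "123,42,https://example.com/doc-123\n456,7,https://example.com/doc-456"
print(parse_csv(sample))
```

The real parsers additionally write the resulting documents to Cosmos DB instead of returning them.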
@@ -1,7 +1,7 @@
# Webservice Deployment Guide

In the following documentation, you will find a step-by-step guide on how to bring your trained models into production.

[[_TOC_]]

## Deployment Steps

After finishing the training rounds, your models are ready to be deployed as an endpoint. During this process, a new _Azure Container Instance_ is created in your resource group and also registered in your Azure Machine Learning workspace. It is configured with 2 GB of RAM and 1 CPU (minimum performance), but you can scale this up by changing the values in `service.py`. For demo purposes, these values are sufficient. You will initiate the deployment from the root directory of the repository using the command line. <br>
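In the Azure ML SDK, the resource settings mentioned above correspond to a deployment configuration roughly like the following. This is a sketch, not the toolkit's actual code; the concrete values live in `service.py`.

```python
from azureml.core.webservice import AciWebservice

# Minimum configuration as described above: 1 CPU core, 2 GB of RAM.
# Increase these values to scale the container instance up.
aci_config = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=2,
)
```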
@@ -18,24 +18,24 @@ The deployment may take a while. You will find the deployed models in your Azure

You can get your keys by clicking on the respective real-time endpoint name. In the _Consume_ section, you need to copy the REST endpoint URL (e.g. http://29efd3ea-2d9d-4c54-99b0-89a099f7aa80.westeurope.azurecontainer.io/score). Further, you need to get the _Primary Key_ for your endpoint scoring (e.g. `Nm8MKM8FT8KshjtPGNfKioNxkltg3pKx`).

- ![Deployed Endpoints](../../.attachments/deploy-endpoints.PNG)
+ ![Deployed Endpoints](../.attachments/deploy-endpoints.PNG)

## Endpoint Test with Postman

- Get [Postman](https://www.postman.com/downloads/) and install it on your local machine. There is a prepared example request for the English endpoint, which you can download [here](../../.attachments/postman-request.json).
+ Get [Postman](https://www.postman.com/downloads/) and install it on your local machine. There is a prepared example request for the English endpoint, which you can download [here](../.attachments/postman-request.json).

1. Open Postman and import the file you just downloaded by clicking on _"Import"_ and _"Import File"_.<br>
- ![Postman - Import JSON](../../.attachments/postman-import.png)
+ ![Postman - Import JSON](../.attachments/postman-import.png)

2. In the _"Collections"_ tab you will find the example request _"NLP API"_.

3. In case you want to test your own endpoint, replace the example URL `http://29efd3ea-2d9d-4c54-99b0-89a099f7aa80.westeurope.azurecontainer.io/score` with your custom one. Further, insert your authorization key in the _"Headers"_ section right after the word `Bearer` (which you have to keep).<br>
- ![Postman - Edit Keys](../../.attachments/postman-setkey.png)
+ ![Postman - Edit Keys](../.attachments/postman-setkey.png)

4. Switch to the _"Body"_ tab and edit the values for _"subject"_ and _"body"_.
- ![Postman - Edit Keys](../../.attachments/postman-settext.png)
+ ![Postman - Edit Keys](../.attachments/postman-settext.png)

5. Click _"Submit"_ and wait for the result, which you will find below.
- ![Postman - Edit Keys](../../.attachments/postman-result.png)
+ ![Postman - Edit Keys](../.attachments/postman-result.png)
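The same request Postman sends can be sketched in Python. The endpoint URL and key below are placeholders you must replace; the payload fields mirror the _"subject"_/_"body"_ values edited above.

```python
import requests

# Placeholder values -- substitute your real endpoint URL and primary key
scoring_url = "http://example.westeurope.azurecontainer.io/score"
api_key = "<your-primary-key>"

# Payload mirroring the "subject"/"body" fields edited in Postman
payload = [{"subject": "Windows activation",
            "body": "How can I activate my legally purchased copy of Windows?"}]

# Build the request without sending it, so the headers can be inspected first
request = requests.Request(
    "POST",
    scoring_url,
    headers={"Authorization": f"Bearer {api_key}"},
    json=payload,
).prepare()

# To actually score: response = requests.Session().send(request); print(response.json())
print(request.headers["Authorization"])
```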
## Batch Scoring

If you have multiple files or a whole dataset to be scored, you can find a batch scoring Jupyter notebook in the GitHub repository, located in the subfolder `notebook` as `Score - Batch Scoring of Model Endpoint.ipynb`. With that, you will be able to export your predictions as comma-separated values. Further, a classification report and a confusion matrix based on scikit-learn are integrated.
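As a rough illustration of what the notebook's scikit-learn evaluation produces (the label names and predictions here are made up):

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical gold labels and endpoint predictions
y_true = ["bug", "bug", "question", "question"]
y_pred = ["bug", "question", "question", "question"]

# Rows: true labels, columns: predicted labels
cm = confusion_matrix(y_true, y_pred, labels=["bug", "question"])
print(cm)

# Per-class precision, recall and F1
print(classification_report(y_true, y_pred, labels=["bug", "question"]))
```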
@@ -2,7 +2,7 @@

The following section covers aspects of data pre-processing used for different tasks. Some pre-processing steps are universal, while others are task-specific. They are split accordingly.

[[_TOC_]]

## Universal Steps
- Placeholders
@@ -22,7 +22,7 @@ This page describes how you get and set the right Azure resource keys to make Ve
- The main page of Azure Machine Learning has all the information you need, starting with the name of your resource, which you have to insert in the `aml-ws-name` row.
- Secondly, you have to insert your subscription ID, as shown in the screenshot below, in the second row `aml-ws-sid`.
- Also, you have to insert the name of your resource group in the `aml-ws-rg` row.
- - ![Azure Machine Learning Keys](../../.attachments/keys-aml.png)
+ - ![Azure Machine Learning Keys](../.attachments/keys-aml.png)

- __Text Analytics Service__
  - At the main page called _Quick start_, copy the following elements:
@@ -30,12 +30,12 @@ This page describes how you get and set the right Azure resource keys to make Ve
- Mark and copy the _Endpoint_ name of the text analytics service as shown in the frame and insert it in the `text-analytics-name` row of your file. You do not need the whole URL, the name is sufficient.
- Please verify with the icon you see below that you are actually using the Text Analytics resource rather than the Computer Vision one.

- - ![Text Analytics Keys](../../.attachments/keys-ta.png)
+ - ![Text Analytics Keys](../.attachments/keys-ta.png)
- __Cosmos DB__
  - After entering your _Cosmos DB_ resource, click on _Keys_ in the left menu.
  - From there, copy the resource name from the _URI_ field. As with the _Text Analytics Service_, you do not need the entire URL, the name is sufficient. Insert it in the `cosmos-db-name` row of your file.
  - Last but not least, copy the _Primary Key_ and set it in the `cosmos-db-key` row.
- - ![Cosmos DB Keys](../../.attachments/keys-cdb.png)
+ - ![Cosmos DB Keys](../.attachments/keys-cdb.png)

4. Your file should look similar to this one (the values below are random):
```
@@ -12,7 +12,7 @@ This part of the documentation helps you to set up Verseagility in our own subsc

# Overview

- ![image.png](../.attachments/architecture-versagility-v1.png)
+ ![image.png](../.attachments/architecture-verseagility-v1-2.png)

There are three main components to Verseagility:
1. Data processing pipeline
1. Task modeling
@@ -1,7 +1,7 @@
# Environment Setup

This documentation helps you to clone the repository to your local machine, install the environment and deploy all the necessary resources to your Azure subscription using an ARM template. This minimizes the manual effort and ensures integrity.

[[_TOC_]]

## Repository Setup

0. Make sure you fulfill all criteria described in the [requirements](../Verseagility-Setup.md) section.
@@ -35,7 +35,7 @@ to
[![Deploy to Azure](https://raw.githubusercontent.com/Azure/azure-quickstart-templates/master/1-CONTRIBUTION-GUIDE/images/deploytoazure.png)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2Fchristian-vorhemus%2Ffunction-app%2Fmaster%2Fazuredeploy.json)

2. Make sure you wait until all resources have been deployed successfully before you proceed with the next step. You can verify this by checking the notification bell in the top menu for any errors. The resource group should have the following components:
- ![Azure Resources](../../.attachments/azure-resources.png)
+ ![Azure Resources](../.attachments/azure-resources.png)
- [Azure Machine Learning](https://azure.microsoft.com/en-us/services/machine-learning/)
  - Accelerator for Machine Learning projects
  - Central component for model training and deployments
@@ -57,7 +57,7 @@ to
- Central file storage for data input, prepared Azure Machine Learning data, etc.

3. After the deployment has finished (approx. 30 minutes), as a workaround you currently have to add the Function host key as an environment variable manually. To do so, access your Function App, click on "_App keys_" in the left menu, click on "_default_" and copy the key from the menu that appears on the right-hand side. After copying, click on "_Configuration_", insert the key in the field named "_FunctionHostKey_" (if the variable is already there, replace the value) and click "_Save_".<br>
- ![Get Function Keys](../../.attachments/function-getkeys.png)
+ ![Get Function Keys](../.attachments/function-getkeys.png)

4. You may now proceed with customizing the pipeline as explained [here](Customize-Pipeline.md).
@@ -1,7 +1,7 @@
# Classification / Training

This part of the documentation serves as a guideline for the model training process. The data is automatically pulled from Cosmos DB. Your project language serves as the search criterion: if you choose English as your project language, all English documents are incorporated into the training process, and so on.

[[_TOC_]]

## Initiate the Training

After setting up your projects in the previous pages, you are now ready to train your models. This training step incorporates the classification, named entity recognition and question/answering models all in one.
@@ -24,13 +24,13 @@ After setting up your projects in the previous pages, you are now ready to train
## Access your experiment in Azure Machine Learning

After running the classification training step, you will find your experiments in Azure Machine Learning.

- 1. Clicking on "_Experiments_" on the left side, you will find your experiment registered with the respective name and language shortcut. In case you initiate further training rounds for the same language, they will all be registered under the same experiment. Click on the respective experiment to get insights into running, failed and finished experiment rounds. <br><br>![Azure ML Experiments](../../.attachments/classification-aml-experiments.PNG) <br><br>
+ 1. Clicking on "_Experiments_" on the left side, you will find your experiment registered with the respective name and language shortcut. In case you initiate further training rounds for the same language, they will all be registered under the same experiment. Click on the respective experiment to get insights into running, failed and finished experiment rounds. <br><br>![Azure ML Experiments](../.attachments/classification-aml-experiments.PNG) <br><br>

- 2. There, you will get a detailed view of earlier training rounds. <br><br>![Azure ML Experiments](../../.attachments/classification-aml-experiments-en.PNG)
+ 2. There, you will get a detailed view of earlier training rounds. <br><br>![Azure ML Experiments](../.attachments/classification-aml-experiments-en.PNG)

3. After the training round has successfully finished, enter the `Models` menu on the left-hand side. There you will find the models which have been successfully trained.

- ![AML Models](../../.attachments/aml-models.PNG)
+ ![AML Models](../.attachments/aml-models.PNG)

4. For every task like classification, NER etc., a separate model is registered. For example, `msforum_de-model-1` stands for a successfully trained classification model, `msforum_de-model-4` for question/answering. Depending on your pre-defined tasks, you should wait until all your models appear here before you proceed with the model deployment.
@@ -5,7 +5,7 @@ The toolkit supports and includes different approaches and frameworks for recogn
- FARM/Transformer Custom NER
- Basic approaches (like regex and term lists)

[[_TOC_]]

The central components can be found in the script `code/ner.py`.
@@ -1,7 +1,7 @@
# Question Answering

This page is devoted to the question-answering component of the NLP toolkit and describes how the answer suggestions are ranked at runtime. Please keep in mind that this component of the toolkit requires a large number of potential answers, trained along with the input texts, in order to provide meaningful suggestions.

[[_TOC_]]

## Ranking Algorithm

The current version of Verseagility supports the Okapi BM25 information retrieval algorithm to sort historical question-answer pairs by relevance. BM25 is a ranking approach used by search engines to estimate the relevance of documents to a given search query. It is implemented using the [gensim library](https://radimrehurek.com/gensim/summarization/bm25.html). The ranking framework is accessed via the file `code/rank.py`.
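To sketch how BM25 ranks candidate answers, here is a self-contained reimplementation of the scoring formula (an illustration only, not the toolkit's actual `code/rank.py` or the gensim implementation):

```python
import math

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document in `corpus` against `query`."""
    n_docs = len(corpus)
    avgdl = sum(len(doc) for doc in corpus) / n_docs
    # document frequency of each term
    df = {}
    for doc in corpus:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    scores = []
    for doc in corpus:
        score = 0.0
        for term in query:
            if term not in df:
                continue
            tf = doc.count(term)
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

# The historical answer sharing terms with the query ranks first
corpus = [["windows", "activation", "error"], ["printer", "driver", "issue"]]
print(bm25_scores(["windows", "activation"], corpus))
```

In the toolkit itself, `corpus` would be the tokenized historical answers and `query` the incoming question; the answers are then returned sorted by descending score.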