Kabir Khan 2019-04-05 12:13:39 -07:00
Parent 3cae5bb603
Commit 0b44c83e71
4 changed files: 265 additions and 3 deletions

README.md

@@ -1,5 +1,267 @@
# Azure Functions Skills Extractor
# Introduction
The Skills Extractor API takes unstructured text and returns a list of
Skills contained in that text.
## What is a Skill?
The taxonomies the API draws on consist primarily of concepts and tools related to technology.
Example Skills:
Machine Learning, Artificial Intelligence, PyTorch, Business, Advertising
## Skill Sources
We pull skills and technologies from many open online sources and build [record linkage](https://github.com/dedupeio/dedupe-examples/tree/master/record_linkage_example) models to conflate skills and categories across the sources into a single [knowledge graph](https://hackernoon.com/wtf-is-a-knowledge-graph-a16603a1a25f).
* [Coursera](https://www.coursera.org/gsi/)
* [Microsoft Academic Graph](https://academic.microsoft.com/topics)
* [LinkedIn Learning](https://www.linkedin.com/learning/me/skills)
* [GitHub Featured Topics](https://github.com/topics)
* [StackShare Tools](https://stackshare.io/categories)
* [Class Central Subjects GitHub](https://github.com/classcentral/online-course-taxonomy)
* [ONET Online Hot Technology Index](https://www.onetonline.org/search/hot_tech/)
* [ACM Classifications](https://dl.acm.org/ccs/ccs_flat.cfm)
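
The idea of conflating the same skill across sources can be illustrated with a toy sketch. This is not the record linkage model used by the project; it only normalizes names to a hyphenated key (similar in form to the `machine-learning` style values in the example response later in this README) and records which sources mention each key. The function names and the sample data are hypothetical.

```python
import re
from collections import defaultdict


def skill_key(name: str) -> str:
    """Normalize a skill name to a lowercase, hyphen-separated key."""
    return re.sub(r"[^a-z0-9+#]+", "-", name.lower()).strip("-")


def conflate(sources: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each normalized key to the set of sources that mention it."""
    merged: dict[str, set[str]] = defaultdict(set)
    for source, names in sources.items():
        for name in names:
            merged[skill_key(name)].add(source)
    return dict(merged)


# Hypothetical sample data, not taken from the real sources.
sources = {
    "coursera": ["Machine Learning", "Deep Learning"],
    "github": ["machine-learning", "PyTorch"],
}
merged = conflate(sources)
print(sorted(merged["machine-learning"]))  # ['coursera', 'github']
```

Exact key matching like this misses synonyms and abbreviations, which is the gap a trained record linkage model is meant to close.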
# Use Cases
* ### Pull out skills from descriptions of open Jobs at your company
* ### Extract skills from Learning Content that your company creates
# Quickstart: Extract Skills for your data in Azure Search using a Custom Cognitive Skill
### **This tutorial assumes you already have an instance of Azure Search running. See the documentation here to get started with Azure Search https://docs.microsoft.com/en-us/azure/search/**
> If you're unfamiliar with Azure Search Cognitive Skills you can read more about them here:
https://docs.microsoft.com/en-us/azure/search/cognitive-search-concept-intro
## Prerequisites
Before running this sample, complete the following:
* Install [Azure Functions Core Tools version 2.x](https://docs.microsoft.com/en-us/azure/azure-functions/functions-run-local#v2).
* Install the [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli). This article requires the Azure CLI version 2.0 or later. Run `az --version` to find the version you have.
You can also use the [Azure Cloud Shell](https://shell.azure.com/bash).
## Deploy Docker Container to Azure Functions using Azure CLI
The first step is to deploy your own instance of the Skills Extractor Azure Function.
You can build and host the container yourself from this repo or use the prebuilt container on Docker Hub: `kabirkhan14/azure-functions-skills-extractor`
> For a more complete walkthrough of deploying Azure Functions, follow the instructions here:
https://docs.microsoft.com/en-us/azure/azure-functions/functions-create-function-linux-custom-image#create-a-resource-group
## Create a resource group
```azurecli-interactive
az group create --name myResourceGroup --location westus
```
## Create an Azure Storage account
```azurecli-interactive
az storage account create --name <storage_name> --location westus --resource-group myResourceGroup --sku Standard_LRS
```
## Create an App Service Plan for Linux
```azurecli-interactive
az appservice plan create --name myAppServicePlan --resource-group myResourceGroup --sku B1 --is-linux
```
## Create and deploy the custom image
The function app hosts the execution of your functions. Create a function app from a Docker Hub image by using the `az functionapp create` command.
```azurecli-interactive
az functionapp create --name <app_name> --storage-account <storage_name> --resource-group myResourceGroup \
--plan myAppServicePlan --deployment-container-image-name kabirkhan14/azure-functions-skills-extractor
```
## Configure the function app
The function needs a connection string to connect to the default storage account. The following commands add the required `AzureWebJobsStorage` setting (along with `AzureWebJobsDashboard`) to your function app settings.
```azurecli-interactive
storageConnectionString=$(az storage account show-connection-string --resource-group myResourceGroup --name <storage_name> --query connectionString --output tsv)
az functionapp config appsettings set --name <app_name> \
  --resource-group myResourceGroup \
  --settings "AzureWebJobsDashboard=$storageConnectionString" \
  "AzureWebJobsStorage=$storageConnectionString"
```
# Create a new Cognitive Skill in Azure Search
> For the full documentation on integrating custom cognitive skills see:
https://docs.microsoft.com/en-us/azure/search/cognitive-search-custom-skill-interface
## Add a `skills` field to your search index
Open your search index in the Azure Portal and add a new field named `skills`.
You can also update your search index through the Azure Search REST API; see the docs here:
https://docs.microsoft.com/en-us/azure/search/search-what-is-an-index
![Skills Field](docs/media/skills_field.jpg)
## Get the Function Url from the Azure Portal
Open the [Azure Portal](https://portal.azure.com) and navigate to the function app you created in the previous step.
Click **Get Function URL**.
![Click the Get Function Url Button](docs/media/get_function_url_btn.jpg)
Then copy the Function URL.
![Copy the Function URL](docs/media/function_url.jpg)
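
Before wiring the function into Azure Search, it can help to call it directly. The sketch below assumes the function accepts and returns the standard Azure Search custom skill envelope (`values`, `recordId`, `data`), with a `text` input and a `skills` output as used in the skillset definition in this README. The helper names are hypothetical, and nothing here is taken from the function's source code.

```python
import json
import urllib.request


def build_skill_request(function_url: str, texts: list[str]) -> urllib.request.Request:
    """Build a POST request in the custom skill format: one record per text."""
    payload = {
        "values": [
            {"recordId": str(i), "data": {"text": text}}
            for i, text in enumerate(texts)
        ]
    }
    return urllib.request.Request(
        function_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_skill_response(body: bytes) -> dict[str, list[str]]:
    """Map each recordId to its extracted skills (assumed under data.skills)."""
    values = json.loads(body)["values"]
    return {v["recordId"]: (v.get("data") or {}).get("skills", []) for v in values}


def extract_skills(function_url: str, texts: list[str]) -> dict[str, list[str]]:
    """Send the texts to the deployed function and return skills per record."""
    request = build_skill_request(function_url, texts)
    with urllib.request.urlopen(request) as response:
        return parse_skill_response(response.read())
```

To try it, call `extract_skills` with the Function URL you copied from the portal and a list of strings; a response keyed by `"0"`, `"1"`, and so on indicates the function is reachable and speaks the expected format.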
## Create a new skillset in Azure Search that points to your function url
Change `"source": "/document/content"` to point at the attribute that holds the most text for each document in your search index.
For example, use `"source": "/document/description"` if each document in your search index has a `description` attribute.
```http
PUT https://[servicename].search.windows.net/skillsets/extractSkills?api-version=2017-11-11-Preview
api-key: [admin key]
Content-Type: application/json
```
```json
{
  "skills": [
    ...,
    {
      "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
      "description": "Extract skills from each document",
      "uri": "[Your function url here]",
      "batchSize": 100,
      "context": "/document",
      "inputs": [
        {
          "name": "text",
          "source": "/document/content"
        }
      ],
      "outputs": [
        {
          "name": "skills",
          "targetName": "skills"
        }
      ]
    }
  ]
}
```
```
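
If you manage several indexes, the skillset body can be generated rather than edited by hand. The following is a minimal sketch that produces the same custom skill entry with a configurable source field and batch size. `build_skillset` and its parameters are hypothetical names, and the sketch emits only the custom skill, without any other skills you may already have in the skillset.

```python
import json


def build_skillset(function_url: str, source_field: str = "content",
                   batch_size: int = 100) -> dict:
    """Return a skillset body containing a single custom Web API skill."""
    return {
        "skills": [
            {
                "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
                "description": "Extract skills from each document",
                "uri": function_url,
                "batchSize": batch_size,
                "context": "/document",
                # Field of each document that is sent to the function as "text".
                "inputs": [{"name": "text", "source": f"/document/{source_field}"}],
                "outputs": [{"name": "skills", "targetName": "skills"}],
            }
        ]
    }


body = build_skillset("[Your function url here]", source_field="description")
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the body of the `PUT` request to the skillsets endpoint.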
## Create a new indexer to map the `skills` output field to a search index
```http
PUT https://[servicename].search.windows.net/indexers/[indexer name]?api-version=2017-11-11-Preview
api-key: [admin key]
Content-Type: application/json
```
```json
{
  "name": "myIndexer",
  "dataSourceName": "myDataSource",
  "targetIndexName": "myIndex",
  "skillsetName": "extractSkills",
  "outputFieldMappings": [
    {
      "sourceFieldName": "/document/skills",
      "targetFieldName": "skills"
    }
  ]
}
```
## Check your search index and you should see a new skills array
```http
GET https://[servicename].search.windows.net/indexes/[index name]/docs?api-version=2017-11-11-Preview&search=machine%20learning
api-key: [admin key]
Content-Type: application/json
```
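
Search terms containing spaces need to be URL-encoded in the query string. As a small sketch, the URL can be assembled with the Python standard library; `build_search_url` and the sample service and index names are hypothetical.

```python
from urllib.parse import quote, urlencode


def build_search_url(service: str, index: str, search: str,
                     api_version: str = "2017-11-11-Preview") -> str:
    """Build the document search URL with a percent-encoded query string."""
    query = urlencode({"api-version": api_version, "search": search}, quote_via=quote)
    return f"https://{service}.search.windows.net/indexes/{quote(index)}/docs?{query}"


print(build_search_url("myservice", "myIndex", "machine learning"))
# https://myservice.search.windows.net/indexes/myIndex/docs?api-version=2017-11-11-Preview&search=machine%20learning
```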
## Real World Example Response for skills extracted from a job description from the Microsoft Careers Page
```json
{
"@odata.context": "https://microsoft-careers.search.windows.net/indexes('[index name]')/$metadata#docs(*)",
"value": [
{
"@search.score": 1.5600916,
"job_id": 588309,
"title": "Applied Machine Learning Developer",
"description": "The Xbox Advanced Technology Group (ATG) provides deep technical consultation and integration services to game developers world-wide as they create cutting-edge experiences for Microsoft game platforms and services. Our team also does advanced research and incubation and we are seeking an Applied Machine Learning Developer to help us bring Machine Learning techniques to gaming. If you have a strong ML/graphics background and can roll up your sleeves to do the engineering work required to ship high quality ML models in production, this is a unique opportunity to tackle challenging problems in the gaming and graphics space! Responsibilities Develop high-performance machine learning systems for gaming ML scenarios and graphics techniques.Develop optimized GPU kernels for deep learning inference via DirectX and DirectML and other graphics technologies.Do analysis on performance on internal and 3PP machine learning models and propose improvements and optimizations.Develop infrastructure as required to enable high performance ML in gaming.Act as an expert in applied machine learning to serve the needs of Game Developers Qualifications Required Qualifications: Masters or PhD or equivalent experience in relevant discipline (Machine Learning, Computer Science, Engineering)3+ years of relevant work experienceExperience in GPU programming (DirectX, Vulkan, OpenCL, etc) and performance modelling, profiling and optimization. Preferred Qualifications:Game Development experience - one or more shipped titles.Experience with Azure cloud services.Experience with Mobile/Tablet machine learning developmentMicrosoft is an equal opportunity employer. 
All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form. Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work. XGAXGCATG#gamingjobs",
"category": "Engineering",
"subCategory": "Software Engineering",
"country": "United States",
"state": "Washington",
"id": "513c8222-8004-d30b-8993-4c759fb05c9b",
"skills": [
"machine-learning",
"engineering",
"deep-learning",
"inference",
"computer-science",
"game-development",
"azure",
"mobile"
]
}
...
]
}
```
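
Once documents carry a `skills` array, a common next step is to summarize skills across search results, for example to see which skills appear most often in open job postings. A minimal sketch, assuming the response shape shown above; the sample response is trimmed and partly hypothetical.

```python
from collections import Counter


def count_skills(search_response: dict) -> Counter:
    """Count how many documents in a search response mention each skill."""
    counts: Counter = Counter()
    for doc in search_response.get("value", []):
        counts.update(doc.get("skills") or [])
    return counts


# Hypothetical, trimmed response with the same shape as the example above.
response = {
    "value": [
        {"job_id": 588309, "skills": ["machine-learning", "azure"]},
        {"job_id": 1, "skills": ["azure"]},
    ]
}
print(count_skills(response).most_common(1))  # [('azure', 2)]
```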
## Troubleshooting
* ### If you get timeout errors while running the indexer in Azure Search, change `batchSize` in the skillset definition to a lower number (e.g. 50)
For example:
```http
PUT https://[servicename].search.windows.net/skillsets/extractSkills?api-version=2017-11-11-Preview
api-key: [admin key]
Content-Type: application/json
```
```json
{
  "skills": [
    ...,
    {
      "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
      "description": "Extract skills from each document",
      "uri": "[Your function url here]",
      "batchSize": 50,
      "context": "/document",
      "inputs": [
        {
          "name": "text",
          "source": "/document/content"
        }
      ],
      "outputs": [
        {
          "name": "skills",
          "targetName": "skills"
        }
      ]
    }
  ]
}
```
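
To see why a lower `batchSize` helps, recall that it caps how many records are sent to the function in a single call, so each call has less text to process before the timeout. The sketch below is a simplified model of that grouping, not Azure Search's actual implementation; the record contents are placeholders.

```python
from typing import Iterator


def batches(records: list[dict], batch_size: int) -> Iterator[list[dict]]:
    """Yield consecutive groups of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


# 120 placeholder records in the custom skill record format.
records = [{"recordId": str(i), "data": {"text": "..."}} for i in range(120)]
print([len(b) for b in batches(records, 50)])  # [50, 50, 20]
```

With `batchSize` set to 100, the same 120 records would arrive as two calls of 100 and 20 records, so each call would carry up to twice as much text.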

Binary files added (not shown):
docs/media/function_url.jpg (22 KiB)
docs/media/get_function_url_btn.jpg (7.2 KiB)
docs/media/skills_field.jpg (9.5 KiB)