Conversational AI (CAI) Advanced Processing Service
CAI Advanced Processing Service is a collection of modules, wrapped in multiple APIs, that helps you enrich your conversational AI applications in these three fields:
- Validation
- Identification
- Authentication
| | Validation | Identification | Authentication |
|---|---|---|---|
| Definition | Entity extraction + data preparation for backend processing | Identify a user or asset | Identify a user or asset |
| Example | - License plate format is valid<br>- Customer number format is valid | - License plate is known in backend system<br>- Customer number exists in customer database | - Combination of validated entities from user input which are verified with authentication database |
| Technological approaches | - Via pre/post processing functions based on entities (speech + text input) | - Via pre/post processing functions based on entities (speech + text input) | - Via Authentication processing functions based on entities (speech input)<br>- Via OAuth/Identity Provider validated login (e.g. AAD, etc.) |
General approach and goals
The main field of use is intelligent applications with text and speech input, such as chatbots or voice bots.
- Provide a modular and extendable pre/post processing service
- Support more flexible user input options in conversational scenarios
- Enable extended, context-based understanding of user input
- Take a channel-based approach where appropriate
- Process/UX Flow Best Practices for common scenarios
High Level Architecture
The image and description below define the general architecture of the VIA (Validation, Identification, Authentication) system with its respective layers, which are separated into API Layer, Module Layer and Import Layer.
API Layer
- Contains function-specific logic (e.g. business rules like IBAN structure) and the HTTP request and response handling when using an HTTP-triggered function.
Module Layer
- Contains reusable processing logic (e.g. Spelling Resolver). A module can be based on Python classes or functions.
Import Layer
- Stores data assets required for the processing (e.g. spelling dictionaries, spelling rules, address table). These can also be split by languages, if support for multiple languages is required.
Tech Stack
The section below describes the tech stack on which the API collection is built.
Azure Functions
The basis of the API collection is an Azure Functions component, a serverless infrastructure type offered on Microsoft Azure. It acts as a web service and can be triggered as a REST API. Azure Functions is available for multiple languages such as C#, JavaScript and Python; in this case we use Python and recommend the Python 3.8 runtime. The minimum scale level should be either an App Service plan or, ideally, a Premium plan. The Consumption (serverless) plan is not recommended due to its lower scalability level, although it is sufficient for testing purposes. The description of the respective plans can be found here. Depending on your scale, a Function, a storage account and an App Service Plan are deployed in your subscription when the resource is initially created. Further, we prefer the code-based version over a custom Docker container (see differences in the Azure Functions documentation).
Our recommended setup can be deployed to your subscription using the following template:
Python
We recommend using Python >= 3.7. On top of the Python base installation, some further packages are required for the API collection. These are listed in `requirements.txt` with their respective version numbers. When the service is deployed, this file is transferred automatically and used to install the packages.
Other than that, we leverage spaCy and Azure Table Storage as additional frameworks and components.
Module Definition
The individual modules that are used across multiple APIs are all stored in the subfolder `modules` and are described below. A module may itself use another module when it is accessed by an API.
license_plate_recognizer.py
Functionality:
- Extracts, resolves and validates car license plates
- Easily extendible to support more languages by adding word dictionaries in pre-defined structures
Dependencies:
- External services or libraries:
- Language Understanding Service (LUIS) for extracting license plates from a string
- Modules
resolve_spelling.py
pattern_matcher.py
Functionality:
- Matching of known string patterns to an input string
Dependencies:
- External services or libraries:
- spaCy; the required named entity recognition files/models are stored in the `assets/` folder
request_table.py
Functionality:
- Request or push data records from/to an Azure Table Storage
Dependencies:
- External services or libraries:
- Azure Table Storage account on Azure
resolve_spelling.py
Functionality:
- Advanced text cleaning
- Resolves spelling alphabets in a string as well as letter/number multiplications
Dependencies:
- External services or libraries:
- For the email validator service, Language Understanding Service (LUIS) is required for extracting email entities from a string
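The idea behind resolving spelling alphabets and multiplications can be illustrated with a minimal sketch. It is not the implementation in resolve_spelling.py: the dictionary entries, the multiplier words and the function name `resolve_spelling` are assumptions chosen for illustration only.

```python
import re

# Hypothetical excerpt of a German spelling alphabet (not the shipped dictionary)
SPELLING_ALPHABET = {"anton": "a", "berta": "b", "dora": "d", "heinrich": "h", "karl": "k"}
# Hypothetical multiplier words and their repeat counts
MULTIPLIERS = {"doppel": 2, "zweimal": 2, "dreimal": 3}


def resolve_spelling(text: str) -> str:
    """Replace spelling-alphabet words by their letter and expand multiplications."""
    # Basic cleaning: lower-case and collapse whitespace
    cleaned = re.sub(r"\s+", " ", text.lower()).strip()
    # "a wie anton" -> "a"
    cleaned = re.sub(r"\b(\w) wie \w+", r"\1", cleaned)
    resolved, repeat = [], 1
    for token in cleaned.split(" "):
        if token in MULTIPLIERS:
            # Remember the factor and apply it to the following token
            repeat = MULTIPLIERS[token]
            continue
        value = SPELLING_ALPHABET.get(token, token)
        resolved.append(value * repeat)
        repeat = 1
    return " ".join(resolved)


print(resolve_spelling("karl heinrich 33 22"))
print(resolve_spelling("a wie anton dora 22"))
```

Keeping the dictionaries separate from the function mirrors the import layer described above: supporting another language would only require another dictionary.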
similarity_score.py
Functionality:
- Calculate similarity of an input string compared to a ground truth string
- Provides different levels of similarity, ranging from exact match to Levenshtein distance and phonetic matching
Dependencies:
- None
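To illustrate the similarity levels, here is a minimal sketch of a Levenshtein distance combined with an exact-match check. The function names, the level names `exact`, `fuzzy` and `none`, and the `max_distance` threshold are assumptions for illustration and not taken from similarity_score.py. Phonetic matching is left out.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity_level(value: str, truth: str, max_distance: int = 1) -> str:
    """Return 'exact', 'fuzzy' or 'none' (hypothetical level names)."""
    value, truth = value.strip().lower(), truth.strip().lower()
    if value == truth:
        return "exact"
    if levenshtein(value, truth) <= max_distance:
        return "fuzzy"
    return "none"


print(similarity_level("Nadela", "Nadella"))
```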
API Documentation
The following section describes the implementation of the services provided as APIs. They can be seen as wrappers around the individual modules described above and each include a component for accepting a request, supplementary business logic around the modules, and returning the results as structured JSON.
Health Check API
Check Health Status of Azure Function, e.g. used by Application Insights or your monitoring system.
URL : /HealthCheck/
Method : GET / POST
Auth required : Only deployed version, if authentication is activated (strongly recommended)
Permissions required : None
Data constraints None, no request parameter or bodies are required
Header constraints API key may be passed via header
Success Response
HTTP-response as text: Healthcheck executed successfully.
Error Response
No response
License Plate Recognizer API
URL : /LicensePlateRecognizer/
Method : GET / POST
Auth required : Only deployed version, if authentication is activated (strongly recommended)
Permissions required : LUIS app information and keys, see Get Your Keys for instructions
Data constraints
{
"query": "[0-500 chars]",
"locale": "[2-character language code, e.g. de, en, es (cut off after two characters)]",
"region": "[2-character country code, e.g. de (cut off after two characters)]"
}
Note that `locale` stands for language and `region` for country. Both values are optional; if not passed, `de` is set as the default for each.
Example: `de` for German license plate recognizer business logic, `en` if the input language is English.
Header constraints API key may be passed via header
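A request with this body and an API key header could be built as in the following sketch. The base URL assumes the default local endpoint of the Azure Functions Core Tools (`http://localhost:7071/api`), `x-functions-key` is the standard Azure Functions key header, and the helper name `build_request` and the placeholder key are hypothetical. The request is only created, not sent.

```python
import json
import urllib.request
from typing import Optional

# Assumption: default local host and route prefix of the Azure Functions Core Tools
BASE_URL = "http://localhost:7071/api"


def build_request(endpoint: str, payload: dict, api_key: Optional[str] = None) -> urllib.request.Request:
    """Create (but do not send) a POST request for one of the APIs."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Function keys can be passed in this header instead of the query string
        headers["x-functions-key"] = api_key
    return urllib.request.Request(
        url=f"{BASE_URL}/{endpoint.strip('/')}",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_request(
    "LicensePlateRecognizer",
    {"query": "das ist stuttgart a wie anton dora 22", "locale": "de", "region": "de"},
    api_key="<your-function-key>",  # placeholder, replace with your own key
)
# Send it with urllib.request.urlopen(request) once the function host is running.
```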
Success Responses
Condition : Data provided, correct app information set and LUIS information is valid.
Code : 200 OK
Content example : Response will reflect back the input sentence, the extracted entity from LUIS and the resolved license plate.
# Successfully extracted license plate
{
"id": 1,
"query": "das ist stuttgart a wie anton dora 22",
"cplQuery": "das ist S-AD22",
"cplEntities": [
{
"entity": "S-AD22",
"type": "licensePlate",
"entitySplit": {
"fullAdminDistrict": "stuttgart",
"adminDistrict": "s",
"letterCombination": "a d",
"numberCombination": "22",
"extra": "",
"ambiguous": false
}
}
],
"entities": [
{
"type": "platenumber",
"text": "stuttgart a wie anton dora 22",
"startIndex": 8,
"length": 29,
"score": 0.99148196,
"modelTypeId": 1,
"modelType": "Entity Extractor",
"recognitionSources": [
"model"
]
}
],
"topScoringIntent": "LicensePlate",
"logs": [
"[INFO] - Set params -> region: de, language: de."
]
}
# If no entity could be extracted
{
"id": 1,
"query": "puh das hab ich gerade nicht zur hand",
"cplQuery": "puh das hab ich gerade nicht zur hand",
"cplEntities": [],
"entities": {},
"topScoringIntent": "None",
"logs": [
"[INFO] - Set params -> region: de, language: de.",
"[WARNING] - No entity could be extracted"
]
}
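A client can read the resolved plates from the `cplEntities` array. The sketch below assumes the response has already been parsed into a dictionary; the helper name `extract_license_plates` and the decision to skip entries flagged as `ambiguous` are assumptions, not part of the service.

```python
from typing import List


def extract_license_plates(response: dict) -> List[str]:
    """Return all unambiguous resolved license plates from a response dictionary."""
    return [
        item["entity"]
        for item in response.get("cplEntities", [])
        if item.get("type") == "licensePlate"
        and not item.get("entitySplit", {}).get("ambiguous", False)
    ]


# Shortened versions of the two example responses above
success = {
    "cplQuery": "das ist S-AD22",
    "cplEntities": [
        {"entity": "S-AD22", "type": "licensePlate", "entitySplit": {"ambiguous": False}}
    ],
}
no_entity = {"cplQuery": "puh das hab ich gerade nicht zur hand", "cplEntities": []}

print(extract_license_plates(success))
print(extract_license_plates(no_entity))
```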
Error Responses
Condition : If provided data is invalid, e.g. locale/region not supported.
Code : 400 BAD REQUEST
Content example :
[ERROR] Locale not supported
Condition : If no query string (utterance from conversation, which may include a license plate) has been passed.
Code : 400 BAD REQUEST
Content example :
[ERROR] Received a blank request. Please pass a value using the defined format. Example: {'query':'AB C 1234'}
Spelling Resolver API
URL : /SpellingResolver/
Method : GET / POST
Auth required : Only deployed version, if authentication is activated (strongly recommended)
Permissions required : None
Data constraints
{
"query": "[0-500 chars]",
"locale": "[2-character language code, e.g. de, en, es (cut off after two characters)]",
"convertnumbers": "[true/false, default is true]",
"convertsymbols": "[true/false, default is true]",
"convertmultiplications": "[true/false, default is true]",
"additional_symbols": "[a dictionary, example: {'at':'@', 'dash': '-'}, default: {}]",
"allowed_symbols": "[a list, example: ['_', '-', '@', '.'], default: []]",
"extra_specials": "[a dictionary, example: {'werden':'verden'}, default: {}]",
"extra_spelling_alphabet": "[a dictionary, example: {'daimler':'d'}, default: null]"
}
Note that `locale` stands for language. It is optional; if not passed, `de` is set by default.
Example: `de` if the input language is German, `en` if the input language is English.
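The locale handling described here (default value, cut-off after two characters, rejection of unsupported values) could look like the following sketch. The set of supported locales, the lower-casing and the function name `normalize_locale` are assumptions for illustration; the error text is the one documented under Error Responses.

```python
from typing import Optional

SUPPORTED_LOCALES = {"de", "en"}  # assumption: adjust to the languages you actually support
DEFAULT_LOCALE = "de"


def normalize_locale(value: Optional[str]) -> str:
    """Apply the default and cut the value off after two characters."""
    if not value:
        return DEFAULT_LOCALE
    locale = value.strip().lower()[:2]
    if locale not in SUPPORTED_LOCALES:
        # The API layer would translate this into a 400 BAD REQUEST response
        raise ValueError("[ERROR] Received request with invalid or not supported locale.")
    return locale


print(normalize_locale(None))
print(normalize_locale("en-US"))
```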
Header constraints API key may be passed via header
Success Responses
Condition : Data provided and a supported locale is set.
Code : 200 OK
Content example : Response will reflect back the input sentence, as well as the resolved content.
{
"original": "karl heinrich 33 22",
"resolved": "k h 33 22",
"resolved_nospace": "kh3322",
"first_letters": "k h 33 22",
"first_letters_nospace": "kh3322"
}
Error Responses
Condition : If provided data is invalid, e.g. locale/region not supported.
Code : 400 BAD REQUEST
Content example :
[ERROR] Received request with invalid or not supported locale.
Condition : If no query string (utterance from conversation, which may include a license plate) has been passed.
Code : 400 BAD REQUEST
Content example :
[ERROR] Received a blank request. Please pass a value using the defined format. Example: {'query':'karl heinrich 33 22'}
VIN Resolver API
URL : /VINResolver/
Method : GET / POST
Auth required : Only deployed version, if authentication is activated (strongly recommended)
Permissions required : LUIS app information and keys, see Get Your Keys for instructions
Data constraints
{
"query": "[0-500 chars]",
"expectedwmi": ["WMI","2WM"],
"locale": "[2-character language code, e.g. de, en, es (cut off after two characters)]"
}
Note that `locale` stands for language. It is optional; if not passed, `de` is set by default.
Header constraints API key may be passed via header
Success Responses
Condition : Data provided, correct app information set and LUIS information is valid.
Code : 200 OK
Content example : Response will reflect back the input sentence, the extracted entity from LUIS, the information on whether the WMI is matched with the `expectedwmi` list, whether the VIN is valid based on validation rules, and details on the VIN.
# Successful evaluation
{
"query": "das ist 2WMCGH3B2CES5C8T2",
"vinQuery": "2wmcgh3b2ces5c8t2",
"validvin": false,
"expectedwmi": true,
"vindetails": {
"region": "north_america",
"country": "Canada",
"validvin": false,
"year": 1982,
"make": "Western Star",
"manufacturer": "Western Star",
"is_pre_2010": true,
"wmi": "2WM",
"vds": "CGH3B2",
"vis": "CES5C8T2",
"vsn": "S5C8T2",
"less_than_500_built_per_year": false
},
"entities": {
"vin": [
"2WMCGH3B2CES5C8T2"
],
"$instance": {
"vin": [
{
"type": "vin",
"text": "2WMCGH3B2CES5C8T2",
"startIndex": 8,
"length": 17,
"score": 0.99916714,
"modelTypeId": 1,
"modelType": "Entity Extractor",
"recognitionSources": [
"model"
]
}
]
}
},
"topScoringIntent": "VINResolver"
}
"entities" returns LUIS response.
# If no entity could be extracted
{
"query": "das ist ",
"validvin": false,
"vindetails": {},
"entities": {},
"message": [
"[WARNING] - No entity could be extracted"
]
}
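The `wmi`, `vds` and `vis` fields follow the standard VIN layout (characters 1-3, 4-9 and 10-17). One common validation rule is the check digit at position 9, which is mandatory for North American VINs. The sketch below implements only that rule and the split; it is an assumption that the service applies comparable rules, and the function names are hypothetical.

```python
# Standard transliteration of VIN characters to numbers (I, O and Q are not allowed)
TRANSLITERATION = {
    **{str(d): d for d in range(10)},
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
    "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
    "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9,
}
# Position weights; the check digit itself (position 9) has weight 0
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]


def split_vin(vin: str) -> dict:
    """Split a VIN into manufacturer, descriptor and identifier sections."""
    vin = vin.upper()
    return {"wmi": vin[:3], "vds": vin[3:9], "vis": vin[9:]}


def has_valid_check_digit(vin: str) -> bool:
    """Check the position-9 check digit (weighted sum modulo 11, 10 is written as X)."""
    vin = vin.upper()
    if len(vin) != 17 or any(char not in TRANSLITERATION for char in vin):
        return False
    total = sum(TRANSLITERATION[char] * weight for char, weight in zip(vin, WEIGHTS))
    remainder = total % 11
    expected = "X" if remainder == 10 else str(remainder)
    return vin[8] == expected


print(split_vin("2WMCGH3B2CES5C8T2"))
print(has_valid_check_digit("2WMCGH3B2CES5C8T2"))
```

For the VIN from the example response, the check digit rule fails, which is consistent with `"validvin": false`.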
Error Responses
Condition : If provided data is invalid, e.g. locale not supported.
Code : 400 BAD REQUEST
Content example :
[ERROR] Locale not supported
Condition : If no query string (utterance from conversation, which may include a VIN) has been passed.
Code : 400 BAD REQUEST
Content example :
[ERROR] Received a blank request. Please pass a value using the defined format. Example: {'query':'das ist 2A4GM684X6R632476', 'expectedwmi': ['WDC'],'locale': 'de'}
AttributeValidator API
URL : /AttributeValidator/
Method : GET / POST
Auth required : Only deployed version, if authentication is activated (strongly recommended)
Permissions required : for email validator, LUIS app information and keys, see Get Your Keys for instructions
Data constraints AttributeValidator API supports the following attributes:
- address
- street_in_city
- zip
- iban
- email
The request body for AttributeValidator differs depending on the attribute.
- For the address attribute, the request body is:
{
"region": "[2-character country code, e.g. de (cut off after two characters)]",
"module": "address",
"values": {
"zip": "99999",
"city": "berlin",
"street": "jordanstrasse",
"number": 10
}
}
- For the street_in_city attribute, the request body is (checks street in ZIP):
{
"region": "[2-character country code, e.g. de (cut off after two characters)]",
"module": "street_in_city",
"values": {
"zip": "99999",
"city": "berlin",
"street": "jordanstrasse",
"number": 10
}
}
- For the zip attribute, the request body is:
{
"region": "[2-character country code, e.g. de (cut off after two characters)]",
"module": "zip",
"values": {
"zip": "99999",
"city": "duesseldorf "
}
}
- For the iban attribute, the request body is (checks IBAN):
{
"region": "[2-character country code, e.g. de (cut off after two characters)]",
"module": "iban",
"values": {
"iban": "DE12345678901234567890"
}
}
- For the email attribute, the request body is (checks email):
{
"region": "[2-character country code, e.g. de (cut off after two characters)]",
"locale": "[2-character language code, e.g. de, en, es (cut off after two characters)]",
"module": "email",
"values": {
"email": "max.mustermann@example.com"
}
}
Note that `locale` stands for language. It is optional; if not passed, `de` is set by default.
Header constraints API key may be passed via header
Success Responses
Condition : Data provided, correct app information set and LUIS information is valid.
Code : 200 OK
Content example :
- For address validation, the response will be:
{
"error": false,
"city_is_valid": true,
"zip": "10115",
"city": "Berlin",
"street_is_valid": true,
"street_has_options": false,
"street": "Bergstr.",
"number": 10
}
- For street_in_city validation, the response will be:
{
"error": false,
"is_valid": true,
"has_options": false,
"street": "Bergstr.",
"number": "10"
}
- For zip validation, the response will be:
{
"error": false,
"is_valid": true,
"zip": "10115",
"city": "Berlin"
}
- For iban validation, the response will be:
{
"error": false,
"is_valid": true,
"iban": "DE02120300000000202051"
}
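As an illustration of the kind of business rule behind the iban attribute, the following sketch checks the country-specific length and the standard mod-97 checksum. It is not the validator's actual code: the function name `is_valid_iban` is hypothetical and the length table contains only Germany as an assumption.

```python
# Assumption: only the German IBAN length is listed; other countries need their own entry
IBAN_LENGTHS = {"DE": 22}


def is_valid_iban(iban: str) -> bool:
    """Check country-specific length and the mod-97 checksum of an IBAN."""
    iban = iban.replace(" ", "").upper()
    if not (iban.isascii() and iban.isalnum()):
        return False
    if IBAN_LENGTHS.get(iban[:2]) != len(iban):
        return False
    # Move country code and check digits to the end
    rearranged = iban[4:] + iban[:4]
    # Replace letters by numbers: A -> 10, B -> 11, ..., Z -> 35
    digits = "".join(str(int(char, 36)) for char in rearranged)
    return int(digits) % 97 == 1


print(is_valid_iban("DE02120300000000202051"))
```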
- For email validation, the response will be:
{
"query": "Normen.meyer@daimler.com",
"e-mail recognized": true,
"e-mail": "normen.meyer@daimler.com",
"entities": {
"company_name": [
"Normen.meyer@daimler.com"
],
"email_spelled": [
"Normen.meyer@daimler.com"
],
"email": [
"normen.meyer@daimler.com"
],
"domain": [
"daimler.com"
],
"$instance": {
"company_name": [
{
"type": "company_name",
"text": "Normen.meyer@daimler.com",
"startIndex": 0,
"length": 24,
"score": 0.46415412,
"modelTypeId": 1,
"modelType": "Entity Extractor",
"recognitionSources": [
"model"
]
}
],
"email_spelled": [
{
"type": "email_spelled",
"text": "Normen.meyer@daimler.com",
"startIndex": 0,
"length": 24,
"score": 0.76997584,
"modelTypeId": 1,
"modelType": "Entity Extractor",
"recognitionSources": [
"model"
]
}
],
"email": [
{
"type": "builtin.email",
"text": "Normen.meyer@daimler.com",
"startIndex": 0,
"length": 24,
"modelTypeId": 2,
"modelType": "Prebuilt Entity Extractor",
"recognitionSources": [
"model"
]
}
],
"domain": [
{
"type": "domain",
"text": "daimler.com",
"startIndex": 13,
"length": 11,
"score": 0.9899364,
"modelTypeId": 1,
"modelType": "Entity Extractor",
"recognitionSources": [
"model"
]
}
]
}
},
"topScoringIntent": "GetEntities"
}
"entities" returns LUIS response.
# If no entity could be extracted
{
"query": "mein email",
"e-mail recognized": false,
"e-mail": "",
"entities": {},
"topScoringIntent": "GetEntities"
}
Error Responses
Condition : If provided data is invalid, e.g. region not supported.
Code : 200 OK
Content example :
[ERROR] Locale not supported
{
"error": true,
"is_valid": false,
"error_message": "Locale US not supported."
}
Condition : If the submitted value fails validation, e.g. an IBAN with an invalid length.
Code : 200 OK
Content example :
{
"error": false,
"error_message": "Submitted IBAN is not a valid IBAN for DE with length of 22",
"is_valid": false
}
Table Requestor API
URL : /TableRequestor/
Method : GET / POST
Auth required : Only deployed version, if authentication is activated (strongly recommended)
Permissions required : Table storage connection string, see Get Your Keys for instructions
Data constraints
The request structure in the `params` section depends on the structure of your data in the table storage. The entire, unfiltered data set can be requested by passing only the `PartitionKey` in the request's `params` section.
{
"table": {
"name": "[0-100 chars]"
},
"params": {
"PartitionKey": "[name of partition-key]",
"Lastname": "[example, depends on the column names in your table storage]"
}
}
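Azure Table Storage queries are filtered with OData expressions, so the `params` section can be thought of as a set of equality conditions. The sketch below shows one way to turn such a dictionary into a filter string. The helper name `build_table_filter` is hypothetical, only string values are handled, and request_table.py may build its queries differently.

```python
def build_table_filter(params: dict) -> str:
    """Combine column/value pairs into an OData filter string for string columns."""
    clauses = []
    for column, value in params.items():
        # OData escapes a single quote inside a string literal by doubling it
        escaped = str(value).replace("'", "''")
        clauses.append(f"{column} eq '{escaped}'")
    return " and ".join(clauses)


print(build_table_filter({"PartitionKey": "CustomerData", "LastName": "Nadella"}))
```

Passing only the `PartitionKey` yields a single condition, which corresponds to requesting the entire partition.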
Header constraints API key may be passed via header
Success Responses
Condition : Data provided, successful table storage authentication and valid table name.
Code : 200 OK
Content example : Response will reflect back an array of results that match the filter parameters.
{
"table": {
"name": "CustomerData"
},
"params": {
"PartitionKey": "CustomerData",
"LastName": "Nadella"
}
}
Error Responses
Condition : If connection data is not valid or table does not exist.
Code : 400 BAD REQUEST
Content example :
[ERROR] - Connection to table storage could not be established, please verify the connection string and table name.
Condition : If empty or invalid request body has been passed.
Code : 400 BAD REQUEST
Content example :
[ERROR] - Pass a table and set of variables you want to look up in the customer data base, for example:
{'table': {'name': 'UserData'}, 'params': {'PartitionKey': 'UserData'}}.
Authentication API
URL : /Authentication/
Method : GET / POST
Auth required : Only deployed version, if authentication is activated (strongly recommended)
Permissions required : Table storage connection string, see Get Your Keys for instructions
Data constraints
The request structure in the `attributes` section depends on the structure of your data in the table storage and the manifest definition for the authentication. The `method` field controls whether Levenshtein and/or phonetic matching is used. The `verbose` setting allows for a more detailed debugging output but should be set to `false` in production.
{
"attributes":
{
"Firstname": "Satya",
"Lastname": "Nadella",
"Birthdate": "1900-02-03",
"Id": "1234"
},
"method": 4,
"verbose": true,
"region": "de",
"locale": "de"
}
Header constraints API key may be passed via header
Success Responses
Condition : Data provided, successful table storage authentication and valid table name.
Code : 200 OK
Content example : Response will return the authentication result and, if `verbose` is set to true, the match result for each attribute.
{
"result": {
"authenticated": false
},
"verbose": {
"attributes": {
"Birthdate": false,
"Firstname": true,
"Lastname": true
},
"method": 4
}
}
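The response shape can be reproduced with a small sketch that compares submitted attributes against a stored record. It assumes that a user counts as authenticated only when every submitted attribute matches, which fits the example above but is not confirmed by the source. The function names, the pluggable `matcher` and the sample record are hypothetical.

```python
from typing import Callable, Dict


def exact_match(submitted: str, stored: str) -> bool:
    """Case-insensitive exact comparison; a fuzzy matcher could be passed instead."""
    return submitted.strip().lower() == stored.strip().lower()


def authenticate(submitted: Dict[str, str], stored: Dict[str, str],
                 matcher: Callable[[str, str], bool] = exact_match,
                 verbose: bool = False) -> dict:
    """Compare each submitted attribute with the stored record."""
    checks = {
        name: name in stored and matcher(value, stored[name])
        for name, value in submitted.items()
    }
    # Assumption: all submitted attributes have to match
    response = {"result": {"authenticated": bool(checks) and all(checks.values())}}
    if verbose:
        response["verbose"] = {"attributes": checks}
    return response


stored_record = {"Firstname": "Satya", "Lastname": "Nadella", "Birthdate": "1900-01-01"}
submitted = {"Firstname": "Satya", "Lastname": "Nadella", "Birthdate": "1900-02-03"}
print(authenticate(submitted, stored_record, verbose=True))
```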
Error Responses
Condition : If connection data is not valid or table does not exist.
Code : 400 BAD REQUEST
Content example :
[ERROR] - Connection to table storage could not be established, please verify the connection string and table name.
Condition : If empty or invalid request body has been passed.
Code : 400 BAD REQUEST
Content example :
[ERROR] - Pass a table and set of variables you want to look up in the customer data base, for example:
{'table': {'name': 'UserData'}, 'params': {'PartitionKey': 'UserData'}}.
Operations
The section below describes the frameworks to be installed locally before you can get started testing, debugging and deploying the service.
Local Installation
First, you have to install and set up the following components:
- PowerShell
  - Azure Command Line Interface (CLI), command line tools for Azure using PowerShell
  - Azure Functions Core Tools, download for your local runtime environment, e.g. as `.exe` -> v3.x: Windows 64-Bit
  - A restart is highly recommended or even required after installing these components, otherwise you might face some hiccups.
- Python >= 3.7
  - We recommend using the official version from the Python website. Make sure you install `pip` and set Python as `path` variable during the installation process.
- Postman
  - Framework for API testing, download it here and install it.
Testing and Debugging
- Get your code from GitHub: `git clone https://github.com/microsoft/looky` and `cd` into the environment
- Create a virtual environment: `python -m venv .venv`
- Activate the virtual environment: `source .venv/bin/activate` (Linux) or `.venv\Scripts\activate` (Windows), type `deactivate` to disable it again if needed
- Install the requirements: `pip install -r requirements.txt`
- Set your keys (only for local development and debugging) in the `config.ini` (they are needed for the LUIS request)
- For debugging and local testing, open a separate PowerShell window and execute `func start --verbose` in the root folder of the function. This enables you to make code changes during runtime without shutting down the function completely when there is an issue
- Use Postman for testing the endpoints using the localhost request of this collection
Deployment to Azure
- Open your PowerShell
- Activate your environment, if you haven't before: `source .venv/bin/activate` (Linux) or `.venv/Scripts/activate` (Windows)
- Log in to your Azure account: `az login` (a browser window will open, where you may have to log on to Azure)
- Execute the command below: `func azure functionapp publish [insert your function name] --build remote`
- Wait until the deployment is finished
- (optional, only has to be done for initial deployment OR when settings are updated) Execute the following command: `az webapp config appsettings set -g [insert name of resource group] -n [insert your function name] --settings @appsettings.json`
- Use Postman for testing the endpoints with the collections
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.