* update nlr_versions.json for new models

* initialization

* documentation .1

* documentation .2

* documentation .3

* documentation .3

* interactive documentation .5

* interactive documentation .6

* documentation .7

* interactive documentation .8

* schoolnavigator data

* Revert "schoolnavigator data"

This reverts commit 709fb56ba5.

* some report organization

* edit docs

* Delete evaluation-sessions.PNG

* Update BFOrchestratorUsage.md

* moving orchestrator docs to sdk

* fix link to readme refs

* add screen snapshots for the report document

* TDB documented

* thresholds

* Update DispatchMigrationExample.md

Co-authored-by: nigao <nigao@microsoft.com>
Co-authored-by: Eyal Schwartz <eyals@microsoft.com>
Co-authored-by: Tien Suwandy <tiens@microsoft.com>
This commit is contained in:
Hung-chih Yang 2020-11-12 12:01:20 -08:00 committed by GitHub
Parent 16773c9b6c
Commit 856c8db6c8
No key matching this signature was found
GPG key ID: 4AEE18F83AFDEB23
22 changed files: 1139 additions and 93 deletions

View file

@@ -1,20 +1,177 @@
# Orchestrator (PREVIEW)
Conversational AI applications today are built using disparate technologies to fulfill language understanding (LU) needs, e.g. [LUIS][1] and [QnA Maker][2]. Often, conversational AI applications are also built by assembling different [skills][3], each of which fulfills a specific conversation topic and can be built using different LU technologies. Hence, conversational AI applications typically require LU arbitration/decision making to route incoming user requests to an appropriate skill or to dispatch to a specific sub-component. Orchestration refers to the ability to perform this LU arbitration/decision making for a conversational AI application.
[Orchestrator][18] is a [transformer][4]-based solution that is optimized for conversational AI applications. It is built from the ground up to run locally with your bot.
## Scenarios
**Dispatch**: Orchestrator is a successor to [dispatch][5]. You can use Orchestrator instead of the current dispatch solution to arbitrate across your [LUIS][1] and [QnA Maker][2] applications. With Orchestrator, you are likely to see:
- Improved classification accuracy
- Higher resilience to data imbalance across your LUIS and QnA Maker authoring data.
- Ability to correctly dispatch from relatively little authoring data.
**Intent Recognizer**: You can use Orchestrator as an intent recognizer with [Adaptive dialogs][6], using the same approach as in the dispatch scenario above to route to responses within your bot logic.
**Entity Extraction** is not supported yet; it is on the roadmap for a future release.
## Authoring Experience
Orchestrator can be used in different development environments:
* **Code First**: Orchestrator can be integrated into your code project by replacing LUIS for intent recognition such as for skill delegation or dispatching to subsequent language understanding services. See [Runtime Integration](#runtime-integration) section for more.
* **[Bot Framework Composer][19]**: Orchestrator can be selected as a recognizer within Bot Framework Composer (currently only when the corresponding feature flag is enabled in Composer). At this point there are limitations to using Orchestrator in Composer, primarily around importing existing models and tuning up recognition performance.
Thus, use of the [BF command line tool][7] to prepare and optimize the model for your domain is required in most, if not all, use cases. To illustrate the workflow, here is a sample of the end-to-end authoring experience:
<p align="center">
<img width="350" src="./docs/media/authoring.png" />
</p>
### Prepare
* Pre-requisite: Install [BF CLI Orchestrator plugin][11] first.
1. Author an intent/utterance example-based .lu definition, referred to as a *label file*, following the practices described in [Language Understanding][2] for dispatch (e.g. author a .lu file directly or within the [Composer][3] GUI experience).
* Alternatively, [export][8] your LUIS application and [convert][9] to .lu format or [export][10] your QnA Maker KB to .qna format.
* See also the [.lu file format][21] to author a .lu file from scratch.
2. Download a Natural Language Representation ([NLR][20]) base model (referred to as the *basemodel*) using the `bf orchestrator:basemodel:get` command.
* See `bf orchestrator:basemodel:list` for alternate models. You may need to experiment with the different models to find which performs best for your language domain.
3. Combine the label file .lu from (1) with the base model from (2) to create a *snapshot* file with a .blu extension.
* Use [`bf orchestrator:create`][16] to create a single .blu snapshot file covering all .lu/.json/.qna/.tsv files for the dispatch scenario (see the sketch below).
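Putting the prepare steps together, a minimal CLI session might look like the following sketch (the folder names and the .lu file name are illustrative, not required):
```
> md model
> md generated
> bf orchestrator:basemodel:get --out model
> bf orchestrator:create --in MyBot.lu --model model --out generated
```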
### Validate
* Create another, test .lu file with utterances that are similar, but not identical, to the ones specified in the example-based .lu definition from (1). These are typically variations on end-user utterances.
* Test quality of utterance to intent recognition.
* Examine report to ensure that the recognition quality is satisfactory. See more in [Report Interpretation][22].
* If not, adjust the label file in (1) and repeat this cycle.
## Runtime Integration
1. For use in the dispatch scenario, create an `OrchestratorRecognizer` and provide it the path to the model as well as the snapshot. Use the `RecognizeAsync` (C#) or `recognize` (JS) method to have Orchestrator recognize user input.
**C#:**
- Add reference to `Microsoft.Bot.Builder.AI.Orchestrator` package.
- Set your project to target `x64` platform
- Install latest supported version of [Visual C++ redistributable package](https://support.microsoft.com/en-gb/help/2977003/the-latest-supported-visual-c-downloads)
```C#
using System.IO;
using Microsoft.Bot.Builder.AI.Orchestrator;
// Get Model and Snapshot path.
string modelPath = Path.GetFullPath(OrchestratorConfig.ModelPath);
string snapshotPath = Path.GetFullPath(OrchestratorConfig.SnapshotPath);
// Create OrchestratorRecognizer.
OrchestratorRecognizer orc = new OrchestratorRecognizer()
{
ModelPath = modelPath,
SnapshotPath = snapshotPath
};
// Recognize user input.
var recoResult = await orc.RecognizeAsync(turnContext, cancellationToken);
```
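The returned result follows the standard `RecognizerResult` shape. As a rough sketch, and continuing from the snippet above, you could route on the top scoring intent; the intent names here are placeholders taken from the dispatch sample:
```C#
// Sketch: route on the top scoring intent returned by Orchestrator.
var (intent, score) = recoResult.GetTopScoringIntent();
switch (intent)
{
    case "l_HomeAutomation":
        // Forward the utterance to the HomeAutomation LUIS app.
        break;
    case "q_sample-qna":
        // Forward the utterance to the QnA Maker knowledge base.
        break;
    default:
        await turnContext.SendActivityAsync($"Unrecognized intent: {intent} ({score}).", cancellationToken: cancellationToken);
        break;
}
```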
**JS:**
- Add `botbuilder-ai-orchestrator` package to your bot
```JS
const { OrchestratorRecognizer } = require('botbuilder-ai-orchestrator');
// Create OrchestratorRecognizer.
const dispatchRecognizer = new OrchestratorRecognizer().configure({
modelPath: process.env.ModelPath,
snapshotPath: process.env.SnapShotPath
});
// To recognize user input
const recoResult = await dispatchRecognizer.recognize(context);
```
2. For use in adaptive dialogs, set the dialog's `recognizer` to `OrchestratorAdaptiveRecognizer`.
**C#:**
- Add reference to `Microsoft.Bot.Builder.AI.Orchestrator` package.
```C#
using System.IO;
using Microsoft.Bot.Builder.AI.Orchestrator;
using Microsoft.Bot.Builder.Dialogs.Adaptive;
// Get Model and Snapshot path.
string modelPath = Path.GetFullPath(OrchestratorConfig.ModelPath);
string snapshotPath = Path.GetFullPath(OrchestratorConfig.SnapshotPath);
// Create adaptive dialog
var myDialog = new AdaptiveDialog()
{
// Set Recognizer to OrchestratorAdaptiveRecognizer.
Recognizer = new OrchestratorAdaptiveRecognizer()
{
ModelPath = modelPath,
SnapshotPath = snapshotPath
}
};
```
**JS:**
- Add `botbuilder-ai-orchestrator` package to your bot.
```JS
const { AdaptiveDialog } = require('botbuilder-dialogs-adaptive');
const { StringExpression } = require('adaptive-expressions');
const { OrchestratorAdaptiveRecognizer } = require('botbuilder-ai-orchestrator');
// Create adaptive dialog.
const myDialog = new AdaptiveDialog('myDialog').configure({
    // Set recognizer to OrchestratorAdaptiveRecognizer.
    recognizer: new OrchestratorAdaptiveRecognizer().configure({
        modelPath: new StringExpression(process.env.ModelPath),
        snapshotPath: new StringExpression(process.env.RootDialogSnapshotPath)
    })
});
```
## Composer Integration
\<TBD: This section is FYI in preparation for upcoming Composer functionality. It will be updated once ready.>
Once the feature flag is enabled in Composer, it is possible to specify Orchestrator as a recognizer. For the most basic intent recognition cases, simply specify Orchestrator as the recognizer, and fill in the language data as you would for LUIS. For more advanced scenarios, such as dispatch orchestration, follow the steps above to import and tune up routing quality.
## Additional Reading
- [Tech overview][18]
- [API reference][14]
- [Roadmap](./docs/Overview.md#Roadmap)
- [BF CLI Orchestrator plugin][11]
- [C# samples][12]
- [NodeJS samples][13]
[1]:https://luis.ai
[2]:https://qnamaker.ai
[3]:https://docs.microsoft.com/en-us/azure/bot-service/bot-builder-skills-overview?view=azure-bot-service-4.0
[4]:https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)
[5]:https://docs.microsoft.com/en-us/azure/bot-service/bot-builder-tutorial-dispatch?view=azure-bot-service-4.0&tabs=cs
[6]:https://aka.ms/adaptive-dialogs
[7]:https://github.com/microsoft/botframework-cli
[8]:https://github.com/microsoft/botframework-cli/tree/master/packages/luis#bf-luisversionexport
[9]:https://github.com/microsoft/botframework-cli/tree/master/packages/luis#bf-luisconvert
[10]:https://github.com/microsoft/botframework-cli/tree/master/packages/qnamaker#bf-qnamakerkbexport
[11]:https://github.com/microsoft/botframework-cli/tree/beta/packages/orchestrator
[12]:https://github.com/microsoft/BotBuilder-Samples/tree/main/experimental/orchestrator/csharp_dotnetcore
[13]:https://github.com/microsoft/BotBuilder-Samples/tree/main/experimental/orchestrator/javascript_nodejs
[14]:./docs/API_reference.md
[15]: TBD/AvailableIndex
[16]:https://github.com/microsoft/botframework-cli/tree/beta/packages/orchestrator#bf-orchestratorcreate
[17]:TBD/AvailableIndex
[18]:./docs/Overview.md
[19]: https://docs.microsoft.com/en-us/composer/introduction
[20]: https://aka.ms/NLRModels "Natural Language Representation Models"
[21]:https://docs.microsoft.com/en-us/azure/bot-service/file-format/bot-builder-lu-file-format?view=azure-bot-service-4.0 "LU file format"
[22]:./docs/BFOrchestratorReport.md "report interpretation"

View file

@@ -0,0 +1,292 @@
# Orchestrator (PREVIEW2)
## C#
**OrchestratorRecognizer**
```C#
/// <summary>
/// Class that represents an Orchestrator recognizer.
/// </summary>
public class OrchestratorRecognizer : IRecognizer
{
/// <summary>
/// Initializes a new instance of the <see cref="OrchestratorRecognizer"/> class.
/// </summary>
[JsonConstructor]
public OrchestratorRecognizer()
{
}
/// <summary>
/// Gets or sets the id for the recognizer.
/// </summary>
/// <value>
/// The id for the recognizer.
/// </value>
[JsonProperty("id")]
public string Id { get; set; }
/// <summary>
/// Gets or sets the full path to the NLR model to use.
/// </summary>
/// <value>
/// Model path.
/// </value>
[JsonProperty("modelPath")]
public string ModelPath { get; set; }
/// <summary>
/// Gets or sets the full path to the snapshot to use.
/// </summary>
/// <value>
/// Snapshot path.
/// </value>
[JsonProperty("snapshotPath")]
public string SnapshotPath { get; set; }
/// <summary>
/// Gets or sets the entity recognizers.
/// </summary>
/// <value>
/// The entity recognizers.
/// </value>
[JsonProperty("entityRecognizers")]
public List<EntityRecognizer> EntityRecognizers { get; set; } = new List<EntityRecognizer>();
/// <summary>
/// Gets or sets the disambiguation score threshold.
/// </summary>
/// <value>
/// Recognizer returns ChooseIntent (disambiguation) if other intents are classified within this score of the top scoring intent.
/// </value>
[JsonProperty("disambiguationScoreThreshold")]
public float DisambiguationScoreThreshold { get; set; } = 0.05F;
/// <summary>
/// Gets or sets a value indicating whether detect ambiguous intents.
/// </summary>
/// <value>
/// When true, recognizer will look for ambiguous intents (intents with close recognition scores from top scoring intent).
/// </value>
[JsonProperty("detectAmbiguousIntents")]
public bool DetectAmbiguousIntents { get; set; } = false;
/// <inheritdoc/>
public async Task<RecognizerResult> RecognizeAsync(ITurnContext turnContext, CancellationToken cancellationToken);
}
```
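For illustration only (this is not part of the reference listing above), the ambiguity-related properties can be configured when the recognizer is constructed; the model and snapshot paths below are assumptions:
```C#
// Sketch: enable ambiguous intent detection on OrchestratorRecognizer.
var recognizer = new OrchestratorRecognizer()
{
    ModelPath = @".\model",
    SnapshotPath = @".\generated\MyBot.blu",
    DetectAmbiguousIntents = true,
    // Intents scored within 0.05 of the top intent trigger ChooseIntent (disambiguation).
    DisambiguationScoreThreshold = 0.05F
};
```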
**OrchestratorAdaptiveRecognizer**
```C#
/// <summary>
/// Class that represents an adaptive Orchestrator recognizer.
/// </summary>
public class OrchestratorAdaptiveRecognizer : Recognizer
{
/// <summary>
/// The Kind name for this recognizer.
/// </summary>
[JsonProperty("$kind")]
public const string Kind = "Microsoft.OrchestratorRecognizer";
/// <summary>
/// Property key in RecognizerResult that holds the full recognition result from Orchestrator core.
/// </summary>
public const string ResultProperty = "result";
/// <summary>
/// Initializes a new instance of the <see cref="OrchestratorAdaptiveRecognizer"/> class.
/// </summary>
/// <param name="callerLine">Caller line.</param>
/// <param name="callerPath">Caller path.</param>
[JsonConstructor]
public OrchestratorAdaptiveRecognizer([CallerFilePath] string callerPath = "", [CallerLineNumber] int callerLine = 0);
/// <summary>
/// Initializes a new instance of the <see cref="OrchestratorAdaptiveRecognizer"/> class.
/// </summary>
/// <param name="modelPath">Path to NLR model.</param>
/// <param name="snapshotPath">Path to snapshot.</param>
/// <param name="resolver">Label resolver.</param>
public OrchestratorAdaptiveRecognizer(string modelPath, string snapshotPath, ILabelResolver resolver = null);
/// <summary>
/// Gets or sets the full path to the NLR model to use.
/// </summary>
/// <value>
/// Model path.
/// </value>
[JsonProperty("modelPath")]
public StringExpression ModelPath { get; set; } = "=settings.orchestrator.modelPath";
/// <summary>
/// Gets or sets the full path to the snapshot to use.
/// </summary>
/// <value>
/// Snapshot path.
/// </value>
[JsonProperty("snapshotPath")]
public StringExpression SnapshotPath { get; set; } = "=settings.orchestrator.snapshotPath";
/// <summary>
/// Gets or sets the entity recognizers.
/// </summary>
/// <value>
/// The entity recognizers.
/// </value>
[JsonProperty("entityRecognizers")]
public List<EntityRecognizer> EntityRecognizers { get; set; } = new List<EntityRecognizer>();
/// <summary>
/// Gets or sets the disambiguation score threshold.
/// </summary>
/// <value>
/// Recognizer returns ChooseIntent (disambiguation) if other intents are classified within this score of the top scoring intent.
/// </value>
[JsonProperty("disambiguationScoreThreshold")]
public NumberExpression DisambiguationScoreThreshold { get; set; } = 0.05F;
/// <summary>
/// Gets or sets detect ambiguous intents.
/// </summary>
/// <value>
/// When true, recognizer will look for ambiguous intents (intents with close recognition scores from top scoring intent).
/// </value>
[JsonProperty("detectAmbiguousIntents")]
public BoolExpression DetectAmbiguousIntents { get; set; } = false;
/// <summary>
/// Return recognition results.
/// </summary>
/// <param name="dc">Context object containing information for a single turn of conversation with a user.</param>
/// <param name="activity">The incoming activity received from the user. The Text property value is used as the query text for QnA Maker.</param>
/// <param name="cancellationToken">A cancellation token that can be used by other objects or threads to receive notice of cancellation.</param>
/// <param name="telemetryProperties">Additional properties to be logged to telemetry with the LuisResult event.</param>
/// <param name="telemetryMetrics">Additional metrics to be logged to telemetry with the LuisResult event.</param>
/// <returns>A <see cref="RecognizerResult"/> containing the Orchestrator recognition result.</returns>
public override async Task<RecognizerResult> RecognizeAsync(DialogContext dc, Schema.Activity activity, CancellationToken cancellationToken, Dictionary<string, string> telemetryProperties = null, Dictionary<string, double> telemetryMetrics = null);
}
```
## NodeJS
**OrchestratorRecognizer**
```JS
export class OrchestratorRecognizer extends Configurable {
/**
* Full recognition results are available under this property
*/
public readonly resultProperty: string = 'result';
/**
* Recognizers unique ID.
*/
public id: string;
/**
* Path to the model to load.
*/
public modelPath: string = null;
/**
* Path to the snapshot (.blu file) to load.
*/
public snapshotPath: string = null;
/**
* The entity recognizers.
*/
public entityRecognizers: EntityRecognizer[] = [];
/**
* Threshold value to use for ambiguous intent detection. Defaults to 0.05.
* Any intents that are classified with a score that is within this value from the top
* scoring intent is determined to be ambiguous.
*/
public disambiguationScoreThreshold: number = 0.05;
/**
* Enable ambiguous intent detection. Defaults to false.
*/
public detectAmbiguousIntents: boolean = false;
/**
* Returns recognition result. Also sends trace activity with recognition result.
* @param context Context for the current turn of conversation with the use.
*/
public async recognize(context: TurnContext): Promise<RecognizerResult> {}}
```
**OrchestratorAdaptiveRecognizer**
```JS
export class OrchestratorAdaptiveRecognizer extends Recognizer {
/**
* Recognizers unique ID.
*/
public id: string;
/**
* Path to the model to load.
*/
public modelPath: StringExpression = new StringExpression('');
/**
* Path to the snapshot (.blu file) to load.
*/
public snapshotPath: StringExpression = new StringExpression('');
/**
* Threshold value to use for ambiguous intent detection.
* Any intents that are classified with a score that is within this value from the top scoring intent is determined to be ambiguous.
*/
public disambiguationScoreThreshold: NumberExpression = new NumberExpression(0.05);
/**
* Enable ambiguous intent detection.
*/
public detectAmbiguousIntents: BoolExpression = new BoolExpression(false);
/**
* The entity recognizers.
*/
public entityRecognizers: EntityRecognizer[] = [];
/**
* Intent name if ambiguous intents are detected.
*/
public readonly chooseIntent: string = 'ChooseIntent';
/**
* Property under which ambiguous intents are returned.
*/
public readonly candidatesCollection: string = 'candidates';
/**
* Intent name when no intent matches.
*/
public readonly noneIntent: string = 'None';
/**
* Full recognition results are available under this property
*/
public readonly resultProperty: string = 'result';
/**
* Returns an OrchestratorAdaptiveRecognizer instance.
* @param modelPath Path to NLR model.
* @param snapshoPath Path to snapshot.
* @param resolver Orchestrator resolver to use.
*/
constructor(modelPath?: string, snapshoPath?: string, resolver?: any) {}
/**
* Returns recognition results for the current activity.
* @param dialogContext Context for the current dialog.
* @param activity Current activity sent from user.
*/
public async recognize(dialogContext: DialogContext, activity: Activity): Promise<RecognizerResult> {}
}
```

View file

@@ -1,7 +1,7 @@
# Interactive
[BF Orchestrator CLI][1] has an "interactive" command which enables a user to
dynamically interact with an Orchestrator base language model (see examples in [Start an interactive session without a training set](#start-an-interactive-session-without-a-training-set)) and
improve the accuracy of an existing language model (see examples in [Start an interactive session with a training set](#start-an-interactive-session-with-a-training-set)) through some CLI commandlets.
@@ -16,8 +16,6 @@ ensuing commandlets for maintaining the base model's example set. These variable
- **"new" intent labels** -- Another cache for storing an array of intent labels, which were mainly
used for changing an utterance's intent labels within an Orchestrator model.
## Scenarios
### Start an interactive session without a training set
@@ -846,6 +844,4 @@ Below is the list of the commandlets that can be issued during a 'interactive' s
- [BF Orchestrator CLI](https://aka.ms/bforchestratorcli)
## Links
[1]:https://aka.ms/bforchestratorcli "BF Orchestrator CLI"

View file

@@ -1,68 +1,71 @@
# Report Interpretation
The [BF Orchestrator CLI][1] has a "test" command for evaluating the performance of an Orchestrator snapshot file (with .blu extension). A snapshot is composed of a natural language representation base model (see [models][3]) along with a set of examples provided in a label file (typically a [.lu file][4]). The snapshot file is used in Bot Framework to detect intents from user utterances.
In order to achieve high quality natural language processing (e.g. intent detection), it is necessary to assess & refine the quality of the model. Although this is much simplified in Orchestrator thanks to its use of pre-trained models, this optimization cycle is still required in order to account for human language variations.
See more on Machine Learning evaluation methodology in the [References](#references) section below.
Use the following guidance to interpret the report and take some actions (such as the ones below) to improve the snapshot file:
- Merge two intent labels into one if their utterances are semantically similar.
- Split an intent's utterance pool and create a new intent label if the utterances are not all semantically similar.
- Change an utterance's intent label if the utterance is semantically closer to a different intent label.
- Rephrase an utterance to make it semantically closer to other utterances labeled with the same intent.
- Add more utterances to an intent label if the intent's utterance pool is too sparse.
- Remove some utterances from an intent label if too many utterances are labeled to it.
# Report Organization
The test command thus produces a folder with an HTML report and a few supporting artifacts, as follows:
- orchestrator_testing_set_ground_truth_instances.json: test instance ground-truth file in JSON format.
- orchestrator_testing_set_labels.txt: intent labels in a plain text file.
- orchestrator_testing_set_prediction_instances.json: test instance prediction file in JSON format.
- orchestrator_testing_set_scores.txt: test instance prediction file in a plain TSV format.
- orchestrator_testing_set_summary.html: report summary in HTML format
The report summary contains several sections as follows:
## Intent / Utterance Statistics
This section contains label and utterance distributions.
It has two statistical sections, one for labels, the other for utterances. Below is an example rendition of the section.
- Label statistics
- Utterance statistics
![Evaluation Report Intent/Utterance Statistics](media/EvaluationReportTabVaIntentUtteranceStatistics.png)
### Label statistics
Label statistics lists the number of utterances labeled with each label. Additional metrics include utterance prevalence (ratio) for every label. The distributions give Orchestrator users an overall view of the labels and utterances and show whether the distributions are skewed, emphasizing some labels too much over others. A machine learning model may learn more from a label (intent) with more instances (utterances) labeled to it, so a developer can check this table to see whether some intent needs more utterances in the snapshot file.
### Utterance statistics
Utterance statistics, on the other hand, focus on the label-count distribution across utterances. Some utterances are labeled with more than one intent, which might not be desirable and could be a bug. This table reflects the distribution of multi-label utterances.
From the above screen snapshot, we can see that there are two utterances labeled twice with distinct labels. Those multi-label utterances are listed in the next section, and the owner can decide whether to remove the duplicates from the snapshot file.
## Utterance Duplicates
This section has two sub-sections:
- Multi-label utterances and their labels
- Duplicate utterance and label pairs
They report on utterances with duplicate or multiple labels. A duplicate utterance is detected when it is present more than once in a snapshot file. Sometimes some dataset might even contain utterances tagged with the same labels multiple times.
The report also lists the redundancy of label/utterance pairs. Orchestrator will deduplicate such redundancy.
Please see the attached screen snapshot as an example.
![Evaluation Report Utterance Duplicates](media/EvaluationReportTabVaUtteranceDuplicates.png)
## Ambiguous
This section reports on utterances with ambiguous predictions. For an evaluation utterance, if an Orchestrator model correctly predicts its intent label, then it's a true positive instance. Every intent label will be predicted with a score, which is essentially the probability or confidence for that label prediction. The predicted intent usually is the one with the highest score. If the Orchestrator model also makes some other high-score prediction close to that of the correctly predicted label, then we call such a prediction "ambiguous."
In this section, the report lists all the utterances with an ambiguous prediction in a table.
The table has several columns:
@@ -70,58 +73,36 @@ The table has several columns:
- Utterance -- the utterance
- Labels -- the true labels for the utterance
- Predictions -- the labels predicted by the Orchestrator model
- Close Predictions -- some other labels predicted with a close, high score to that of the predicted label.
Besides the prediction score, the report also shows the closest example to the utterance
within the label's utterance set.
Below is a screen snapshot of an ambiguous report:
![Evaluation Report Ambiguous](media/EvaluationReportTabVaAmbiguous.png)
Ambiguous utterances can be a sign of overlapping intent labels. In other words, two intent labels may have utterance pools that are semantically close to each other. In the example report above, the utterance "what my events today" was correctly predicted with the Calendar intent label and the closest example is
"What is on my calendar today". However, the FAQ3 intent was also predicted with a high score and the closest example from that intent is "What's going on today." Given these two closest examples, one can remove the latter from the FAQ3 intent, as it is more specific to the Calendar intent than just an FAQ example.
## Misclassified
This section reports on utterances with incorrect predictions. A misclassified prediction is one where an Orchestrator model falsely predicts an utterance's intent label. Usually the label with the highest prediction score is chosen as the predicted label, but it can be different from the ground-truth label for the utterance.
Similar to the last section, the report also lists the prediction and ground-truth labels with
their prediction scores and closest examples. Below is a screen snapshot of the misclassified report. A user can follow the utterance list and decide to update the intent labels and/or rephrase the utterances themselves.
![Evaluation Report Misclassified](media/EvaluationReportTabVaMisclassified.png)
## Low Confidence
This section reports on predictions that scored too low to be considered "confident" intent detection.
Sometimes an utterance may be predicted correctly with the highest score among all labels, but the score is very low, lower than a threshold. We call such predictions low confidence.
Notice that there are several default thresholds used to guide producing the report sections thus far. Usually an Orchestrator user can simply take the label with the highest score, but the thresholds can also be utilized for fine-tuned predictions in a chat bot's dialog logic.
Just like the previous sections, the report lists the prediction and ground-truth labels with their prediction scores and closest examples. Also like the previous sections, the utterances listed here can guide a user on how to improve the snapshot file.
![Evaluation Report Low Confidence](media/EvaluationReportTabVaLowConfidence.png)
## Metrics
@@ -132,6 +113,8 @@ Advanced machine-learning practitioners may analyze the overall model performanc
- Confusion matrix metrics
- Average confusion matrix metrics
![Evaluation Report Metrics](media/EvaluationReportTabVaMetrics.png)
### Confusion matrix metrics
In this table, the Orchestrator CLI test command reads an evaluation set with ground-truth labels. An evaluation set contains a collection of utterances and their labels. It then calls the Orchestrator base model to make a prediction for every utterance in the set and generates predicted labels for each. Finally, it compares the predicted labels against the ground-truth labels and creates a table of per-label binary confusion matrices.
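For reference, each per-label binary confusion matrix counts true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) for that label, and the commonly used metrics are derived from those counts: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 * precision * recall / (precision + recall).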
@@ -175,6 +158,15 @@ these metrics for an overall metric and model performance.
There are many nuanced ways to aggregate confusion matrix metrics. For comparing models, it's critical
to compare based on a consistent formula. Please reference the [BF Orchestrator CLI][1] readme page for advanced CLI usage details.
## Thresholds
This evaluation report is created using several thresholds that can also be useful in building chat bot logic. These thresholds can be overridden through the environment variables listed below:
- ambiguousClosenessThreshold: defaults to 0.2, which means that if other labels are predicted with a score within 20% of the top, correctly predicted label, then the utterance is considered ambiguously predicted.
- lowConfidenceScoreThreshold: defaults to 0.5, which means that if the top predicted score is lower than 0.5, the prediction is considered low confidence.
- multiLabelPredictionThreshold: defaults to 1, which means that the report will predict only one label. However, if the threshold is lower than 1, then every label with a predicted score higher than the threshold will be predicted. This threshold is usually used for multi-label, multi-intent scenarios.
- unknownLabelPredictionThreshold: defaults to 0.3, which means that the evaluation process will consider a prediction UNKNOWN if its score is lower than that threshold.
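As a sketch of how such thresholds can carry over into bot logic, the snippet below applies a low-confidence cutoff to a recognition result. It assumes an `OrchestratorRecognizer` instance and a turn context as shown in the Orchestrator README, and the 0.5 value simply mirrors lowConfidenceScoreThreshold above rather than being a fixed API constant:
```C#
// Sketch: treat low scoring predictions as "I don't know" in the bot's dialog logic.
const double LowConfidenceScoreThreshold = 0.5;

var result = await recognizer.RecognizeAsync(turnContext, cancellationToken);
var (intent, score) = result.GetTopScoringIntent();
if (score < LowConfidenceScoreThreshold)
{
    // Low confidence: ask the user to rephrase instead of routing to a skill.
    await turnContext.SendActivityAsync("Sorry, I am not sure what you meant. Could you rephrase?", cancellationToken: cancellationToken);
}
else
{
    // Confident enough: route to the skill or dialog that handles the detected intent.
}
```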
## References
- [BF Orchestrator CLI](https://aka.ms/bforchestratorcli)
@@ -182,8 +174,6 @@ to compare based on a consistent formula. Please reference the [BF Orchestrator
- [Wikipedia: Training, validation, and test sets](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets)
- [Machine Learning Mastery](https://machinelearningmastery.com/difference-test-validation-datasets/).
## Links
[1]:https://aka.ms/bforchestratorcli "BF Orchestrator CLI"
[2]:https://en.wikipedia.org/wiki/Confusion_matrix "Wikipedia: Confusion matrix"
[3]:https://aka.ms/nlrmodels "NLR Models"

View file

@@ -126,12 +126,12 @@ See [Report Interpretation][6] for more.
## References
- [Orchestrator][1]
- [Language Understanding][3]
- [Composer][5]
- [Natural Language Representation Models][4]
- [Wikipedia: Training, validation, and test sets][9]
- [Machine Learning Mastery][10]
@@ -143,8 +143,8 @@ See [Report Interpretation][6] for more.
[6]:https://aka.ms/bforchestratorreport "Orchestrator Report"
[7]:https://aka.ms/bforchestratorinteractive "Orchestrator Interactive Command"
[8]:https://docs.microsoft.com/en-us/composer/concept-language-understanding "Language understanding"
[9]:https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets "ML testing"
[10]:https://machinelearningmastery.com/difference-test-validation-datasets/ "Machine Learning Mastery"

View file

@@ -0,0 +1,487 @@
# Example Migration from LUIS Dispatch to Orchestrator
The following article describes how to migrate a legacy *dispatch* based solution to [Orchestrator][3] routing.
In the [NLP With Dispatch][2] C# sample we use LUIS as the top intent arbitrator to redirect intent processing to subsequent language understanding services, LUIS and QnAMaker. Recall that the top-level routing was performed by using the *dispatch* CLI to create a language model combining the subsequent LUIS and QnAMaker models, and creating an aggregate top LUIS application used in the bot logic to further delegate utterances to the detected language service.
Here, we will modify that sample to use Orchestrator in place of the top LUIS arbitrator as follows:
<p align="center">
<img width="450" src="./media/dispatch-logic-flow.png" />
</p>
# Prerequisites
* Complete the [NLP With Dispatch][2] C# Sample to serve as the starting point.
* Have access to create & use [LUIS][4] and [QnAMaker][5] services.
* See [Dispatch Sample documentation][1] for full details.
* Install [BF CLI][6]
* Install BF CLI [Orchestrator Plugin][7]
* Bot project must target x64 platform
* Install latest supported version of [Visual C++ runtime](https://support.microsoft.com/en-gb/help/2977003/the-latest-supported-visual-c-downloads)
# Migration Walkthrough
Start with fully working [NLP With Dispatch][2] C# Sample including all language artifacts (output of dispatch CLI).
## Prepare
* Add the ```Microsoft.Bot.Builder.AI.Orchestrator``` assembly and dependencies to your project from the NuGet package manager.
## Create Orchestrator Language model
* Get Orchestrator base model
* Create a snapshot with dispatcher samples
```
> md model
> md generated
> bf orchestrator:basemodel:get --out model
> bf orchestrator:create --in CognitiveModels\NLPDispatchSample14.json --model model --out generated
"Processing c:\\...\\CognitiveModels\\NLPDispatchSample14.json...\n"
"Snapshot written to c:\\...\\generated\\NLPDispatchSample14.blu"
```
## Modify Settings
* Inspect your LUIS and QnAMaker configurations and modify ```appsettings.json``` so as to specify the two subsequent LUIS applications.
* Add configuration for the top Orchestrator arbitrator (i.e. the new dispatcher)
```
{
"Logging": {
"LogLevel": {
"Default": "Warning"
}
},
"MicrosoftAppId": "",
"MicrosoftAppPassword": "",
"QnAKnowledgebaseId": "--same as in original sample--",
"QnAEndpointKey": "--same as in original sample--",
"QnAEndpointHostName": "--same as in original sample--",
"LuisHomeAutomationAppId": "--pick from generated NLPDispatchSample14.dispatch--",
"LuisWeatherAppId": "--pick from generated NLPDispatchSample14.dispatch--",
"LuisAPIKey": "--same as in original sample--",
"LuisAPIHostName": "Old: westus. New: https://westus.api.cognitive.microsoft.com/",
"Orchestrator": {
"ModelPath": ".\\model",
"SnapshotPath": ".\\generated\\NLPDispatchSample14.blu"
},
"AllowedHosts": "*"
}
```
## Modify Startup Configuration
* The new ```Startup.cs``` file shall include Orchestrator initialization.
* Modify ```(I)BotService.cs``` to expose Orchestrator as dispatch.
* Add class for Orchestrator configuration settings.
**Startup.cs**
```
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.
using System.IO;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.Bot.Builder;
using Microsoft.Bot.Builder.AI.Orchestrator;
using Microsoft.Bot.Builder.Integration.AspNet.Core;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
namespace Microsoft.BotBuilderSamples
{
public class Startup
{
public OrchestratorConfig OrchestratorConfig { get; }
public Startup(IConfiguration configuration)
{
OrchestratorConfig = configuration.GetSection("Orchestrator").Get<OrchestratorConfig>();
}
// This method gets called by the runtime. Use this method to add services to the container.
public void ConfigureServices(IServiceCollection services)
{
services.AddControllers().AddNewtonsoftJson();
// Create the Bot Framework Adapter with error handling enabled.
services.AddSingleton<IBotFrameworkHttpAdapter, AdapterWithErrorHandler>();
services.AddSingleton<OrchestratorRecognizer>(InitializeOrchestrator());
// Create the bot services (Orchestrator, LUIS, QnA) as a singleton.
services.AddSingleton<IBotServices, BotServices>();
// Create the bot as a transient.
services.AddTransient<IBot, DispatchBot>();
}
// This method gets called by the runtime. Use this method to configure the HTTP request pipeline.
public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
{
if (env.IsDevelopment())
{
app.UseDeveloperExceptionPage();
}
app.UseDefaultFiles()
.UseStaticFiles()
.UseRouting()
.UseAuthorization()
.UseEndpoints(endpoints =>
{
endpoints.MapControllers();
});
// app.UseHttpsRedirection();
}
private OrchestratorRecognizer InitializeOrchestrator()
{
string modelPath = Path.GetFullPath(OrchestratorConfig.ModelPath);
string snapshotPath = Path.GetFullPath(OrchestratorConfig.SnapshotPath);
OrchestratorRecognizer orc = new OrchestratorRecognizer()
{
ModelPath = modelPath,
SnapshotPath = snapshotPath
};
return orc;
}
}
}
```
**IBotServices.cs**
```
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.
using Microsoft.Bot.Builder.AI.Luis;
using Microsoft.Bot.Builder.AI.Orchestrator;
using Microsoft.Bot.Builder.AI.QnA;
namespace Microsoft.BotBuilderSamples
{
public interface IBotServices
{
LuisRecognizer LuisHomeAutomationRecognizer { get; }
LuisRecognizer LuisWeatherRecognizer { get; }
OrchestratorRecognizer Dispatch { get; }
QnAMaker SampleQnA { get; }
}
}
```
**BotServices.cs**
```
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.
using Microsoft.Bot.Builder.AI.Luis;
using Microsoft.Bot.Builder.AI.Orchestrator;
using Microsoft.Bot.Builder.AI.QnA;
using Microsoft.Extensions.Configuration;
namespace Microsoft.BotBuilderSamples
{
public class BotServices : IBotServices
{
public OrchestratorRecognizer Dispatch { get; private set; }
public QnAMaker SampleQnA { get; private set; }
public LuisRecognizer LuisHomeAutomationRecognizer { get; private set; }
public LuisRecognizer LuisWeatherRecognizer { get; private set; }
public BotServices(IConfiguration configuration, OrchestratorRecognizer dispatcher)
{
// Read the setting for cognitive services (LUIS, QnA) from the appsettings.json
// If includeApiResults is set to true, the full response from the LUIS api (LuisResult)
// will be made available in the properties collection of the RecognizerResult
LuisHomeAutomationRecognizer = CreateLuisRecognizer(configuration, "LuisHomeAutomationAppId");
LuisWeatherRecognizer = CreateLuisRecognizer(configuration, "LuisWeatherAppId");
Dispatch = dispatcher;
SampleQnA = new QnAMaker(new QnAMakerEndpoint
{
KnowledgeBaseId = configuration["QnAKnowledgebaseId"],
EndpointKey = configuration["QnAEndpointKey"],
Host = configuration["QnAEndpointHostName"]
});
}
private LuisRecognizer CreateLuisRecognizer(IConfiguration configuration, string appIdKey)
{
var luisApplication = new LuisApplication(
configuration[appIdKey],
configuration["LuisAPIKey"],
configuration["LuisAPIHostName"]);
// Set the recognizer options depending on which endpoint version you want to use.
// More details can be found in https://docs.microsoft.com/en-gb/azure/cognitive-services/luis/luis-migration-api-v3
var recognizerOptions = new LuisRecognizerOptionsV2(luisApplication)
{
IncludeAPIResults = true,
PredictionOptions = new LuisPredictionOptions()
{
IncludeAllIntents = true,
IncludeInstanceData = true
}
};
return new LuisRecognizer(recognizerOptions);
}
}
}
```
**OrchestratorConfig.cs**
```
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
namespace Microsoft.BotBuilderSamples
{
public class OrchestratorConfig
{
public string SnapshotPath { get; set; }
public string ModelPath { get; set; }
}
}
```
## Modify Bot Logic
**Bots\DispatchBot.cs**
```
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Authentication;
using Microsoft.Azure.CognitiveServices.Language.LUIS.Runtime.Models;
using Microsoft.Bot.Builder;
using Microsoft.Bot.Schema;
using Microsoft.BotFramework.Orchestrator;
using Microsoft.Extensions.Logging;
namespace Microsoft.BotBuilderSamples
{
public class DispatchBot : ActivityHandler
{
private readonly ILogger<DispatchBot> _logger;
private readonly IBotServices _botServices;
public DispatchBot(IBotServices botServices, ILogger<DispatchBot> logger)
{
_logger = logger;
_botServices = botServices;
}
protected override async Task OnMessageActivityAsync(ITurnContext<IMessageActivity> turnContext, CancellationToken cancellationToken)
{
// The top intent tells us which cognitive service to use.
var allScores = await _botServices.Dispatch.RecognizeAsync(turnContext, cancellationToken);
// var topIntent = allScores.Intents.First().Key;
var topIntent = allScores.GetTopScoringIntent();
string Intent = topIntent.intent;
// Next, we call the dispatcher with the top intent.
await DispatchToTopIntentAsync(turnContext, Intent, allScores, cancellationToken);
}
protected override async Task OnMembersAddedAsync(IList<ChannelAccount> membersAdded, ITurnContext<IConversationUpdateActivity> turnContext, CancellationToken cancellationToken)
{
const string WelcomeText = "Type a greeting, or a question about the weather to get started.";
foreach (var member in membersAdded)
{
if (member.Id != turnContext.Activity.Recipient.Id)
{
await turnContext.SendActivityAsync(MessageFactory.Text($"**NLP with Orchestrator Sample**\n\n{WelcomeText}"), cancellationToken);
}
}
}
private async Task DispatchToTopIntentAsync(ITurnContext<IMessageActivity> turnContext, string intent, RecognizerResult recognizerResult, CancellationToken cancellationToken)
{
string props;
switch (intent)
{
case "l_HomeAutomation":
props = GetRecognizerProperties("Home Automation", recognizerResult.Properties);
await turnContext.SendActivityAsync(MessageFactory.Text(props), cancellationToken);
await ProcessHomeAutomationAsync(turnContext, cancellationToken);
break;
case "l_Weather":
props = GetRecognizerProperties("Weather", (Dictionary<string, object>)recognizerResult.Properties);
await turnContext.SendActivityAsync(MessageFactory.Text(props), cancellationToken);
await ProcessWeatherAsync(turnContext, cancellationToken);
break;
case "q_sample-qna":
props = GetRecognizerProperties("QnAMaker", (Dictionary<string, object>)recognizerResult.Properties);
await turnContext.SendActivityAsync(MessageFactory.Text(props), cancellationToken);
await ProcessSampleQnAAsync(turnContext, cancellationToken);
break;
default:
_logger.LogInformation($"Dispatch unrecognized intent: {intent}.");
await turnContext.SendActivityAsync(MessageFactory.Text($"Dispatch unrecognized intent: {intent}."), cancellationToken);
break;
}
}
private async Task ProcessHomeAutomationAsync(ITurnContext<IMessageActivity> turnContext, CancellationToken cancellationToken)
{
_logger.LogInformation("ProcessHomeAutomationAsync");
// Retrieve LUIS result for HomeAutomation.
var recognizerResult = await _botServices.LuisHomeAutomationRecognizer.RecognizeAsync(turnContext, cancellationToken);
var result = recognizerResult.Properties["luisResult"] as LuisResult;
var topIntent = result.TopScoringIntent.Intent;
await turnContext.SendActivityAsync(MessageFactory.Text($"HomeAutomation top intent: {topIntent}.\n\n"), cancellationToken);
// await turnContext.SendActivityAsync(MessageFactory.Text($"HomeAutomation intents detected\n\n{string.Join("\n\n* ", result.Intents.Select(i => i.Intent))}"), cancellationToken);
if (result.Entities.Count > 0)
{
await turnContext.SendActivityAsync(MessageFactory.Text($"HomeAutomation entities were found in the message:\n\n{string.Join("\n\n* ", result.Entities.Select(i => i.Entity))}"), cancellationToken);
}
}
private async Task ProcessWeatherAsync(ITurnContext<IMessageActivity> turnContext, CancellationToken cancellationToken)
{
_logger.LogInformation("ProcessWeatherAsync");
// Retrieve LUIS result for Weather.
var recognizerResult = await _botServices.LuisWeatherRecognizer.RecognizeAsync(turnContext, cancellationToken);
var result = recognizerResult.Properties["luisResult"] as LuisResult;
var topIntent = result.TopScoringIntent.Intent;
await turnContext.SendActivityAsync(MessageFactory.Text($"ProcessWeather top intent: {topIntent}.\n\n"), cancellationToken);
await turnContext.SendActivityAsync(MessageFactory.Text($"ProcessWeather Intents detected:\n\n{string.Join("\n\n* ", result.Intents.Select(i => i.Intent))}"), cancellationToken);
if (result.Entities.Count > 0)
{
await turnContext.SendActivityAsync(MessageFactory.Text($"ProcessWeather entities were found in the message:\n\n{string.Join("\n\n* ", result.Entities.Select(i => i.Entity))}"), cancellationToken);
}
}
private string GetRecognizerProperties(string Domain, IDictionary<string, object> recognizerResult)
{
StringBuilder resultString = new StringBuilder();
resultString.Append($"**Dispatch: {Domain}**\n\nProperties:\n\n");
IList<BotFramework.Orchestrator.Result> result = (IList<BotFramework.Orchestrator.Result>)recognizerResult["result"];
for (var i = 0; i < result.Count; i++)
{
BotFramework.Orchestrator.Result r = result[i];
resultString.Append($"---\n\n* Closest Text: {r.ClosestText}\n\n");
resultString.Append($"* Label: {r.Label.Name}\n\n");
resultString.Append($"* Score: {r.Score}\n\n");
}
return resultString.ToString();
}
private async Task ProcessSampleQnAAsync(ITurnContext<IMessageActivity> turnContext, CancellationToken cancellationToken)
{
_logger.LogInformation("ProcessSampleQnAAsync");
var results = await _botServices.SampleQnA.GetAnswersAsync(turnContext);
if (results.Any())
{
await turnContext.SendActivityAsync(MessageFactory.Text(results.First().Answer), cancellationToken);
}
else
{
await turnContext.SendActivityAsync(MessageFactory.Text("Sorry, could not find an answer in the Q and A system."), cancellationToken);
}
}
}
}
```
# Summary
Compile and run. The sample will use Orchestrator to arbitrate ("dispatch") to the corresponding language service, LUIS or QnAMaker, which will process the intent and respond to the user.
# References
* [NLP With Dispatch Sample][2]
* [Dispatch Sample documentation][1]
[1]:https://docs.microsoft.com/en-us/azure/bot-service/bot-builder-tutorial-dispatch?view=azure-bot-service-4.0&tabs=cs "Legacy dispatch MSDocs"
[2]:https://github.com/Microsoft/BotBuilder-Samples/tree/main/samples/csharp_dotnetcore/14.nlp-with-dispatch "14.nlp-with-dispatch C#"
[3]:https://aka.ms/bf-orchestrator "Orchestrator"
[4]:https://luis.ai "LUIS"
[5]:https://qnamaker.ai "QnAMaker"
[6]:https://github.com/microsoft/botframework-cli "BF CLI"
[7]:https://github.com/microsoft/botframework-cli/tree/beta/packages/orchestrator "Orchestrator plugin"

View file

@@ -1,7 +1,3 @@
# Prebuilt Language Models
Prebuilt language models have been trained towards more sophisticated tasks for both monolingual and multilingual scenarios. In public preview, only English models are made available.
@@ -24,12 +20,11 @@ This is a high quality base model but it is larger and slower than some other op
## References
* [UniLMv2 Paper][1]
* [Base Models Versions Repository][2]
* [KNN (K nearest neighbors algorithm)][3]
* [Model Evaluations][4]
[1]: https://arxiv.org/abs/2002.12804 "UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training"
[2]: https://aka.ms/nlrversions
[3]: https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
[4]: ./Overview.md#evaluation-of-orchestrator-on-snips

View file

@@ -0,0 +1,129 @@
# Technical Overview
The Orchestrator is a replacement for the [Bot Framework Dispatcher][1] used in chat bots since 2018. It makes state-of-the-art natural language understanding methods available to bot developers while keeping the process of language modeling quick and not requiring expertise in [Deep Neural Networks (DNN), Transformers][5], or [Natural Language Processing (NLP)][6]. This work is co-authored with industry experts in the field and includes some of the top methods from the [General Language Understanding Evaluation (GLUE)][7] leaderboard. Orchestrator will continue to evolve and adopt the latest advancements in science and in the industry.
Orchestrator enables composability of bots, allowing easy reuse of skills or entire bots contributed by the community without requiring time-consuming retraining of language models. It is our goal to support the community and continue responding to the feedback provided.
## Design Objectives and Highlights
Thanks to community feedback we compiled a list of objectives and requirements which are addressed in the initial release of Orchestrator. The [Roadmap](#roadmap) section describes the additional work planned for upcoming releases.
### No [ML][12] or [NLP][6] expertise required
In the legacy approach, significant expertise and time were required to produce a robust language model. For example, the chat bot author had to worry about proper data distributions, data imbalance, and feature-level concerns such as generating various synonym lists. When these aspects were not attended to, the final model quality was often poor. With Orchestrator these aspects are no longer a concern for the developer, and the related expertise is not required in order to create a robust language model (see [Evaluation of Orchestrator on SNIPS](#evaluation-of-orchestrator-on-snips) in the advanced topics section for the evaluation results).
### Minimal or no model training required
Building a language model requires multiple iterations of adding or removing training examples, followed by training the model and evaluating it. This process may take days or even weeks to reach satisfactory results. Also, when using a [transformer][5] model for the classification task, a classification layer (or layers) is added and trained, making this process expensive, time consuming, and often requiring a GPU.
To address these concerns, we chose an example-based approach where the language model is defined as a set of labeled examples. In Orchestrator a model example is represented as a vector of numbers (an embedding) obtained from the [transformer model][5] for a given text that the corresponding skill is capable of handling (that is the definition of the application language model in Orchestrator). During runtime the similarity of a new example is calculated by comparing it to the existing model examples per skill. The weighted average of the *K* closest examples ([KNN algorithm][9]) is taken to determine the classification result. This approach does not require an explicit training step; only the calculation of embeddings for the model examples is done. It takes about 10 milliseconds per example, so a modification of an existing model that adds 100 new examples takes about 1 second, done locally without a GPU and without remote server roundtrips.
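As a conceptual illustration only (this is not Orchestrator's actual implementation), a weighted K-nearest-neighbors decision over example embeddings can be sketched as follows; the `embed` delegate stands in for the transformer model that produces embedding vectors:
```C#
using System;
using System.Collections.Generic;
using System.Linq;

// Conceptual sketch of a weighted KNN classification over example embeddings.
public static class KnnSketch
{
    public static string Classify(
        Func<string, float[]> embed,
        IReadOnlyList<(string Intent, float[] Embedding)> examples,
        string utterance,
        int k = 5)
    {
        float[] query = embed(utterance);

        // Take the K most similar stored examples and let each vote with its similarity score.
        return examples
            .Select(e => (e.Intent, Score: CosineSimilarity(query, e.Embedding)))
            .OrderByDescending(n => n.Score)
            .Take(k)
            .GroupBy(n => n.Intent)
            .OrderByDescending(g => g.Sum(n => n.Score))
            .First()
            .Key;
    }

    private static float CosineSimilarity(float[] a, float[] b)
    {
        float dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / ((float)Math.Sqrt(na) * (float)Math.Sqrt(nb) + 1e-6f);
    }
}
```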
### Local, fast library, not a remote service
The Orchestrator core is written in C++ and is available as a library in C#, Node.js, and soon Python and Java. The library can be used directly by the bot code (the preferred approach) or can be hosted out-of-proc or on a remote server. Running locally eliminates additional service round trip costs (latency and pricing meters). This is especially helpful when using Orchestrator to dispatch across disparate LU/QnA services.
Loading the English pretrained language model released for the initial preview takes about 2 sec with the memory footprint of a little over 200MB. Classification of a new example with this initial model takes about 10 milliseconds (depending on the text length). These numbers are for illustration only to give a sense of performance. As we improve the models or include additional languages these numbers will likely change.
### State-of-the-art classification with few training examples
Developers often have only a handful of training examples available to define a language model properly. With the powerful pretrained state-of-the-art models used by Orchestrator, this is no longer a concern. Even a single example for an intent/skill can often go a long way toward accurate predictions. For example, a "Greeting" intent defined with just one example, "hello", can be correctly predicted for utterances like "how are you today" or "good morning to you". The pretrained models generalize impressively from very few simple (and short) examples. This ability is often called "few-shot learning", which includes the ["one-shot learning"][11] that Orchestrator also supports, and it is made possible by models pretrained on large data sets and then further optimized for conversation, also on large data.
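As an illustration, here is a one-shot usage sketch building on the classification sketch above; the `embed` function is a hypothetical wrapper around the pretrained transformer encoder and is passed in rather than implemented here.

```typescript
// One-shot sketch: a "Greeting" intent defined by a single labeled example.
function buildGreetingModel(embed: (text: string) => number[]): LabeledExample[] {
  return [{ intent: 'Greeting', embedding: embed('hello') }];
}

// With a real encoder, an utterance such as "good morning to you" embeds
// close enough to the single stored example to score as a greeting:
//   classify(embed('good morning to you'), buildGreetingModel(embed), 1);
```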
### Ability to classify the "unknown" intent without additional examples
Another common challenge developers face when handling intent classification is deciding whether the top-scoring intent should be triggered at all. Orchestrator provides a solution for this: its scores can be interpreted as probabilities calibrated in such a way that a score of 0.5 is the maximum score for an "unknown" intent, chosen to balance precision and recall. If the top intent's score is 0.5 or lower, the query/request should be considered to have an "unknown" intent and should probably trigger a follow-up question from the bot. Conversely, if two intents score above 0.5, both intents (skills) could be triggered; if the bot is designed to handle only one intent at a time, application rules or other priorities can pick the one that gets triggered.
Classification of the "unknown" intent is done without any examples that define the "unknown" (often referred to as ["zero-shot learning"][10]), which would otherwise be challenging to provide. This would be hard to accomplish without a heavily pretrained language model, especially since the bot application may later be extended with additional skills that were previously "unknown".
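A sketch of how the 0.5 calibration point could be applied on top of the classification results from the earlier sketch; the threshold handling shown here is illustrative, not the library's built-in API.

```typescript
// Treat 0.5 as the maximum score of the "unknown" intent: anything scoring
// at or below it is not triggered.
const UNKNOWN_THRESHOLD = 0.5;

function pickIntents(results: { intent: string; score: number }[]): string[] {
  const triggered = results.filter(r => r.score > UNKNOWN_THRESHOLD);
  if (triggered.length === 0) {
    // Nothing clears the threshold: route to "unknown" handling,
    // e.g. ask the user a clarifying follow-up question.
    return ['unknown'];
  }
  // Several intents may clear the threshold; the bot's own rules decide
  // whether one or several skills get triggered.
  return triggered.map(r => r.intent);
}
```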
### Extend to support Bot Builder Skills
While the [Dispatcher's][1] focus was to help trigger between multiple [LUIS][3] apps and [QnA Maker][4] KBs, Orchestrator expands this functionality to support generic [Bot Builder Skills][2], allowing bot skills to be composed. Skills developed and made available by the community can be easily reused and integrated into a new bot with no language model retraining required. Orchestrator provides a toolkit to evaluate such an extension, identifying ambiguous examples that should be reviewed by the developer. An optional fine-tuning capability is also available in the CLI, but this step is not required in most cases.
### Ease of composability
The language models of skills, and even of entire bots, made available by the community can be integrated into a new bot by simply adding their snapshot(s) (see the [API reference][20] for more information). Model snapshots, which represent skills, groups of skills, or even entire bots, contain all the language model data required to trigger them. Importing a new model snapshot can be done at runtime and takes just milliseconds. This opens up interesting scenarios where the model can be modified to emphasize deeper, more specialized skills that are likely to trigger; this flexibility is beneficial for complex dialogs or even for handling conversation contexts, which could include model snapshots.
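Conceptually, importing a snapshot amounts to merging another set of labeled example embeddings into the in-memory model. The sketch below (reusing the `LabeledExample` type from the earlier sketch, and not the actual snapshot file format) illustrates why this takes only milliseconds and requires no retraining.

```typescript
// Illustrative only: a snapshot is treated as just another collection of
// labeled example embeddings, so "importing" it is an in-memory merge.
function importSnapshot(
  model: LabeledExample[],
  snapshot: LabeledExample[]
): LabeledExample[] {
  // The merged set is immediately usable by classify(); no training step.
  return model.concat(snapshot);
}
```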
### Ability to explain the classification results
The ability to explain classification results can be important in an application. In general, interpreting the results of deep-learned models (such as [transformers][5]) can be very challenging. Orchestrator enables this by providing the closest example in the model to the one being evaluated. In a case of misclassification, this simple mechanism helps the developer determine whether a new example defining a skill should be added or whether an existing example in the model was mislabeled. This feature also simplifies implementing [reinforcement learning][18] for the bot, which can be done by non-experts (only language fluency is required).
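A sketch of this kind of explanation, assuming the model also stores the original text of each example (reusing `LabeledExample` and `cosineSimilarity` from the earlier sketch):

```typescript
type StoredExample = LabeledExample & { text: string };

// Return the single closest stored example along with its intent and score,
// so a misclassification can be traced back to the example that caused it.
function explainClosest(
  queryEmbedding: number[],
  model: StoredExample[]
): { intent: string; score: number; closestExampleText: string } {
  let best = model[0];
  let bestSim = -Infinity;
  for (const example of model) {
    const sim = cosineSimilarity(queryEmbedding, example.embedding);
    if (sim > bestSim) {
      bestSim = sim;
      best = example;
    }
  }
  return { intent: best.intent, score: bestSim, closestExampleText: best.text };
}
```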
### High performance
The core of Orchestrator is written in C++. Because its runtime algorithms can be easily vectorized, the Orchestrator core takes advantage of the vector operations supported by mainstream CPUs ([SIMD][13]) without requiring a [GPU][14]. As a result, the similarity calculation time during [KNN][9] inference is negligible compared with other local processing tasks, even for the largest models.
### Compact models
The [transformer][5] models in Orchestrator produce embeddings that are relatively large, over 3 kB per example. If these large embeddings were used directly, not only would the runtime memory requirement increase significantly, but substantial CPU processing costs would be added as well: a commonly used similarity measure, cosine similarity, combined with KNN processing at this embedding size would add significant overhead during inference. Instead, Orchestrator uses a quantization method that shrinks the embeddings to under 100 bytes, reducing processing time more than 50-fold while preserving the same level of accuracy. This technology is already available in the initial public preview of Orchestrator.
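As an illustration of why quantization pays off, here is a sketch of one common compaction scheme (sign-bit quantization with a bitwise similarity). This is an assumption made for illustration, not necessarily the exact method Orchestrator uses.

```typescript
// Reduce each float dimension to a single sign bit: a 768-dimensional
// embedding (~3 kB as 32-bit floats) shrinks to 96 bytes.
function quantize(embedding: number[]): Uint8Array {
  const bits = new Uint8Array(Math.ceil(embedding.length / 8));
  for (let i = 0; i < embedding.length; i++) {
    if (embedding[i] > 0) {
      bits[i >> 3] |= 1 << (i & 7);
    }
  }
  return bits;
}

// Similarity between quantized vectors: the fraction of agreeing bits,
// computed with cheap bitwise operations instead of floating-point math.
function quantizedSimilarity(a: Uint8Array, b: Uint8Array): number {
  let matching = 0;
  for (let i = 0; i < a.length; i++) {
    let same = ~(a[i] ^ b[i]) & 0xff; // bits that agree in this byte
    while (same) {
      matching += same & 1;
      same >>= 1;
    }
  }
  return matching / (a.length * 8);
}
```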
### Runtime flexibility
It is important to reiterate that the Orchestrator runtime has significantly more flexibility and functionality than a typical [transformer][5] or generic [ML][12] runtime. In addition to inference, the developer can enable the following in the bot code:
*Modify the language model in real time* - add functionality (expand the language model with additional skills or examples) or perform continuous model improvements using [reinforcement learning][18] techniques (specialized tools to assist with reinforcement learning will ship in upcoming releases).
*Modify the language model behavior in real time* - runtime parameters can be adjusted without restarting the process or even reloading the model. This includes adjusting how strict intent triggering is (the tradeoff between [precision and recall][19]), which can be tuned dynamically depending on the phase of the dialog, and adjusting resiliency to mislabeled or low-quality model examples by modifying the KNN K value (useful, for example, when the model examples were crowd-sourced and not yet cleaned up, when the model may be adjusted dynamically by many people, or when a skill's language model definition was added to the bot but not yet evaluated).
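Because classification is just KNN over stored embeddings, these knobs are plain runtime parameters. The sketch below (with hypothetical setting names, reusing `classify` and `LabeledExample` from the earlier sketch) shows how they could be adjusted between requests without reloading anything.

```typescript
// Hypothetical runtime-adjustable settings: triggering strictness and the
// KNN K value can change between requests, with no retraining or reload.
const settings = { k: 5, unknownThreshold: 0.5 };

function recognize(queryEmbedding: number[], model: LabeledExample[]) {
  const results = classify(queryEmbedding, model, settings.k);
  return results.filter(r => r.score > settings.unknownThreshold);
}

// Example: later in the dialog, demand more confident matches and rely on
// fewer neighbors.
settings.unknownThreshold = 0.7;
settings.k = 3;
```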
### Evaluation of Orchestrator on SNIPS
#### Model attributes
| Model | Base Model | Layers | Encoding time per query | Disk size |
| ------------ | ------------ | ------------ | ------------ | ------------ |
| pretrained.20200924.microsoft.dte.00.03.en.onnx | BERT | 3 | ~7 ms | 164 MB |
| pretrained.20200924.microsoft.dte.00.06.en.onnx | BERT | 6 | ~16 ms | 261 MB |
| pretrained.20200924.microsoft.dte.00.12.en.onnx | BERT | 12 | ~26 ms | 427 MB |
| pretrained.20200924.microsoft.dte.00.12.roberta.en.onnx | RoBERTa | 12 | ~26 ms | 486 MB |
#### Model performance, evaluated by micro-average accuracy
| Model (training samples per intent →) | 5 | 10 | 25 | 50 | 100 | 200 |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
|pretrained.20200924.microsoft.dte.00.03.en.onnx | 0.756 | 0.839 | 0.904 | 0.929 | 0.943 | 0.951 |
|pretrained.20200924.microsoft.dte.00.06.en.onnx | 0.924 | 0.940 | 0.957 | 0.960 | 0.966 | 0.969 |
|pretrained.20200924.microsoft.dte.00.12.en.onnx | 0.902 | 0.931 | 0.951 | 0.960 | 0.964 | 0.969 |
|pretrained.20200924.microsoft.dte.00.12.roberta.en.onnx | 0.946 | 0.956 | 0.966 | 0.971 | 0.973 | 0.977 |
## Roadmap
In the upcoming releases, we plan to expand Orchestrator in several areas:
### Entity recognition
A commonly requested feature as part of intent triggering is providing the "parameters" of the triggered intents, that is, entities recognized in the query text. The Orchestrator interfaces already included in the initial preview support handling recognized entities. This functionality, together with the corresponding prebuilt language model(s), will be made available in upcoming releases.
### Multi-lingual models
An important extension planned for upcoming releases is support for multilingual models, and possibly also specialized international models, prioritized by the languages supported by other Microsoft offerings.
### Extensibility with custom pretrained language models
For the initial release, the prebuilt language models are provided in the [ONNX][15] format and run on the supported ONNX runtime. We will extend Orchestrator to directly support [PyTorch][16] and [TensorFlow][17] model formats and their corresponding runtimes.
### Reinforcement learning
The Orchestrator design, with its [runtime flexibility](#runtime-flexibility), enables efficient [reinforcement learning][18] for continuous language model improvement. Additional tools to assist with this task and help automate it will ship in upcoming releases.
### Expand model tuning capability
Currently, all model parameters (hyper-parameters) are global across intents/skills. In upcoming releases, per-intent configuration will be enabled; for example, triggering can be made stricter for certain intents and fuzzier for others, up to catch-all behavior at the language model level ([precision vs. recall][19] control per intent).
### Possible additional improvements based on the preview feedback
As we collect more feedback from the community during the preview, there may be additional areas of improvement that we'll address in upcoming releases. We encourage users to submit feedback through GitHub.
[1]:https://docs.microsoft.com/en-us/azure/bot-service/bot-builder-tutorial-dispatch?view=azure-bot-service-4.0&tabs=cs
[2]:https://docs.microsoft.com/en-us/azure/bot-service/bot-builder-skills-overview?view=azure-bot-service-4.0
[3]:https://www.luis.ai/
[4]:https://www.qnamaker.ai/
[5]:https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)
[6]:https://en.wikipedia.org/wiki/Natural_language_processing
[7]:https://gluebenchmark.com/leaderboard
[8]:https://github.com/snipsco/nlu-benchmark
[9]:https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
[10]:https://en.wikipedia.org/wiki/Zero-shot_learning
[11]:https://en.wikipedia.org/wiki/One-shot_learning
[12]:https://en.wikipedia.org/wiki/Machine_learning
[13]:https://en.wikipedia.org/wiki/SIMD
[14]:https://en.wikipedia.org/wiki/General-purpose_computing_on_graphics_processing_units
[15]:https://onnx.ai/
[16]:https://en.wikipedia.org/wiki/PyTorch
[17]:https://en.wikipedia.org/wiki/TensorFlow
[18]:https://en.wikipedia.org/wiki/Reinforcement_learning
[19]:https://en.wikipedia.org/wiki/Precision_and_recall
[20]:./API_reference.md

*Binary media files added with this change (images not shown):* Orchestrator/docs/media/EvaluationReportTabEmailAmbiguous.png, EvaluationReportTabEmailLowConfidence.png, EvaluationReportTabEmailMetrics.png, EvaluationReportTabEmailMisclassified.png, EvaluationReportTabVaAmbiguous.png, EvaluationReportTabVaLowConfidence.png, EvaluationReportTabVaMetrics.png, EvaluationReportTabVaMisclassified.png, authoring.png, dispatch-logic-flow.png, and other binary files not listed by name in the diff.