Added RAI Text Classification Scenario, Workflow and config ini updated (#2780)

rkamat25 2023-11-10 01:30:10 +05:30, committed by GitHub
Parent 9b749bd5c2
Commit 461da2c035
No known key found for this signature
GPG key ID: 4AEE18F83AFDEB23
23 changed files: 2121 additions and 0 deletions


@ -0,0 +1,77 @@
# This code is autogenerated.
# Code is generated by running custom script: python3 readme.py
# Any manual changes to this file may cause incorrect behavior.
# Any manual changes will be overwritten if the code is regenerated.
name: sdk-responsible-ai-text-responsibleaidashboard-text-classification-financial-news-responsibleaidashboard-text-classification-financial-news
# This file is created by sdk/python/readme.py.
# Please do not edit directly.
on:
  workflow_dispatch:
  schedule:
    - cron: "49 0/12 * * *"
  pull_request:
    branches:
      - main
    paths:
      - sdk/python/responsible-ai/text/responsibleaidashboard-text-classification-financial-news/**
      - .github/workflows/sdk-responsible-ai-text-responsibleaidashboard-text-classification-financial-news-responsibleaidashboard-text-classification-financial-news.yml
      - sdk/python/dev-requirements.txt
      - infra/bootstrapping/**
      - sdk/python/setup.sh
concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - name: check out repo
      uses: actions/checkout@v2
    - name: setup python
      uses: actions/setup-python@v2
      with:
        python-version: "3.8"
    - name: pip install notebook reqs
      run: pip install -r sdk/python/dev-requirements.txt
    - name: pip install mlflow reqs
      run: pip install -r sdk/python/mlflow-requirements.txt
    - name: azure login
      uses: azure/login@v1
      with:
        creds: ${{secrets.AZUREML_CREDENTIALS}}
    - name: bootstrap resources
      run: |
          echo '${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}';
          bash bootstrap.sh
      working-directory: infra/bootstrapping
      continue-on-error: false
    - name: setup SDK
      run: |
          source "${{ github.workspace }}/infra/bootstrapping/sdk_helpers.sh";
          source "${{ github.workspace }}/infra/bootstrapping/init_environment.sh";
          bash setup.sh
      working-directory: sdk/python
      continue-on-error: true
    - name: setup-cli
      run: |
          source "${{ github.workspace }}/infra/bootstrapping/sdk_helpers.sh";
          source "${{ github.workspace }}/infra/bootstrapping/init_environment.sh";
          bash setup.sh
      working-directory: cli
      continue-on-error: true
    - name: run responsible-ai/text/responsibleaidashboard-text-classification-financial-news/responsibleaidashboard-text-classification-financial-news.ipynb
      run: |
          source "${{ github.workspace }}/infra/bootstrapping/sdk_helpers.sh";
          source "${{ github.workspace }}/infra/bootstrapping/init_environment.sh";
          bash "${{ github.workspace }}/infra/bootstrapping/sdk_helpers.sh" generate_workspace_config "../../.azureml/config.json";
          bash "${{ github.workspace }}/infra/bootstrapping/sdk_helpers.sh" replace_template_values "responsibleaidashboard-text-classification-financial-news.ipynb";
          [ -f "../../.azureml/config" ] && cat "../../.azureml/config";
          papermill -k python responsibleaidashboard-text-classification-financial-news.ipynb responsibleaidashboard-text-classification-financial-news.output.ipynb
      working-directory: sdk/python/responsible-ai/text/responsibleaidashboard-text-classification-financial-news
    - name: upload notebook's working folder as an artifact
      if: ${{ always() }}
      uses: actions/upload-artifact@v2
      with:
        name: responsibleaidashboard-text-classification-financial-news
        path: sdk/python/responsible-ai/text/responsibleaidashboard-text-classification-financial-news


@ -41,3 +41,7 @@ COMPUTE_NAMES = demand-fcst-mm-cluster
[automl-forecasting-distributed-tcn]
USE_FORECAST_REQUIREMENTS = 0
COMPUTE_NAMES = "distributed-tcn-cluster"
[responsibleaidashboard-text-classification-financial-news]
USE_FORECAST_REQUIREMENTS = 0
COMPUTE_NAMES = "cpucluster"
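
The new section follows the same two-key pattern as the entries above it. As a rough illustration of how such a section could be read, the sketch below parses an inline copy of the added lines with Python's standard `configparser`. The helper name `compute_names` is hypothetical, and the diff does not show how the repository's own tooling consumes this file, so treat this as an assumption rather than the actual mechanism.

```python
import configparser
from typing import List

# Inline copy of the section added in this diff (the real file path is not shown here).
INI_TEXT = """
[responsibleaidashboard-text-classification-financial-news]
USE_FORECAST_REQUIREMENTS = 0
COMPUTE_NAMES = "cpucluster"
"""


def compute_names(ini_text: str, section: str) -> List[str]:
    """Hypothetical helper: return the compute names for a section, without quotes."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    raw = parser.get(section, "COMPUTE_NAMES")
    # configparser keeps the literal double quotes, so strip them explicitly.
    return [name.strip().strip('"') for name in raw.split(",")]


section = "responsibleaidashboard-text-classification-financial-news"
names = compute_names(INI_TEXT, section)

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)
# "0" is one of the strings that getboolean() interprets as False.
use_forecast = parser.getboolean(section, "USE_FORECAST_REQUIREMENTS")
print(names, use_forecast)
```

Note that the quoted value is stored with its quotes, which is why the sketch strips them before use.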

17 binary image files added (not shown); sizes range from 26 KiB to 117 KiB.


@ -0,0 +1,232 @@
## Text Classification with traditional NLP
One of the many uses of text classification is to categorize news articles.
Articles can be categorized along one or more dimensions;
topic, length, and source are just a few possibilities.
In this example, we will focus on business news articles related to financial services topics.
## Deploy and run the notebook
This demo relies on the included Jupyter notebook.
This notebook does the following:
- Loads and transforms data
- Trains and tests a model
- Creates a Responsible AI dashboard
Once you load this notebook to your Azure ML studio workspace, follow the steps described in the notebook to create your RAI dashboard.
Then return to this page to explore the dashboard.
For help finding the RAI dashboard, please review this [information](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-responsible-ai-dashboard).
## Synthetic dataset and model
**Machine learning model** -- The notebook trains a **()**.
**WARNING: Use of synthetic data**
This accelerator was developed using synthetic data to emphasize the importance of data privacy.
For this reason, you may find some anomalies in certain metrics or dashboard components.
These should not distract from the demonstration.
The tools and techniques described here remain valid, despite any data shortcomings that may be present.
**Data dictionary**
|Type | Feature Name | Feature Description|
|---------------|-----------------------|----------------------|
|Feature |Article ID |Unique identifier for each article|
|Feature |Article Heading |The title or headline of the article|
|Feature |Article Text |Text from the main body of the article|
|Feature |Category |The category or class to which the article belongs|
Here are descriptions of the different article categories:
- Banking and finance - Debt Market: News and updates related to the debt market within the banking and finance industry
- Stock Market Updates: News and updates about the stock market, including stock prices, market indices, and company earnings
- Business: News and articles about general business topics, such as corporate strategies, management, and industry trends
- Real Estate: News and updates related to the real estate industry, including property market trends and housing prices
- Cryptocurrency: News and information about cryptocurrencies, blockchain technology, and related developments
- Personal Finance: News and tips related to personal finance management, including budgeting, saving, and investing
- Financial Regulations: News and updates about financial regulations, compliance requirements, and legal developments
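
To make the data dictionary and category list concrete, here is a minimal sketch of one record as a Python dataclass. The field names (`article_id`, `heading`, `text`, `category`) and the string type of the ID are assumptions made for illustration; the actual column names and types in the notebook's dataset may differ. The category strings are copied from the list above.

```python
from dataclasses import dataclass

# Category labels as listed in this README.
CATEGORIES = (
    "Banking and finance - Debt Market",
    "Stock Market Updates",
    "Business",
    "Real Estate",
    "Cryptocurrency",
    "Personal Finance",
    "Financial Regulations",
)


@dataclass(frozen=True)
class Article:
    """One row of the dataset (field names are hypothetical)."""

    article_id: str
    heading: str
    text: str
    category: str

    def __post_init__(self) -> None:
        # Reject labels that are not among the seven documented categories.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


# Placeholder values, not taken from the real dataset.
example = Article("A-001", "Placeholder heading", "Placeholder body text.", "Cryptocurrency")
```

Validating the label when the record is created catches misspelled categories before they reach model training.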
## Debugging the model
In data science and software development,
the word debugging usually refers to finding and removing errors in a piece of code.
With the RAI dashboard, we can debug a machine learning model, improving both its overall performance and the responsible AI aspects of its predictions.
Here is how the Responsible AI dashboard could assist you with debugging this classification model:
## Pre-built cohorts
The ability to define groups of data points, or cohorts, within the source data is one of the most useful features of the RAI dashboard.
In a classification scenario like this one, it is often helpful to start by using the class labels to define cohorts. To do this:
- Click the "+ New cohort" button, near the top of the dashboard
- Select "True Y" as your filter criteria
- Use the "Included values" drop-down menu to select only the "Banking and finance - Debt Market" option
- Click the "Add filter" button
- Rename the cohort to something useful to you, in the "Dataset cohort name" dialog box
- Click "Save"
![Pre-built Cohort](./media/nlp_classification/errorAnalysis_prebuiltCohorts.png)
Now repeat this for each of the news article classes, until you have defined seven pre-built cohorts.
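
Conceptually, the steps above split the dataset by its true label. The sketch below shows the same idea outside the dashboard, using plain Python. The `true_y` key and the sample rows are made up for illustration, and this is not the dashboard's own API.

```python
from collections import defaultdict


def cohorts_by_label(rows):
    """Group row indices by their true label ("True Y" in the dashboard)."""
    cohorts = defaultdict(list)
    for index, row in enumerate(rows):
        cohorts[row["true_y"]].append(index)
    return dict(cohorts)


# Made-up rows; the real dataset has seven categories.
rows = [
    {"true_y": "Business"},
    {"true_y": "Cryptocurrency"},
    {"true_y": "Business"},
]
cohorts = cohorts_by_label(rows)
for label, indices in cohorts.items():
    print(label, len(indices))
```

With the full dataset, this grouping would yield one cohort per category, matching the seven pre-built cohorts described above.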
## Error analysis
Next, let's review the error analysis tree map, which you will find at the top of the dashboard.
This chart simplifies the process of discovering and highlighting common failure patterns.
Look for the nodes with the darkest red color (i.e., high error rate) and a high fill line (i.e., high error coverage). Error rate is the percentage of the selected node's data points that receive erroneous predictions.
Error coverage is the percentage of all errors that are concentrated in the given node.
The Error Tree visualization helps you uncover your model's “blind spots”.
In this context, a blind spot is a group of data points, or a cohort, for which the model is less accurate.
This could be any set of observations that, when grouped by a common characteristic, shows significantly worse model performance.
Looking at the root node of the tree (representing errors on all data) we see the error rate for all predictions is about 24.4%.
![Error-Analysis-00](./media/nlp_classification/errorAnalysis_treeMap.png) #whole tree
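
Both metrics can be computed directly from counts. The sketch below follows the definitions given in this section; the counts are made up for illustration and are not taken from the dashboard.

```python
def error_rate(node_errors: int, node_size: int) -> float:
    """Share of the node's data points that are misclassified."""
    return node_errors / node_size


def error_coverage(node_errors: int, total_errors: int) -> float:
    """Share of all errors in the dataset that fall inside the node."""
    return node_errors / total_errors


# Made-up counts: a node with 120 data points, 30 of them misclassified,
# in a dataset with 200 misclassified data points overall.
rate = error_rate(node_errors=30, node_size=120)
coverage = error_coverage(node_errors=30, total_errors=200)
print(f"error rate {rate:.1%}, error coverage {coverage:.1%}")
# prints "error rate 25.0%, error coverage 15.0%"
```

A node can have a high error rate but low coverage if it is small, which is why the tree map shows both.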
**Explore nodes** -- Look at the left node on the second level of the left branch, with fewer than 1.5 negative words
and fewer than 17.5 positive words.
This has a high error rate of 39.1%.
![Error-Analysis-01](./media/nlp_classification/errorAnalysis_treeMap_leftBranch_leftNode.png)
This contrasts sharply with the right node on the same level, with more than 17.5 positive words.
This node has an error rate of only 9.5%.
![Error-Analysis-02](./media/nlp_classification/errorAnalysis_treeMap_leftBranch_rightNode.png)
These two nodes are good candidates to save as cohorts for further exploration.
Click on the more erroneous node again and choose the “Save as new cohort” button, at the upper-right corner of the tree map.
![Error-Analysis-03](./media/nlp_classification/errorAnalysis_saveCohort.png)
Do the same thing for the lower-right node.
Here we name it "positive_words > 17.50, negative_words <= 1.50".
**Top features leading to errors** -- Click on the feature list icon at the top of the error analysis section. This will surface a list of features, ranked by their correlations to the model's errors.
![Error-Analysis-04](./media/nlp_classification/errorAnalysis_featureList.png)
The **Heat map** can also be a very helpful tool for identifying areas of lower performance.
Open it by selecting the “Heat map” tab, next to the “Feature list” tab shown above, at the top of the Error analysis section.
The heat map is useful for taking a closer look at certain groups and exploring different “slices” of the data.
Select up to two features to see how errors are distributed across these groups.
For Feature 1, select "negative_words".
For Feature 2, select "positive_words".
Set the Binning threshold to 8.
Each cell of the heat map represents a slice of the dataset
and shows the percentage of errors out of the total number of data points in that slice.
Here we can see that the worst performance tends to happen when there are only one or two negative words.
It is also interesting to note that accuracy suffers across the board when there are no negative words,
while accuracy is very good when there are at least three negative words in the article.
![Heat-Map-00](./media/nlp_classification/errorAnalysis_heatMap.png)
Next, hover over one of the squares.
You will see Error Rate and Error Coverage.
Now compare the error rates and error coverage values for different cells.
You will see that error coverage is not consistent,
even in cells with the same error rate.
In this way, the RAI dashboard can help deepen your understanding of different slices of the data.
And as we saw with error nodes on the tree map,
you can select individual cells and save the data points as a custom cohort.
![Heat-Map-01](./media/nlp_classification/errorAnalysis_heatMap_errorCoverage.png) #error coverage
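
The heat map logic can be approximated in a few lines: bin two numeric features, then compute the error rate and error coverage of each cell. The sketch below assumes equal-width bins and hypothetical row keys (`negative_words`, `positive_words`, `is_error`); the dashboard's actual binning strategy may differ, and the sample rows are made up.

```python
from collections import defaultdict


def bin_index(value, low, high, bins):
    """Map a value to one of `bins` equal-width intervals over [low, high]."""
    if high == low:
        return 0
    position = int((value - low) / (high - low) * bins)
    return min(position, bins - 1)  # the maximum value goes in the last bin


def heat_map(rows, feature_1, feature_2, bins):
    """Return {(bin_1, bin_2): (error_rate, error_coverage)} for non-empty cells."""
    features = (feature_1, feature_2)
    lows = {f: min(r[f] for r in rows) for f in features}
    highs = {f: max(r[f] for r in rows) for f in features}
    counts = defaultdict(lambda: [0, 0])  # cell -> [errors, size]
    for r in rows:
        cell = tuple(bin_index(r[f], lows[f], highs[f], bins) for f in features)
        counts[cell][0] += int(r["is_error"])
        counts[cell][1] += 1
    total_errors = sum(errors for errors, _ in counts.values())
    return {
        cell: (errors / size, errors / total_errors if total_errors else 0.0)
        for cell, (errors, size) in counts.items()
    }


# Made-up rows: two cells with the same error rate but different sizes.
rows = [
    {"negative_words": 0, "positive_words": 2, "is_error": True},
    {"negative_words": 0, "positive_words": 3, "is_error": False},
    {"negative_words": 4, "positive_words": 20, "is_error": True},
    {"negative_words": 4, "positive_words": 18, "is_error": True},
    {"negative_words": 4, "positive_words": 19, "is_error": False},
    {"negative_words": 4, "positive_words": 20, "is_error": False},
]
cells = heat_map(rows, "negative_words", "positive_words", bins=2)
for cell, (rate, coverage) in sorted(cells.items()):
    print(cell, f"rate={rate:.2f}", f"coverage={coverage:.2f}")
```

In this example both cells have an error rate of 0.5, yet the larger cell holds two thirds of all errors. This is the pattern described above: cells with the same error rate can differ in error coverage.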
## Model overview and performance analysis
In the Model overview section, we can look at all the data and compare individual cohorts.
Each of the pre-built and custom cohorts defined above is included in this view.
You will see the accuracy score for each cohort.
![Model-Perf-00](./media/nlp_classification/modelPerf_overviewGrid.png) #Grid view
There are also several options for visualizing metrics,
including a bar chart of accuracy scores and a confusion matrix. These can be customized by different dimensions, such as cohort or metric.
**Accuracy for all cohorts**
![Model-Perf-01](./media/nlp_classification/modelPerf_overviewBarchart.png)
**Confusion matrix**
![Model-Perf-02](./media/nlp_classification/modelPerf_confusionMatrix.png)
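
For reference, both views reduce to simple counting over true and predicted labels. The sketch below is a plain-Python illustration with made-up labels; it is not how the dashboard computes its charts.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_matrix(y_true, y_pred):
    """Count each (true label, predicted label) pair."""
    return Counter(zip(y_true, y_pred))


# Made-up labels for illustration only.
y_true = ["Business", "Business", "Real Estate", "Cryptocurrency"]
y_pred = ["Business", "Real Estate", "Real Estate", "Cryptocurrency"]

overall = accuracy(y_true, y_pred)
matrix = confusion_matrix(y_true, y_pred)
print(f"accuracy: {overall:.2f}")
for (true_label, predicted_label), count in sorted(matrix.items()):
    print(f"{true_label} -> {predicted_label}: {count}")
```

Restricting `y_true` and `y_pred` to the rows of one cohort gives that cohort's accuracy, which is what the bar chart compares.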
Similar investigations can be performed using the visualization options presented below the metrics table.
- Select the "Chart view" tab
- Click on the vertical axis and set the Feature to "positive_words"
- For the horizontal axis, set this value to "True Y"
- In the "Select a dataset cohort to explore" drop-down, pick the more error-prone cohort "positive_words <= 17.50, negative_words <= 1.50"
![Model-Perf-03](./media/nlp_classification/dataAnalysis_chartView_positiveWordsLessThan.png)
Now switch the dataset cohort to "positive_words > 17.50, negative_words <= 1.50".
You will see that only four categories have articles that fit these criteria.
![Model-Perf-04](./media/nlp_classification/dataAnalysis_chartView_positiveWordsGreaterThan.png)
## Fairness and bias
Does this model consistently classify news articles from across the different news categories?
The RAI dashboard can help identify if different groups within your overall population are being impacted differently by model results.
These can be cohorts you previously defined or newly defined groups.
**Configure the Feature Analysis tool** -- At the top of the Model overview section, select the Feature cohort option.
This component also allows you to look more closely at how the model performs with respect to certain features.
Use the Feature(s) drop-down to select positive_words.
![Fairness-00](./media/nlp_classification/modelOverview_featureCohorts_positiveWords.png)
Here you will see that there are differences in performance, depending on the number of positive words in the article.
Articles with the most positive words, greater than 18.7, have the highest accuracy.
## Data analysis
Next, we come to the data analysis section.
This tool allows you to look at the data behind the cohorts
and can often give you clues as to why some groups are more error-prone than others.
This information allows you to identify not just where the blind spots are,
but also why they exist.
For example, your training data may have only a handful of observations for the error-prone cohort.
Start by selecting "Chart view," then:
- Select your cohort
- Select the y-axis and change it to "negative_words"
- Select the x-axis and change it to "True Y" data
This will allow us to explore the "ground truth" data from the cohort.
Here we see that most categories contain articles with very few negative words.
In fact, articles about Personal Finance and Real Estate almost always have no negative words,
and the few that do can be considered outliers.
![Data-Analysis-00](./media/nlp_classification/dataAnalysis_chartView_negativeWords.png)
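
The same kind of check can be done numerically by summarizing `negative_words` per category. The sketch below reports, for each true label, the number of articles, the median count of negative words, and the share of articles with none. The row keys and values are made up for illustration; the article count is included because a small cohort is itself a possible reason for poor performance.

```python
from collections import defaultdict
from statistics import median


def negative_word_summary(rows):
    """Per category: article count, median negative_words, share with zero."""
    by_category = defaultdict(list)
    for row in rows:
        by_category[row["true_y"]].append(row["negative_words"])
    return {
        category: {
            "count": len(values),
            "median": median(values),
            "share_zero": sum(v == 0 for v in values) / len(values),
        }
        for category, values in by_category.items()
    }


# Made-up rows for illustration only.
rows = [
    {"true_y": "Personal Finance", "negative_words": 0},
    {"true_y": "Personal Finance", "negative_words": 0},
    {"true_y": "Personal Finance", "negative_words": 0},
    {"true_y": "Personal Finance", "negative_words": 2},
    {"true_y": "Business", "negative_words": 1},
    {"true_y": "Business", "negative_words": 3},
]
summary = negative_word_summary(rows)
for category, stats in summary.items():
    print(category, stats)
```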
Finally, scroll down to the Feature importances section.
This tool gives you information about which model inputs, in this case words in the text,
are most important in determining the model output.
- Adjust the slider at the top of the chart to show the "Top 10 feature by their importance"
![Feature-Importance-00](./media/nlp_classification/dataAnalysis_featureImportance.png)
## Conclusion
The Responsible AI dashboard provides valuable tools to help you debug model performance and improve customer experience.
In this example, we saw how RAI dashboard components provided valuable insight into text classification performance for a variety of business news articles.
These tools can help you focus your model tuning efforts,
making model development more efficient and helping your models perform well across a variety of scenarios.
## Next steps
1) Complete the text classification demo using the Azure OpenAI service and compare your results
1) Make the RAI dashboard part of your regular ML Ops practice