Update model cards + versions for HLS models (#3503)
* update model versions for hls models
* update model cards
* Update description.md

---------

Co-authored-by: Tina Manghnani <tinaem14@gmail.com>

Parent: a70cd6fe3e
Commit: f809ab765e
@@ -1,13 +1,13 @@
## Overview
-The CxrReportGen model utilizes a multimodal architecture, integrating a BiomedCLIP image encoder with a Phi-3-Mini text encoder to accurately interpret complex medical imaging studies of chest X-rays. CxrReportGen follows the same framework as **[MAIRA-2](https://www.microsoft.com/en-us/research/publication/maira-2-grounded-radiology-report-generation/)**. Its primary function is to generate comprehensive and structured radiology reports, with visual grounding represented by bounding boxes on the images.
+The CXRReportGen model utilizes a multimodal architecture, integrating a BiomedCLIP image encoder with a Phi-3-Mini text encoder to help an application interpret complex medical imaging studies of chest X-rays. CXRReportGen follows the same framework as **[MAIRA-2](https://www.microsoft.com/en-us/research/publication/maira-2-grounded-radiology-report-generation/)**. When built upon and integrated into an application, CXRReportGen may help developers generate comprehensive and structured radiology reports, with visual grounding represented by bounding boxes on the images.
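For illustration, grounded output of this kind pairs each finding with zero or more image regions. The structure below is a hypothetical sketch only; field names such as `findings` and `boxes` are assumptions for this note, not the model's documented output schema.

```python
# Hypothetical grounded-report structure: each finding is paired with zero or
# more normalized bounding boxes (x_min, y_min, x_max, y_max). Field names are
# illustrative assumptions, not CXRReportGen's documented schema.
report = {
    "findings": [
        {"text": "Cardiomegaly is present.", "boxes": [[0.30, 0.45, 0.75, 0.90]]},
        {"text": "No pleural effusion.", "boxes": []},
    ]
}

for finding in report["findings"]:
    regions = ", ".join(str(b) for b in finding["boxes"]) or "no region"
    print(f"{finding['text']} ({regions})")
```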
### Training information
| **Training Dataset** | **Details** |
|----------------|---------------------|
| **[MIMIC-CXR](https://physionet.org/content/mimic-cxr/2.0.0/)** | Frontal chest X-rays from the training partition of the MIMIC-CXR dataset and the associated text reports. Rule-based processing was carried out to extract findings and impressions separately, or to map non-labeled report sections to the relevant sections. During training, text is randomly sampled from either the findings or the impression section. In total 203,170 images from this dataset were used.|
-| **Propiertary datasets** | Multiple other proprietary datasets, composed of procured data, were additionally leveraged for training. Caution was taken to ensure there was no leakage of test data samples in the data used for training. |
+| **Proprietary datasets** | Multiple other proprietary datasets, composed of procured data, were additionally leveraged for training. Caution was taken to ensure there was no leakage of test data samples in the data used for training. |
**Training Statistics:**
- **Data Size:** ~400,000 samples
@@ -118,6 +118,15 @@ You can optionally apply the below code on the output to adjust the size:
## Ethical Considerations
-CxrReportGen should not be used as a diagnostic tool or as a substitute for professional medical advice. It is designed to assist radiologists by generating findings and reports, but final clinical decisions should always be made by human experts.
+CXRReportGen is not designed or intended to be deployed as-is in clinical settings: for use in the diagnosis, cure, mitigation, treatment, or prevention of disease or other conditions; for use as a substitute of professional medical advice, diagnosis, treatment, or clinical judgment of a healthcare professional; or to generate draft radiology reports for use in patient care.
-For detailed guidelines on ethical use, refer to Microsoft's [Responsible AI Principles](https://www.microsoft.com/en-us/ai/responsible-ai).
+Microsoft believes Responsible AI is a shared responsibility and we have identified six principles and practices to help organizations address risks, innovate, and create value: fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant use case and addresses unforeseen product misuse.
+While testing the model with images and/or text, ensure that the data is PHI free and that there is no patient information or information that can be traced to a patient identity.
+For detailed guidelines on ethical use, refer to Microsoft's [Responsible AI Principles](https://www.microsoft.com/en-us/ai/responsible-ai).
+## Hardware Requirement for Compute Instances
+- Supports CPU and GPU
+- Default: Single A100 GPU or Intel CPU
+- Minimum: Single GPU instance with 24 GB memory (fastest) or CPU
@@ -27,4 +27,4 @@ tags:
Standard_ND96asr_v4,
Standard_ND96amsr_A100_v4,
]
-version: 3
+version: 4
@@ -1,20 +1,23 @@
### Overview
-Most medical imaging AI today is narrowly built to detect a small set of individual findings on a single modality like Chest x-Rays.
-This training approach is data and computationally inefficient, requiring ~6-12 months per finding, and often fails to generalize in real world environments.
-By further training existing multimodal foundation models on medical images and associated text data, Microsoft and Nuance created a multimodal foundation model that shows evidence of generalizing across various medical imaging modalities, anatomies, locations, severities, and types of medical data.
-The training methods learn to map the medical text and images into a unified numerical vector representation space, which makes it easy for computers to understand the relationships between those modalities.
+Most medical imaging AI today is narrowly built to detect a small set of individual findings on a single modality like chest X-rays. This training approach is data- and computationally inefficient, requiring ~6-12 months per finding[^1], and often fails to generalize in real world environments. By further training existing multimodal foundation models on medical images and associated text data, Microsoft and Nuance created a multimodal foundation model that shows evidence of generalizing across various medical imaging modalities, anatomies, locations, severities, and types of medical data. The training methods learn to map the medical text and images into a unified numerical vector representation space, which makes it easy for computers to understand the relationships between those modalities.
-Embeddings is an important building block in AI research and development for retrieval, search, comparison, classification, and tagging tasks, and developers and researchers can now use MedImageInsight embeddings in the medical domain.
-MedImageInsight embeddings is open source allowing developers to customize and adapt to their specific use cases.
+Embeddings are an important building block in AI research and development for retrieval, search, comparison, classification, and tagging tasks, and developers and researchers can now use MedImageInsight embeddings in the medical domain. MedImageInsight embeddings are open source, allowing developers to customize and adapt them to their specific use cases.
Please see https://aka.ms/medimageinsightpaper for more details.
+For documentation and example Jupyter Notebooks, visit: https://aka.ms/MedImageInsightDocs.
+[^1]: [2022.12.07.22283216v3.full.pdf (medrxiv.org)](https://www.medrxiv.org/content/10.1101/2022.12.07.22283216v3.full.pdf)
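As a minimal sketch of the retrieval use case named above, assuming the embedding vectors have already been obtained from the model endpoint: the random stand-ins and the 1024-dimension vectors are placeholders, and cosine similarity is a conventional choice rather than a documented requirement.

```python
import numpy as np

# Placeholder vectors: in practice these come from the MedImageInsight endpoint
# (see "Sample inputs and outputs" below); random stand-ins are used here.
rng = np.random.default_rng(0)
image_embeddings = rng.normal(size=(100, 1024))  # 100 indexed images
query_embedding = rng.normal(size=(1024,))       # embedding of a text query

def normalize(x):
    # Cosine similarity = dot product of L2-normalized vectors.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

scores = normalize(image_embeddings) @ normalize(query_embedding)
top5 = np.argsort(scores)[::-1][:5]
print("Most similar images:", top5, "scores:", scores[top5])
```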
### Model Architecture
-Microsoft MedImageInsight includes 360 million parameter image encoder and 252 million parameter language encoder and comes as pretrained model with fine-tuning capability. The language encoder is not run in inference for each image. It is only run once (offline) to generate classifier head. MedImageInsight is a vision language transformer and was derviced from the Florence computer vision foundation model. Florence is a two-tower architecture similar to CLIP, except the DaViT archictecture is used as the image encoder and the UniCL objective is used as the objective function for MedImageInsight.
+Microsoft MedImageInsight includes a 360-million-parameter image encoder and a 252-million-parameter language encoder, and comes as a pretrained model with fine-tuning capability. The language encoder is not run at inference for each image; it is only run once (offline) to generate the classifier head. MedImageInsight is a vision-language transformer and was derived from the Florence computer vision foundation model. Florence is a two-tower architecture similar to CLIP, except that the DaViT architecture is used as the image encoder and the UniCL objective is used as the objective function for MedImageInsight.
Model input supports image and text input and generates vector embeddings as output. This is a static model trained on an offline dataset that is described below.
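The "classifier head generated once offline" design described above can be read in the usual CLIP/UniCL zero-shot style. A minimal sketch, assuming hypothetical `embed_text`/`embed_image` helpers in place of real endpoint calls:

```python
import numpy as np

def embed_text(prompts):
    # Placeholder for one offline call to the language encoder.
    rng = np.random.default_rng(1)
    return rng.normal(size=(len(prompts), 1024))

def embed_image(image_path):
    # Placeholder for the image encoder, the only part run at inference.
    rng = np.random.default_rng(2)
    return rng.normal(size=(1024,))

labels = ["x-ray chest anteroposterior Cardiomegaly",
          "x-ray chest anteroposterior No Finding"]

# Offline, once: text embeddings of the label prompts form the classifier head.
head = embed_text(labels)
head /= np.linalg.norm(head, axis=1, keepdims=True)

# At inference: embed the image and classify by similarity to the head.
v = embed_image("chest_xray.png")
v /= np.linalg.norm(v)
print(labels[int(np.argmax(head @ v))])
```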
### License and where to send questions or comments about the model
-A custom commercial license is available. Please contact the team for details.
+The license for MedImageInsight is the MIT license.
For questions or comments, please contact: hlsfrontierteam@microsoft.com
### Training information
@@ -42,23 +45,30 @@ A custom commercial license is available. Please contact the team for details.
### Evaluation Results
-In this section, we report the results for the models on standard academic benchmarks. For all the evaluations, we use our internal evaluations library. For these models, we always pick the best score between our evaluation framework and any publicly reported results.
-| **Modality** | **Use Case** | **Benchmark** | **Maturity relative to Human Expert** | **MSFT IP or Partner Models** | **Google Models** |
+In this section, we report the results for the models on standard academic benchmarks. For all the evaluations, we use our internal evaluations library. For these models, we always pick the best score between our evaluation framework and any publicly reported results. Full details at https://aka.ms/medimageinsightpaper.
+| **Modality** | **Use Case** | **Benchmark (# Labels)** | **Maturity relative to Human Expert** | **MSFT IP or Partner Models** | **Google Models** |
|----------------|---------------------|-----------------------------------------------------------------------------------------------|--------------------------------------|---------------------------------|---------------------------------------|
-| **Radiology** | Classification | X-Ray: RSNA Bone age | 🟢 | 6.85 avg L1* | No test results |
-| | Classification | X-Ray: IRMA2005 body-region/view categories | 🟢 | 0.99 mAUC* | No test results |
+| **Radiology** | Classification | X-Ray: RSNA Bone age | 🟢 | 6.19 Ab. L1* | No test results |
+| | Classification | X-Ray: MGB Bone age | 🟢 | 6.57 Ab. L1 | No test results |
+| | Classification | X-Ray: IRMA2005 body-region/view categories (137) | 🟢 | 0.99 mAUC* | No test results |
+| | Classification | Chest X-Ray: LT-CXR (20) | 🟡 | 0.85 mAUC | No test results |
+| | Classification | Chest X-Ray: MGB CXR (80) | 🟡 | 0.94 mAUC | No test results |
| | Classification | ChestXray14: Consolidation (finetuning) | 🟡 | 0.74 mAUC* | 0.74 mAUC (ELiXR)* |
| | Classification | ChestXray14: Edema (finetuning) | 🟡 | 0.86 mAUC* | 0.85 mAUC* (ELiXR) |
| | Classification | ChestXray14: Effusion (finetuning) | 🟡 | 0.83 mAUC* | 0.83 mAUC* (ELiXR) |
-| | Classification | MR/CT: Exam categories | 🟡 | 0.95 mAUC* | No test results |
-| | Classification | Chest CT: LIDC-IDRI Lung Nodules | 🟡 | 0.81 mAUC* | No model |
-| | Classification | Mammography: RSNA Mammography | 🟡 | 0.81 mAUC* | No model |
-| **Dermatology** | Classification | ISIC2019 | 🟡 | 0.84 mAUC* | No test results |
-| | Classification | SD-198 | 🟡 | 0.93 mAUC* | No test results |
-| | Classification | PADUFES20 | 🟡 | 0.96 mAUC | 0.97* (Med-PaLM-M 84B) |
-| **Pathology** | Classification | PCAM | 🟡 | 0.96 mAUC* (PaLM) | No test results |
-| | Classification | WILDS | 🟡 | 0.97 mAUC (PaLM) | No test results |
+| | Classification | MR/CT: Exam categories (21) | 🟡 | 0.95 mAUC* | No test results |
+| | Classification | Chest CT: LIDC-IDRI Lung Nodules (4) | 🟡 | 0.81 mAUC* | No model |
+| | Classification | Mammography: RSNA Mammography (4) | 🟡 | 0.81 mAUC* | No model |
+| | Classification | US: USI (3) | 🟡 | 0.99 mAUC | No model |
+| | Classification | US: HMC-QU View (2) | 🟡 | 0.99 mAUC | No model |
+| | Classification | US: Bing Echo View (7) | 🟡 | 0.94 mAUC | No model |
+| **Dermatology** | Classification | ISIC2019 (8) | 🟡 | 0.97 mAUC* | No test results |
+| | Classification | SD-198 (198) | 🟡 | 0.99 mAUC* | No test results |
+| | Classification | PADUFES20 (6) | 🟡 | 0.95 mAUC | 0.97* (Med-PaLM-M 84B) |
+| **Pathology** | Classification | PCAM (2) | 🟡 | 0.96 mAUC* | No test results |
+| **Ophthalmology** | Classification | OCT2017 (4) | 🟡 | 1.00 mAUC* | No test results |
+| | Classification | OCT2018 (4) | 🟡 | 1.00 mAUC* | No test results |
+| | Classification | Fundus ODIR5K (79) | 🟡 | 0.95 mAUC | No test results |
*SOTA for this task
@@ -100,7 +110,8 @@ Microsoft believes Responsible AI is a shared responsibility and we have identif
While testing the model with images and/or text, ensure that the data is PHI free and that there is no patient information or information that can be traced to a patient identity.
The model is not designed for the following use cases:
-* **Use as a diagnostic tool or as a medical device** - Using information extracted by our service in diagnosis, cure, mitigation, treatment, or prevention of disease or other conditions, as a substitute of professional medical advice, diagnosis, treatment, or clinical judgment of a healthcare professional.
+* **Use by clinicians to inform clinical decision-making, as a diagnostic tool, or as a medical device** - MedImageInsight is not designed or intended to be deployed as-is in clinical settings nor is it for use in the diagnosis, cure, mitigation, treatment, or prevention of disease or other conditions (including to support clinical decision-making), or as a substitute of professional medical advice, diagnosis, treatment, or clinical judgment of a healthcare professional.
* **Scenarios without consent for data** - Any scenario that uses health data for a purpose for which consent was not obtained.
@@ -111,7 +122,7 @@ Please see Microsoft's Responsible AI Principles and approach available at [http
### Sample inputs and outputs (for real time inference)
-Input:
+**Input:**
```python
data = {
"input_data": {
@@ -119,28 +130,37 @@ data = {
"image",
"text"
],
-"index":[0, 1],
+"index":[0],
"data": [
-[base64.encodebytes(read_image(sample_image_ct_8Bits_Mono)).decode("utf-8"), "This 3D volume depicts the pancreas with a single tumor, the largest of which measures 5.10 centimeters in length."],
-[base64.encodebytes(read_image(sample_image_mri_8Bits_Mono)).decode("utf-8"), "This 3D volume depicts the brain with a single tumor."]
+[base64.encodebytes(read_image(sample_image_1)).decode("utf-8"), "x-ray chest anteroposterior Cardiomegaly"]
]
},
-"params": {}
+"params":{
+"get_scaling_factor": True
+}
}
```
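Note that `read_image` and `sample_image_1` are helpers from the surrounding example code and are not defined on this page. A minimal stand-in, assuming the helper simply returns the raw image bytes that the request then base64-encodes:

```python
def read_image(image_path):
    # Assumed behavior: return the raw bytes of the image file, which the
    # request above encodes with base64.encodebytes(...).decode("utf-8").
    with open(image_path, "rb") as f:
        return f.read()
```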
-Output:
+**Output:**
```python
-[{"image_features": [[-0.040428221225738525, 0.015632804483175278, -0.034625787287950516, -0.013094332069158554, 0.023215821012854576, -0.010303247720003128, -0.003998206462711096, -0.00022746287868358195]]
+[
+  {
+    "image_features": [
+      [-0.040428221225738525, 0.015632804483175278, -0.034625787287950516, -0.013094332069158554, ... , 0.023215821012854576, -0.010303247720003128, -0.003998206462711096, -0.00022746287868358195]
+    ]
+  },
+  {
+    "text_features": [
+      [-0.04121647855639458, 0.014923677921295166, -0.033598374396562576, -0.012765488520264626, ... , 0.02294582130014801, -0.009835227608680725, -0.004232016112744808, -0.00021812367581298325]
+    ]
+  },
+  {
+    "scaling_factor": 4.513362407684326
+  }
+]
```
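The page does not document how `scaling_factor` is meant to be used. In CLIP-style two-tower models, a learned scaling (temperature) factor is typically applied to image-text cosine similarities before a softmax, so a plausible reading is the sketch below; this is an assumption based on the CLIP/UniCL design noted under Model Architecture, using truncated stand-in vectors in place of the elided output above.

```python
import numpy as np

# Stand-in for the parsed endpoint output shown above (truncated there with "...").
response = [
    {"image_features": [[-0.0404, 0.0156, -0.0346, -0.0131]]},
    {"text_features": [[-0.0412, 0.0149, -0.0336, -0.0128]]},
    {"scaling_factor": 4.513362407684326},
]

image_features = np.asarray(response[0]["image_features"])  # (n_images, d)
text_features = np.asarray(response[1]["text_features"])    # (n_texts, d)
scale = response[2]["scaling_factor"]

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Assumed CLIP-style use: temperature-scaled cosine similarities -> softmax.
logits = scale * normalize(image_features) @ normalize(text_features).T
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
print(probs)  # probability of each text matching each image
```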
## Data and Resource Specification for Deployment
* **Supported Data Input Format**
    - Monochromatic 8-bit Images (e.g. PNG, TIFF)
    - RGB Images (e.g. JPEG, PNG)
    - Text (Maximum: 77 Tokens)
-* **Hardware Requirement for Compute Instances**
-    - Default: Single V100 GPU
-    - Minimum: Single GPU instance with 8Gb Memory
-    - Batch size: 4 (~6Gb Memory)
+## Hardware Requirement for Compute Instances
+- Supports CPU and GPU
+- Default: Single V100 GPU or Intel CPU
+- Minimum: Single GPU instance with 8 GB memory (fastest) or CPU
@@ -31,4 +31,4 @@ tags:
Standard_ND96amsr_A100_v4,
Standard_ND40rs_v2,
]
-version: 2
+version: 3
@@ -1,21 +1,21 @@
### Overview
-Biomedical image analysis is fundamental for biomedical discovery in cell biology, pathology, radiology, and many other biomedical domains. MedImageParse is a biomedical foundation model for imaging parsing that can jointly conduct segmentation, detection, and recognition for 82 object types across 9 imaging modalities. Through joint learning, we can improve accuracy for individual tasks and enable novel applications such as segmenting all relevant objects in an image through a text prompt, rather than requiring users to laboriously specify the bounding box for each object.
+Biomedical image analysis is fundamental for biomedical discovery in cell biology, pathology, radiology, and many other biomedical domains. MedImageParse is a biomedical foundation model for imaging parsing that can jointly conduct segmentation, detection, and recognition across 9 imaging modalities. Through joint learning, we can improve accuracy for individual tasks and enable novel applications such as segmenting all relevant objects in an image through a text prompt, rather than requiring users to laboriously specify the bounding box for each object.
-On image segmentation, we showed that MedImageParse is broadly applicable, outperforming state-of-the-art methods on 102,855 test image-mask-label triples across 9 imaging modalities.
+MedImageParse is broadly applicable, performing image segmentation across 9 imaging modalities.
-MedImageParse is also able to identify invalid user inputs describing objects that do not exist in the image. On object detection, which aims to locate a specific object of interest, MedImageParse again attained state-of-the-art performance, especially on objects with irregular shapes.
+MedImageParse is also able to identify invalid user inputs describing objects that do not exist in the image. MedImageParse can perform object detection, which aims to locate a specific object of interest, including objects with irregular shapes.
-On object recognition, which aims to identify all objects in a given image along with their semantic types, we showed that MedImageParse can simultaneously segment and label all biomedical objects in an image.
+On object recognition, which aims to identify all objects in a given image along with their semantic types, MedImageParse can simultaneously segment and label all biomedical objects in an image.
-In summary, MedImageParse is an all-in-one tool for biomedical image analysis by jointly solving segmentation, detection, and recognition.
+In summary, MedImageParse shows potential to be a building block for an all-in-one tool for biomedical image analysis by jointly solving segmentation, detection, and recognition.
-It is broadly applicable to all major biomedical image modalities, paving the path for efficient and accurate image-based biomedical discovery.
+It is broadly applicable to all major biomedical image modalities, which may pave a future path for efficient and accurate image-based biomedical discovery when built upon and integrated into an application.
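As a sketch of the text-prompted workflow this describes, assuming a request convention like the MedImageInsight sample elsewhere on this page: the exact MedImageParse payload and output schema are not specified here, so every field and file name below is an assumption.

```python
import base64
import json

def read_image(image_path):
    # Assumed helper: return the raw bytes of the image file.
    with open(image_path, "rb") as f:
        return f.read()

# Hypothetical request following the Azure ML payload convention used by the
# other HLS samples on this page; the actual MedImageParse schema may differ.
data = {
    "input_data": {
        "columns": ["image", "text"],
        "index": [0],
        "data": [
            [
                base64.encodebytes(read_image("ct_abdomen.png")).decode("utf-8"),
                # A short noun-phrase prompt naming the object to segment.
                "pancreatic tumor",
            ]
        ],
    },
    "params": {},
}
body = json.dumps(data)  # POST this to the deployed endpoint's scoring URL
```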
### Model Architecture
MedImageParse is built upon a transformer-based architecture, optimized for processing large biomedical corpora. Leveraging multi-head attention mechanisms, it excels at identifying and understanding biomedical terminology, as well as extracting contextually relevant information from dense scientific texts. The model is pre-trained on vast biomedical datasets, allowing it to generalize across various biomedical domains with high accuracy.
### License and where to send questions or comments about the model
-The license for MedImageParse is the MIT license.
+The license for MedImageParse is the MIT license. Please cite our paper if you use the model for your research: https://microsoft.github.io/BiomedParse/assets/BiomedParse_arxiv.pdf
For questions or comments, please contact: hlsfrontierteam@microsoft.com
### Training information
@@ -30,9 +30,7 @@ Please see the paper for detailed information about methods and results. https:/
Bar plot comparing the Dice score between our method and competing methods on 102,855 test instances (image-mask-label
triples) across 9 modalities. MedSAM and SAM require bounding box as input.
-![MedImageParse comparison results on segmentation](medimageparseresults.png)
+<img src="https://automlcesdkdataresources.blob.core.windows.net/model-cards/model_card_images/MedImageParse/medimageparseresults.png" alt="MedImageParse comparison results on segmentation">
### Fairness evaluation
We conducted fairness evaluation for different sex and age groups. Two-sided independent t-test
@@ -40,12 +38,12 @@ shows non-significant differences between female and male and between different
### Ethical Considerations and Limitations
-Microsoft believes Responsible AI is a shared responsibility and we have identified six principles and practices help organizations address risks, innovate, and create value: fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant use case and addresses unforeseen product misuse.
+Microsoft believes Responsible AI is a shared responsibility and we have identified six principles and practices to help organizations address risks, innovate, and create value: fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant use case and addresses unforeseen product misuse.
-While testing the model with images and/or text, ensure the the data is PHI free and that there are no patient information or information that can be tracked to a patient identity.
+While testing the model with images and/or text, ensure that the data is PHI free and that there is no patient information or information that can be traced to a patient identity.
The model is not designed for the following use cases:
-* **Use as a diagnostic tool or as a medical device** - Although MedImageParse is highly accurate in parsing biomedical it is not intended to be consumed directly and using information extracted by our service in diagnosis, cure, mitigation, treatment, or prevention of disease or other conditions, as a substitute of professional medical advice, diagnosis, treatment, or clinical judgment of a healthcare professional.
+* **Use by clinicians to inform clinical decision-making, as a diagnostic tool or as a medical device** - Although MedImageParse is highly accurate in parsing biomedical data, it is not designed or intended to be deployed in clinical settings as-is, nor is it for use in the diagnosis, cure, mitigation, treatment, or prevention of disease or other conditions (including to support clinical decision-making), or as a substitute of professional medical advice, diagnosis, treatment, or clinical judgment of a healthcare professional.
* **Scenarios without consent for data** - Any scenario that uses health data for a purpose for which consent was not obtained.
@@ -121,13 +119,6 @@ ultrasound: breast: benign tumor, malignant tumor, tumor
heart: left heart atrium, left heart ventricle
transperineal: fetal head, pubic symphysis
-* **Hardware Requirement for Compute Instances**
+**Hardware Requirement for Compute Instances**
- Default: Single V100 GPU
- Minimum: Single GPU instance with 8Gb Memory
-* **Hardware Requirement for Compute Instances**
-Please suggest the following hardware requirements for the compute instances, for example:
-- Batch size: 4 (~6Gb Memory)
-- Image Compression Ratio: 75 (Default)
-- Image Size: 512 (Default for X-Y Dimension)
-- Minimum: Single GPU instance with 8Gb Memory
@@ -31,4 +31,4 @@ tags:
Standard_ND96amsr_A100_v4,
Standard_ND40rs_v2,
]
-version: 2
+version: 3