Add biomedclip and Virchow model assets (#3476)
* add virchow and biomedclip model assets * fix json * update Virchow cards * biomedclip + virchow * update biomedclip description
Parent: 81539daf3b
Commit: 4aef45af5b
@@ -0,0 +1,4 @@
extra_config: model.yaml
spec: spec.yaml
type: model
categories: ["Foundation Models"]
@@ -0,0 +1,71 @@
BiomedCLIP is a biomedical vision-language foundation model pretrained on PMC-15M, a dataset of 15 million figure-caption pairs extracted from biomedical research articles in PubMed Central, using contrastive learning. It uses PubMedBERT as the text encoder and a Vision Transformer as the image encoder, with domain-specific adaptations. It can perform various vision-language processing (VLP) tasks such as cross-modal retrieval, image classification, and visual question answering. BiomedCLIP establishes a new state of the art on a wide range of standard datasets and substantially outperforms prior VLP approaches:

![performance](https://huggingface.co/microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224/resolve/main/biomed-vlp-eval.svg)
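As a rough illustration of the contrastive mechanism behind zero-shot classification (a sketch only, not BiomedCLIP's actual API): candidate labels are scored by cosine similarity between the image embedding and each label's text embedding, then normalized with a softmax.

```python
# Illustrative sketch of CLIP-style zero-shot scoring. The embeddings here are
# random stand-ins for the outputs of the image and text encoders.
import numpy as np

def zero_shot_probs(image_emb: np.ndarray, text_embs: np.ndarray,
                    temperature: float = 100.0) -> np.ndarray:
    """Return one probability per candidate label for a single image."""
    img = image_emb / np.linalg.norm(image_emb)                         # L2-normalize image embedding
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)  # L2-normalize each text embedding
    logits = temperature * (txt @ img)                                  # scaled cosine similarities
    exp = np.exp(logits - logits.max())                                 # numerically stable softmax
    return exp / exp.sum()

# Toy example: one 512-d image embedding scored against 3 label embeddings.
rng = np.random.default_rng(0)
probs = zero_shot_probs(rng.normal(size=512), rng.normal(size=(3, 512)))
print(probs.shape)  # one probability per label, summing to 1
```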
**Citation**

```
@misc{https://doi.org/10.48550/arXiv.2303.00915,
  doi = {10.48550/ARXIV.2303.00915},
  url = {https://arxiv.org/abs/2303.00915},
  author = {Zhang, Sheng and Xu, Yanbo and Usuyama, Naoto and Bagga, Jaspreet and Tinn, Robert and Preston, Sam and Rao, Rajesh and Wei, Mu and Valluri, Naveen and Wong, Cliff and Lungren, Matthew and Naumann, Tristan and Poon, Hoifung},
  title = {Large-Scale Domain-Specific Pretraining for Biomedical Vision-Language Processing},
  publisher = {arXiv},
  year = {2023},
}
```

## Model Use

**Intended Use**

This model is intended to be used solely for (I) future research on visual-language processing and (II) reproducibility of the experimental results reported in the reference paper.

**Primary Intended Use**

The primary intended use is to support AI researchers building on top of this work. BiomedCLIP and its associated models should be helpful for exploring various biomedical VLP research questions, especially in the radiology domain.

**Out-of-Scope Use**

Any deployed use case of the model, commercial or otherwise, is currently out of scope. Although we evaluated the models using a broad set of publicly available research benchmarks, the models and evaluations are not intended for deployed use cases. Please refer to the associated paper for more details.

**Data**

This model builds upon the PMC-15M dataset, a large-scale parallel image-text dataset for biomedical vision-language processing. It contains 15 million figure-caption pairs extracted from biomedical research articles in PubMed Central and covers a diverse range of biomedical image types, such as microscopy, radiography, and histology.

**Limitations**

This model was developed using English corpora, and thus can be considered English-only.

**Further Information**

Please refer to the corresponding paper, "Large-Scale Domain-Specific Pretraining for Biomedical Vision-Language Processing", for additional details on model training and evaluation.

## Sample Input and Output (for real-time inference)

### Sample Input

```json
{
  "input_data": {
    "columns": [
      "image",
      "text"
    ],
    "index": [0, 1, 2],
    "data": [
      ["image1", "label1, label2, label3"],
      ["image2", "label1, label2, label3"],
      ["image3", "label1, label2, label3"]
    ]
  }
}
```

### Sample Output

```json
[
  {
    "probs": [0.95, 0.03, 0.02],
    "labels": ["label1", "label2", "label3"]
  },
  {
    "probs": [0.04, 0.93, 0.03],
    "labels": ["label1", "label2", "label3"]
  }
]
```
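The sample input above can be assembled programmatically. A minimal sketch (image bytes and label names are placeholders; base64 encoding is one accepted way to pass images, per the note conventions used for these endpoints):

```python
# Build the real-time inference payload: one row per image, with the same
# comma-separated candidate-label string in each row's "text" column.
import base64
import json

def build_request(images: list, candidate_labels: list) -> str:
    label_str = ", ".join(candidate_labels)
    rows = []
    for img_bytes in images:
        img_b64 = base64.b64encode(img_bytes).decode("utf-8")  # base64-encode raw image bytes
        rows.append([img_b64, label_str])
    return json.dumps({
        "input_data": {
            "columns": ["image", "text"],
            "index": list(range(len(rows))),
            "data": rows,
        }
    })

# Placeholder bytes stand in for real PNG/JPEG contents.
body = build_request([b"<image1 bytes>", b"<image2 bytes>"],
                     ["label1", "label2", "label3"])
```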
@@ -0,0 +1,8 @@
path:
  container_name: models
  container_path: huggingface/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224/mlflow_model_folder
  storage_name: automlcesdkdataresources
  type: azureblob
publish:
  description: description.md
  type: mlflow_model
@@ -0,0 +1,34 @@
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json

name: BiomedCLIP-PubMedBERT_256-vit_base_patch16_224
path: ./

properties:
  inference-min-sku-spec: 6|1|112|64
  inference-recommended-sku: Standard_NC6s_v3, Standard_NC12s_v3, Standard_NC24s_v3, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2
  languages: en
  SharedComputeCapacityEnabled: true

tags:
  task: zero-shot-image-classification
  industry: health-and-life-sciences
  Preview: ""
  inference_supported_envs:
    - hf
  license: mit
  author: Microsoft
  hiddenlayerscanned: ""
  SharedComputeCapacityEnabled: ""
  inference_compute_allow_list:
    [
      Standard_NC6s_v3,
      Standard_NC12s_v3,
      Standard_NC24s_v3,
      Standard_NC24ads_A100_v4,
      Standard_NC48ads_A100_v4,
      Standard_NC96ads_A100_v4,
      Standard_ND96asr_v4,
      Standard_ND96amsr_A100_v4,
      Standard_ND40rs_v2,
    ]
version: 1
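The `inference-min-sku-spec` value packs four `|`-separated minimums. Reading them as vCPUs, GPUs, memory (GiB), and storage (GiB) is an assumption about the azureml-assets convention, not something stated in this file; a small helper under that assumption:

```python
# Parse the spec's inference-min-sku-spec string. The field order
# (vCPUs | GPUs | memory GiB | storage GiB) is an assumed convention.
def parse_min_sku_spec(spec: str) -> dict:
    cpus, gpus, memory_gb, storage_gb = (int(part) for part in spec.split("|"))
    return {"cpus": cpus, "gpus": gpus,
            "memory_gb": memory_gb, "storage_gb": storage_gb}

print(parse_min_sku_spec("6|1|112|64"))
# → {'cpus': 6, 'gpus': 1, 'memory_gb': 112, 'storage_gb': 64}
```

A deployment SKU from the allow list is expected to meet or exceed each of these minimums.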
@@ -0,0 +1,4 @@
extra_config: model.yaml
spec: spec.yaml
type: model
categories: ["Foundation Models"]
@@ -0,0 +1,113 @@
Virchow is a self-supervised vision transformer pretrained on 1.5 million whole slide histopathology images. The model can be used as a tile-level feature extractor (frozen or finetuned) to achieve state-of-the-art results for a wide variety of downstream computational pathology use cases.

## Model Details

**Developed by:** Paige, NYC, USA and Microsoft Research, Cambridge, MA, USA
**Model Type:** Image feature backbone

**Model Stats:**
- Params (M): 632
- Image size: 224 x 224

**Model Architecture:** ViT-H/14
- Patch size: 14
- Layers: 32
- Embedding dimension: 1280
- Activation function: SwiGLU
- Attention heads: 16
- LayerScale: true

**Training Details:**
- Precision: Mixed precision (fp16)
- Objective: Modified DINOv2 (https://doi.org/10.48550/arXiv.2304.07193)

**Paper:** A foundation model for clinical-grade computational pathology and rare cancers detection: https://www.nature.com/articles/s41591-024-03141-0

**Pretraining Dataset:** Internal dataset of 1.5 million whole slide images from Memorial Sloan Kettering Cancer Center, with tiles sampled at 0.5 microns per pixel resolution (20x magnification).

**License:** Apache 2.0

## Model Usage

**Direct use**

Virchow is intended to be used as a frozen feature extractor, serving as the foundation for tile-level and whole-slide-level classifiers.

**Downstream use**

Virchow can be finetuned to adapt to specific tasks and/or datasets.

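The frozen-feature-extractor pattern can be sketched as follows. The 1280-d embeddings below are random stand-ins for Virchow tile features, and the closed-form ridge-regression linear probe is an illustrative choice, not part of the released model:

```python
# Frozen-backbone linear probing: tile embeddings (random stand-ins for
# Virchow's 1280-d features) feed a linear probe fit in closed form; the
# backbone itself is never updated.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1280))       # frozen tile embeddings, one row per tile
y = (X[:, 0] > 0).astype(float)        # toy binary tile labels

lam = 10.0                             # ridge penalty keeps the solve well-posed
# Solve (X^T X + lam*I) w = X^T (y - 0.5) for the probe weights.
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ (y - 0.5))

acc = float(((X @ w > 0) == (y == 1)).mean())
print(f"train accuracy: {acc:.2f}")
```

In practice the probe would be trained on embeddings extracted once from the frozen backbone and evaluated on held-out slides; only the probe's weights change.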
**Terms**

The Virchow Model and associated code are released under the Apache License, Version 2.0 (the "License"). You may obtain a copy of the License at:

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

**Additional Terms**

Please note that the primary email used to sign up for your Hugging Face account must match your institutional email to receive approval. By downloading the Virchow Model, you attest that all information (affiliation, research use) is correct and up-to-date. Downloading the Virchow Model requires prior registration on Hugging Face and agreement to the terms of use.

While the Apache 2.0 License grants broad permissions, we kindly request that users adhere to the following guidelines:

- Attribution: We encourage proper attribution when using or redistributing the Virchow Model or its derivatives. Please include a reference to the original source and creators.
- Responsible Use: Users are expected to use the Virchow Model responsibly and ethically. Please consider the potential impacts of your use on individuals and society.
- Medical or Clinical Use: The Virchow Model is not intended for use in medical diagnosis, treatment, or prevention of disease of real patients. It should not be used as a substitute for professional medical advice.
- Privacy and Data Protection: Users should respect privacy rights and comply with applicable data protection laws when using the Virchow Model.
- No Malicious Use: The Virchow Model should not be used to create malicious code, malware, or to interfere with the proper functioning of computer systems.
- Transparency: If you use the Virchow Model in a product or service, we encourage you to disclose this fact to your end users.
- Feedback and Contributions: We welcome feedback and contributions to improve the Virchow Model. Please consider sharing your improvements with the community.

These additional terms are not intended to restrict your rights under the Apache 2.0 License but to promote responsible and ethical use of the Virchow Model.

By using the Virchow Model, you acknowledge that you have read and understood these terms.

**Citation**

Please cite the following work if you use this model in your research.

Vorontsov, E., Bozkurt, A., Casson, A. et al. A foundation model for clinical-grade computational pathology and rare cancers detection. Nat Med (2024). https://doi.org/10.1038/s41591-024-03141-0

```
@article{vorontsov2024virchow,
  title={A foundation model for clinical-grade computational pathology and rare cancers detection},
  author={Vorontsov, Eugene and Bozkurt, Alican and Casson, Adam and Shaikovski, George and Zelechowski, Michal and Severson, Kristen and Zimmermann, Eric and Hall, James and Tenenholtz, Neil and Fusi, Nicolo and Yang, Ellen and Mathieu, Philippe and van Eck, Alexander and Lee, Donghun and Viret, Julian and Robert, Eric and Wang, Yi Kan and Kunz, Jeremy D. and Lee, Matthew C. H. and Bernhard, Jan H. and Godrich, Ran A. and Oakley, Gerard and Millar, Ewan and Hanna, Matthew and Wen, Hannah and Retamero, Juan A. and Moye, William A. and Yousfi, Razik and Kanan, Christopher and Klimstra, David S. and Rothrock, Brandon and Liu, Siqi and Fuchs, Thomas J.},
  journal={Nature Medicine},
  year={2024},
  publisher={Nature Publishing Group}
}
```

## Sample Input and Output (for real-time inference)

### Sample Input

```json
{
  "input_data": {
    "columns": [
      "image"
    ],
    "index": [0],
    "data": [
      ["image1"]
    ]
  }
}
```
Note:
- "image1" should be a publicly accessible URL or a base64-encoded string.

### Sample Output

```json
[
  {
    "output": [
      0.0, 0.0, 0.0, 0.0
    ]
  }
]
```

The output contains the image embeddings.
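A small sketch of consuming such a response in Python, L2-normalizing the embeddings so dot products act as cosine similarities (the response string here is a toy stand-in; real Virchow embeddings are 1280-d):

```python
# Turn the endpoint's JSON response into a normalized feature matrix so that
# dot products between rows are cosine similarities (useful for retrieval).
import json
import numpy as np

def embeddings_from_response(body: str) -> np.ndarray:
    rows = json.loads(body)
    feats = np.array([row["output"] for row in rows], dtype=np.float32)
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    return feats / np.clip(norms, 1e-12, None)  # guard against zero vectors

# Toy 4-d response standing in for two real embedding vectors.
sample = '[{"output": [1.0, 0.0, 0.0, 0.0]}, {"output": [3.0, 4.0, 0.0, 0.0]}]'
feats = embeddings_from_response(sample)
print(feats @ feats.T)  # pairwise cosine similarities between the two tiles
```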
@@ -0,0 +1,8 @@
path:
  container_name: models
  container_path: huggingface/Virchow/mlflow_model_folder
  storage_name: automlcesdkdataresources
  type: azureblob
publish:
  description: description.md
  type: mlflow_model
@@ -0,0 +1,34 @@
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json

name: Virchow
path: ./

properties:
  inference-min-sku-spec: 6|1|112|64
  inference-recommended-sku: Standard_NC6s_v3, Standard_NC12s_v3, Standard_NC24s_v3, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2
  languages: en
  SharedComputeCapacityEnabled: true

tags:
  task: image-feature-extraction
  industry: health-and-life-sciences
  Preview: ""
  inference_supported_envs:
    - hf
  license: apache-2.0
  author: Paige
  hiddenlayerscanned: ""
  SharedComputeCapacityEnabled: ""
  inference_compute_allow_list:
    [
      Standard_NC6s_v3,
      Standard_NC12s_v3,
      Standard_NC24s_v3,
      Standard_NC24ads_A100_v4,
      Standard_NC48ads_A100_v4,
      Standard_NC96ads_A100_v4,
      Standard_ND96asr_v4,
      Standard_ND96amsr_A100_v4,
      Standard_ND40rs_v2,
    ]
version: 1