зеркало из https://github.com/microsoft/SynapseML.git
chore: bump synapseml to 1.0.7 (#2298)
This commit is contained in:
Родитель
9905ae80a2
Коммит
bd5b4013a1
24
README.md
24
README.md
|
@ -11,10 +11,10 @@ SynapseML requires Scala 2.12, Spark 3.4+, and Python 3.8+.
|
|||
| Topics | Links |
|
||||
| :------ | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
|
||||
| Build | [![Build Status](https://msdata.visualstudio.com/A365/_apis/build/status/microsoft.SynapseML?branchName=master)](https://msdata.visualstudio.com/A365/_build/latest?definitionId=17563&branchName=master) [![codecov](https://codecov.io/gh/Microsoft/SynapseML/branch/master/graph/badge.svg)](https://codecov.io/gh/Microsoft/SynapseML) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) |
|
||||
| Version | [![Version](https://img.shields.io/badge/version-1.0.6-blue)](https://github.com/Microsoft/SynapseML/releases) [![Release Notes](https://img.shields.io/badge/release-notes-blue)](https://github.com/Microsoft/SynapseML/releases) [![Snapshot Version](https://mmlspark.blob.core.windows.net/icons/badges/master_version3.svg)](#sbt) |
|
||||
| Docs | [![Website](https://img.shields.io/badge/SynapseML-Website-blue)](https://aka.ms/spark) [![Scala Docs](https://img.shields.io/static/v1?label=api%20docs&message=scala&color=blue&logo=scala)](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/index.html#package) [![PySpark Docs](https://img.shields.io/static/v1?label=api%20docs&message=python&color=blue&logo=python)](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/index.html) [![Academic Paper](https://img.shields.io/badge/academic-paper-7fdcf7)](https://arxiv.org/abs/1810.08744) |
|
||||
| Version | [![Version](https://img.shields.io/badge/version-1.0.7-blue)](https://github.com/Microsoft/SynapseML/releases) [![Release Notes](https://img.shields.io/badge/release-notes-blue)](https://github.com/Microsoft/SynapseML/releases) [![Snapshot Version](https://mmlspark.blob.core.windows.net/icons/badges/master_version3.svg)](#sbt) |
|
||||
| Docs | [![Website](https://img.shields.io/badge/SynapseML-Website-blue)](https://aka.ms/spark) [![Scala Docs](https://img.shields.io/static/v1?label=api%20docs&message=scala&color=blue&logo=scala)](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/index.html#package) [![PySpark Docs](https://img.shields.io/static/v1?label=api%20docs&message=python&color=blue&logo=python)](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/index.html) [![Academic Paper](https://img.shields.io/badge/academic-paper-7fdcf7)](https://arxiv.org/abs/1810.08744) |
|
||||
| Support | [![Gitter](https://badges.gitter.im/Microsoft/MMLSpark.svg)](https://gitter.im/Microsoft/MMLSpark?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge) [![Mail](https://img.shields.io/badge/mail-synapseml--support-brightgreen)](mailto:synapseml-support@microsoft.com) |
|
||||
| Binder | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/microsoft/SynapseML/v1.0.6?labpath=notebooks%2Ffeatures) |
|
||||
| Binder | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/microsoft/SynapseML/v1.0.7?labpath=notebooks%2Ffeatures) |
|
||||
| Usage | [![Downloads](https://static.pepy.tech/badge/synapseml)](https://pepy.tech/project/synapseml) |
|
||||
<!-- markdownlint-disable MD033 -->
|
||||
<details open>
|
||||
|
@ -119,7 +119,7 @@ In Azure Synapse notebooks please place the following in the first cell of your
|
|||
{
|
||||
"name": "synapseml",
|
||||
"conf": {
|
||||
"spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.6",
|
||||
"spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.7",
|
||||
"spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
|
||||
"spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
|
||||
"spark.yarn.user.classpath.first": "true",
|
||||
|
@ -155,7 +155,7 @@ cloud](http://community.cloud.databricks.com), create a new [library from Maven
|
|||
coordinates](https://docs.databricks.com/user-guide/libraries.html#libraries-from-maven-pypi-or-spark-packages)
|
||||
in your workspace.
|
||||
|
||||
For the coordinates use: `com.microsoft.azure:synapseml_2.12:1.0.6`
|
||||
For the coordinates use: `com.microsoft.azure:synapseml_2.12:1.0.7`
|
||||
with the resolver: `https://mmlspark.azureedge.net/maven`. Ensure this library is
|
||||
attached to your target cluster(s).
|
||||
|
||||
|
@ -163,7 +163,7 @@ Finally, ensure that your Spark cluster has at least Spark 3.2 and Scala 2.12. I
|
|||
|
||||
You can use SynapseML in both your Scala and PySpark notebooks. To get started with our example notebooks import the following databricks archive:
|
||||
|
||||
`https://mmlspark.blob.core.windows.net/dbcs/SynapseMLExamplesv1.0.6.dbc`
|
||||
`https://mmlspark.blob.core.windows.net/dbcs/SynapseMLExamplesv1.0.7.dbc`
|
||||
|
||||
### Python Standalone
|
||||
|
||||
|
@ -174,7 +174,7 @@ the above example, or from python:
|
|||
```python
|
||||
import pyspark
|
||||
spark = pyspark.sql.SparkSession.builder.appName("MyApp") \
|
||||
.config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:1.0.6") \
|
||||
.config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:1.0.7") \
|
||||
.getOrCreate()
|
||||
import synapse.ml
|
||||
```
|
||||
|
@ -185,9 +185,9 @@ SynapseML can be conveniently installed on existing Spark clusters via the
|
|||
`--packages` option, examples:
|
||||
|
||||
```bash
|
||||
spark-shell --packages com.microsoft.azure:synapseml_2.12:1.0.6
|
||||
pyspark --packages com.microsoft.azure:synapseml_2.12:1.0.6
|
||||
spark-submit --packages com.microsoft.azure:synapseml_2.12:1.0.6 MyApp.jar
|
||||
spark-shell --packages com.microsoft.azure:synapseml_2.12:1.0.7
|
||||
pyspark --packages com.microsoft.azure:synapseml_2.12:1.0.7
|
||||
spark-submit --packages com.microsoft.azure:synapseml_2.12:1.0.7 MyApp.jar
|
||||
```
|
||||
|
||||
### SBT
|
||||
|
@ -196,7 +196,7 @@ If you are building a Spark application in Scala, add the following lines to
|
|||
your `build.sbt`:
|
||||
|
||||
```scala
|
||||
libraryDependencies += "com.microsoft.azure" % "synapseml_2.12" % "1.0.6"
|
||||
libraryDependencies += "com.microsoft.azure" % "synapseml_2.12" % "1.0.7"
|
||||
```
|
||||
|
||||
### Apache Livy and HDInsight
|
||||
|
@ -210,7 +210,7 @@ Excluding certain packages from the library may be necessary due to current issu
|
|||
{
|
||||
"name": "synapseml",
|
||||
"conf": {
|
||||
"spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.6",
|
||||
"spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.7",
|
||||
"spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind"
|
||||
}
|
||||
}
|
||||
|
|
|
@ -19,7 +19,7 @@ object PackageUtils {
|
|||
val PackageName = s"synapseml_$ScalaVersionSuffix"
|
||||
val PackageMavenCoordinate = s"$PackageGroup:$PackageName:${BuildInfo.version}"
|
||||
// Use a fixed version for local testing
|
||||
// val PackageMavenCoordinate = s"$PackageGroup:$PackageName:1.0.6"
|
||||
// val PackageMavenCoordinate = s"$PackageGroup:$PackageName:1.0.7"
|
||||
|
||||
private val AvroCoordinate = "org.apache.spark:spark-avro_2.12:3.4.1"
|
||||
val PackageRepository: String = SparkMLRepository
|
||||
|
|
|
@ -85,66 +85,66 @@
|
|||
"\n",
|
||||
"### Vision\n",
|
||||
"[**Computer Vision**](https://azure.microsoft.com/services/cognitive-services/computer-vision/)\n",
|
||||
"- Describe: provides description of an image in human readable language ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/vision/DescribeImage.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.vision.html#module-synapse.ml.services.vision.DescribeImage))\n",
|
||||
"- Analyze (color, image type, face, adult/racy content): analyzes visual features of an image ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/vision/AnalyzeImage.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.vision.html#module-synapse.ml.services.vision.AnalyzeImage))\n",
|
||||
"- OCR: reads text from an image ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/vision/OCR.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.vision.html#module-synapse.ml.services.vision.OCR))\n",
|
||||
"- Recognize Text: reads text from an image ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/vision/RecognizeText.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.vision.html#module-synapse.ml.services.vision.RecognizeText))\n",
|
||||
"- Thumbnail: generates a thumbnail of user-specified size from the image ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/vision/GenerateThumbnails.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.vision.html#module-synapse.ml.services.vision.GenerateThumbnails))\n",
|
||||
"- Recognize domain-specific content: recognizes domain-specific content (celebrity, landmark) ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/vision/RecognizeDomainSpecificContent.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.vision.html#module-synapse.ml.services.vision.RecognizeDomainSpecificContent))\n",
|
||||
"- Tag: identifies list of words that are relevant to the input image ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/vision/TagImage.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.vision.html#module-synapse.ml.services.vision.TagImage))\n",
|
||||
"- Describe: provides description of an image in human readable language ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/vision/DescribeImage.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.vision.html#module-synapse.ml.services.vision.DescribeImage))\n",
|
||||
"- Analyze (color, image type, face, adult/racy content): analyzes visual features of an image ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/vision/AnalyzeImage.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.vision.html#module-synapse.ml.services.vision.AnalyzeImage))\n",
|
||||
"- OCR: reads text from an image ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/vision/OCR.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.vision.html#module-synapse.ml.services.vision.OCR))\n",
|
||||
"- Recognize Text: reads text from an image ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/vision/RecognizeText.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.vision.html#module-synapse.ml.services.vision.RecognizeText))\n",
|
||||
"- Thumbnail: generates a thumbnail of user-specified size from the image ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/vision/GenerateThumbnails.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.vision.html#module-synapse.ml.services.vision.GenerateThumbnails))\n",
|
||||
"- Recognize domain-specific content: recognizes domain-specific content (celebrity, landmark) ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/vision/RecognizeDomainSpecificContent.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.vision.html#module-synapse.ml.services.vision.RecognizeDomainSpecificContent))\n",
|
||||
"- Tag: identifies list of words that are relevant to the input image ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/vision/TagImage.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.vision.html#module-synapse.ml.services.vision.TagImage))\n",
|
||||
"\n",
|
||||
"[**Face**](https://azure.microsoft.com/services/cognitive-services/face/)\n",
|
||||
"- Detect: detects human faces in an image ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/face/DetectFace.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.face.html#module-synapse.ml.services.face.DetectFace))\n",
|
||||
"- Verify: verifies whether two faces belong to a same person, or a face belongs to a person ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/face/VerifyFaces.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.face.html#module-synapse.ml.services.face.VerifyFaces))\n",
|
||||
"- Identify: finds the closest matches of the specific query person face from a person group ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/face/IdentifyFaces.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.face.html#module-synapse.ml.services.face.IdentifyFaces))\n",
|
||||
"- Find similar: finds similar faces to the query face in a face list ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/face/FindSimilarFace.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.face.html#module-synapse.ml.services.face.FindSimilarFace))\n",
|
||||
"- Group: divides a group of faces into disjoint groups based on similarity ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/face/GroupFaces.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.face.html#module-synapse.ml.services.face.GroupFaces))\n",
|
||||
"- Detect: detects human faces in an image ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/face/DetectFace.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.face.html#module-synapse.ml.services.face.DetectFace))\n",
|
||||
"- Verify: verifies whether two faces belong to a same person, or a face belongs to a person ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/face/VerifyFaces.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.face.html#module-synapse.ml.services.face.VerifyFaces))\n",
|
||||
"- Identify: finds the closest matches of the specific query person face from a person group ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/face/IdentifyFaces.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.face.html#module-synapse.ml.services.face.IdentifyFaces))\n",
|
||||
"- Find similar: finds similar faces to the query face in a face list ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/face/FindSimilarFace.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.face.html#module-synapse.ml.services.face.FindSimilarFace))\n",
|
||||
"- Group: divides a group of faces into disjoint groups based on similarity ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/face/GroupFaces.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.face.html#module-synapse.ml.services.face.GroupFaces))\n",
|
||||
"\n",
|
||||
"### Speech\n",
|
||||
"[**Speech Services**](https://azure.microsoft.com/products/ai-services/ai-speech)\n",
|
||||
"- Speech-to-text: transcribes audio streams ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/speech/SpeechToText.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.speech.html#module-synapse.ml.services.speech.SpeechToText))\n",
|
||||
"- Conversation Transcription: transcribes audio streams into live transcripts with identified speakers. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/speech/ConversationTranscription.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.speech.html#module-synapse.ml.services.speech.ConversationTranscription))\n",
|
||||
"- Text to Speech: Converts text to realistic audio ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/speech/TextToSpeech.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.speech.html#module-synapse.ml.services.speech.TextToSpeech))\n",
|
||||
"- Speech-to-text: transcribes audio streams ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/speech/SpeechToText.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.speech.html#module-synapse.ml.services.speech.SpeechToText))\n",
|
||||
"- Conversation Transcription: transcribes audio streams into live transcripts with identified speakers. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/speech/ConversationTranscription.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.speech.html#module-synapse.ml.services.speech.ConversationTranscription))\n",
|
||||
"- Text to Speech: Converts text to realistic audio ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/speech/TextToSpeech.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.speech.html#module-synapse.ml.services.speech.TextToSpeech))\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"### Language\n",
|
||||
"[**AI Language**](https://azure.microsoft.com/products/ai-services/ai-language)\n",
|
||||
"- Language detection: detects language of the input text ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/text/LanguageDetector.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.text.html#module-synapse.ml.services.text.LanguageDetector))\n",
|
||||
"- Key phrase extraction: identifies the key talking points in the input text ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/text/KeyPhraseExtractor.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.text.html#module-synapse.ml.services.text.KeyPhraseExtractor))\n",
|
||||
"- Named entity recognition: identifies known entities and general named entities in the input text ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/text/NER.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.text.html#module-synapse.ml.services.text.NER))\n",
|
||||
"- Sentiment analysis: returns a score between 0 and 1 indicating the sentiment in the input text ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/text/TextSentiment.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.text.html#module-synapse.ml.services.text.TextSentiment))\n",
|
||||
"- Healthcare Entity Extraction: Extracts medical entities and relationships from text. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/text/AnalyzeHealthText.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.text.html#module-synapse.ml.services.text.AnalyzeHealthText))\n",
|
||||
"- Language detection: detects language of the input text ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/text/LanguageDetector.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.text.html#module-synapse.ml.services.text.LanguageDetector))\n",
|
||||
"- Key phrase extraction: identifies the key talking points in the input text ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/text/KeyPhraseExtractor.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.text.html#module-synapse.ml.services.text.KeyPhraseExtractor))\n",
|
||||
"- Named entity recognition: identifies known entities and general named entities in the input text ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/text/NER.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.text.html#module-synapse.ml.services.text.NER))\n",
|
||||
"- Sentiment analysis: returns a score between 0 and 1 indicating the sentiment in the input text ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/text/TextSentiment.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.text.html#module-synapse.ml.services.text.TextSentiment))\n",
|
||||
"- Healthcare Entity Extraction: Extracts medical entities and relationships from text. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/text/AnalyzeHealthText.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.text.html#module-synapse.ml.services.text.AnalyzeHealthText))\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"### Translation\n",
|
||||
"[**Translator**](https://azure.microsoft.com/products/ai-services/translator)\n",
|
||||
"- Translate: Translates text. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/translate/Translate.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.translate.html#module-synapse.ml.services.translate.Translate))\n",
|
||||
"- Transliterate: Converts text in one language from one script to another script. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/translate/Transliterate.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.translate.html#module-synapse.ml.services.translate.Transliterate))\n",
|
||||
"- Detect: Identifies the language of a piece of text. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/translate/Detect.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.translate.html#module-synapse.ml.services.translate.Detect))\n",
|
||||
"- BreakSentence: Identifies the positioning of sentence boundaries in a piece of text. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/translate/BreakSentence.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.translate.html#module-synapse.ml.services.translate.BreakSentence))\n",
|
||||
"- Dictionary Lookup: Provides alternative translations for a word and a small number of idiomatic phrases. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/translate/DictionaryLookup.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.translate.html#module-synapse.ml.services.translate.DictionaryLookup))\n",
|
||||
"- Dictionary Examples: Provides examples that show how terms in the dictionary are used in context. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/translate/DictionaryExamples.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.translate.html#module-synapse.ml.services.translate.DictionaryExamples))\n",
|
||||
"- Document Translation: Translates documents across all supported languages and dialects while preserving document structure and data format. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/translate/DocumentTranslator.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.translate.html#module-synapse.ml.services.translate.DocumentTranslator))\n",
|
||||
"- Translate: Translates text. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/translate/Translate.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.translate.html#module-synapse.ml.services.translate.Translate))\n",
|
||||
"- Transliterate: Converts text in one language from one script to another script. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/translate/Transliterate.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.translate.html#module-synapse.ml.services.translate.Transliterate))\n",
|
||||
"- Detect: Identifies the language of a piece of text. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/translate/Detect.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.translate.html#module-synapse.ml.services.translate.Detect))\n",
|
||||
"- BreakSentence: Identifies the positioning of sentence boundaries in a piece of text. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/translate/BreakSentence.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.translate.html#module-synapse.ml.services.translate.BreakSentence))\n",
|
||||
"- Dictionary Lookup: Provides alternative translations for a word and a small number of idiomatic phrases. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/translate/DictionaryLookup.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.translate.html#module-synapse.ml.services.translate.DictionaryLookup))\n",
|
||||
"- Dictionary Examples: Provides examples that show how terms in the dictionary are used in context. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/translate/DictionaryExamples.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.translate.html#module-synapse.ml.services.translate.DictionaryExamples))\n",
|
||||
"- Document Translation: Translates documents across all supported languages and dialects while preserving document structure and data format. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/translate/DocumentTranslator.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.translate.html#module-synapse.ml.services.translate.DocumentTranslator))\n",
|
||||
"\n",
|
||||
"### Document Intelligence\n",
|
||||
"[**Document Intelligence**](https://azure.microsoft.com/products/ai-services/ai-document-intelligence/)\n",
|
||||
"- Analyze Layout: Extract text and layout information from a given document. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/form/AnalyzeLayout.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.form.html#module-synapse.ml.services.form.AnalyzeLayout))\n",
|
||||
"- Analyze Receipts: Detects and extracts data from receipts using optical character recognition (OCR) and our receipt model, enabling you to easily extract structured data from receipts such as merchant name, merchant phone number, transaction date, transaction total, and more. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/form/AnalyzeReceipts.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.form.html#module-synapse.ml.services.form.AnalyzeReceipts))\n",
|
||||
"- Analyze Business Cards: Detects and extracts data from business cards using optical character recognition (OCR) and our business card model, enabling you to easily extract structured data from business cards such as contact names, company names, phone numbers, emails, and more. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/form/AnalyzeBusinessCards.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.form.html#module-synapse.ml.services.form.AnalyzeBusinessCards))\n",
|
||||
"- Analyze Invoices: Detects and extracts data from invoices using optical character recognition (OCR) and our invoice understanding deep learning models, enabling you to easily extract structured data from invoices such as customer, vendor, invoice ID, invoice due date, total, invoice amount due, tax amount, ship to, bill to, line items and more. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/form/AnalyzeInvoices.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.form.html#module-synapse.ml.services.form.AnalyzeInvoices))\n",
|
||||
"- Analyze ID Documents: Detects and extracts data from identification documents using optical character recognition (OCR) and our ID document model, enabling you to easily extract structured data from ID documents such as first name, last name, date of birth, document number, and more. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/form/AnalyzeIDDocuments.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.form.html#module-synapse.ml.services.form.AnalyzeIDDocuments))\n",
|
||||
"- Analyze Custom Form: Extracts information from forms (PDFs and images) into structured data based on a model created from a set of representative training forms. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/form/AnalyzeCustomModel.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.form.html#module-synapse.ml.services.form.AnalyzeCustomModel))\n",
|
||||
"- Get Custom Model: Get detailed information about a custom model. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/form/GetCustomModel.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/form/ListCustomModels.html))\n",
|
||||
"- List Custom Models: Get information about all custom models. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/form/ListCustomModels.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.form.html#module-synapse.ml.services.form.ListCustomModels))\n",
|
||||
"- Analyze Layout: Extract text and layout information from a given document. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/form/AnalyzeLayout.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.form.html#module-synapse.ml.services.form.AnalyzeLayout))\n",
|
||||
"- Analyze Receipts: Detects and extracts data from receipts using optical character recognition (OCR) and our receipt model, enabling you to easily extract structured data from receipts such as merchant name, merchant phone number, transaction date, transaction total, and more. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/form/AnalyzeReceipts.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.form.html#module-synapse.ml.services.form.AnalyzeReceipts))\n",
|
||||
"- Analyze Business Cards: Detects and extracts data from business cards using optical character recognition (OCR) and our business card model, enabling you to easily extract structured data from business cards such as contact names, company names, phone numbers, emails, and more. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/form/AnalyzeBusinessCards.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.form.html#module-synapse.ml.services.form.AnalyzeBusinessCards))\n",
|
||||
"- Analyze Invoices: Detects and extracts data from invoices using optical character recognition (OCR) and our invoice understanding deep learning models, enabling you to easily extract structured data from invoices such as customer, vendor, invoice ID, invoice due date, total, invoice amount due, tax amount, ship to, bill to, line items and more. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/form/AnalyzeInvoices.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.form.html#module-synapse.ml.services.form.AnalyzeInvoices))\n",
|
||||
"- Analyze ID Documents: Detects and extracts data from identification documents using optical character recognition (OCR) and our ID document model, enabling you to easily extract structured data from ID documents such as first name, last name, date of birth, document number, and more. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/form/AnalyzeIDDocuments.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.form.html#module-synapse.ml.services.form.AnalyzeIDDocuments))\n",
|
||||
"- Analyze Custom Form: Extracts information from forms (PDFs and images) into structured data based on a model created from a set of representative training forms. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/form/AnalyzeCustomModel.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.form.html#module-synapse.ml.services.form.AnalyzeCustomModel))\n",
|
||||
"- Get Custom Model: Get detailed information about a custom model. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/form/GetCustomModel.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/form/ListCustomModels.html))\n",
|
||||
"- List Custom Models: Get information about all custom models. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/form/ListCustomModels.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.form.html#module-synapse.ml.services.form.ListCustomModels))\n",
|
||||
"\n",
|
||||
"### Decision\n",
|
||||
"[**Anomaly Detector**](https://azure.microsoft.com/products/ai-services/ai-anomaly-detector)\n",
|
||||
"- Anomaly status of latest point: generates a model using preceding points and determines whether the latest point is anomalous ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/anomaly/DetectLastAnomaly.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.anomaly.html#module-synapse.ml.services.anomaly.DetectLastAnomaly))\n",
|
||||
"- Find anomalies: generates a model using an entire series and finds anomalies in the series ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/anomaly/DetectAnomalies.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.anomaly.html#module-synapse.ml.services.anomaly.DetectAnomalies))\n",
|
||||
"- Anomaly status of latest point: generates a model using preceding points and determines whether the latest point is anomalous ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/anomaly/DetectLastAnomaly.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.anomaly.html#module-synapse.ml.services.anomaly.DetectLastAnomaly))\n",
|
||||
"- Find anomalies: generates a model using an entire series and finds anomalies in the series ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/anomaly/DetectAnomalies.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.anomaly.html#module-synapse.ml.services.anomaly.DetectAnomalies))\n",
|
||||
"\n",
|
||||
"### Search\n",
|
||||
"- [**Bing Image search**](https://azure.microsoft.com/services/services-services/bing-image-search-api/) ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/bing/BingImageSearch.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.bing.html#module-synapse.ml.services.bing.BingImageSearch))\n",
|
||||
"- [**Azure Cognitive search**](https://docs.microsoft.com/azure/search/search-what-is-azure-search) ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/com/microsoft/azure/synapse/ml/services/search/AzureSearchWriter$.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.search.html#module-synapse.ml.services.search.AzureSearchWriter))"
|
||||
"- [**Bing Image search**](https://azure.microsoft.com/services/services-services/bing-image-search-api/) ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/bing/BingImageSearch.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.bing.html#module-synapse.ml.services.bing.BingImageSearch))\n",
|
||||
"- [**Azure Cognitive search**](https://docs.microsoft.com/azure/search/search-what-is-azure-search) ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/com/microsoft/azure/synapse/ml/services/search/AzureSearchWriter$.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.search.html#module-synapse.ml.services.search.AzureSearchWriter))"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
|
|
@ -21,12 +21,12 @@ Restarting the cluster automatically installs horovod v0.25.0 with pytorch_light
|
|||
You could install the single synapseml-deep-learning wheel package to get the full functionality of deep vision classification.
|
||||
Run the following command:
|
||||
```powershell
|
||||
pip install synapseml==1.0.6
|
||||
pip install synapseml==1.0.7
|
||||
```
|
||||
|
||||
An alternative is installing the SynapseML jar package in library management section, by adding:
|
||||
```
|
||||
Coordinate: com.microsoft.azure:synapseml_2.12:1.0.6
|
||||
Coordinate: com.microsoft.azure:synapseml_2.12:1.0.7
|
||||
Repository: https://mmlspark.azureedge.net/maven
|
||||
```
|
||||
:::note
|
||||
|
|
|
@ -266,7 +266,7 @@
|
|||
"source": [
|
||||
"## 4 - Simplify form recognition output\n",
|
||||
"\n",
|
||||
"This code uses the [FormOntologyLearner](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.services.form.html#module-synapse.ml.services.form.FormOntologyTransformer), a transformer that analyzes the output of Form Recognizer transformers (for Azure AI Document Intelligence) and infers a tabular data structure. The output of AnalyzeInvoices is dynamic and varies based on the features detected in your content.\n",
|
||||
"This code uses the [FormOntologyLearner](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.services.form.html#module-synapse.ml.services.form.FormOntologyTransformer), a transformer that analyzes the output of Form Recognizer transformers (for Azure AI Document Intelligence) and infers a tabular data structure. The output of AnalyzeInvoices is dynamic and varies based on the features detected in your content.\n",
|
||||
"\n",
|
||||
"FormOntologyLearner extends the utility of the AnalyzeInvoices transformer by looking for patterns that can be used to create a tabular data structure. Organizing the output into multiple columns and rows makes for simpler downstream analysis."
|
||||
]
|
||||
|
|
|
@ -18,50 +18,50 @@ sidebar_label: CyberML
|
|||
In other words, it returns a sample from the complement set.
|
||||
|
||||
## feature engineering: [indexers.py](https://github.com/microsoft/SynapseML/blob/master/core/src/main/python/synapse/ml/cyber/feature/indexers.py)
|
||||
1. [IdIndexer](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.cyber.feature.html#synapse.ml.cyber.feature.indexers.IdIndexer)
|
||||
1. [IdIndexer](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.cyber.feature.html#synapse.ml.cyber.feature.indexers.IdIndexer)
|
||||
is a SparkML [Estimator](https://spark.apache.org/docs/2.2.0/api/java/index.html?org/apache/spark/ml/Estimator.html).
|
||||
Given a dataframe, it creates an IdIndexerModel (described next) for categorical features. The model
|
||||
maps each partition and column seen in the given dataframe to an ID,
|
||||
for each partition or one consecutive range for all partition and column values.
|
||||
2. [IdIndexerModel](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.cyber.feature.html#synapse.ml.cyber.feature.indexers.IdIndexerModel)
|
||||
2. [IdIndexerModel](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.cyber.feature.html#synapse.ml.cyber.feature.indexers.IdIndexerModel)
|
||||
is a SparkML [Transformer](https://spark.apache.org/docs/2.2.0/api/java/index.html?org/apache/spark/ml/Transformer.html).
|
||||
Given a dataframe maps each partition and column field to a consecutive integer ID.
|
||||
Partitions or column values not encountered in the estimator are mapped to 0.
|
||||
The model can operate in two modes, either create consecutive integer ID independently
|
||||
3. [MultiIndexer](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.cyber.feature.html#synapse.ml.cyber.feature.indexers.MultiIndexer)
|
||||
3. [MultiIndexer](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.cyber.feature.html#synapse.ml.cyber.feature.indexers.MultiIndexer)
|
||||
is a SparkML [Estimator](https://spark.apache.org/docs/2.2.0/api/java/index.html?org/apache/spark/ml/Estimator.html).
|
||||
Uses multiple IdIndexers to generate a MultiIndexerModel (described next) for categorical features. The model
|
||||
contains multiple IdIndexers for multiple partitions and columns.
|
||||
4. [MultiIndexerModel](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.cyber.feature.html#synapse.ml.cyber.feature.indexers.MultiIndexerModel)
|
||||
4. [MultiIndexerModel](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.cyber.feature.html#synapse.ml.cyber.feature.indexers.MultiIndexerModel)
|
||||
is a SparkML [Transformer](https://spark.apache.org/docs/2.2.0/api/java/index.html?org/apache/spark/ml/Transformer.html).
|
||||
Given a dataframe maps each partition and column field to a consecutive integer ID.
|
||||
Partitions or column values not encountered in the estimator are mapped to 0.
|
||||
The model can operate in two modes, either create consecutive integer ID independently
|
||||
|
||||
## feature engineering: [scalers.py](https://github.com/microsoft/SynapseML/blob/master/core/src/main/python/synapse/ml/cyber/feature/scalers.py)
|
||||
1. [StandardScalarScaler](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.cyber.feature.html#synapse.ml.cyber.feature.scalers.StandardScalarScaler)
|
||||
1. [StandardScalarScaler](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.cyber.feature.html#synapse.ml.cyber.feature.scalers.StandardScalarScaler)
|
||||
is a SparkML [Estimator](https://spark.apache.org/docs/2.2.0/api/java/index.html?org/apache/spark/ml/Estimator.html).
|
||||
Given a dataframe it creates a StandardScalarScalerModel (described next) which normalizes
|
||||
any given dataframe according to the mean and standard deviation calculated on the
|
||||
dataframe given to the estimator.
|
||||
2. [StandardScalarScalerModel](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.cyber.feature.html#synapse.ml.cyber.feature.scalers.StandardScalarScalerModel)
|
||||
2. [StandardScalarScalerModel](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.cyber.feature.html#synapse.ml.cyber.feature.scalers.StandardScalarScalerModel)
|
||||
is a SparkML [Transformer](https://spark.apache.org/docs/2.2.0/api/java/index.html?org/apache/spark/ml/Transformer.html).
|
||||
Given a dataframe with a value column x, the transformer changes its value as follows:
|
||||
x'=(x-mean)/stddev. That is, if the transformer is given the same dataframe the estimator
|
||||
was given then the value column will have a mean of 0.0 and a standard deviation of 1.0.
|
||||
3. [LinearScalarScaler](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.cyber.feature.html#synapse.ml.cyber.feature.scalers.LinearScalarScaler)
|
||||
3. [LinearScalarScaler](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.cyber.feature.html#synapse.ml.cyber.feature.scalers.LinearScalarScaler)
|
||||
is a SparkML [Estimator](https://spark.apache.org/docs/2.2.0/api/java/index.html?org/apache/spark/ml/Estimator.html).
|
||||
Given a dataframe it creates a LinearScalarScalerModel (described next) which normalizes
|
||||
any given dataframe according to the minimum and maximum values calculated on the
|
||||
dataframe given to the estimator.
|
||||
4. [LinearScalarScalerModel](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.cyber.feature.html#synapse.ml.cyber.feature.scalers.LinearScalarScalerModel)
|
||||
4. [LinearScalarScalerModel](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.cyber.feature.html#synapse.ml.cyber.feature.scalers.LinearScalarScalerModel)
|
||||
is a SparkML [Transformer](https://spark.apache.org/docs/2.2.0/api/java/index.html?org/apache/spark/ml/Transformer.html).
|
||||
Given a dataframe with a value column x, the transformer changes its value such that
|
||||
if the transformer is given the same dataframe the estimator
|
||||
was given then the value column will be scaled linearly to the given ranges.
|
||||
|
||||
## access anomalies: [collaborative_filtering.py](https://github.com/microsoft/SynapseML/blob/master/core/src/main/python/synapse/ml/cyber/anomaly/collaborative_filtering.py)
|
||||
1. [AccessAnomaly](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.cyber.anomaly.html#synapse.ml.cyber.anomaly.collaborative_filtering.AccessAnomaly)
|
||||
1. [AccessAnomaly](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.cyber.anomaly.html#synapse.ml.cyber.anomaly.collaborative_filtering.AccessAnomaly)
|
||||
is a SparkML [Estimator](https://spark.apache.org/docs/2.2.0/api/java/index.html?org/apache/spark/ml/Estimator.html).
|
||||
Given a dataframe, the estimator generates an AccessAnomalyModel (described next). The model
|
||||
can detect anomalous access of users to resources where the access
|
||||
|
@ -69,14 +69,14 @@ sidebar_label: CyberML
|
|||
a resource from Finance. This result is based solely on access patterns rather than explicit features.
|
||||
Internally, the code is based on Collaborative Filtering as implemented in Spark, using
|
||||
Matrix Factorization with Alternating Least Squares.
|
||||
2. [AccessAnomalyModel](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.cyber.anomaly.html#synapse.ml.cyber.anomaly.collaborative_filtering.AccessAnomalyModel)
|
||||
2. [AccessAnomalyModel](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.cyber.anomaly.html#synapse.ml.cyber.anomaly.collaborative_filtering.AccessAnomalyModel)
|
||||
is a SparkML [Transformer](https://spark.apache.org/docs/2.2.0/api/java/index.html?org/apache/spark/ml/Transformer.html).
|
||||
Given a dataframe the transformer computes a value between (-inf, inf) where positive
|
||||
values indicate an anomaly score. Anomaly scores are computed to have a mean of 1.0
|
||||
and a standard deviation of 1.0 over the original dataframe given to the estimator.
|
||||
3. [ModelNormalizeTransformer](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.cyber.anomaly.html#synapse.ml.cyber.anomaly.collaborative_filtering.ModelNormalizeTransformer)
|
||||
3. [ModelNormalizeTransformer](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.cyber.anomaly.html#synapse.ml.cyber.anomaly.collaborative_filtering.ModelNormalizeTransformer)
|
||||
is a SparkML [Transformer](https://spark.apache.org/docs/2.2.0/api/java/index.html?org/apache/spark/ml/Transformer.html).
|
||||
This transformer is used internally by AccessAnomaly to normalize a model to generate
|
||||
anomaly scores with mean 0.0 and standard deviation of 1.0.
|
||||
4. [AccessAnomalyConfig](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.cyber.anomaly.html#synapse.ml.cyber.anomaly.collaborative_filtering.AccessAnomalyConfig)
|
||||
4. [AccessAnomalyConfig](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.cyber.anomaly.html#synapse.ml.cyber.anomaly.collaborative_filtering.AccessAnomalyConfig)
|
||||
contains the default values for AccessAnomaly.
|
||||
|
|
|
@ -34,7 +34,7 @@
|
|||
"# Create an Azure Databricks cluster and install the following libs\n",
|
||||
"\n",
|
||||
"1. In Cluster Libraries install from library source Maven:\n",
|
||||
"Coordinates: com.microsoft.azure:synapseml_2.12:1.0.6\n",
|
||||
"Coordinates: com.microsoft.azure:synapseml_2.12:1.0.7\n",
|
||||
"Repository: https://mmlspark.azureedge.net/maven\n",
|
||||
"\n",
|
||||
"2. In Cluster Libraries install from PyPI the library called plotly"
|
||||
|
|
|
@ -16,11 +16,11 @@
|
|||
"\n",
|
||||
"This sample demonstrates how to use the following APIs:\n",
|
||||
"- [`TrainRegressor`\n",
|
||||
" ](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.train.html?#module-synapse.ml.train.TrainRegressor)\n",
|
||||
" ](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.train.html?#module-synapse.ml.train.TrainRegressor)\n",
|
||||
"- [`ComputePerInstanceStatistics`\n",
|
||||
" ](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.train.html?#module-synapse.ml.train.ComputePerInstanceStatistics)\n",
|
||||
" ](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.train.html?#module-synapse.ml.train.ComputePerInstanceStatistics)\n",
|
||||
"- [`DataConversion`\n",
|
||||
" ](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.featurize.html?#module-synapse.ml.featurize.DataConversion)"
|
||||
" ](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.featurize.html?#module-synapse.ml.featurize.DataConversion)"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
|
|
@ -15,15 +15,15 @@
|
|||
"\n",
|
||||
"This sample demonstrates the use of several members of the synapseml library:\n",
|
||||
"- [`TrainRegressor`\n",
|
||||
" ](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.train.html?#module-synapse.ml.train.TrainRegressor)\n",
|
||||
" ](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.train.html?#module-synapse.ml.train.TrainRegressor)\n",
|
||||
"- [`SummarizeData`\n",
|
||||
" ](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.stages.html?#module-synapse.ml.stages.SummarizeData)\n",
|
||||
" ](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.stages.html?#module-synapse.ml.stages.SummarizeData)\n",
|
||||
"- [`CleanMissingData`\n",
|
||||
" ](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.featurize.html?#module-synapse.ml.featurize.CleanMissingData)\n",
|
||||
" ](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.featurize.html?#module-synapse.ml.featurize.CleanMissingData)\n",
|
||||
"- [`ComputeModelStatistics`\n",
|
||||
" ](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.train.html?#module-synapse.ml.train.ComputeModelStatistics)\n",
|
||||
" ](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.train.html?#module-synapse.ml.train.ComputeModelStatistics)\n",
|
||||
"- [`FindBestModel`\n",
|
||||
" ](https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/synapse.ml.automl.html?#module-synapse.ml.automl.FindBestModel)\n",
|
||||
" ](https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/synapse.ml.automl.html?#module-synapse.ml.automl.FindBestModel)\n",
|
||||
"\n",
|
||||
"First, import the pandas package so that we can read and parse the datafile\n",
|
||||
"using `pandas.read_csv()`"
|
||||
|
|
|
@ -32,7 +32,7 @@ For Spark3.4 pools
|
|||
{
|
||||
"name": "synapseml",
|
||||
"conf": {
|
||||
"spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.6",
|
||||
"spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.7",
|
||||
"spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
|
||||
"spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
|
||||
"spark.yarn.user.classpath.first": "true",
|
||||
|
@ -64,8 +64,8 @@ installed via pip with `pip install pyspark`.
|
|||
```python
|
||||
import pyspark
|
||||
spark = pyspark.sql.SparkSession.builder.appName("MyApp") \
|
||||
# Use 0.11.4-spark3.3 version for Spark3.3 and 1.0.6 version for Spark3.4
|
||||
.config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:1.0.6") \
|
||||
# Use 0.11.4-spark3.3 version for Spark3.3 and 1.0.7 version for Spark3.4
|
||||
.config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:1.0.7") \
|
||||
.config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven") \
|
||||
.getOrCreate()
|
||||
import synapse.ml
|
||||
|
@ -78,8 +78,8 @@ your `build.sbt`:
|
|||
|
||||
```scala
|
||||
resolvers += "SynapseML" at "https://mmlspark.azureedge.net/maven"
|
||||
// Use 0.11.4-spark3.3 version for Spark3.3 and 1.0.6 version for Spark3.4
|
||||
libraryDependencies += "com.microsoft.azure" % "synapseml_2.12" % "1.0.6"
|
||||
// Use 0.11.4-spark3.3 version for Spark3.3 and 1.0.7 version for Spark3.4
|
||||
libraryDependencies += "com.microsoft.azure" % "synapseml_2.12" % "1.0.7"
|
||||
```
|
||||
|
||||
## Spark package
|
||||
|
@ -88,10 +88,10 @@ SynapseML can be conveniently installed on existing Spark clusters via the
|
|||
`--packages` option, examples:
|
||||
|
||||
```bash
|
||||
# Use 0.11.4-spark3.3 version for Spark3.3 and 1.0.6 version for Spark3.4
|
||||
spark-shell --packages com.microsoft.azure:synapseml_2.12:1.0.6
|
||||
pyspark --packages com.microsoft.azure:synapseml_2.12:1.0.6
|
||||
spark-submit --packages com.microsoft.azure:synapseml_2.12:1.0.6 MyApp.jar
|
||||
# Use 0.11.4-spark3.3 version for Spark3.3 and 1.0.7 version for Spark3.4
|
||||
spark-shell --packages com.microsoft.azure:synapseml_2.12:1.0.7
|
||||
pyspark --packages com.microsoft.azure:synapseml_2.12:1.0.7
|
||||
spark-submit --packages com.microsoft.azure:synapseml_2.12:1.0.7 MyApp.jar
|
||||
```
|
||||
|
||||
A similar technique can be used in other Spark contexts too. For example, you can use SynapseML
|
||||
|
@ -106,7 +106,7 @@ cloud](http://community.cloud.databricks.com), create a new [library from Maven
|
|||
coordinates](https://docs.databricks.com/user-guide/libraries.html#libraries-from-maven-pypi-or-spark-packages)
|
||||
in your workspace.
|
||||
|
||||
For the coordinates use: `com.microsoft.azure:synapseml_2.12:1.0.6` for Spark3.4 Cluster and
|
||||
For the coordinates use: `com.microsoft.azure:synapseml_2.12:1.0.7` for Spark3.4 Cluster and
|
||||
`com.microsoft.azure:synapseml_2.12:0.11.4-spark3.3` for Spark3.3 Cluster;
|
||||
Add the resolver: `https://mmlspark.azureedge.net/maven`. Ensure this library is
|
||||
attached to your target cluster(s).
|
||||
|
@ -115,7 +115,7 @@ Finally, ensure that your Spark cluster has at least Spark 3.2 and Scala 2.12.
|
|||
|
||||
You can use SynapseML in both your Scala and PySpark notebooks. To get started with our example notebooks, import the following databricks archive:
|
||||
|
||||
`https://mmlspark.blob.core.windows.net/dbcs/SynapseMLExamplesv1.0.6.dbc`
|
||||
`https://mmlspark.blob.core.windows.net/dbcs/SynapseMLExamplesv1.0.7.dbc`
|
||||
|
||||
## Apache Livy and HDInsight
|
||||
|
||||
|
@ -128,8 +128,8 @@ Excluding certain packages from the library may be necessary due to current issu
|
|||
{
|
||||
"name": "synapseml",
|
||||
"conf": {
|
||||
# Use 0.11.4-spark3.3 version for Spark3.3 and 1.0.6 version for Spark3.4
|
||||
"spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.6",
|
||||
# Use 0.11.4-spark3.3 version for Spark3.3 and 1.0.7 version for Spark3.4
|
||||
"spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.7",
|
||||
"spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind"
|
||||
}
|
||||
}
|
||||
|
@ -142,8 +142,8 @@ In Azure Synapse, "spark.yarn.user.classpath.first" should be set to "true" to o
|
|||
{
|
||||
"name": "synapseml",
|
||||
"conf": {
|
||||
# Use 0.11.4-spark3.3 version for Spark3.3 and 1.0.6 version for Spark3.4
|
||||
"spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.6",
|
||||
# Use 0.11.4-spark3.3 version for Spark3.3 and 1.0.7 version for Spark3.4
|
||||
"spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.7",
|
||||
"spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
|
||||
"spark.yarn.user.classpath.first": "true"
|
||||
}
|
||||
|
|
|
@ -32,7 +32,7 @@ You can now select one of the sample notebooks and run it, or create your own.
|
|||
In the preceding docker command, `mcr.microsoft.com/mmlspark/release` specifies the project and image name that you
|
||||
want to run. There's another component implicit here: the _tsag_ (=
|
||||
version) that you want to use. Specifying it explicitly looks like
|
||||
`mcr.microsoft.com/mmlspark/release:1.0.6` for the `1.0.6` tag.
|
||||
`mcr.microsoft.com/mmlspark/release:1.0.7` for the `1.0.7` tag.
|
||||
|
||||
Leaving `mcr.microsoft.com/mmlspark/release` by itself has an implicit `latest` tag, so it's
|
||||
equivalent to `mcr.microsoft.com/mmlspark/release:latest`. The `latest` tag is identical to the
|
||||
|
@ -48,7 +48,7 @@ that you'll probably want to use can look as follows:
|
|||
docker run -it --rm \
|
||||
-p 127.0.0.1:80:8888 \
|
||||
-v ~/myfiles:/notebooks/myfiles \
|
||||
mcr.microsoft.com/mmlspark/release:1.0.6
|
||||
mcr.microsoft.com/mmlspark/release:1.0.7
|
||||
```
|
||||
|
||||
In this example, backslashes are for readability; you
|
||||
|
@ -58,7 +58,7 @@ path and line breaks looks a little different:
|
|||
docker run -it --rm `
|
||||
-p 127.0.0.1:80:8888 `
|
||||
-v C:\myfiles:/notebooks/myfiles `
|
||||
mcr.microsoft.com/mmlspark/release:1.0.6
|
||||
mcr.microsoft.com/mmlspark/release:1.0.7
|
||||
|
||||
Let's break this command and go over the meaning of each part:
|
||||
|
||||
|
@ -141,7 +141,7 @@ Let's break this command and go over the meaning of each part:
|
|||
model.write().overwrite().save('myfiles/myTrainedModel.mml')
|
||||
```
|
||||
|
||||
- **`mcr.microsoft.com/mmlspark/release:1.0.6`**
|
||||
- **`mcr.microsoft.com/mmlspark/release:1.0.7`**
|
||||
|
||||
Finally, this argument specifies an explicit version tag for the image that we want to
|
||||
run.
|
||||
|
|
|
@ -37,7 +37,7 @@ for a Windows x64 machine or jdk-8u231-macosx-x64.dmg for macOS. Then, use the c
|
|||
### 3. Install Apache Spark
|
||||
|
||||
[Download and install Apache Spark](https://spark.apache.org/downloads.html) with version >= 3.2.0.
|
||||
(SynapseML v1.0.6 only supports spark version >= 3.2.0)
|
||||
(SynapseML v1.0.7 only supports spark version >= 3.2.0)
|
||||
|
||||
Extract downloaded zipped files (with 7-Zip app on Windows or `tar` on linux) and remember the location of
|
||||
extracted files, we take `~/bin/spark-3.2.0-bin-hadoop3.2/` as an example here.
|
||||
|
@ -127,7 +127,7 @@ In your command prompt or terminal, run the following command:
|
|||
dotnet add package Microsoft.Spark --version 2.1.1
|
||||
```
|
||||
:::note
|
||||
This tutorial uses Microsoft.Spark version 2.1.1 as SynapseML 1.0.6 depends on it.
|
||||
This tutorial uses Microsoft.Spark version 2.1.1 as SynapseML 1.0.7 depends on it.
|
||||
Change to corresponding version if necessary.
|
||||
:::
|
||||
|
||||
|
@ -137,7 +137,7 @@ In your command prompt or terminal, run the following command:
|
|||
```powershell
|
||||
# Update Nuget Config to include SynapseML Feed
|
||||
dotnet nuget add source https://mmlspark.blob.core.windows.net/synapsemlnuget/index.json -n SynapseMLFeed
|
||||
dotnet add package SynapseML.Cognitive --version 1.0.6
|
||||
dotnet add package SynapseML.Cognitive --version 1.0.7
|
||||
```
|
||||
The `dotnet nuget add` command adds SynapseML's resolver to the source, so that our package can be found.
|
||||
|
||||
|
@ -202,7 +202,7 @@ namespace SynapseMLApp
|
|||
of Apache Spark applications, which manages the context and information of your application. A DataFrame is a way of organizing
|
||||
data into a set of named columns.
|
||||
|
||||
Create a [TextSentiment](https://mmlspark.blob.core.windows.net/docs/1.0.6/dotnet/classSynapse_1_1ML_1_1Cognitive_1_1TextSentiment.html)
|
||||
Create a [TextSentiment](https://mmlspark.blob.core.windows.net/docs/1.0.7/dotnet/classSynapse_1_1ML_1_1Cognitive_1_1TextSentiment.html)
|
||||
instance, set corresponding subscription key and other configurations. Then, apply transformation to the dataframe,
|
||||
which analyzes the sentiment based on each row, and stores result into output column.
|
||||
|
||||
|
@ -218,9 +218,9 @@ dotnet build
|
|||
Navigate to your build output directory. For example, in Windows you could run `cd bin\Debug\net5.0`.
|
||||
Use the spark-submit command to submit your application to run on Apache Spark.
|
||||
```powershell
|
||||
spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --packages com.microsoft.azure:synapseml_2.12:1.0.6 --master local microsoft-spark-3-2_2.12-2.1.1.jar dotnet SynapseMLApp.dll
|
||||
spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --packages com.microsoft.azure:synapseml_2.12:1.0.7 --master local microsoft-spark-3-2_2.12-2.1.1.jar dotnet SynapseMLApp.dll
|
||||
```
|
||||
`--packages com.microsoft.azure:synapseml_2.12:1.0.6` specifies the dependency on synapseml_2.12 version 1.0.6;
|
||||
`--packages com.microsoft.azure:synapseml_2.12:1.0.7` specifies the dependency on synapseml_2.12 version 1.0.7;
|
||||
`microsoft-spark-3-2_2.12-2.1.1.jar` specifies Microsoft.Spark version 2.1.1 and Spark version 3.2
|
||||
:::note
|
||||
This command assumes you have downloaded Apache Spark and added it to your PATH environment variable so that you can use spark-submit.
|
||||
|
@ -238,7 +238,7 @@ When your app runs, the sentiment analysis result is written to the console.
|
|||
+-----------------------------------------+--------+-----+--------------------------------------------------+
|
||||
```
|
||||
Congratulations! You successfully authored and ran a .NET for SynapseML app.
|
||||
Refer to the [developer docs](https://mmlspark.blob.core.windows.net/docs/1.0.6/dotnet/index.html) for API guidance.
|
||||
Refer to the [developer docs](https://mmlspark.blob.core.windows.net/docs/1.0.7/dotnet/index.html) for API guidance.
|
||||
|
||||
## Next
|
||||
|
||||
|
|
|
@ -13,8 +13,8 @@ Make sure you have followed the guidance in [.NET installation](../Dotnet%20Setu
|
|||
Install NuGet packages by running following command:
|
||||
```powershell
|
||||
dotnet add package Microsoft.Spark --version 2.1.1
|
||||
dotnet add package SynapseML.Lightgbm --version 1.0.6
|
||||
dotnet add package SynapseML.Core --version 1.0.6
|
||||
dotnet add package SynapseML.Lightgbm --version 1.0.7
|
||||
dotnet add package SynapseML.Core --version 1.0.7
|
||||
```
|
||||
|
||||
Use the following code in your main program file:
|
||||
|
@ -91,7 +91,7 @@ namespace SynapseMLApp
|
|||
|
||||
Run `dotnet build` to build the project. Then navigate to build output directory, and run following command:
|
||||
```powershell
|
||||
spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --packages com.microsoft.azure:synapseml_2.12:1.0.6,org.apache.hadoop:hadoop-azure:3.3.1 --master local microsoft-spark-3-2_2.12-2.1.1.jar dotnet SynapseMLApp.dll
|
||||
spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --packages com.microsoft.azure:synapseml_2.12:1.0.7,org.apache.hadoop:hadoop-azure:3.3.1 --master local microsoft-spark-3-2_2.12-2.1.1.jar dotnet SynapseMLApp.dll
|
||||
```
|
||||
:::note
|
||||
Here we added two packages: synapseml_2.12 for SynapseML's scala source, and hadoop-azure to support reading files from ADLS.
|
||||
|
|
|
@ -55,7 +55,7 @@ Installing all dependencies may be time-consuming. When complete, run:
|
|||
library(sparklyr)
|
||||
library(dplyr)
|
||||
config <- spark_config()
|
||||
config$sparklyr.defaultPackages <- "com.microsoft.azure:synapseml_2.12:1.0.6"
|
||||
config$sparklyr.defaultPackages <- "com.microsoft.azure:synapseml_2.12:1.0.7"
|
||||
sc <- spark_connect(master = "local", config = config)
|
||||
...
|
||||
```
|
||||
|
@ -120,7 +120,7 @@ and then use spark_connect with method = "databricks":
|
|||
|
||||
```R
|
||||
install.packages("devtools")
|
||||
devtools::install_url("https://mmlspark.azureedge.net/rrr/synapseml-1.0.6.zip")
|
||||
devtools::install_url("https://mmlspark.azureedge.net/rrr/synapseml-1.0.7.zip")
|
||||
library(sparklyr)
|
||||
library(dplyr)
|
||||
sc <- spark_connect(method = "databricks")
|
||||
|
|
2
start
2
start
|
@ -4,7 +4,7 @@ export OPENMPI_VERSION="3.1.2"
|
|||
|
||||
export SPARK_VERSION="3.4.1"
|
||||
export HADOOP_VERSION="3.3"
|
||||
export SYNAPSEML_VERSION="1.0.6" # Binder compatibility version
|
||||
export SYNAPSEML_VERSION="1.0.7" # Binder compatibility version
|
||||
|
||||
echo "Beginning Spark Session..."
|
||||
exec "$@"
|
||||
|
|
|
@ -1,6 +1,6 @@
|
|||
FROM mcr.microsoft.com/oss/mirror/docker.io/library/ubuntu:20.04
|
||||
|
||||
ARG SYNAPSEML_VERSION=1.0.6
|
||||
ARG SYNAPSEML_VERSION=1.0.7
|
||||
ARG DEBIAN_FRONTEND=noninteractive
|
||||
|
||||
ENV SPARK_VERSION=3.4.1
|
||||
|
|
|
@ -15,9 +15,9 @@ docker build . --build-arg SYNAPSEML_VERSION=<YOUR-VERSION-HERE> -f tools/docker
|
|||
|
||||
eg.
|
||||
|
||||
For building image with SynapseML version 1.0.6, run:
|
||||
For building image with SynapseML version 1.0.7, run:
|
||||
```
|
||||
docker build . --build-arg SYNAPSEML_VERSION=1.0.6 -f tools/docker/demo/Dockerfile -t synapseml:1.0.6
|
||||
docker build . --build-arg SYNAPSEML_VERSION=1.0.7 -f tools/docker/demo/Dockerfile -t synapseml:1.0.7
|
||||
```
|
||||
|
||||
# Run the image
|
||||
|
|
|
@ -27,7 +27,7 @@ syanpseMLConf = pyspark.SparkConf().setAll(
|
|||
(
|
||||
"spark.jars.packages",
|
||||
"com.microsoft.azure:synapseml_2.12:"
|
||||
+ os.getenv("SYNAPSEML_VERSION", "1.0.6")
|
||||
+ os.getenv("SYNAPSEML_VERSION", "1.0.7")
|
||||
+ ",org.apache.hadoop:hadoop-azure:2.7.0,org.apache.hadoop:hadoop-common:2.7.0,com.microsoft.azure:azure-storage:2.0.0",
|
||||
),
|
||||
(
|
||||
|
|
|
@ -1,6 +1,6 @@
|
|||
FROM mcr.microsoft.com/oss/mirror/docker.io/library/ubuntu:20.04
|
||||
|
||||
ARG SYNAPSEML_VERSION=1.0.6
|
||||
ARG SYNAPSEML_VERSION=1.0.7
|
||||
ARG DEBIAN_FRONTEND=noninteractive
|
||||
|
||||
ENV SPARK_VERSION=3.4.1
|
||||
|
|
|
@ -1,7 +1,7 @@
|
|||
const math = require('remark-math')
|
||||
const katex = require('rehype-katex')
|
||||
const path = require('path');
|
||||
let version = "1.0.6";
|
||||
let version = "1.0.7";
|
||||
|
||||
module.exports = {
|
||||
title: 'SynapseML',
|
||||
|
@ -13,7 +13,7 @@ module.exports = {
|
|||
projectName: 'SynapseML',
|
||||
trailingSlash: true,
|
||||
customFields: {
|
||||
version: "1.0.6",
|
||||
version: "1.0.7",
|
||||
},
|
||||
stylesheets: [
|
||||
{
|
||||
|
@ -88,11 +88,11 @@ module.exports = {
|
|||
},
|
||||
{
|
||||
label: 'Python API Reference',
|
||||
to: 'https://mmlspark.blob.core.windows.net/docs/1.0.6/pyspark/index.html',
|
||||
to: 'https://mmlspark.blob.core.windows.net/docs/1.0.7/pyspark/index.html',
|
||||
},
|
||||
{
|
||||
label: 'Scala API Reference',
|
||||
to: 'https://mmlspark.blob.core.windows.net/docs/1.0.6/scala/index.html',
|
||||
to: 'https://mmlspark.blob.core.windows.net/docs/1.0.7/scala/index.html',
|
||||
},
|
||||
],
|
||||
},
|
||||
|
|
|
@ -268,7 +268,7 @@ function Home() {
|
|||
{
|
||||
"name": "synapseml",
|
||||
"conf": {
|
||||
"spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.6",
|
||||
"spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.7",
|
||||
"spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
|
||||
"spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
|
||||
"spark.yarn.user.classpath.first": "true",
|
||||
|
@ -314,9 +314,9 @@ function Home() {
|
|||
SynapseML can be conveniently installed on existing Spark
|
||||
clusters via the --packages option, examples:
|
||||
<CodeSnippet
|
||||
snippet={`spark-shell --packages com.microsoft.azure:synapseml_2.12:1.0.6 # Please use 1.0.6 version for Spark3.4 and 0.11.4-spark3.3 version for Spark3.3
|
||||
pyspark --packages com.microsoft.azure:synapseml_2.12:1.0.6
|
||||
spark-submit --packages com.microsoft.azure:synapseml_2.12:1.0.6 MyApp.jar `}
|
||||
snippet={`spark-shell --packages com.microsoft.azure:synapseml_2.12:1.0.7 # Please use 1.0.7 version for Spark3.4 and 0.11.4-spark3.3 version for Spark3.3
|
||||
pyspark --packages com.microsoft.azure:synapseml_2.12:1.0.7
|
||||
spark-submit --packages com.microsoft.azure:synapseml_2.12:1.0.7 MyApp.jar `}
|
||||
lang="bash"
|
||||
></CodeSnippet>
|
||||
This can be used in other Spark contexts too. For example, you
|
||||
|
@ -344,7 +344,7 @@ spark-submit --packages com.microsoft.azure:synapseml_2.12:1.0.6 MyApp.jar `}
|
|||
<p>For the coordinates:</p>
|
||||
Spark 3.4 Cluster:
|
||||
<CodeSnippet
|
||||
snippet={`com.microsoft.azure:synapseml_2.12:1.0.6`}
|
||||
snippet={`com.microsoft.azure:synapseml_2.12:1.0.7`}
|
||||
lang="bash"
|
||||
></CodeSnippet>
|
||||
Spark 3.3 Cluster:
|
||||
|
@ -367,7 +367,7 @@ spark-submit --packages com.microsoft.azure:synapseml_2.12:1.0.6 MyApp.jar `}
|
|||
notebooks. To get started with our example notebooks import
|
||||
the following databricks archive:
|
||||
<CodeSnippet
|
||||
snippet={`https://mmlspark.blob.core.windows.net/dbcs/SynapseMLExamplesv1.0.6.dbc`}
|
||||
snippet={`https://mmlspark.blob.core.windows.net/dbcs/SynapseMLExamplesv1.0.7.dbc`}
|
||||
lang="bash"
|
||||
></CodeSnippet>
|
||||
</TabItem>
|
||||
|
@ -405,7 +405,7 @@ spark-submit --packages com.microsoft.azure:synapseml_2.12:1.0.6 MyApp.jar `}
|
|||
<CodeSnippet
|
||||
snippet={`import pyspark
|
||||
spark = (pyspark.sql.SparkSession.builder.appName("MyApp")
|
||||
.config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:1.0.6") # Please use 1.0.6 version for Spark3.4 and 0.11.4-spark3.3 version for Spark3.3
|
||||
.config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:1.0.7") # Please use 1.0.7 version for Spark3.4 and 0.11.4-spark3.3 version for Spark3.3
|
||||
.config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven")
|
||||
.getOrCreate())
|
||||
import synapse.ml`}
|
||||
|
@ -417,7 +417,7 @@ import synapse.ml`}
|
|||
following lines to your build.sbt:
|
||||
<CodeSnippet
|
||||
snippet={`resolvers += "SynapseML" at "https://mmlspark.azureedge.net/maven"
|
||||
libraryDependencies += "com.microsoft.azure" %% "synapseml_2.12" % "1.0.6" // Please use 1.0.6 version for Spark3.2 and 1.0.6-spark3.3 version for Spark3.3`}
|
||||
libraryDependencies += "com.microsoft.azure" %% "synapseml_2.12" % "1.0.7" // Please use 1.0.7 version for Spark3.2 and 1.0.7-spark3.3 version for Spark3.3`}
|
||||
lang="jsx"
|
||||
></CodeSnippet>
|
||||
</TabItem>
|
||||
|
|
Загрузка…
Ссылка в новой задаче