replacing references to SparkCLR after repo rename

skaarthik 2016-03-07 11:42:07 -08:00
Parent cc9f61e5a4
Commit b7d056cc6c
12 changed files with 178 additions and 178 deletions

View file

@@ -1,6 +1,6 @@
# Mobius: C# API for Spark
[Mobius](https://github.com/Microsoft/SparkCLR) adds C# language binding to [Apache Spark](https://spark.apache.org/), enabling the implementation of Spark driver code and data processing operations in C#.
[Mobius](https://github.com/Microsoft/Mobius) adds C# language binding to [Apache Spark](https://spark.apache.org/), enabling the implementation of Spark driver code and data processing operations in C#.
For example, the word count sample in Apache Spark can be implemented in C# as follows:
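The sample itself falls outside this hunk. For context, a minimal C# word count against the Mobius RDD API looks roughly like the sketch below (variable names and the input path are illustrative):
```
// Illustrative sketch; assumes a SparkContext instance named sparkContext
var lines = sparkContext.TextFile(@"hdfs://path/to/input.txt");
var wordCounts = lines
    .FlatMap(s => s.Split(' '))                           // split lines into words
    .Map(w => new KeyValuePair<string, int>(w.Trim(), 1)) // pair each word with a count of 1
    .ReduceByKey((x, y) => x + y);                        // sum the counts per word
foreach (var wordCount in wordCounts.Collect())           // Collect pulls results to the driver
    Console.WriteLine("{0}: {1}", wordCount.Key, wordCount.Value);
```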
@@ -49,11 +49,11 @@ maxLatencyByDcDataFrame.ShowSchema();
maxLatencyByDcDataFrame.Show();
```
Refer to [SparkCLR\csharp\Samples](csharp/Samples) directory and [sample usage](csharp/Samples/Microsoft.Spark.CSharp/samplesusage.md) for complete samples.
Refer to [Mobius\csharp\Samples](csharp/Samples) directory and [sample usage](csharp/Samples/Microsoft.Spark.CSharp/samplesusage.md) for complete samples.
## API Documentation
Refer to [Mobius C# API documentation](csharp/Adapter/documentation/SparkCLR_API_Documentation.md) for the list of Spark's data processing operations supported in SparkCLR.
Refer to [Mobius C# API documentation](csharp/Adapter/documentation/SparkCLR_API_Documentation.md) for the list of Spark's data processing operations supported in Mobius.
## API Usage
@@ -89,25 +89,25 @@ Note: Refer to [linux-compatibility.md](notes/linux-compatibility.md) for using
## Supported Spark Versions
Mobius is built and tested with [Spark 1.4.1](https://github.com/Microsoft/SparkCLR/tree/branch-1.4), [Spark 1.5.2](https://github.com/Microsoft/SparkCLR/tree/branch-1.5) and [Spark 1.6.0](https://github.com/Microsoft/SparkCLR/tree/master).
Mobius is built and tested with [Spark 1.4.1](https://github.com/Microsoft/Mobius/tree/branch-1.4), [Spark 1.5.2](https://github.com/Microsoft/Mobius/tree/branch-1.5) and [Spark 1.6.0](https://github.com/Microsoft/Mobius/tree/master).
## License
[![License](https://img.shields.io/badge/license-MIT-blue.svg?style=plastic)](https://github.com/Microsoft/SparkCLR/blob/master/LICENSE)
[![License](https://img.shields.io/badge/license-MIT-blue.svg?style=plastic)](https://github.com/Microsoft/Mobius/blob/master/LICENSE)
Mobius is licensed under the MIT license. See [LICENSE](LICENSE) file for full license information.
## Community
[![Issue Stats](http://issuestats.com/github/Microsoft/SparkCLR/badge/pr)](http://issuestats.com/github/Microsoft/SparkCLR)
[![Issue Stats](http://issuestats.com/github/Microsoft/SparkCLR/badge/issue)](http://issuestats.com/github/Microsoft/SparkCLR)
[![Join the chat at https://gitter.im/Microsoft/SparkCLR](https://badges.gitter.im/Microsoft/SparkCLR.svg)](https://gitter.im/Microsoft/SparkCLR?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
[![Issue Stats](http://issuestats.com/github/Microsoft/Mobius/badge/pr)](http://issuestats.com/github/Microsoft/Mobius)
[![Issue Stats](http://issuestats.com/github/Microsoft/Mobius/badge/issue)](http://issuestats.com/github/Microsoft/Mobius)
[![Join the chat at https://gitter.im/Microsoft/Mobius](https://badges.gitter.im/Microsoft/Mobius.svg)](https://gitter.im/Microsoft/Mobius?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
* The Mobius project welcomes contributions. To contribute, follow the instructions in [CONTRIBUTING.md](notes/CONTRIBUTING.md)
* Options to ask your question to the Mobius community
* create issue on [GitHub](https://github.com/Microsoft/SparkCLR)
* create issue on [GitHub](https://github.com/Microsoft/Mobius)
* create post with "sparkclr" tag in [Stack Overflow](https://stackoverflow.com/questions/tagged/sparkclr)
* send email to sparkclr-user@googlegroups.com
* join chat at [Mobius room in Gitter](https://gitter.im/Microsoft/SparkCLR)
* join chat at [Mobius room in Gitter](https://gitter.im/Microsoft/Mobius)

View file

@@ -154,7 +154,7 @@
</Target>
-->
<Target Name="AfterBuild">
<XslTransformation XslInputPath="..\documentation\DocFormatter.xsl" XmlInputPaths="..\documentation\Microsoft.Spark.CSharp.Adapter.Doc.XML" OutputPaths="..\documentation\SparkCLR_API_Documentation.md" Condition="'$(OS)' == 'Windows_NT'" />
<Exec Command="xsltproc -o ../documentation/SparkCLR_API_Documentation.md ../documentation/DocFormatter.xsl ../documentation/Microsoft.Spark.CSharp.Adapter.Doc.XML" Condition="'$(OS)' != 'Windows_NT'" />
<XslTransformation XslInputPath="..\documentation\DocFormatter.xsl" XmlInputPaths="..\documentation\Microsoft.Spark.CSharp.Adapter.Doc.XML" OutputPaths="..\documentation\Mobius_API_Documentation.md" Condition="'$(OS)' == 'Windows_NT'" />
<Exec Command="xsltproc -o ../documentation/Mobius_API_Documentation.md ../documentation/DocFormatter.xsl ../documentation/Microsoft.Spark.CSharp.Adapter.Doc.XML" Condition="'$(OS)' != 'Windows_NT'" />
</Target>
</Project>

View file

@@ -2,7 +2,7 @@
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
##<center><H1><font color="darkorchid4">SparkCLR API Documentation<!--xsl:value-of select="$AssemblyName"/--></font></H1></center>
##<center><H1><font color="darkorchid4">Mobius API Documentation<!--xsl:value-of select="$AssemblyName"/--></font></H1></center>
<xsl:apply-templates select="//member[contains(@name,'T:') and not(contains(@name,'Helper')) and not(contains(@name,'Wrapper')) and not(contains(@name,'Configuration')) and not(contains(@name,'Proxy')) and not(contains(@name,'Interop')) and not(contains(@name,'Services'))]"/>
</xsl:template>

View file

@@ -10,6 +10,21 @@
to be used in SparkCLR runtime
</summary>
</member>
<member name="T:Microsoft.Spark.CSharp.Configuration.IConfigurationService">
<summary>
Helps getting config settings to be used in SparkCLR runtime
</summary>
</member>
<member name="M:Microsoft.Spark.CSharp.Configuration.IConfigurationService.GetCSharpWorkerExePath">
<summary>
The full path of the CSharp external backend worker process executable.
</summary>
</member>
<member name="P:Microsoft.Spark.CSharp.Configuration.IConfigurationService.BackendPortNumber">
<summary>
The port number used for communicating with the CSharp external backend worker process.
</summary>
</member>
<member name="T:Microsoft.Spark.CSharp.Configuration.ConfigurationService.SparkCLRConfiguration">
<summary>
Default configuration for SparkCLR jobs.
@@ -41,21 +56,6 @@
The full path of the CSharp external backend worker process.
</summary>
</member>
<member name="T:Microsoft.Spark.CSharp.Configuration.IConfigurationService">
<summary>
Helps getting config settings to be used in SparkCLR runtime
</summary>
</member>
<member name="P:Microsoft.Spark.CSharp.Configuration.IConfigurationService.BackendPortNumber">
<summary>
The port number used for communicating with the CSharp external backend worker process.
</summary>
</member>
<member name="M:Microsoft.Spark.CSharp.Configuration.IConfigurationService.GetCSharpWorkerExePath">
<summary>
The full path of the CSharp external backend worker process executable.
</summary>
</member>
<member name="T:Microsoft.Spark.CSharp.Core.Accumulator">
<summary>
A shared variable that can be accumulated, i.e., has a commutative and associative "add"
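A hedged sketch of the usage this implies, assuming a SparkContext instance named `sc` and an `Add`/`Value` surface mirroring the Python API:
```
// Sketch only; sc and the exact Accumulator surface are assumptions
var sum = sc.Accumulator<int>(0);          // created on the driver
sc.Parallelize(new[] { 1, 2, 3, 4 }, 2)
  .Foreach(x => sum.Add(x));               // tasks may only add to it
Console.WriteLine(sum.Value);              // only the driver reads the total: 10
```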
@@ -131,17 +131,17 @@
</summary>
</member>
<member name="P:Microsoft.Spark.CSharp.Core.Broadcast`1.Value">
<summary>
Return the broadcasted value
</summary>
</member>
<member name="M:Microsoft.Spark.CSharp.Core.Broadcast`1.Unpersist(System.Boolean)">
<summary>
Delete cached copies of this broadcast on the executors.
</summary>
<param name="blocking"></param>
</member>
<member name="P:Microsoft.Spark.CSharp.Core.Broadcast`1.Value">
<summary>
Return the broadcasted value
</summary>
</member>
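Taken together, `Value` and `Unpersist` support a round trip like the following hedged sketch (a SparkContext instance named `sc` is assumed):
```
// Sketch only; assumes a SparkContext instance named sc
var lookup = sc.Broadcast(new[] { 10, 20, 30 });
var mapped = sc.Parallelize(new[] { 0, 1, 2 }, 2)
               .Map(i => lookup.Value[i])  // executors read the broadcasted value
               .Collect();                 // [10, 20, 30]
lookup.Unpersist(false);                   // delete cached copies, non-blocking
```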
<member name="T:Microsoft.Spark.CSharp.Core.Option`1">
<summary>
Container for an optional value of type T. If the value of type T is present, Option.IsDefined is TRUE and GetValue() returns the value.
@@ -154,6 +154,11 @@
Used for collect operation on RDD
</summary>
</member>
<member name="T:Microsoft.Spark.CSharp.Core.IRDDCollector">
<summary>
Interface for collect operation on RDD
</summary>
</member>
<member name="M:Microsoft.Spark.CSharp.Core.DoubleRDDFunctions.Sum(Microsoft.Spark.CSharp.Core.RDD{System.Double})">
<summary>
Add up the elements in this RDD.
@@ -266,11 +271,6 @@
<param name="self"></param>
<returns></returns>
</member>
<member name="T:Microsoft.Spark.CSharp.Core.IRDDCollector">
<summary>
Interface for collect operation on RDD
</summary>
</member>
<member name="M:Microsoft.Spark.CSharp.Core.OrderedRDDFunctions.SortByKey``2(Microsoft.Spark.CSharp.Core.RDD{System.Collections.Generic.KeyValuePair{``0,``1}},System.Boolean,System.Nullable{System.Int32})">
<summary>
Sort the RDD by key, so that each partition contains a sorted range of the elements. Calling
@@ -347,7 +347,7 @@
<summary>
Return an RDD with the keys of each tuple.
>>> m = sc.Parallelize(new[] { new <see cref="!:KeyValuePair&lt;int, int&gt;"/>(1, 2), new <see cref="!:KeyValuePair&lt;int, int&gt;"/>(3, 4) }, 1).Keys().Collect()
&gt;&gt;&gt; m = sc.Parallelize(new[] { new <see cref="!:KeyValuePair&lt;int, int&gt;"/>(1, 2), new <see cref="!:KeyValuePair&lt;int, int&gt;"/>(3, 4) }, 1).Keys().Collect()
[1, 3]
</summary>
<typeparam name="K"></typeparam>
@@ -359,7 +359,7 @@
<summary>
Return an RDD with the values of each tuple.
>>> m = sc.Parallelize(new[] { new <see cref="!:KeyValuePair&lt;int, int&gt;"/>(1, 2), new <see cref="!:KeyValuePair&lt;int, int&gt;"/>(3, 4) }, 1).Values().Collect()
&gt;&gt;&gt; m = sc.Parallelize(new[] { new <see cref="!:KeyValuePair&lt;int, int&gt;"/>(1, 2), new <see cref="!:KeyValuePair&lt;int, int&gt;"/>(3, 4) }, 1).Values().Collect()
[2, 4]
</summary>
@@ -384,7 +384,7 @@
new <see cref="!:KeyValuePair&lt;string, int&gt;"/>("b", 1),
new <see cref="!:KeyValuePair&lt;string, int&gt;"/>("a", 1)
}, 2)
.ReduceByKey((x, y) => x + y).Collect()
.ReduceByKey((x, y) =&gt; x + y).Collect()
[('a', 2), ('b', 1)]
@@ -410,7 +410,7 @@
new <see cref="!:KeyValuePair&lt;string, int&gt;"/>("b", 1),
new <see cref="!:KeyValuePair&lt;string, int&gt;"/>("a", 1)
}, 2)
.ReduceByKeyLocally((x, y) => x + y).Collect()
.ReduceByKeyLocally((x, y) =&gt; x + y).Collect()
[('a', 2), ('b', 1)]
@@ -431,7 +431,7 @@
new <see cref="!:KeyValuePair&lt;string, int&gt;"/>("b", 1),
new <see cref="!:KeyValuePair&lt;string, int&gt;"/>("a", 1)
}, 2)
.CountByKey((x, y) => x + y).Collect()
.CountByKey((x, y) =&gt; x + y).Collect()
[('a', 2), ('b', 1)]
@@ -555,7 +555,7 @@
<summary>
Return a copy of the RDD partitioned using the specified partitioner.
sc.Parallelize(new[] { 1, 2, 3, 4, 2, 4, 1 }, 1).Map(x => new <see cref="!:KeyValuePair&lt;int, int&gt;"/>(x, x)).PartitionBy(3).Glom().Collect()
sc.Parallelize(new[] { 1, 2, 3, 4, 2, 4, 1 }, 1).Map(x =&gt; new <see cref="!:KeyValuePair&lt;int, int&gt;"/>(x, x)).PartitionBy(3).Glom().Collect()
</summary>
<param name="self"></param>
<param name="numPartitions"></param>
@@ -587,7 +587,7 @@
new <see cref="!:KeyValuePair&lt;string, int&gt;"/>("b", 1),
new <see cref="!:KeyValuePair&lt;string, int&gt;"/>("a", 1)
}, 2)
.CombineByKey(() => string.Empty, (x, y) => x + y.ToString(), (x, y) => x + y).Collect()
.CombineByKey(() =&gt; string.Empty, (x, y) =&gt; x + y.ToString(), (x, y) =&gt; x + y).Collect()
[('a', '11'), ('b', '1')]
</summary>
@@ -618,7 +618,7 @@
new <see cref="!:KeyValuePair&lt;string, int&gt;"/>("b", 1),
new <see cref="!:KeyValuePair&lt;string, int&gt;"/>("a", 1)
}, 2)
.CombineByKey(() => string.Empty, (x, y) => x + y.ToString(), (x, y) => x + y).Collect()
.CombineByKey(() =&gt; string.Empty, (x, y) =&gt; x + y.ToString(), (x, y) =&gt; x + y).Collect()
[('a', 2), ('b', 1)]
</summary>
@@ -646,7 +646,7 @@
new <see cref="!:KeyValuePair&lt;string, int&gt;"/>("b", 1),
new <see cref="!:KeyValuePair&lt;string, int&gt;"/>("a", 1)
}, 2)
.CombineByKey(() => string.Empty, (x, y) => x + y.ToString(), (x, y) => x + y).Collect()
.CombineByKey(() =&gt; string.Empty, (x, y) =&gt; x + y.ToString(), (x, y) =&gt; x + y).Collect()
[('a', 2), ('b', 1)]
</summary>
@@ -674,7 +674,7 @@
new <see cref="!:KeyValuePair&lt;string, int&gt;"/>("b", 1),
new <see cref="!:KeyValuePair&lt;string, int&gt;"/>("a", 1)
}, 2)
.GroupByKey().MapValues(l => string.Join(" ", l)).Collect()
.GroupByKey().MapValues(l =&gt; string.Join(" ", l)).Collect()
[('a', [1, 1]), ('b', [1])]
@@ -696,7 +696,7 @@
new <see cref="!:KeyValuePair&lt;string, string[]&gt;"/>("a", new[]{"apple", "banana", "lemon"}),
new <see cref="!:KeyValuePair&lt;string, string[]&gt;"/>("b", new[]{"grapes"})
}, 2)
.MapValues(x => x.Length).Collect()
.MapValues(x =&gt; x.Length).Collect()
[('a', 3), ('b', 1)]
@@ -719,7 +719,7 @@
new <see cref="!:KeyValuePair&lt;string, string[]&gt;"/>("a", new[]{"x", "y", "z"}),
new <see cref="!:KeyValuePair&lt;string, string[]&gt;"/>("b", new[]{"p", "r"})
}, 2)
.FlatMapValues(x => x).Collect()
.FlatMapValues(x =&gt; x).Collect()
[('a', 'x'), ('a', 'y'), ('a', 'z'), ('b', 'p'), ('b', 'r')]
@@ -775,7 +775,7 @@
var y = sc.Parallelize(new[] { new <see cref="!:KeyValuePair&lt;string, int&gt;"/>("a", 1), new <see cref="!:KeyValuePair&lt;string, int&gt;"/>("b", 4) }, 2);
var z = sc.Parallelize(new[] { new <see cref="!:KeyValuePair&lt;string, int&gt;"/>("a", 2) }, 1);
var w = sc.Parallelize(new[] { new <see cref="!:KeyValuePair&lt;string, int&gt;"/>("b", 42) }, 1);
var m = x.GroupWith(y, z, w).MapValues(l => string.Join(" ", l.Item1) + " : " + string.Join(" ", l.Item2) + " : " + string.Join(" ", l.Item3) + " : " + string.Join(" ", l.Item4)).Collect();
var m = x.GroupWith(y, z, w).MapValues(l =&gt; string.Join(" ", l.Item1) + " : " + string.Join(" ", l.Item2) + " : " + string.Join(" ", l.Item3) + " : " + string.Join(" ", l.Item4)).Collect();
</summary>
<typeparam name="K"></typeparam>
<typeparam name="V"></typeparam>
@@ -814,9 +814,9 @@
is done efficiently if the RDD has a known partitioner by only
searching the partition that the key maps to.
>>> l = range(1000)
>>> rdd = sc.Parallelize(Enumerable.Range(0, 1000).Zip(Enumerable.Range(0, 1000), (x, y) => new <see cref="!:KeyValuePair&lt;int, int&gt;"/>(x, y)), 10)
>>> rdd.lookup(42)
&gt;&gt;&gt; l = range(1000)
&gt;&gt;&gt; rdd = sc.Parallelize(Enumerable.Range(0, 1000).Zip(Enumerable.Range(0, 1000), (x, y) =&gt; new <see cref="!:KeyValuePair&lt;int, int&gt;"/>(x, y)), 10)
&gt;&gt;&gt; rdd.lookup(42)
[42]
</summary>
@@ -907,14 +907,6 @@
</summary>
</member>
<!-- Badly formed XML comment ignored for member "T:Microsoft.Spark.CSharp.Core.PipelinedRDD`1" -->
<member name="T:Microsoft.Spark.CSharp.Core.PipelinedRDD`1.MapPartitionsWithIndexHelper`2">
<summary>
This class is defined explicitly instead of using anonymous method as delegate to prevent C# compiler from generating
private anonymous type that is not serializable. Since the delegate has to be serialized and sent to the Spark workers
for execution, it is necessary to have the type marked [Serializable]. This class is to work around the limitation
on the serializability of compiler generated types
</summary>
</member>
<member name="T:Microsoft.Spark.CSharp.Core.RDD`1">
<summary>
Represents a Resilient Distributed Dataset (RDD), the basic abstraction in Spark. Represents an immutable,
@@ -924,21 +916,6 @@
</summary>
<typeparam name="T">Type of the RDD</typeparam>
</member>
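The lifecycle this summary implies, as a minimal sketch (a SparkContext instance named `sc` is assumed):
```
// Minimal sketch, assuming a SparkContext instance named sc
var doubled = sc.Parallelize(new[] { 1, 2, 3 }, 2) // immutable, partitioned collection
                .Map(x => x * 2)                   // lazy transformation
                .Collect();                        // action that runs the job: [2, 4, 6]
```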
<member name="P:Microsoft.Spark.CSharp.Core.RDD`1.IsCached">
<summary>
Return whether this RDD has been cached or not
</summary>
</member>
<member name="P:Microsoft.Spark.CSharp.Core.RDD`1.IsCheckpointed">
<summary>
Return whether this RDD has been checkpointed or not
</summary>
</member>
<member name="P:Microsoft.Spark.CSharp.Core.RDD`1.Name">
<summary>
Return the name of this RDD.
</summary>
</member>
<member name="M:Microsoft.Spark.CSharp.Core.RDD`1.Cache">
<summary>
Persist this RDD with the default storage level <see cref="F:Microsoft.Spark.CSharp.Core.StorageLevelType.MEMORY_ONLY_SER"/>.
@@ -980,7 +957,7 @@
<summary>
Return a new RDD by applying a function to each element of this RDD.
sc.Parallelize(new string[]{"b", "a", "c"}, 1).Map(x => new <see cref="!:KeyValuePair&lt;string, int&gt;"/>(x, 1)).Collect()
sc.Parallelize(new string[]{"b", "a", "c"}, 1).Map(x =&gt; new <see cref="!:KeyValuePair&lt;string, int&gt;"/>(x, 1)).Collect()
[('a', 1), ('b', 1), ('c', 1)]
</summary>
@@ -1021,7 +998,7 @@
Return a new RDD by applying a function to each partition of this RDD,
while tracking the index of the original partition.
<see cref="!:sc.Parallelize(new int[]&lt;1, 2, 3, 4&gt;, 4).MapPartitionsWithIndex&lt;double&gt;"/>((pid, iter) => (double)pid).Sum()
<see cref="!:sc.Parallelize(new int[]&lt;1, 2, 3, 4&gt;, 4).MapPartitionsWithIndex&lt;double&gt;"/>((pid, iter) =&gt; (double)pid).Sum()
6
</summary>
<typeparam name="U"></typeparam>
@@ -1489,7 +1466,7 @@
n is the number of partitions. So there may exist gaps, but this
method won't trigger a spark job, which is different from <see cref="M:Microsoft.Spark.CSharp.Core.RDD`1.ZipWithIndex"/>
>>> sc.Parallelize(new string[] { "a", "b", "c", "d" }, 1).ZipWithIndex().Collect()
&gt;&gt;&gt; sc.Parallelize(new string[] { "a", "b", "c", "d" }, 1).ZipWithIndex().Collect()
[('a', 0), ('b', 1), ('c', 4), ('d', 2), ('e', 5)]
</summary>
@@ -1543,6 +1520,29 @@
<param name="seed">the seed for the Random number generator</param>
<returns>A random sub-sample of the RDD without replacement.</returns>
</member>
<member name="P:Microsoft.Spark.CSharp.Core.RDD`1.IsCached">
<summary>
Return whether this RDD has been cached or not
</summary>
</member>
<member name="P:Microsoft.Spark.CSharp.Core.RDD`1.IsCheckpointed">
<summary>
Return whether this RDD has been checkpointed or not
</summary>
</member>
<member name="P:Microsoft.Spark.CSharp.Core.RDD`1.Name">
<summary>
Return the name of this RDD.
</summary>
</member>
<member name="T:Microsoft.Spark.CSharp.Core.PipelinedRDD`1.MapPartitionsWithIndexHelper`2">
<summary>
This class is defined explicitly instead of using anonymous method as delegate to prevent C# compiler from generating
private anonymous type that is not serializable. Since the delegate has to be serialized and sent to the Spark workers
for execution, it is necessary to have the type marked [Serializable]. This class is to work around the limitation
on the serializability of compiler generated types
</summary>
</member>
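The shape of such a helper, as an illustration of the pattern rather than the actual Mobius type, is roughly:
```
using System;
using System.Collections.Generic;

// Pattern illustration only: state lives in explicit fields on a
// [Serializable] class instead of in a compiler-generated (and
// non-serializable) closure type, so the delegate's state can be
// serialized and shipped to Spark workers.
[Serializable]
internal class MapPartitionsWithIndexHelper<I, O>
{
    private readonly Func<int, IEnumerable<I>, IEnumerable<O>> func;

    internal MapPartitionsWithIndexHelper(Func<int, IEnumerable<I>, IEnumerable<O>> func)
    {
        this.func = func;
    }

    internal IEnumerable<O> Execute(int partitionId, IEnumerable<I> input)
    {
        return func(partitionId, input);
    }
}
```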
<member name="T:Microsoft.Spark.CSharp.Core.StringRDDFunctions">
<summary>
Some useful utility functions for <c>RDD{string}</c>
@@ -1740,36 +1740,6 @@
<param name="key">Key to use</param>
<param name="defaultValue">Default value to use</param>
</member>
<member name="P:Microsoft.Spark.CSharp.Core.SparkContext.Version">
<summary>
The version of Spark on which this application is running.
</summary>
</member>
<member name="P:Microsoft.Spark.CSharp.Core.SparkContext.StartTime">
<summary>
Return the epoch time when the Spark Context was started.
</summary>
</member>
<member name="P:Microsoft.Spark.CSharp.Core.SparkContext.DefaultParallelism">
<summary>
Default level of parallelism to use when not given by user (e.g. for reduce tasks)
</summary>
</member>
<member name="P:Microsoft.Spark.CSharp.Core.SparkContext.DefaultMinPartitions">
<summary>
Default min number of partitions for Hadoop RDDs when not given by user
</summary>
</member>
<member name="P:Microsoft.Spark.CSharp.Core.SparkContext.SparkUser">
<summary>
Get SPARK_USER for user who is running SparkContext.
</summary>
</member>
<member name="P:Microsoft.Spark.CSharp.Core.SparkContext.StatusTracker">
<summary>
Return :class:`StatusTracker` object
</summary>
</member>
<member name="M:Microsoft.Spark.CSharp.Core.SparkContext.#ctor(Microsoft.Spark.CSharp.Proxy.ISparkContextProxy,Microsoft.Spark.CSharp.Core.SparkConf)">
<summary>
when created from checkpoint
@@ -2083,6 +2053,36 @@
Cancel all jobs that have been scheduled or are running.
</summary>
</member>
<member name="P:Microsoft.Spark.CSharp.Core.SparkContext.Version">
<summary>
The version of Spark on which this application is running.
</summary>
</member>
<member name="P:Microsoft.Spark.CSharp.Core.SparkContext.StartTime">
<summary>
Return the epoch time when the Spark Context was started.
</summary>
</member>
<member name="P:Microsoft.Spark.CSharp.Core.SparkContext.DefaultParallelism">
<summary>
Default level of parallelism to use when not given by user (e.g. for reduce tasks)
</summary>
</member>
<member name="P:Microsoft.Spark.CSharp.Core.SparkContext.DefaultMinPartitions">
<summary>
Default min number of partitions for Hadoop RDDs when not given by user
</summary>
</member>
<member name="P:Microsoft.Spark.CSharp.Core.SparkContext.SparkUser">
<summary>
Get SPARK_USER for user who is running SparkContext.
</summary>
</member>
<member name="P:Microsoft.Spark.CSharp.Core.SparkContext.StatusTracker">
<summary>
Return :class:`StatusTracker` object
</summary>
</member>
<member name="M:Microsoft.Spark.CSharp.Core.StatCounter.Merge(System.Double)">
<summary>
Add a value into this StatCounter, updating the internal statistics.
@@ -2193,6 +2193,11 @@
Utility methods for C#-JVM interaction
</summary>
</member>
<member name="T:Microsoft.Spark.CSharp.Interop.SparkCLREnvironment">
<summary>
Contains everything needed to setup an environment for using C# with Spark
</summary>
</member>
<!-- Badly formed XML comment ignored for member "T:Microsoft.Spark.CSharp.Interop.Ipc.IJvmBridge" -->
<!-- Badly formed XML comment ignored for member "T:Microsoft.Spark.CSharp.Interop.Ipc.JvmBridge" -->
<member name="T:Microsoft.Spark.CSharp.Interop.Ipc.JvmObjectReference">
@@ -2211,11 +2216,6 @@
</summary>
</member>
<!-- Badly formed XML comment ignored for member "T:Microsoft.Spark.CSharp.Interop.Ipc.SerDe" -->
<member name="T:Microsoft.Spark.CSharp.Interop.SparkCLREnvironment">
<summary>
Contains everything needed to setup an environment for using C# with Spark
</summary>
</member>
<member name="M:Microsoft.Spark.CSharp.Proxy.Ipc.DataFrameIpcProxy.Intersect(Microsoft.Spark.CSharp.Proxy.IDataFrameProxy)">
<summary>
Call https://github.com/apache/spark/blob/branch-1.4/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala, intersect(other: DataFrame): DataFrame
@@ -3624,7 +3624,7 @@
<member name="M:Microsoft.Spark.CSharp.Sql.SqlContext.RegisterFunction``1(System.String,System.Func{``0})">
<summary>
Register UDF with no input argument, e.g:
<see cref="!:SqlContext.RegisterFunction&lt;bool&gt;"/>("MyFilter", () => true);
<see cref="!:SqlContext.RegisterFunction&lt;bool&gt;"/>("MyFilter", () =&gt; true);
sqlContext.Sql("SELECT * FROM MyTable where MyFilter()");
</summary>
<typeparam name="RT"></typeparam>
@@ -3634,7 +3634,7 @@
<member name="M:Microsoft.Spark.CSharp.Sql.SqlContext.RegisterFunction``2(System.String,System.Func{``1,``0})">
<summary>
Register UDF with 1 input argument, e.g:
<see cref="!:SqlContext.RegisterFunction&lt;bool, string&gt;"/>("MyFilter", (arg1) => arg1 != null);
<see cref="!:SqlContext.RegisterFunction&lt;bool, string&gt;"/>("MyFilter", (arg1) =&gt; arg1 != null);
sqlContext.Sql("SELECT * FROM MyTable where MyFilter(columnName1)");
</summary>
<typeparam name="RT"></typeparam>
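Putting the two documented overloads side by side (a SqlContext instance named `sqlContext` is assumed; `NotNull` is a hypothetical UDF name):
```
// Sketch based on the RegisterFunction overloads documented above
sqlContext.RegisterFunction<bool>("MyFilter", () => true);
sqlContext.RegisterFunction<bool, string>("NotNull", arg1 => arg1 != null);
var result = sqlContext.Sql("SELECT * FROM MyTable WHERE NotNull(columnName1)");
```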
@@ -3687,11 +3687,6 @@
</summary>
<typeparam name="T"></typeparam>
</member>
<member name="P:Microsoft.Spark.CSharp.Streaming.DStream`1.SlideDuration">
<summary>
Return the slideDuration in seconds of this DStream
</summary>
</member>
<member name="M:Microsoft.Spark.CSharp.Streaming.DStream`1.Count">
<summary>
Return a new DStream in which each RDD has a single element
@@ -3949,6 +3944,11 @@
<param name="numPartitions">number of partitions of each RDD in the new DStream.</param>
<returns></returns>
</member>
<member name="P:Microsoft.Spark.CSharp.Streaming.DStream`1.SlideDuration">
<summary>
Return the slideDuration in seconds of this DStream
</summary>
</member>
<member name="T:Microsoft.Spark.CSharp.Streaming.MapPartitionsWithIndexHelper`2">
<summary>
Following classes are defined explicitly instead of using anonymous method as delegate to prevent C# compiler from generating

View file

@@ -1,5 +1,5 @@
<?xml version="1.0" encoding="utf-8"?>
##<center><H1><font color="darkorchid4">SparkCLR API Documentation</font></H1></center>
##<center><H1><font color="darkorchid4">Mobius API Documentation</font></H1></center>
###<font color="#68228B">Microsoft.Spark.CSharp.Core.Accumulator</font>
####Summary

View file

@@ -22,7 +22,7 @@ SparkCLRSamples.exe supports following options:
Example 1 - run default samples:
SparkCLRSamples.exe --temp C:\gitsrc\SparkCLR\run\Temp --data C:\gitsrc\SparkCLR\run\data
SparkCLRSamples.exe --temp C:\gitsrc\Mobius\run\Temp --data C:\gitsrc\Mobius\run\data
Example 2 - dryrun default samples:

View file

@@ -4,8 +4,8 @@ This page documents the various steps required in order to contribute Mobius cod
### Overview
Generally, Mobius uses:
* [Github issues](https://github.com/Microsoft/SparkCLR/issues) to track logical issues, including bugs and improvements
* [Github pull requests](https://github.com/Microsoft/SparkCLR/pulls) to manage the *code review* and merge of *code changes*.
* [Github issues](https://github.com/Microsoft/Mobius/issues) to track logical issues, including bugs and improvements
* [Github pull requests](https://github.com/Microsoft/Mobius/pulls) to manage the *code review* and merge of *code changes*.
### Github Issues
[Issue Guide](issue-guide.md) explains Github labels used for managing Mobius *issues*. Even though Github allows only committers to apply labels, the **Prefix titles with labels** section explains how to label an *issue*. The steps below help you assess whether and how to file a *Github issue*, before a *pull request*:
@@ -24,11 +24,11 @@ Generally, Mobius uses:
6. If the change is a major one, consider inviting discussion on the issue on the *[sparkclr-dev](https://groups.google.com/d/forum/sparkclr-dev)* mailing list before proceeding to implement the change. Note that a design doc helps the discussion and the review of *major* changes.
### Pull Request
1. Fork the Github repository at http://github.com/Microsoft/SparkCLR if you haven't already.
1. Fork the Github repository at http://github.com/Microsoft/Mobius if you haven't already.
2. Clone your fork, create a new dev branch, push commits to the dev branch.
3. Consider whether documentation or tests need to be added or updated as part of the change, and add them as needed (doc changes should be submitted along with code change in the same PR).
4. Run all tests and samples as described in the project's [README](../../README.md).
5. Open a *pull request* against the master branch of Microsoft/SparkCLR. (Only in special cases would the PR be opened against other branches.)
5. Open a *pull request* against the master branch of Microsoft/Mobius. (Only in special cases would the PR be opened against other branches.)
1. Always associate the PR with corresponding *Github issues* except for trivial changes when no *Github issue* is created.
2. For trivial cases where a *Github issue* is not required, **MINOR:** or **HOTFIX:** can be used as the PR title prefix.
3. If the *pull request* is still a work in progress, not ready to be merged, but needs to be pushed to Github to facilitate review, then prefix the PR title with **[WIP]**.

View file

@@ -17,37 +17,37 @@ Sometimes after debate, we'll decide an issue isn't a good fit for Mobius. In t
### Labels
We use GitHub labels to manage workflow on *issues*. We have the following categories per issue:
* **Area**: These labels call out the feature areas that the issue applies to.
* [RDD](https://github.com/Microsoft/SparkCLR/labels/RDD): Issues relating to RDD
* [DataFrame/SQL](https://github.com/Microsoft/SparkCLR/labels/DataFrame%2FSQL): Issues relating to DataFrame/SQL.
* [DataFrame UDF](https://github.com/Microsoft/SparkCLR/labels/DataFrame%20UDF): Issues relating to DataFrame UDF.
* [Streaming](https://github.com/Microsoft/SparkCLR/labels/Streaming): Issues relating to Streaming.
* [Job Submission](https://github.com/Microsoft/SparkCLR/labels/Job%20Submission): Issues relating to Job Submission.
* [Packaging](https://github.com/Microsoft/SparkCLR/labels/Packaging): Issues relating to packaging.
* [Deployment](https://github.com/Microsoft/SparkCLR/labels/Deployment): Issues relating to deployment.
* [Spark Compatibility](https://github.com/Microsoft/SparkCLR/labels/Spark%20Compatibility): Issues relating to supporting different/newer Apache Spark releases.
* [RDD](https://github.com/Microsoft/Mobius/labels/RDD): Issues relating to RDD
* [DataFrame/SQL](https://github.com/Microsoft/Mobius/labels/DataFrame%2FSQL): Issues relating to DataFrame/SQL.
* [DataFrame UDF](https://github.com/Microsoft/Mobius/labels/DataFrame%20UDF): Issues relating to DataFrame UDF.
* [Streaming](https://github.com/Microsoft/Mobius/labels/Streaming): Issues relating to Streaming.
* [Job Submission](https://github.com/Microsoft/Mobius/labels/Job%20Submission): Issues relating to Job Submission.
* [Packaging](https://github.com/Microsoft/Mobius/labels/Packaging): Issues relating to packaging.
* [Deployment](https://github.com/Microsoft/Mobius/labels/Deployment): Issues relating to deployment.
* [Spark Compatibility](https://github.com/Microsoft/Mobius/labels/Spark%20Compatibility): Issues relating to supporting different/newer Apache Spark releases.
* **Type**: These labels classify the type of issue. We use the following types:
* [documentation](https://github.com/Microsoft/SparkCLR/labels/documentation): Issues relating to documentation (e.g. incorrect documentation, enhancement requests)
* [debuggability/supportability](https://github.com/Microsoft/SparkCLR/labels/debuggability%2Fsupportability): Issues relating to making debugging and support easy. For instance, throwing meaningful errors when things fail.
* [user experience](https://github.com/Microsoft/SparkCLR/labels/user%20experience): Issues relating to making Mobius more user-friendly. For instance, improving the first time user experience, helping run Mobius on a single node or cluster mode etc.
* [bug](https://github.com/Microsoft/SparkCLR/labels/bug).
* [enhancement](https://github.com/Microsoft/SparkCLR/labels/enhancement): Issues related to improving existing implementations.
* [test bug](https://github.com/Microsoft/SparkCLR/labels/test%20bug): Issues related to invalid or missing tests/unit tests.
* [design change request](https://github.com/Microsoft/SparkCLR/labels/design%20change%20request): Alternative design change suggestions.
* [suggestion](https://github.com/Microsoft/SparkCLR/labels/suggestion): Feature or API suggestions.
* [documentation](https://github.com/Microsoft/Mobius/labels/documentation): Issues relating to documentation (e.g. incorrect documentation, enhancement requests)
* [debuggability/supportability](https://github.com/Microsoft/Mobius/labels/debuggability%2Fsupportability): Issues relating to making debugging and support easy. For instance, throwing meaningful errors when things fail.
* [user experience](https://github.com/Microsoft/Mobius/labels/user%20experience): Issues relating to making Mobius more user-friendly. For instance, improving the first time user experience, helping run Mobius on a single node or cluster mode etc.
* [bug](https://github.com/Microsoft/Mobius/labels/bug).
* [enhancement](https://github.com/Microsoft/Mobius/labels/enhancement): Issues related to improving existing implementations.
* [test bug](https://github.com/Microsoft/Mobius/labels/test%20bug): Issues related to invalid or missing tests/unit tests.
* [design change request](https://github.com/Microsoft/Mobius/labels/design%20change%20request): Alternative design change suggestions.
* [suggestion](https://github.com/Microsoft/Mobius/labels/suggestion): Feature or API suggestions.
* **Ownership**: These labels are used to specify who owns a specific issue. Issues without an ownership tag are still considered "up for discussion" and haven't been approved yet. We have the following different types of ownership:
* [up for grabs](https://github.com/Microsoft/SparkCLR/labels/up%20for%20grabs): Small sections of work which we believe are well scoped. These sorts of issues are a good place to start if you are new. Anyone is free to work on these issues.
* [feature approved](https://github.com/Microsoft/SparkCLR/labels/feature%20approved): Larger scale issues having priority and the design approved, anyone is free to work on these issues, but they may be trickier or require more work.
* [grabbed by assignee](https://github.com/Microsoft/SparkCLR/labels/grabbed%20by%20assignee): the person the issue is assigned to is making a fix.
* [up for grabs](https://github.com/Microsoft/Mobius/labels/up%20for%20grabs): Small sections of work which we believe are well scoped. These sorts of issues are a good place to start if you are new. Anyone is free to work on these issues.
* [feature approved](https://github.com/Microsoft/Mobius/labels/feature%20approved): Larger scale issues having priority and the design approved, anyone is free to work on these issues, but they may be trickier or require more work.
* [grabbed by assignee](https://github.com/Microsoft/Mobius/labels/grabbed%20by%20assignee): the person the issue is assigned to is making a fix.
* **Project Management**: These labels are used to communicate task status. Issues/tasks without a Project Management tag are still considered "pending/under triage".
* [0 - Backlog](https://github.com/Microsoft/SparkCLR/labels/0%20-%20Backlog): Tasks that are not yet ready for development or are not yet prioritized for the current milestone.
* [1 - Up Next](https://github.com/Microsoft/SparkCLR/labels/1%20-%20Up%20Next): Tasks that are ready for development and prioritized above the rest of the backlog.
* [2 - In Progress](https://github.com/Microsoft/SparkCLR/labels/2%20-%20In%20Progress): Tasks that are under active development.
* [3 - Done](https://github.com/Microsoft/SparkCLR/labels/3%20-%20Done): Tasks that are finished. There should be no open issues in the Done stage.
* [0 - Backlog](https://github.com/Microsoft/Mobius/labels/0%20-%20Backlog): Tasks that are not yet ready for development or are not yet prioritized for the current milestone.
* [1 - Up Next](https://github.com/Microsoft/Mobius/labels/1%20-%20Up%20Next): Tasks that are ready for development and prioritized above the rest of the backlog.
* [2 - In Progress](https://github.com/Microsoft/Mobius/labels/2%20-%20In%20Progress): Tasks that are under active development.
* [3 - Done](https://github.com/Microsoft/Mobius/labels/3%20-%20Done): Tasks that are finished. There should be no open issues in the Done stage.
* **Review Status**: These labels are used to indicate that the issue/bug cannot be worked on after review. Issues without Review Status, Project Management or Ownership tags are ones pending review.
* [duplicate](https://github.com/Microsoft/SparkCLR/labels/duplicate): Issues/bugs are duplicates of ones submitted already.
* [invalid](https://github.com/Microsoft/SparkCLR/labels/invalid): Issues/bugs are unrelated to Mobius.
* [wontfix](https://github.com/Microsoft/SparkCLR/labels/wontfix): Issues/bugs are considered as limitations that will not be fixed.
* [needs more info](https://github.com/Microsoft/SparkCLR/labels/needs%20more%20info): Issues/bugs need more information. Usually this indicates we can't reproduce a reported bug. We'll close these issues after a little while if we haven't gotten actionable information, but we welcome folks who have acquired more information to reopen the issue.
* [duplicate](https://github.com/Microsoft/Mobius/labels/duplicate): Issues/bugs are duplicates of ones submitted already.
* [invalid](https://github.com/Microsoft/Mobius/labels/invalid): Issues/bugs are unrelated to Mobius.
* [wontfix](https://github.com/Microsoft/Mobius/labels/wontfix): Issues/bugs are considered as limitations that will not be fixed.
* [needs more info](https://github.com/Microsoft/Mobius/labels/needs%20more%20info): Issues/bugs need more information. Usually this indicates we can't reproduce a reported bug. We'll close these issues after a little while if we haven't gotten actionable information, but we welcome folks who have acquired more information to reopen the issue.
In addition to the above, we may introduce new labels to help classify our issues. Some of these tags may address cross-cutting concerns (e.g. *performance*, *serialization impact*), whereas others are used to help us track additional work needed before closing an issue (e.g. *needs design review*).

View file

@@ -20,7 +20,7 @@ Consider these options:
Contributing Code Changes
-------------------------
If you are looking for something to work on, the list of [up-for-grabs issues](https://github.com/Microsoft/SparkCLR/labels/up%20for%20grabs) is a good starting point.
If you are looking for something to work on, the list of [up-for-grabs issues](https://github.com/Microsoft/Mobius/labels/up%20for%20grabs) is a good starting point.
Before opening a *pull request*, review [Contributing Code Changes](/docs/project-docs/CONTRIBUTING.md).
It lists steps that are required before creating a PR. In particular, consider:

View file

@@ -14,32 +14,32 @@ The following environment variables should be set properly:
## Instructions
* With `JAVA_HOME` set properly, navigate to [SparkCLR/build](../build) directory:
* With `JAVA_HOME` set properly, navigate to [Mobius/build](../build) directory:
```
./build.sh
```
* Optional:
- Under [SparkCLR/scala](../scala) directory, run the following command to clean spark-clr*.jar built above:
- Under [Mobius/scala](../scala) directory, run the following command to clean spark-clr*.jar built above:
```
mvn clean
```
- Under [SparkCLR/csharp](../csharp) directory, run the following command to clean the .NET binaries built above:
- Under [Mobius/csharp](../csharp) directory, run the following command to clean the .NET binaries built above:
```
./clean.sh
```
[build.sh](../build/build.sh) prepares the following directories under `SparkCLR\build\runtime` after the build is done:
[build.sh](../build/build.sh) prepares the following directories under `Mobius\build\runtime` after the build is done:
* **lib** ( `spark-clr*.jar` )
* **bin** ( `Microsoft.Spark.CSharp.Adapter.dll`, `CSharpWorker.exe`)
* **samples** ( The contents of `SparkCLR/csharp/Samples/Microsoft.Spark.CSharp/bin/Release/*`, including `Microsoft.Spark.CSharp.Adapter.dll`, `CSharpWorker.exe`, `SparkCLRSamples.exe`, `SparkCLRSamples.exe.Config` etc. )
* **samples** ( The contents of `Mobius/csharp/Samples/Microsoft.Spark.CSharp/bin/Release/*`, including `Microsoft.Spark.CSharp.Adapter.dll`, `CSharpWorker.exe`, `SparkCLRSamples.exe`, `SparkCLRSamples.exe.Config` etc. )
* **scripts** ( `sparkclr-submit.sh` )
* **data** ( `SparkCLR/csharp/Samples/Microsoft.Spark.CSharp/data/*` )
* **data** ( `Mobius/csharp/Samples/Microsoft.Spark.CSharp/data/*` )
# Running Samples
@@ -52,7 +52,7 @@ JDK is installed, and the following environment variables should be set properly
## Running in Local mode
With `JAVA_HOME` set properly, navigate to [SparkCLR\build\localmode](../build/localmode) directory:
With `JAVA_HOME` set properly, navigate to [Mobius\build\localmode](../build/localmode) directory:
```
./run-samples.sh
@@ -62,7 +62,7 @@ It is **required** to run [build.sh](../build/build.sh) prior to running [run-sa
**Note that Mobius requires a customized Apache Spark for use in Linux** (see [linux-compatibility.md](./linux-compatibility.md) for details).
[run-samples.sh](../build/localmode/run-samples.sh) downloads Apache Spark 1.6.0 and builds a customized version of Spark, sets up `SPARK_HOME` environment variable, points `SPARKCLR_HOME` to `SparkCLR/build/runtime` directory created by [build.sh](../build/build.sh), and invokes [sparkclr-submit.sh](../scripts/sparkclr-submit.sh), with `spark.local.dir` set to `SparkCLR/build/runtime/Temp`.
[run-samples.sh](../build/localmode/run-samples.sh) downloads Apache Spark 1.6.0 and builds a customized version of Spark, sets up `SPARK_HOME` environment variable, points `SPARKCLR_HOME` to `Mobius/build/runtime` directory created by [build.sh](../build/build.sh), and invokes [sparkclr-submit.sh](../scripts/sparkclr-submit.sh), with `spark.local.dir` set to `Mobius/build/runtime/Temp`.
A few more [run-samples.sh](../build/localmode/run-samples.sh) examples:
- To display all options supported by [run-samples.sh](../build/localmode/run-samples.sh):
@@ -98,7 +98,7 @@ sparkclr-submit.sh --verbose --master yarn-cluster --exe SparkCLRSamples.exe $SP
# Running Unit Tests
* Install NUnit Runner 3.0 or above using NuGet (see [https://www.nuget.org/packages/NUnit.Runners/](https://www.nuget.org/packages/NUnit.Runners/)), set `NUNITCONSOLE` to the path to nunit console, navigate to `SparkCLR/csharp` and run the following command:
* Install NUnit Runner 3.0 or above using NuGet (see [https://www.nuget.org/packages/NUnit.Runners/](https://www.nuget.org/packages/NUnit.Runners/)), set `NUNITCONSOLE` to the path to nunit console, navigate to `Mobius/csharp` and run the following command:
```
./test.sh
```

View file

@@ -6,7 +6,7 @@ The following software need to be installed and appropriate environment variable
|JDK |7u85 or 8u60 ([OpenJDK](http://www.azul.com/downloads/zulu/zulu-windows/) or [Oracle JDK](http://www.oracle.com/technetwork/java/javase/downloads/index.html)) |JAVA_HOME | After setting JAVA_HOME, run `set PATH=%PATH%;%JAVA_HOME%\bin` to add java to PATH |
|Spark |[1.5.2 or 1.6.0](http://spark.apache.org/downloads.html) | SPARK_HOME |Spark can be downloaded from the Spark download website. Alternatively, if you used [`RunSamples.cmd`](../csharp/Samples/Microsoft.Spark.CSharp/samplesusage.md) to run Mobius samples, you can find the `tools\spark*` directory (under the [`build`](../build) directory) that can be used as SPARK_HOME |
|winutils.exe | see [Running Hadoop on Windows](https://wiki.apache.org/hadoop/WindowsProblems) for details |HADOOP_HOME |Spark on Windows needs this utility in the `%HADOOP_HOME%\bin` directory. It can be copied over from any Hadoop distribution. Alternatively, if you used [`RunSamples.cmd`](../csharp/Samples/Microsoft.Spark.CSharp/samplesusage.md) to run Mobius samples, you can find the `tools\winutils` directory (under the [`build`](../build) directory) that can be used as HADOOP_HOME |
|Mobius |[v1.5.200](https://github.com/Microsoft/SparkCLR/releases) or v1.6.000-SNAPSHOT | SPARKCLR_HOME |If you downloaded a [Mobius release](https://github.com/Microsoft/SparkCLR/releases), SPARKCLR_HOME should be set to the directory named `runtime` (for example, `D:\downloads\spark-clr_2.10-1.5.200\runtime`). Alternatively, if you used [`RunSamples.cmd`](../csharp/Samples/Microsoft.Spark.CSharp/samplesusage.md) to run Mobius samples, you can find `runtime` directory (under [`build`](../build) directory) that can be used as SPARKCLR_HOME. **Note** - setting SPARKCLR_HOME is _optional_ and it is set by sparkclr-submit.cmd if not set. |
|Mobius |[v1.5.200](https://github.com/Microsoft/Mobius/releases) or v1.6.000-SNAPSHOT | SPARKCLR_HOME |If you downloaded a [Mobius release](https://github.com/Microsoft/Mobius/releases), SPARKCLR_HOME should be set to the directory named `runtime` (for example, `D:\downloads\spark-clr_2.10-1.5.200\runtime`). Alternatively, if you used [`RunSamples.cmd`](../csharp/Samples/Microsoft.Spark.CSharp/samplesusage.md) to run Mobius samples, you can find `runtime` directory (under [`build`](../build) directory) that can be used as SPARKCLR_HOME. **Note** - setting SPARKCLR_HOME is _optional_ and it is set by sparkclr-submit.cmd if not set. |
## Windows Instructions
### Local Mode
@@ -44,7 +44,7 @@ Debug mode is used to step through the C# code in Visual Studio during a debuggi
### Standalone Cluster
#### Client Mode
The Mobius `runtime` folder and the build output of the Mobius driver application must be copied over to the machine where you submit Mobius apps to a Spark Standalone cluster. Once copying is done, the instructions are the same as for [localmode](RunningSparkCLRApp.md#local-mode), except that specifying the master URL (`--master <spark://host:port>`) is additionally required.
The Mobius `runtime` folder and the build output of the Mobius driver application must be copied over to the machine where you submit Mobius apps to a Spark Standalone cluster. Once copying is done, the instructions are the same as for [localmode](running-mobius-app.md#local-mode), except that specifying the master URL (`--master <spark://host:port>`) is additionally required.
**Sample Commands**
* `sparkclr-submit.cmd` `--master spark://93.184.216.34:7077` `--total-executor-cores 2` `--exe SparkClrPi.exe C:\Git\Mobius\examples\Pi\bin\Debug`

View file

@@ -13,32 +13,32 @@ JDK should be downloaded manually, and the following environment variables shoul
## Instructions
* In the Developer Command Prompt for Visual Studio where `JAVA_HOME` is set properly, navigate to [SparkCLR\build](../build/) directory:
* In the Developer Command Prompt for Visual Studio where `JAVA_HOME` is set properly, navigate to [Mobius\build](../build/) directory:
```
Build.cmd
```
* Optional:
- Under [SparkCLR\scala](../scala) directory, run the following command to clean spark-clr*.jar built above:
- Under [Mobius\scala](../scala) directory, run the following command to clean spark-clr*.jar built above:
```
mvn clean
```
- Under [SparkCLR\csharp](../csharp) directory, run the following command to clean the .NET binaries built above:
- Under [Mobius\csharp](../csharp) directory, run the following command to clean the .NET binaries built above:
```
Clean.cmd
```
[Build.cmd](../build/Build.cmd) downloads necessary build tools; after the build is done, it prepares the following directories under `SparkCLR\build\runtime`:
[Build.cmd](../build/Build.cmd) downloads necessary build tools; after the build is done, it prepares the following directories under `Mobius\build\runtime`:
* **lib** ( `spark-clr*.jar` )
* **bin** ( `Microsoft.Spark.CSharp.Adapter.dll`, `CSharpWorker.exe`)
* **samples** ( The contents of `SparkCLR\csharp\Samples\Microsoft.Spark.CSharp\bin\Release\*`, including `Microsoft.Spark.CSharp.Adapter.dll`, `CSharpWorker.exe`, `SparkCLRSamples.exe`, `SparkCLRSamples.exe.Config` etc. )
* **samples** ( The contents of `Mobius\csharp\Samples\Microsoft.Spark.CSharp\bin\Release\*`, including `Microsoft.Spark.CSharp.Adapter.dll`, `CSharpWorker.exe`, `SparkCLRSamples.exe`, `SparkCLRSamples.exe.Config` etc. )
* **scripts** ( `sparkclr-submit.cmd` )
* **data** ( `SparkCLR\csharp\Samples\Microsoft.Spark.CSharp\data\*` )
* **data** ( `Mobius\csharp\Samples\Microsoft.Spark.CSharp\data\*` )
# Running Samples
@@ -50,7 +50,7 @@ JDK should be downloaded manually, and the following environment variables shoul
## Running in Local mode
In the Developer Command Prompt for Visual Studio where `JAVA_HOME` is set properly, navigate to [SparkCLR\build](../build/) directory:
In the Developer Command Prompt for Visual Studio where `JAVA_HOME` is set properly, navigate to [Mobius\build](../build/) directory:
```
RunSamples.cmd
@@ -58,7 +58,7 @@ RunSamples.cmd
It is **required** to run [Build.cmd](../build/Build.cmd) prior to running [RunSamples.cmd](../build/RunSamples.cmd).
[RunSamples.cmd](../build/localmode/RunSamples.cmd) downloads Apache Spark 1.6.0, sets up `SPARK_HOME` environment variable, points `SPARKCLR_HOME` to `SparkCLR\build\runtime` directory created by [Build.cmd](../build/Build.cmd), and invokes [sparkclr-submit.cmd](../scripts/sparkclr-submit.cmd), with `spark.local.dir` set to `SparkCLR\build\runtime\Temp`.
[RunSamples.cmd](../build/localmode/RunSamples.cmd) downloads Apache Spark 1.6.0, sets up `SPARK_HOME` environment variable, points `SPARKCLR_HOME` to `Mobius\build\runtime` directory created by [Build.cmd](../build/Build.cmd), and invokes [sparkclr-submit.cmd](../scripts/sparkclr-submit.cmd), with `spark.local.dir` set to `Mobius\build\runtime\Temp`.
A few more [RunSamples.cmd](../build/localmode/RunSamples.cmd) examples:
- To display all options supported by [RunSamples.cmd](../build/localmode/RunSamples.cmd):
@@ -82,21 +82,21 @@ A few more [RunSamples.cmd](../build/localmode/RunSamples.cmd) examples:
## Running in Standalone mode
```
sparkclr-submit.cmd --verbose --master spark://host:port --exe SparkCLRSamples.exe %SPARKCLR_HOME%\samples sparkclr.sampledata.loc hdfs://path/to/sparkclr/sampledata
sparkclr-submit.cmd --verbose --master spark://host:port --exe SparkCLRSamples.exe %SPARKCLR_HOME%\samples sparkclr.sampledata.loc hdfs://path/to/mobius/sampledata
```
- When option `--deploy-mode` is specified with `cluster`, option `--remote-sparkclr-jar` is required and needs to be specified with a valid file path of spark-clr*.jar on HDFS.
## Running in YARN mode
```
sparkclr-submit.cmd --verbose --master yarn-cluster --exe SparkCLRSamples.exe %SPARKCLR_HOME%\samples sparkclr.sampledata.loc hdfs://path/to/sparkclr/sampledata
sparkclr-submit.cmd --verbose --master yarn-cluster --exe SparkCLRSamples.exe %SPARKCLR_HOME%\samples sparkclr.sampledata.loc hdfs://path/to/mobius/sampledata
```
# Running Unit Tests
* In Visual Studio: Install NUnit3 Test Adapter. Run the tests through "Test" -> "Run" -> "All Tests"
* Install NUnit Runner 3.0 or above using NuGet (see [https://www.nuget.org/packages/NUnit.Runners/](https://www.nuget.org/packages/NUnit.Runners/)). In Developer Command Prompt for VS, set `NUNITCONSOLE` to the path to nunit console, and navigate to `SparkCLR\csharp` and run the following command:
* Install NUnit Runner 3.0 or above using NuGet (see [https://www.nuget.org/packages/NUnit.Runners/](https://www.nuget.org/packages/NUnit.Runners/)). In Developer Command Prompt for VS, set `NUNITCONSOLE` to the path to nunit console, and navigate to `Mobius\csharp` and run the following command:
```
Test.cmd
```