fix format in readme, incorporate feedback

Daniel Li 2015-11-03 09:42:48 -08:00
Parent b14977f751
Commit 431892a33d
1 changed file with 55 additions and 54 deletions

README.md (109 changed lines)

@@ -2,41 +2,44 @@
SparkCLR (pronounced sparkler) adds C# language binding to Apache Spark enabling the implementation of Spark driver code and data processing operations in C#.
For example, the word count sample in Apache Spark can be implemented in C# as follows
```c#
var lines = sparkContext.TextFile(@"hdfs://path/to/input.txt");
var words = lines.FlatMap(s => s.Split(new[] { " " }, StringSplitOptions.None));
var wordCounts = words.Map(w => new KeyValuePair<string, int>(w.Trim(), 1))
                      .ReduceByKey((x, y) => x + y);
var wordCountCollection = wordCounts.Collect();
wordCounts.SaveAsTextFile(@"hdfs://path/to/wordcount.txt");
```
A simple DataFrame application using TempTable may look like the following
```c#
-var requestsDataFrame = sqlContext.TextFile(@"hdfs://path/to/requests.csv");
-var metricsDateFrame = sqlContext.TextFile(@"hdfs://path/to/metrics.csv");
-requestsDataFrame.RegisterTempTable("requests");
-metricsDateFrame.RegisterTempTable("metrics");
-// C0 - guid in requests DF, C3 - guid in metrics DF
-var join = GetSqlContext().Sql(
-    "SELECT joinedtable.datacenter, MAX(joinedtable.latency) maxlatency, AVG(joinedtable.latency) avglatency " +
-    "FROM (SELECT a.C1 as datacenter, b.C6 as latency " +
-    "FROM requests a JOIN metrics b ON a.C0 = b.C3) joinedtable " +
-    "GROUP BY datacenter");
-join.ShowSchema();
-join.Show();
+var requestDataFrame = sqlContext.TextFile(@"hdfs://path/to/requests.csv");
+var metricsDateFrame = sqlContext.TextFile(@"hdfs://path/to/metrics.csv");
+requestDataFrame.RegisterTempTable("requests");
+metricsDateFrame.RegisterTempTable("metrics");
+// C0 - guid in requests DF, C3 - guid in metrics DF
+var join = GetSqlContext().Sql(
+    "SELECT joinedtable.datacenter" +
+    ", MAX(joinedtable.latency) maxlatency" +
+    ", AVG(joinedtable.latency) avglatency " +
+    "FROM (" +
+    "SELECT a.C1 as datacenter, b.C6 as latency " +
+    "FROM requests a JOIN metrics b ON a.C0 = b.C3) joinedtable " +
+    "GROUP BY datacenter");
+join.ShowSchema();
+join.Show();
```
A simple DataFrame application using DataFrame DSL may look like the following
```c#
-// C0 - guid, C1 - datacenter
-var requestsDataFrame = sqlContext.TextFile(@"hdfs://path/to/requests.csv")
-    .Select("C0", "C1");
-// C3 - guid, C6 - latency
-var metricsDateFrame = sqlContext.TextFile(@"hdfs://path/to/metrics.csv", ",", false, true)
-    .Select("C3", "C6"); //override delimiter, hasHeader & inferSchema
-var joinDataFrame = requestsDataFrame.Join(metricsDateFrame, requestsDataFrame["C0"] == metricsDateFrame["C3"])
-    .GroupBy("C1");
-var maxLatencyByDcDataFrame = joinDataFrame.Agg(new Dictionary<string, string> { { "C6", "max" } });
-maxLatencyByDcDataFrame.ShowSchema();
-maxLatencyByDcDataFrame.Show();
+// C0 - guid, C1 - datacenter
+var requestDataFrame = sqlContext.TextFile(@"hdfs://path/to/requests.csv")
+    .Select("C0", "C1");
+// C3 - guid, C6 - latency
+var metricsDateFrame = sqlContext.TextFile(@"hdfs://path/to/metrics.csv", ",", false, true)
+    .Select("C3", "C6"); //override delimiter, hasHeader & inferSchema
+var joinDataFrame = requestDataFrame.Join(metricsDateFrame, requestDataFrame["C0"] == metricsDateFrame["C3"])
+    .GroupBy("C1");
+var maxLatencyByDcDataFrame = joinDataFrame.Agg(new Dictionary<string, string> { { "C6", "max" } });
+maxLatencyByDcDataFrame.ShowSchema();
+maxLatencyByDcDataFrame.Show();
```
Refer to the SparkCLR\csharp\Samples directory for complete samples.
@@ -52,18 +55,22 @@ Refer to the docs @ https://github.com/Microsoft/SparkCLR/tree/master/docs
### Instructions
* Navigate to SparkCLR\scala directory and run the following command to build spark-clr*.jar
-```Batchfile
-mvn package
-```
-* Start Developer Command Prompt for Visual Studio, navigate to SparkCLR\csharp directory, run the following commands to add nuget.exe to the path and build the rest of .Net binaries
-```Batchfile
-set PATH=<fullpath to nuget.exe>;%PATH%
-build.cmd
-```
-* Under SparkCLR|csharp directory, run the following command to clean the .NET binaries built above
-```Batchfile
-clean.cmd
-```
+```
+mvn package
+```
+* Start Developer Command Prompt for Visual Studio, navigate to SparkCLR\csharp directory, run the following commands to add nuget.exe to the path
+```
+set PATH=<fullpath to nuget.exe>;%PATH%
+```
+And build the rest of .Net binaries
+```
+build.cmd
+```
+* Under SparkCLR\csharp directory, run the following command to clean the .NET binaries built above
+```
+clean.cmd
+```
## Running Samples
### Prerequisites
Set the following environment variables
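The variable list itself falls outside this hunk. As a minimal illustrative sketch, assuming the sample layout used later in this README (the value is an example, not prescribed by this commit; ```SPARKCLR_HOME``` is named in the hunk below):
```
REM Example value only; point SPARKCLR_HOME at your SparkCLR install directory
set SPARKCLR_HOME=D:\SparkCLRHome
```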
@@ -81,21 +88,15 @@ Directory pointed by ```SPARKCLR_HOME``` should have the following directories a
### Running in Local mode
Set ```CSharpWorkerPath``` in SparkCLRSamples.exe.config and run the following. Note that the SparkCLR jar version (**1.4.1**) should match the Apache Spark version.
-```Batchfile
-sparkclr-submit.cmd --verbose D:\SparkCLRHome\lib\spark-clr-1.4.1-SNAPSHOT.jar^
-D:\SparkCLRHome\SparkCLRSamples.exe^
-spark.local.dir D:\temp\SparkCLRTemp^
-sparkclr.sampledata.loc D:\SparkCLRHome\data
-```
+```
+sparkclr-submit.cmd --verbose D:\SparkCLRHome\lib\spark-clr-1.4.1-SNAPSHOT.jar D:\SparkCLRHome\SparkCLRSamples.exe spark.local.dir D:\temp\SparkCLRTemp sparkclr.sampledata.loc D:\SparkCLRHome\data
+```
Setting the ```spark.local.dir``` parameter is optional. It is useful when the local Spark setup uses the Windows %TEMP% directory, where placing the SparkCLR driver exe can cause problems (antivirus programs may automatically delete executables in such directories).
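Since the parameter is optional, the same sample can be submitted without it; a sketch assuming the same paths as the command above:
```
REM Without the optional spark.local.dir pair, Spark falls back to its default local directory
sparkclr-submit.cmd --verbose D:\SparkCLRHome\lib\spark-clr-1.4.1-SNAPSHOT.jar D:\SparkCLRHome\SparkCLRSamples.exe sparkclr.sampledata.loc D:\SparkCLRHome\data
```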
### Running in Standalone cluster mode
-```Batchfile
-sparkclr-submit.cmd --verbose D:\SparkCLRHome\lib\spark-clr-1.4.1-SNAPSHOT.jar^
-D:\SparkCLRHome\SparkCLRSamples.exe^
-sparkclr.sampledata.loc hdfs://path/to/sparkclr/sampledata
-```
+```
+sparkclr-submit.cmd --verbose D:\SparkCLRHome\lib\spark-clr-1.4.1-SNAPSHOT.jar D:\SparkCLRHome\SparkCLRSamples.exe sparkclr.sampledata.loc hdfs://path/to/sparkclr/sampledata
+```
### Running in YARN mode
@@ -104,9 +105,9 @@ To be added
## Running Unit Tests
* In Visual Studio: "Test" -> "Run" -> "All Tests"
* In Developer Command Prompt for VS, navigate to SparkCLR\csharp and run the following command
-```Batchfile
-test.cmd
-```
+```
+test.cmd
+```
## Debugging Tips
CSharpBackend and the C# driver are launched separately when debugging the SparkCLR Adapter or the driver