Smooth out some formatting wrinkles in README.

Author: jthelin
Date: 2015-11-05 23:58:18 -08:00
Parent: ca09c3ae7a
Commit: c5acaa99da
1 changed file with 87 additions and 46 deletions

README.md

# SparkCLR
[SparkCLR](https://github.com/Microsoft/SparkCLR) (pronounced Sparkler) adds C# language binding to [Apache Spark](https://spark.apache.org/), enabling the implementation of Spark driver code and data processing operations in C#.
For example, the word count sample in Apache Spark can be implemented in C# as follows:
```c#
var lines = sparkContext.TextFile(@"hdfs://path/to/input.txt");
var words = lines.FlatMap(s => s.Split(new[] { " " }, StringSplitOptions.None));
var wordCounts = words.Map(w => new KeyValuePair<string, int>(w.Trim(), 1))
                      .ReduceByKey((x, y) => x + y);
var wordCountCollection = wordCounts.Collect();
wordCounts.SaveAsTextFile(@"hdfs://path/to/wordcount.txt");
```
A simple DataFrame application using TempTable may look like the following:
```c#
var reqDataFrame = sqlContext.TextFile(@"hdfs://path/to/requests.csv");
var metricDataFrame = sqlContext.TextFile(@"hdfs://path/to/metrics.csv");
// ...
var joinDataFrame = GetSqlContext().Sql(
    /* ... */);
joinDataFrame.ShowSchema();
joinDataFrame.Show();
```
A simple DataFrame application using DataFrame DSL may look like the following:
```c#
// C0 - guid, C1 - datacenter
var reqDataFrame = sqlContext.TextFile(@"hdfs://path/to/requests.csv")
// ...
var maxLatencyByDcDataFrame = joinDataFrame.Agg(new Dictionary<string, string> {
    /* ... */ });
maxLatencyByDcDataFrame.ShowSchema();
maxLatencyByDcDataFrame.Show();
```
Refer to the `SparkCLR\csharp\Samples` directory for complete samples.
## Documents
Refer to the [docs folder](https://github.com/Microsoft/SparkCLR/tree/master/docs).
## Building SparkCLR
### Prerequisites
* [Apache Maven](http://maven.apache.org) for building the spark-clr project, which is implemented in Scala.
* MSBuild in [Visual Studio](https://www.visualstudio.com/) 2013 and above.
* .NET Framework 4.5 and above.
* [Nuget command-line utility](https://docs.nuget.org/release-notes) 3.2 and above.
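
As an optional sanity check, the prerequisites above can be verified from a Developer Command Prompt. The commands below only print version or help information and assume the tools are already on your PATH; this is a convenience sketch, not part of the official build steps.
```
REM Optional: confirm the build prerequisites are available
mvn -version
msbuild /version
nuget help
```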
### Instructions
* Navigate to `SparkCLR\scala` directory and run the following command to build spark-clr*.jar
```
mvn package
```
* Start Developer Command Prompt for Visual Studio, and navigate to `SparkCLR\csharp` directory.
- If `nuget.exe` is not already in your PATH, then run the following commands to add it.
```
set PATH=<fullpath to nuget.exe>;%PATH%
```
- Then build the rest of the .NET binaries
```
Build.cmd
```
* Optional: Under `SparkCLR\csharp` directory, run the following command to clean the .NET binaries built above
```
Clean.cmd
```
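
Putting the steps above together, an end-to-end build from a Developer Command Prompt might look like the sketch below. The repository location (`C:\SparkCLR`) and the `nuget.exe` folder (`C:\Tools\NuGet`) are illustrative assumptions; substitute your own paths.
```
REM Illustrative end-to-end build; adjust the paths to your machine
cd /d C:\SparkCLR\scala
mvn package

cd /d C:\SparkCLR\csharp
set PATH=C:\Tools\NuGet;%PATH%
Build.cmd
```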
## Running Samples
### Prerequisites
The DataFrame TextFile API uses the `spark-csv` package to load data from CSV files.
The latest [commons-csv-*.jar](http://commons.apache.org/proper/commons-csv/download_csv.cgi) and [spark-csv*.jar (Scala version: 2.10)](http://spark-packages.org/package/databricks/spark-csv) should be downloaded manually.
The following environment variables should be set properly (a consolidated example is shown after this list):
* ```JAVA_HOME```
* ```SCALA_HOME```
* ```SPARKCSV_JARS``` should include full paths to `commons-csv*.jar` and `spark-csv*.jar`.
For example:
```
set SPARKCSV_JARS=%SPARKCLR_HOME%\lib\commons-csv-1.2.jar;%SPARKCLR_HOME%\lib\spark-csv_2.10-1.2.0.jar
```
* ```SPARKCLR_HOME``` should point to a directory prepared with the following sub-directories:
* **lib** ( `spark-clr*.jar` )
* **bin** ( The contents of `SparkCLR\csharp\Samples\Microsoft.Spark.CSharp\bin\[Debug|Release]\*`, including `Microsoft.Spark.CSharp.Adapter.dll`, `CSharpWorker.exe`, `SparkCLRSamples.exe`, `SparkCLRSamples.exe.Config` etc. )
* **scripts** ( `sparkclr-submit.cmd` )
* **data** ( `SparkCLR\csharp\Samples\Microsoft.Spark.CSharp\data\*` )
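
For example, a consolidated environment setup might look like the following. Every path below is an illustrative assumption; point the variables at your own JDK, Scala installation and SparkCLR runtime directory.
```
REM Illustrative values; adjust the paths to your machine
set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_65
set SCALA_HOME=C:\Program Files (x86)\scala
set SPARKCLR_HOME=C:\SparkCLR\runtime
set SPARKCSV_JARS=%SPARKCLR_HOME%\lib\commons-csv-1.2.jar;%SPARKCLR_HOME%\lib\spark-csv_2.10-1.2.0.jar
```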
### Running in Local mode
Set `CSharpWorkerPath` in `SparkCLRSamples.exe.config` and run the following command:
```
sparkclr-submit.cmd --verbose %SPARKCLR_HOME%\lib\spark-clr-1.4.1-SNAPSHOT.jar %SPARKCLR_HOME%\bin\SparkCLRSamples.exe spark.local.dir C:\temp\SparkCLRTemp sparkclr.sampledata.loc %SPARKCLR_HOME%\data
```
Note that the SparkCLR jar version (**1.4.1**) should be aligned with the Apache Spark version.
Setting the `spark.local.dir` parameter is important. When a local Spark instance distributes the SparkCLR driver executables to the Windows `%TEMP%` directory, anti-virus software may flag the executables that show up in `%TEMP%` as malware.
### Running in Standalone cluster mode
```
sparkclr-submit.cmd --verbose %SPARKCLR_HOME%\lib\spark-clr-1.4.1-SNAPSHOT.jar ...
```
### Running in YARN mode
To be added
## Running Unit Tests
* In Visual Studio: "Test" -> "Run" -> "All Tests"
* In Developer Command Prompt for VS, navigate to `SparkCLR\csharp` and run the following command:
```
Test.cmd
```
## Debugging Tips
The CSharpBackend and the C# driver are launched separately when debugging the SparkCLR Adapter or driver code.
For example, to debug SparkCLR samples:
* Launch CSharpBackend.exe using ```sparkclr-submit.cmd debug``` and note the port number displayed in the console.
* Navigate to `csharp/Samples/Microsoft.Spark.CSharp` and edit `App.Config` to use the port number from the previous step for the `CSharpBackendPortNumber` config value, and also set the `CSharpWorkerPath` config value.
* Run `SparkCLRSamples.exe` in Visual Studio.
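
As a sketch of this workflow (the port number below is illustrative, not actual output):
```
REM 1. Start CSharpBackend in debug mode and note the port number it prints
sparkclr-submit.cmd debug

REM 2. In csharp\Samples\Microsoft.Spark.CSharp\App.Config, set CSharpBackendPortNumber
REM    to the port printed above (e.g. 5567 - illustrative) and CSharpWorkerPath to the
REM    full path of CSharpWorker.exe.

REM 3. Start SparkCLRSamples.exe from Visual Studio to debug the driver code.
```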
## License
SparkCLR is licensed under the MIT license. See LICENSE file in the project root for full license information.