Matei Zaharia 2013-08-31 17:40:33 -07:00
Parent 4819baa658
Commit 9ddad0dcb4
4 changed files with 8 additions and 14 deletions

View file

@@ -4,7 +4,7 @@
# spark-env.sh and edit that to configure Spark for your site.
#
# The following variables can be set in this file:
# - SPARK_LOCAL_IP, to override the IP address binds to
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
# we recommend setting app-wide options in the application's driver program.
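
For illustration, a minimal sketch of what a filled-in `spark-env.sh` might look like; the address, library path, and JVM option below are placeholder assumptions, not values taken from this template:

{% highlight bash %}
# Example values only -- substitute your own node's settings.
export SPARK_LOCAL_IP=192.168.1.10                      # address Spark binds to on this node
export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so  # only needed when running on Mesos
export SPARK_JAVA_OPTS="-Dspark.local.dir=/mnt/spark"   # node-specific JVM options (example property)
{% endhighlight %}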

View file

@@ -21,7 +21,6 @@ Hadoop and Spark on a common cluster manager like [Mesos](running-on-mesos.html)
[Hadoop YARN](running-on-yarn.html).
* If this is not possible, run Spark on different nodes in the same local-area network as HDFS.
If your cluster spans multiple racks, include some Spark nodes on each rack.
* For low-latency data stores like HBase, it may be preferable to run computing jobs on different
nodes than the storage system to avoid interference.

View file

@@ -40,12 +40,13 @@ Python interpreter (`./pyspark`). These are a great way to learn Spark.
Spark uses the Hadoop-client library to talk to HDFS and other Hadoop-supported
storage systems. Because the HDFS protocol has changed in different versions of
Hadoop, you must build Spark against the same version that your cluster uses.
You can do this by setting the `SPARK_HADOOP_VERSION` variable when compiling:
By default, Spark links to Hadoop 1.0.4. You can change this by setting the
`SPARK_HADOOP_VERSION` variable when compiling:
SPARK_HADOOP_VERSION=1.2.1 sbt/sbt assembly
In addition, if you wish to run Spark on [YARN](running-on-yarn.html), you should also
set `SPARK_YARN`:
In addition, if you wish to run Spark on [YARN](running-on-yarn.html), set
`SPARK_YARN` to `true`:
SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt/sbt assembly
@@ -94,7 +95,7 @@ set `SPARK_YARN`:
exercises about Spark, Shark, Mesos, and more. [Videos](http://ampcamp.berkeley.edu/agenda-2012),
[slides](http://ampcamp.berkeley.edu/agenda-2012) and [exercises](http://ampcamp.berkeley.edu/exercises-2012) are
available online for free.
* [Code Examples](http://spark.incubator.apache.org/examples.html): more are also available in the [examples subfolder](https://github.com/mesos/spark/tree/master/examples/src/main/scala/spark/examples) of Spark
* [Code Examples](http://spark.incubator.apache.org/examples.html): more are also available in the [examples subfolder](https://github.com/mesos/spark/tree/master/examples/src/main/scala/) of Spark
* [Paper Describing Spark](http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf)
* [Paper Describing Spark Streaming](http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf)

View file

@@ -126,7 +126,7 @@ object SimpleJob {
This job simply counts the number of lines containing 'a' and the number containing 'b' in the Spark README. Note that you'll need to replace $YOUR_SPARK_HOME with the location where Spark is installed. Unlike the earlier examples with the Spark shell, which initializes its own SparkContext, we initialize a SparkContext as part of the job. We pass the SparkContext constructor four arguments: the type of scheduler we want to use (in this case, a local scheduler), a name for the job, the directory where Spark is installed, and a name for the jar file containing the job's sources. The final two arguments are needed in a distributed setting, where Spark is running across several nodes, so we include them for completeness. Spark will automatically ship the jar files you list to slave nodes.
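
As a hedged illustration of those four arguments, the constructor call might look like the following; the `spark` package import matches the `org.spark-project` artifact used below, and the jar path is an assumed sbt output location rather than something taken from this guide:

{% highlight scala %}
import spark.SparkContext

// Sketch of the four-argument SparkContext initialization described above.
val sc = new SparkContext(
  "local",              // type of scheduler: a local scheduler
  "Simple Job",         // a name for the job
  "$YOUR_SPARK_HOME",   // the directory where Spark is installed
  List("target/scala-{{site.SCALA_VERSION}}/simple-project_{{site.SCALA_VERSION}}-1.0.jar")) // jar(s) shipped to slave nodes
{% endhighlight %}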
This file depends on the Spark API, so we'll also include an sbt configuration file, `simple.sbt`, which explains that Spark is a dependency. This file also adds two repositories which host Spark dependencies:
This file depends on the Spark API, so we'll also include an sbt configuration file, `simple.sbt`, which explains that Spark is a dependency. This file also adds a repository that Spark depends on:
{% highlight scala %}
name := "Simple Project"
@@ -137,9 +137,7 @@ scalaVersion := "{{site.SCALA_VERSION}}"
libraryDependencies += "org.spark-project" %% "spark-core" % "{{site.SPARK_VERSION}}"
resolvers ++= Seq(
"Akka Repository" at "http://repo.akka.io/releases/",
"Spray Repository" at "http://repo.spray.cc/")
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
{% endhighlight %}
If you also wish to read data from Hadoop's HDFS, you will also need to add a dependency on `hadoop-client` for your version of HDFS:
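
A hedged sketch of that `hadoop-client` line as it would appear in `simple.sbt`; the version placeholder is an assumption for you to replace with your cluster's Hadoop version:

{% highlight scala %}
// Match this version to the Hadoop/HDFS version your cluster runs (placeholder below).
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "<your-hdfs-version>"
{% endhighlight %}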
@@ -210,10 +208,6 @@ To build the job, we also write a Maven `pom.xml` file that lists Spark as a dep
<packaging>jar</packaging>
<version>1.0</version>
<repositories>
<repository>
<id>Spray.cc repository</id>
<url>http://repo.spray.cc</url>
</repository>
<repository>
<id>Akka repository</id>
<url>http://repo.akka.io/releases</url>