Mirror of https://github.com/microsoft/spark.git
Fixes suggested by Patrick
Parent
4819baa658
Commit
9ddad0dcb4
@@ -4,7 +4,7 @@
 # spark-env.sh and edit that to configure Spark for your site.
 #
 # The following variables can be set in this file:
-# - SPARK_LOCAL_IP, to override the IP address binds to
+# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
 # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
 # - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
 # we recommend setting app-wide options in the application's driver program.
@@ -21,7 +21,6 @@ Hadoop and Spark on a common cluster manager like [Mesos](running-on-mesos.html)
 [Hadoop YARN](running-on-yarn.html).
 
 * If this is not possible, run Spark on different nodes in the same local-area network as HDFS.
 If your cluster spans multiple racks, include some Spark nodes on each rack.
 
 * For low-latency data stores like HBase, it may be preferrable to run computing jobs on different
 nodes than the storage system to avoid interference.
@@ -40,12 +40,13 @@ Python interpreter (`./pyspark`). These are a great way to learn Spark.
 Spark uses the Hadoop-client library to talk to HDFS and other Hadoop-supported
 storage systems. Because the HDFS protocol has changed in different versions of
 Hadoop, you must build Spark against the same version that your cluster uses.
-You can do this by setting the `SPARK_HADOOP_VERSION` variable when compiling:
+By default, Spark links to Hadoop 1.0.4. You can change this by setting the
+`SPARK_HADOOP_VERSION` variable when compiling:
 
     SPARK_HADOOP_VERSION=1.2.1 sbt/sbt assembly
 
-In addition, if you wish to run Spark on [YARN](running-on-yarn.md), you should also
-set `SPARK_YARN`:
+In addition, if you wish to run Spark on [YARN](running-on-yarn.md), set
+`SPARK_YARN` to `true`:
 
     SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt/sbt assembly
 
@@ -94,7 +95,7 @@ set `SPARK_YARN`:
 exercises about Spark, Shark, Mesos, and more. [Videos](http://ampcamp.berkeley.edu/agenda-2012),
 [slides](http://ampcamp.berkeley.edu/agenda-2012) and [exercises](http://ampcamp.berkeley.edu/exercises-2012) are
 available online for free.
-* [Code Examples](http://spark.incubator.apache.org/examples.html): more are also available in the [examples subfolder](https://github.com/mesos/spark/tree/master/examples/src/main/scala/spark/examples) of Spark
+* [Code Examples](http://spark.incubator.apache.org/examples.html): more are also available in the [examples subfolder](https://github.com/mesos/spark/tree/master/examples/src/main/scala/) of Spark
 * [Paper Describing Spark](http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf)
 * [Paper Describing Spark Streaming](http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf)
 
@@ -126,7 +126,7 @@ object SimpleJob {
 
 This job simply counts the number of lines containing 'a' and the number containing 'b' in the Spark README. Note that you'll need to replace $YOUR_SPARK_HOME with the location where Spark is installed. Unlike the earlier examples with the Spark shell, which initializes its own SparkContext, we initialize a SparkContext as part of the job. We pass the SparkContext constructor four arguments, the type of scheduler we want to use (in this case, a local scheduler), a name for the job, the directory where Spark is installed, and a name for the jar file containing the job's sources. The final two arguments are needed in a distributed setting, where Spark is running across several nodes, so we include them for completeness. Spark will automatically ship the jar files you list to slave nodes.
 
-This file depends on the Spark API, so we'll also include an sbt configuration file, `simple.sbt` which explains that Spark is a dependency. This file also adds two repositories which host Spark dependencies:
+This file depends on the Spark API, so we'll also include an sbt configuration file, `simple.sbt` which explains that Spark is a dependency. This file also adds a repository that Spark depends on:
 
 {% highlight scala %}
 name := "Simple Project"
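As a reference for the four-argument `SparkContext` constructor described in the hunk above, a minimal sketch might look like the following; the object name, master string, and jar path are illustrative placeholders, and the package name varies by Spark version:

{% highlight scala %}
// Minimal sketch of the four-argument SparkContext constructor described above.
// The package is `spark` for 0.7.x releases and `org.apache.spark` from 0.8 onwards.
import spark.SparkContext

object SimpleJobSketch {
  def main(args: Array[String]) {
    val sc = new SparkContext(
      "local",             // type of scheduler: a local, in-process scheduler
      "Simple Job",        // a name for the job
      "$YOUR_SPARK_HOME",  // the directory where Spark is installed
      // jar(s) containing the job's code, shipped to slave nodes; path is illustrative
      List("target/scala-{{site.SCALA_VERSION}}/simple-project_{{site.SCALA_VERSION}}-1.0.jar"))
    // Simple usage: count lines containing 'a' in the Spark README
    val numAs = sc.textFile("$YOUR_SPARK_HOME/README.md").filter(_.contains("a")).count()
    println("Lines with a: " + numAs)
  }
}
{% endhighlight %}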
@@ -137,9 +137,7 @@ scalaVersion := "{{site.SCALA_VERSION}}"
 
 libraryDependencies += "org.spark-project" %% "spark-core" % "{{site.SPARK_VERSION}}"
 
-resolvers ++= Seq(
-  "Akka Repository" at "http://repo.akka.io/releases/",
-  "Spray Repository" at "http://repo.spray.cc/")
+resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
 {% endhighlight %}
 
 If you also wish to read data from Hadoop's HDFS, you will also need to add a dependency on `hadoop-client` for your version of HDFS:
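The `hadoop-client` dependency mentioned in the last line above would, as a rough sketch assuming the stock `org.apache.hadoop` artifact, look like this in `simple.sbt`:

{% highlight scala %}
// Illustrative sbt setting: pull in the Hadoop client matching your cluster.
// Replace <your-hdfs-version> with the version your cluster runs, e.g. "1.2.1" or "2.0.5-alpha".
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "<your-hdfs-version>"
{% endhighlight %}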
@@ -210,10 +208,6 @@ To build the job, we also write a Maven `pom.xml` file that lists Spark as a dep
 <packaging>jar</packaging>
 <version>1.0</version>
 <repositories>
-  <repository>
-    <id>Spray.cc repository</id>
-    <url>http://repo.spray.cc</url>
-  </repository>
   <repository>
     <id>Akka repository</id>
     <url>http://repo.akka.io/releases</url>