Matei Zaharia 2013-08-27 20:02:30 -07:00
Parent d8a4008685
Commit e11bc18294
1 changed file with 34 additions and 37 deletions


@@ -8,43 +8,14 @@ title: Building Spark with Maven
 Building Spark using Maven requires Maven 3 (the build process is tested with Maven 3.0.4) and Java 1.6 or newer.
-## Specifying the Hadoop version ##
-To enable support for HDFS and other Hadoop-supported storage systems, specify the exact Hadoop version by setting the "hadoop.version" property. If unset, Spark will build against Hadoop 1.0.4 by default.
+## Setting up Maven's Memory Usage ##
-For Apache Hadoop versions 1.x, Cloudera CDH MRv1, and other Hadoop versions without YARN, use:
+You'll need to configure Maven to use more memory than usual by setting `MAVEN_OPTS`. We recommend the following settings:
-    # Apache Hadoop 1.2.1
-    $ mvn -Dhadoop.version=1.2.1 clean package
+    export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
-    # Cloudera CDH 4.2.0 with MapReduce v1
-    $ mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 clean package
-For Apache Hadoop 2.x, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions with YARN, enable the "hadoop2-yarn" profile:
-    # Apache Hadoop 2.0.5-alpha
-    $ mvn -Phadoop2-yarn -Dhadoop.version=2.0.5-alpha clean package
-    # Cloudera CDH 4.2.0 with MapReduce v2
-    $ mvn -Phadoop2-yarn -Dhadoop.version=2.0.0-cdh4.2.0 clean package
-## Spark Tests in Maven ##
-Tests are run by default via the scalatest-maven-plugin. With this you can do things like:
-Skip test execution (but not compilation):
-    $ mvn -Dhadoop.version=... -DskipTests clean package
-To run a specific test suite:
-    $ mvn -Dhadoop.version=... -Dsuites=spark.repl.ReplSuite test
-## Setting up JVM Memory Usage Via Maven ##
-You might run into the following errors if you're using a vanilla installation of Maven:
+If you don't run this, you may see errors like the following:
     [INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-{{site.SCALA_VERSION}}/classes...
     [ERROR] PermGen space -> [Help 1]
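
Note that `export` only affects the current shell session, so set `MAVEN_OPTS` in the same shell you run Maven from. A minimal sketch (the build command here is illustrative, not part of the diff):

    # Raise Maven's memory limits for this session, then build
    export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
    $ mvn -DskipTests clean package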
@@ -52,9 +23,36 @@ You might run into the following errors if you're using a vanilla installation o
     [INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-{{site.SCALA_VERSION}}/classes...
     [ERROR] Java heap space -> [Help 1]
-To fix these, you can do the following:
+You can fix this by setting the `MAVEN_OPTS` variable as discussed before.
-    export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=128M"
+## Specifying the Hadoop version ##
+Because HDFS is not protocol-compatible across versions, if you want to read from HDFS, you'll need to build Spark against the specific HDFS version in your environment. You can do this through the "hadoop.version" property. If unset, Spark will build against Hadoop 1.0.4 by default.
+For Apache Hadoop versions 1.x, Cloudera CDH MRv1, and other Hadoop versions without YARN, use:
+    # Apache Hadoop 1.2.1
+    $ mvn -Dhadoop.version=1.2.1 -DskipTests clean package
+    # Cloudera CDH 4.2.0 with MapReduce v1
+    $ mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -DskipTests clean package
+For Apache Hadoop 2.x, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions with YARN, you should also enable the "hadoop2-yarn" profile:
+    # Apache Hadoop 2.0.5-alpha
+    $ mvn -Phadoop2-yarn -Dhadoop.version=2.0.5-alpha -DskipTests clean package
+    # Cloudera CDH 4.2.0 with MapReduce v2
+    $ mvn -Phadoop2-yarn -Dhadoop.version=2.0.0-cdh4.2.0 -DskipTests clean package
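
To double-check that the `hadoop.version` property took effect, one option (a sketch, not a documented build step) is to inspect the resolved dependency tree:

    # List the Hadoop artifacts Maven resolved for the chosen version and profile
    $ mvn -Phadoop2-yarn -Dhadoop.version=2.0.5-alpha dependency:tree | grep hadoop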
+## Spark Tests in Maven ##
+Tests are run by default via the [ScalaTest Maven plugin](http://www.scalatest.org/user_guide/using_the_scalatest_maven_plugin). Some of the tests require Spark to be packaged first, so always run `mvn package` with `-DskipTests` the first time. You can then run the tests with `mvn -Dhadoop.version=... test`.
+The ScalaTest plugin also supports running only a specific test suite as follows:
+    $ mvn -Dhadoop.version=... -Dsuites=spark.repl.ReplSuite test
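
For instance, a first-time build followed by a full test run might look like this (reusing the illustrative Hadoop 1.2.1 version from above):

    # Package first so the tests have a built Spark to run against
    $ mvn -Dhadoop.version=1.2.1 -DskipTests clean package
    # Then run the tests against the same Hadoop version
    $ mvn -Dhadoop.version=1.2.1 test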
 ## Continuous Compilation ##
@@ -63,8 +61,7 @@ We use the scala-maven-plugin which supports incremental and continuous compilat
     $ mvn scala:cc
-…should run continuous compilation (i.e. wait for changes). However, this has not been tested extensively.
+should run continuous compilation (i.e. wait for changes). However, this has not been tested extensively.
 ## Using With IntelliJ IDEA ##