Mirror of https://github.com/microsoft/spark.git
This includes the following changes:

- The "assembly" package now builds in Maven by default, and creates an assembly containing both hadoop-client and Spark, unlike the old BigTop distribution assembly that skipped hadoop-client
- There is now a bigtop-dist package to build the old BigTop assembly
- The repl-bin package is no longer built by default since the scripts don't rely on it; instead it can be enabled with -Prepl-bin
- Py4J is now included in the assembly/lib folder as a local Maven repo, so that the Maven package can link to it
- run-example now adds the original Spark classpath as well, because the Maven examples assembly lists spark-core and similar dependencies as provided
- The various Maven projects add a spark-yarn dependency correctly
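As a rough sketch of how these builds might be invoked on the command line (-Prepl-bin comes from the description above; treating bigtop-dist as a profile named -Pbigtop-dist is an assumption based on the package name):

    # Build the default Maven assembly, which now includes both hadoop-client and Spark
    mvn package

    # Also build the repl-bin package, which is no longer built by default
    mvn -Prepl-bin package

    # Build the old BigTop-style distribution assembly (profile name assumed)
    mvn -Pbigtop-dist package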
Module contents:

- lib
- src/main/assembly
- README
- pom.xml
README
This is an assembly module for the Spark project. It creates a single tar.gz file that includes all needed dependencies of the project, except for the org.apache.hadoop.* jars that are supposed to be available from the deployed Hadoop cluster.

This module is off by default to avoid spending extra time on top of the repl-bin module. To activate it, specify the profile on the command line with -Passembly.

If you want to avoid building the time-expensive repl-bin module, which shades all the dependencies into a big flat jar, supplement the Maven command with -DnoExpensive.
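For example, builds using these flags might look like the following (a minimal sketch; -Passembly and -DnoExpensive are named in the text above, while the package goal is an assumed, typical choice):

    # Activate the assembly profile and build the tar.gz (off by default)
    mvn -Passembly package

    # Same build, but skip the expensive repl-bin shading step
    mvn -Passembly -DnoExpensive package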