Mirror of https://github.com/microsoft/spark.git
ADD_JARS environment variable for spark-shell
Parent: 3e61beff7b
Commit: b5df1cd668
@@ -43,12 +43,18 @@ new SparkContext(master, appName, [sparkHome], [jars])

The `master` parameter is a string specifying a [Spark or Mesos cluster URL](#master-urls) to connect to, or a special "local" string to run in local mode, as described below. `appName` is a name for your application, which will be shown in the cluster web UI. Finally, the last two parameters are needed to deploy your code to a cluster if running in distributed mode, as described later.

-In the Spark shell, a special interpreter-aware SparkContext is already created for you, in the variable called `sc`. Making your own SparkContext will not work. You can set which master the context connects to using the `MASTER` environment variable. For example, to run on four cores, use
+In the Spark shell, a special interpreter-aware SparkContext is already created for you, in the variable called `sc`. Making your own SparkContext will not work. You can set which master the context connects to using the `MASTER` environment variable, and you can add JARs to the classpath with the `ADD_JARS` variable. For example, to run `spark-shell` on four cores, use

{% highlight bash %}
$ MASTER=local[4] ./spark-shell
{% endhighlight %}

+Or, to also add `code.jar` to its classpath, use:
+
+{% highlight bash %}
+$ MASTER=local[4] ADD_JARS=code.jar ./spark-shell
+{% endhighlight %}
+
### Master URLs

The master URL passed to Spark can be in one of the following formats:
@@ -78,7 +84,7 @@ If you want to run your job on a cluster, you will need to specify the two optio

* `sparkHome`: The path at which Spark is installed on your worker machines (it should be the same on all of them).
* `jars`: A list of JAR files on the local machine containing your job's code and any dependencies, which Spark will deploy to all the worker nodes. You'll need to package your job into a set of JARs using your build system. For example, if you're using SBT, the [sbt-assembly](https://github.com/sbt/sbt-assembly) plugin is a good way to make a single JAR with your code and dependencies.
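The two optional parameters above map directly onto the `SparkContext` constructor shown in the hunk header. A minimal standalone sketch, assuming the pre-1.0 `spark.SparkContext` constructor used elsewhere in this commit; the app name, master URL, and `code.jar` path are placeholder values:

```scala
import spark.SparkContext

object ExampleJob {
  def main(args: Array[String]) {
    // Master URL, application name, Spark home on the workers, and JARs to ship.
    // "local[4]" and "code.jar" are placeholders; for a cluster job the master
    // would be a Spark or Mesos URL and the JARs would come from your build.
    val sc = new SparkContext("local[4]", "Example Job",
      System.getenv("SPARK_HOME"), Seq("code.jar"))

    // Closures used here are shipped to the workers along with the JARs above.
    val evens = sc.parallelize(1 to 1000).filter(_ % 2 == 0).count()
    println("Even numbers: " + evens)
    sc.stop()
  }
}
```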
-If you run `spark-shell` on a cluster, any classes you define in the shell will automatically be distributed.
+If you run `spark-shell` on a cluster, you can add JARs to it by specifying the `ADD_JARS` environment variable before you launch it. This variable should contain a comma-separated list of JARs. For example, `ADD_JARS=a.jar,b.jar ./spark-shell` will launch a shell with `a.jar` and `b.jar` on its classpath. In addition, any new classes you define in the shell will automatically be distributed.

# Resilient Distributed Datasets (RDDs)
@@ -822,7 +822,7 @@ class SparkILoop(in0: Option[BufferedReader], val out: PrintWriter, val master:
spark.repl.Main.interp.out.println("Spark context available as sc.");
spark.repl.Main.interp.out.flush();
""")
-command("import spark.SparkContext._");
+command("import spark.SparkContext._")
}
echo("Type in expressions to have them evaluated.")
echo("Type :help for more information.")
@@ -838,7 +838,8 @@ class SparkILoop(in0: Option[BufferedReader], val out: PrintWriter, val master:
if (prop != null) prop else "local"
}
}
-sparkContext = new SparkContext(master, "Spark shell")
+val jars = Option(System.getenv("ADD_JARS")).map(_.split(',')).getOrElse(new Array[String](0))
+sparkContext = new SparkContext(master, "Spark shell", System.getenv("SPARK_HOME"), jars)
sparkContext
}
@@ -850,6 +851,10 @@ class SparkILoop(in0: Option[BufferedReader], val out: PrintWriter, val master:
printWelcome()
echo("Initializing interpreter...")

+// Add JARS specified in Spark's ADD_JARS variable to classpath
+val jars = Option(System.getenv("ADD_JARS")).map(_.split(',')).getOrElse(new Array[String](0))
+jars.foreach(settings.classpath.append(_))
+
this.settings = settings
createInterpreter()
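Both Scala hunks rely on the same `ADD_JARS` parsing idiom. A self-contained sketch of what that expression evaluates to (plain Scala, no Spark or REPL dependencies; the object name is arbitrary):

```scala
object AddJarsParsing {
  def main(args: Array[String]) {
    // Option(...) lifts a possibly-null environment variable into Some/None,
    // so an unset ADD_JARS yields an empty array rather than a NullPointerException.
    val jars = Option(System.getenv("ADD_JARS"))
      .map(_.split(','))
      .getOrElse(new Array[String](0))

    // ADD_JARS=a.jar,b.jar produces Array("a.jar", "b.jar"); each entry is then
    // appended to the interpreter classpath and passed to the SparkContext.
    jars.foreach(println)
  }
}
```

One side effect of this idiom: an unset variable yields an empty array, while a variable set to the empty string splits to `Array("")`, a single blank classpath entry.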