Mirror of https://github.com/microsoft/spark.git
Don't use deprecated Application in example
As of Scala 2.9.0, extending from Application is not recommended: http://www.scala-lang.org/api/2.9.3/index.html#scala.Application
Parent: bc36ee4fbb
Commit: 4e2c965383
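
For reference, the change moves the example from the deprecated scala.Application trait to an explicit main method. A minimal sketch of the two styles (the object names here are illustrative, not taken from the docs):

// Deprecated since Scala 2.9.0: the object body runs in the object's
// initializer, so it has no access to command-line args and has known
// threading and JIT pitfalls.
object OldStyleJob extends Application {
  println("running")
}

// Recommended: a plain object with an explicit main method, which is the
// pattern the quick-start example is changed to use below.
object NewStyleJob {
  def main(args: Array[String]) {
    println("running")
  }
}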
@@ -111,7 +111,8 @@ We'll create a very simple Spark job in Scala. So simple, in fact, that it's nam
 import spark.SparkContext
 import SparkContext._
 
-object SimpleJob extends Application {
+object SimpleJob {
+  def main(args: Array[String]) {
   val logFile = "/var/log/syslog" // Should be some file on your system
   val sc = new SparkContext("local", "Simple Job", "$YOUR_SPARK_HOME",
     List("target/scala-{{site.SCALA_VERSION}}/simple-project_{{site.SCALA_VERSION}}-1.0.jar"))
@@ -120,6 +121,7 @@ object SimpleJob extends Application {
   val numBs = logData.filter(line => line.contains("b")).count()
   println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
+  }
 }
 {% endhighlight %}
 
 This job simply counts the number of lines containing 'a' and the number containing 'b' in a system log file. Unlike the earlier examples with the Spark shell, which initializes its own SparkContext, we initialize a SparkContext as part of the job. We pass the SparkContext constructor four arguments, the type of scheduler we want to use (in this case, a local scheduler), a name for the job, the directory where Spark is installed, and a name for the jar file containing the job's sources. The final two arguments are needed in a distributed setting, where Spark is running across several nodes, so we include them for completeness. Spark will automatically ship the jar files you list to slave nodes.
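
To illustrate the last two constructor arguments in a distributed setting, here is a minimal sketch of the same job pointed at a cluster rather than the local scheduler. The master URL, Spark home path, and jar name are placeholders, not part of this commit, and the logData/numAs lines (which fall between the two hunks above) are a plausible reconstruction rather than text shown in this diff.

import spark.SparkContext
import SparkContext._

object SimpleJob {
  def main(args: Array[String]) {
    val logFile = "/var/log/syslog" // Should be some file present on every node
    // Placeholder cluster master URL instead of "local"; the Spark home and
    // jar list are what let Spark ship the job's code to the worker nodes.
    val sc = new SparkContext("spark://master:7077", "Simple Job",
      "/path/to/spark",
      List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))
    // Reconstructed for a self-contained sketch; not shown in the diff above.
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}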