Commit graph

3903 commits

Author SHA1 Message Date
Matei Zaharia a30fac16ca Merge pull request #883 from alig/master
Don't require the spark home environment variable to be set for standalone mode (change needed by SIMR)
2013-09-01 12:27:50 -07:00
Matei Zaharia 03cc76506a Merge pull request #881 from pwendell/master
Extend QuickStart to include next steps
2013-09-01 10:20:56 -07:00
Patrick Wendell 0e375a3cc2 Add assembly plugin links 2013-09-01 09:43:42 -07:00
Patrick Wendell 6371febe18 Better docs 2013-08-31 19:09:06 -07:00
Matei Zaharia 0e9565a704 Merge pull request #880 from mateiz/ui-tweaks
Various UI tweaks
2013-08-31 18:55:41 -07:00
Matei Zaharia 2c5a4b89ee Small fixes to README 2013-08-31 18:08:05 -07:00
Matei Zaharia 2b29a1d43f Merge pull request #877 from mateiz/docs
Doc improvements for 0.8
2013-08-31 17:49:45 -07:00
Matei Zaharia e34bc3a8ee Small tweak 2013-08-31 17:47:15 -07:00
Matei Zaharia 7862c4a3c8 Another fix suggested by Patrick 2013-08-31 17:41:47 -07:00
Matei Zaharia 9ddad0dcb4 Fixes suggested by Patrick 2013-08-31 17:40:33 -07:00
Matei Zaharia 2ee6a7e32a Print output from spark-daemon only when it fails to launch 2013-08-31 17:31:07 -07:00
Ali Ghodsi 250bddc255 Don't require spark home to be set for standalone mode 2013-08-31 17:29:05 -07:00
Matei Zaharia 25ac50668b Various web UI improvements:
- Use "fluid" layout that can expand to wide browser windows, instead of
  the old one's limit of 1200 px
- Remove unnecessary <hr> elements
- Switch back to Bootstrap's default theme and tweak progress bar colors
- Make headers more consistent between deploy and app UIs
- Replace some inline CSS with stylesheets
2013-08-31 16:55:40 -07:00
Matei Zaharia 89a20b83e9 Delete some code that was added back in a merge and print less info in
spark-daemon
2013-08-31 16:55:25 -07:00
Matei Zaharia 4819baa658 More updates, describing changes to recommended use of environment vars
and new Python stuff
2013-08-31 14:21:10 -07:00
Matei Zaharia 6edef9c833 Merge pull request #861 from AndreSchumacher/pyspark_sampling_function
Pyspark sampling function
2013-08-31 13:39:24 -07:00
Matei Zaharia fd89835965 Merge pull request #870 from JoshRosen/spark-885
Don't send SIGINT / ctrl-c to Py4J gateway subprocess
2013-08-31 13:18:12 -07:00
Matei Zaharia 618f0ecb43 Merge pull request #869 from AndreSchumacher/subtract
PySpark: implementing subtractByKey(), subtract() and keyBy()
2013-08-30 18:17:13 -07:00
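For readers unfamiliar with the three operations named in this pull request, the sketch below illustrates their usual semantics on plain Python lists. It is not the PySpark implementation; the function names `key_by`, `subtract_by_key`, and `subtract_items` are hypothetical stand-ins chosen for this illustration, and the real methods operate on distributed RDDs.

```python
def key_by(items, f):
    # Pair every element with a key computed by f: x -> (f(x), x).
    return [(f(x), x) for x in items]


def subtract_by_key(pairs, other_pairs):
    # Keep (key, value) pairs whose key does not occur in other_pairs.
    other_keys = {k for k, _ in other_pairs}
    return [(k, v) for k, v in pairs if k not in other_keys]


def subtract_items(items, other_items):
    # Keep elements that do not occur in other_items (hashable elements assumed).
    other = set(other_items)
    return [x for x in items if x not in other]


words = ["apple", "avocado", "banana", "cherry"]
by_initial = key_by(words, lambda w: w[0])
remaining = subtract_by_key(by_initial, [("a", None)])
```

Here `remaining` keeps only the pairs whose key is not `"a"`, which mirrors how a key-based subtraction ignores the values on the right-hand side.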
Matei Zaharia 4293533032 Update docs about HDFS versions 2013-08-30 15:04:43 -07:00
Matei Zaharia f3a964848d More doc improvements + better warnings when you haven't built Spark 2013-08-30 12:41:25 -07:00
Reynold Xin 94bb7fd46e Merge pull request #876 from mbautin/master_hadoop_rdd_conf
Make HadoopRDD's configuration accessible
2013-08-30 12:05:13 -07:00
Mikhail Bautin 35090958b3 Also add getConf to NewHadoopRDD 2013-08-30 11:03:57 -07:00
Mikhail Bautin 5e30172f70 Make HadoopRDD's configuration accessible 2013-08-30 11:01:06 -07:00
Matei Zaharia 23762efda2 New hardware provisioning doc, and updates to menus 2013-08-30 10:16:26 -07:00
Matei Zaharia 1b0f69c623 Change docs color theme for 0.8 2013-08-30 10:15:58 -07:00
Reynold Xin 9e17e456d2 Merge pull request #875 from shivaram/build-fix
Fix broken build by removing addIntercept
2013-08-30 00:22:53 -07:00
Shivaram Venkataraman adc700582b Fix broken build by removing addIntercept 2013-08-30 00:16:32 -07:00
Evan Sparks 016787de32 Merge pull request #863 from shivaram/etrain-ridge
Adding linear regression and refactoring Ridge regression to use SGD
2013-08-29 22:15:14 -07:00
Evan Sparks 852d810787 Merge pull request #819 from shivaram/sgd-cleanup
Change SVM to use {0,1} labels
2013-08-29 22:13:15 -07:00
Matei Zaharia ca71620950 Merge pull request #857 from mateiz/assembly
Change build and run instructions to use assemblies
2013-08-29 21:51:14 -07:00
Reynold Xin 1528776628 Merge pull request #874 from jerryshao/fix-report-bug
Fix removed block zero size log reporting
2013-08-29 21:30:47 -07:00
Matei Zaharia e11bc18294 Update Maven docs 2013-08-29 21:19:07 -07:00
Matei Zaharia d8a4008685 Fix path to assembly in make-distribution.sh 2013-08-29 21:19:07 -07:00
Matei Zaharia 2de756ff19 Update some build instructions because only sbt assembly and mvn package
are now needed
2013-08-29 21:19:06 -07:00
Matei Zaharia 666d93c294 Update Maven build to create assemblies expected by new scripts
This includes the following changes:
- The "assembly" package now builds in Maven by default, and creates an
  assembly containing both hadoop-client and Spark, unlike the old
  BigTop distribution assembly that skipped hadoop-client
- There is now a bigtop-dist package to build the old BigTop assembly
- The repl-bin package is no longer built by default since the scripts
  don't rely on it; instead it can be enabled with -Prepl-bin
- Py4J is now included in the assembly/lib folder as a local Maven repo,
  so that the Maven package can link to it
- run-example now adds the original Spark classpath as well because the
  Maven examples assembly lists spark-core and such as provided
- The various Maven projects add a spark-yarn dependency correctly
2013-08-29 21:19:06 -07:00
Matei Zaharia d7dec938e5 Don't use SPARK_LAUNCH_WITH_SCALA in pyspark 2013-08-29 21:19:06 -07:00
Matei Zaharia 3ff105f87d Find assembly correctly in pyspark 2013-08-29 21:19:06 -07:00
Matei Zaharia aab345c463 Fix finding of assembly JAR, as well as some pointers to ./run 2013-08-29 21:19:06 -07:00
Matei Zaharia 8d81358a05 Provide more memory for tests 2013-08-29 21:19:06 -07:00
Matei Zaharia ab0e625d9e Fix PySpark for assembly run and include it in dist 2013-08-29 21:19:06 -07:00
Matei Zaharia 53cd50c069 Change build and run instructions to use assemblies
This commit makes Spark invocation saner by using an assembly JAR to
find all of Spark's dependencies instead of adding all the JARs in
lib_managed. It also packages the examples into an assembly and uses
that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script
with two better-named scripts: "run-examples" for examples, and
"spark-class" for Spark internal classes (e.g. REPL, master, etc). This
is also designed to minimize the confusion people have in trying to use
"run" to run their own classes; it's not meant to do that, but now at
least if they look at it, they can modify run-examples to do a decent
job for them.

As part of this, Bagel's examples are also now properly moved to the
examples package instead of bagel.
2013-08-29 21:19:04 -07:00
jerryshao f3dbe6b215 Fix removed block zero size log reporting 2013-08-30 09:39:01 +08:00
Patrick Wendell abdbacf252 Merge pull request #871 from pwendell/expose-local
Expose `isLocal` in SparkContext.
2013-08-28 21:11:31 -07:00
Matei Zaharia afcade3ca8 Merge pull request #873 from pwendell/master
Hot fix for command runner
2013-08-28 20:15:40 -07:00
Patrick Wendell 1798e69e71 Adding extra args 2013-08-28 19:56:46 -07:00
Patrick Wendell 30d2421112 Make local variable public 2013-08-28 19:53:31 -07:00
Patrick Wendell 2fc9a028f2 Hot fix for command runner 2013-08-28 19:03:06 -07:00
Andre Schumacher a511c5379e RDD sample() and takeSample() prototypes for PySpark 2013-08-28 16:46:13 -07:00
Josh Rosen 742c44eae6 Don't send SIGINT to Py4J gateway subprocess.
This addresses SPARK-885, a usability issue where PySpark's
Java gateway process would be killed if the user hit ctrl-c.

Note that SIGINT still won't cancel the running s

This fix is based on http://stackoverflow.com/questions/5045771
2013-08-28 16:39:44 -07:00
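One common way to keep a child process alive when the user presses ctrl-c in the parent's terminal is to make the child ignore SIGINT before it starts executing. The sketch below shows that general technique with Python's `subprocess` module on POSIX systems. It is an illustration under stated assumptions, not the code from this commit: the helper names `ignore_sigint` and `launch_ignoring_sigint` are hypothetical, and the launched command is a placeholder rather than the actual Py4J gateway command.

```python
import signal
import subprocess
import sys


def ignore_sigint():
    # Runs in the child between fork and exec (POSIX only).
    # An ignored signal stays ignored across exec, so the child
    # will not be interrupted by ctrl-c typed in the parent's terminal.
    signal.signal(signal.SIGINT, signal.SIG_IGN)


def launch_ignoring_sigint(command):
    # Hypothetical helper: start `command` with SIGINT ignored in the child.
    return subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        preexec_fn=ignore_sigint,
    )


# Placeholder child: report whether SIGINT is ignored inside the child.
child = launch_ignoring_sigint([
    sys.executable,
    "-c",
    "import signal; print(signal.getsignal(signal.SIGINT) == signal.SIG_IGN)",
])
output, _ = child.communicate()
```

The parent process keeps its own SIGINT handling, so ctrl-c still raises `KeyboardInterrupt` there while the child continues to run.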
Andre Schumacher 457bcd3343 PySpark: implementing subtractByKey(), subtract() and keyBy() 2013-08-28 16:14:22 -07:00