spark

Commit graph

Author	SHA1	Message	Date
Matei Zaharia	32a4f4623c	Merge pull request #129 from mesos/rxin Force serialize/deserialize task results in local execution mode.	2012-04-24 16:18:39 -07:00
Reynold Xin	761ea65a98	Added a test for the previous commit (failing to serialize task results would throw an exception for local tasks).	2012-04-24 15:14:35 -07:00
Reynold Xin	9821cd4d42	Force serialize/deserialize task results in local execution mode.	2012-04-24 14:55:28 -07:00
Antonio	3e48818993	Removed commented-out System.exit call	2012-04-23 11:42:58 -07:00
Antonio	39d99168dc	Added exception handling instead of just exiting in LocalScheduler for tasks that throw exceptions	2012-04-20 14:46:43 -07:00
Reynold Xin	e601b3b9e5	Added the ability to set environmental variables in piped rdd.	2012-04-17 16:40:56 -07:00
Matei Zaharia	3b745176e0	Bug fix to pluggable closure serialization change	2012-04-12 17:53:02 +00:00
Matei Zaharia	112655f032	Merge pull request #121 from rxin/kryo-closure Added an option (spark.closure.serializer) to specify the serializer for closures.	2012-04-10 14:21:02 -07:00
Reynold Xin	d295ccb43c	Added a closureSerializer field in SparkEnv and use it to serialize tasks.	2012-04-10 13:29:46 -07:00
Reynold Xin	968f75f6af	Added an option (spark.closure.serializer) to specify the serializer for closures. This enables using Kryo as the closure serializer.	2012-04-09 21:59:56 -07:00
Matei Zaharia	a69c0738d1	Merge branch 'master' into mesos-0.9	2012-04-08 23:41:36 -07:00
Matei Zaharia	a633974143	Merge branch 'master' of github.com:mesos/spark	2012-04-08 23:41:25 -07:00
Matei Zaharia	0229d5390f	Merge branch 'master' into mesos-0.9	2012-04-08 23:39:37 -07:00
Matei Zaharia	d401e1b3e8	Fix a possible deadlock in MesosScheduler	2012-04-08 23:38:49 -07:00
Ankur Dave	7be1c7b331	Report entry dropping in BoundedMemoryCache	2012-04-06 15:49:32 -07:00
Matei Zaharia	a8bb324ed9	Merge branch 'master' into mesos-0.9	2012-04-05 14:53:22 -07:00
Matei Zaharia	816d4e5840	Pass local IP address instead of hostname in spark.master.host. Fixes #117 .	2012-04-05 14:53:17 -07:00
Matei Zaharia	335a6036ad	Converted some tabs to spaces	2012-04-05 11:58:01 -07:00
Matei Zaharia	8c95a85438	Use Runtime.maxMemory instead of Runtime.totalMemory in BoundedMemoryCache, in case the JVM was not started with its initial heap size equaling its maximum one (-Xms == -Xmx).	2012-03-30 13:39:35 -04:00
Matei Zaharia	03d5b3b48d	Use Runtime.maxMemory instead of Runtime.totalMemory in BoundedMemoryCache, in case the JVM was not started with its initial heap size equaling its maximum one (-Xms == -Xmx).	2012-03-30 13:38:19 -04:00
Matei Zaharia	95fb1a16b8	Use Mesos 0.9 RC3 JAR and protobuf 2.4.1	2012-03-30 11:38:49 -04:00
Matei Zaharia	dfa3b6b544	Fixes to work with the very latest Mesos 0.9 API	2012-03-29 22:12:35 -04:00
Matei Zaharia	4d52cc6738	Merge branch 'master' into mesos-0.9	2012-03-29 21:29:39 -04:00
Reynold Xin	42dcdbcb2f	Removed the extra spaces in OrderedRDDFunctions and SortedRDD.	2012-03-29 15:21:57 -07:00
Matei Zaharia	08cda89e8a	Further fixes to how Mesos is found and used	2012-03-17 13:39:14 -07:00
Matei Zaharia	3c3fdf6eca	Merge branch 'master' into mesos-0.9	2012-03-17 13:09:21 -07:00
Matei Zaharia	c7af538ac1	Some fixes to sorting for when the RDD has fewer elements than the number of partitions we ask to partition it into. Also, removed a test that was taking way too long to run.	2012-03-17 13:08:36 -07:00
Matei Zaharia	a099a63a8a	Initial work to make Spark compile with Mesos 0.9 and Hadoop 1.0	2012-03-17 12:31:34 -07:00
Matei Zaharia	a5e2b6a6bd	Merge pull request #112 from cengle/master Changed HadoopRDD to get key and value containers from the RecordReader instead of through reflection	2012-03-06 13:38:32 -08:00
Matei Zaharia	97eee50825	Fixes a nasty bug that could happen when tasks fail, because calling wait() with a timeout of 0 on a Java object means "wait forever".	2012-03-01 13:43:17 -08:00
Cliff Engle	dd68cb6099	Get key and value container from RecordReader	2012-02-29 16:33:23 -08:00
Matei Zaharia	1e10df0a46	Merge pull request #111 from alupher/master Adding sorting to RDDs	2012-02-24 15:50:14 -08:00
Antonio	0d93d95bcf	Removed unnecessary import	2012-02-21 19:57:12 -08:00
Antonio	2990298f71	Added sorting testing suite	2012-02-21 19:54:21 -08:00
Matei Zaharia	aa04f87cd2	Added support for parallel execution of jobs in DAGScheduler.	2012-02-19 22:50:23 -08:00
Antonio	620798161b	Added fixes to sorting	2012-02-13 00:07:39 -08:00
Matei Zaharia	2587ce1690	Fixed a deadlock that occured with MesosScheduler due to an earlier synchronization change	2012-02-11 21:22:45 -08:00
Antonio	e93f622665	Added sorting by key for pair RDDs	2012-02-11 00:56:28 -08:00
Matei Zaharia	98f008b721	Formatting fixes	2012-02-10 10:52:03 -08:00
Matei Zaharia	7660a8b12f	Merge branch 'formatting' Conflicts: core/src/main/scala/spark/DAGScheduler.scala core/src/main/scala/spark/SimpleShuffleFetcher.scala core/src/main/scala/spark/SparkContext.scala	2012-02-10 10:42:14 -08:00
haoyuan	194c42ab79	Code format.	2012-02-10 08:19:53 -08:00
Matei Zaharia	8f5ed51234	Delete Spark's temporary directories when the JVM exits.	2012-02-09 22:58:24 -08:00
Matei Zaharia	c0a0df3285	Made the default cache BoundedMemoryCache, and reduced its default size	2012-02-09 22:32:02 -08:00
Matei Zaharia	a766780f4c	Added some tests for multithreaded access to Spark.	2012-02-09 22:27:53 -08:00
Matei Zaharia	0e93891d3d	Replaced LocalFileShuffle with a non-singleton ShuffleManager class and made DAGScheduler automatically set SparkEnv.	2012-02-09 22:14:56 -08:00
haoyuan	445e0bb1b5	Format the code a bit mroe.	2012-02-09 15:50:26 -08:00
haoyuan	651932e703	Format the code as coding style agreed by Matei/TD/Haoyuan	2012-02-09 13:26:23 -08:00
Matei Zaharia	e02dc83a5b	IO optimizations	2012-02-06 20:40:39 -08:00
Matei Zaharia	c40e766368	Use java.util.HashMap in shuffles	2012-02-06 19:20:25 -08:00
Matei Zaharia	b267175ab5	Synchronization fix in case SparkContext is used from multiple threads.	2012-02-06 14:28:18 -08:00
Matei Zaharia	43a3335090	Simplifying test	2012-02-05 22:46:51 -08:00
Hiral Patel	b47952342e	Add register immutable map to kryo serializer	2012-01-26 15:24:20 -08:00
Matei Zaharia	fabcc82528	Merge pull request #103 from edisontung/master Made improvements to takeSample. Also changed SparkLocalKMeans to SparkKMeans	2012-01-13 19:20:03 -08:00
Matei Zaharia	fd5581a0d3	Fixed a failure recovery bug and added some tests for fault recovery.	2012-01-13 19:17:27 -08:00
Matei Zaharia	eb05154b7a	Fixed a failure recovery bug and added some tests for fault recovery.	2012-01-13 19:08:25 -08:00
Edison Tung	1ecc221f84	Fixed bugs I've fixed the bugs detailed in the diff. One of the bugs was already fixed on the local file (forgot to commit).	2012-01-09 11:59:52 -08:00
Matei Zaharia	e269f6f7ea	Register RDDs with the MapOutputTracker even if they have no partitions. Fixes #105.	2012-01-05 15:59:20 -05:00
Matei Zaharia	3034fc0d91	Merge commit 'ad4ebff42c1b738746b2b9ecfbb041b6d06e3e16'	2011-12-14 18:19:43 +01:00
Matei Zaharia	6a650cbbdf	Make Spark port default to 7077 so that it's not an ephemeral port that might be taken	2011-12-14 18:18:22 +01:00
Matei Zaharia	735843a049	Merge remote-tracking branch 'origin/charles-newhadoop'	2011-12-02 21:59:30 -08:00
Charles Reiss	66f05f383e	Add new Hadoop API reading support.	2011-12-01 14:02:10 -08:00
Charles Reiss	02d43e6986	Add new Hadoop API writing support.	2011-12-01 14:01:28 -08:00
Edison Tung	42f8847a21	Revert de01b6deaaee1b43321e0aac330f4a98c0ea61c6^..HEAD	2011-12-01 13:43:25 -08:00
Edison Tung	de01b6deaa	Fixed bug in RDD Math.min takes 2 args, not 1. This was not committed earlier for some reason	2011-12-01 13:34:37 -08:00
Matei Zaharia	22b8fcf632	Added fold() and aggregate() operations that reuse an object to merge results into rather than requiring a new object allocation for each element merged. Fixes #95.	2011-11-30 11:37:47 -08:00
Matei Zaharia	09dd58b3a7	Send SPARK_JAVA_OPTS to slave nodes.	2011-11-30 11:34:58 -08:00
Edison Tung	a3bc012af8	added takeSamples method takeSamples method takes a specified number of samples from the RDD and outputs it in an array.	2011-11-21 16:38:44 -08:00
Ankur Dave	ad4ebff42c	Deduplicate exceptions when printing them The first time they appear, exceptions are printed in full, including a stack trace. After that, they are printed in abbreviated form. They are periodically reprinted in full; the reprint interval defaults to 5 seconds and is configurable using the property spark.logging.exceptionPrintInterval.	2011-11-14 01:54:53 +00:00
Ankur Dave	35b6358a7c	Report errors in tasks to the driver via a Mesos status update When a task throws an exception, the Spark executor previously just logged it to a local file on the slave and exited. This commit causes Spark to also report the exception back to the driver using a Mesos status update, so the user doesn't have to look through a log file on the slave. Here's what the reporting currently looks like: # ./run spark.examples.ExceptionHandlingTest master@203.0.113.1:5050 [...] 11/10/26 21:04:13 INFO spark.SimpleJob: Lost TID 1 (task 0:1) 11/10/26 21:04:13 INFO spark.SimpleJob: Loss was due to java.lang.Exception: Testing exception handling [...] 11/10/26 21:04:16 INFO spark.SparkContext: Job finished in 5.988547328 s	2011-11-14 01:54:53 +00:00
Matei Zaharia	07532021fe	Bug fix: reject offers that we didn't find any tasks for	2011-11-08 23:05:54 -08:00
Matei Zaharia	9e4c79a4d3	Closure cleaner unit test	2011-11-08 00:40:15 -08:00
Matei Zaharia	f346e64637	Updates to the closure cleaner to work better with closures in classes. Before, the cleaner attempted to clone $outer objects that were classes (as opposed to nested closures) and preserve only their used fields, which was bad because it would miss fields that are accessed indirectly by methods, and in general it would confuse user code. Now we keep a reference to those objects without cloning them. This is not perfect because the user still needs to be careful of what they'll carry along into closures, but it works better in some cases that seemed confusing before. We need to improve the documentation on what variables get passed along with a closure and possibly add some debugging tools for it as well. Fixes #71 -- that code now works in the REPL.	2011-11-08 00:33:28 -08:00
Matei Zaharia	c2b7fd6899	Make parallelize() work efficiently for ranges of Long, Double, etc (splitting them into sub-ranges). Fixes #87.	2011-11-02 15:16:02 -07:00
Matei Zaharia	157279e9eb	Update Spark to work with the latest Mesos API	2011-10-30 14:10:56 -07:00
root	3a0e6c4363	Miscellaneous fixes: - Executor should initialize logging properly - groupByKey should allow custom partitioner	2011-10-17 18:07:35 +00:00
root	62aa820084	Merge branch 'ankur-master'	2011-10-14 02:14:07 +00:00
Ankur Dave	2d7057bf5d	Implement PairRDDFunctions.partitionBy	2011-10-09 15:52:09 -07:00
Ankur Dave	06637cb69e	Fix PairRDDFunctions.groupWith partitioning This commit fixes a bug in groupWith that was causing it to destroy partitioning information. It replaces a call to map with a call to mapValues, which preserves partitioning.	2011-10-09 15:48:46 -07:00
Ankur Dave	2911a783d6	Add custom partitioner support to PairRDDFunctions.combineByKey	2011-10-09 15:47:20 -07:00
Ankur Dave	6c6e47e3cd	Use BufferedOutputStream in ShuffleMapTask	2011-10-09 15:43:31 -07:00
Matei Zaharia	1069740264	Added a jarOfObject method to get the JAR of the class that an object belongs to, which seems like a more common case.	2011-08-29 23:27:10 -07:00
Matei Zaharia	0aa23bf17e	Added a convenience method for getting the JAR file that loaded a class (useful for jobs to pass their own JAR files to SparkContext).	2011-08-29 22:59:44 -07:00
Matei Zaharia	a161f00610	Made a log message slightly less ugly	2011-08-27 16:58:54 -07:00
Matei Zaharia	3759bcd061	New mesos.jar	2011-08-10 14:03:48 -07:00
Matei Zaharia	c22043f150	Minor fix: can use >= when checking memory	2011-08-02 19:11:17 -07:00
Ismael Juma	6ff57f5594	Use scala.math instead of Math as the latter is deprecated.	2011-08-02 10:25:47 +01:00
Ismael Juma	620de2dd1d	Change currentThread to Thread.currentThread as the former is deprecated.	2011-08-02 10:25:16 +01:00
Ismael Juma	0fba22b3d2	Fix issue #65 : Change @serializable to extends Serializable in 2.9 branch Note that we use scala.Serializable introduced in Scala 2.9 instead of java.io.Serializable. Also, case classes inherit from scala.Serializable by default.	2011-08-02 10:16:33 +01:00
Matei Zaharia	711575391d	Merge branch 'scala-2.9' Conflicts: project/build/SparkProject.scala	2011-08-01 15:25:26 -07:00
Matei Zaharia	4050d661c5	Updated to newest Mesos API, which includes better memory accounting by specifying per-executor memory.	2011-08-01 13:54:48 -07:00
Matei Zaharia	d12122502b	Various improvements to Kryo serializer: - Replaced modified Kryo version with the standard one augmented with the kryo-serializers package, which includes support for classes with no-arg constructors (that was why we had a modified Kryo before) - The kryo-serializers version also fixes issue #72. - Added a bunch of tests. - Serialize maps and a few other common types properly by default.	2011-07-21 22:09:33 -07:00
Matei Zaharia	baa72e2747	Removed a debug statement that slipped in as a println	2011-07-21 16:09:33 -07:00
Matei Zaharia	2bfd7931e8	Merge branch 'new-rdds-protobuf' Conflicts: core/src/main/scala/spark/Executor.scala core/src/main/scala/spark/RDD.scala	2011-07-21 16:08:39 -07:00
Matei Zaharia	1450fd74d9	Merge branch 'master' into scala-2.9	2011-07-14 17:37:24 -04:00
Matei Zaharia	ccf48388cd	Lowered default number of splits for files	2011-07-14 17:37:04 -04:00
Matei Zaharia	146a18c2a4	Merge branch 'master' into scala-2.9	2011-07-14 17:29:17 -04:00
Matei Zaharia	c8eb8b2b90	Set class loader for remote actors to fix a bug that happens in 2.9	2011-07-14 17:29:11 -04:00
Matei Zaharia	8ea67307b9	Merge branch 'master' into scala-2.9	2011-07-14 14:47:12 -04:00
Matei Zaharia	e4c3402d2d	Renamed ParallelArray to ParallelCollection	2011-07-14 14:47:01 -04:00
Matei Zaharia	9ac461d85d	Remove RDD.toString because it looked confusing	2011-07-14 14:39:32 -04:00
Matei Zaharia	797b4547c3	Fix tracking of updates in accumulators to solve an issue that would manifest in the 2.9 interpreter	2011-07-14 14:08:34 -04:00
Matei Zaharia	3efd9e94d8	Merge branch 'master' into scala-2.9	2011-07-14 12:42:57 -04:00
Matei Zaharia	0ccfe20755	Forgot to add a file	2011-07-14 12:42:50 -04:00
Matei Zaharia	38f38dda5b	Merge branch 'master' into scala-2.9	2011-07-14 12:42:02 -04:00
Matei Zaharia	969644df8e	Cleaned up a few issues to do with default parallelism levels. Also renamed HadoopFileWriter to HadoopWriter (since it's not only for files) and fixed a bug for lookup().	2011-07-14 12:40:56 -04:00
Matei Zaharia	2fb906e8e5	Merge branch 'master' into scala-2.9	2011-07-14 00:20:14 -04:00
Matei Zaharia	2604939f64	Simplified and documented code a little and added test	2011-07-14 00:19:00 -04:00
Matei Zaharia	2439e51a03	Merge branch 'master' into implicit-sequencefile	2011-07-13 23:20:22 -04:00
Matei Zaharia	d0c7958364	Merge branch 'master' into scala-2.9 Conflicts: core/src/main/scala/spark/HadoopFileWriter.scala	2011-07-13 23:09:33 -04:00
Matei Zaharia	9c0069188b	Updated save code to allow non-file-based OutputFormats and added a test for file-related stuff	2011-07-13 23:04:06 -04:00
Matei Zaharia	da8a3b8926	Increase default value of spark.locality.wait a little	2011-07-13 20:07:24 -04:00
Matei Zaharia	080869c6ef	Merge branch 'master' into scala-2.9	2011-07-13 00:20:08 -04:00
Matei Zaharia	842e14d567	Added mapPartitions operation and a bunch of tests for RDD ops	2011-07-13 00:19:52 -04:00
Matei Zaharia	9b568d37f7	Merge branch 'master' into scala-2.9 Conflicts: core/src/main/scala/spark/RDD.scala	2011-07-11 22:25:53 -04:00
Matei Zaharia	d05fea24f3	Simplified parallel shuffle fetcher to use URLConnection	2011-07-11 22:12:36 -04:00
Matei Zaharia	25c3a7781c	Moved PairRDD and SequenceFileRDD functions to separate source files	2011-07-10 00:06:15 -04:00
Matei Zaharia	b7f1f62ff5	bug fix	2011-07-09 18:53:02 -04:00
Matei Zaharia	003480f374	Register byte[] with Kryo serializer	2011-07-09 18:08:07 -04:00
Matei Zaharia	aea5cb4413	Added parallel shuffle fetcher	2011-07-09 17:25:56 -04:00
Matei Zaharia	4b1646a25f	Support for non-filesystem-based Hadoop data sources	2011-07-06 20:37:55 -04:00
Matei Zaharia	07a97d47c2	Support for non-filesystem-based Hadoop data sources	2011-07-06 20:37:34 -04:00
Matei Zaharia	3488c386a9	Initial work to make stuff like sequenceFile[Int, Int] work without requiring the user to provide a Writable type. The approach here might not be the best but it seems to work correctly.	2011-06-28 17:07:04 -07:00
Matei Zaharia	5633299ec6	Merge remote-tracking branch 'origin/master' into scala-2.9	2011-06-27 22:50:59 -07:00
Matei Zaharia	b0ecf1ee41	Don't pass a null context when running tasks locally	2011-06-27 22:50:43 -07:00
Matei Zaharia	85cad5d9dd	Fixed HadoopFileWriter to compile for Scala 2.9	2011-06-27 22:44:14 -07:00
Matei Zaharia	393607d5ef	Merge branch 'master' into scala-2.9	2011-06-27 18:08:25 -07:00
Matei Zaharia	2f652f1656	Fix a compile error	2011-06-27 18:07:16 -07:00
Tathagata Das	3f08e1129f	Merge branch 'master' into td-rdd-save Conflicts: core/src/main/scala/spark/SparkContext.scala	2011-06-27 13:43:44 -07:00
Tathagata Das	ad842ac823	Merge branch 'master' into td-rdd-save Conflicts: core/src/main/scala/spark/RDD.scala	2011-06-27 13:39:11 -07:00
Matei Zaharia	bae8a97968	Merge branch 'master' into scala-2.9 Conflicts: repl/src/main/scala/spark/repl/SparkInterpreterLoop.scala	2011-06-26 19:22:27 -07:00
Matei Zaharia	c4dd68ae21	Merge branch 'mos-bt' This merge keeps only the broadcast work in mos-bt because the structure of shuffle has changed with the new RDD design. We still need some kind of parallel shuffle but that will be added later. Conflicts: core/src/main/scala/spark/BitTorrentBroadcast.scala core/src/main/scala/spark/ChainedBroadcast.scala core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/Utils.scala core/src/main/scala/spark/shuffle/BasicLocalFileShuffle.scala core/src/main/scala/spark/shuffle/DfsShuffle.scala	2011-06-26 18:22:12 -07:00
Tathagata Das	38f2ba99cc	Further changes to HadoopFileWriter. Implemented ability to save RDDs as SequenceFiles and ObjectFiles. 1> HadoopFileWriter changed to take class types as constructor parameters (no more generic type) 2> Multiple types of RDD.saveAsHadoopFile() implemented to provide more saving options 3> RDD.saveAsSequenceFile() automatically converts basic types to Writable types before saving as SequenceFile 4> RDD.saveAsObjectFile() serializes objects and saves them to a ObjectFile 5> SparkContext.objectFile() opens the saved ObjectFiles	2011-06-24 19:51:21 -07:00
Olivier Grisel	2e3531d8bf	Implemented RDD.leftOuterJoin and RDD.rightOuterJoin	2011-06-24 11:00:51 +02:00
Tathagata Das	3d2befe831	Improved HadoopFileWriter (saves key and value classes to jobconf)	2011-06-23 08:11:22 -07:00
Olivier Grisel	005d1605a4	add missing test for RDD.groupWith	2011-06-23 02:10:52 +02:00
Matei Zaharia	214250016a	Added simple version of lookup	2011-06-20 11:59:16 -07:00
Matei Zaharia	23b42af70a	Merge branch 'master' into scala-2.9	2011-06-19 23:06:21 -07:00
Matei Zaharia	23b1c309fb	Added pipe() operation on RDDs for mapping through a shell command.	2011-06-19 23:05:19 -07:00
Tathagata Das	b5e6645505	Cleaner reimplementation of HadoopFileWriter. Introduced TaskContext. 1> HadoopFileWriter works correctly with task failures 2> It can also take an user specified JobConf object for configuration settings 3> A Task can now get information like stage ID, split ID, and attempt ID using TaskContext class 4> Minor changes in SparkContext, DAGScheduler and subclasses to allow specification of TaskContext as a parameter	2011-06-16 20:57:57 -07:00
Tathagata Das	869836a2fa	Implemented TaskContext to hold contextual information (jobID, taskID, attemptID) of a task	2011-06-10 19:47:28 -07:00
Tathagata Das	389e56156f	HadoopFileWriter changed to use Hadoop's OutputCommitter	2011-06-09 15:29:22 -07:00
Tathagata Das	24d845833c	First-cut implementation of RDD.SaveAsText	2011-06-05 04:14:43 -07:00
Matei Zaharia	3297706ab2	Merge remote-tracking branch 'origin/master' into scala-2.9	2011-06-01 11:46:31 -07:00
Matei Zaharia	9bb448a151	Catch Throwable instead of Exception in LocalScheduler and Executor. Fixes #57 .	2011-06-01 11:45:47 -07:00
Matei Zaharia	850fe3274e	Make the runJob API public. Fixes #56 .	2011-06-01 11:38:44 -07:00
Ismael Juma	82f10bd794	Remove unnecessary toStream calls.	2011-06-01 16:12:42 +01:00
Matei Zaharia	10fe324845	Merge remote-tracking branch 'origin/master' into scala-2.9	2011-05-31 23:48:11 -07:00
Matei Zaharia	5166d76843	Ensure logging is initialized before spawning any threads to fix issue #45	2011-05-31 23:47:32 -07:00
Matei Zaharia	0afd35a8dd	Some docs in ClosureCleaner	2011-05-31 22:06:30 -07:00
Matei Zaharia	8b0390d344	Instantiate NullWritable properly in HadoopFile	2011-05-30 23:54:14 -07:00
Matei Zaharia	4096c2287e	Various fixes	2011-05-29 18:46:01 -07:00
Matei Zaharia	ef706ae959	Merge branch 'master' into new-rdds-protobuf Conflicts: run	2011-05-29 16:20:23 -07:00
Matei Zaharia	c501cff924	Executor was looking for the wrong constructor for ExecutorClassLoader	2011-05-29 16:15:59 -07:00
Ismael Juma	1396678baa	Move REPL classes to separate module.	2011-05-27 11:22:50 +01:00
Ismael Juma	051da8b4ad	Delete liblzf from lib as it's no longer used.	2011-05-27 11:22:10 +01:00
Ismael Juma	ae1a1f91f1	Remove several dependencies from git and configure them as SBT managed dependencies. Upgrade some of the dependencies while at it.	2011-05-27 11:22:01 +01:00
Ismael Juma	164ef4c751	Use explicit asInstanceOf instead of misleading unchecked pattern matching. Also enable -unchecked warnings in SBT build file.	2011-05-27 07:57:10 +01:00
Ismael Juma	89c8ea2bb2	Replace deprecated `-` and `--` with suggested filterNot (which is uglier).	2011-05-26 22:22:37 +01:00
Ismael Juma	94f05683bd	Replace deprecated `first` with `head`.	2011-05-26 22:13:41 +01:00
Ismael Juma	0b6a862b68	Use math instead of Math as the latter is deprecated.	2011-05-26 22:06:36 +01:00
Ismael Juma	1f27d94c48	Use Array.iterator instead of Iterator.fromArray as the latter is deprecated.	2011-05-26 22:04:42 +01:00
Ismael Juma	1993a8e556	Use += instead of + for mutable sequences as the latter is deprecated.	2011-05-26 21:59:48 +01:00
root	5ef938615f	Initial work on making stuff compile with protobuf Mesos	2011-05-24 22:27:08 +00:00
Matei Zaharia	cec427e777	Fixed a bug with preferred locations having changed meaning in new RDDs	2011-05-22 17:12:29 -07:00
Matei Zaharia	4c888b2933	Fix queue type for executor	2011-05-22 16:42:05 -07:00
Matei Zaharia	bea3a33012	doc tweak	2011-05-22 16:03:41 -07:00
Matei Zaharia	9bde5a54cb	class loader fix	2011-05-22 16:00:41 -07:00
Matei Zaharia	91c07a33d9	Various fixes to serialization	2011-05-21 22:50:08 -07:00
Matei Zaharia	f61b61c4ac	Merge branch 'master' into new-rdds	2011-05-21 21:25:58 -07:00
Matei Zaharia	24a1e7f838	Scheduler can now recover from lost map outputs	2011-05-20 00:19:53 -07:00
Matei Zaharia	82329b0b28	Updated scheduler to support running on just some partitions of final RDD	2011-05-19 12:47:09 -07:00
Matei Zaharia	328e51b693	Various minor fixes	2011-05-19 11:19:25 -07:00
Matei Zaharia	fd1d255821	Stop objectifying various trackers, caches, etc.	2011-05-17 12:41:13 -07:00
Matei Zaharia	4db50e26c7	Fixed unit tests by making them clean up the SparkContext after use and thus clean up the various singletons (RDDCache, MapOutputTracker, etc). This isn't perfect yet (ideally we shouldn't use singleton objects at all) but we can fix that later.	2011-05-13 12:03:58 -07:00
Matei Zaharia	aca8150c52	Ensure that AddedToCache messages make it home before tasks finish	2011-05-13 11:43:52 -07:00
Matei Zaharia	16c886a581	Optimization for count()	2011-05-13 10:41:34 -07:00
Mosharaf Chowdhury	db7a2c4897	Issue #42 fixed.	2011-04-28 14:30:48 -07:00
Ankur Dave	a4c04f3f6f	Error handling for disk I/O in DiskSpillingCache Also renamed the property spark.DiskSpillingCache.cacheDir to spark.diskSpillingCache.cacheDir in order to follow conventions.	2011-04-27 23:23:29 -07:00
Ankur Dave	12ff0d2dc3	Bring an entry back into memory after fetching it from disk	2011-04-27 22:59:05 -07:00
Ankur Dave	e30313aa2c	Added DiskSpillingCache DiskSpillingCache is a BoundedMemoryCache that spills entries to disk when it runs out of space. Currently the implementation is very simple. In particular, it's missing the following features: - Error handling for disk I/O, including checking of disk space levels - Bringing an entry back into memory after fetching it from disk In addition, here are some features that aren't critical but should be implemented soon: - Spilling based on a user-set priority in addition to LRU - Caching into a subdirectory of spark.DiskSpillingCache.cacheDir rather than the root directory	2011-04-27 22:32:35 -07:00
Mosharaf Chowdhury	60d1121343	Refactoring: daemonThreadFactories have all been moved to the Utils object instead of having multiple copies in Broadcast and Shuffle objects.	2011-04-27 22:13:01 -07:00
Mosharaf Chowdhury	e898e108a3	Cleanup + refactoring...	2011-04-27 22:00:24 -07:00
Mosharaf Chowdhury	0567646180	Shuffle is also working from its own subpackage.	2011-04-27 21:11:41 -07:00
Mosharaf Chowdhury	2742de707a	Removed some shuffle implementations. Remaining ones all use local files to write map outputs.	2011-04-27 20:53:43 -07:00
Mosharaf Chowdhury	9d78779257	Merge branch 'mos-shuffle-tracked' into mos-bt Conflicts: core/src/main/scala/spark/Broadcast.scala	2011-04-27 20:47:07 -07:00
Mosharaf Chowdhury	ac7e066383	Merge branch 'master' into mos-shuffle-tracked Conflicts: .gitignore core/src/main/scala/spark/LocalFileShuffle.scala src/scala/spark/BasicLocalFileShuffle.scala src/scala/spark/Broadcast.scala src/scala/spark/LocalFileShuffle.scala	2011-04-27 14:35:03 -07:00
Mosharaf Chowdhury	4e4c41026c	Added support for custom classes. (from 49ea48)	2011-04-27 12:30:16 -07:00
Mosharaf Chowdhury	65848da8df	Refacoring...	2011-04-26 17:41:31 -07:00
Mosharaf Chowdhury	b8ab7862b8	Moved broadcast-related code to separate directory under spark.broadcast package.	2011-04-26 17:22:52 -07:00
Mosharaf Chowdhury	e31007248c	Merge branch 'master' into mos-bt	2011-04-26 12:04:14 -07:00
Mosharaf Chowdhury	9257a55e3a	Refactoring...	2011-04-26 11:45:36 -07:00
Mosharaf Chowdhury	9d2d533493	Temporary fix for issue #42 .	2011-04-21 17:40:26 -07:00
Timothy Hunter	5c9535228a	fixed small bug when classpath has some strange formatting	2011-04-18 17:12:29 -07:00
Mosharaf Chowdhury	a8f47a62b9	Renamed MaxRxPeers to MaxTxPeers to MaxTxSlots and MaxRxSlots respectively for clarity (most probably they were misunderstood and misused)	2011-04-13 16:24:19 -07:00
Matei Zaharia	94ba95bcb2	Added flatMapValues	2011-04-12 19:51:58 -07:00
Mosharaf Chowdhury	b67a968b5d	hasBlocks is now AtomicInteger (even though it was ok)	2011-04-02 22:03:18 -07:00
Mosharaf Chowdhury	5bf3c83b13	BroadcastSuperTracker (right now for BT) is contacted over TCP instead of direct procedure call. Need to do the same for others and consolidate all broadcast mechanisms.	2011-04-01 19:31:28 -07:00
Mosharaf Chowdhury	733a130108	Formatting...	2011-04-01 14:51:24 -07:00
Mosharaf Chowdhury	4636aea598	Formatting...	2011-04-01 14:49:59 -07:00
Mosharaf Chowdhury	addd569e52	Each broadcasted variable can have different blockSize. Corresponding logic to adapt blockSize based on network condition is not yet implemented. Formatting + consolidation.	2011-03-31 14:51:46 -07:00
Mosharaf Chowdhury	815f3411ec	Consolidated Broadcast config params.	2011-03-30 16:45:51 -07:00
Mosharaf Chowdhury	a18a28b08e	Removed gossip-related code that were already commented out. More formatting.	2011-03-30 14:22:09 -07:00
Mosharaf Chowdhury	43aceafd70	Formatting...	2011-03-30 12:18:50 -07:00
Mosharaf Chowdhury	73b165220d	Random is the default choice; rarestFirst didn't work well in experiments.	2011-03-29 13:06:43 -07:00
Matei Zaharia	d840fa8d0c	Merge remote branch 'origin/custom-serialization' into new-rdds	2011-03-09 00:40:07 -08:00
root	ff5b13799a	Some tweaks to make Kryo cache work better	2011-03-09 03:31:50 -05:00
Matei Zaharia	7febdfbe29	Better reuse of buffers in Kryo serialization	2011-03-08 12:36:36 -08:00
Matei Zaharia	8ee3ec29ee	Merge remote branch 'origin/custom-serialization' into new-rdds	2011-03-08 11:58:19 -08:00
Matei Zaharia	7408230bfa	Updated modified Kryo to use objenesis	2011-03-08 11:58:08 -08:00
Matei Zaharia	ab1216cb14	Register None and Nil properly	2011-03-08 11:52:58 -08:00
Matei Zaharia	d39f5dd15e	Merge remote branch 'origin/custom-serialization' into new-rdds	2011-03-08 10:28:50 -08:00
Matei Zaharia	4f0d0a7b73	stuff	2011-03-08 10:28:26 -08:00
Matei Zaharia	8b6f3db415	Merge remote branch 'origin/custom-serialization' into new-rdds	2011-03-07 19:20:28 -08:00
Matei Zaharia	38f6bce33d	Added SerializingCache	2011-03-07 19:16:24 -08:00
Matei Zaharia	6316c7979d	Remove some logging	2011-03-07 18:56:36 -08:00
Matei Zaharia	e7b4b047a6	Added pluggable serializers and Kryo serialization	2011-03-07 18:41:53 -08:00
Matei Zaharia	467f056e29	Remove commented code	2011-03-06 23:38:41 -08:00
Matei Zaharia	bce95b8458	Finished cogroup stuff	2011-03-06 23:38:16 -08:00
Matei Zaharia	04c2d6a60c	stuff	2011-03-06 19:27:03 -08:00
Matei Zaharia	0fb691dd28	Various fixes to get MesosScheduler working with new RDDs	2011-03-06 16:16:38 -08:00
Matei Zaharia	1df5a65a01	Pass cache locations correctly to DAGScheduler.	2011-03-06 12:16:38 -08:00
Matei Zaharia	e1436f1eaa	Merge remote branch 'origin/master' into new-rdds	2011-03-06 11:11:47 -08:00
Matei Zaharia	370b95816f	Added sampling for large arrays in SizeEstimator	2011-03-06 11:11:20 -08:00
Matei Zaharia	a789e9aaea	Merge remote branch 'origin/master' into new-rdds	2011-03-01 10:33:37 -08:00
Matei Zaharia	021c50a8d4	Remove unnecessary lock which was there to work around a bug in Configuration in Hadoop 0.20.0	2011-03-01 10:28:38 -08:00
Matei Zaharia	adaba4d550	Removed old slf4j jars that came with Hadoop	2011-03-01 10:28:21 -08:00
Matei Zaharia	447debb771	Updated Hadoop to 0.20.2 to include some bug fixes	2011-03-01 10:27:48 -08:00
Matei Zaharia	9e59afd710	More work on new RDD design	2011-02-27 19:15:52 -08:00
Matei Zaharia	f38f86d59e	More stuff	2011-02-27 14:27:12 -08:00
Matei Zaharia	2e6023f2bf	stuff	2011-02-26 23:41:44 -08:00
Matei Zaharia	309367c477	Initial work towards new RDD design	2011-02-26 23:15:33 -08:00
Mosharaf Chowdhury	0416cc22d2	Picking peers weighted by the number of rare blocks they have. A block is rare if there are at most 2 copies in the neighborhood. Better number can be used (some function of neighborhood size)	2011-02-15 16:27:44 -08:00
Mosharaf Chowdhury	cf81da9485	Optimization: Master sends out at least one copy of each block first regardless of whatever a client is asking for. Once one copy of each block is out, Master then responds to specific blocks from individual receivers.	2011-02-14 15:08:33 -08:00
Mosharaf Chowdhury	2b946fb2d1	pickBlockRarestFirst and gossips commented OUT for now. Problem with the rarestFirst implemention is that we are picking peers randomly first and then picking blocks from the random peer using rarestFirst. NOT the right away to do it. It should be the other way around. Problem with gossip is that peers might end up overwriting newer information by older ones. To fix that we either have to have timestamps or must match the bitVectors before overwriting.	2011-02-13 13:53:15 -08:00
Mosharaf Chowdhury	ca2895ebb0	Fix in rarestFirst implemenation. If there are more than one rarest blocks, pick randomly between them (was deterministic before)	2011-02-10 20:37:44 -08:00
Mosharaf Chowdhury	520bbdc7e3	Peers now gossip about their neighbors when they talk.	2011-02-10 20:15:30 -08:00
Matei Zaharia	dc24aecd8f	Close record readers in HadoopFile after finishing a split	2011-02-10 12:07:48 -08:00
Mosharaf Chowdhury	441462bc7f	Fixed some warnings during compilation.	2011-02-09 12:11:43 -08:00
Mosharaf Chowdhury	1a73c0d265	Merged with master. Using sbt.	2011-02-09 10:48:48 -08:00
Mosharaf Chowdhury	495b38658e	Merge branch 'master' into mos-bt	2011-02-09 10:40:23 -08:00
Matei Zaharia	99f3f23efa	Changed default shuffle to LocalFileShuffle because it's way faster for small files	2011-02-08 17:03:03 -08:00
Matei Zaharia	ec28b607fd	Merge branch 'master' into sbt Conflicts: Makefile core/src/main/java/spark/compress/lzf/LZF.java core/src/main/java/spark/compress/lzf/LZFInputStream.java core/src/main/java/spark/compress/lzf/LZFOutputStream.java core/src/main/native/spark_compress_lzf_LZF.c run	2011-02-02 00:25:54 -08:00
Matei Zaharia	e5c4cd8a5e	Made examples and core subprojects	2011-02-01 15:11:08 -08:00

1243 commits