Matei Zaharia
32a4f4623c
Merge pull request #129 from mesos/rxin
...
Force serialize/deserialize task results in local execution mode.
2012-04-24 16:18:39 -07:00
Reynold Xin
761ea65a98
Added a test for the previous commit (failing to serialize task results
...
would throw an exception for local tasks).
2012-04-24 15:14:35 -07:00
Reynold Xin
9821cd4d42
Force serialize/deserialize task results in local execution mode.
2012-04-24 14:55:28 -07:00
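The idea behind forcing a serialize/deserialize step in local mode can be sketched as follows. This is a minimal illustration using plain Java serialization, not the code from this commit; `LocalResultCheck` and `roundTrip` are hypothetical names.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical helper (not Spark's code): pushes a task result through Java
// serialization and back, so a non-serializable result fails in local mode
// the same way it would when shipped between machines.
object LocalResultCheck {
  def roundTrip[T](value: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    // writeObject throws NotSerializableException for non-serializable values
    try out.writeObject(value.asInstanceOf[AnyRef]) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

A local scheduler could call such a helper on every task result before handing it back to the caller, which surfaces serialization problems during local testing.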
Antonio
3e48818993
Removed commented-out System.exit call
2012-04-23 11:42:58 -07:00
Antonio
39d99168dc
Added exception handling instead of just exiting in LocalScheduler for tasks that throw exceptions
2012-04-20 14:46:43 -07:00
Reynold Xin
e601b3b9e5
Added the ability to set environment variables in piped RDD.
2012-04-17 16:40:56 -07:00
Matei Zaharia
3b745176e0
Bug fix to pluggable closure serialization change
2012-04-12 17:53:02 +00:00
Matei Zaharia
112655f032
Merge pull request #121 from rxin/kryo-closure
...
Added an option (spark.closure.serializer) to specify the serializer for closures.
2012-04-10 14:21:02 -07:00
Reynold Xin
d295ccb43c
Added a closureSerializer field in SparkEnv and use it to serialize
...
tasks.
2012-04-10 13:29:46 -07:00
Reynold Xin
968f75f6af
Added an option (spark.closure.serializer) to specify the serializer for
...
closures. This enables using Kryo as the closure serializer.
2012-04-09 21:59:56 -07:00
Matei Zaharia
a69c0738d1
Merge branch 'master' into mesos-0.9
2012-04-08 23:41:36 -07:00
Matei Zaharia
a633974143
Merge branch 'master' of github.com:mesos/spark
2012-04-08 23:41:25 -07:00
Matei Zaharia
0229d5390f
Merge branch 'master' into mesos-0.9
2012-04-08 23:39:37 -07:00
Matei Zaharia
d401e1b3e8
Fix a possible deadlock in MesosScheduler
2012-04-08 23:38:49 -07:00
Ankur Dave
7be1c7b331
Report entry dropping in BoundedMemoryCache
2012-04-06 15:49:32 -07:00
Matei Zaharia
a8bb324ed9
Merge branch 'master' into mesos-0.9
2012-04-05 14:53:22 -07:00
Matei Zaharia
816d4e5840
Pass local IP address instead of hostname in spark.master.host. Fixes #117.
2012-04-05 14:53:17 -07:00
Matei Zaharia
335a6036ad
Converted some tabs to spaces
2012-04-05 11:58:01 -07:00
Matei Zaharia
8c95a85438
Use Runtime.maxMemory instead of Runtime.totalMemory in
...
BoundedMemoryCache, in case the JVM was not started with its initial
heap size equaling its maximum one (-Xms == -Xmx).
2012-03-30 13:39:35 -04:00
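The difference between the two `Runtime` calls can be seen in a short sketch. `HeapSizes` and `cacheCapacity` are hypothetical names, and the fraction-based sizing rule is an assumption for illustration, not necessarily how BoundedMemoryCache computes its limit.

```scala
object HeapSizes {
  // Heap currently reserved by the JVM; it can grow later when -Xms < -Xmx.
  def total: Long = Runtime.getRuntime().totalMemory()

  // Upper bound the JVM will attempt to use for the heap (governed by -Xmx).
  def max: Long = Runtime.getRuntime().maxMemory()

  // Hypothetical sizing rule: cap a cache at a fraction of the maximum heap,
  // so the limit does not depend on how much heap happens to be reserved now.
  def cacheCapacity(fraction: Double): Long = (max * fraction).toLong
}
```

Sizing against `total` would understate the available memory whenever the JVM starts with a small initial heap.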
Matei Zaharia
03d5b3b48d
Use Runtime.maxMemory instead of Runtime.totalMemory in
...
BoundedMemoryCache, in case the JVM was not started with its initial
heap size equaling its maximum one (-Xms == -Xmx).
2012-03-30 13:38:19 -04:00
Matei Zaharia
95fb1a16b8
Use Mesos 0.9 RC3 JAR and protobuf 2.4.1
2012-03-30 11:38:49 -04:00
Matei Zaharia
dfa3b6b544
Fixes to work with the very latest Mesos 0.9 API
2012-03-29 22:12:35 -04:00
Matei Zaharia
4d52cc6738
Merge branch 'master' into mesos-0.9
2012-03-29 21:29:39 -04:00
Reynold Xin
42dcdbcb2f
Removed the extra spaces in OrderedRDDFunctions and SortedRDD.
2012-03-29 15:21:57 -07:00
Matei Zaharia
08cda89e8a
Further fixes to how Mesos is found and used
2012-03-17 13:39:14 -07:00
Matei Zaharia
3c3fdf6eca
Merge branch 'master' into mesos-0.9
2012-03-17 13:09:21 -07:00
Matei Zaharia
c7af538ac1
Some fixes to sorting for when the RDD has fewer elements than the
...
number of partitions we ask to partition it into. Also, removed a test
that was taking way too long to run.
2012-03-17 13:08:36 -07:00
Matei Zaharia
a099a63a8a
Initial work to make Spark compile with Mesos 0.9 and Hadoop 1.0
2012-03-17 12:31:34 -07:00
Matei Zaharia
a5e2b6a6bd
Merge pull request #112 from cengle/master
...
Changed HadoopRDD to get key and value containers from the RecordReader instead of through reflection
2012-03-06 13:38:32 -08:00
Matei Zaharia
97eee50825
Fixes a nasty bug that could happen when tasks fail, because calling
...
wait() with a timeout of 0 on a Java object means "wait forever".
2012-03-01 13:43:17 -08:00
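The pitfall is that `Object.wait(0)` blocks until notified instead of returning immediately, so a computed "time remaining" that reaches zero must never be passed to it. A minimal sketch of a guarded wait loop follows; `TimedWait` and `awaitUntil` are hypothetical names, not the code changed in this commit.

```scala
object TimedWait {
  // Waits on `lock` until `done()` returns true or `timeoutMs` has elapsed.
  // Returns the value of `done()` observed when the loop exits.
  def awaitUntil(lock: AnyRef, timeoutMs: Long)(done: () => Boolean): Boolean =
    lock.synchronized {
      val deadline = System.currentTimeMillis() + timeoutMs
      var remaining = timeoutMs
      // The `remaining > 0` guard ensures wait() is never called with 0,
      // which would mean "wait forever" rather than "do not wait".
      while (!done() && remaining > 0) {
        lock.wait(remaining)
        remaining = deadline - System.currentTimeMillis()
      }
      done()
    }
}
```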
Cliff Engle
dd68cb6099
Get key and value container from RecordReader
2012-02-29 16:33:23 -08:00
Matei Zaharia
1e10df0a46
Merge pull request #111 from alupher/master
...
Adding sorting to RDDs
2012-02-24 15:50:14 -08:00
Antonio
0d93d95bcf
Removed unnecessary import
2012-02-21 19:57:12 -08:00
Antonio
2990298f71
Added sorting testing suite
2012-02-21 19:54:21 -08:00
Matei Zaharia
aa04f87cd2
Added support for parallel execution of jobs in DAGScheduler.
2012-02-19 22:50:23 -08:00
Antonio
620798161b
Added fixes to sorting
2012-02-13 00:07:39 -08:00
Matei Zaharia
2587ce1690
Fixed a deadlock that occurred with MesosScheduler due to an earlier
...
synchronization change
2012-02-11 21:22:45 -08:00
Antonio
e93f622665
Added sorting by key for pair RDDs
2012-02-11 00:56:28 -08:00
Matei Zaharia
98f008b721
Formatting fixes
2012-02-10 10:52:03 -08:00
Matei Zaharia
7660a8b12f
Merge branch 'formatting'
...
Conflicts:
core/src/main/scala/spark/DAGScheduler.scala
core/src/main/scala/spark/SimpleShuffleFetcher.scala
core/src/main/scala/spark/SparkContext.scala
2012-02-10 10:42:14 -08:00
haoyuan
194c42ab79
Code format.
2012-02-10 08:19:53 -08:00
Matei Zaharia
8f5ed51234
Delete Spark's temporary directories when the JVM exits.
2012-02-09 22:58:24 -08:00
Matei Zaharia
c0a0df3285
Made the default cache BoundedMemoryCache, and reduced its default size
2012-02-09 22:32:02 -08:00
Matei Zaharia
a766780f4c
Added some tests for multithreaded access to Spark.
2012-02-09 22:27:53 -08:00
Matei Zaharia
0e93891d3d
Replaced LocalFileShuffle with a non-singleton ShuffleManager class
...
and made DAGScheduler automatically set SparkEnv.
2012-02-09 22:14:56 -08:00
haoyuan
445e0bb1b5
Format the code a bit more.
2012-02-09 15:50:26 -08:00
haoyuan
651932e703
Format the code according to the coding style agreed on by Matei/TD/Haoyuan
2012-02-09 13:26:23 -08:00
Matei Zaharia
e02dc83a5b
IO optimizations
2012-02-06 20:40:39 -08:00
Matei Zaharia
c40e766368
Use java.util.HashMap in shuffles
2012-02-06 19:20:25 -08:00
Matei Zaharia
b267175ab5
Synchronization fix in case SparkContext is used from multiple threads.
2012-02-06 14:28:18 -08:00
Matei Zaharia
43a3335090
Simplifying test
2012-02-05 22:46:51 -08:00
Hiral Patel
b47952342e
Add register immutable map to kryo serializer
2012-01-26 15:24:20 -08:00
Matei Zaharia
fabcc82528
Merge pull request #103 from edisontung/master
...
Made improvements to takeSample. Also changed SparkLocalKMeans to SparkKMeans
2012-01-13 19:20:03 -08:00
Matei Zaharia
fd5581a0d3
Fixed a failure recovery bug and added some tests for fault recovery.
2012-01-13 19:17:27 -08:00
Matei Zaharia
eb05154b7a
Fixed a failure recovery bug and added some tests for fault recovery.
2012-01-13 19:08:25 -08:00
Edison Tung
1ecc221f84
Fixed bugs
...
I've fixed the bugs detailed in the diff. One of the bugs was already
fixed on the local file (forgot to commit).
2012-01-09 11:59:52 -08:00
Matei Zaharia
e269f6f7ea
Register RDDs with the MapOutputTracker even if they have no partitions.
...
Fixes #105.
2012-01-05 15:59:20 -05:00
Matei Zaharia
3034fc0d91
Merge commit 'ad4ebff42c1b738746b2b9ecfbb041b6d06e3e16'
2011-12-14 18:19:43 +01:00
Matei Zaharia
6a650cbbdf
Make Spark port default to 7077 so that it's not an ephemeral port that might be taken
2011-12-14 18:18:22 +01:00
Matei Zaharia
735843a049
Merge remote-tracking branch 'origin/charles-newhadoop'
2011-12-02 21:59:30 -08:00
Charles Reiss
66f05f383e
Add new Hadoop API reading support.
2011-12-01 14:02:10 -08:00
Charles Reiss
02d43e6986
Add new Hadoop API writing support.
2011-12-01 14:01:28 -08:00
Edison Tung
42f8847a21
Revert de01b6deaaee1b43321e0aac330f4a98c0ea61c6^..HEAD
2011-12-01 13:43:25 -08:00
Edison Tung
de01b6deaa
Fixed bug in RDD
...
Math.min takes 2 args, not 1. This was not committed earlier for some
reason
2011-12-01 13:34:37 -08:00
Matei Zaharia
22b8fcf632
Added fold() and aggregate() operations that reuse an object to
...
merge results into rather than requiring a new object allocation
for each element merged. Fixes #95.
2011-11-30 11:37:47 -08:00
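The accumulator-reuse pattern can be sketched over plain Scala collections. This is a toy model in which a `Seq[Seq[T]]` stands in for the partitions of a dataset; `ReusingAggregate` and its signatures are assumptions for illustration, not Spark's actual API.

```scala
import scala.collection.mutable

object ReusingAggregate {
  // `zero` is by-name so each partition starts from its own fresh accumulator.
  // seqOp folds one element into an accumulator; combOp merges two accumulators.
  // Both may mutate and return their first argument instead of allocating.
  def aggregate[T, U](partitions: Seq[Seq[T]], zero: => U)(
      seqOp: (U, T) => U, combOp: (U, U) => U): U = {
    val perPartition = partitions.map(_.foldLeft(zero)(seqOp))
    perPartition.foldLeft(zero)(combOp)
  }

  // Example: collect distinct values into one mutable set per partition,
  // then merge the per-partition sets into a single result set.
  def distinctValues(partitions: Seq[Seq[Int]]): mutable.HashSet[Int] =
    aggregate(partitions, mutable.HashSet.empty[Int])(
      (set, x) => { set += x; set },
      (a, b) => { a ++= b; a })
}
```

With an immutable accumulator, every merge step would allocate a new object; here the only allocations are the initial sets.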
Matei Zaharia
09dd58b3a7
Send SPARK_JAVA_OPTS to slave nodes.
2011-11-30 11:34:58 -08:00
Edison Tung
a3bc012af8
added takeSamples method
...
The takeSamples method takes a specified number of samples from the RDD and
returns them in an array.
2011-11-21 16:38:44 -08:00
Ankur Dave
ad4ebff42c
Deduplicate exceptions when printing them
...
The first time they appear, exceptions are printed in full, including
a stack trace. After that, they are printed in abbreviated form. They
are periodically reprinted in full; the reprint interval defaults to 5
seconds and is configurable using the property
spark.logging.exceptionPrintInterval.
2011-11-14 01:54:53 +00:00
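The deduplication scheme described above can be sketched as follows. This is a hypothetical illustration, not the committed code: `ExceptionPrinter`, its injectable clock, and the "[duplicate]" suffix are assumptions; only the 5-second default interval comes from the commit message.

```scala
import scala.collection.mutable

// Hypothetical sketch: prints an exception in full the first time it is seen
// and again once `intervalMs` has passed; otherwise prints a short form.
class ExceptionPrinter(
    intervalMs: Long = 5000L,
    now: () => Long = () => System.currentTimeMillis()) {

  // Time of the last full print, keyed by the exception's description.
  private val lastFullPrint = mutable.HashMap.empty[String, Long]

  def describe(e: Throwable): String = {
    val key = e.toString
    val t = now()
    val dueForFull = lastFullPrint.get(key).forall(last => t - last >= intervalMs)
    if (dueForFull) {
      lastFullPrint(key) = t
      key + e.getStackTrace.map(frame => "\n\tat " + frame).mkString
    } else {
      key + " [duplicate]"
    }
  }
}
```

Passing the clock as a parameter is a design choice made here so the interval logic can be exercised without sleeping.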
Ankur Dave
35b6358a7c
Report errors in tasks to the driver via a Mesos status update
...
When a task throws an exception, the Spark executor previously just
logged it to a local file on the slave and exited. This commit causes
Spark to also report the exception back to the driver using a Mesos
status update, so the user doesn't have to look through a log file on
the slave.
Here's what the reporting currently looks like:
# ./run spark.examples.ExceptionHandlingTest master@203.0.113.1:5050
[...]
11/10/26 21:04:13 INFO spark.SimpleJob: Lost TID 1 (task 0:1)
11/10/26 21:04:13 INFO spark.SimpleJob: Loss was due to java.lang.Exception: Testing exception handling
[...]
11/10/26 21:04:16 INFO spark.SparkContext: Job finished in 5.988547328 s
2011-11-14 01:54:53 +00:00
Matei Zaharia
07532021fe
Bug fix: reject offers that we didn't find any tasks for
2011-11-08 23:05:54 -08:00
Matei Zaharia
9e4c79a4d3
Closure cleaner unit test
2011-11-08 00:40:15 -08:00
Matei Zaharia
f346e64637
Updates to the closure cleaner to work better with closures in classes.
...
Before, the cleaner attempted to clone $outer objects that were classes
(as opposed to nested closures) and preserve only their used fields,
which was bad because it would miss fields that are accessed indirectly
by methods, and in general it would confuse user code. Now we keep a
reference to those objects without cloning them. This is not perfect
because the user still needs to be careful of what they'll carry along
into closures, but it works better in some cases that seemed confusing
before. We need to improve the documentation on what variables get
passed along with a closure and possibly add some debugging tools for it
as well.
Fixes #71 -- that code now works in the REPL.
2011-11-08 00:33:28 -08:00
Matei Zaharia
c2b7fd6899
Make parallelize() work efficiently for ranges of Long, Double, etc
...
(splitting them into sub-ranges). Fixes #87.
2011-11-02 15:16:02 -07:00
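Splitting a numeric range into sub-ranges, rather than materializing every element and then slicing, can be sketched like this. `RangeSlicer` is a hypothetical name and the half-open `(start, end)` representation is an assumption; the sketch also assumes `numSlices * (end - start)` fits in a Long.

```scala
object RangeSlicer {
  // Splits the half-open range [start, end) into `numSlices` contiguous
  // half-open sub-ranges using only arithmetic on the bounds.
  def slice(start: Long, end: Long, numSlices: Int): Seq[(Long, Long)] = {
    require(numSlices > 0, "numSlices must be positive")
    require(end >= start, "end must not be smaller than start")
    val length = end - start
    (0 until numSlices).map { i =>
      val lo = start + (i * length) / numSlices
      val hi = start + ((i + 1) * length) / numSlices
      (lo, hi)
    }
  }
}
```

Each slice costs constant memory regardless of how many elements it covers, and slices may be empty when the range has fewer elements than `numSlices`.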
Matei Zaharia
157279e9eb
Update Spark to work with the latest Mesos API
2011-10-30 14:10:56 -07:00
root
3a0e6c4363
Miscellaneous fixes:
...
- Executor should initialize logging properly
- groupByKey should allow custom partitioner
2011-10-17 18:07:35 +00:00
root
62aa820084
Merge branch 'ankur-master'
2011-10-14 02:14:07 +00:00
Ankur Dave
2d7057bf5d
Implement PairRDDFunctions.partitionBy
2011-10-09 15:52:09 -07:00
Ankur Dave
06637cb69e
Fix PairRDDFunctions.groupWith partitioning
...
This commit fixes a bug in groupWith that was causing it to destroy
partitioning information. It replaces a call to map with a call to
mapValues, which preserves partitioning.
2011-10-09 15:48:46 -07:00
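Why `mapValues` can keep partitioning information while `map` cannot is easiest to see in a toy model. The `Keyed` class below is a hypothetical stand-in for a partitioned key-value dataset, not Spark's RDD API.

```scala
// Toy model: a keyed dataset tagged with the function that assigned keys to
// partitions, if one is known.
final case class Keyed[K, V](pairs: Seq[(K, V)], partitioner: Option[K => Int]) {

  // map may change the keys, so the old key-to-partition assignment can no
  // longer be trusted and has to be dropped.
  def map[K2, V2](f: ((K, V)) => (K2, V2)): Keyed[K2, V2] =
    Keyed(pairs.map(f), None)

  // mapValues never touches the keys, so the partitioner remains valid.
  def mapValues[V2](f: V => V2): Keyed[K, V2] =
    Keyed(pairs.map { case (k, v) => (k, f(v)) }, partitioner)
}
```

Even when the function passed to `map` happens to leave the keys unchanged, the model has no way to know that, which matches the bug described above.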
Ankur Dave
2911a783d6
Add custom partitioner support to PairRDDFunctions.combineByKey
2011-10-09 15:47:20 -07:00
Ankur Dave
6c6e47e3cd
Use BufferedOutputStream in ShuffleMapTask
2011-10-09 15:43:31 -07:00
Matei Zaharia
1069740264
Added a jarOfObject method to get the JAR of the class that an object
...
belongs to, which seems like a more common case.
2011-08-29 23:27:10 -07:00
Matei Zaharia
0aa23bf17e
Added a convenience method for getting the JAR file that loaded a class
...
(useful for jobs to pass their own JAR files to SparkContext).
2011-08-29 22:59:44 -07:00
Matei Zaharia
a161f00610
Made a log message slightly less ugly
2011-08-27 16:58:54 -07:00
Matei Zaharia
3759bcd061
New mesos.jar
2011-08-10 14:03:48 -07:00
Matei Zaharia
c22043f150
Minor fix: can use >= when checking memory
2011-08-02 19:11:17 -07:00
Ismael Juma
6ff57f5594
Use scala.math instead of Math as the latter is deprecated.
2011-08-02 10:25:47 +01:00
Ismael Juma
620de2dd1d
Change currentThread to Thread.currentThread as the former is deprecated.
2011-08-02 10:25:16 +01:00
Ismael Juma
0fba22b3d2
Fix issue #65: Change @serializable to extends Serializable in 2.9 branch
...
Note that we use scala.Serializable introduced in Scala 2.9 instead of
java.io.Serializable. Also, case classes inherit from scala.Serializable by
default.
2011-08-02 10:16:33 +01:00
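The shape of the change can be illustrated with two small definitions. `Counter` and `Point` are hypothetical classes used only for this example.

```scala
// A regular class opts in to serialization by extending Serializable
// (previously expressed with the @serializable annotation).
class Counter(var value: Int) extends Serializable

// Case classes are serializable by default; no explicit parent is needed.
case class Point(x: Int, y: Int)
```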
Matei Zaharia
711575391d
Merge branch 'scala-2.9'
...
Conflicts:
project/build/SparkProject.scala
2011-08-01 15:25:26 -07:00
Matei Zaharia
4050d661c5
Updated to newest Mesos API, which includes better memory accounting
...
by specifying per-executor memory.
2011-08-01 13:54:48 -07:00
Matei Zaharia
d12122502b
Various improvements to Kryo serializer:
...
- Replaced modified Kryo version with the standard one augmented with
the kryo-serializers package, which includes support for classes with
no-arg constructors (that was why we had a modified Kryo before)
- The kryo-serializers version also fixes issue #72.
- Added a bunch of tests.
- Serialize maps and a few other common types properly by default.
2011-07-21 22:09:33 -07:00
Matei Zaharia
baa72e2747
Removed a debug statement that slipped in as a println
2011-07-21 16:09:33 -07:00
Matei Zaharia
2bfd7931e8
Merge branch 'new-rdds-protobuf'
...
Conflicts:
core/src/main/scala/spark/Executor.scala
core/src/main/scala/spark/RDD.scala
2011-07-21 16:08:39 -07:00
Matei Zaharia
1450fd74d9
Merge branch 'master' into scala-2.9
2011-07-14 17:37:24 -04:00
Matei Zaharia
ccf48388cd
Lowered default number of splits for files
2011-07-14 17:37:04 -04:00
Matei Zaharia
146a18c2a4
Merge branch 'master' into scala-2.9
2011-07-14 17:29:17 -04:00
Matei Zaharia
c8eb8b2b90
Set class loader for remote actors to fix a bug that happens in 2.9
2011-07-14 17:29:11 -04:00
Matei Zaharia
8ea67307b9
Merge branch 'master' into scala-2.9
2011-07-14 14:47:12 -04:00
Matei Zaharia
e4c3402d2d
Renamed ParallelArray to ParallelCollection
2011-07-14 14:47:01 -04:00
Matei Zaharia
9ac461d85d
Remove RDD.toString because it looked confusing
2011-07-14 14:39:32 -04:00
Matei Zaharia
797b4547c3
Fix tracking of updates in accumulators to solve an issue that would manifest in the 2.9 interpreter
2011-07-14 14:08:34 -04:00
Matei Zaharia
3efd9e94d8
Merge branch 'master' into scala-2.9
2011-07-14 12:42:57 -04:00
Matei Zaharia
0ccfe20755
Forgot to add a file
2011-07-14 12:42:50 -04:00
Matei Zaharia
38f38dda5b
Merge branch 'master' into scala-2.9
2011-07-14 12:42:02 -04:00
Matei Zaharia
969644df8e
Cleaned up a few issues to do with default parallelism levels. Also
...
renamed HadoopFileWriter to HadoopWriter (since it's not only for files)
and fixed a bug for lookup().
2011-07-14 12:40:56 -04:00
Matei Zaharia
2fb906e8e5
Merge branch 'master' into scala-2.9
2011-07-14 00:20:14 -04:00
Matei Zaharia
2604939f64
Simplified and documented code a little and added test
2011-07-14 00:19:00 -04:00
Matei Zaharia
2439e51a03
Merge branch 'master' into implicit-sequencefile
2011-07-13 23:20:22 -04:00
Matei Zaharia
d0c7958364
Merge branch 'master' into scala-2.9
...
Conflicts:
core/src/main/scala/spark/HadoopFileWriter.scala
2011-07-13 23:09:33 -04:00
Matei Zaharia
9c0069188b
Updated save code to allow non-file-based OutputFormats and added a test
...
for file-related stuff
2011-07-13 23:04:06 -04:00
Matei Zaharia
da8a3b8926
Increase default value of spark.locality.wait a little
2011-07-13 20:07:24 -04:00
Matei Zaharia
080869c6ef
Merge branch 'master' into scala-2.9
2011-07-13 00:20:08 -04:00
Matei Zaharia
842e14d567
Added mapPartitions operation and a bunch of tests for RDD ops
2011-07-13 00:19:52 -04:00
Matei Zaharia
9b568d37f7
Merge branch 'master' into scala-2.9
...
Conflicts:
core/src/main/scala/spark/RDD.scala
2011-07-11 22:25:53 -04:00
Matei Zaharia
d05fea24f3
Simplified parallel shuffle fetcher to use URLConnection
2011-07-11 22:12:36 -04:00
Matei Zaharia
25c3a7781c
Moved PairRDD and SequenceFileRDD functions to separate source files
2011-07-10 00:06:15 -04:00
Matei Zaharia
b7f1f62ff5
bug fix
2011-07-09 18:53:02 -04:00
Matei Zaharia
003480f374
Register byte[] with Kryo serializer
2011-07-09 18:08:07 -04:00
Matei Zaharia
aea5cb4413
Added parallel shuffle fetcher
2011-07-09 17:25:56 -04:00
Matei Zaharia
4b1646a25f
Support for non-filesystem-based Hadoop data sources
2011-07-06 20:37:55 -04:00
Matei Zaharia
07a97d47c2
Support for non-filesystem-based Hadoop data sources
2011-07-06 20:37:34 -04:00
Matei Zaharia
3488c386a9
Initial work to make stuff like sequenceFile[Int, Int] work without
...
requiring the user to provide a Writable type. The approach here might
not be the best but it seems to work correctly.
2011-06-28 17:07:04 -07:00
Matei Zaharia
5633299ec6
Merge remote-tracking branch 'origin/master' into scala-2.9
2011-06-27 22:50:59 -07:00
Matei Zaharia
b0ecf1ee41
Don't pass a null context when running tasks locally
2011-06-27 22:50:43 -07:00
Matei Zaharia
85cad5d9dd
Fixed HadoopFileWriter to compile for Scala 2.9
2011-06-27 22:44:14 -07:00
Matei Zaharia
393607d5ef
Merge branch 'master' into scala-2.9
2011-06-27 18:08:25 -07:00
Matei Zaharia
2f652f1656
Fix a compile error
2011-06-27 18:07:16 -07:00
Tathagata Das
3f08e1129f
Merge branch 'master' into td-rdd-save
...
Conflicts:
core/src/main/scala/spark/SparkContext.scala
2011-06-27 13:43:44 -07:00
Tathagata Das
ad842ac823
Merge branch 'master' into td-rdd-save
...
Conflicts:
core/src/main/scala/spark/RDD.scala
2011-06-27 13:39:11 -07:00
Matei Zaharia
bae8a97968
Merge branch 'master' into scala-2.9
...
Conflicts:
repl/src/main/scala/spark/repl/SparkInterpreterLoop.scala
2011-06-26 19:22:27 -07:00
Matei Zaharia
c4dd68ae21
Merge branch 'mos-bt'
...
This merge keeps only the broadcast work in mos-bt because the structure
of shuffle has changed with the new RDD design. We still need some kind
of parallel shuffle but that will be added later.
Conflicts:
core/src/main/scala/spark/BitTorrentBroadcast.scala
core/src/main/scala/spark/ChainedBroadcast.scala
core/src/main/scala/spark/RDD.scala
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/Utils.scala
core/src/main/scala/spark/shuffle/BasicLocalFileShuffle.scala
core/src/main/scala/spark/shuffle/DfsShuffle.scala
2011-06-26 18:22:12 -07:00
Tathagata Das
38f2ba99cc
Further changes to HadoopFileWriter. Implemented ability to save RDDs as SequenceFiles and ObjectFiles.
...
1> HadoopFileWriter changed to take class types as constructor parameters (no more generic type)
2> Multiple types of RDD.saveAsHadoopFile() implemented to provide more saving options
3> RDD.saveAsSequenceFile() automatically converts basic types to Writable types before saving as SequenceFile
4> RDD.saveAsObjectFile() serializes objects and saves them to an ObjectFile
5> SparkContext.objectFile() opens the saved ObjectFiles
2011-06-24 19:51:21 -07:00
Olivier Grisel
2e3531d8bf
Implemented RDD.leftOuterJoin and RDD.rightOuterJoin
2011-06-24 11:00:51 +02:00
Tathagata Das
3d2befe831
Improved HadoopFileWriter (saves key and value classes to jobconf)
2011-06-23 08:11:22 -07:00
Olivier Grisel
005d1605a4
add missing test for RDD.groupWith
2011-06-23 02:10:52 +02:00
Matei Zaharia
214250016a
Added simple version of lookup
2011-06-20 11:59:16 -07:00
Matei Zaharia
23b42af70a
Merge branch 'master' into scala-2.9
2011-06-19 23:06:21 -07:00
Matei Zaharia
23b1c309fb
Added pipe() operation on RDDs for mapping through a shell command.
2011-06-19 23:05:19 -07:00
Tathagata Das
b5e6645505
Cleaner reimplementation of HadoopFileWriter. Introduced TaskContext.
...
1> HadoopFileWriter works correctly with task failures
2> It can also take a user-specified JobConf object for configuration settings
3> A Task can now get information like stage ID, split ID, and attempt ID using TaskContext class
4> Minor changes in SparkContext, DAGScheduler and subclasses to allow specification of TaskContext as a parameter
2011-06-16 20:57:57 -07:00
Tathagata Das
869836a2fa
Implemented TaskContext to hold contextual information (jobID, taskID, attemptID) of a task
2011-06-10 19:47:28 -07:00
Tathagata Das
389e56156f
HadoopFileWriter changed to use Hadoop's OutputCommitter
2011-06-09 15:29:22 -07:00
Tathagata Das
24d845833c
First-cut implementation of RDD.SaveAsText
2011-06-05 04:14:43 -07:00
Matei Zaharia
3297706ab2
Merge remote-tracking branch 'origin/master' into scala-2.9
2011-06-01 11:46:31 -07:00
Matei Zaharia
9bb448a151
Catch Throwable instead of Exception in LocalScheduler and Executor. Fixes #57.
2011-06-01 11:45:47 -07:00
Matei Zaharia
850fe3274e
Make the runJob API public. Fixes #56.
2011-06-01 11:38:44 -07:00
Ismael Juma
82f10bd794
Remove unnecessary toStream calls.
2011-06-01 16:12:42 +01:00
Matei Zaharia
10fe324845
Merge remote-tracking branch 'origin/master' into scala-2.9
2011-05-31 23:48:11 -07:00
Matei Zaharia
5166d76843
Ensure logging is initialized before spawning any threads to fix issue #45
2011-05-31 23:47:32 -07:00
Matei Zaharia
0afd35a8dd
Some docs in ClosureCleaner
2011-05-31 22:06:30 -07:00
Matei Zaharia
8b0390d344
Instantiate NullWritable properly in HadoopFile
2011-05-30 23:54:14 -07:00
Matei Zaharia
4096c2287e
Various fixes
2011-05-29 18:46:01 -07:00
Matei Zaharia
ef706ae959
Merge branch 'master' into new-rdds-protobuf
...
Conflicts:
run
2011-05-29 16:20:23 -07:00
Matei Zaharia
c501cff924
Executor was looking for the wrong constructor for ExecutorClassLoader
2011-05-29 16:15:59 -07:00
Ismael Juma
1396678baa
Move REPL classes to separate module.
2011-05-27 11:22:50 +01:00
Ismael Juma
051da8b4ad
Delete liblzf from lib as it's no longer used.
2011-05-27 11:22:10 +01:00
Ismael Juma
ae1a1f91f1
Remove several dependencies from git and configure them as SBT managed dependencies.
...
Upgrade some of the dependencies while at it.
2011-05-27 11:22:01 +01:00
Ismael Juma
164ef4c751
Use explicit asInstanceOf instead of misleading unchecked pattern matching.
...
Also enable -unchecked warnings in SBT build file.
2011-05-27 07:57:10 +01:00
Ismael Juma
89c8ea2bb2
Replace deprecated `-` and `--` with suggested filterNot (which is uglier).
2011-05-26 22:22:37 +01:00
Ismael Juma
94f05683bd
Replace deprecated `first` with `head`.
2011-05-26 22:13:41 +01:00
Ismael Juma
0b6a862b68
Use math instead of Math as the latter is deprecated.
2011-05-26 22:06:36 +01:00
Ismael Juma
1f27d94c48
Use Array.iterator instead of Iterator.fromArray as the latter is deprecated.
2011-05-26 22:04:42 +01:00
Ismael Juma
1993a8e556
Use += instead of + for mutable sequences as the latter is deprecated.
2011-05-26 21:59:48 +01:00
root
5ef938615f
Initial work on making stuff compile with protobuf Mesos
2011-05-24 22:27:08 +00:00
Matei Zaharia
cec427e777
Fixed a bug with preferred locations having changed meaning in new RDDs
2011-05-22 17:12:29 -07:00
Matei Zaharia
4c888b2933
Fix queue type for executor
2011-05-22 16:42:05 -07:00
Matei Zaharia
bea3a33012
doc tweak
2011-05-22 16:03:41 -07:00
Matei Zaharia
9bde5a54cb
class loader fix
2011-05-22 16:00:41 -07:00
Matei Zaharia
91c07a33d9
Various fixes to serialization
2011-05-21 22:50:08 -07:00
Matei Zaharia
f61b61c4ac
Merge branch 'master' into new-rdds
2011-05-21 21:25:58 -07:00
Matei Zaharia
24a1e7f838
Scheduler can now recover from lost map outputs
2011-05-20 00:19:53 -07:00
Matei Zaharia
82329b0b28
Updated scheduler to support running on just some partitions of final RDD
2011-05-19 12:47:09 -07:00
Matei Zaharia
328e51b693
Various minor fixes
2011-05-19 11:19:25 -07:00
Matei Zaharia
fd1d255821
Stop objectifying various trackers, caches, etc.
2011-05-17 12:41:13 -07:00
Matei Zaharia
4db50e26c7
Fixed unit tests by making them clean up the SparkContext after use and
...
thus clean up the various singletons (RDDCache, MapOutputTracker, etc).
This isn't perfect yet (ideally we shouldn't use singleton objects at
all) but we can fix that later.
2011-05-13 12:03:58 -07:00
Matei Zaharia
aca8150c52
Ensure that AddedToCache messages make it home before tasks finish
2011-05-13 11:43:52 -07:00
Matei Zaharia
16c886a581
Optimization for count()
2011-05-13 10:41:34 -07:00
Mosharaf Chowdhury
db7a2c4897
Issue #42 fixed.
2011-04-28 14:30:48 -07:00
Ankur Dave
a4c04f3f6f
Error handling for disk I/O in DiskSpillingCache
...
Also renamed the property spark.DiskSpillingCache.cacheDir to spark.diskSpillingCache.cacheDir in order to follow conventions.
2011-04-27 23:23:29 -07:00
Ankur Dave
12ff0d2dc3
Bring an entry back into memory after fetching it from disk
2011-04-27 22:59:05 -07:00
Ankur Dave
e30313aa2c
Added DiskSpillingCache
...
DiskSpillingCache is a BoundedMemoryCache that spills entries to disk
when it runs out of space. Currently the implementation is very
simple. In particular, it's missing the following features:
- Error handling for disk I/O, including checking of disk space levels
- Bringing an entry back into memory after fetching it from disk
In addition, here are some features that aren't critical but should be
implemented soon:
- Spilling based on a user-set priority in addition to LRU
- Caching into a subdirectory of spark.DiskSpillingCache.cacheDir
rather than the root directory
2011-04-27 22:32:35 -07:00
Mosharaf Chowdhury
60d1121343
Refactoring: daemonThreadFactories have all been moved to the Utils
...
object instead of having multiple copies in Broadcast and Shuffle
objects.
2011-04-27 22:13:01 -07:00
Mosharaf Chowdhury
e898e108a3
Cleanup + refactoring...
2011-04-27 22:00:24 -07:00
Mosharaf Chowdhury
0567646180
Shuffle is also working from its own subpackage.
2011-04-27 21:11:41 -07:00
Mosharaf Chowdhury
2742de707a
Removed some shuffle implementations. Remaining ones all use local files
...
to write map outputs.
2011-04-27 20:53:43 -07:00
Mosharaf Chowdhury
9d78779257
Merge branch 'mos-shuffle-tracked' into mos-bt
...
Conflicts:
core/src/main/scala/spark/Broadcast.scala
2011-04-27 20:47:07 -07:00
Mosharaf Chowdhury
ac7e066383
Merge branch 'master' into mos-shuffle-tracked
...
Conflicts:
.gitignore
core/src/main/scala/spark/LocalFileShuffle.scala
src/scala/spark/BasicLocalFileShuffle.scala
src/scala/spark/Broadcast.scala
src/scala/spark/LocalFileShuffle.scala
2011-04-27 14:35:03 -07:00
Mosharaf Chowdhury
4e4c41026c
Added support for custom classes. (from 49ea48)
2011-04-27 12:30:16 -07:00
Mosharaf Chowdhury
65848da8df
Refactoring...
2011-04-26 17:41:31 -07:00
Mosharaf Chowdhury
b8ab7862b8
Moved broadcast-related code to separate directory under spark.broadcast
...
package.
2011-04-26 17:22:52 -07:00
Mosharaf Chowdhury
e31007248c
Merge branch 'master' into mos-bt
2011-04-26 12:04:14 -07:00
Mosharaf Chowdhury
9257a55e3a
Refactoring...
2011-04-26 11:45:36 -07:00
Mosharaf Chowdhury
9d2d533493
Temporary fix for issue #42.
2011-04-21 17:40:26 -07:00
Timothy Hunter
5c9535228a
fixed small bug when classpath has some strange formatting
2011-04-18 17:12:29 -07:00
Mosharaf Chowdhury
a8f47a62b9
Renamed MaxRxPeers and MaxTxPeers to MaxRxSlots and MaxTxSlots
...
respectively for clarity (they were most probably being misunderstood and
misused)
2011-04-13 16:24:19 -07:00
Matei Zaharia
94ba95bcb2
Added flatMapValues
2011-04-12 19:51:58 -07:00
Mosharaf Chowdhury
b67a968b5d
hasBlocks is now AtomicInteger (even though it was ok)
2011-04-02 22:03:18 -07:00
Mosharaf Chowdhury
5bf3c83b13
BroadcastSuperTracker (right now for BT) is contacted over TCP instead
...
of direct procedure call.
Need to do the same for others and consolidate all broadcast mechanisms.
2011-04-01 19:31:28 -07:00
Mosharaf Chowdhury
733a130108
Formatting...
2011-04-01 14:51:24 -07:00
Mosharaf Chowdhury
4636aea598
Formatting...
2011-04-01 14:49:59 -07:00
Mosharaf Chowdhury
addd569e52
Each broadcast variable can have a different blockSize. The corresponding
...
logic to adapt blockSize based on network conditions is not yet
implemented.
Formatting + consolidation.
2011-03-31 14:51:46 -07:00
Mosharaf Chowdhury
815f3411ec
Consolidated Broadcast config params.
2011-03-30 16:45:51 -07:00
Mosharaf Chowdhury
a18a28b08e
Removed gossip-related code that was already commented out.
...
More formatting.
2011-03-30 14:22:09 -07:00
Mosharaf Chowdhury
43aceafd70
Formatting...
2011-03-30 12:18:50 -07:00
Mosharaf Chowdhury
73b165220d
Random is the default choice; rarestFirst didn't work well in
...
experiments.
2011-03-29 13:06:43 -07:00
Matei Zaharia
d840fa8d0c
Merge remote branch 'origin/custom-serialization' into new-rdds
2011-03-09 00:40:07 -08:00
root
ff5b13799a
Some tweaks to make Kryo cache work better
2011-03-09 03:31:50 -05:00
Matei Zaharia
7febdfbe29
Better reuse of buffers in Kryo serialization
2011-03-08 12:36:36 -08:00
Matei Zaharia
8ee3ec29ee
Merge remote branch 'origin/custom-serialization' into new-rdds
2011-03-08 11:58:19 -08:00
Matei Zaharia
7408230bfa
Updated modified Kryo to use objenesis
2011-03-08 11:58:08 -08:00
Matei Zaharia
ab1216cb14
Register None and Nil properly
2011-03-08 11:52:58 -08:00
Matei Zaharia
d39f5dd15e
Merge remote branch 'origin/custom-serialization' into new-rdds
2011-03-08 10:28:50 -08:00
Matei Zaharia
4f0d0a7b73
stuff
2011-03-08 10:28:26 -08:00
Matei Zaharia
8b6f3db415
Merge remote branch 'origin/custom-serialization' into new-rdds
2011-03-07 19:20:28 -08:00
Matei Zaharia
38f6bce33d
Added SerializingCache
2011-03-07 19:16:24 -08:00
Matei Zaharia
6316c7979d
Remove some logging
2011-03-07 18:56:36 -08:00
Matei Zaharia
e7b4b047a6
Added pluggable serializers and Kryo serialization
2011-03-07 18:41:53 -08:00
Matei Zaharia
467f056e29
Remove commented code
2011-03-06 23:38:41 -08:00
Matei Zaharia
bce95b8458
Finished cogroup stuff
2011-03-06 23:38:16 -08:00
Matei Zaharia
04c2d6a60c
stuff
2011-03-06 19:27:03 -08:00
Matei Zaharia
0fb691dd28
Various fixes to get MesosScheduler working with new RDDs
2011-03-06 16:16:38 -08:00
Matei Zaharia
1df5a65a01
Pass cache locations correctly to DAGScheduler.
2011-03-06 12:16:38 -08:00
Matei Zaharia
e1436f1eaa
Merge remote branch 'origin/master' into new-rdds
2011-03-06 11:11:47 -08:00
Matei Zaharia
370b95816f
Added sampling for large arrays in SizeEstimator
2011-03-06 11:11:20 -08:00
Matei Zaharia
a789e9aaea
Merge remote branch 'origin/master' into new-rdds
2011-03-01 10:33:37 -08:00
Matei Zaharia
021c50a8d4
Remove unnecessary lock which was there to work around a bug in
...
Configuration in Hadoop 0.20.0
2011-03-01 10:28:38 -08:00
Matei Zaharia
adaba4d550
Removed old slf4j jars that came with Hadoop
2011-03-01 10:28:21 -08:00
Matei Zaharia
447debb771
Updated Hadoop to 0.20.2 to include some bug fixes
2011-03-01 10:27:48 -08:00
Matei Zaharia
9e59afd710
More work on new RDD design
2011-02-27 19:15:52 -08:00
Matei Zaharia
f38f86d59e
More stuff
2011-02-27 14:27:12 -08:00
Matei Zaharia
2e6023f2bf
stuff
2011-02-26 23:41:44 -08:00
Matei Zaharia
309367c477
Initial work towards new RDD design
2011-02-26 23:15:33 -08:00
Mosharaf Chowdhury
0416cc22d2
Picking peers weighted by the number of rare blocks they have. A block is rare if there are at most 2 copies in the neighborhood. A better threshold could be used (some function of neighborhood size).
2011-02-15 16:27:44 -08:00
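Weighting peers by their rare-block count can be sketched as below. This is a hypothetical illustration rather than the committed code: `PeerPicker`, the string peer names, and the map-based bookkeeping are assumptions; only the "at most 2 copies" threshold comes from the commit message.

```scala
import scala.util.Random

object PeerPicker {
  // blocksByPeer maps each peer to the block indices it holds. The result
  // maps each peer to how many of its blocks are rare, that is, held by at
  // most `maxCopies` peers in the neighborhood.
  def rareBlockCounts(
      blocksByPeer: Map[String, Set[Int]], maxCopies: Int = 2): Map[String, Int] = {
    val copies: Map[Int, Int] = blocksByPeer.values.toSeq
      .flatMap(_.toSeq)
      .groupBy(identity)
      .map { case (block, holders) => block -> holders.size }
    blocksByPeer.map { case (peer, blocks) =>
      peer -> blocks.count(b => copies(b) <= maxCopies)
    }
  }

  // Picks a peer with probability proportional to its weight. Returns None
  // when no peer has a positive weight.
  def pick(weights: Map[String, Int], rng: Random): Option[String] = {
    val total = weights.values.sum
    if (total <= 0) None
    else {
      var ticket = rng.nextInt(total)
      weights.find { case (_, w) => ticket -= w; ticket < 0 }.map(_._1)
    }
  }
}
```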
Mosharaf Chowdhury
cf81da9485
Optimization: Master sends out at least one copy of each block first, regardless of what a client asks for. Once one copy of each block is out, Master then responds to requests for specific blocks from individual receivers.
2011-02-14 15:08:33 -08:00
Mosharaf Chowdhury
2b946fb2d1
pickBlockRarestFirst and gossips commented OUT for now.
...
The problem with the rarestFirst implementation is that we pick peers randomly first and then pick blocks from the random peer using rarestFirst. That is not the right way to do it; it should be the other way around.
The problem with gossip is that peers might end up overwriting newer information with older information. To fix that, we either need timestamps or must compare the bitVectors before overwriting.
2011-02-13 13:53:15 -08:00
Mosharaf Chowdhury
ca2895ebb0
Fix in rarestFirst implementation.
...
If there is more than one rarest block, pick randomly among them (this was deterministic before).
2011-02-10 20:37:44 -08:00
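Random tie-breaking among equally rare blocks can be sketched as follows. `RarestFirst` and its parameters are hypothetical and do not reproduce the broadcast code's actual data structures.

```scala
import scala.util.Random

object RarestFirst {
  // copies maps each block index to the number of known copies; needed is the
  // set of blocks still missing locally. Among the needed blocks with the
  // fewest copies, one is chosen uniformly at random.
  def pickBlock(copies: Map[Int, Int], needed: Set[Int], rng: Random): Option[Int] = {
    val candidates = needed.toSeq.filter(copies.contains)
    if (candidates.isEmpty) None
    else {
      val fewest = candidates.map(copies).min
      val rarest = candidates.filter(b => copies(b) == fewest)
      Some(rarest(rng.nextInt(rarest.size)))
    }
  }
}
```

A deterministic choice would make every peer request the same rarest block at the same time; picking randomly spreads the requests across the tied blocks.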
Mosharaf Chowdhury
520bbdc7e3
Peers now gossip about their neighbors when they talk.
2011-02-10 20:15:30 -08:00
Matei Zaharia
dc24aecd8f
Close record readers in HadoopFile after finishing a split
2011-02-10 12:07:48 -08:00
Mosharaf Chowdhury
441462bc7f
Fixed some warnings during compilation.
2011-02-09 12:11:43 -08:00
Mosharaf Chowdhury
1a73c0d265
Merged with master. Using sbt.
2011-02-09 10:48:48 -08:00
Mosharaf Chowdhury
495b38658e
Merge branch 'master' into mos-bt
2011-02-09 10:40:23 -08:00
Matei Zaharia
99f3f23efa
Changed default shuffle to LocalFileShuffle because it's way faster for small files
2011-02-08 17:03:03 -08:00
Matei Zaharia
ec28b607fd
Merge branch 'master' into sbt
...
Conflicts:
Makefile
core/src/main/java/spark/compress/lzf/LZF.java
core/src/main/java/spark/compress/lzf/LZFInputStream.java
core/src/main/java/spark/compress/lzf/LZFOutputStream.java
core/src/main/native/spark_compress_lzf_LZF.c
run
2011-02-02 00:25:54 -08:00
Matei Zaharia
e5c4cd8a5e
Made examples and core subprojects
2011-02-01 15:11:08 -08:00