Matei Zaharia
797b4547c3
Fix tracking of updates in accumulators to solve an issue that would manifest in the 2.9 interpreter
2011-07-14 14:08:34 -04:00
Matei Zaharia
3efd9e94d8
Merge branch 'master' into scala-2.9
2011-07-14 12:42:57 -04:00
Matei Zaharia
0ccfe20755
Forgot to add a file
2011-07-14 12:42:50 -04:00
Matei Zaharia
38f38dda5b
Merge branch 'master' into scala-2.9
2011-07-14 12:42:02 -04:00
Matei Zaharia
969644df8e
Cleaned up a few issues to do with default parallelism levels. Also
renamed HadoopFileWriter to HadoopWriter (since it's not only for files)
and fixed a bug for lookup().
2011-07-14 12:40:56 -04:00
Matei Zaharia
2fb906e8e5
Merge branch 'master' into scala-2.9
2011-07-14 00:20:14 -04:00
Matei Zaharia
2604939f64
Simplified and documented code a little and added test
2011-07-14 00:19:00 -04:00
Matei Zaharia
2439e51a03
Merge branch 'master' into implicit-sequencefile
2011-07-13 23:20:22 -04:00
Matei Zaharia
d0c7958364
Merge branch 'master' into scala-2.9
Conflicts:
core/src/main/scala/spark/HadoopFileWriter.scala
2011-07-13 23:09:33 -04:00
Matei Zaharia
9c0069188b
Updated save code to allow non-file-based OutputFormats and added a test
for file-related stuff
2011-07-13 23:04:06 -04:00
Matei Zaharia
da8a3b8926
Increase default value of spark.locality.wait a little
2011-07-13 20:07:24 -04:00
Matei Zaharia
080869c6ef
Merge branch 'master' into scala-2.9
2011-07-13 00:20:08 -04:00
Matei Zaharia
842e14d567
Added mapPartitions operation and a bunch of tests for RDD ops
2011-07-13 00:19:52 -04:00
Matei Zaharia
9b568d37f7
Merge branch 'master' into scala-2.9
Conflicts:
core/src/main/scala/spark/RDD.scala
2011-07-11 22:25:53 -04:00
Matei Zaharia
d05fea24f3
Simplified parallel shuffle fetcher to use URLConnection
2011-07-11 22:12:36 -04:00
Matei Zaharia
25c3a7781c
Moved PairRDD and SequenceFileRDD functions to separate source files
2011-07-10 00:06:15 -04:00
Matei Zaharia
b7f1f62ff5
bug fix
2011-07-09 18:53:02 -04:00
Matei Zaharia
003480f374
Register byte[] with Kryo serializer
2011-07-09 18:08:07 -04:00
Matei Zaharia
aea5cb4413
Added parallel shuffle fetcher
2011-07-09 17:25:56 -04:00
Matei Zaharia
4b1646a25f
Support for non-filesystem-based Hadoop data sources
2011-07-06 20:37:55 -04:00
Matei Zaharia
07a97d47c2
Support for non-filesystem-based Hadoop data sources
2011-07-06 20:37:34 -04:00
Matei Zaharia
3488c386a9
Initial work to make stuff like sequenceFile[Int, Int] work without
requiring the user to provide a Writable type. The approach here might
not be the best but it seems to work correctly.
2011-06-28 17:07:04 -07:00
Matei Zaharia
5633299ec6
Merge remote-tracking branch 'origin/master' into scala-2.9
2011-06-27 22:50:59 -07:00
Matei Zaharia
b0ecf1ee41
Don't pass a null context when running tasks locally
2011-06-27 22:50:43 -07:00
Matei Zaharia
85cad5d9dd
Fixed HadoopFileWriter to compile for Scala 2.9
2011-06-27 22:44:14 -07:00
Matei Zaharia
393607d5ef
Merge branch 'master' into scala-2.9
2011-06-27 18:08:25 -07:00
Matei Zaharia
2f652f1656
Fix a compile error
2011-06-27 18:07:16 -07:00
Tathagata Das
3f08e1129f
Merge branch 'master' into td-rdd-save
Conflicts:
core/src/main/scala/spark/SparkContext.scala
2011-06-27 13:43:44 -07:00
Tathagata Das
ad842ac823
Merge branch 'master' into td-rdd-save
Conflicts:
core/src/main/scala/spark/RDD.scala
2011-06-27 13:39:11 -07:00
Matei Zaharia
bae8a97968
Merge branch 'master' into scala-2.9
Conflicts:
repl/src/main/scala/spark/repl/SparkInterpreterLoop.scala
2011-06-26 19:22:27 -07:00
Matei Zaharia
c4dd68ae21
Merge branch 'mos-bt'
This merge keeps only the broadcast work in mos-bt because the structure
of shuffle has changed with the new RDD design. We still need some kind
of parallel shuffle but that will be added later.
Conflicts:
core/src/main/scala/spark/BitTorrentBroadcast.scala
core/src/main/scala/spark/ChainedBroadcast.scala
core/src/main/scala/spark/RDD.scala
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/Utils.scala
core/src/main/scala/spark/shuffle/BasicLocalFileShuffle.scala
core/src/main/scala/spark/shuffle/DfsShuffle.scala
2011-06-26 18:22:12 -07:00
Tathagata Das
38f2ba99cc
Further changes to HadoopFileWriter. Implemented ability to save RDDs as SequenceFiles and ObjectFiles.
1> HadoopFileWriter changed to take class types as constructor parameters (no more generic type)
2> Multiple types of RDD.saveAsHadoopFile() implemented to provide more saving options
3> RDD.saveAsSequenceFile() automatically converts basic types to Writable types before saving as SequenceFile
4> RDD.saveAsObjectFile() serializes objects and saves them to an ObjectFile
5> SparkContext.objectFile() opens the saved ObjectFiles
2011-06-24 19:51:21 -07:00
Olivier Grisel
2e3531d8bf
Implemented RDD.leftOuterJoin and RDD.rightOuterJoin
2011-06-24 11:00:51 +02:00
Tathagata Das
3d2befe831
Improved HadoopFileWriter (saves key and value classes to jobconf)
2011-06-23 08:11:22 -07:00
Olivier Grisel
005d1605a4
add missing test for RDD.groupWith
2011-06-23 02:10:52 +02:00
Matei Zaharia
214250016a
Added simple version of lookup
2011-06-20 11:59:16 -07:00
Matei Zaharia
23b42af70a
Merge branch 'master' into scala-2.9
2011-06-19 23:06:21 -07:00
Matei Zaharia
23b1c309fb
Added pipe() operation on RDDs for mapping through a shell command.
2011-06-19 23:05:19 -07:00
Tathagata Das
b5e6645505
Cleaner reimplementation of HadoopFileWriter. Introduced TaskContext.
1> HadoopFileWriter works correctly with task failures
2> It can also take a user-specified JobConf object for configuration settings
3> A Task can now get information like stage ID, split ID, and attempt ID using the TaskContext class
4> Minor changes in SparkContext, DAGScheduler and subclasses to allow specification of TaskContext as a parameter
2011-06-16 20:57:57 -07:00
Tathagata Das
869836a2fa
Implemented TaskContext to hold contextual information (jobID, taskID, attemptID) of a task
2011-06-10 19:47:28 -07:00
Tathagata Das
389e56156f
HadoopFileWriter changed to use Hadoop's OutputCommitter
2011-06-09 15:29:22 -07:00
Tathagata Das
24d845833c
First-cut implementation of RDD.SaveAsText
2011-06-05 04:14:43 -07:00
Matei Zaharia
3297706ab2
Merge remote-tracking branch 'origin/master' into scala-2.9
2011-06-01 11:46:31 -07:00
Matei Zaharia
9bb448a151
Catch Throwable instead of Exception in LocalScheduler and Executor. Fixes #57.
2011-06-01 11:45:47 -07:00
Matei Zaharia
850fe3274e
Make the runJob API public. Fixes #56.
2011-06-01 11:38:44 -07:00
Ismael Juma
82f10bd794
Remove unnecessary toStream calls.
2011-06-01 16:12:42 +01:00
Matei Zaharia
10fe324845
Merge remote-tracking branch 'origin/master' into scala-2.9
2011-05-31 23:48:11 -07:00
Matei Zaharia
5166d76843
Ensure logging is initialized before spawning any threads to fix issue #45
2011-05-31 23:47:32 -07:00
Matei Zaharia
0afd35a8dd
Some docs in ClosureCleaner
2011-05-31 22:06:30 -07:00
Matei Zaharia
8b0390d344
Instantiate NullWritable properly in HadoopFile
2011-05-30 23:54:14 -07:00
Matei Zaharia
4096c2287e
Various fixes
2011-05-29 18:46:01 -07:00
Matei Zaharia
ef706ae959
Merge branch 'master' into new-rdds-protobuf
Conflicts:
run
2011-05-29 16:20:23 -07:00
Matei Zaharia
c501cff924
Executor was looking for the wrong constructor for ExecutorClassLoader
2011-05-29 16:15:59 -07:00
Ismael Juma
1396678baa
Move REPL classes to separate module.
2011-05-27 11:22:50 +01:00
Ismael Juma
051da8b4ad
Delete liblzf from lib as it's no longer used.
2011-05-27 11:22:10 +01:00
Ismael Juma
ae1a1f91f1
Remove several dependencies from git and configure them as SBT managed dependencies.
Upgrade some of the dependencies while at it.
2011-05-27 11:22:01 +01:00
Ismael Juma
164ef4c751
Use explicit asInstanceOf instead of misleading unchecked pattern matching.
Also enable -unchecked warnings in SBT build file.
2011-05-27 07:57:10 +01:00
Ismael Juma
89c8ea2bb2
Replace deprecated `-` and `--` with suggested filterNot (which is uglier).
2011-05-26 22:22:37 +01:00
Ismael Juma
94f05683bd
Replace deprecated `first` with `head`.
2011-05-26 22:13:41 +01:00
Ismael Juma
0b6a862b68
Use math instead of Math as the latter is deprecated.
2011-05-26 22:06:36 +01:00
Ismael Juma
1f27d94c48
Use Array.iterator instead of Iterator.fromArray as the latter is deprecated.
2011-05-26 22:04:42 +01:00
Ismael Juma
1993a8e556
Use += instead of + for mutable sequences as the latter is deprecated.
2011-05-26 21:59:48 +01:00
root
5ef938615f
Initial work on making stuff compile with protobuf Mesos
2011-05-24 22:27:08 +00:00
Matei Zaharia
cec427e777
Fixed a bug with preferred locations having changed meaning in new RDDs
2011-05-22 17:12:29 -07:00
Matei Zaharia
4c888b2933
Fix queue type for executor
2011-05-22 16:42:05 -07:00
Matei Zaharia
bea3a33012
doc tweak
2011-05-22 16:03:41 -07:00
Matei Zaharia
9bde5a54cb
class loader fix
2011-05-22 16:00:41 -07:00
Matei Zaharia
91c07a33d9
Various fixes to serialization
2011-05-21 22:50:08 -07:00
Matei Zaharia
f61b61c4ac
Merge branch 'master' into new-rdds
2011-05-21 21:25:58 -07:00
Matei Zaharia
24a1e7f838
Scheduler can now recover from lost map outputs
2011-05-20 00:19:53 -07:00
Matei Zaharia
82329b0b28
Updated scheduler to support running on just some partitions of final RDD
2011-05-19 12:47:09 -07:00
Matei Zaharia
328e51b693
Various minor fixes
2011-05-19 11:19:25 -07:00
Matei Zaharia
fd1d255821
Stop objectifying various trackers, caches, etc.
2011-05-17 12:41:13 -07:00
Matei Zaharia
4db50e26c7
Fixed unit tests by making them clean up the SparkContext after use and
thus clean up the various singletons (RDDCache, MapOutputTracker, etc).
This isn't perfect yet (ideally we shouldn't use singleton objects at
all) but we can fix that later.
2011-05-13 12:03:58 -07:00
Matei Zaharia
aca8150c52
Ensure that AddedToCache messages make it home before tasks finish
2011-05-13 11:43:52 -07:00
Matei Zaharia
16c886a581
Optimization for count()
2011-05-13 10:41:34 -07:00
Mosharaf Chowdhury
db7a2c4897
Issue #42 fixed.
2011-04-28 14:30:48 -07:00
Ankur Dave
a4c04f3f6f
Error handling for disk I/O in DiskSpillingCache
Also renamed the property spark.DiskSpillingCache.cacheDir to spark.diskSpillingCache.cacheDir in order to follow conventions.
2011-04-27 23:23:29 -07:00
Ankur Dave
12ff0d2dc3
Bring an entry back into memory after fetching it from disk
2011-04-27 22:59:05 -07:00
Ankur Dave
e30313aa2c
Added DiskSpillingCache
DiskSpillingCache is a BoundedMemoryCache that spills entries to disk
when it runs out of space. Currently the implementation is very
simple. In particular, it's missing the following features:
- Error handling for disk I/O, including checking of disk space levels
- Bringing an entry back into memory after fetching it from disk
In addition, here are some features that aren't critical but should be
implemented soon:
- Spilling based on a user-set priority in addition to LRU
- Caching into a subdirectory of spark.DiskSpillingCache.cacheDir
rather than the root directory
2011-04-27 22:32:35 -07:00
Mosharaf Chowdhury
60d1121343
Refactoring: daemonThreadFactories have all been moved to the Utils
object instead of having multiple copies in Broadcast and Shuffle
objects.
2011-04-27 22:13:01 -07:00
Mosharaf Chowdhury
e898e108a3
Cleanup + refactoring...
2011-04-27 22:00:24 -07:00
Mosharaf Chowdhury
0567646180
Shuffle is also working from its own subpackage.
2011-04-27 21:11:41 -07:00
Mosharaf Chowdhury
2742de707a
Removed some shuffle implementations. Remaining ones all use local files
to write map outputs.
2011-04-27 20:53:43 -07:00
Mosharaf Chowdhury
9d78779257
Merge branch 'mos-shuffle-tracked' into mos-bt
Conflicts:
core/src/main/scala/spark/Broadcast.scala
2011-04-27 20:47:07 -07:00
Mosharaf Chowdhury
ac7e066383
Merge branch 'master' into mos-shuffle-tracked
Conflicts:
.gitignore
core/src/main/scala/spark/LocalFileShuffle.scala
src/scala/spark/BasicLocalFileShuffle.scala
src/scala/spark/Broadcast.scala
src/scala/spark/LocalFileShuffle.scala
2011-04-27 14:35:03 -07:00
Mosharaf Chowdhury
4e4c41026c
Added support for custom classes. (from 49ea48)
2011-04-27 12:30:16 -07:00
Mosharaf Chowdhury
65848da8df
Refactoring...
2011-04-26 17:41:31 -07:00
Mosharaf Chowdhury
b8ab7862b8
Moved broadcast-related code to separate directory under spark.broadcast
package.
2011-04-26 17:22:52 -07:00
Mosharaf Chowdhury
e31007248c
Merge branch 'master' into mos-bt
2011-04-26 12:04:14 -07:00
Mosharaf Chowdhury
9257a55e3a
Refactoring...
2011-04-26 11:45:36 -07:00
Mosharaf Chowdhury
9d2d533493
Temporary fix for issue #42.
2011-04-21 17:40:26 -07:00
Timothy Hunter
5c9535228a
fixed small bug when classpath has some strange formatting
2011-04-18 17:12:29 -07:00
Mosharaf Chowdhury
a8f47a62b9
Renamed MaxRxPeers and MaxTxPeers to MaxTxSlots and MaxRxSlots
respectively for clarity (most probably they were misunderstood and
misused)
2011-04-13 16:24:19 -07:00
Matei Zaharia
94ba95bcb2
Added flatMapValues
2011-04-12 19:51:58 -07:00
Mosharaf Chowdhury
b67a968b5d
hasBlocks is now AtomicInteger (even though it was ok)
2011-04-02 22:03:18 -07:00
Mosharaf Chowdhury
5bf3c83b13
BroadcastSuperTracker (right now for BT) is contacted over TCP instead
of direct procedure call.
Need to do the same for others and consolidate all broadcast mechanisms.
2011-04-01 19:31:28 -07:00
Mosharaf Chowdhury
733a130108
Formatting...
2011-04-01 14:51:24 -07:00
Mosharaf Chowdhury
4636aea598
Formatting...
2011-04-01 14:49:59 -07:00
Mosharaf Chowdhury
addd569e52
Each broadcasted variable can have different blockSize. Corresponding
logic to adapt blockSize based on network condition is not yet
implemented.
Formatting + consolidation.
2011-03-31 14:51:46 -07:00
Mosharaf Chowdhury
815f3411ec
Consolidated Broadcast config params.
2011-03-30 16:45:51 -07:00
Mosharaf Chowdhury
a18a28b08e
Removed gossip-related code that was already commented out.
More formatting.
2011-03-30 14:22:09 -07:00
Mosharaf Chowdhury
43aceafd70
Formatting...
2011-03-30 12:18:50 -07:00
Mosharaf Chowdhury
73b165220d
Random is the default choice; rarestFirst didn't work well in
experiments.
2011-03-29 13:06:43 -07:00
Matei Zaharia
d840fa8d0c
Merge remote branch 'origin/custom-serialization' into new-rdds
2011-03-09 00:40:07 -08:00
root
ff5b13799a
Some tweaks to make Kryo cache work better
2011-03-09 03:31:50 -05:00
Matei Zaharia
7febdfbe29
Better reuse of buffers in Kryo serialization
2011-03-08 12:36:36 -08:00
Matei Zaharia
8ee3ec29ee
Merge remote branch 'origin/custom-serialization' into new-rdds
2011-03-08 11:58:19 -08:00
Matei Zaharia
7408230bfa
Updated modified Kryo to use objenesis
2011-03-08 11:58:08 -08:00
Matei Zaharia
ab1216cb14
Register None and Nil properly
2011-03-08 11:52:58 -08:00
Matei Zaharia
d39f5dd15e
Merge remote branch 'origin/custom-serialization' into new-rdds
2011-03-08 10:28:50 -08:00
Matei Zaharia
4f0d0a7b73
stuff
2011-03-08 10:28:26 -08:00
Matei Zaharia
8b6f3db415
Merge remote branch 'origin/custom-serialization' into new-rdds
2011-03-07 19:20:28 -08:00
Matei Zaharia
38f6bce33d
Added SerializingCache
2011-03-07 19:16:24 -08:00
Matei Zaharia
6316c7979d
Remove some logging
2011-03-07 18:56:36 -08:00
Matei Zaharia
e7b4b047a6
Added pluggable serializers and Kryo serialization
2011-03-07 18:41:53 -08:00
Matei Zaharia
467f056e29
Remove commented code
2011-03-06 23:38:41 -08:00
Matei Zaharia
bce95b8458
Finished cogroup stuff
2011-03-06 23:38:16 -08:00
Matei Zaharia
04c2d6a60c
stuff
2011-03-06 19:27:03 -08:00
Matei Zaharia
0fb691dd28
Various fixes to get MesosScheduler working with new RDDs
2011-03-06 16:16:38 -08:00
Matei Zaharia
1df5a65a01
Pass cache locations correctly to DAGScheduler.
2011-03-06 12:16:38 -08:00
Matei Zaharia
e1436f1eaa
Merge remote branch 'origin/master' into new-rdds
2011-03-06 11:11:47 -08:00
Matei Zaharia
370b95816f
Added sampling for large arrays in SizeEstimator
2011-03-06 11:11:20 -08:00
Matei Zaharia
a789e9aaea
Merge remote branch 'origin/master' into new-rdds
2011-03-01 10:33:37 -08:00
Matei Zaharia
021c50a8d4
Remove unnecessary lock which was there to work around a bug in
Configuration in Hadoop 0.20.0
2011-03-01 10:28:38 -08:00
Matei Zaharia
adaba4d550
Removed old slf4j jars that came with Hadoop
2011-03-01 10:28:21 -08:00
Matei Zaharia
447debb771
Updated Hadoop to 0.20.2 to include some bug fixes
2011-03-01 10:27:48 -08:00
Matei Zaharia
9e59afd710
More work on new RDD design
2011-02-27 19:15:52 -08:00
Matei Zaharia
f38f86d59e
More stuff
2011-02-27 14:27:12 -08:00
Matei Zaharia
2e6023f2bf
stuff
2011-02-26 23:41:44 -08:00
Matei Zaharia
309367c477
Initial work towards new RDD design
2011-02-26 23:15:33 -08:00
Mosharaf Chowdhury
0416cc22d2
Picking peers weighted by the number of rare blocks they have. A block is rare if there are at most 2 copies in the neighborhood. A better threshold could be used (some function of neighborhood size).
2011-02-15 16:27:44 -08:00
Mosharaf Chowdhury
cf81da9485
Optimization: Master sends out at least one copy of each block first regardless of whatever a client is asking for. Once one copy of each block is out, Master then responds to specific blocks from individual receivers.
2011-02-14 15:08:33 -08:00
Mosharaf Chowdhury
2b946fb2d1
pickBlockRarestFirst and gossips commented OUT for now.
The problem with the rarestFirst implementation is that we pick peers randomly first and then pick blocks from the random peer using rarestFirst. That is NOT the right way to do it; it should be the other way around.
The problem with gossip is that peers might end up overwriting newer information with older information. To fix that, we either need timestamps or must compare the bitVectors before overwriting.
2011-02-13 13:53:15 -08:00
Mosharaf Chowdhury
ca2895ebb0
Fix in rarestFirst implementation.
If there is more than one rarest block, pick randomly among them (was deterministic before)
2011-02-10 20:37:44 -08:00
Mosharaf Chowdhury
520bbdc7e3
Peers now gossip about their neighbors when they talk.
2011-02-10 20:15:30 -08:00
Matei Zaharia
dc24aecd8f
Close record readers in HadoopFile after finishing a split
2011-02-10 12:07:48 -08:00
Mosharaf Chowdhury
441462bc7f
Fixed some warnings during compilation.
2011-02-09 12:11:43 -08:00
Mosharaf Chowdhury
1a73c0d265
Merged with master. Using sbt.
2011-02-09 10:48:48 -08:00
Mosharaf Chowdhury
495b38658e
Merge branch 'master' into mos-bt
2011-02-09 10:40:23 -08:00
Matei Zaharia
99f3f23efa
Changed default shuffle to LocalFileShuffle because it's way faster for small files
2011-02-08 17:03:03 -08:00
Matei Zaharia
ec28b607fd
Merge branch 'master' into sbt
Conflicts:
Makefile
core/src/main/java/spark/compress/lzf/LZF.java
core/src/main/java/spark/compress/lzf/LZFInputStream.java
core/src/main/java/spark/compress/lzf/LZFOutputStream.java
core/src/main/native/spark_compress_lzf_LZF.c
run
2011-02-02 00:25:54 -08:00
Matei Zaharia
e5c4cd8a5e
Made examples and core subprojects
2011-02-01 15:11:08 -08:00