Mark Reid 2014-04-02 13:37:48 -03:00
Parent 25fac80d1a
Commit a4f4e49cd4
1 changed file with 2 additions and 2 deletions


@@ -59,7 +59,7 @@ find duplicates)
 * where pos is position in file after readline()
 Idle-daily deduplication notes:
-* 86400(seconds in day)*250(submissions per second)*8(bytes per UUID)= 161mb -> 2gb for 12 weeks
+* 86400(seconds per day) x 250(submissions per second) x 8(bytes per UUID) = 161mb per day -> 2gb for 12 weeks
 * leveldb seems well-suited for this sort of workload, but a C++ implementation is trivial: https://github.com/tarasglek/tombstone_maker
 * skiplists(filename:offset) are called tombstones
 * should have a compressed TOMBSTONE_INDEX for every IDLE_DAILY in a particular release. incoming_data EC2 job should generate those. Since these are basically sets they can be generated in parallel and UNIONED at the end of each EC2 job.
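
The sizing arithmetic and the "generate in parallel, then union" idea in the notes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `uuid_bytes_per_day` and `union_tombstones` are hypothetical, a tombstone is modelled as a `(filename, offset)` tuple, and nothing here is taken from tombstone_maker or the actual EC2 job. The first function evaluates the raw product of the three factors quoted in the notes so it can be compared with the per-day figure given there.

```python
SECONDS_PER_DAY = 86400
SUBMISSIONS_PER_SECOND = 250  # rate quoted in the notes
BYTES_PER_UUID = 8            # per-UUID size quoted in the notes


def uuid_bytes_per_day(rate=SUBMISSIONS_PER_SECOND, uuid_size=BYTES_PER_UUID):
    """Raw bytes of UUID keys accumulated in one day at the given rate."""
    return SECONDS_PER_DAY * rate * uuid_size


def union_tombstones(partial_sets):
    """Merge tombstone sets produced independently by parallel workers.

    Each tombstone is assumed to be a (filename, offset) tuple. Set union
    is commutative and associative, so the merge order does not matter.
    """
    merged = set()
    for partial in partial_sets:
        merged |= partial
    return merged


if __name__ == "__main__":
    print(uuid_bytes_per_day())  # 172800000

    # Hypothetical output of two workers; the file names are made up.
    worker_a = {("idle_daily.0.log", 1024), ("idle_daily.0.log", 4096)}
    worker_b = {("idle_daily.1.log", 512), ("idle_daily.0.log", 4096)}
    print(sorted(union_tombstones([worker_a, worker_b])))
```

Because the merge is a plain set union, an entry reported by more than one worker appears once in the result, which is what makes generating the sets in parallel safe.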