Minor tweaks
Parent
25fac80d1a
Commit
a4f4e49cd4
@@ -59,7 +59,7 @@ find duplicates)
 * where pos is position in file after readline()
 
 Idle-daily deduplication notes:
-* 86400(seconds in day)*250(submissions per second)*8(bytes per UUID)= 161mb -> 2gb for 12 weeks
+* 86400(seconds per day) x 250(submissions per second) x 8(bytes per UUID) = 161mb per day -> 2gb for 12 weeks
 * leveldb seems well-suited for this sort of workload, but a C++ implementation is trivial: https://github.com/tarasglek/tombstone_maker
 * skiplists(filename:offset) are called tombstones
 * should have a compressed TOMBSTONE_INDEX for every IDLE_DAILY in a particular release. incoming_data EC2 job should generate those. Since these are basically sets they can be generated in parallel and UNIONED at the end of each EC2 job.
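The per-day size estimate in the changed line can be checked with a few lines of Python. This is only a sketch of the arithmetic: the constant and function names are made up for illustration, and the three inputs are the ones quoted in the note. The raw product is 172,800,000 bytes, which is about 164.8 MiB or 0.161 GiB, so the "161mb" figure appears to be the GiB value read as megabytes.

```python
# Inputs as quoted in the note; the names are illustrative, not from the repo.
SECONDS_PER_DAY = 86400
SUBMISSIONS_PER_SECOND = 250
BYTES_PER_UUID = 8  # the note assumes 8 bytes stored per UUID


def daily_index_bytes():
    """Raw bytes needed to store one day of UUID keys, uncompressed."""
    return SECONDS_PER_DAY * SUBMISSIONS_PER_SECOND * BYTES_PER_UUID


raw = daily_index_bytes()
print(raw)                    # 172800000
print(round(raw / 2**20, 1))  # 164.8 (MiB)
print(round(raw / 2**30, 3))  # 0.161 (GiB)
```

At this rate 84 days would come to roughly 13.5 GiB of raw keys, so the "2gb for 12 weeks" figure presumably depends on something not stated in this hunk (compression, or fewer days with data). That is a guess; the commit does not say.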
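The tombstone idea in the notes above (a set of filename:offset entries marking duplicate records, with per-job sets unioned at the end) could be sketched roughly as follows. This is not the code from tombstone_maker. The function names, the tab-separated `key<TAB>payload` record layout, and the choice to record the offset after `readline()` are all assumptions made for illustration, and the notes do not say how the set of already-seen keys is shared between parallel jobs.

```python
import io


def find_tombstones(filename, stream, seen):
    """Return a set of (filename, pos) for records whose key was already seen.

    `stream` is a binary file-like object; `seen` is a mutable set of keys
    that is updated in place. Record layout is hypothetical: key<TAB>payload.
    """
    tombstones = set()
    while True:
        line = stream.readline()
        if not line:
            break
        pos = stream.tell()            # position in file after readline()
        key = line.split(b"\t", 1)[0]  # assumed: key is the first field
        if key in seen:
            tombstones.add((filename, pos))
        else:
            seen.add(key)
    return tombstones


def union_indexes(indexes):
    """Merge per-job tombstone sets into a single index."""
    merged = set()
    for index in indexes:
        merged |= index
    return merged


# Small in-memory demonstration with two hypothetical input files.
seen_keys = set()
first = find_tombstones("f1", io.BytesIO(b"a\t1\nb\t2\na\t3\n"), seen_keys)
second = find_tombstones("f2", io.BytesIO(b"b\t9\nc\t4\n"), seen_keys)
index = union_indexes([first, second])
print(sorted(index))
```

Because each job returns a plain set, the final merge is order-independent, which is what makes the union step at the end of each EC2 job straightforward.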