Borrow the bitmap index into packfiles from JGit to speed up
enumeration of objects involved in a commit range without having to
fully traverse the history.

* jk/pack-bitmap: (26 commits)
  ewah: unconditionally ntohll ewah data
  ewah: support platforms that require aligned reads
  read-cache: use get_be32 instead of hand-rolled ntoh_l
  block-sha1: factor out get_be and put_be wrappers
  do not discard revindex when re-preparing packfiles
  pack-bitmap: implement optional name_hash cache
  t/perf: add tests for pack bitmaps
  t: add basic bitmap functionality tests
  count-objects: recognize .bitmap in garbage-checking
  repack: consider bitmaps when performing repacks
  repack: handle optional files created by pack-objects
  repack: turn exts array into array-of-struct
  repack: stop using magic number for ARRAY_SIZE(exts)
  pack-objects: implement bitmap writing
  rev-list: add bitmap mode to speed up object lists
  pack-objects: use bitmaps when packing objects
  pack-objects: split add_object_entry
  pack-bitmap: add support for bitmap indexes
  documentation: add documentation for the bitmap format
  ewah: compressed bitmap implementation
  ...
This commit is contained in:
Junio C Hamano 2014-02-27 14:01:48 -08:00
Родитель 6784fab0ac 6b5b3a27b7
Коммит 0f9e62e084
33 изменённых файлов: 4735 добавлений и 275 удалений

Просмотреть файл

@ -1870,6 +1870,31 @@ pack.packSizeLimit::
Common unit suffixes of 'k', 'm', or 'g' are
supported.
pack.useBitmaps::
When true, git will use pack bitmaps (if available) when packing
to stdout (e.g., during the server side of a fetch). Defaults to
true. You should not generally need to turn this off unless
you are debugging pack bitmaps.
pack.writebitmaps::
When true, git will write a bitmap index when packing all
objects to disk (e.g., when `git repack -a` is run). This
index can speed up the "counting objects" phase of subsequent
packs created for clones and fetches, at the cost of some disk
space and extra time spent on the initial repack. Defaults to
false.
pack.writeBitmapHashCache::
When true, git will include a "hash cache" section in the bitmap
index (if one is written). This cache can be used to feed git's
delta heuristics, potentially leading to better deltas between
bitmapped and non-bitmapped objects (e.g., when serving a fetch
between an older, bitmapped pack and objects that have been
pushed since the last gc). The downside is that it consumes 4
bytes per object of disk space, and that JGit's bitmap
implementation does not understand it, causing it to complain if
Git and JGit are used on the same repository. Defaults to false.
pager.<cmd>::
If the value is boolean, turns on or off pagination of the
output of a particular Git subcommand when writing to a tty.

Просмотреть файл

@ -9,7 +9,7 @@ git-repack - Pack unpacked objects in a repository
SYNOPSIS
--------
[verse]
'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [--window=<n>] [--depth=<n>]
'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [--window=<n>] [--depth=<n>]
DESCRIPTION
-----------
@ -110,6 +110,13 @@ other objects in that pack they already have locally.
The default is unlimited, unless the config variable
`pack.packSizeLimit` is set.
-b::
--write-bitmap-index::
Write a reachability bitmap index as part of the repack. This
only makes sense when used with `-a` or `-A`, as the bitmaps
must be able to refer to all reachable objects. This option
overrides the setting of `pack.writebitmaps`.
Configuration
-------------

Просмотреть файл

@ -55,6 +55,7 @@ SYNOPSIS
[ \--reverse ]
[ \--walk-reflogs ]
[ \--no-walk ] [ \--do-walk ]
[ \--use-bitmap-index ]
<commit>... [ \-- <paths>... ]
DESCRIPTION

Просмотреть файл

@ -257,6 +257,14 @@ See also linkgit:git-reflog[1].
Output excluded boundary commits. Boundary commits are
prefixed with `-`.
ifdef::git-rev-list[]
--use-bitmap-index::
Try to speed up the traversal using the pack bitmap index (if
one is available). Note that when traversing with `--objects`,
trees and blobs will not have their associated path printed.
endif::git-rev-list[]
--
History Simplification

Просмотреть файл

@ -0,0 +1,164 @@
GIT bitmap v1 format
====================
- A header appears at the beginning:
4-byte signature: {'B', 'I', 'T', 'M'}
2-byte version number (network byte order)
The current implementation only supports version 1
of the bitmap index (the same one as JGit).
2-byte flags (network byte order)
The following flags are supported:
- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
This flag must always be present. It implies that the bitmap
index has been generated for a packfile with full closure
(i.e. where every single object in the packfile can find
its parent links inside the same packfile). This is a
requirement for the bitmap index format, also present in JGit,
that greatly reduces the complexity of the implementation.
- BITMAP_OPT_HASH_CACHE (0x4)
If present, the end of the bitmap file contains
`N` 32-bit name-hash values, one per object in the
pack. The format and meaning of the name-hash is
described below.
4-byte entry count (network byte order)
The total count of entries (bitmapped commits) in this bitmap index.
20-byte checksum
The SHA1 checksum of the pack this bitmap index belongs to.
- 4 EWAH bitmaps that act as type indexes
Type indexes are serialized after the hash cache in the shape
of four EWAH bitmaps stored consecutively (see Appendix A for
the serialization format of an EWAH bitmap).
There is a bitmap for each Git object type, stored in the following
order:
- Commits
- Trees
- Blobs
- Tags
In each bitmap, the `n`th bit is set to true if the `n`th object
in the packfile is of that type.
The obvious consequence is that the OR of all 4 bitmaps will result
in a full set (all bits set), and the AND of all 4 bitmaps will
result in an empty bitmap (no bits set).
- N entries with compressed bitmaps, one for each indexed commit
Where `N` is the total amount of entries in this bitmap index.
Each entry contains the following:
- 4-byte object position (network byte order)
The position **in the index for the packfile** where the
bitmap for this commit is found.
- 1-byte XOR-offset
The xor offset used to compress this bitmap. For an entry
in position `x`, a XOR offset of `y` means that the actual
bitmap representing this commit is composed by XORing the
bitmap for this entry with the bitmap in entry `x-y` (i.e.
the bitmap `y` entries before this one).
Note that this compression can be recursive. In order to
XOR this entry with a previous one, the previous entry needs
to be decompressed first, and so on.
The hard-limit for this offset is 160 (an entry can only be
xor'ed against one of the 160 entries preceding it). This
number is always positive, and hence entries are always xor'ed
with **previous** bitmaps, not bitmaps that will come afterwards
in the index.
- 1-byte flags for this bitmap
At the moment the only available flag is `0x1`, which hints
that this bitmap can be re-used when rebuilding bitmap indexes
for the repository.
- The compressed bitmap itself, see Appendix A.
== Appendix A: Serialization format for an EWAH bitmap
Ewah bitmaps are serialized in the same protocol as the JAVAEWAH
library, making them backwards compatible with the JGit
implementation:
- 4-byte number of bits of the resulting UNCOMPRESSED bitmap
- 4-byte number of words of the COMPRESSED bitmap, when stored
- N x 8-byte words, as specified by the previous field
This is the actual content of the compressed bitmap.
- 4-byte position of the current RLW for the compressed
bitmap
All words are stored in network byte order for their corresponding
sizes.
The compressed bitmap is stored in a form of run-length encoding, as
follows. It consists of a concatenation of an arbitrary number of
chunks. Each chunk consists of one or more 64-bit words
H L_1 L_2 L_3 .... L_M
H is called RLW (run length word). It consists of (from lower to higher
order bits):
- 1 bit: the repeated bit B
- 32 bits: repetition count K (unsigned)
- 31 bits: literal word count M (unsigned)
The bitstream represented by the above chunk is then:
- K repetitions of B
- The bits stored in `L_1` through `L_M`. Within a word, bits at
lower order come earlier in the stream than those at higher
order.
The next word after `L_M` (if any) must again be a RLW, for the next
chunk. For efficient appending to the bitstream, the EWAH stores a
pointer to the last RLW in the stream.
== Appendix B: Optional Bitmap Sections
These sections may or may not be present in the `.bitmap` file; their
presence is indicated by the header flags section described above.
Name-hash cache
---------------
If the BITMAP_OPT_HASH_CACHE flag is set, the end of the bitmap contains
a cache of 32-bit values, one per object in the pack. The value at
position `i` is the hash of the pathname at which the `i`th object
(counting in index order) in the pack can be found. This can be fed
into the delta heuristics to compare objects with similar pathnames.
The hash algorithm used is:
hash = 0;
while ((c = *name++))
if (!isspace(c))
hash = (hash >> 2) + (c << 24);
Note that this hashing scheme is tied to the BITMAP_OPT_HASH_CACHE flag.
If implementations want to choose a different hashing scheme, they are
free to do so, but MUST allocate a new header flag (because comparing
hashes made under two different schemes would be pointless).

Просмотреть файл

@ -664,6 +664,8 @@ LIB_H += diff.h
LIB_H += diffcore.h
LIB_H += dir.h
LIB_H += exec_cmd.h
LIB_H += ewah/ewok.h
LIB_H += ewah/ewok_rlw.h
LIB_H += fetch-pack.h
LIB_H += fmt-merge-msg.h
LIB_H += fsck.h
@ -691,8 +693,10 @@ LIB_H += notes-merge.h
LIB_H += notes-utils.h
LIB_H += notes.h
LIB_H += object.h
LIB_H += pack-objects.h
LIB_H += pack-revindex.h
LIB_H += pack.h
LIB_H += pack-bitmap.h
LIB_H += parse-options.h
LIB_H += patch-ids.h
LIB_H += pathspec.h
@ -796,6 +800,10 @@ LIB_OBJS += dir.o
LIB_OBJS += editor.o
LIB_OBJS += entry.o
LIB_OBJS += environment.o
LIB_OBJS += ewah/bitmap.o
LIB_OBJS += ewah/ewah_bitmap.o
LIB_OBJS += ewah/ewah_io.o
LIB_OBJS += ewah/ewah_rlw.o
LIB_OBJS += exec_cmd.o
LIB_OBJS += fetch-pack.o
LIB_OBJS += fsck.o
@ -827,7 +835,10 @@ LIB_OBJS += notes-cache.o
LIB_OBJS += notes-merge.o
LIB_OBJS += notes-utils.o
LIB_OBJS += object.o
LIB_OBJS += pack-bitmap.o
LIB_OBJS += pack-bitmap-write.o
LIB_OBJS += pack-check.o
LIB_OBJS += pack-objects.o
LIB_OBJS += pack-revindex.o
LIB_OBJS += pack-write.o
LIB_OBJS += pager.o
@ -2480,8 +2491,9 @@ profile-clean:
$(RM) $(addsuffix *.gcno,$(addprefix $(PROFILE_DIR)/, $(object_dirs)))
clean: profile-clean coverage-clean
$(RM) *.o *.res block-sha1/*.o ppc/*.o compat/*.o compat/*/*.o xdiff/*.o vcs-svn/*.o \
builtin/*.o $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
$(RM) *.o *.res block-sha1/*.o ppc/*.o compat/*.o compat/*/*.o
$(RM) xdiff/*.o vcs-svn/*.o ewah/*.o builtin/*.o
$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
$(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X
$(RM) $(TEST_PROGRAMS) $(NO_INSTALL)
$(RM) -r bin-wrappers $(dep_dirs)

Просмотреть файл

@ -62,38 +62,6 @@
#define setW(x, val) (W(x) = (val))
#endif
/*
* Performance might be improved if the CPU architecture is OK with
* unaligned 32-bit loads and a fast ntohl() is available.
* Otherwise fall back to byte loads and shifts which is portable,
* and is faster on architectures with memory alignment issues.
*/
#if defined(__i386__) || defined(__x86_64__) || \
defined(_M_IX86) || defined(_M_X64) || \
defined(__ppc__) || defined(__ppc64__) || \
defined(__powerpc__) || defined(__powerpc64__) || \
defined(__s390__) || defined(__s390x__)
#define get_be32(p) ntohl(*(unsigned int *)(p))
#define put_be32(p, v) do { *(unsigned int *)(p) = htonl(v); } while (0)
#else
#define get_be32(p) ( \
(*((unsigned char *)(p) + 0) << 24) | \
(*((unsigned char *)(p) + 1) << 16) | \
(*((unsigned char *)(p) + 2) << 8) | \
(*((unsigned char *)(p) + 3) << 0) )
#define put_be32(p, v) do { \
unsigned int __v = (v); \
*((unsigned char *)(p) + 0) = __v >> 24; \
*((unsigned char *)(p) + 1) = __v >> 16; \
*((unsigned char *)(p) + 2) = __v >> 8; \
*((unsigned char *)(p) + 3) = __v >> 0; } while (0)
#endif
/* This "rolls" over the 512-bit array */
#define W(x) (array[(x)&15])

Просмотреть файл

@ -14,10 +14,12 @@
#include "diff.h"
#include "revision.h"
#include "list-objects.h"
#include "pack-objects.h"
#include "progress.h"
#include "refs.h"
#include "streaming.h"
#include "thread-utils.h"
#include "pack-bitmap.h"
static const char *pack_usage[] = {
N_("git pack-objects --stdout [options...] [< ref-list | < object-list]"),
@ -25,42 +27,15 @@ static const char *pack_usage[] = {
NULL
};
struct object_entry {
struct pack_idx_entry idx;
unsigned long size; /* uncompressed size */
struct packed_git *in_pack; /* already in pack */
off_t in_pack_offset;
struct object_entry *delta; /* delta base object */
struct object_entry *delta_child; /* deltified objects who bases me */
struct object_entry *delta_sibling; /* other deltified objects who
* uses the same base as me
*/
void *delta_data; /* cached delta (uncompressed) */
unsigned long delta_size; /* delta data size (uncompressed) */
unsigned long z_delta_size; /* delta data size (compressed) */
enum object_type type;
enum object_type in_pack_type; /* could be delta */
uint32_t hash; /* name hint hash */
unsigned char in_pack_header_size;
unsigned preferred_base:1; /*
* we do not pack this, but is available
* to be used as the base object to delta
* objects against.
*/
unsigned no_try_delta:1;
unsigned tagged:1; /* near the very tip of refs */
unsigned filled:1; /* assigned write-order */
};
/*
* Objects we are going to pack are collected in objects array (dynamically
* expanded). nr_objects & nr_alloc controls this array. They are stored
* in the order we see -- typically rev-list --objects order that gives us
* nice "minimum seek" order.
* Objects we are going to pack are collected in the `to_pack` structure.
* It contains an array (dynamically expanded) of the object data, and a map
* that can resolve SHA1s to their position in the array.
*/
static struct object_entry *objects;
static struct packing_data to_pack;
static struct pack_idx_entry **written_list;
static uint32_t nr_objects, nr_alloc, nr_result, nr_written;
static uint32_t nr_result, nr_written;
static int non_empty;
static int reuse_delta = 1, reuse_object = 1;
@ -83,27 +58,43 @@ static struct progress *progress_state;
static int pack_compression_level = Z_DEFAULT_COMPRESSION;
static int pack_compression_seen;
static struct packed_git *reuse_packfile;
static uint32_t reuse_packfile_objects;
static off_t reuse_packfile_offset;
static int use_bitmap_index = 1;
static int write_bitmap_index;
static uint16_t write_bitmap_options;
static unsigned long delta_cache_size = 0;
static unsigned long max_delta_cache_size = 256 * 1024 * 1024;
static unsigned long cache_max_small_delta_size = 1000;
static unsigned long window_memory_limit = 0;
/*
* The object names in objects array are hashed with this hashtable,
* to help looking up the entry by object name.
* This hashtable is built after all the objects are seen.
*/
static int *object_ix;
static int object_ix_hashsz;
static struct object_entry *locate_object_entry(const unsigned char *sha1);
/*
* stats
*/
static uint32_t written, written_delta;
static uint32_t reused, reused_delta;
/*
* Indexed commits
*/
static struct commit **indexed_commits;
static unsigned int indexed_commits_nr;
static unsigned int indexed_commits_alloc;
static void index_commit_for_bitmap(struct commit *commit)
{
if (indexed_commits_nr >= indexed_commits_alloc) {
indexed_commits_alloc = (indexed_commits_alloc + 32) * 2;
indexed_commits = xrealloc(indexed_commits,
indexed_commits_alloc * sizeof(struct commit *));
}
indexed_commits[indexed_commits_nr++] = commit;
}
static void *get_delta(struct object_entry *entry)
{
@ -553,12 +544,12 @@ static int mark_tagged(const char *path, const unsigned char *sha1, int flag,
void *cb_data)
{
unsigned char peeled[20];
struct object_entry *entry = locate_object_entry(sha1);
struct object_entry *entry = packlist_find(&to_pack, sha1, NULL);
if (entry)
entry->tagged = 1;
if (!peel_ref(path, peeled)) {
entry = locate_object_entry(peeled);
entry = packlist_find(&to_pack, peeled, NULL);
if (entry)
entry->tagged = 1;
}
@ -633,9 +624,10 @@ static struct object_entry **compute_write_order(void)
{
unsigned int i, wo_end, last_untagged;
struct object_entry **wo = xmalloc(nr_objects * sizeof(*wo));
struct object_entry **wo = xmalloc(to_pack.nr_objects * sizeof(*wo));
struct object_entry *objects = to_pack.objects;
for (i = 0; i < nr_objects; i++) {
for (i = 0; i < to_pack.nr_objects; i++) {
objects[i].tagged = 0;
objects[i].filled = 0;
objects[i].delta_child = NULL;
@ -647,7 +639,7 @@ static struct object_entry **compute_write_order(void)
* Make sure delta_sibling is sorted in the original
* recency order.
*/
for (i = nr_objects; i > 0;) {
for (i = to_pack.nr_objects; i > 0;) {
struct object_entry *e = &objects[--i];
if (!e->delta)
continue;
@ -665,7 +657,7 @@ static struct object_entry **compute_write_order(void)
* Give the objects in the original recency order until
* we see a tagged tip.
*/
for (i = wo_end = 0; i < nr_objects; i++) {
for (i = wo_end = 0; i < to_pack.nr_objects; i++) {
if (objects[i].tagged)
break;
add_to_write_order(wo, &wo_end, &objects[i]);
@ -675,7 +667,7 @@ static struct object_entry **compute_write_order(void)
/*
* Then fill all the tagged tips.
*/
for (; i < nr_objects; i++) {
for (; i < to_pack.nr_objects; i++) {
if (objects[i].tagged)
add_to_write_order(wo, &wo_end, &objects[i]);
}
@ -683,7 +675,7 @@ static struct object_entry **compute_write_order(void)
/*
* And then all remaining commits and tags.
*/
for (i = last_untagged; i < nr_objects; i++) {
for (i = last_untagged; i < to_pack.nr_objects; i++) {
if (objects[i].type != OBJ_COMMIT &&
objects[i].type != OBJ_TAG)
continue;
@ -693,7 +685,7 @@ static struct object_entry **compute_write_order(void)
/*
* And then all the trees.
*/
for (i = last_untagged; i < nr_objects; i++) {
for (i = last_untagged; i < to_pack.nr_objects; i++) {
if (objects[i].type != OBJ_TREE)
continue;
add_to_write_order(wo, &wo_end, &objects[i]);
@ -702,17 +694,57 @@ static struct object_entry **compute_write_order(void)
/*
* Finally all the rest in really tight order
*/
for (i = last_untagged; i < nr_objects; i++) {
for (i = last_untagged; i < to_pack.nr_objects; i++) {
if (!objects[i].filled)
add_family_to_write_order(wo, &wo_end, &objects[i]);
}
if (wo_end != nr_objects)
die("ordered %u objects, expected %"PRIu32, wo_end, nr_objects);
if (wo_end != to_pack.nr_objects)
die("ordered %u objects, expected %"PRIu32, wo_end, to_pack.nr_objects);
return wo;
}
static off_t write_reused_pack(struct sha1file *f)
{
unsigned char buffer[8192];
off_t to_write;
int fd;
if (!is_pack_valid(reuse_packfile))
die("packfile is invalid: %s", reuse_packfile->pack_name);
fd = git_open_noatime(reuse_packfile->pack_name);
if (fd < 0)
die_errno("unable to open packfile for reuse: %s",
reuse_packfile->pack_name);
if (lseek(fd, sizeof(struct pack_header), SEEK_SET) == -1)
die_errno("unable to seek in reused packfile");
if (reuse_packfile_offset < 0)
reuse_packfile_offset = reuse_packfile->pack_size - 20;
to_write = reuse_packfile_offset - sizeof(struct pack_header);
while (to_write) {
int read_pack = xread(fd, buffer, sizeof(buffer));
if (read_pack <= 0)
die_errno("unable to read from reused packfile");
if (read_pack > to_write)
read_pack = to_write;
sha1write(f, buffer, read_pack);
to_write -= read_pack;
}
close(fd);
written += reuse_packfile_objects;
return reuse_packfile_offset - sizeof(struct pack_header);
}
static void write_pack_file(void)
{
uint32_t i = 0, j;
@ -724,7 +756,7 @@ static void write_pack_file(void)
if (progress > pack_to_stdout)
progress_state = start_progress("Writing objects", nr_result);
written_list = xmalloc(nr_objects * sizeof(*written_list));
written_list = xmalloc(to_pack.nr_objects * sizeof(*written_list));
write_order = compute_write_order();
do {
@ -737,8 +769,17 @@ static void write_pack_file(void)
f = create_tmp_packfile(&pack_tmp_name);
offset = write_pack_header(f, nr_remaining);
if (reuse_packfile) {
off_t packfile_size;
assert(pack_to_stdout);
packfile_size = write_reused_pack(f);
offset += packfile_size;
}
nr_written = 0;
for (; i < nr_objects; i++) {
for (; i < to_pack.nr_objects; i++) {
struct object_entry *e = write_order[i];
if (write_one(f, e, &offset) == WRITE_ONE_BREAK)
break;
@ -789,9 +830,31 @@ static void write_pack_file(void)
if (sizeof(tmpname) <= strlen(base_name) + 50)
die("pack base name '%s' too long", base_name);
snprintf(tmpname, sizeof(tmpname), "%s-", base_name);
if (write_bitmap_index) {
bitmap_writer_set_checksum(sha1);
bitmap_writer_build_type_index(written_list, nr_written);
}
finish_tmp_packfile(tmpname, pack_tmp_name,
written_list, nr_written,
&pack_idx_opts, sha1);
if (write_bitmap_index) {
char *end_of_name_prefix = strrchr(tmpname, 0);
sprintf(end_of_name_prefix, "%s.bitmap", sha1_to_hex(sha1));
stop_progress(&progress_state);
bitmap_writer_show_progress(progress);
bitmap_writer_reuse_bitmaps(&to_pack);
bitmap_writer_select_commits(indexed_commits, indexed_commits_nr, -1);
bitmap_writer_build(&to_pack);
bitmap_writer_finish(written_list, nr_written,
tmpname, write_bitmap_options);
write_bitmap_index = 0;
}
free(pack_tmp_name);
puts(sha1_to_hex(sha1));
}
@ -801,7 +864,7 @@ static void write_pack_file(void)
written_list[j]->offset = (off_t)-1;
}
nr_remaining -= nr_written;
} while (nr_remaining && i < nr_objects);
} while (nr_remaining && i < to_pack.nr_objects);
free(written_list);
free(write_order);
@ -811,73 +874,6 @@ static void write_pack_file(void)
written, nr_result);
}
static int locate_object_entry_hash(const unsigned char *sha1)
{
int i;
unsigned int ui;
memcpy(&ui, sha1, sizeof(unsigned int));
i = ui % object_ix_hashsz;
while (0 < object_ix[i]) {
if (!hashcmp(sha1, objects[object_ix[i] - 1].idx.sha1))
return i;
if (++i == object_ix_hashsz)
i = 0;
}
return -1 - i;
}
static struct object_entry *locate_object_entry(const unsigned char *sha1)
{
int i;
if (!object_ix_hashsz)
return NULL;
i = locate_object_entry_hash(sha1);
if (0 <= i)
return &objects[object_ix[i]-1];
return NULL;
}
static void rehash_objects(void)
{
uint32_t i;
struct object_entry *oe;
object_ix_hashsz = nr_objects * 3;
if (object_ix_hashsz < 1024)
object_ix_hashsz = 1024;
object_ix = xrealloc(object_ix, sizeof(int) * object_ix_hashsz);
memset(object_ix, 0, sizeof(int) * object_ix_hashsz);
for (i = 0, oe = objects; i < nr_objects; i++, oe++) {
int ix = locate_object_entry_hash(oe->idx.sha1);
if (0 <= ix)
continue;
ix = -1 - ix;
object_ix[ix] = i + 1;
}
}
static uint32_t name_hash(const char *name)
{
uint32_t c, hash = 0;
if (!name)
return 0;
/*
* This effectively just creates a sortable number from the
* last sixteen non-whitespace characters. Last characters
* count "most", so things that end in ".c" sort together.
*/
while ((c = *name++) != 0) {
if (isspace(c))
continue;
hash = (hash >> 2) + (c << 24);
}
return hash;
}
static void setup_delta_attr_check(struct git_attr_check *check)
{
static struct git_attr *attr_delta;
@ -900,42 +896,69 @@ static int no_try_delta(const char *path)
return 0;
}
static int add_object_entry(const unsigned char *sha1, enum object_type type,
const char *name, int exclude)
/*
* When adding an object, check whether we have already added it
* to our packing list. If so, we can skip. However, if we are
* being asked to excludei t, but the previous mention was to include
* it, make sure to adjust its flags and tweak our numbers accordingly.
*
* As an optimization, we pass out the index position where we would have
* found the item, since that saves us from having to look it up again a
* few lines later when we want to add the new entry.
*/
static int have_duplicate_entry(const unsigned char *sha1,
int exclude,
uint32_t *index_pos)
{
struct object_entry *entry;
struct packed_git *p, *found_pack = NULL;
off_t found_offset = 0;
int ix;
uint32_t hash = name_hash(name);
ix = nr_objects ? locate_object_entry_hash(sha1) : -1;
if (ix >= 0) {
if (exclude) {
entry = objects + object_ix[ix] - 1;
if (!entry->preferred_base)
nr_result--;
entry->preferred_base = 1;
}
entry = packlist_find(&to_pack, sha1, index_pos);
if (!entry)
return 0;
if (exclude) {
if (!entry->preferred_base)
nr_result--;
entry->preferred_base = 1;
}
return 1;
}
/*
* Check whether we want the object in the pack (e.g., we do not want
* objects found in non-local stores if the "--local" option was used).
*
* As a side effect of this check, we will find the packed version of this
* object, if any. We therefore pass out the pack information to avoid having
* to look it up again later.
*/
static int want_object_in_pack(const unsigned char *sha1,
int exclude,
struct packed_git **found_pack,
off_t *found_offset)
{
struct packed_git *p;
if (!exclude && local && has_loose_object_nonlocal(sha1))
return 0;
*found_pack = NULL;
*found_offset = 0;
for (p = packed_git; p; p = p->next) {
off_t offset = find_pack_entry_one(sha1, p);
if (offset) {
if (!found_pack) {
if (!*found_pack) {
if (!is_pack_valid(p)) {
warning("packfile %s cannot be accessed", p->pack_name);
continue;
}
found_offset = offset;
found_pack = p;
*found_offset = offset;
*found_pack = p;
}
if (exclude)
break;
return 1;
if (incremental)
return 0;
if (local && !p->pack_local)
@ -945,14 +968,21 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
}
}
if (nr_objects >= nr_alloc) {
nr_alloc = (nr_alloc + 1024) * 3 / 2;
objects = xrealloc(objects, nr_alloc * sizeof(*entry));
}
return 1;
}
entry = objects + nr_objects++;
memset(entry, 0, sizeof(*entry));
hashcpy(entry->idx.sha1, sha1);
static void create_object_entry(const unsigned char *sha1,
enum object_type type,
uint32_t hash,
int exclude,
int no_try_delta,
uint32_t index_pos,
struct packed_git *found_pack,
off_t found_offset)
{
struct object_entry *entry;
entry = packlist_alloc(&to_pack, sha1, index_pos);
entry->hash = hash;
if (type)
entry->type = type;
@ -965,16 +995,43 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
entry->in_pack_offset = found_offset;
}
if (object_ix_hashsz * 3 <= nr_objects * 4)
rehash_objects();
else
object_ix[-1 - ix] = nr_objects;
entry->no_try_delta = no_try_delta;
}
display_progress(progress_state, nr_objects);
static int add_object_entry(const unsigned char *sha1, enum object_type type,
const char *name, int exclude)
{
struct packed_git *found_pack;
off_t found_offset;
uint32_t index_pos;
if (name && no_try_delta(name))
entry->no_try_delta = 1;
if (have_duplicate_entry(sha1, exclude, &index_pos))
return 0;
if (!want_object_in_pack(sha1, exclude, &found_pack, &found_offset))
return 0;
create_object_entry(sha1, type, pack_name_hash(name),
exclude, name && no_try_delta(name),
index_pos, found_pack, found_offset);
display_progress(progress_state, to_pack.nr_objects);
return 1;
}
static int add_object_entry_from_bitmap(const unsigned char *sha1,
enum object_type type,
int flags, uint32_t name_hash,
struct packed_git *pack, off_t offset)
{
uint32_t index_pos;
if (have_duplicate_entry(sha1, 0, &index_pos))
return 0;
create_object_entry(sha1, type, name_hash, 0, 0, index_pos, pack, offset);
display_progress(progress_state, to_pack.nr_objects);
return 1;
}
@ -1175,7 +1232,7 @@ static void add_preferred_base_object(const char *name)
{
struct pbase_tree *it;
int cmplen;
unsigned hash = name_hash(name);
unsigned hash = pack_name_hash(name);
if (!num_preferred_base || check_pbase_path(hash))
return;
@ -1327,7 +1384,7 @@ static void check_object(struct object_entry *entry)
break;
}
if (base_ref && (base_entry = locate_object_entry(base_ref))) {
if (base_ref && (base_entry = packlist_find(&to_pack, base_ref, NULL))) {
/*
* If base_ref was set above that means we wish to
* reuse delta data, and we even found that base
@ -1401,12 +1458,12 @@ static void get_object_details(void)
uint32_t i;
struct object_entry **sorted_by_offset;
sorted_by_offset = xcalloc(nr_objects, sizeof(struct object_entry *));
for (i = 0; i < nr_objects; i++)
sorted_by_offset[i] = objects + i;
qsort(sorted_by_offset, nr_objects, sizeof(*sorted_by_offset), pack_offset_sort);
sorted_by_offset = xcalloc(to_pack.nr_objects, sizeof(struct object_entry *));
for (i = 0; i < to_pack.nr_objects; i++)
sorted_by_offset[i] = to_pack.objects + i;
qsort(sorted_by_offset, to_pack.nr_objects, sizeof(*sorted_by_offset), pack_offset_sort);
for (i = 0; i < nr_objects; i++) {
for (i = 0; i < to_pack.nr_objects; i++) {
struct object_entry *entry = sorted_by_offset[i];
check_object(entry);
if (big_file_threshold < entry->size)
@ -2032,7 +2089,7 @@ static int add_ref_tag(const char *path, const unsigned char *sha1, int flag, vo
if (starts_with(path, "refs/tags/") && /* is a tag? */
!peel_ref(path, peeled) && /* peelable? */
locate_object_entry(peeled)) /* object packed? */
packlist_find(&to_pack, peeled, NULL)) /* object packed? */
add_object_entry(sha1, OBJ_TAG, NULL, 0);
return 0;
}
@ -2055,14 +2112,14 @@ static void prepare_pack(int window, int depth)
if (!pack_to_stdout)
do_check_packed_object_crc = 1;
if (!nr_objects || !window || !depth)
if (!to_pack.nr_objects || !window || !depth)
return;
delta_list = xmalloc(nr_objects * sizeof(*delta_list));
delta_list = xmalloc(to_pack.nr_objects * sizeof(*delta_list));
nr_deltas = n = 0;
for (i = 0; i < nr_objects; i++) {
struct object_entry *entry = objects + i;
for (i = 0; i < to_pack.nr_objects; i++) {
struct object_entry *entry = to_pack.objects + i;
if (entry->delta)
/* This happens if we decided to reuse existing
@ -2140,6 +2197,20 @@ static int git_pack_config(const char *k, const char *v, void *cb)
cache_max_small_delta_size = git_config_int(k, v);
return 0;
}
if (!strcmp(k, "pack.writebitmaps")) {
write_bitmap_index = git_config_bool(k, v);
return 0;
}
if (!strcmp(k, "pack.writebitmaphashcache")) {
if (git_config_bool(k, v))
write_bitmap_options |= BITMAP_OPT_HASH_CACHE;
else
write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE;
}
if (!strcmp(k, "pack.usebitmaps")) {
use_bitmap_index = git_config_bool(k, v);
return 0;
}
if (!strcmp(k, "pack.threads")) {
delta_search_threads = git_config_int(k, v);
if (delta_search_threads < 0)
@ -2198,6 +2269,9 @@ static void show_commit(struct commit *commit, void *data)
{
add_object_entry(commit->object.sha1, OBJ_COMMIT, NULL, 0);
commit->object.flags |= OBJECT_ADDED;
if (write_bitmap_index)
index_commit_for_bitmap(commit);
}
static void show_object(struct object *obj,
@ -2340,7 +2414,7 @@ static void loosen_unused_packed_objects(struct rev_info *revs)
for (i = 0; i < p->num_objects; i++) {
sha1 = nth_packed_object_sha1(p, i);
if (!locate_object_entry(sha1) &&
if (!packlist_find(&to_pack, sha1, NULL) &&
!has_sha1_pack_kept_or_nonlocal(sha1))
if (force_object_loose(sha1, p->mtime))
die("unable to force loose object");
@ -2348,6 +2422,29 @@ static void loosen_unused_packed_objects(struct rev_info *revs)
}
}
static int get_object_list_from_bitmap(struct rev_info *revs)
{
if (prepare_bitmap_walk(revs) < 0)
return -1;
if (!reuse_partial_packfile_from_bitmap(
&reuse_packfile,
&reuse_packfile_objects,
&reuse_packfile_offset)) {
assert(reuse_packfile_objects);
nr_result += reuse_packfile_objects;
if (progress) {
fprintf(stderr, "Reusing existing pack: %d, done.\n",
reuse_packfile_objects);
fflush(stderr);
}
}
traverse_bitmap_commit_list(&add_object_entry_from_bitmap);
return 0;
}
static void get_object_list(int ac, const char **av)
{
struct rev_info revs;
@ -2367,6 +2464,7 @@ static void get_object_list(int ac, const char **av)
if (*line == '-') {
if (!strcmp(line, "--not")) {
flags ^= UNINTERESTING;
write_bitmap_index = 0;
continue;
}
die("not a rev '%s'", line);
@ -2375,6 +2473,9 @@ static void get_object_list(int ac, const char **av)
die("bad revision '%s'", line);
}
if (use_bitmap_index && !get_object_list_from_bitmap(&revs))
return;
if (prepare_revision_walk(&revs))
die("revision walk setup failed");
mark_edges_uninteresting(&revs, show_edge);
@ -2504,6 +2605,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
N_("pack compression level")),
OPT_SET_INT(0, "keep-true-parents", &grafts_replace_parents,
N_("do not hide commits by grafts"), 0),
OPT_BOOL(0, "use-bitmap-index", &use_bitmap_index,
N_("use a bitmap index if available to speed up counting objects")),
OPT_BOOL(0, "write-bitmap-index", &write_bitmap_index,
N_("write a bitmap index together with the pack index")),
OPT_END(),
};
@ -2570,6 +2675,12 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
if (keep_unreachable && unpack_unreachable)
die("--keep-unreachable and --unpack-unreachable are incompatible.");
if (!use_internal_rev_list || !pack_to_stdout || is_repository_shallow())
use_bitmap_index = 0;
if (pack_to_stdout || !rev_list_all)
write_bitmap_index = 0;
if (progress && all_progress_implied)
progress = 2;

Просмотреть файл

@ -94,7 +94,7 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list)
static void remove_redundant_pack(const char *dir_name, const char *base_name)
{
const char *exts[] = {".pack", ".idx", ".keep"};
const char *exts[] = {".pack", ".idx", ".keep", ".bitmap"};
int i;
struct strbuf buf = STRBUF_INIT;
size_t plen;
@ -115,7 +115,14 @@ static void remove_redundant_pack(const char *dir_name, const char *base_name)
int cmd_repack(int argc, const char **argv, const char *prefix)
{
const char *exts[2] = {".pack", ".idx"};
struct {
const char *name;
unsigned optional:1;
} exts[] = {
{".pack"},
{".idx"},
{".bitmap", 1},
};
struct child_process cmd;
struct string_list_item *item;
struct argv_array cmd_args = ARGV_ARRAY_INIT;
@ -137,6 +144,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
int no_update_server_info = 0;
int quiet = 0;
int local = 0;
int write_bitmap = -1;
struct option builtin_repack_options[] = {
OPT_BIT('a', NULL, &pack_everything,
@ -155,6 +163,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
OPT__QUIET(&quiet, N_("be quiet")),
OPT_BOOL('l', "local", &local,
N_("pass --local to git-pack-objects")),
OPT_BOOL('b', "write-bitmap-index", &write_bitmap,
N_("write bitmap index")),
OPT_STRING(0, "unpack-unreachable", &unpack_unreachable, N_("approxidate"),
N_("with -A, do not loosen objects older than this")),
OPT_STRING(0, "window", &window, N_("n"),
@ -196,6 +206,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
argv_array_pushf(&cmd_args, "--no-reuse-delta");
if (no_reuse_object)
argv_array_pushf(&cmd_args, "--no-reuse-object");
if (write_bitmap >= 0)
argv_array_pushf(&cmd_args, "--%swrite-bitmap-index",
write_bitmap ? "" : "no-");
if (pack_everything & ALL_INTO_ONE) {
get_non_kept_pack_filenames(&existing_packs);
@ -256,17 +269,17 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
*/
failed = 0;
for_each_string_list_item(item, &names) {
for (ext = 0; ext < 2; ext++) {
for (ext = 0; ext < ARRAY_SIZE(exts); ext++) {
char *fname, *fname_old;
fname = mkpathdup("%s/pack-%s%s", packdir,
item->string, exts[ext]);
item->string, exts[ext].name);
if (!file_exists(fname)) {
free(fname);
continue;
}
fname_old = mkpath("%s/old-%s%s", packdir,
item->string, exts[ext]);
item->string, exts[ext].name);
if (file_exists(fname_old))
if (unlink(fname_old))
failed = 1;
@ -313,19 +326,23 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
/* Now the ones with the same name are out of the way... */
for_each_string_list_item(item, &names) {
for (ext = 0; ext < 2; ext++) {
for (ext = 0; ext < ARRAY_SIZE(exts); ext++) {
char *fname, *fname_old;
struct stat statbuffer;
int exists = 0;
fname = mkpathdup("%s/pack-%s%s",
packdir, item->string, exts[ext]);
packdir, item->string, exts[ext].name);
fname_old = mkpathdup("%s-%s%s",
packtmp, item->string, exts[ext]);
packtmp, item->string, exts[ext].name);
if (!stat(fname_old, &statbuffer)) {
statbuffer.st_mode &= ~(S_IWUSR | S_IWGRP | S_IWOTH);
chmod(fname_old, statbuffer.st_mode);
exists = 1;
}
if (exists || !exts[ext].optional) {
if (rename(fname_old, fname))
die_errno(_("renaming '%s' failed"), fname_old);
}
if (rename(fname_old, fname))
die_errno(_("renaming '%s' failed"), fname_old);
free(fname);
free(fname_old);
}
@ -333,12 +350,12 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
/* Remove the "old-" files */
for_each_string_list_item(item, &names) {
for (ext = 0; ext < 2; ext++) {
for (ext = 0; ext < ARRAY_SIZE(exts); ext++) {
char *fname;
fname = mkpath("%s/old-%s%s",
packdir,
item->string,
exts[ext]);
exts[ext].name);
if (remove_path(fname))
warning(_("removing '%s' failed"), fname);
}

Просмотреть файл

@ -3,6 +3,8 @@
#include "diff.h"
#include "revision.h"
#include "list-objects.h"
#include "pack.h"
#include "pack-bitmap.h"
#include "builtin.h"
#include "log-tree.h"
#include "graph.h"
@ -257,6 +259,18 @@ static int show_bisect_vars(struct rev_list_info *info, int reaches, int all)
return 0;
}
static int show_object_fast(
const unsigned char *sha1,
enum object_type type,
int exclude,
uint32_t name_hash,
struct packed_git *found_pack,
off_t found_offset)
{
fprintf(stdout, "%s\n", sha1_to_hex(sha1));
return 1;
}
int cmd_rev_list(int argc, const char **argv, const char *prefix)
{
struct rev_info revs;
@ -265,6 +279,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
int bisect_list = 0;
int bisect_show_vars = 0;
int bisect_find_all = 0;
int use_bitmap_index = 0;
git_config(git_default_config, NULL);
init_revisions(&revs, prefix);
@ -306,6 +321,14 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
bisect_show_vars = 1;
continue;
}
if (!strcmp(arg, "--use-bitmap-index")) {
use_bitmap_index = 1;
continue;
}
if (!strcmp(arg, "--test-bitmap")) {
test_bitmap_walk(&revs);
return 0;
}
usage(rev_list_usage);
}
@ -333,6 +356,22 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
if (bisect_list)
revs.limited = 1;
if (use_bitmap_index) {
if (revs.count && !revs.left_right && !revs.cherry_mark) {
uint32_t commit_count;
if (!prepare_bitmap_walk(&revs)) {
count_bitmap_commit_list(&commit_count, NULL, NULL, NULL);
printf("%d\n", commit_count);
return 0;
}
} else if (revs.tag_objects && revs.tree_objects && revs.blob_objects) {
if (!prepare_bitmap_walk(&revs)) {
traverse_bitmap_commit_list(&show_object_fast);
return 0;
}
}
}
if (prepare_revision_walk(&revs))
die("revision walk setup failed");
if (revs.tree_objects)

Просмотреть файл

@ -809,6 +809,7 @@ extern int hash_sha1_file(const void *buf, unsigned long len, const char *type,
extern int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *return_sha1);
extern int pretend_sha1_file(void *, unsigned long, enum object_type, unsigned char *);
extern int force_object_loose(const unsigned char *sha1, time_t mtime);
extern int git_open_noatime(const char *name);
extern void *map_sha1_file(const unsigned char *sha1, unsigned long *size);
extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
extern int parse_sha1_header(const char *hdr, unsigned long *sizep);

Просмотреть файл

@ -17,7 +17,20 @@ static inline uint32_t default_swab32(uint32_t val)
((val & 0x000000ff) << 24));
}
static inline uint64_t default_bswap64(uint64_t val)
{
return (((val & (uint64_t)0x00000000000000ffULL) << 56) |
((val & (uint64_t)0x000000000000ff00ULL) << 40) |
((val & (uint64_t)0x0000000000ff0000ULL) << 24) |
((val & (uint64_t)0x00000000ff000000ULL) << 8) |
((val & (uint64_t)0x000000ff00000000ULL) >> 8) |
((val & (uint64_t)0x0000ff0000000000ULL) >> 24) |
((val & (uint64_t)0x00ff000000000000ULL) >> 40) |
((val & (uint64_t)0xff00000000000000ULL) >> 56));
}
#undef bswap32
#undef bswap64
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
@ -32,15 +45,42 @@ static inline uint32_t git_bswap32(uint32_t x)
return result;
}
#define bswap64 git_bswap64
#if defined(__x86_64__)
static inline uint64_t git_bswap64(uint64_t x)
{
uint64_t result;
if (__builtin_constant_p(x))
result = default_bswap64(x);
else
__asm__("bswap %q0" : "=r" (result) : "0" (x));
return result;
}
#else
static inline uint64_t git_bswap64(uint64_t x)
{
union { uint64_t i64; uint32_t i32[2]; } tmp, result;
if (__builtin_constant_p(x))
result.i64 = default_bswap64(x);
else {
tmp.i64 = x;
result.i32[0] = git_bswap32(tmp.i32[1]);
result.i32[1] = git_bswap32(tmp.i32[0]);
}
return result.i64;
}
#endif
#elif defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <stdlib.h>
#define bswap32(x) _byteswap_ulong(x)
#define bswap64(x) _byteswap_uint64(x)
#endif
#ifdef bswap32
#if defined(bswap32)
#undef ntohl
#undef htonl
@ -48,3 +88,73 @@ static inline uint32_t git_bswap32(uint32_t x)
#define htonl(x) bswap32(x)
#endif
#if defined(bswap64)
#undef ntohll
#undef htonll
#define ntohll(x) bswap64(x)
#define htonll(x) bswap64(x)
#else
#undef ntohll
#undef htonll
#if !defined(__BYTE_ORDER)
# if defined(BYTE_ORDER) && defined(LITTLE_ENDIAN) && defined(BIG_ENDIAN)
# define __BYTE_ORDER BYTE_ORDER
# define __LITTLE_ENDIAN LITTLE_ENDIAN
# define __BIG_ENDIAN BIG_ENDIAN
# endif
#endif
#if !defined(__BYTE_ORDER)
# error "Cannot determine endianness"
#endif
#if __BYTE_ORDER == __BIG_ENDIAN
# define ntohll(n) (n)
# define htonll(n) (n)
#else
# define ntohll(n) default_bswap64(n)
# define htonll(n) default_bswap64(n)
#endif
#endif
/*
* Performance might be improved if the CPU architecture is OK with
* unaligned 32-bit loads and a fast ntohl() is available.
* Otherwise fall back to byte loads and shifts which is portable,
* and is faster on architectures with memory alignment issues.
*/
#if defined(__i386__) || defined(__x86_64__) || \
defined(_M_IX86) || defined(_M_X64) || \
defined(__ppc__) || defined(__ppc64__) || \
defined(__powerpc__) || defined(__powerpc64__) || \
defined(__s390__) || defined(__s390x__)
#define get_be16(p) ntohs(*(unsigned short *)(p))
#define get_be32(p) ntohl(*(unsigned int *)(p))
#define put_be32(p, v) do { *(unsigned int *)(p) = htonl(v); } while (0)
#else
#define get_be16(p) ( \
(*((unsigned char *)(p) + 0) << 8) | \
(*((unsigned char *)(p) + 1) << 0) )
#define get_be32(p) ( \
(*((unsigned char *)(p) + 0) << 24) | \
(*((unsigned char *)(p) + 1) << 16) | \
(*((unsigned char *)(p) + 2) << 8) | \
(*((unsigned char *)(p) + 3) << 0) )
#define put_be32(p, v) do { \
unsigned int __v = (v); \
*((unsigned char *)(p) + 0) = __v >> 24; \
*((unsigned char *)(p) + 1) = __v >> 16; \
*((unsigned char *)(p) + 2) = __v >> 8; \
*((unsigned char *)(p) + 3) = __v >> 0; } while (0)
#endif

221
ewah/bitmap.c Normal file
Просмотреть файл

@ -0,0 +1,221 @@
/**
* Copyright 2013, GitHub, Inc
* Copyright 2009-2013, Daniel Lemire, Cliff Moon,
* David McIntosh, Robert Becho, Google Inc. and Veronika Zenz
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*/
#include "git-compat-util.h"
#include "ewok.h"
#define MASK(x) ((eword_t)1 << (x % BITS_IN_WORD))
#define BLOCK(x) (x / BITS_IN_WORD)
struct bitmap *bitmap_new(void)
{
struct bitmap *bitmap = ewah_malloc(sizeof(struct bitmap));
bitmap->words = ewah_calloc(32, sizeof(eword_t));
bitmap->word_alloc = 32;
return bitmap;
}
void bitmap_set(struct bitmap *self, size_t pos)
{
size_t block = BLOCK(pos);
if (block >= self->word_alloc) {
size_t old_size = self->word_alloc;
self->word_alloc = block * 2;
self->words = ewah_realloc(self->words,
self->word_alloc * sizeof(eword_t));
memset(self->words + old_size, 0x0,
(self->word_alloc - old_size) * sizeof(eword_t));
}
self->words[block] |= MASK(pos);
}
void bitmap_clear(struct bitmap *self, size_t pos)
{
size_t block = BLOCK(pos);
if (block < self->word_alloc)
self->words[block] &= ~MASK(pos);
}
int bitmap_get(struct bitmap *self, size_t pos)
{
size_t block = BLOCK(pos);
return block < self->word_alloc &&
(self->words[block] & MASK(pos)) != 0;
}
struct ewah_bitmap *bitmap_to_ewah(struct bitmap *bitmap)
{
struct ewah_bitmap *ewah = ewah_new();
size_t i, running_empty_words = 0;
eword_t last_word = 0;
for (i = 0; i < bitmap->word_alloc; ++i) {
if (bitmap->words[i] == 0) {
running_empty_words++;
continue;
}
if (last_word != 0)
ewah_add(ewah, last_word);
if (running_empty_words > 0) {
ewah_add_empty_words(ewah, 0, running_empty_words);
running_empty_words = 0;
}
last_word = bitmap->words[i];
}
ewah_add(ewah, last_word);
return ewah;
}
struct bitmap *ewah_to_bitmap(struct ewah_bitmap *ewah)
{
struct bitmap *bitmap = bitmap_new();
struct ewah_iterator it;
eword_t blowup;
size_t i = 0;
ewah_iterator_init(&it, ewah);
while (ewah_iterator_next(&blowup, &it)) {
if (i >= bitmap->word_alloc) {
bitmap->word_alloc *= 1.5;
bitmap->words = ewah_realloc(
bitmap->words, bitmap->word_alloc * sizeof(eword_t));
}
bitmap->words[i++] = blowup;
}
bitmap->word_alloc = i;
return bitmap;
}
void bitmap_and_not(struct bitmap *self, struct bitmap *other)
{
const size_t count = (self->word_alloc < other->word_alloc) ?
self->word_alloc : other->word_alloc;
size_t i;
for (i = 0; i < count; ++i)
self->words[i] &= ~other->words[i];
}
void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other)
{
size_t original_size = self->word_alloc;
size_t other_final = (other->bit_size / BITS_IN_WORD) + 1;
size_t i = 0;
struct ewah_iterator it;
eword_t word;
if (self->word_alloc < other_final) {
self->word_alloc = other_final;
self->words = ewah_realloc(self->words,
self->word_alloc * sizeof(eword_t));
memset(self->words + original_size, 0x0,
(self->word_alloc - original_size) * sizeof(eword_t));
}
ewah_iterator_init(&it, other);
while (ewah_iterator_next(&word, &it))
self->words[i++] |= word;
}
void bitmap_each_bit(struct bitmap *self, ewah_callback callback, void *data)
{
size_t pos = 0, i;
for (i = 0; i < self->word_alloc; ++i) {
eword_t word = self->words[i];
uint32_t offset;
if (word == (eword_t)~0) {
for (offset = 0; offset < BITS_IN_WORD; ++offset)
callback(pos++, data);
} else {
for (offset = 0; offset < BITS_IN_WORD; ++offset) {
if ((word >> offset) == 0)
break;
offset += ewah_bit_ctz64(word >> offset);
callback(pos + offset, data);
}
pos += BITS_IN_WORD;
}
}
}
size_t bitmap_popcount(struct bitmap *self)
{
size_t i, count = 0;
for (i = 0; i < self->word_alloc; ++i)
count += ewah_bit_popcount64(self->words[i]);
return count;
}
int bitmap_equals(struct bitmap *self, struct bitmap *other)
{
struct bitmap *big, *small;
size_t i;
if (self->word_alloc < other->word_alloc) {
small = self;
big = other;
} else {
small = other;
big = self;
}
for (i = 0; i < small->word_alloc; ++i) {
if (small->words[i] != big->words[i])
return 0;
}
for (; i < big->word_alloc; ++i) {
if (big->words[i] != 0)
return 0;
}
return 1;
}
void bitmap_reset(struct bitmap *bitmap)
{
memset(bitmap->words, 0x0, bitmap->word_alloc * sizeof(eword_t));
}
void bitmap_free(struct bitmap *bitmap)
{
if (bitmap == NULL)
return;
free(bitmap->words);
free(bitmap);
}

714
ewah/ewah_bitmap.c Normal file
Просмотреть файл

@ -0,0 +1,714 @@
/**
* Copyright 2013, GitHub, Inc
* Copyright 2009-2013, Daniel Lemire, Cliff Moon,
* David McIntosh, Robert Becho, Google Inc. and Veronika Zenz
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*/
#include "git-compat-util.h"
#include "ewok.h"
#include "ewok_rlw.h"
static inline size_t min_size(size_t a, size_t b)
{
return a < b ? a : b;
}
static inline size_t max_size(size_t a, size_t b)
{
return a > b ? a : b;
}
static inline void buffer_grow(struct ewah_bitmap *self, size_t new_size)
{
size_t rlw_offset = (uint8_t *)self->rlw - (uint8_t *)self->buffer;
if (self->alloc_size >= new_size)
return;
self->alloc_size = new_size;
self->buffer = ewah_realloc(self->buffer,
self->alloc_size * sizeof(eword_t));
self->rlw = self->buffer + (rlw_offset / sizeof(size_t));
}
static inline void buffer_push(struct ewah_bitmap *self, eword_t value)
{
if (self->buffer_size + 1 >= self->alloc_size)
buffer_grow(self, self->buffer_size * 3 / 2);
self->buffer[self->buffer_size++] = value;
}
static void buffer_push_rlw(struct ewah_bitmap *self, eword_t value)
{
buffer_push(self, value);
self->rlw = self->buffer + self->buffer_size - 1;
}
static size_t add_empty_words(struct ewah_bitmap *self, int v, size_t number)
{
size_t added = 0;
eword_t runlen, can_add;
if (rlw_get_run_bit(self->rlw) != v && rlw_size(self->rlw) == 0) {
rlw_set_run_bit(self->rlw, v);
} else if (rlw_get_literal_words(self->rlw) != 0 ||
rlw_get_run_bit(self->rlw) != v) {
buffer_push_rlw(self, 0);
if (v) rlw_set_run_bit(self->rlw, v);
added++;
}
runlen = rlw_get_running_len(self->rlw);
can_add = min_size(number, RLW_LARGEST_RUNNING_COUNT - runlen);
rlw_set_running_len(self->rlw, runlen + can_add);
number -= can_add;
while (number >= RLW_LARGEST_RUNNING_COUNT) {
buffer_push_rlw(self, 0);
added++;
if (v) rlw_set_run_bit(self->rlw, v);
rlw_set_running_len(self->rlw, RLW_LARGEST_RUNNING_COUNT);
number -= RLW_LARGEST_RUNNING_COUNT;
}
if (number > 0) {
buffer_push_rlw(self, 0);
added++;
if (v) rlw_set_run_bit(self->rlw, v);
rlw_set_running_len(self->rlw, number);
}
return added;
}
size_t ewah_add_empty_words(struct ewah_bitmap *self, int v, size_t number)
{
if (number == 0)
return 0;
self->bit_size += number * BITS_IN_WORD;
return add_empty_words(self, v, number);
}
static size_t add_literal(struct ewah_bitmap *self, eword_t new_data)
{
eword_t current_num = rlw_get_literal_words(self->rlw);
if (current_num >= RLW_LARGEST_LITERAL_COUNT) {
buffer_push_rlw(self, 0);
rlw_set_literal_words(self->rlw, 1);
buffer_push(self, new_data);
return 2;
}
rlw_set_literal_words(self->rlw, current_num + 1);
/* sanity check */
assert(rlw_get_literal_words(self->rlw) == current_num + 1);
buffer_push(self, new_data);
return 1;
}
void ewah_add_dirty_words(
struct ewah_bitmap *self, const eword_t *buffer,
size_t number, int negate)
{
size_t literals, can_add;
while (1) {
literals = rlw_get_literal_words(self->rlw);
can_add = min_size(number, RLW_LARGEST_LITERAL_COUNT - literals);
rlw_set_literal_words(self->rlw, literals + can_add);
if (self->buffer_size + can_add >= self->alloc_size)
buffer_grow(self, (self->buffer_size + can_add) * 3 / 2);
if (negate) {
size_t i;
for (i = 0; i < can_add; ++i)
self->buffer[self->buffer_size++] = ~buffer[i];
} else {
memcpy(self->buffer + self->buffer_size,
buffer, can_add * sizeof(eword_t));
self->buffer_size += can_add;
}
self->bit_size += can_add * BITS_IN_WORD;
if (number - can_add == 0)
break;
buffer_push_rlw(self, 0);
buffer += can_add;
number -= can_add;
}
}
static size_t add_empty_word(struct ewah_bitmap *self, int v)
{
int no_literal = (rlw_get_literal_words(self->rlw) == 0);
eword_t run_len = rlw_get_running_len(self->rlw);
if (no_literal && run_len == 0) {
rlw_set_run_bit(self->rlw, v);
assert(rlw_get_run_bit(self->rlw) == v);
}
if (no_literal && rlw_get_run_bit(self->rlw) == v &&
run_len < RLW_LARGEST_RUNNING_COUNT) {
rlw_set_running_len(self->rlw, run_len + 1);
assert(rlw_get_running_len(self->rlw) == run_len + 1);
return 0;
} else {
buffer_push_rlw(self, 0);
assert(rlw_get_running_len(self->rlw) == 0);
assert(rlw_get_run_bit(self->rlw) == 0);
assert(rlw_get_literal_words(self->rlw) == 0);
rlw_set_run_bit(self->rlw, v);
assert(rlw_get_run_bit(self->rlw) == v);
rlw_set_running_len(self->rlw, 1);
assert(rlw_get_running_len(self->rlw) == 1);
assert(rlw_get_literal_words(self->rlw) == 0);
return 1;
}
}
size_t ewah_add(struct ewah_bitmap *self, eword_t word)
{
self->bit_size += BITS_IN_WORD;
if (word == 0)
return add_empty_word(self, 0);
if (word == (eword_t)(~0))
return add_empty_word(self, 1);
return add_literal(self, word);
}
void ewah_set(struct ewah_bitmap *self, size_t i)
{
const size_t dist =
(i + BITS_IN_WORD) / BITS_IN_WORD -
(self->bit_size + BITS_IN_WORD - 1) / BITS_IN_WORD;
assert(i >= self->bit_size);
self->bit_size = i + 1;
if (dist > 0) {
if (dist > 1)
add_empty_words(self, 0, dist - 1);
add_literal(self, (eword_t)1 << (i % BITS_IN_WORD));
return;
}
if (rlw_get_literal_words(self->rlw) == 0) {
rlw_set_running_len(self->rlw,
rlw_get_running_len(self->rlw) - 1);
add_literal(self, (eword_t)1 << (i % BITS_IN_WORD));
return;
}
self->buffer[self->buffer_size - 1] |=
((eword_t)1 << (i % BITS_IN_WORD));
/* check if we just completed a stream of 1s */
if (self->buffer[self->buffer_size - 1] == (eword_t)(~0)) {
self->buffer[--self->buffer_size] = 0;
rlw_set_literal_words(self->rlw,
rlw_get_literal_words(self->rlw) - 1);
add_empty_word(self, 1);
}
}
void ewah_each_bit(struct ewah_bitmap *self, void (*callback)(size_t, void*), void *payload)
{
size_t pos = 0;
size_t pointer = 0;
size_t k;
while (pointer < self->buffer_size) {
eword_t *word = &self->buffer[pointer];
if (rlw_get_run_bit(word)) {
size_t len = rlw_get_running_len(word) * BITS_IN_WORD;
for (k = 0; k < len; ++k, ++pos)
callback(pos, payload);
} else {
pos += rlw_get_running_len(word) * BITS_IN_WORD;
}
++pointer;
for (k = 0; k < rlw_get_literal_words(word); ++k) {
int c;
/* todo: zero count optimization */
for (c = 0; c < BITS_IN_WORD; ++c, ++pos) {
if ((self->buffer[pointer] & ((eword_t)1 << c)) != 0)
callback(pos, payload);
}
++pointer;
}
}
}
struct ewah_bitmap *ewah_new(void)
{
struct ewah_bitmap *self;
self = ewah_malloc(sizeof(struct ewah_bitmap));
if (self == NULL)
return NULL;
self->buffer = ewah_malloc(32 * sizeof(eword_t));
self->alloc_size = 32;
ewah_clear(self);
return self;
}
void ewah_clear(struct ewah_bitmap *self)
{
self->buffer_size = 1;
self->buffer[0] = 0;
self->bit_size = 0;
self->rlw = self->buffer;
}
void ewah_free(struct ewah_bitmap *self)
{
if (!self)
return;
if (self->alloc_size)
free(self->buffer);
free(self);
}
static void read_new_rlw(struct ewah_iterator *it)
{
const eword_t *word = NULL;
it->literals = 0;
it->compressed = 0;
while (1) {
word = &it->buffer[it->pointer];
it->rl = rlw_get_running_len(word);
it->lw = rlw_get_literal_words(word);
it->b = rlw_get_run_bit(word);
if (it->rl || it->lw)
return;
if (it->pointer < it->buffer_size - 1) {
it->pointer++;
} else {
it->pointer = it->buffer_size;
return;
}
}
}
int ewah_iterator_next(eword_t *next, struct ewah_iterator *it)
{
if (it->pointer >= it->buffer_size)
return 0;
if (it->compressed < it->rl) {
it->compressed++;
*next = it->b ? (eword_t)(~0) : 0;
} else {
assert(it->literals < it->lw);
it->literals++;
it->pointer++;
assert(it->pointer < it->buffer_size);
*next = it->buffer[it->pointer];
}
if (it->compressed == it->rl && it->literals == it->lw) {
if (++it->pointer < it->buffer_size)
read_new_rlw(it);
}
return 1;
}
void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitmap *parent)
{
it->buffer = parent->buffer;
it->buffer_size = parent->buffer_size;
it->pointer = 0;
it->lw = 0;
it->rl = 0;
it->compressed = 0;
it->literals = 0;
it->b = 0;
if (it->pointer < it->buffer_size)
read_new_rlw(it);
}
void ewah_not(struct ewah_bitmap *self)
{
size_t pointer = 0;
while (pointer < self->buffer_size) {
eword_t *word = &self->buffer[pointer];
size_t literals, k;
rlw_xor_run_bit(word);
++pointer;
literals = rlw_get_literal_words(word);
for (k = 0; k < literals; ++k) {
self->buffer[pointer] = ~self->buffer[pointer];
++pointer;
}
}
}
void ewah_xor(
struct ewah_bitmap *ewah_i,
struct ewah_bitmap *ewah_j,
struct ewah_bitmap *out)
{
struct rlw_iterator rlw_i;
struct rlw_iterator rlw_j;
size_t literals;
rlwit_init(&rlw_i, ewah_i);
rlwit_init(&rlw_j, ewah_j);
while (rlwit_word_size(&rlw_i) > 0 && rlwit_word_size(&rlw_j) > 0) {
while (rlw_i.rlw.running_len > 0 || rlw_j.rlw.running_len > 0) {
struct rlw_iterator *prey, *predator;
size_t index;
int negate_words;
if (rlw_i.rlw.running_len < rlw_j.rlw.running_len) {
prey = &rlw_i;
predator = &rlw_j;
} else {
prey = &rlw_j;
predator = &rlw_i;
}
negate_words = !!predator->rlw.running_bit;
index = rlwit_discharge(prey, out,
predator->rlw.running_len, negate_words);
ewah_add_empty_words(out, negate_words,
predator->rlw.running_len - index);
rlwit_discard_first_words(predator,
predator->rlw.running_len);
}
literals = min_size(
rlw_i.rlw.literal_words,
rlw_j.rlw.literal_words);
if (literals) {
size_t k;
for (k = 0; k < literals; ++k) {
ewah_add(out,
rlw_i.buffer[rlw_i.literal_word_start + k] ^
rlw_j.buffer[rlw_j.literal_word_start + k]
);
}
rlwit_discard_first_words(&rlw_i, literals);
rlwit_discard_first_words(&rlw_j, literals);
}
}
if (rlwit_word_size(&rlw_i) > 0)
rlwit_discharge(&rlw_i, out, ~0, 0);
else
rlwit_discharge(&rlw_j, out, ~0, 0);
out->bit_size = max_size(ewah_i->bit_size, ewah_j->bit_size);
}
void ewah_and(
struct ewah_bitmap *ewah_i,
struct ewah_bitmap *ewah_j,
struct ewah_bitmap *out)
{
struct rlw_iterator rlw_i;
struct rlw_iterator rlw_j;
size_t literals;
rlwit_init(&rlw_i, ewah_i);
rlwit_init(&rlw_j, ewah_j);
while (rlwit_word_size(&rlw_i) > 0 && rlwit_word_size(&rlw_j) > 0) {
while (rlw_i.rlw.running_len > 0 || rlw_j.rlw.running_len > 0) {
struct rlw_iterator *prey, *predator;
if (rlw_i.rlw.running_len < rlw_j.rlw.running_len) {
prey = &rlw_i;
predator = &rlw_j;
} else {
prey = &rlw_j;
predator = &rlw_i;
}
if (predator->rlw.running_bit == 0) {
ewah_add_empty_words(out, 0,
predator->rlw.running_len);
rlwit_discard_first_words(prey,
predator->rlw.running_len);
rlwit_discard_first_words(predator,
predator->rlw.running_len);
} else {
size_t index = rlwit_discharge(prey, out,
predator->rlw.running_len, 0);
ewah_add_empty_words(out, 0,
predator->rlw.running_len - index);
rlwit_discard_first_words(predator,
predator->rlw.running_len);
}
}
literals = min_size(
rlw_i.rlw.literal_words,
rlw_j.rlw.literal_words);
if (literals) {
size_t k;
for (k = 0; k < literals; ++k) {
ewah_add(out,
rlw_i.buffer[rlw_i.literal_word_start + k] &
rlw_j.buffer[rlw_j.literal_word_start + k]
);
}
rlwit_discard_first_words(&rlw_i, literals);
rlwit_discard_first_words(&rlw_j, literals);
}
}
if (rlwit_word_size(&rlw_i) > 0)
rlwit_discharge_empty(&rlw_i, out);
else
rlwit_discharge_empty(&rlw_j, out);
out->bit_size = max_size(ewah_i->bit_size, ewah_j->bit_size);
}
void ewah_and_not(
struct ewah_bitmap *ewah_i,
struct ewah_bitmap *ewah_j,
struct ewah_bitmap *out)
{
struct rlw_iterator rlw_i;
struct rlw_iterator rlw_j;
size_t literals;
rlwit_init(&rlw_i, ewah_i);
rlwit_init(&rlw_j, ewah_j);
while (rlwit_word_size(&rlw_i) > 0 && rlwit_word_size(&rlw_j) > 0) {
while (rlw_i.rlw.running_len > 0 || rlw_j.rlw.running_len > 0) {
struct rlw_iterator *prey, *predator;
if (rlw_i.rlw.running_len < rlw_j.rlw.running_len) {
prey = &rlw_i;
predator = &rlw_j;
} else {
prey = &rlw_j;
predator = &rlw_i;
}
if ((predator->rlw.running_bit && prey == &rlw_i) ||
(!predator->rlw.running_bit && prey != &rlw_i)) {
ewah_add_empty_words(out, 0,
predator->rlw.running_len);
rlwit_discard_first_words(prey,
predator->rlw.running_len);
rlwit_discard_first_words(predator,
predator->rlw.running_len);
} else {
size_t index;
int negate_words;
negate_words = (&rlw_i != prey);
index = rlwit_discharge(prey, out,
predator->rlw.running_len, negate_words);
ewah_add_empty_words(out, negate_words,
predator->rlw.running_len - index);
rlwit_discard_first_words(predator,
predator->rlw.running_len);
}
}
literals = min_size(
rlw_i.rlw.literal_words,
rlw_j.rlw.literal_words);
if (literals) {
size_t k;
for (k = 0; k < literals; ++k) {
ewah_add(out,
rlw_i.buffer[rlw_i.literal_word_start + k] &
~(rlw_j.buffer[rlw_j.literal_word_start + k])
);
}
rlwit_discard_first_words(&rlw_i, literals);
rlwit_discard_first_words(&rlw_j, literals);
}
}
if (rlwit_word_size(&rlw_i) > 0)
rlwit_discharge(&rlw_i, out, ~0, 0);
else
rlwit_discharge_empty(&rlw_j, out);
out->bit_size = max_size(ewah_i->bit_size, ewah_j->bit_size);
}
void ewah_or(
struct ewah_bitmap *ewah_i,
struct ewah_bitmap *ewah_j,
struct ewah_bitmap *out)
{
struct rlw_iterator rlw_i;
struct rlw_iterator rlw_j;
size_t literals;
rlwit_init(&rlw_i, ewah_i);
rlwit_init(&rlw_j, ewah_j);
while (rlwit_word_size(&rlw_i) > 0 && rlwit_word_size(&rlw_j) > 0) {
while (rlw_i.rlw.running_len > 0 || rlw_j.rlw.running_len > 0) {
struct rlw_iterator *prey, *predator;
if (rlw_i.rlw.running_len < rlw_j.rlw.running_len) {
prey = &rlw_i;
predator = &rlw_j;
} else {
prey = &rlw_j;
predator = &rlw_i;
}
if (predator->rlw.running_bit) {
ewah_add_empty_words(out, 0,
predator->rlw.running_len);
rlwit_discard_first_words(prey,
predator->rlw.running_len);
rlwit_discard_first_words(predator,
predator->rlw.running_len);
} else {
size_t index = rlwit_discharge(prey, out,
predator->rlw.running_len, 0);
ewah_add_empty_words(out, 0,
predator->rlw.running_len - index);
rlwit_discard_first_words(predator,
predator->rlw.running_len);
}
}
literals = min_size(
rlw_i.rlw.literal_words,
rlw_j.rlw.literal_words);
if (literals) {
size_t k;
for (k = 0; k < literals; ++k) {
ewah_add(out,
rlw_i.buffer[rlw_i.literal_word_start + k] |
rlw_j.buffer[rlw_j.literal_word_start + k]
);
}
rlwit_discard_first_words(&rlw_i, literals);
rlwit_discard_first_words(&rlw_j, literals);
}
}
if (rlwit_word_size(&rlw_i) > 0)
rlwit_discharge(&rlw_i, out, ~0, 0);
else
rlwit_discharge(&rlw_j, out, ~0, 0);
out->bit_size = max_size(ewah_i->bit_size, ewah_j->bit_size);
}
#define BITMAP_POOL_MAX 16
static struct ewah_bitmap *bitmap_pool[BITMAP_POOL_MAX];
static size_t bitmap_pool_size;
struct ewah_bitmap *ewah_pool_new(void)
{
if (bitmap_pool_size)
return bitmap_pool[--bitmap_pool_size];
return ewah_new();
}
void ewah_pool_free(struct ewah_bitmap *self)
{
if (self == NULL)
return;
if (bitmap_pool_size == BITMAP_POOL_MAX ||
self->alloc_size == 0) {
ewah_free(self);
return;
}
ewah_clear(self);
bitmap_pool[bitmap_pool_size++] = self;
}
uint32_t ewah_checksum(struct ewah_bitmap *self)
{
const uint8_t *p = (uint8_t *)self->buffer;
uint32_t crc = (uint32_t)self->bit_size;
size_t size = self->buffer_size * sizeof(eword_t);
while (size--)
crc = (crc << 5) - crc + (uint32_t)*p++;
return crc;
}

204
ewah/ewah_io.c Normal file
Просмотреть файл

@ -0,0 +1,204 @@
/**
* Copyright 2013, GitHub, Inc
* Copyright 2009-2013, Daniel Lemire, Cliff Moon,
* David McIntosh, Robert Becho, Google Inc. and Veronika Zenz
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*/
#include "git-compat-util.h"
#include "ewok.h"
int ewah_serialize_native(struct ewah_bitmap *self, int fd)
{
uint32_t write32;
size_t to_write = self->buffer_size * 8;
/* 32 bit -- bit size for the map */
write32 = (uint32_t)self->bit_size;
if (write(fd, &write32, 4) != 4)
return -1;
/** 32 bit -- number of compressed 64-bit words */
write32 = (uint32_t)self->buffer_size;
if (write(fd, &write32, 4) != 4)
return -1;
if (write(fd, self->buffer, to_write) != to_write)
return -1;
/** 32 bit -- position for the RLW */
write32 = self->rlw - self->buffer;
if (write(fd, &write32, 4) != 4)
return -1;
return (3 * 4) + to_write;
}
int ewah_serialize_to(struct ewah_bitmap *self,
int (*write_fun)(void *, const void *, size_t),
void *data)
{
size_t i;
eword_t dump[2048];
const size_t words_per_dump = sizeof(dump) / sizeof(eword_t);
uint32_t bitsize, word_count, rlw_pos;
const eword_t *buffer;
size_t words_left;
/* 32 bit -- bit size for the map */
bitsize = htonl((uint32_t)self->bit_size);
if (write_fun(data, &bitsize, 4) != 4)
return -1;
/** 32 bit -- number of compressed 64-bit words */
word_count = htonl((uint32_t)self->buffer_size);
if (write_fun(data, &word_count, 4) != 4)
return -1;
/** 64 bit x N -- compressed words */
buffer = self->buffer;
words_left = self->buffer_size;
while (words_left >= words_per_dump) {
for (i = 0; i < words_per_dump; ++i, ++buffer)
dump[i] = htonll(*buffer);
if (write_fun(data, dump, sizeof(dump)) != sizeof(dump))
return -1;
words_left -= words_per_dump;
}
if (words_left) {
for (i = 0; i < words_left; ++i, ++buffer)
dump[i] = htonll(*buffer);
if (write_fun(data, dump, words_left * 8) != words_left * 8)
return -1;
}
/** 32 bit -- position for the RLW */
rlw_pos = (uint8_t*)self->rlw - (uint8_t *)self->buffer;
rlw_pos = htonl(rlw_pos / sizeof(eword_t));
if (write_fun(data, &rlw_pos, 4) != 4)
return -1;
return (3 * 4) + (self->buffer_size * 8);
}
static int write_helper(void *fd, const void *buf, size_t len)
{
return write((intptr_t)fd, buf, len);
}
int ewah_serialize(struct ewah_bitmap *self, int fd)
{
return ewah_serialize_to(self, write_helper, (void *)(intptr_t)fd);
}
int ewah_read_mmap(struct ewah_bitmap *self, void *map, size_t len)
{
uint8_t *ptr = map;
size_t i;
self->bit_size = get_be32(ptr);
ptr += sizeof(uint32_t);
self->buffer_size = self->alloc_size = get_be32(ptr);
ptr += sizeof(uint32_t);
self->buffer = ewah_realloc(self->buffer,
self->alloc_size * sizeof(eword_t));
if (!self->buffer)
return -1;
/*
* Copy the raw data for the bitmap as a whole chunk;
* if we're in a little-endian platform, we'll perform
* the endianness conversion in a separate pass to ensure
* we're loading 8-byte aligned words.
*/
memcpy(self->buffer, ptr, self->buffer_size * sizeof(uint64_t));
ptr += self->buffer_size * sizeof(uint64_t);
for (i = 0; i < self->buffer_size; ++i)
self->buffer[i] = ntohll(self->buffer[i]);
self->rlw = self->buffer + get_be32(ptr);
return (3 * 4) + (self->buffer_size * 8);
}
int ewah_deserialize(struct ewah_bitmap *self, int fd)
{
size_t i;
eword_t dump[2048];
const size_t words_per_dump = sizeof(dump) / sizeof(eword_t);
uint32_t bitsize, word_count, rlw_pos;
eword_t *buffer = NULL;
size_t words_left;
ewah_clear(self);
/* 32 bit -- bit size for the map */
if (read(fd, &bitsize, 4) != 4)
return -1;
self->bit_size = (size_t)ntohl(bitsize);
/** 32 bit -- number of compressed 64-bit words */
if (read(fd, &word_count, 4) != 4)
return -1;
self->buffer_size = self->alloc_size = (size_t)ntohl(word_count);
self->buffer = ewah_realloc(self->buffer,
self->alloc_size * sizeof(eword_t));
if (!self->buffer)
return -1;
/** 64 bit x N -- compressed words */
buffer = self->buffer;
words_left = self->buffer_size;
while (words_left >= words_per_dump) {
if (read(fd, dump, sizeof(dump)) != sizeof(dump))
return -1;
for (i = 0; i < words_per_dump; ++i, ++buffer)
*buffer = ntohll(dump[i]);
words_left -= words_per_dump;
}
if (words_left) {
if (read(fd, dump, words_left * 8) != words_left * 8)
return -1;
for (i = 0; i < words_left; ++i, ++buffer)
*buffer = ntohll(dump[i]);
}
/** 32 bit -- position for the RLW */
if (read(fd, &rlw_pos, 4) != 4)
return -1;
self->rlw = self->buffer + ntohl(rlw_pos);
return 0;
}

115
ewah/ewah_rlw.c Normal file
Просмотреть файл

@ -0,0 +1,115 @@
/**
* Copyright 2013, GitHub, Inc
* Copyright 2009-2013, Daniel Lemire, Cliff Moon,
* David McIntosh, Robert Becho, Google Inc. and Veronika Zenz
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*/
#include "git-compat-util.h"
#include "ewok.h"
#include "ewok_rlw.h"
static inline int next_word(struct rlw_iterator *it)
{
if (it->pointer >= it->size)
return 0;
it->rlw.word = &it->buffer[it->pointer];
it->pointer += rlw_get_literal_words(it->rlw.word) + 1;
it->rlw.literal_words = rlw_get_literal_words(it->rlw.word);
it->rlw.running_len = rlw_get_running_len(it->rlw.word);
it->rlw.running_bit = rlw_get_run_bit(it->rlw.word);
it->rlw.literal_word_offset = 0;
return 1;
}
void rlwit_init(struct rlw_iterator *it, struct ewah_bitmap *from_ewah)
{
it->buffer = from_ewah->buffer;
it->size = from_ewah->buffer_size;
it->pointer = 0;
next_word(it);
it->literal_word_start = rlwit_literal_words(it) +
it->rlw.literal_word_offset;
}
void rlwit_discard_first_words(struct rlw_iterator *it, size_t x)
{
while (x > 0) {
size_t discard;
if (it->rlw.running_len > x) {
it->rlw.running_len -= x;
return;
}
x -= it->rlw.running_len;
it->rlw.running_len = 0;
discard = (x > it->rlw.literal_words) ? it->rlw.literal_words : x;
it->literal_word_start += discard;
it->rlw.literal_words -= discard;
x -= discard;
if (x > 0 || rlwit_word_size(it) == 0) {
if (!next_word(it))
break;
it->literal_word_start =
rlwit_literal_words(it) + it->rlw.literal_word_offset;
}
}
}
size_t rlwit_discharge(
struct rlw_iterator *it, struct ewah_bitmap *out, size_t max, int negate)
{
size_t index = 0;
while (index < max && rlwit_word_size(it) > 0) {
size_t pd, pl = it->rlw.running_len;
if (index + pl > max)
pl = max - index;
ewah_add_empty_words(out, it->rlw.running_bit ^ negate, pl);
index += pl;
pd = it->rlw.literal_words;
if (pd + index > max)
pd = max - index;
ewah_add_dirty_words(out,
it->buffer + it->literal_word_start, pd, negate);
rlwit_discard_first_words(it, pd + pl);
index += pd;
}
return index;
}
void rlwit_discharge_empty(struct rlw_iterator *it, struct ewah_bitmap *out)
{
while (rlwit_word_size(it) > 0) {
ewah_add_empty_words(out, 0, rlwit_word_size(it));
rlwit_discard_first_words(it, rlwit_word_size(it));
}
}

233
ewah/ewok.h Normal file
Просмотреть файл

@ -0,0 +1,233 @@
/**
* Copyright 2013, GitHub, Inc
* Copyright 2009-2013, Daniel Lemire, Cliff Moon,
* David McIntosh, Robert Becho, Google Inc. and Veronika Zenz
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*/
#ifndef __EWOK_BITMAP_H__
#define __EWOK_BITMAP_H__
#ifndef ewah_malloc
# define ewah_malloc xmalloc
#endif
#ifndef ewah_realloc
# define ewah_realloc xrealloc
#endif
#ifndef ewah_calloc
# define ewah_calloc xcalloc
#endif
typedef uint64_t eword_t;
#define BITS_IN_WORD (sizeof(eword_t) * 8)
/**
* Do not use __builtin_popcountll. The GCC implementation
* is notoriously slow on all platforms.
*
* See: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041
*/
static inline uint32_t ewah_bit_popcount64(uint64_t x)
{
x = (x & 0x5555555555555555ULL) + ((x >> 1) & 0x5555555555555555ULL);
x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
x = (x & 0x0F0F0F0F0F0F0F0FULL) + ((x >> 4) & 0x0F0F0F0F0F0F0F0FULL);
return (x * 0x0101010101010101ULL) >> 56;
}
#ifdef __GNUC__
#define ewah_bit_ctz64(x) __builtin_ctzll(x)
#else
static inline int ewah_bit_ctz64(uint64_t x)
{
int n = 0;
if ((x & 0xffffffff) == 0) { x >>= 32; n += 32; }
if ((x & 0xffff) == 0) { x >>= 16; n += 16; }
if ((x & 0xff) == 0) { x >>= 8; n += 8; }
if ((x & 0xf) == 0) { x >>= 4; n += 4; }
if ((x & 0x3) == 0) { x >>= 2; n += 2; }
if ((x & 0x1) == 0) { x >>= 1; n += 1; }
return n + !x;
}
#endif
struct ewah_bitmap {
eword_t *buffer;
size_t buffer_size;
size_t alloc_size;
size_t bit_size;
eword_t *rlw;
};
typedef void (*ewah_callback)(size_t pos, void *);
struct ewah_bitmap *ewah_pool_new(void);
void ewah_pool_free(struct ewah_bitmap *self);
/**
* Allocate a new EWAH Compressed bitmap
*/
struct ewah_bitmap *ewah_new(void);
/**
* Clear all the bits in the bitmap. Does not free or resize
* memory.
*/
void ewah_clear(struct ewah_bitmap *self);
/**
* Free all the memory of the bitmap
*/
void ewah_free(struct ewah_bitmap *self);
int ewah_serialize_to(struct ewah_bitmap *self,
int (*write_fun)(void *out, const void *buf, size_t len),
void *out);
int ewah_serialize(struct ewah_bitmap *self, int fd);
int ewah_serialize_native(struct ewah_bitmap *self, int fd);
int ewah_deserialize(struct ewah_bitmap *self, int fd);
int ewah_read_mmap(struct ewah_bitmap *self, void *map, size_t len);
int ewah_read_mmap_native(struct ewah_bitmap *self, void *map, size_t len);
uint32_t ewah_checksum(struct ewah_bitmap *self);
/**
* Logical not (bitwise negation) in-place on the bitmap
*
* This operation is linear time based on the size of the bitmap.
*/
void ewah_not(struct ewah_bitmap *self);
/**
* Call the given callback with the position of every single bit
* that has been set on the bitmap.
*
* This is an efficient operation that does not fully decompress
* the bitmap.
*/
void ewah_each_bit(struct ewah_bitmap *self, ewah_callback callback, void *payload);
/**
* Set a given bit on the bitmap.
*
* The bit at position `pos` will be set to true. Because of the
* way that the bitmap is compressed, a set bit cannot be unset
* later on.
*
* Furthermore, since the bitmap uses streaming compression, bits
* can only set incrementally.
*
* E.g.
* ewah_set(bitmap, 1); // ok
* ewah_set(bitmap, 76); // ok
* ewah_set(bitmap, 77); // ok
* ewah_set(bitmap, 8712800127); // ok
* ewah_set(bitmap, 25); // failed, assert raised
*/
void ewah_set(struct ewah_bitmap *self, size_t i);
struct ewah_iterator {
const eword_t *buffer;
size_t buffer_size;
size_t pointer;
eword_t compressed, literals;
eword_t rl, lw;
int b;
};
/**
* Initialize a new iterator to run through the bitmap in uncompressed form.
*
* The iterator can be stack allocated. The underlying bitmap must not be freed
* before the iteration is over.
*
* E.g.
*
* struct ewah_bitmap *bitmap = ewah_new();
* struct ewah_iterator it;
*
* ewah_iterator_init(&it, bitmap);
*/
void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitmap *parent);
/**
* Yield every single word in the bitmap in uncompressed form. This is:
* yield single words (32-64 bits) where each bit represents an actual
* bit from the bitmap.
*
* Return: true if a word was yield, false if there are no words left
*/
int ewah_iterator_next(eword_t *next, struct ewah_iterator *it);
void ewah_or(
struct ewah_bitmap *ewah_i,
struct ewah_bitmap *ewah_j,
struct ewah_bitmap *out);
void ewah_and_not(
struct ewah_bitmap *ewah_i,
struct ewah_bitmap *ewah_j,
struct ewah_bitmap *out);
void ewah_xor(
struct ewah_bitmap *ewah_i,
struct ewah_bitmap *ewah_j,
struct ewah_bitmap *out);
void ewah_and(
struct ewah_bitmap *ewah_i,
struct ewah_bitmap *ewah_j,
struct ewah_bitmap *out);
/**
* Direct word access
*/
size_t ewah_add_empty_words(struct ewah_bitmap *self, int v, size_t number);
void ewah_add_dirty_words(
struct ewah_bitmap *self, const eword_t *buffer, size_t number, int negate);
size_t ewah_add(struct ewah_bitmap *self, eword_t word);
/**
* Uncompressed, old-school bitmap that can be efficiently compressed
* into an `ewah_bitmap`.
*/
struct bitmap {
eword_t *words;
size_t word_alloc;
};
struct bitmap *bitmap_new(void);
void bitmap_set(struct bitmap *self, size_t pos);
void bitmap_clear(struct bitmap *self, size_t pos);
int bitmap_get(struct bitmap *self, size_t pos);
void bitmap_reset(struct bitmap *self);
void bitmap_free(struct bitmap *self);
int bitmap_equals(struct bitmap *self, struct bitmap *other);
int bitmap_is_subset(struct bitmap *self, struct bitmap *super);
struct ewah_bitmap * bitmap_to_ewah(struct bitmap *bitmap);
struct bitmap *ewah_to_bitmap(struct ewah_bitmap *ewah);
void bitmap_and_not(struct bitmap *self, struct bitmap *other);
void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other);
void bitmap_or(struct bitmap *self, const struct bitmap *other);
void bitmap_each_bit(struct bitmap *self, ewah_callback callback, void *data);
size_t bitmap_popcount(struct bitmap *self);
#endif

114
ewah/ewok_rlw.h Normal file
Просмотреть файл

@ -0,0 +1,114 @@
/**
* Copyright 2013, GitHub, Inc
* Copyright 2009-2013, Daniel Lemire, Cliff Moon,
* David McIntosh, Robert Becho, Google Inc. and Veronika Zenz
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*/
#ifndef __EWOK_RLW_H__
#define __EWOK_RLW_H__
#define RLW_RUNNING_BITS (sizeof(eword_t) * 4)
#define RLW_LITERAL_BITS (sizeof(eword_t) * 8 - 1 - RLW_RUNNING_BITS)
#define RLW_LARGEST_RUNNING_COUNT (((eword_t)1 << RLW_RUNNING_BITS) - 1)
#define RLW_LARGEST_LITERAL_COUNT (((eword_t)1 << RLW_LITERAL_BITS) - 1)
#define RLW_LARGEST_RUNNING_COUNT_SHIFT (RLW_LARGEST_RUNNING_COUNT << 1)
#define RLW_RUNNING_LEN_PLUS_BIT (((eword_t)1 << (RLW_RUNNING_BITS + 1)) - 1)
static int rlw_get_run_bit(const eword_t *word)
{
return *word & (eword_t)1;
}
static inline void rlw_set_run_bit(eword_t *word, int b)
{
if (b) {
*word |= (eword_t)1;
} else {
*word &= (eword_t)(~1);
}
}
static inline void rlw_xor_run_bit(eword_t *word)
{
if (*word & 1) {
*word &= (eword_t)(~1);
} else {
*word |= (eword_t)1;
}
}
static inline void rlw_set_running_len(eword_t *word, eword_t l)
{
*word |= RLW_LARGEST_RUNNING_COUNT_SHIFT;
*word &= (l << 1) | (~RLW_LARGEST_RUNNING_COUNT_SHIFT);
}
static inline eword_t rlw_get_running_len(const eword_t *word)
{
return (*word >> 1) & RLW_LARGEST_RUNNING_COUNT;
}
static inline eword_t rlw_get_literal_words(const eword_t *word)
{
return *word >> (1 + RLW_RUNNING_BITS);
}
static inline void rlw_set_literal_words(eword_t *word, eword_t l)
{
*word |= ~RLW_RUNNING_LEN_PLUS_BIT;
*word &= (l << (RLW_RUNNING_BITS + 1)) | RLW_RUNNING_LEN_PLUS_BIT;
}
static inline eword_t rlw_size(const eword_t *self)
{
return rlw_get_running_len(self) + rlw_get_literal_words(self);
}
struct rlw_iterator {
const eword_t *buffer;
size_t size;
size_t pointer;
size_t literal_word_start;
struct {
const eword_t *word;
int literal_words;
int running_len;
int literal_word_offset;
int running_bit;
} rlw;
};
void rlwit_init(struct rlw_iterator *it, struct ewah_bitmap *bitmap);
void rlwit_discard_first_words(struct rlw_iterator *it, size_t x);
size_t rlwit_discharge(
struct rlw_iterator *it, struct ewah_bitmap *out, size_t max, int negate);
void rlwit_discharge_empty(struct rlw_iterator *it, struct ewah_bitmap *out);
static inline size_t rlwit_word_size(struct rlw_iterator *it)
{
return it->rlw.running_len + it->rlw.literal_words;
}
static inline size_t rlwit_literal_words(struct rlw_iterator *it)
{
return it->pointer - it->rlw.literal_words;
}
#endif

338
khash.h Normal file
Просмотреть файл

@ -0,0 +1,338 @@
/* The MIT License
Copyright (c) 2008, 2009, 2011 by Attractive Chaos <attractor@live.co.uk>
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
*/
#ifndef __AC_KHASH_H
#define __AC_KHASH_H
#define AC_VERSION_KHASH_H "0.2.8"
typedef uint32_t khint32_t;
typedef uint64_t khint64_t;
typedef khint32_t khint_t;
typedef khint_t khiter_t;
#define __ac_isempty(flag, i) ((flag[i>>4]>>((i&0xfU)<<1))&2)
#define __ac_isdel(flag, i) ((flag[i>>4]>>((i&0xfU)<<1))&1)
#define __ac_iseither(flag, i) ((flag[i>>4]>>((i&0xfU)<<1))&3)
#define __ac_set_isdel_false(flag, i) (flag[i>>4]&=~(1ul<<((i&0xfU)<<1)))
#define __ac_set_isempty_false(flag, i) (flag[i>>4]&=~(2ul<<((i&0xfU)<<1)))
#define __ac_set_isboth_false(flag, i) (flag[i>>4]&=~(3ul<<((i&0xfU)<<1)))
#define __ac_set_isdel_true(flag, i) (flag[i>>4]|=1ul<<((i&0xfU)<<1))
#define __ac_fsize(m) ((m) < 16? 1 : (m)>>4)
#define kroundup32(x) (--(x), (x)|=(x)>>1, (x)|=(x)>>2, (x)|=(x)>>4, (x)|=(x)>>8, (x)|=(x)>>16, ++(x))
static inline khint_t __ac_X31_hash_string(const char *s)
{
khint_t h = (khint_t)*s;
if (h) for (++s ; *s; ++s) h = (h << 5) - h + (khint_t)*s;
return h;
}
#define kh_str_hash_func(key) __ac_X31_hash_string(key)
#define kh_str_hash_equal(a, b) (strcmp(a, b) == 0)
static const double __ac_HASH_UPPER = 0.77;
#define __KHASH_TYPE(name, khkey_t, khval_t) \
typedef struct { \
khint_t n_buckets, size, n_occupied, upper_bound; \
khint32_t *flags; \
khkey_t *keys; \
khval_t *vals; \
} kh_##name##_t;
#define __KHASH_PROTOTYPES(name, khkey_t, khval_t) \
extern kh_##name##_t *kh_init_##name(void); \
extern void kh_destroy_##name(kh_##name##_t *h); \
extern void kh_clear_##name(kh_##name##_t *h); \
extern khint_t kh_get_##name(const kh_##name##_t *h, khkey_t key); \
extern int kh_resize_##name(kh_##name##_t *h, khint_t new_n_buckets); \
extern khint_t kh_put_##name(kh_##name##_t *h, khkey_t key, int *ret); \
extern void kh_del_##name(kh_##name##_t *h, khint_t x);
#define __KHASH_IMPL(name, SCOPE, khkey_t, khval_t, kh_is_map, __hash_func, __hash_equal) \
SCOPE kh_##name##_t *kh_init_##name(void) { \
return (kh_##name##_t*)xcalloc(1, sizeof(kh_##name##_t)); \
} \
SCOPE void kh_destroy_##name(kh_##name##_t *h) \
{ \
if (h) { \
free((void *)h->keys); free(h->flags); \
free((void *)h->vals); \
free(h); \
} \
} \
SCOPE void kh_clear_##name(kh_##name##_t *h) \
{ \
if (h && h->flags) { \
memset(h->flags, 0xaa, __ac_fsize(h->n_buckets) * sizeof(khint32_t)); \
h->size = h->n_occupied = 0; \
} \
} \
SCOPE khint_t kh_get_##name(const kh_##name##_t *h, khkey_t key) \
{ \
if (h->n_buckets) { \
khint_t k, i, last, mask, step = 0; \
mask = h->n_buckets - 1; \
k = __hash_func(key); i = k & mask; \
last = i; \
while (!__ac_isempty(h->flags, i) && (__ac_isdel(h->flags, i) || !__hash_equal(h->keys[i], key))) { \
i = (i + (++step)) & mask; \
if (i == last) return h->n_buckets; \
} \
return __ac_iseither(h->flags, i)? h->n_buckets : i; \
} else return 0; \
} \
SCOPE int kh_resize_##name(kh_##name##_t *h, khint_t new_n_buckets) \
{ /* This function uses 0.25*n_buckets bytes of working space instead of [sizeof(key_t+val_t)+.25]*n_buckets. */ \
khint32_t *new_flags = NULL; \
khint_t j = 1; \
{ \
kroundup32(new_n_buckets); \
if (new_n_buckets < 4) new_n_buckets = 4; \
if (h->size >= (khint_t)(new_n_buckets * __ac_HASH_UPPER + 0.5)) j = 0; /* requested size is too small */ \
else { /* hash table size to be changed (shrink or expand); rehash */ \
new_flags = (khint32_t*)xmalloc(__ac_fsize(new_n_buckets) * sizeof(khint32_t)); \
if (!new_flags) return -1; \
memset(new_flags, 0xaa, __ac_fsize(new_n_buckets) * sizeof(khint32_t)); \
if (h->n_buckets < new_n_buckets) { /* expand */ \
khkey_t *new_keys = (khkey_t*)xrealloc((void *)h->keys, new_n_buckets * sizeof(khkey_t)); \
if (!new_keys) return -1; \
h->keys = new_keys; \
if (kh_is_map) { \
khval_t *new_vals = (khval_t*)xrealloc((void *)h->vals, new_n_buckets * sizeof(khval_t)); \
if (!new_vals) return -1; \
h->vals = new_vals; \
} \
} /* otherwise shrink */ \
} \
} \
if (j) { /* rehashing is needed */ \
for (j = 0; j != h->n_buckets; ++j) { \
if (__ac_iseither(h->flags, j) == 0) { \
khkey_t key = h->keys[j]; \
khval_t val; \
khint_t new_mask; \
new_mask = new_n_buckets - 1; \
if (kh_is_map) val = h->vals[j]; \
__ac_set_isdel_true(h->flags, j); \
while (1) { /* kick-out process; sort of like in Cuckoo hashing */ \
khint_t k, i, step = 0; \
k = __hash_func(key); \
i = k & new_mask; \
while (!__ac_isempty(new_flags, i)) i = (i + (++step)) & new_mask; \
__ac_set_isempty_false(new_flags, i); \
if (i < h->n_buckets && __ac_iseither(h->flags, i) == 0) { /* kick out the existing element */ \
{ khkey_t tmp = h->keys[i]; h->keys[i] = key; key = tmp; } \
if (kh_is_map) { khval_t tmp = h->vals[i]; h->vals[i] = val; val = tmp; } \
__ac_set_isdel_true(h->flags, i); /* mark it as deleted in the old hash table */ \
} else { /* write the element and jump out of the loop */ \
h->keys[i] = key; \
if (kh_is_map) h->vals[i] = val; \
break; \
} \
} \
} \
} \
if (h->n_buckets > new_n_buckets) { /* shrink the hash table */ \
h->keys = (khkey_t*)xrealloc((void *)h->keys, new_n_buckets * sizeof(khkey_t)); \
if (kh_is_map) h->vals = (khval_t*)xrealloc((void *)h->vals, new_n_buckets * sizeof(khval_t)); \
} \
free(h->flags); /* free the working space */ \
h->flags = new_flags; \
h->n_buckets = new_n_buckets; \
h->n_occupied = h->size; \
h->upper_bound = (khint_t)(h->n_buckets * __ac_HASH_UPPER + 0.5); \
} \
return 0; \
} \
SCOPE khint_t kh_put_##name(kh_##name##_t *h, khkey_t key, int *ret) \
{ \
khint_t x; \
if (h->n_occupied >= h->upper_bound) { /* update the hash table */ \
if (h->n_buckets > (h->size<<1)) { \
if (kh_resize_##name(h, h->n_buckets - 1) < 0) { /* clear "deleted" elements */ \
*ret = -1; return h->n_buckets; \
} \
} else if (kh_resize_##name(h, h->n_buckets + 1) < 0) { /* expand the hash table */ \
*ret = -1; return h->n_buckets; \
} \
} /* TODO: to implement automatically shrinking; resize() already support shrinking */ \
{ \
khint_t k, i, site, last, mask = h->n_buckets - 1, step = 0; \
x = site = h->n_buckets; k = __hash_func(key); i = k & mask; \
if (__ac_isempty(h->flags, i)) x = i; /* for speed up */ \
else { \
last = i; \
while (!__ac_isempty(h->flags, i) && (__ac_isdel(h->flags, i) || !__hash_equal(h->keys[i], key))) { \
if (__ac_isdel(h->flags, i)) site = i; \
i = (i + (++step)) & mask; \
if (i == last) { x = site; break; } \
} \
if (x == h->n_buckets) { \
if (__ac_isempty(h->flags, i) && site != h->n_buckets) x = site; \
else x = i; \
} \
} \
} \
if (__ac_isempty(h->flags, x)) { /* not present at all */ \
h->keys[x] = key; \
__ac_set_isboth_false(h->flags, x); \
++h->size; ++h->n_occupied; \
*ret = 1; \
} else if (__ac_isdel(h->flags, x)) { /* deleted */ \
h->keys[x] = key; \
__ac_set_isboth_false(h->flags, x); \
++h->size; \
*ret = 2; \
} else *ret = 0; /* Don't touch h->keys[x] if present and not deleted */ \
return x; \
} \
SCOPE void kh_del_##name(kh_##name##_t *h, khint_t x) \
{ \
if (x != h->n_buckets && !__ac_iseither(h->flags, x)) { \
__ac_set_isdel_true(h->flags, x); \
--h->size; \
} \
}
#define KHASH_DECLARE(name, khkey_t, khval_t) \
__KHASH_TYPE(name, khkey_t, khval_t) \
__KHASH_PROTOTYPES(name, khkey_t, khval_t)
#define KHASH_INIT2(name, SCOPE, khkey_t, khval_t, kh_is_map, __hash_func, __hash_equal) \
__KHASH_TYPE(name, khkey_t, khval_t) \
__KHASH_IMPL(name, SCOPE, khkey_t, khval_t, kh_is_map, __hash_func, __hash_equal)
#define KHASH_INIT(name, khkey_t, khval_t, kh_is_map, __hash_func, __hash_equal) \
KHASH_INIT2(name, static inline, khkey_t, khval_t, kh_is_map, __hash_func, __hash_equal)
/* Other convenient macros... */
/*! @function
@abstract Test whether a bucket contains data.
@param h Pointer to the hash table [khash_t(name)*]
@param x Iterator to the bucket [khint_t]
@return 1 if containing data; 0 otherwise [int]
*/
#define kh_exist(h, x) (!__ac_iseither((h)->flags, (x)))
/*! @function
@abstract Get key given an iterator
@param h Pointer to the hash table [khash_t(name)*]
@param x Iterator to the bucket [khint_t]
@return Key [type of keys]
*/
#define kh_key(h, x) ((h)->keys[x])
/*! @function
@abstract Get value given an iterator
@param h Pointer to the hash table [khash_t(name)*]
@param x Iterator to the bucket [khint_t]
@return Value [type of values]
@discussion For hash sets, calling this results in segfault.
*/
#define kh_val(h, x) ((h)->vals[x])
/*! @function
@abstract Alias of kh_val()
*/
#define kh_value(h, x) ((h)->vals[x])
/*! @function
@abstract Get the start iterator
@param h Pointer to the hash table [khash_t(name)*]
@return The start iterator [khint_t]
*/
#define kh_begin(h) (khint_t)(0)
/*! @function
@abstract Get the end iterator
@param h Pointer to the hash table [khash_t(name)*]
@return The end iterator [khint_t]
*/
#define kh_end(h) ((h)->n_buckets)
/*! @function
@abstract Get the number of elements in the hash table
@param h Pointer to the hash table [khash_t(name)*]
@return Number of elements in the hash table [khint_t]
*/
#define kh_size(h) ((h)->size)
/*! @function
@abstract Get the number of buckets in the hash table
@param h Pointer to the hash table [khash_t(name)*]
@return Number of buckets in the hash table [khint_t]
*/
#define kh_n_buckets(h) ((h)->n_buckets)
/*! @function
@abstract Iterate over the entries in the hash table
@param h Pointer to the hash table [khash_t(name)*]
@param kvar Variable to which key will be assigned
@param vvar Variable to which value will be assigned
@param code Block of code to execute
*/
#define kh_foreach(h, kvar, vvar, code) { khint_t __i; \
for (__i = kh_begin(h); __i != kh_end(h); ++__i) { \
if (!kh_exist(h,__i)) continue; \
(kvar) = kh_key(h,__i); \
(vvar) = kh_val(h,__i); \
code; \
} }
/*! @function
@abstract Iterate over the values in the hash table
@param h Pointer to the hash table [khash_t(name)*]
@param vvar Variable to which value will be assigned
@param code Block of code to execute
*/
#define kh_foreach_value(h, vvar, code) { khint_t __i; \
for (__i = kh_begin(h); __i != kh_end(h); ++__i) { \
if (!kh_exist(h,__i)) continue; \
(vvar) = kh_val(h,__i); \
code; \
} }
static inline khint_t __kh_oid_hash(const unsigned char *oid)
{
khint_t hash;
memcpy(&hash, oid, sizeof(hash));
return hash;
}
#define __kh_oid_cmp(a, b) (hashcmp(a, b) == 0)
KHASH_INIT(sha1, const unsigned char *, void *, 1, __kh_oid_hash, __kh_oid_cmp)
typedef kh_sha1_t khash_sha1;
KHASH_INIT(sha1_pos, const unsigned char *, int, 1, __kh_oid_hash, __kh_oid_cmp)
typedef kh_sha1_pos_t khash_sha1_pos;
#endif /* __AC_KHASH_H */

552
pack-bitmap-write.c Normal file
Просмотреть файл

@ -0,0 +1,552 @@
#include "cache.h"
#include "commit.h"
#include "tag.h"
#include "diff.h"
#include "revision.h"
#include "list-objects.h"
#include "progress.h"
#include "pack-revindex.h"
#include "pack.h"
#include "pack-bitmap.h"
#include "sha1-lookup.h"
#include "pack-objects.h"
struct bitmapped_commit {
struct commit *commit;
struct ewah_bitmap *bitmap;
struct ewah_bitmap *write_as;
int flags;
int xor_offset;
uint32_t commit_pos;
};
struct bitmap_writer {
struct ewah_bitmap *commits;
struct ewah_bitmap *trees;
struct ewah_bitmap *blobs;
struct ewah_bitmap *tags;
khash_sha1 *bitmaps;
khash_sha1 *reused;
struct packing_data *to_pack;
struct bitmapped_commit *selected;
unsigned int selected_nr, selected_alloc;
struct progress *progress;
int show_progress;
unsigned char pack_checksum[20];
};
static struct bitmap_writer writer;
void bitmap_writer_show_progress(int show)
{
writer.show_progress = show;
}
/**
* Build the initial type index for the packfile
*/
void bitmap_writer_build_type_index(struct pack_idx_entry **index,
uint32_t index_nr)
{
uint32_t i;
writer.commits = ewah_new();
writer.trees = ewah_new();
writer.blobs = ewah_new();
writer.tags = ewah_new();
for (i = 0; i < index_nr; ++i) {
struct object_entry *entry = (struct object_entry *)index[i];
enum object_type real_type;
entry->in_pack_pos = i;
switch (entry->type) {
case OBJ_COMMIT:
case OBJ_TREE:
case OBJ_BLOB:
case OBJ_TAG:
real_type = entry->type;
break;
default:
real_type = sha1_object_info(entry->idx.sha1, NULL);
break;
}
switch (real_type) {
case OBJ_COMMIT:
ewah_set(writer.commits, i);
break;
case OBJ_TREE:
ewah_set(writer.trees, i);
break;
case OBJ_BLOB:
ewah_set(writer.blobs, i);
break;
case OBJ_TAG:
ewah_set(writer.tags, i);
break;
default:
die("Missing type information for %s (%d/%d)",
sha1_to_hex(entry->idx.sha1), real_type, entry->type);
}
}
}
/**
* Compute the actual bitmaps
*/
static struct object **seen_objects;
static unsigned int seen_objects_nr, seen_objects_alloc;
static inline void push_bitmapped_commit(struct commit *commit, struct ewah_bitmap *reused)
{
if (writer.selected_nr >= writer.selected_alloc) {
writer.selected_alloc = (writer.selected_alloc + 32) * 2;
writer.selected = xrealloc(writer.selected,
writer.selected_alloc * sizeof(struct bitmapped_commit));
}
writer.selected[writer.selected_nr].commit = commit;
writer.selected[writer.selected_nr].bitmap = reused;
writer.selected[writer.selected_nr].flags = 0;
writer.selected_nr++;
}
static inline void mark_as_seen(struct object *object)
{
ALLOC_GROW(seen_objects, seen_objects_nr + 1, seen_objects_alloc);
seen_objects[seen_objects_nr++] = object;
}
static inline void reset_all_seen(void)
{
unsigned int i;
for (i = 0; i < seen_objects_nr; ++i) {
seen_objects[i]->flags &= ~(SEEN | ADDED | SHOWN);
}
seen_objects_nr = 0;
}
static uint32_t find_object_pos(const unsigned char *sha1)
{
struct object_entry *entry = packlist_find(writer.to_pack, sha1, NULL);
if (!entry) {
die("Failed to write bitmap index. Packfile doesn't have full closure "
"(object %s is missing)", sha1_to_hex(sha1));
}
return entry->in_pack_pos;
}
static void show_object(struct object *object, const struct name_path *path,
const char *last, void *data)
{
struct bitmap *base = data;
bitmap_set(base, find_object_pos(object->sha1));
mark_as_seen(object);
}
static void show_commit(struct commit *commit, void *data)
{
mark_as_seen((struct object *)commit);
}
static int
add_to_include_set(struct bitmap *base, struct commit *commit)
{
khiter_t hash_pos;
uint32_t bitmap_pos = find_object_pos(commit->object.sha1);
if (bitmap_get(base, bitmap_pos))
return 0;
hash_pos = kh_get_sha1(writer.bitmaps, commit->object.sha1);
if (hash_pos < kh_end(writer.bitmaps)) {
struct bitmapped_commit *bc = kh_value(writer.bitmaps, hash_pos);
bitmap_or_ewah(base, bc->bitmap);
return 0;
}
bitmap_set(base, bitmap_pos);
return 1;
}
static int
should_include(struct commit *commit, void *_data)
{
struct bitmap *base = _data;
if (!add_to_include_set(base, commit)) {
struct commit_list *parent = commit->parents;
mark_as_seen((struct object *)commit);
while (parent) {
parent->item->object.flags |= SEEN;
mark_as_seen((struct object *)parent->item);
parent = parent->next;
}
return 0;
}
return 1;
}
static void compute_xor_offsets(void)
{
static const int MAX_XOR_OFFSET_SEARCH = 10;
int i, next = 0;
while (next < writer.selected_nr) {
struct bitmapped_commit *stored = &writer.selected[next];
int best_offset = 0;
struct ewah_bitmap *best_bitmap = stored->bitmap;
struct ewah_bitmap *test_xor;
for (i = 1; i <= MAX_XOR_OFFSET_SEARCH; ++i) {
int curr = next - i;
if (curr < 0)
break;
test_xor = ewah_pool_new();
ewah_xor(writer.selected[curr].bitmap, stored->bitmap, test_xor);
if (test_xor->buffer_size < best_bitmap->buffer_size) {
if (best_bitmap != stored->bitmap)
ewah_pool_free(best_bitmap);
best_bitmap = test_xor;
best_offset = i;
} else {
ewah_pool_free(test_xor);
}
}
stored->xor_offset = best_offset;
stored->write_as = best_bitmap;
next++;
}
}
void bitmap_writer_build(struct packing_data *to_pack)
{
static const double REUSE_BITMAP_THRESHOLD = 0.2;
int i, reuse_after, need_reset;
struct bitmap *base = bitmap_new();
struct rev_info revs;
writer.bitmaps = kh_init_sha1();
writer.to_pack = to_pack;
if (writer.show_progress)
writer.progress = start_progress("Building bitmaps", writer.selected_nr);
init_revisions(&revs, NULL);
revs.tag_objects = 1;
revs.tree_objects = 1;
revs.blob_objects = 1;
revs.no_walk = 0;
revs.include_check = should_include;
reset_revision_walk();
reuse_after = writer.selected_nr * REUSE_BITMAP_THRESHOLD;
need_reset = 0;
for (i = writer.selected_nr - 1; i >= 0; --i) {
struct bitmapped_commit *stored;
struct object *object;
khiter_t hash_pos;
int hash_ret;
stored = &writer.selected[i];
object = (struct object *)stored->commit;
if (stored->bitmap == NULL) {
if (i < writer.selected_nr - 1 &&
(need_reset ||
!in_merge_bases(writer.selected[i + 1].commit,
stored->commit))) {
bitmap_reset(base);
reset_all_seen();
}
add_pending_object(&revs, object, "");
revs.include_check_data = base;
if (prepare_revision_walk(&revs))
die("revision walk setup failed");
traverse_commit_list(&revs, show_commit, show_object, base);
revs.pending.nr = 0;
revs.pending.alloc = 0;
revs.pending.objects = NULL;
stored->bitmap = bitmap_to_ewah(base);
need_reset = 0;
} else
need_reset = 1;
if (i >= reuse_after)
stored->flags |= BITMAP_FLAG_REUSE;
hash_pos = kh_put_sha1(writer.bitmaps, object->sha1, &hash_ret);
if (hash_ret == 0)
die("Duplicate entry when writing index: %s",
sha1_to_hex(object->sha1));
kh_value(writer.bitmaps, hash_pos) = stored;
display_progress(writer.progress, writer.selected_nr - i);
}
bitmap_free(base);
stop_progress(&writer.progress);
compute_xor_offsets();
}
/**
* Select the commits that will be bitmapped
*/
static inline unsigned int next_commit_index(unsigned int idx)
{
static const unsigned int MIN_COMMITS = 100;
static const unsigned int MAX_COMMITS = 5000;
static const unsigned int MUST_REGION = 100;
static const unsigned int MIN_REGION = 20000;
unsigned int offset, next;
if (idx <= MUST_REGION)
return 0;
if (idx <= MIN_REGION) {
offset = idx - MUST_REGION;
return (offset < MIN_COMMITS) ? offset : MIN_COMMITS;
}
offset = idx - MIN_REGION;
next = (offset < MAX_COMMITS) ? offset : MAX_COMMITS;
return (next > MIN_COMMITS) ? next : MIN_COMMITS;
}
static int date_compare(const void *_a, const void *_b)
{
struct commit *a = *(struct commit **)_a;
struct commit *b = *(struct commit **)_b;
return (long)b->date - (long)a->date;
}
void bitmap_writer_reuse_bitmaps(struct packing_data *to_pack)
{
if (prepare_bitmap_git() < 0)
return;
writer.reused = kh_init_sha1();
rebuild_existing_bitmaps(to_pack, writer.reused, writer.show_progress);
}
static struct ewah_bitmap *find_reused_bitmap(const unsigned char *sha1)
{
khiter_t hash_pos;
if (!writer.reused)
return NULL;
hash_pos = kh_get_sha1(writer.reused, sha1);
if (hash_pos >= kh_end(writer.reused))
return NULL;
return kh_value(writer.reused, hash_pos);
}
void bitmap_writer_select_commits(struct commit **indexed_commits,
unsigned int indexed_commits_nr,
int max_bitmaps)
{
unsigned int i = 0, j, next;
qsort(indexed_commits, indexed_commits_nr, sizeof(indexed_commits[0]),
date_compare);
if (writer.show_progress)
writer.progress = start_progress("Selecting bitmap commits", 0);
if (indexed_commits_nr < 100) {
for (i = 0; i < indexed_commits_nr; ++i)
push_bitmapped_commit(indexed_commits[i], NULL);
return;
}
for (;;) {
struct ewah_bitmap *reused_bitmap = NULL;
struct commit *chosen = NULL;
next = next_commit_index(i);
if (i + next >= indexed_commits_nr)
break;
if (max_bitmaps > 0 && writer.selected_nr >= max_bitmaps) {
writer.selected_nr = max_bitmaps;
break;
}
if (next == 0) {
chosen = indexed_commits[i];
reused_bitmap = find_reused_bitmap(chosen->object.sha1);
} else {
chosen = indexed_commits[i + next];
for (j = 0; j <= next; ++j) {
struct commit *cm = indexed_commits[i + j];
reused_bitmap = find_reused_bitmap(cm->object.sha1);
if (reused_bitmap || (cm->object.flags & NEEDS_BITMAP) != 0) {
chosen = cm;
break;
}
if (cm->parents && cm->parents->next)
chosen = cm;
}
}
push_bitmapped_commit(chosen, reused_bitmap);
i += next + 1;
display_progress(writer.progress, i);
}
stop_progress(&writer.progress);
}
static int sha1write_ewah_helper(void *f, const void *buf, size_t len)
{
/* sha1write will die on error */
sha1write(f, buf, len);
return len;
}
/**
* Write the bitmap index to disk
*/
static inline void dump_bitmap(struct sha1file *f, struct ewah_bitmap *bitmap)
{
if (ewah_serialize_to(bitmap, sha1write_ewah_helper, f) < 0)
die("Failed to write bitmap index");
}
static const unsigned char *sha1_access(size_t pos, void *table)
{
struct pack_idx_entry **index = table;
return index[pos]->sha1;
}
static void write_selected_commits_v1(struct sha1file *f,
struct pack_idx_entry **index,
uint32_t index_nr)
{
int i;
for (i = 0; i < writer.selected_nr; ++i) {
struct bitmapped_commit *stored = &writer.selected[i];
struct bitmap_disk_entry on_disk;
int commit_pos =
sha1_pos(stored->commit->object.sha1, index, index_nr, sha1_access);
if (commit_pos < 0)
die("BUG: trying to write commit not in index");
on_disk.object_pos = htonl(commit_pos);
on_disk.xor_offset = stored->xor_offset;
on_disk.flags = stored->flags;
sha1write(f, &on_disk, sizeof(on_disk));
dump_bitmap(f, stored->write_as);
}
}
static void write_hash_cache(struct sha1file *f,
struct pack_idx_entry **index,
uint32_t index_nr)
{
uint32_t i;
for (i = 0; i < index_nr; ++i) {
struct object_entry *entry = (struct object_entry *)index[i];
uint32_t hash_value = htonl(entry->hash);
sha1write(f, &hash_value, sizeof(hash_value));
}
}
void bitmap_writer_set_checksum(unsigned char *sha1)
{
hashcpy(writer.pack_checksum, sha1);
}
void bitmap_writer_finish(struct pack_idx_entry **index,
uint32_t index_nr,
const char *filename,
uint16_t options)
{
static char tmp_file[PATH_MAX];
static uint16_t default_version = 1;
static uint16_t flags = BITMAP_OPT_FULL_DAG;
struct sha1file *f;
struct bitmap_disk_header header;
int fd = odb_mkstemp(tmp_file, sizeof(tmp_file), "pack/tmp_bitmap_XXXXXX");
if (fd < 0)
die_errno("unable to create '%s'", tmp_file);
f = sha1fd(fd, tmp_file);
memcpy(header.magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE));
header.version = htons(default_version);
header.options = htons(flags | options);
header.entry_count = htonl(writer.selected_nr);
memcpy(header.checksum, writer.pack_checksum, 20);
sha1write(f, &header, sizeof(header));
dump_bitmap(f, writer.commits);
dump_bitmap(f, writer.trees);
dump_bitmap(f, writer.blobs);
dump_bitmap(f, writer.tags);
write_selected_commits_v1(f, index, index_nr);
if (options & BITMAP_OPT_HASH_CACHE)
write_hash_cache(f, index, index_nr);
sha1close(f, NULL, CSUM_FSYNC);
if (adjust_shared_perm(tmp_file))
die_errno("unable to make temporary bitmap file readable");
if (rename(tmp_file, filename))
die_errno("unable to rename temporary bitmap file to '%s'", filename);
}

1073
pack-bitmap.c Normal file

Разница между файлами не показана из-за своего большого размера Загрузить разницу

64
pack-bitmap.h Normal file
Просмотреть файл

@ -0,0 +1,64 @@
#ifndef PACK_BITMAP_H
#define PACK_BITMAP_H
#include "ewah/ewok.h"
#include "khash.h"
#include "pack-objects.h"
struct bitmap_disk_entry {
uint32_t object_pos;
uint8_t xor_offset;
uint8_t flags;
} __attribute__((packed));
struct bitmap_disk_header {
char magic[4];
uint16_t version;
uint16_t options;
uint32_t entry_count;
unsigned char checksum[20];
};
static const char BITMAP_IDX_SIGNATURE[] = {'B', 'I', 'T', 'M'};
#define NEEDS_BITMAP (1u<<22)
enum pack_bitmap_opts {
BITMAP_OPT_FULL_DAG = 1,
BITMAP_OPT_HASH_CACHE = 4,
};
enum pack_bitmap_flags {
BITMAP_FLAG_REUSE = 0x1
};
typedef int (*show_reachable_fn)(
const unsigned char *sha1,
enum object_type type,
int flags,
uint32_t hash,
struct packed_git *found_pack,
off_t found_offset);
int prepare_bitmap_git(void);
void count_bitmap_commit_list(uint32_t *commits, uint32_t *trees, uint32_t *blobs, uint32_t *tags);
void traverse_bitmap_commit_list(show_reachable_fn show_reachable);
void test_bitmap_walk(struct rev_info *revs);
char *pack_bitmap_filename(struct packed_git *p);
int prepare_bitmap_walk(struct rev_info *revs);
int reuse_partial_packfile_from_bitmap(struct packed_git **packfile, uint32_t *entries, off_t *up_to);
int rebuild_existing_bitmaps(struct packing_data *mapping, khash_sha1 *reused_bitmaps, int show_progress);
void bitmap_writer_show_progress(int show);
void bitmap_writer_set_checksum(unsigned char *sha1);
void bitmap_writer_build_type_index(struct pack_idx_entry **index, uint32_t index_nr);
void bitmap_writer_reuse_bitmaps(struct packing_data *to_pack);
void bitmap_writer_select_commits(struct commit **indexed_commits,
unsigned int indexed_commits_nr, int max_bitmaps);
void bitmap_writer_build(struct packing_data *to_pack);
void bitmap_writer_finish(struct pack_idx_entry **index,
uint32_t index_nr,
const char *filename,
uint16_t options);
#endif

111
pack-objects.c Normal file
Просмотреть файл

@ -0,0 +1,111 @@
#include "cache.h"
#include "object.h"
#include "pack.h"
#include "pack-objects.h"
static uint32_t locate_object_entry_hash(struct packing_data *pdata,
const unsigned char *sha1,
int *found)
{
uint32_t i, hash, mask = (pdata->index_size - 1);
memcpy(&hash, sha1, sizeof(uint32_t));
i = hash & mask;
while (pdata->index[i] > 0) {
uint32_t pos = pdata->index[i] - 1;
if (!hashcmp(sha1, pdata->objects[pos].idx.sha1)) {
*found = 1;
return i;
}
i = (i + 1) & mask;
}
*found = 0;
return i;
}
static inline uint32_t closest_pow2(uint32_t v)
{
v = v - 1;
v |= v >> 1;
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;
return v + 1;
}
static void rehash_objects(struct packing_data *pdata)
{
uint32_t i;
struct object_entry *entry;
pdata->index_size = closest_pow2(pdata->nr_objects * 3);
if (pdata->index_size < 1024)
pdata->index_size = 1024;
pdata->index = xrealloc(pdata->index, sizeof(uint32_t) * pdata->index_size);
memset(pdata->index, 0, sizeof(int) * pdata->index_size);
entry = pdata->objects;
for (i = 0; i < pdata->nr_objects; i++) {
int found;
uint32_t ix = locate_object_entry_hash(pdata, entry->idx.sha1, &found);
if (found)
die("BUG: Duplicate object in hash");
pdata->index[ix] = i + 1;
entry++;
}
}
struct object_entry *packlist_find(struct packing_data *pdata,
const unsigned char *sha1,
uint32_t *index_pos)
{
uint32_t i;
int found;
if (!pdata->index_size)
return NULL;
i = locate_object_entry_hash(pdata, sha1, &found);
if (index_pos)
*index_pos = i;
if (!found)
return NULL;
return &pdata->objects[pdata->index[i] - 1];
}
struct object_entry *packlist_alloc(struct packing_data *pdata,
const unsigned char *sha1,
uint32_t index_pos)
{
struct object_entry *new_entry;
if (pdata->nr_objects >= pdata->nr_alloc) {
pdata->nr_alloc = (pdata->nr_alloc + 1024) * 3 / 2;
pdata->objects = xrealloc(pdata->objects,
pdata->nr_alloc * sizeof(*new_entry));
}
new_entry = pdata->objects + pdata->nr_objects++;
memset(new_entry, 0, sizeof(*new_entry));
hashcpy(new_entry->idx.sha1, sha1);
if (pdata->index_size * 3 <= pdata->nr_objects * 4)
rehash_objects(pdata);
else
pdata->index[index_pos] = pdata->nr_objects;
return new_entry;
}

68
pack-objects.h Normal file
Просмотреть файл

@ -0,0 +1,68 @@
#ifndef PACK_OBJECTS_H
#define PACK_OBJECTS_H
struct object_entry {
struct pack_idx_entry idx;
unsigned long size; /* uncompressed size */
struct packed_git *in_pack; /* already in pack */
off_t in_pack_offset;
struct object_entry *delta; /* delta base object */
struct object_entry *delta_child; /* deltified objects who bases me */
struct object_entry *delta_sibling; /* other deltified objects who
* uses the same base as me
*/
void *delta_data; /* cached delta (uncompressed) */
unsigned long delta_size; /* delta data size (uncompressed) */
unsigned long z_delta_size; /* delta data size (compressed) */
enum object_type type;
enum object_type in_pack_type; /* could be delta */
uint32_t hash; /* name hint hash */
unsigned int in_pack_pos;
unsigned char in_pack_header_size;
unsigned preferred_base:1; /*
* we do not pack this, but is available
* to be used as the base object to delta
* objects against.
*/
unsigned no_try_delta:1;
unsigned tagged:1; /* near the very tip of refs */
unsigned filled:1; /* assigned write-order */
};
struct packing_data {
struct object_entry *objects;
uint32_t nr_objects, nr_alloc;
int32_t *index;
uint32_t index_size;
};
struct object_entry *packlist_alloc(struct packing_data *pdata,
const unsigned char *sha1,
uint32_t index_pos);
struct object_entry *packlist_find(struct packing_data *pdata,
const unsigned char *sha1,
uint32_t *index_pos);
static inline uint32_t pack_name_hash(const char *name)
{
uint32_t c, hash = 0;
if (!name)
return 0;
/*
* This effectively just creates a sortable number from the
* last sixteen non-whitespace characters. Last characters
* count "most", so things that end in ".c" sort together.
*/
while ((c = *name++) != 0) {
if (isspace(c))
continue;
hash = (hash >> 2) + (c << 24);
}
return hash;
}
#endif

Просмотреть файл

@ -16,11 +16,6 @@
* get the object sha1 from the main index.
*/
struct pack_revindex {
struct packed_git *p;
struct revindex_entry *revindex;
};
static struct pack_revindex *pack_revindex;
static int pack_revindex_hashsz;
@ -201,15 +196,14 @@ static void create_pack_revindex(struct pack_revindex *rix)
sort_revindex(rix->revindex, num_ent, p->pack_size);
}
struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs)
struct pack_revindex *revindex_for_pack(struct packed_git *p)
{
int num;
unsigned lo, hi;
struct pack_revindex *rix;
struct revindex_entry *revindex;
if (!pack_revindex_hashsz)
init_pack_revindex();
num = pack_revindex_ix(p);
if (num < 0)
die("internal error: pack revindex fubar");
@ -217,30 +211,37 @@ struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs)
rix = &pack_revindex[num];
if (!rix->revindex)
create_pack_revindex(rix);
revindex = rix->revindex;
lo = 0;
hi = p->num_objects + 1;
return rix;
}
int find_revindex_position(struct pack_revindex *pridx, off_t ofs)
{
int lo = 0;
int hi = pridx->p->num_objects + 1;
struct revindex_entry *revindex = pridx->revindex;
do {
unsigned mi = lo + (hi - lo) / 2;
if (revindex[mi].offset == ofs) {
return revindex + mi;
return mi;
} else if (ofs < revindex[mi].offset)
hi = mi;
else
lo = mi + 1;
} while (lo < hi);
error("bad offset for revindex");
return NULL;
return -1;
}
void discard_revindex(void)
struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs)
{
if (pack_revindex_hashsz) {
int i;
for (i = 0; i < pack_revindex_hashsz; i++)
free(pack_revindex[i].revindex);
free(pack_revindex);
pack_revindex_hashsz = 0;
}
struct pack_revindex *pridx = revindex_for_pack(p);
int pos = find_revindex_position(pridx, ofs);
if (pos < 0)
return NULL;
return pridx->revindex + pos;
}

Просмотреть файл

@ -6,7 +6,14 @@ struct revindex_entry {
unsigned int nr;
};
struct pack_revindex {
struct packed_git *p;
struct revindex_entry *revindex;
};
struct pack_revindex *revindex_for_pack(struct packed_git *p);
int find_revindex_position(struct pack_revindex *pridx, off_t ofs);
struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs);
void discard_revindex(void);
#endif

Просмотреть файл

@ -364,5 +364,7 @@ void finish_tmp_packfile(char *name_buffer,
if (rename(idx_tmp_name, name_buffer))
die_errno("unable to rename temporary index file");
*end_of_name_prefix = '\0';
free((void *)idx_tmp_name);
}

Просмотреть файл

@ -1314,26 +1314,6 @@ int read_index(struct index_state *istate)
return read_index_from(istate, get_index_file());
}
#ifndef NEEDS_ALIGNED_ACCESS
#define ntoh_s(var) ntohs(var)
#define ntoh_l(var) ntohl(var)
#else
static inline uint16_t ntoh_s_force_align(void *p)
{
uint16_t x;
memcpy(&x, p, sizeof(x));
return ntohs(x);
}
static inline uint32_t ntoh_l_force_align(void *p)
{
uint32_t x;
memcpy(&x, p, sizeof(x));
return ntohl(x);
}
#define ntoh_s(var) ntoh_s_force_align(&(var))
#define ntoh_l(var) ntoh_l_force_align(&(var))
#endif
static struct cache_entry *cache_entry_from_ondisk(struct ondisk_cache_entry *ondisk,
unsigned int flags,
const char *name,
@ -1341,16 +1321,16 @@ static struct cache_entry *cache_entry_from_ondisk(struct ondisk_cache_entry *on
{
struct cache_entry *ce = xmalloc(cache_entry_size(len));
ce->ce_stat_data.sd_ctime.sec = ntoh_l(ondisk->ctime.sec);
ce->ce_stat_data.sd_mtime.sec = ntoh_l(ondisk->mtime.sec);
ce->ce_stat_data.sd_ctime.nsec = ntoh_l(ondisk->ctime.nsec);
ce->ce_stat_data.sd_mtime.nsec = ntoh_l(ondisk->mtime.nsec);
ce->ce_stat_data.sd_dev = ntoh_l(ondisk->dev);
ce->ce_stat_data.sd_ino = ntoh_l(ondisk->ino);
ce->ce_mode = ntoh_l(ondisk->mode);
ce->ce_stat_data.sd_uid = ntoh_l(ondisk->uid);
ce->ce_stat_data.sd_gid = ntoh_l(ondisk->gid);
ce->ce_stat_data.sd_size = ntoh_l(ondisk->size);
ce->ce_stat_data.sd_ctime.sec = get_be32(&ondisk->ctime.sec);
ce->ce_stat_data.sd_mtime.sec = get_be32(&ondisk->mtime.sec);
ce->ce_stat_data.sd_ctime.nsec = get_be32(&ondisk->ctime.nsec);
ce->ce_stat_data.sd_mtime.nsec = get_be32(&ondisk->mtime.nsec);
ce->ce_stat_data.sd_dev = get_be32(&ondisk->dev);
ce->ce_stat_data.sd_ino = get_be32(&ondisk->ino);
ce->ce_mode = get_be32(&ondisk->mode);
ce->ce_stat_data.sd_uid = get_be32(&ondisk->uid);
ce->ce_stat_data.sd_gid = get_be32(&ondisk->gid);
ce->ce_stat_data.sd_size = get_be32(&ondisk->size);
ce->ce_flags = flags & ~CE_NAMEMASK;
ce->ce_namelen = len;
hashcpy(ce->sha1, ondisk->sha1);
@ -1390,14 +1370,14 @@ static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk,
unsigned int flags;
/* On-disk flags are just 16 bits */
flags = ntoh_s(ondisk->flags);
flags = get_be16(&ondisk->flags);
len = flags & CE_NAMEMASK;
if (flags & CE_EXTENDED) {
struct ondisk_cache_entry_extended *ondisk2;
int extended_flags;
ondisk2 = (struct ondisk_cache_entry_extended *)ondisk;
extended_flags = ntoh_s(ondisk2->flags2) << 16;
extended_flags = get_be16(&ondisk2->flags2) << 16;
/* We do not yet understand any bit out of CE_EXTENDED_FLAGS */
if (extended_flags & ~CE_EXTENDED_FLAGS)
die("Unknown index entry format %08x", extended_flags);

Просмотреть файл

@ -774,6 +774,10 @@ static int add_parents_to_list(struct rev_info *revs, struct commit *commit,
return 0;
commit->object.flags |= ADDED;
if (revs->include_check &&
!revs->include_check(commit, revs->include_check_data))
return 0;
/*
* If the commit is uninteresting, don't try to
* prune parents - we want the maximal uninteresting

Просмотреть файл

@ -172,6 +172,8 @@ struct rev_info {
unsigned long min_age;
int min_parents;
int max_parents;
int (*include_check)(struct commit *, void *);
void *include_check_data;
/* diff info for patches and for paths limiting */
struct diff_options diffopt;

Просмотреть файл

@ -252,8 +252,6 @@ char *sha1_pack_index_name(const unsigned char *sha1)
struct alternate_object_database *alt_odb_list;
static struct alternate_object_database **alt_odb_tail;
static int git_open_noatime(const char *name);
/*
* Prepare alternate object database registry.
*
@ -1232,6 +1230,7 @@ static void prepare_packed_git_one(char *objdir, int local)
if (has_extension(de->d_name, ".idx") ||
has_extension(de->d_name, ".pack") ||
has_extension(de->d_name, ".bitmap") ||
has_extension(de->d_name, ".keep"))
string_list_append(&garbage, path);
else
@ -1316,7 +1315,6 @@ void prepare_packed_git(void)
void reprepare_packed_git(void)
{
discard_revindex();
prepare_packed_git_run_once = 0;
prepare_packed_git();
}
@ -1393,7 +1391,7 @@ int check_sha1_signature(const unsigned char *sha1, void *map,
return hashcmp(sha1, real_sha1) ? -1 : 0;
}
static int git_open_noatime(const char *name)
int git_open_noatime(const char *name)
{
static int sha1_file_open_flag = O_NOATIME;

57
t/perf/p5310-pack-bitmaps.sh Executable file
Просмотреть файл

@ -0,0 +1,57 @@
#!/bin/sh
test_description='Tests pack performance using bitmaps'
. ./perf-lib.sh
test_perf_large_repo
# note that we do everything through config,
# since we want to be able to compare bitmap-aware
# git versus non-bitmap git
test_expect_success 'setup bitmap config' '
git config pack.writebitmaps true &&
git config pack.writebitmaphashcache true
'
test_perf 'repack to disk' '
git repack -ad
'
test_perf 'simulated clone' '
git pack-objects --stdout --all </dev/null >/dev/null
'
test_perf 'simulated fetch' '
have=$(git rev-list HEAD~100 -1) &&
{
echo HEAD &&
echo ^$have
} | git pack-objects --revs --stdout >/dev/null
'
test_expect_success 'create partial bitmap state' '
# pick a commit to represent the repo tip in the past
cutoff=$(git rev-list HEAD~100 -1) &&
orig_tip=$(git rev-parse HEAD) &&
# now kill off all of the refs and pretend we had
# just the one tip
rm -rf .git/logs .git/refs/* .git/packed-refs
git update-ref HEAD $cutoff
# and then repack, which will leave us with a nice
# big bitmap pack of the "old" history, and all of
# the new history will be loose, as if it had been pushed
# up incrementally and exploded via unpack-objects
git repack -Ad
# and now restore our original tip, as if the pushes
# had happened
git update-ref HEAD $orig_tip
'
test_perf 'partial bitmap' '
git pack-objects --stdout --all </dev/null >/dev/null
'
test_done

139
t/t5310-pack-bitmaps.sh Executable file
Просмотреть файл

@ -0,0 +1,139 @@
#!/bin/sh
test_description='exercise basic bitmap functionality'
. ./test-lib.sh
test_expect_success 'setup repo with moderate-sized history' '
for i in $(test_seq 1 10); do
test_commit $i
done &&
git checkout -b other HEAD~5 &&
for i in $(test_seq 1 10); do
test_commit side-$i
done &&
git checkout master &&
blob=$(echo tagged-blob | git hash-object -w --stdin) &&
git tag tagged-blob $blob &&
git config pack.writebitmaps true &&
git config pack.writebitmaphashcache true
'
test_expect_success 'full repack creates bitmaps' '
git repack -ad &&
ls .git/objects/pack/ | grep bitmap >output &&
test_line_count = 1 output
'
test_expect_success 'rev-list --test-bitmap verifies bitmaps' '
git rev-list --test-bitmap HEAD
'
rev_list_tests() {
state=$1
test_expect_success "counting commits via bitmap ($state)" '
git rev-list --count HEAD >expect &&
git rev-list --use-bitmap-index --count HEAD >actual &&
test_cmp expect actual
'
test_expect_success "counting partial commits via bitmap ($state)" '
git rev-list --count HEAD~5..HEAD >expect &&
git rev-list --use-bitmap-index --count HEAD~5..HEAD >actual &&
test_cmp expect actual
'
test_expect_success "counting non-linear history ($state)" '
git rev-list --count other...master >expect &&
git rev-list --use-bitmap-index --count other...master >actual &&
test_cmp expect actual
'
test_expect_success "enumerate --objects ($state)" '
git rev-list --objects --use-bitmap-index HEAD >tmp &&
cut -d" " -f1 <tmp >tmp2 &&
sort <tmp2 >actual &&
git rev-list --objects HEAD >tmp &&
cut -d" " -f1 <tmp >tmp2 &&
sort <tmp2 >expect &&
test_cmp expect actual
'
test_expect_success "bitmap --objects handles non-commit objects ($state)" '
git rev-list --objects --use-bitmap-index HEAD tagged-blob >actual &&
grep $blob actual
'
}
rev_list_tests 'full bitmap'
test_expect_success 'clone from bitmapped repository' '
git clone --no-local --bare . clone.git &&
git rev-parse HEAD >expect &&
git --git-dir=clone.git rev-parse HEAD >actual &&
test_cmp expect actual
'
test_expect_success 'setup further non-bitmapped commits' '
for i in $(test_seq 1 10); do
test_commit further-$i
done
'
rev_list_tests 'partial bitmap'
test_expect_success 'fetch (partial bitmap)' '
git --git-dir=clone.git fetch origin master:master &&
git rev-parse HEAD >expect &&
git --git-dir=clone.git rev-parse HEAD >actual &&
test_cmp expect actual
'
test_expect_success 'incremental repack cannot create bitmaps' '
test_commit more-1 &&
test_must_fail git repack -d
'
test_expect_success 'incremental repack can disable bitmaps' '
test_commit more-2 &&
git repack -d --no-write-bitmap-index
'
test_expect_success 'full repack, reusing previous bitmaps' '
git repack -ad &&
ls .git/objects/pack/ | grep bitmap >output &&
test_line_count = 1 output
'
test_expect_success 'fetch (full bitmap)' '
git --git-dir=clone.git fetch origin master:master &&
git rev-parse HEAD >expect &&
git --git-dir=clone.git rev-parse HEAD >actual &&
test_cmp expect actual
'
test_lazy_prereq JGIT '
type jgit
'
test_expect_success JGIT 'we can read jgit bitmaps' '
git clone . compat-jgit &&
(
cd compat-jgit &&
rm -f .git/objects/pack/*.bitmap &&
jgit gc &&
git rev-list --test-bitmap HEAD
)
'
test_expect_success JGIT 'jgit can read our bitmaps' '
git clone . compat-us &&
(
cd compat-us &&
git repack -adb &&
# jgit gc will barf if it does not like our bitmaps
jgit gc
)
'
test_done