Merge branch 'sb/hashmap-customize-comparison'

Update the hashmap API so that data to customize the behaviour of
the comparison function can be specified at the time a hashmap is
initialized.

* sb/hashmap-customize-comparison:
  hashmap: migrate documentation from Documentation/technical into header
  patch-ids.c: use hashmap correctly
  hashmap.h: compare function has access to a data field
Junio C Hamano 2017-07-13 16:14:54 -07:00
Parents eac97b438c 1ecbf31d02
Commit 91f6922544
20 changed files with 433 additions and 411 deletions
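The core of this merge is that `hashmap_cmp_fn` gains a leading data pointer. That pointer is supplied once to `hashmap_init()`, stored as `cmpfn_data`, and then forwarded on every comparison. The following is a minimal, self-contained sketch of that flow, not git's code. The `toy_*` types, `name_cmp`, and `ascii_casecmp` are hypothetical, and the hash codes are set by hand. In a real map, a case-insensitive comparator would also need a case-insensitive hash such as `strihash()`, because the comparator only runs on entries whose hash codes already match.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a user entry, reduced to what the sketch needs. */
struct toy_entry {
	unsigned int hash;
	const char *name;
};

/* Same argument order as the new hashmap_cmp_fn: cookie first. */
typedef int (*toy_cmp_fn)(const void *cmp_data, const void *entry,
			  const void *entry_or_key, const void *keydata);

struct toy_map {
	toy_cmp_fn cmpfn;
	const void *cmpfn_data; /* fixed once, at init time */
};

static void toy_map_init(struct toy_map *map, toy_cmp_fn fn,
			 const void *cmpfn_data)
{
	map->cmpfn = fn;
	map->cmpfn_data = cmpfn_data;
}

/* Modelled on entry_equals() in hashmap.c: the cookie is passed on every call. */
static int toy_entry_equals(const struct toy_map *map,
			    const struct toy_entry *e1,
			    const struct toy_entry *e2, const void *keydata)
{
	return e1 == e2 ||
	       (e1->hash == e2->hash &&
		!map->cmpfn(map->cmpfn_data, e1, e2, keydata));
}

/* Portable ASCII case-insensitive strcmp(). */
static int ascii_casecmp(const char *a, const char *b)
{
	while (*a && tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
		a++;
		b++;
	}
	return tolower((unsigned char)*a) - tolower((unsigned char)*b);
}

/* One comparator, two behaviours: the cookie selects case folding. */
static int name_cmp(const void *cmp_data, const void *va, const void *vb,
		    const void *unused_keydata)
{
	const int *ignore_case = cmp_data;
	const struct toy_entry *a = va, *b = vb;

	(void)unused_keydata;
	if (ignore_case && *ignore_case)
		return ascii_casecmp(a->name, b->name);
	return strcmp(a->name, b->name);
}
```

If one map is initialized with a pointer to a non-zero flag and another with a pointer to zero, both share `name_cmp`, yet only the first treats "README" and "readme" as equal.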

@@ -1,309 +0,0 @@
hashmap API
===========
The hashmap API is a generic implementation of hash-based key-value mappings.
Data Structures
---------------
`struct hashmap`::
The hash table structure. Members can be used as follows, but should
not be modified directly:
+
The `size` member keeps track of the total number of entries (0 means the
hashmap is empty).
+
`tablesize` is the allocated size of the hash table. A non-0 value indicates
that the hashmap is initialized. It may also be useful for statistical purposes
(i.e. `size / tablesize` is the current load factor).
+
`cmpfn` stores the comparison function specified in `hashmap_init()`. In
advanced scenarios, it may be useful to change this, e.g. to switch between
case-sensitive and case-insensitive lookup.
+
When `disallow_rehash` is set, automatic rehashes are prevented during inserts
and deletes.
`struct hashmap_entry`::
An opaque structure representing an entry in the hash table, which must
be used as first member of user data structures. Ideally it should be
followed by an int-sized member to prevent unused memory on 64-bit
systems due to alignment.
+
The `hash` member is the entry's hash code and the `next` member points to the
next entry in case of collisions (i.e. if multiple entries map to the same
bucket).
`struct hashmap_iter`::
An iterator structure, to be used with hashmap_iter_* functions.
Types
-----
`int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key, const void *keydata)`::
User-supplied function to test two hashmap entries for equality. Shall
return 0 if the entries are equal.
+
This function is always called with non-NULL `entry` / `entry_or_key`
parameters that have the same hash code. When looking up an entry, the `key`
and `keydata` parameters to hashmap_get and hashmap_remove are always passed
as second and third argument, respectively. Otherwise, `keydata` is NULL.
Functions
---------
`unsigned int strhash(const char *buf)`::
`unsigned int strihash(const char *buf)`::
`unsigned int memhash(const void *buf, size_t len)`::
`unsigned int memihash(const void *buf, size_t len)`::
`unsigned int memihash_cont(unsigned int hash_seed, const void *buf, size_t len)`::
Ready-to-use hash functions for strings, using the FNV-1 algorithm (see
http://www.isthe.com/chongo/tech/comp/fnv).
+
`strhash` and `strihash` take 0-terminated strings, while `memhash` and
`memihash` operate on arbitrary-length memory.
+
`strihash` and `memihash` are case insensitive versions.
+
`memihash_cont` is a variant of `memihash` that allows a computation to be
continued with another chunk of data.
`unsigned int sha1hash(const unsigned char *sha1)`::
Converts a cryptographic hash (e.g. SHA-1) into an int-sized hash code
for use in hash tables. Cryptographic hashes are supposed to have
uniform distribution, so in contrast to `memhash()`, this just copies
the first `sizeof(int)` bytes without shuffling any bits. Note that
the results will be different on big-endian and little-endian
platforms, so they should not be stored or transferred over the net.
`void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function, size_t initial_size)`::
Initializes a hashmap structure.
+
`map` is the hashmap to initialize.
+
The `equals_function` can be specified to compare two entries for equality.
If NULL, entries are considered equal if their hash codes are equal.
+
If the total number of entries is known in advance, the `initial_size`
parameter may be used to preallocate a sufficiently large table and thus
prevent expensive resizing. If 0, the table is dynamically resized.
`void hashmap_free(struct hashmap *map, int free_entries)`::
Frees a hashmap structure and allocated memory.
+
`map` is the hashmap to free.
+
If `free_entries` is true, each hashmap_entry in the map is freed as well
(using stdlib's free()).
`void hashmap_entry_init(void *entry, unsigned int hash)`::
Initializes a hashmap_entry structure.
+
`entry` points to the entry to initialize.
+
`hash` is the hash code of the entry.
+
The hashmap_entry structure does not hold references to external resources,
and it is safe to just discard it once you are done with it (i.e. if
your structure was allocated with xmalloc(), you can just free(3) it,
and if it is on stack, you can just let it go out of scope).
`void *hashmap_get(const struct hashmap *map, const void *key, const void *keydata)`::
Returns the hashmap entry for the specified key, or NULL if not found.
+
`map` is the hashmap structure.
+
`key` is a hashmap_entry structure (or user data structure that starts with
hashmap_entry) that has at least been initialized with the proper hash code
(via `hashmap_entry_init`).
+
If an entry with matching hash code is found, `key` and `keydata` are passed
to `hashmap_cmp_fn` to decide whether the entry matches the key.
`void *hashmap_get_from_hash(const struct hashmap *map, unsigned int hash, const void *keydata)`::
Returns the hashmap entry for the specified hash code and key data,
or NULL if not found.
+
`map` is the hashmap structure.
+
`hash` is the hash code of the entry to look up.
+
If an entry with matching hash code is found, `keydata` is passed to
`hashmap_cmp_fn` to decide whether the entry matches the key. The
`entry_or_key` parameter points to a bogus hashmap_entry structure that
should not be used in the comparison.
`void *hashmap_get_next(const struct hashmap *map, const void *entry)`::
Returns the next equal hashmap entry, or NULL if not found. This can be
used to iterate over duplicate entries (see `hashmap_add`).
+
`map` is the hashmap structure.
+
`entry` is the hashmap_entry to start the search from, obtained via a previous
call to `hashmap_get` or `hashmap_get_next`.
`void hashmap_add(struct hashmap *map, void *entry)`::
Adds a hashmap entry. This allows to add duplicate entries (i.e.
separate values with the same key according to hashmap_cmp_fn).
+
`map` is the hashmap structure.
+
`entry` is the entry to add.
`void *hashmap_put(struct hashmap *map, void *entry)`::
Adds or replaces a hashmap entry. If the hashmap contains duplicate
entries equal to the specified entry, only one of them will be replaced.
+
`map` is the hashmap structure.
+
`entry` is the entry to add or replace.
+
Returns the replaced entry, or NULL if not found (i.e. the entry was added).
`void *hashmap_remove(struct hashmap *map, const void *key, const void *keydata)`::
Removes a hashmap entry matching the specified key. If the hashmap
contains duplicate entries equal to the specified key, only one of
them will be removed.
+
`map` is the hashmap structure.
+
`key` is a hashmap_entry structure (or user data structure that starts with
hashmap_entry) that has at least been initialized with the proper hash code
(via `hashmap_entry_init`).
+
If an entry with matching hash code is found, `key` and `keydata` are
passed to `hashmap_cmp_fn` to decide whether the entry matches the key.
+
Returns the removed entry, or NULL if not found.
`void hashmap_disallow_rehash(struct hashmap *map, unsigned value)`::
Disallow/allow automatic rehashing of the hashmap during inserts
and deletes.
+
This is useful if the caller knows that the hashmap will be accessed
by multiple threads.
+
The caller is still responsible for any necessary locking; this simply
prevents unexpected rehashing. The caller is also responsible for properly
sizing the initial hashmap to ensure good performance.
+
A call to allow rehashing does not force a rehash; that might happen
with the next insert or delete.
`void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter)`::
`void *hashmap_iter_next(struct hashmap_iter *iter)`::
`void *hashmap_iter_first(struct hashmap *map, struct hashmap_iter *iter)`::
Used to iterate over all entries of a hashmap. Note that it is
not safe to add or remove entries to the hashmap while
iterating.
+
`hashmap_iter_init` initializes a `hashmap_iter` structure.
+
`hashmap_iter_next` returns the next hashmap_entry, or NULL if there are no
more entries.
+
`hashmap_iter_first` is a combination of both (i.e. initializes the iterator
and returns the first entry, if any).
`const char *strintern(const char *string)`::
`const void *memintern(const void *data, size_t len)`::
Returns the unique, interned version of the specified string or data,
similar to the `String.intern` API in Java and .NET, respectively.
Interned strings remain valid for the entire lifetime of the process.
+
Can be used as `[x]strdup()` or `xmemdupz` replacement, except that interned
strings / data must not be modified or freed.
+
Interned strings are best used for short strings with high probability of
duplicates.
+
Uses a hashmap to store the pool of interned strings.
Usage example
-------------
Here's a simple usage example that maps long keys to double values.
------------
struct hashmap map;
struct long2double {
struct hashmap_entry ent; /* must be the first member! */
long key;
double value;
};
static int long2double_cmp(const struct long2double *e1, const struct long2double *e2, const void *unused)
{
return !(e1->key == e2->key);
}
void long2double_init(void)
{
hashmap_init(&map, (hashmap_cmp_fn) long2double_cmp, 0);
}
void long2double_free(void)
{
hashmap_free(&map, 1);
}
static struct long2double *find_entry(long key)
{
struct long2double k;
hashmap_entry_init(&k, memhash(&key, sizeof(long)));
k.key = key;
return hashmap_get(&map, &k, NULL);
}
double get_value(long key)
{
struct long2double *e = find_entry(key);
return e ? e->value : 0;
}
void set_value(long key, double value)
{
struct long2double *e = find_entry(key);
if (!e) {
e = malloc(sizeof(struct long2double));
hashmap_entry_init(e, memhash(&key, sizeof(long)));
e->key = key;
hashmap_add(&map, e);
}
e->value = value;
}
------------
Using variable-sized keys
-------------------------
The `hashmap_entry_get` and `hashmap_entry_remove` functions expect an ordinary
`hashmap_entry` structure as key to find the correct entry. If the key data is
variable-sized (e.g. a FLEX_ARRAY string) or quite large, it is undesirable
to create a full-fledged entry structure on the heap and copy all the key data
into the structure.
In this case, the `keydata` parameter can be used to pass
variable-sized key data directly to the comparison function, and the `key`
parameter can be a stripped-down, fixed size entry structure allocated on the
stack.
See test-hashmap.c for an example using arbitrary-length strings as keys.
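The FNV-1 hash functions documented above are carried over into `hashmap.h`. As a rough illustration of what FNV-1 computes, here is a sketch of 32-bit FNV-1 over a NUL-terminated string, plus an ASCII case-folding variant. The offset basis and prime are the published 32-bit FNV parameters. The function names are hypothetical, and git's `strhash()`/`strihash()` may differ in details such as how case is folded.

```c
#include <stdint.h>

/* 32-bit FNV-1 offset basis and prime, as published on the FNV page. */
#define FNV32_BASE  0x811c9dc5u
#define FNV32_PRIME 0x01000193u

/* FNV-1: multiply by the prime first, then XOR in the next byte. */
static uint32_t fnv1_strhash(const char *str)
{
	uint32_t hash = FNV32_BASE;
	unsigned char c;

	while ((c = (unsigned char)*str++))
		hash = (hash * FNV32_PRIME) ^ c;
	return hash;
}

/* Case-insensitive variant: fold ASCII lower case to upper before mixing. */
static uint32_t fnv1_strihash(const char *str)
{
	uint32_t hash = FNV32_BASE;
	unsigned char c;

	while ((c = (unsigned char)*str++)) {
		if (c >= 'a' && c <= 'z')
			c -= 'a' - 'A';
		hash = (hash * FNV32_PRIME) ^ c;
	}
	return hash;
}
```

For example, `fnv1_strhash("a")` yields 0x050c5d7e, the published FNV-1 value for that input, and `fnv1_strihash("abc")` equals `fnv1_strhash("ABC")`.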

attr.c

@@ -76,9 +76,10 @@ struct attr_hash_entry {
};
/* attr_hashmap comparison function */
static int attr_hash_entry_cmp(const struct attr_hash_entry *a,
static int attr_hash_entry_cmp(void *unused_cmp_data,
const struct attr_hash_entry *a,
const struct attr_hash_entry *b,
void *unused)
void *unused_keydata)
{
return (a->keylen != b->keylen) || strncmp(a->key, b->key, a->keylen);
}
@@ -86,7 +87,7 @@ static int attr_hash_entry_cmp(const struct attr_hash_entry *a,
/* Initialize an 'attr_hashmap' object */
static void attr_hashmap_init(struct attr_hashmap *map)
{
hashmap_init(&map->map, (hashmap_cmp_fn) attr_hash_entry_cmp, 0);
hashmap_init(&map->map, (hashmap_cmp_fn) attr_hash_entry_cmp, NULL, 0);
}
/*


@@ -54,8 +54,10 @@ static const char *prio_names[] = {
N_("head"), N_("lightweight"), N_("annotated"),
};
static int commit_name_cmp(const struct commit_name *cn1,
const struct commit_name *cn2, const void *peeled)
static int commit_name_cmp(const void *unused_cmp_data,
const struct commit_name *cn1,
const struct commit_name *cn2,
const void *peeled)
{
return oidcmp(&cn1->peeled, peeled ? peeled : &cn2->peeled);
}
@@ -501,7 +503,7 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
return cmd_name_rev(args.argc, args.argv, prefix);
}
hashmap_init(&names, (hashmap_cmp_fn) commit_name_cmp, 0);
hashmap_init(&names, (hashmap_cmp_fn) commit_name_cmp, NULL, 0);
for_each_rawref(get_name, NULL);
if (!names.size && !always)
die(_("No names found, cannot describe anything."));
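`describe.c` keys its `names` map by peeled object ID and compares entries with `oidcmp()`. Object-ID keys are the case the documentation describes for `sha1hash()`, which reuses the leading bytes of an already uniformly distributed digest instead of rehashing it. A sketch of that documented behaviour follows. `digest_to_hash` is a hypothetical name, and this is not a copy of git's inline function.

```c
#include <string.h>

/*
 * Hypothetical helper following the documented sha1hash() behaviour:
 * reuse the first sizeof(unsigned int) bytes of a digest as the hash code.
 * memcpy() sidesteps unaligned reads; the result depends on byte order.
 */
static unsigned int digest_to_hash(const unsigned char *digest)
{
	unsigned int hash;

	memcpy(&hash, digest, sizeof(hash));
	return hash;
}
```

With a 4-byte `unsigned int`, the leading bytes `01 02 03 04` give 0x04030201 on a little-endian machine and 0x01020304 on a big-endian one. This is why such codes should not be stored or sent over the network.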


@@ -130,8 +130,10 @@ struct working_tree_entry {
char path[FLEX_ARRAY];
};
static int working_tree_entry_cmp(struct working_tree_entry *a,
struct working_tree_entry *b, void *keydata)
static int working_tree_entry_cmp(const void *unused_cmp_data,
struct working_tree_entry *a,
struct working_tree_entry *b,
void *unused_keydata)
{
return strcmp(a->path, b->path);
}
@@ -146,7 +148,9 @@ struct pair_entry {
const char path[FLEX_ARRAY];
};
static int pair_cmp(struct pair_entry *a, struct pair_entry *b, void *keydata)
static int pair_cmp(const void *unused_cmp_data,
struct pair_entry *a, struct pair_entry *b,
void *unused_keydata)
{
return strcmp(a->path, b->path);
}
@@ -174,7 +178,9 @@ struct path_entry {
char path[FLEX_ARRAY];
};
static int path_entry_cmp(struct path_entry *a, struct path_entry *b, void *key)
static int path_entry_cmp(const void *unused_cmp_data,
struct path_entry *a, struct path_entry *b,
void *key)
{
return strcmp(a->path, key ? key : b->path);
}
@@ -367,9 +373,9 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
wtdir_len = wtdir.len;
hashmap_init(&working_tree_dups,
(hashmap_cmp_fn)working_tree_entry_cmp, 0);
hashmap_init(&submodules, (hashmap_cmp_fn)pair_cmp, 0);
hashmap_init(&symlinks2, (hashmap_cmp_fn)pair_cmp, 0);
(hashmap_cmp_fn)working_tree_entry_cmp, NULL, 0);
hashmap_init(&submodules, (hashmap_cmp_fn)pair_cmp, NULL, 0);
hashmap_init(&symlinks2, (hashmap_cmp_fn)pair_cmp, NULL, 0);
child.no_stdin = 1;
child.git_cmd = 1;
@@ -580,9 +586,9 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
* files through the symlink.
*/
hashmap_init(&wt_modified, (hashmap_cmp_fn)path_entry_cmp,
wtindex.cache_nr);
NULL, wtindex.cache_nr);
hashmap_init(&tmp_modified, (hashmap_cmp_fn)path_entry_cmp,
wtindex.cache_nr);
NULL, wtindex.cache_nr);
for (i = 0; i < wtindex.cache_nr; i++) {
struct hashmap_entry dummy;
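`path_entry_cmp()` in difftool compares against `key` when it is non-NULL and otherwise falls back to the second entry. That is the pattern the header describes for variable-sized keys: look up with a small stack key that carries only the hash, and pass the string itself as `keydata`. A self-contained sketch follows. Hypothetical `toy_hash_entry`/`path_item` types stand in for git's, and the comparator is called directly rather than through a map.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct hashmap_entry. */
struct toy_hash_entry {
	struct toy_hash_entry *next;
	unsigned int hash;
};

/* Variable-sized entry in the style of difftool's struct path_entry. */
struct path_item {
	struct toy_hash_entry ent; /* must be the first member */
	char path[];               /* C99 flexible array member */
};

/* Like path_entry_cmp(): use keydata when given, else the second entry. */
static int path_item_cmp(const void *unused_cmp_data, const void *entry,
			 const void *entry_or_key, const void *keydata)
{
	const struct path_item *a = entry, *b = entry_or_key;

	(void)unused_cmp_data;
	return strcmp(a->path, keydata ? (const char *)keydata : b->path);
}

/* Heap-allocates a full entry, copying the path including its NUL. */
static struct path_item *path_item_new(unsigned int hash, const char *path)
{
	size_t len = strlen(path);
	struct path_item *item = malloc(sizeof(*item) + len + 1);

	if (!item)
		return NULL;
	item->ent.next = NULL;
	item->ent.hash = hash;
	memcpy(item->path, path, len + 1);
	return item;
}
```

The comparator only dereferences `b->path` when `keydata` is NULL. A bare header on the stack is therefore a safe lookup key, as long as the caller always supplies `keydata` with it.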


@@ -93,8 +93,9 @@ struct anonymized_entry {
size_t anon_len;
};
static int anonymized_entry_cmp(const void *va, const void *vb,
const void *data)
static int anonymized_entry_cmp(const void *unused_cmp_data,
const void *va, const void *vb,
const void *unused_keydata)
{
const struct anonymized_entry *a = va, *b = vb;
return a->orig_len != b->orig_len ||
@@ -113,7 +114,7 @@ static const void *anonymize_mem(struct hashmap *map,
struct anonymized_entry key, *ret;
if (!map->cmpfn)
hashmap_init(map, anonymized_entry_cmp, 0);
hashmap_init(map, anonymized_entry_cmp, NULL, 0);
hashmap_entry_init(&key, memhash(orig, *len));
key.orig = orig;


@@ -1714,15 +1714,18 @@ static int configset_add_value(struct config_set *cs, const char *key, const cha
return 0;
}
static int config_set_element_cmp(const struct config_set_element *e1,
const struct config_set_element *e2, const void *unused)
static int config_set_element_cmp(const void *unused_cmp_data,
const struct config_set_element *e1,
const struct config_set_element *e2,
const void *unused_keydata)
{
return strcmp(e1->key, e2->key);
}
void git_configset_init(struct config_set *cs)
{
hashmap_init(&cs->config_hash, (hashmap_cmp_fn)config_set_element_cmp, 0);
hashmap_init(&cs->config_hash, (hashmap_cmp_fn)config_set_element_cmp,
NULL, 0);
cs->hash_initialized = 1;
cs->list.nr = 0;
cs->list.alloc = 0;


@@ -583,7 +583,8 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
if (!subprocess_map_initialized) {
subprocess_map_initialized = 1;
hashmap_init(&subprocess_map, (hashmap_cmp_fn) cmd2process_cmp, 0);
hashmap_init(&subprocess_map, (hashmap_cmp_fn) cmd2process_cmp,
NULL, 0);
entry = NULL;
} else {
entry = (struct cmd2process *)subprocess_find_entry(&subprocess_map, cmd);


@@ -341,7 +341,7 @@ static int find_exact_renames(struct diff_options *options)
/* Add all sources to the hash table in reverse order, because
* later on they will be retrieved in LIFO order.
*/
hashmap_init(&file_table, NULL, rename_src_nr);
hashmap_init(&file_table, NULL, NULL, rename_src_nr);
for (i = rename_src_nr-1; i >= 0; i--)
insert_file_table(&file_table, i, rename_src[i].p->one);
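`find_exact_renames()` initializes `file_table` with a NULL comparator, so any two entries with the same hash code count as equal. Its comment also relies on entries coming back in LIFO order. The toy chained table below sketches why that order holds, under the assumption that adding an entry prepends it to the bucket's chain and that walking duplicates follows `next`. The `toy_*` names are hypothetical, and git's real table also resizes and computes buckets its own way.

```c
#include <stddef.h>

#define TOY_BUCKETS 16

/* Hypothetical minimal entry header, embedded first in user structs. */
struct toy_entry {
	struct toy_entry *next;
	unsigned int hash;
};

struct toy_table {
	struct toy_entry *bucket[TOY_BUCKETS];
};

/* Assumed hashmap_add()-like behaviour: push onto the head of the chain. */
static void toy_add(struct toy_table *t, struct toy_entry *e)
{
	struct toy_entry **head = &t->bucket[e->hash % TOY_BUCKETS];

	e->next = *head;
	*head = e;
}

/* With no comparator, "equal" simply means "same hash code". */
static struct toy_entry *toy_find(struct toy_entry *e, unsigned int hash)
{
	while (e && e->hash != hash)
		e = e->next;
	return e;
}

static struct toy_entry *toy_get(struct toy_table *t, unsigned int hash)
{
	return toy_find(t->bucket[hash % TOY_BUCKETS], hash);
}

/* Continue past a previous match, like hashmap_get_next(). */
static struct toy_entry *toy_get_next(struct toy_entry *prev)
{
	return toy_find(prev->next, prev->hash);
}
```

Adding `a` and then `b` with the same hash makes `toy_get()` return `b` first and `toy_get_next()` return `a` after it. Inserting the sources in reverse index order, as the comment describes, therefore hands them back in ascending order.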


@@ -95,7 +95,9 @@ static inline int entry_equals(const struct hashmap *map,
const struct hashmap_entry *e1, const struct hashmap_entry *e2,
const void *keydata)
{
return (e1 == e2) || (e1->hash == e2->hash && !map->cmpfn(e1, e2, keydata));
return (e1 == e2) ||
(e1->hash == e2->hash &&
!map->cmpfn(map->cmpfn_data, e1, e2, keydata));
}
static inline unsigned int bucket(const struct hashmap *map,
@@ -140,19 +142,23 @@ static inline struct hashmap_entry **find_entry_ptr(const struct hashmap *map,
return e;
}
static int always_equal(const void *unused1, const void *unused2, const void *unused3)
static int always_equal(const void *unused_cmp_data,
const void *unused1,
const void *unused2,
const void *unused_keydata)
{
return 0;
}
void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function,
size_t initial_size)
const void *cmpfn_data, size_t initial_size)
{
unsigned int size = HASHMAP_INITIAL_SIZE;
memset(map, 0, sizeof(*map));
map->cmpfn = equals_function ? equals_function : always_equal;
map->cmpfn_data = cmpfn_data;
/* calculate initial table size and allocate the table */
initial_size = (unsigned int) ((uint64_t) initial_size * 100
@@ -260,7 +266,8 @@ struct pool_entry {
unsigned char data[FLEX_ARRAY];
};
static int pool_entry_cmp(const struct pool_entry *e1,
static int pool_entry_cmp(const void *unused_cmp_data,
const struct pool_entry *e1,
const struct pool_entry *e2,
const unsigned char *keydata)
{
@@ -275,7 +282,7 @@ const void *memintern(const void *data, size_t len)
/* initialize string pool hashmap */
if (!map.tablesize)
hashmap_init(&map, (hashmap_cmp_fn) pool_entry_cmp, 0);
hashmap_init(&map, (hashmap_cmp_fn) pool_entry_cmp, NULL, 0);
/* lookup interned string in pool */
hashmap_entry_init(&key, memhash(data, len));
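`memintern()` keeps one canonical copy of each distinct byte sequence in a hashmap-backed pool. Interned strings can therefore be compared by pointer, and they must never be freed or modified. To keep the sketch short, a fixed-size array with a linear scan stands in for the hashmap. `toy_strintern`, `pool`, and `POOL_MAX` are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-capacity pool; a linear scan replaces the hashmap. */
#define POOL_MAX 64
static char *pool[POOL_MAX];
static size_t pool_nr;

/* Returns one canonical copy per distinct string; copies are never freed. */
static const char *toy_strintern(const char *s)
{
	size_t i, len;

	for (i = 0; i < pool_nr; i++)
		if (!strcmp(pool[i], s))
			return pool[i];
	if (pool_nr == POOL_MAX)
		return NULL; /* sketch only: a real pool would grow */
	len = strlen(s);
	pool[pool_nr] = malloc(len + 1);
	if (!pool[pool_nr])
		return NULL;
	memcpy(pool[pool_nr], s, len + 1);
	return pool[pool_nr++];
}
```

Two separate buffers holding "refs/heads/master" intern to the same pointer, which differs from either buffer. This is what makes `strintern()` usable as a `strdup()` replacement for frequently repeated short strings.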

hashmap.h

@@ -3,17 +3,123 @@
/*
* Generic implementation of hash-based key-value mappings.
* See Documentation/technical/api-hashmap.txt.
*
* An example that maps long to a string:
* For the sake of the example this allows to lookup exact values, too
* (i.e. it is operated as a set, the value is part of the key)
* -------------------------------------
*
* struct hashmap map;
* struct long2string {
* struct hashmap_entry ent; // must be the first member!
* long key;
* char value[FLEX_ARRAY]; // be careful with allocating on stack!
* };
*
* #define COMPARE_VALUE 1
*
* static int long2string_cmp(const struct long2string *e1,
* const struct long2string *e2,
* const void *keydata, const void *userdata)
* {
* char *string = keydata;
* unsigned *flags = (unsigned*)userdata;
*
* if (flags & COMPARE_VALUE)
* return !(e1->key == e2->key) || (keydata ?
* strcmp(e1->value, keydata) : strcmp(e1->value, e2->value));
* else
* return !(e1->key == e2->key);
* }
*
* int main(int argc, char **argv)
* {
* long key;
* char *value, *action;
*
* unsigned flags = ALLOW_DUPLICATE_KEYS;
*
* hashmap_init(&map, (hashmap_cmp_fn) long2string_cmp, &flags, 0);
*
* while (scanf("%s %l %s", action, key, value)) {
*
* if (!strcmp("add", action)) {
* struct long2string *e;
* e = malloc(sizeof(struct long2string) + strlen(value));
* hashmap_entry_init(e, memhash(&key, sizeof(long)));
* e->key = key;
* memcpy(e->value, value, strlen(value));
* hashmap_add(&map, e);
* }
*
* if (!strcmp("print_all_by_key", action)) {
* flags &= ~COMPARE_VALUE;
*
* struct long2string k;
* hashmap_entry_init(&k, memhash(&key, sizeof(long)));
* k.key = key;
*
* struct long2string *e = hashmap_get(&map, &k, NULL);
* if (e) {
* printf("first: %l %s\n", e->key, e->value);
* while (e = hashmap_get_next(&map, e))
* printf("found more: %l %s\n", e->key, e->value);
* }
* }
*
* if (!strcmp("has_exact_match", action)) {
* flags |= COMPARE_VALUE;
*
* struct long2string *e;
* e = malloc(sizeof(struct long2string) + strlen(value));
* hashmap_entry_init(e, memhash(&key, sizeof(long)));
* e->key = key;
* memcpy(e->value, value, strlen(value));
*
* printf("%s found\n", hashmap_get(&map, e, NULL) ? "" : "not");
* }
*
* if (!strcmp("has_exact_match_no_heap_alloc", action)) {
* flags |= COMPARE_VALUE;
*
* struct long2string e;
* hashmap_entry_init(e, memhash(&key, sizeof(long)));
* e.key = key;
*
* printf("%s found\n", hashmap_get(&map, e, value) ? "" : "not");
* }
*
* if (!strcmp("end", action)) {
* hashmap_free(&map, 1);
* break;
* }
* }
* }
*/
/* FNV-1 functions */
/*
* Ready-to-use hash functions for strings, using the FNV-1 algorithm (see
* http://www.isthe.com/chongo/tech/comp/fnv).
* `strhash` and `strihash` take 0-terminated strings, while `memhash` and
* `memihash` operate on arbitrary-length memory.
* `strihash` and `memihash` are case insensitive versions.
* `memihash_cont` is a variant of `memihash` that allows a computation to be
* continued with another chunk of data.
*/
extern unsigned int strhash(const char *buf);
extern unsigned int strihash(const char *buf);
extern unsigned int memhash(const void *buf, size_t len);
extern unsigned int memihash(const void *buf, size_t len);
extern unsigned int memihash_cont(unsigned int hash_seed, const void *buf, size_t len);
/*
* Converts a cryptographic hash (e.g. SHA-1) into an int-sized hash code
* for use in hash tables. Cryptographic hashes are supposed to have
* uniform distribution, so in contrast to `memhash()`, this just copies
* the first `sizeof(int)` bytes without shuffling any bits. Note that
* the results will be different on big-endian and little-endian
* platforms, so they should not be stored or transferred over the net.
*/
static inline unsigned int sha1hash(const unsigned char *sha1)
{
/*
@@ -25,86 +131,255 @@ static inline unsigned int sha1hash(const unsigned char *sha1)
return hash;
}
/* data structures */
/*
* struct hashmap_entry is an opaque structure representing an entry in the
* hash table, which must be used as first member of user data structures.
* Ideally it should be followed by an int-sized member to prevent unused
* memory on 64-bit systems due to alignment.
*/
struct hashmap_entry {
/*
* next points to the next entry in case of collisions (i.e. if
* multiple entries map to the same bucket)
*/
struct hashmap_entry *next;
/* entry's hash code */
unsigned int hash;
};
typedef int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key,
const void *keydata);
/*
* User-supplied function to test two hashmap entries for equality. Shall
* return 0 if the entries are equal.
*
* This function is always called with non-NULL `entry` and `entry_or_key`
* parameters that have the same hash code.
*
* When looking up an entry, the `key` and `keydata` parameters to hashmap_get
* and hashmap_remove are always passed as second `entry_or_key` and third
* argument `keydata`, respectively. Otherwise, `keydata` is NULL.
*
* When it is too expensive to allocate a user entry (either because it is
* large or varialbe sized, such that it is not on the stack), then the
* relevant data to check for equality should be passed via `keydata`.
* In this case `key` can be a stripped down version of the user key data
* or even just a hashmap_entry having the correct hash.
*
* The `hashmap_cmp_fn_data` entry is the pointer given in the init function.
*/
typedef int (*hashmap_cmp_fn)(const void *hashmap_cmp_fn_data,
const void *entry, const void *entry_or_key,
const void *keydata);
/*
* struct hashmap is the hash table structure. Members can be used as follows,
* but should not be modified directly.
*/
struct hashmap {
struct hashmap_entry **table;
hashmap_cmp_fn cmpfn;
unsigned int size, tablesize, grow_at, shrink_at;
unsigned disallow_rehash : 1;
};
struct hashmap_iter {
struct hashmap *map;
struct hashmap_entry *next;
unsigned int tablepos;
/* Stores the comparison function specified in `hashmap_init()`. */
hashmap_cmp_fn cmpfn;
const void *cmpfn_data;
/* total number of entries (0 means the hashmap is empty) */
unsigned int size;
/*
* tablesize is the allocated size of the hash table. A non-0 value
* indicates that the hashmap is initialized. It may also be useful
* for statistical purposes (i.e. `size / tablesize` is the current
* load factor).
*/
unsigned int tablesize;
unsigned int grow_at;
unsigned int shrink_at;
/* See `hashmap_disallow_rehash`. */
unsigned disallow_rehash : 1;
};
/* hashmap functions */
extern void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function,
size_t initial_size);
/*
* Initializes a hashmap structure.
*
* `map` is the hashmap to initialize.
*
* The `equals_function` can be specified to compare two entries for equality.
* If NULL, entries are considered equal if their hash codes are equal.
*
* The `equals_function_data` parameter can be used to provide additional data
* (a callback cookie) that will be passed to `equals_function` each time it
* is called. This allows a single `equals_function` to implement multiple
* comparison functions.
*
* If the total number of entries is known in advance, the `initial_size`
* parameter may be used to preallocate a sufficiently large table and thus
* prevent expensive resizing. If 0, the table is dynamically resized.
*/
extern void hashmap_init(struct hashmap *map,
hashmap_cmp_fn equals_function,
const void *equals_function_data,
size_t initial_size);
/*
* Frees a hashmap structure and allocated memory.
*
* If `free_entries` is true, each hashmap_entry in the map is freed as well
* using stdlibs free().
*/
extern void hashmap_free(struct hashmap *map, int free_entries);
/* hashmap_entry functions */
/*
* Initializes a hashmap_entry structure.
*
* `entry` points to the entry to initialize.
* `hash` is the hash code of the entry.
*
* The hashmap_entry structure does not hold references to external resources,
* and it is safe to just discard it once you are done with it (i.e. if
* your structure was allocated with xmalloc(), you can just free(3) it,
* and if it is on stack, you can just let it go out of scope).
*/
static inline void hashmap_entry_init(void *entry, unsigned int hash)
{
struct hashmap_entry *e = entry;
e->hash = hash;
e->next = NULL;
}
extern void *hashmap_get(const struct hashmap *map, const void *key,
const void *keydata);
extern void *hashmap_get_next(const struct hashmap *map, const void *entry);
extern void hashmap_add(struct hashmap *map, void *entry);
extern void *hashmap_put(struct hashmap *map, void *entry);
extern void *hashmap_remove(struct hashmap *map, const void *key,
const void *keydata);
/*
* Returns the hashmap entry for the specified key, or NULL if not found.
*
* `map` is the hashmap structure.
*
* `key` is a user data structure that starts with hashmap_entry that has at
* least been initialized with the proper hash code (via `hashmap_entry_init`).
*
* `keydata` is a data structure that holds just enough information to check
* for equality to a given entry.
*
* If the key data is variable-sized (e.g. a FLEX_ARRAY string) or quite large,
* it is undesirable to create a full-fledged entry structure on the heap and
* copy all the key data into the structure.
*
* In this case, the `keydata` parameter can be used to pass
* variable-sized key data directly to the comparison function, and the `key`
* parameter can be a stripped-down, fixed size entry structure allocated on the
* stack.
*
* If an entry with matching hash code is found, `key` and `keydata` are passed
* to `hashmap_cmp_fn` to decide whether the entry matches the key.
*/
extern void *hashmap_get(const struct hashmap *map, const void *key,
const void *keydata);
/*
* Returns the hashmap entry for the specified hash code and key data,
* or NULL if not found.
*
* `map` is the hashmap structure.
* `hash` is the hash code of the entry to look up.
*
* If an entry with matching hash code is found, `keydata` is passed to
* `hashmap_cmp_fn` to decide whether the entry matches the key. The
* `entry_or_key` parameter of `hashmap_cmp_fn` points to a hashmap_entry
* structure that should not be used in the comparison.
*/
static inline void *hashmap_get_from_hash(const struct hashmap *map,
unsigned int hash, const void *keydata)
unsigned int hash,
const void *keydata)
{
struct hashmap_entry key;
hashmap_entry_init(&key, hash);
return hashmap_get(map, &key, keydata);
}
/*
* Returns the next equal hashmap entry, or NULL if not found. This can be
* used to iterate over duplicate entries (see `hashmap_add`).
*
* `map` is the hashmap structure.
* `entry` is the hashmap_entry to start the search from, obtained via a previous
* call to `hashmap_get` or `hashmap_get_next`.
*/
extern void *hashmap_get_next(const struct hashmap *map, const void *entry);
/*
* Adds a hashmap entry. This allows to add duplicate entries (i.e.
* separate values with the same key according to hashmap_cmp_fn).
*
* `map` is the hashmap structure.
* `entry` is the entry to add.
*/
extern void hashmap_add(struct hashmap *map, void *entry);
/*
* Adds or replaces a hashmap entry. If the hashmap contains duplicate
* entries equal to the specified entry, only one of them will be replaced.
*
* `map` is the hashmap structure.
* `entry` is the entry to add or replace.
* Returns the replaced entry, or NULL if not found (i.e. the entry was added).
*/
extern void *hashmap_put(struct hashmap *map, void *entry);
/*
* Removes a hashmap entry matching the specified key. If the hashmap contains
* duplicate entries equal to the specified key, only one of them will be
* removed. Returns the removed entry, or NULL if not found.
*
* Argument explanation is the same as in `hashmap_get`.
*/
extern void *hashmap_remove(struct hashmap *map, const void *key,
const void *keydata);
/*
* Returns the `bucket` an entry is stored in.
* Useful for multithreaded read access.
*/
int hashmap_bucket(const struct hashmap *map, unsigned int hash);
/*
* Disallow/allow rehashing of the hashmap.
* This is useful if the caller knows that the hashmap
* needs multi-threaded access. The caller is still
* required to guard/lock searches and inserts in a
* manner appropriate to their usage. This simply
* prevents the table from being unexpectedly re-mapped.
* This is useful if the caller knows that the hashmap needs multi-threaded
* access. The caller is still required to guard/lock searches and inserts
* in a manner appropriate to their usage. This simply prevents the table
* from being unexpectedly re-mapped.
*
* If is up to the caller to ensure that the hashmap is
* initialized to a reasonable size to prevent poor
* performance.
* It is up to the caller to ensure that the hashmap is initialized to a
* reasonable size to prevent poor performance.
*
* When value=1, prevent future rehashes on adds and deleted.
* When value=0, allow future rehahses. This DOES NOT force
* a rehash now.
* A call to allow rehashing does not force a rehash; that might happen
* with the next insert or delete.
*/
static inline void hashmap_disallow_rehash(struct hashmap *map, unsigned value)
{
map->disallow_rehash = value;
}
/* hashmap_iter functions */
/*
* Used to iterate over all entries of a hashmap. Note that it is
* not safe to add or remove entries to the hashmap while
* iterating.
*/
struct hashmap_iter {
struct hashmap *map;
struct hashmap_entry *next;
unsigned int tablepos;
};
/* Initializes a `hashmap_iter` structure. */
extern void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter);
/* Returns the next hashmap_entry, or NULL if there are no more entries. */
extern void *hashmap_iter_next(struct hashmap_iter *iter);
/* Initializes the iterator and returns the first entry, if any. */
static inline void *hashmap_iter_first(struct hashmap *map,
struct hashmap_iter *iter)
{
@@ -112,8 +387,21 @@ static inline void *hashmap_iter_first(struct hashmap *map,
return hashmap_iter_next(iter);
}
-/* string interning */
+/* String interning */
+/*
+ * Returns the unique, interned version of the specified string or data,
+ * similar to the `String.intern` API in Java and .NET, respectively.
+ * Interned strings remain valid for the entire lifetime of the process.
+ *
+ * Can be used as `[x]strdup()` or `xmemdupz` replacement, except that interned
+ * strings / data must not be modified or freed.
+ *
+ * Interned strings are best used for short strings with high probability of
+ * duplicates.
+ *
+ * Uses a hashmap to store the pool of interned strings.
+ */
extern const void *memintern(const void *data, size_t len);
static inline const char *strintern(const char *string)
{
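The interning contract documented above (one canonical copy per distinct byte string, valid for the life of the process, never modified or freed) can be illustrated with a minimal sketch. `intern_bytes()` and `intern_str()` are hypothetical names, not git's `memintern()`/`strintern()`; the fixed 64-bucket chained table and the FNV-1a hash are assumptions chosen only to keep the example short.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical interning pool; a sketch, not git's memintern(). */
#define POOL_BUCKETS 64

struct pool_entry {
	struct pool_entry *next;
	size_t len;
	char data[]; /* private copy of the bytes plus a trailing NUL */
};

static struct pool_entry *pool[POOL_BUCKETS];

/* 32-bit FNV-1a hash over raw bytes (chosen for brevity). */
static uint32_t fnv1a(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t h = 2166136261u;

	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/* Returns the pooled copy of data; equal contents yield the same pointer. */
static const void *intern_bytes(const void *data, size_t len)
{
	struct pool_entry **bucket = &pool[fnv1a(data, len) % POOL_BUCKETS];
	struct pool_entry *e;

	for (e = *bucket; e; e = e->next)
		if (e->len == len && !memcmp(e->data, data, len))
			return e->data;

	e = malloc(sizeof(*e) + len + 1);
	if (!e)
		abort();
	e->len = len;
	memcpy(e->data, data, len);
	e->data[len] = '\0';
	e->next = *bucket;
	*bucket = e;
	return e->data; /* never freed: valid for the rest of the process */
}

static const char *intern_str(const char *s)
{
	return intern_bytes(s, strlen(s));
}
```

Because equal contents map to one pointer, callers can compare interned strings with `==` instead of `strcmp()`, which is the usual payoff for short, frequently repeated strings.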

@@ -16,8 +16,10 @@ struct dir_entry {
char name[FLEX_ARRAY];
};
-static int dir_entry_cmp(const struct dir_entry *e1,
-const struct dir_entry *e2, const char *name)
+static int dir_entry_cmp(const void *unused_cmp_data,
+const struct dir_entry *e1,
+const struct dir_entry *e2,
+const char *name)
{
return e1->namelen != e2->namelen || strncasecmp(e1->name,
name ? name : e2->name, e1->namelen);
@@ -107,8 +109,10 @@ static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
add_dir_entry(istate, ce);
}
-static int cache_entry_cmp(const struct cache_entry *ce1,
-const struct cache_entry *ce2, const void *remove)
+static int cache_entry_cmp(const void *unused_cmp_data,
+const struct cache_entry *ce1,
+const struct cache_entry *ce2,
+const void *remove)
{
/*
* For remove_name_hash, find the exact entry (pointer equality); for
@@ -571,9 +575,9 @@ static void lazy_init_name_hash(struct index_state *istate)
if (istate->name_hash_initialized)
return;
hashmap_init(&istate->name_hash, (hashmap_cmp_fn) cache_entry_cmp,
-istate->cache_nr);
+NULL, istate->cache_nr);
hashmap_init(&istate->dir_hash, (hashmap_cmp_fn) dir_entry_cmp,
-istate->cache_nr);
+NULL, istate->cache_nr);
if (lookup_lazy_params(istate)) {
hashmap_disallow_rehash(&istate->dir_hash, 1);
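`hashmap_disallow_rehash()` only sets a flag that the insert and delete paths consult before resizing. The minimal sketch below assumes a hypothetical `struct table` that only grows (never shrinks) and uses an 80% load threshold picked for illustration. It shows the two properties the header comment describes: while the flag is set the bucket array never moves, and clearing the flag does not resize anything until the next insert.

```c
#include <stdlib.h>

/* Hypothetical minimal chained table illustrating a "disallow rehash" flag. */
struct node {
	struct node *next;
	unsigned int hash;
};

struct table {
	struct node **buckets;
	unsigned int size;            /* number of entries */
	unsigned int tablesize;       /* number of buckets */
	unsigned int disallow_rehash; /* non-zero: never resize on add */
};

static void table_init(struct table *t, unsigned int tablesize)
{
	t->buckets = calloc(tablesize, sizeof(*t->buckets));
	if (!t->buckets)
		abort();
	t->size = 0;
	t->tablesize = tablesize;
	t->disallow_rehash = 0;
}

/* Move every node into a freshly allocated bucket array. */
static void table_resize(struct table *t, unsigned int newsize)
{
	struct node **old = t->buckets;
	unsigned int i, oldsize = t->tablesize;

	t->buckets = calloc(newsize, sizeof(*t->buckets));
	if (!t->buckets)
		abort();
	t->tablesize = newsize;
	for (i = 0; i < oldsize; i++) {
		struct node *n = old[i];
		while (n) {
			struct node *next = n->next;
			unsigned int b = n->hash % newsize;
			n->next = t->buckets[b];
			t->buckets[b] = n;
			n = next;
		}
	}
	free(old);
}

static void table_add(struct table *t, struct node *n)
{
	unsigned int b = n->hash % t->tablesize;

	n->next = t->buckets[b];
	t->buckets[b] = n;
	t->size++;
	/* Grow past 80% load unless rehashing is disallowed. */
	if (!t->disallow_rehash && t->size * 5 > t->tablesize * 4)
		table_resize(t, t->tablesize * 2);
}
```

As the header comment warns, a table with rehashing disabled degrades into long chains if it was not sized generously up front; the flag trades lookup speed for stable bucket positions.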

@@ -6,7 +6,8 @@ struct oidset_entry {
struct object_id oid;
};
-static int oidset_hashcmp(const void *va, const void *vb,
+static int oidset_hashcmp(const void *unused_cmp_data,
+const void *va, const void *vb,
const void *vkey)
{
const struct oidset_entry *a = va, *b = vb;
@@ -30,7 +31,7 @@ int oidset_insert(struct oidset *set, const struct object_id *oid)
struct oidset_entry *entry;
if (!set->map.cmpfn)
-hashmap_init(&set->map, oidset_hashcmp, 0);
+hashmap_init(&set->map, oidset_hashcmp, NULL, 0);
if (oidset_contains(set, oid))
return 1;
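The oidset hunk relies on a zero-initialized set being usable: `oidset_insert()` calls `hashmap_init()` only while `cmpfn` is still NULL, and returns 1 when the ID is already present. A hypothetical stand-alone sketch of that lazy-initialization pattern for 20-byte IDs follows; the fixed bucket count, the absence of resizing, and the never-freed entries are simplifying assumptions.

```c
#include <stdlib.h>
#include <string.h>

#define ID_LEN 20
#define SET_BUCKETS 32

/* Hypothetical set of 20-byte IDs; usable when zero-initialized. */
struct id_entry {
	struct id_entry *next;
	unsigned char id[ID_LEN];
};

struct id_set {
	struct id_entry **buckets; /* NULL until the first insert */
};

/* IDs are already uniformly distributed, so their first bytes make a hash. */
static unsigned int id_hash(const unsigned char *id)
{
	unsigned int h;

	memcpy(&h, id, sizeof(h));
	return h;
}

static int id_set_contains(const struct id_set *set, const unsigned char *id)
{
	const struct id_entry *e;

	if (!set->buckets)
		return 0;
	for (e = set->buckets[id_hash(id) % SET_BUCKETS]; e; e = e->next)
		if (!memcmp(e->id, id, ID_LEN))
			return 1;
	return 0;
}

/* Returns 1 if id was already present, 0 if it was added. */
static int id_set_insert(struct id_set *set, const unsigned char *id)
{
	struct id_entry *e;
	unsigned int b;

	if (!set->buckets) { /* lazy init, like the !set->map.cmpfn check */
		set->buckets = calloc(SET_BUCKETS, sizeof(*set->buckets));
		if (!set->buckets)
			abort();
	}
	if (id_set_contains(set, id))
		return 1;
	e = malloc(sizeof(*e));
	if (!e)
		abort();
	memcpy(e->id, id, ID_LEN);
	b = id_hash(id) % SET_BUCKETS;
	e->next = set->buckets[b];
	set->buckets[b] = e;
	return 0;
}
```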

@@ -35,9 +35,10 @@ int commit_patch_id(struct commit *commit, struct diff_options *options,
* the side of safety. The actual value being negative does not have
* any significance; only that it is non-zero matters.
*/
-static int patch_id_cmp(struct patch_id *a,
+static int patch_id_cmp(struct diff_options *opt,
+struct patch_id *a,
struct patch_id *b,
-struct diff_options *opt)
+const void *unused_keydata)
{
if (is_null_oid(&a->patch_id) &&
commit_patch_id(a->commit, opt, &a->patch_id, 0))
@@ -57,7 +58,8 @@ int init_patch_ids(struct patch_ids *ids)
ids->diffopts.detect_rename = 0;
DIFF_OPT_SET(&ids->diffopts, RECURSIVE);
diff_setup_done(&ids->diffopts);
-hashmap_init(&ids->patches, (hashmap_cmp_fn)patch_id_cmp, 256);
+hashmap_init(&ids->patches, (hashmap_cmp_fn)patch_id_cmp,
+&ids->diffopts, 256);
return 0;
}
@@ -93,7 +95,7 @@ struct patch_id *has_commit_patch_id(struct commit *commit,
if (init_patch_id_entry(&patch, commit, ids))
return NULL;
-return hashmap_get(&ids->patches, &patch, &ids->diffopts);
+return hashmap_get(&ids->patches, &patch, NULL);
}
struct patch_id *add_commit_patch_id(struct commit *commit,
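The patch-ids.c change moves `&ids->diffopts` from the `keydata` argument of `hashmap_get()` to the new `cmp_data` argument of `hashmap_init()`. The options are then handed to every comparison the map makes, and `keydata` keeps its role as optional lookup-key data. The sketch below shows that shape under stated assumptions: `struct map` is hypothetical and uses a linear scan in place of hash buckets to stay short, and `struct patch_opts` with its `ignore_whitespace` flag is an invented stand-in for `struct diff_options`.

```c
#include <stddef.h>
#include <string.h>

/* Comparator shape after this change: cmp_data comes first. */
typedef int (*cmp_fn)(const void *cmp_data, const void *entry,
		      const void *entry_or_key, const void *keydata);

/* Hypothetical container; a linear scan stands in for hash buckets. */
#define MAP_MAX 16

struct map {
	cmp_fn cmpfn;
	const void *cmpfn_data; /* fixed at init, passed to every comparison */
	const void *items[MAP_MAX];
	size_t nr;
};

static void map_init(struct map *m, cmp_fn cmpfn, const void *cmpfn_data)
{
	m->cmpfn = cmpfn;
	m->cmpfn_data = cmpfn_data;
	m->nr = 0;
}

static int map_add(struct map *m, const void *item)
{
	if (m->nr == MAP_MAX)
		return -1;
	m->items[m->nr++] = item;
	return 0;
}

static const void *map_get(const struct map *m, const void *key,
			   const void *keydata)
{
	size_t i;

	for (i = 0; i < m->nr; i++)
		if (!m->cmpfn(m->cmpfn_data, m->items[i], key, keydata))
			return m->items[i];
	return NULL;
}

/* Hypothetical stand-ins for struct diff_options and struct patch_id. */
struct patch_opts { int ignore_whitespace; };
struct patch { const char *diff; };

/* Returns 0 when equal under the map's options, non-zero otherwise. */
static int patch_cmp(const void *cmp_data, const void *entry,
		     const void *entry_or_key, const void *unused_keydata)
{
	const struct patch_opts *opt = cmp_data;
	const char *a = ((const struct patch *)entry)->diff;
	const char *b = ((const struct patch *)entry_or_key)->diff;

	(void)unused_keydata;
	if (!opt->ignore_whitespace)
		return strcmp(a, b);
	for (;;) { /* compare while skipping spaces on both sides */
		while (*a == ' ')
			a++;
		while (*b == ' ')
			b++;
		if (*a != *b)
			return 1;
		if (!*a)
			return 0;
		a++;
		b++;
	}
}
```

Two maps built from the same comparator but different `cmp_data` then answer the same lookup differently, without the caller passing options at lookup time.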

refs.c

@@ -1525,7 +1525,8 @@ struct ref_store_hash_entry
char name[FLEX_ARRAY];
};
-static int ref_store_hash_cmp(const void *entry, const void *entry_or_key,
+static int ref_store_hash_cmp(const void *unused_cmp_data,
+const void *entry, const void *entry_or_key,
const void *keydata)
{
const struct ref_store_hash_entry *e1 = entry, *e2 = entry_or_key;
@@ -1608,7 +1609,7 @@ static void register_ref_store_map(struct hashmap *map,
const char *name)
{
if (!map->tablesize)
-hashmap_init(map, ref_store_hash_cmp, 0);
+hashmap_init(map, ref_store_hash_cmp, NULL, 0);
if (hashmap_put(map, alloc_ref_store_hash_entry(name, refs)))
die("BUG: %s ref_store '%s' initialized twice", type, name);
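`register_ref_store_map()` depends on `hashmap_put()` returning the entry it replaced, or NULL when the name was new, to catch a ref_store registered twice. A hypothetical sketch of that add-or-replace contract follows, with a single linked list standing in for hash buckets; `registry_put()` and `register_store()` are invented names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name -> value registry entry. */
struct reg_entry {
	struct reg_entry *next;
	const char *name;
	void *value;
};

struct registry {
	struct reg_entry *head; /* one list stands in for hash buckets */
};

/* Adds e, or replaces an entry with the same name in place.
   Returns the displaced entry, or NULL if the name was new. */
static struct reg_entry *registry_put(struct registry *r, struct reg_entry *e)
{
	struct reg_entry **pp;

	for (pp = &r->head; *pp; pp = &(*pp)->next) {
		if (!strcmp((*pp)->name, e->name)) {
			struct reg_entry *old = *pp;
			e->next = old->next;
			*pp = e;
			old->next = NULL;
			return old;
		}
	}
	e->next = r->head;
	r->head = e;
	return NULL;
}

/* Treats a non-NULL return as a programming error, as the hunk's die() does. */
static void register_store(struct registry *r, struct reg_entry *e)
{
	if (registry_put(r, e)) {
		fprintf(stderr, "BUG: store '%s' initialized twice\n", e->name);
		exit(1);
	}
}
```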

@@ -133,7 +133,10 @@ struct remotes_hash_key {
int len;
};
-static int remotes_hash_cmp(const struct remote *a, const struct remote *b, const struct remotes_hash_key *key)
+static int remotes_hash_cmp(const void *unused_cmp_data,
+const struct remote *a,
+const struct remote *b,
+const struct remotes_hash_key *key)
{
if (key)
return strncmp(a->name, key->str, key->len) || a->name[key->len];
@@ -144,7 +147,7 @@ static int remotes_hash_cmp(const struct remote *a, const struct remote *b, cons
static inline void init_remotes_hash(void)
{
if (!remotes_hash.cmpfn)
-hashmap_init(&remotes_hash, (hashmap_cmp_fn)remotes_hash_cmp, 0);
+hashmap_init(&remotes_hash, (hashmap_cmp_fn)remotes_hash_cmp, NULL, 0);
}
static struct remote *make_remote(const char *name, int len)
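`remotes_hash_cmp()` can look a remote up by a (pointer, length) key that need not be NUL-terminated. The trailing `|| a->name[key->len]` test is what rejects a longer name sharing the prefix: `strncmp()` alone would report "origin-old" equal to the first six bytes of "origin/master". A stand-alone sketch of the same comparison, with hypothetical struct names:

```c
#include <string.h>

/* Hypothetical stand-ins for struct remote and struct remotes_hash_key. */
struct remote_like {
	const char *name;
};

struct name_key {
	const char *str; /* not necessarily NUL-terminated at len */
	int len;
};

/* Returns 0 on a match. With a key, a->name must equal exactly the first
   key->len bytes of key->str; without one, the two entries are compared. */
static int name_cmp(const void *unused_cmp_data,
		    const struct remote_like *a,
		    const struct remote_like *b,
		    const struct name_key *key)
{
	(void)unused_cmp_data;
	if (key)
		return strncmp(a->name, key->str, key->len) || a->name[key->len];
	return strcmp(a->name, b->name);
}
```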

@@ -2275,7 +2275,8 @@ static int delta_base_cache_key_eq(const struct delta_base_cache_key *a,
return a->p == b->p && a->base_offset == b->base_offset;
}
-static int delta_base_cache_hash_cmp(const void *va, const void *vb,
+static int delta_base_cache_hash_cmp(const void *unused_cmp_data,
+const void *va, const void *vb,
const void *vkey)
{
const struct delta_base_cache_entry *a = va, *b = vb;
@@ -2360,7 +2361,7 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
list_add_tail(&ent->lru, &delta_base_cache_lru);
if (!delta_base_cache.cmpfn)
-hashmap_init(&delta_base_cache, delta_base_cache_hash_cmp, 0);
+hashmap_init(&delta_base_cache, delta_base_cache_hash_cmp, NULL, 0);
hashmap_entry_init(ent, pack_entry_hash(p, base_offset));
hashmap_add(&delta_base_cache, ent);
}

@@ -5,9 +5,10 @@
#include "sigchain.h"
#include "pkt-line.h"
-int cmd2process_cmp(const struct subprocess_entry *e1,
-const struct subprocess_entry *e2,
-const void *unused)
+int cmd2process_cmp(const void *unused_cmp_data,
+const struct subprocess_entry *e1,
+const struct subprocess_entry *e2,
+const void *unused_keydata)
{
return strcmp(e1->cmd, e2->cmd);
}

@@ -20,8 +20,10 @@ struct subprocess_entry {
/* subprocess functions */
-int cmd2process_cmp(const struct subprocess_entry *e1,
-const struct subprocess_entry *e2, const void *unused);
+extern int cmd2process_cmp(const void *unused_cmp_data,
+const struct subprocess_entry *e1,
+const struct subprocess_entry *e2,
+const void *unused_keydata);
typedef int(*subprocess_start_fn)(struct subprocess_entry *entry);
int subprocess_start(struct hashmap *hashmap, struct subprocess_entry *entry, const char *cmd,

@@ -33,17 +33,19 @@ enum lookup_type {
lookup_path
};
-static int config_path_cmp(const struct submodule_entry *a,
+static int config_path_cmp(const void *unused_cmp_data,
+const struct submodule_entry *a,
const struct submodule_entry *b,
-const void *unused)
+const void *unused_keydata)
{
return strcmp(a->config->path, b->config->path) ||
hashcmp(a->config->gitmodules_sha1, b->config->gitmodules_sha1);
}
-static int config_name_cmp(const struct submodule_entry *a,
+static int config_name_cmp(const void *unused_cmp_data,
+const struct submodule_entry *a,
const struct submodule_entry *b,
-const void *unused)
+const void *unused_keydata)
{
return strcmp(a->config->name, b->config->name) ||
hashcmp(a->config->gitmodules_sha1, b->config->gitmodules_sha1);
@@ -56,8 +58,8 @@ static struct submodule_cache *submodule_cache_alloc(void)
static void submodule_cache_init(struct submodule_cache *cache)
{
-hashmap_init(&cache->for_path, (hashmap_cmp_fn) config_path_cmp, 0);
-hashmap_init(&cache->for_name, (hashmap_cmp_fn) config_name_cmp, 0);
+hashmap_init(&cache->for_path, (hashmap_cmp_fn) config_path_cmp, NULL, 0);
+hashmap_init(&cache->for_name, (hashmap_cmp_fn) config_name_cmp, NULL, 0);
cache->initialized = 1;
}
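Both submodule-config comparators treat two entries as equal only when the path (or name) and the `.gitmodules` blob ID both match, so one cache can hold the same submodule as seen in several revisions. A hypothetical sketch of such a composite-key comparison, with a raw 20-byte array standing in for the SHA-1 and `memcmp()` standing in for `hashcmp()`:

```c
#include <string.h>

#define BLOB_ID_LEN 20

/* Hypothetical cache entry: one submodule's settings as read from one
   particular .gitmodules blob. */
struct sub_entry {
	const char *path;
	const char *name;
	unsigned char gitmodules_id[BLOB_ID_LEN];
};

/* Returns 0 only if the path AND the originating blob both match. */
static int sub_path_cmp(const void *unused_cmp_data,
			const struct sub_entry *a, const struct sub_entry *b,
			const void *unused_keydata)
{
	(void)unused_cmp_data;
	(void)unused_keydata;
	return strcmp(a->path, b->path) ||
	       memcmp(a->gitmodules_id, b->gitmodules_id, BLOB_ID_LEN);
}

/* Same composite key, but indexed by submodule name. */
static int sub_name_cmp(const void *unused_cmp_data,
			const struct sub_entry *a, const struct sub_entry *b,
			const void *unused_keydata)
{
	(void)unused_cmp_data;
	(void)unused_keydata;
	return strcmp(a->name, b->name) ||
	       memcmp(a->gitmodules_id, b->gitmodules_id, BLOB_ID_LEN);
}
```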

@@ -13,14 +13,18 @@ static const char *get_value(const struct test_entry *e)
return e->key + strlen(e->key) + 1;
}
-static int test_entry_cmp(const struct test_entry *e1,
-const struct test_entry *e2, const char* key)
+static int test_entry_cmp(const void *unused_cmp_data,
+const struct test_entry *e1,
+const struct test_entry *e2,
+const char* key)
{
return strcmp(e1->key, key ? key : e2->key);
}
-static int test_entry_cmp_icase(const struct test_entry *e1,
-const struct test_entry *e2, const char* key)
+static int test_entry_cmp_icase(const void *unused_cmp_data,
+const struct test_entry *e1,
+const struct test_entry *e2,
+const char* key)
{
return strcasecmp(e1->key, key ? key : e2->key);
}
@@ -92,7 +96,8 @@ static void perf_hashmap(unsigned int method, unsigned int rounds)
if (method & TEST_ADD) {
/* test adding to the map */
for (j = 0; j < rounds; j++) {
-hashmap_init(&map, (hashmap_cmp_fn) test_entry_cmp, 0);
+hashmap_init(&map, (hashmap_cmp_fn) test_entry_cmp,
+NULL, 0);
/* add entries */
for (i = 0; i < TEST_SIZE; i++) {
@@ -104,7 +109,7 @@ static void perf_hashmap(unsigned int method, unsigned int rounds)
}
} else {
/* test map lookups */
-hashmap_init(&map, (hashmap_cmp_fn) test_entry_cmp, 0);
+hashmap_init(&map, (hashmap_cmp_fn) test_entry_cmp, NULL, 0);
/* fill the map (sparsely if specified) */
j = (method & TEST_SPARSE) ? TEST_SIZE / 10 : TEST_SIZE;
@@ -147,7 +152,7 @@ int cmd_main(int argc, const char **argv)
/* init hash map */
icase = argc > 1 && !strcmp("ignorecase", argv[1]);
hashmap_init(&map, (hashmap_cmp_fn) (icase ? test_entry_cmp_icase
-: test_entry_cmp), 0);
+: test_entry_cmp), NULL, 0);
/* process commands from stdin */
while (fgets(line, sizeof(line), stdin)) {
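The test helper selects case-sensitive or case-insensitive behaviour by passing a different comparator to `hashmap_init()`. With the new `cmp_data` argument, a single comparator could instead read a flag supplied at init time. The sketch below is a hypothetical alternative, not what the helper does; `struct kv` stands in for `test_entry`, and the int-flag convention for `cmp_data` is an assumption.

```c
#include <string.h>
#include <strings.h> /* strcasecmp() is POSIX */

/* Hypothetical stand-in for struct test_entry. */
struct kv {
	const char *key;
};

/* cmp_data, if non-NULL, points at an int: non-zero means ignore case.
   key, if non-NULL, is compared against instead of e2->key. */
static int kv_cmp(const void *cmp_data, const struct kv *e1,
		  const struct kv *e2, const char *key)
{
	const int *icase = cmp_data;
	const char *other = key ? key : e2->key;

	if (icase && *icase)
		return strcasecmp(e1->key, other);
	return strcmp(e1->key, other);
}
```

Whichever way the comparator is chosen, the hash function has to agree with it: a case-insensitive map must hash a case-folded key, or keys that compare equal may land in different buckets.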