git/sha1-array.h

#ifndef SHA1_ARRAY_H
#define SHA1_ARRAY_H

/**
 * The API provides storage and manipulation of sets of object identifiers.
 * The emphasis is on storage and processing efficiency, making them suitable
 * for large lists. Note that the ordering of items is not preserved over some
 * operations.
 *
 * Examples
 * --------
 * -----------------------------------------
 * int print_callback(const struct object_id *oid,
 * 		    void *data)
 * {
 * 	printf("%s\n", oid_to_hex(oid));
 * 	return 0; // always continue
 * }
 *
 * void some_func(void)
 * {
 *     struct sha1_array hashes = OID_ARRAY_INIT;
 *     struct object_id oid;
 *
 *     // Read objects into our set
 *     while (read_object_from_stdin(oid.hash))
 *         oid_array_append(&hashes, &oid);
 *
 *     // Check if some objects are in our set
 *     while (read_object_from_stdin(oid.hash)) {
 *         if (oid_array_lookup(&hashes, &oid) >= 0)
 *             printf("it's in there!\n");
 *
 *          // Print the unique set of objects. We could also have
 *          // avoided adding duplicate objects in the first place,
 *          // but we would end up re-sorting the array repeatedly.
 *          // Instead, this will sort once and then skip duplicates
 *          // in linear time.
 *
 *         oid_array_for_each_unique(&hashes, print_callback, NULL);
 *     }
 */

/**
 * A single array of object IDs. This should be initialized by assignment from
 * `OID_ARRAY_INIT`. The `oid` member contains the actual data. The `nr` member
 * contains the number of items in the set. The `alloc` and `sorted` members
 * are used internally, and should not be needed by API callers.
 */
struct oid_array {
	struct object_id *oid;
	int nr;
	int alloc;
	int sorted;
};

#define OID_ARRAY_INIT { NULL, 0, 0, 0 }

/**
 * Add an item to the set. The object ID will be placed at the end of the array
 * (but note that some operations below may lose this ordering).
 */
void oid_array_append(struct oid_array *array, const struct object_id *oid);

/**
 * Perform a binary search of the array for a specific object ID. If found,
 * returns the offset (in number of elements) of the object ID. If not found,
 * returns a negative integer. If the array is not sorted, this function has
 * the side effect of sorting it.
 */
int oid_array_lookup(struct oid_array *array, const struct object_id *oid);

/**
 * Free all memory associated with the array and return it to the initial,
 * empty state.
 */
void oid_array_clear(struct oid_array *array);

typedef int (*for_each_oid_fn)(const struct object_id *oid,
			       void *data);
/**
 * Iterate over each element of the list, executing the callback function for
 * each one. Does not sort the list, so any custom hash order is retained.
 * If the callback returns a non-zero value, the iteration ends immediately
 * and the callback's return is propagated; otherwise, 0 is returned.
 */
int oid_array_for_each(struct oid_array *array,
		       for_each_oid_fn fn,
		       void *data);

/**
 * Iterate over each unique element of the list in sorted order, but otherwise
 * behave like `oid_array_for_each`. If the array is not sorted, this function
 * has the side effect of sorting it.
 */
int oid_array_for_each_unique(struct oid_array *array,
			      for_each_oid_fn fn,
			      void *data);

/**
 * Apply the callback function `want` to each entry in the array, retaining
 * only the entries for which the function returns true. Preserve the order
 * of the entries that are retained.
 */
void oid_array_filter(struct oid_array *array,
		      for_each_oid_fn want,
		      void *cbdata);

#endif /* SHA1_ARRAY_H */
bisect: refactor sha1_array into a generic sha1 list This is a generally useful abstraction, so let's let others make use of it. The refactoring is more or less a straight copy; however, functions and struct members have had their names changed to match string_list, which is the most similar data structure. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-20 01:34:33 +04:00			`#ifndef SHA1_ARRAY_H`
			`#define SHA1_ARRAY_H`

sha1-array: move doc to sha1-array.h Move the documentation from Documentation/technical/api-oid-array.txt to sha1-array.h as it's easier for the developers to find the usage information beside the code instead of looking for it in another doc file. Also documentation/technical/api-oid-array.txt is removed because the information it has is now redundant and it'll be hard to keep it up to date and synchronized with the documentation in the header file. Signed-off-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2019-11-18 00:04:44 +03:00			`/**`
			`* The API provides storage and manipulation of sets of object identifiers.`
			`* The emphasis is on storage and processing efficiency, making them suitable`
			`* for large lists. Note that the ordering of items is not preserved over some`
			`* operations.`
			`*`
			`* Examples`
			`* --------`
			`* -----------------------------------------`
			`* int print_callback(const struct object_id *oid,`
			`* void *data)`
			`* {`
			`* printf("%s\n", oid_to_hex(oid));`
			`* return 0; // always continue`
			`* }`
			`*`
			`* void some_func(void)`
			`* {`
			`* struct sha1_array hashes = OID_ARRAY_INIT;`
			`* struct object_id oid;`
			`*`
			`* // Read objects into our set`
			`* while (read_object_from_stdin(oid.hash))`
			`* oid_array_append(&hashes, &oid);`
			`*`
			`* // Check if some objects are in our set`
			`* while (read_object_from_stdin(oid.hash)) {`
			`* if (oid_array_lookup(&hashes, &oid) >= 0)`
			`* printf("it's in there!\n");`
			`*`
			`* // Print the unique set of objects. We could also have`
			`* // avoided adding duplicate objects in the first place,`
			`* // but we would end up re-sorting the array repeatedly.`
			`* // Instead, this will sort once and then skip duplicates`
			`* // in linear time.`
			`*`
			`* oid_array_for_each_unique(&hashes, print_callback, NULL);`
			`* }`
			`*/`

			`/**`
			`* A single array of object IDs. This should be initialized by assignment from`
			* `OID_ARRAY_INIT`. The `oid` member contains the actual data. The `nr` member
			* contains the number of items in the set. The `alloc` and `sorted` members
			`* are used internally, and should not be needed by API callers.`
			`*/`
Rename sha1_array to oid_array Since this structure handles an array of object IDs, rename it to struct oid_array. Also rename the accessor functions and the initialization constant. This commit was produced mechanically by providing non-Documentation files to the following Perl one-liners: perl -pi -E 's/struct sha1_array/struct oid_array/g' perl -pi -E 's/\bsha1_array_/oid_array_/g' perl -pi -E 's/SHA1_ARRAY_INIT/OID_ARRAY_INIT/g' Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2017-03-31 04:40:00 +03:00			`struct oid_array {`
sha1-array: convert internal storage for struct sha1_array to object_id Make the internal storage for struct sha1_array use an array of struct object_id internally. Update the users of this struct which inspect its internals. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2017-03-26 19:01:37 +03:00			`struct object_id *oid;`
bisect: refactor sha1_array into a generic sha1 list This is a generally useful abstraction, so let's let others make use of it. The refactoring is more or less a straight copy; however, functions and struct members have had their names changed to match string_list, which is the most similar data structure. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-20 01:34:33 +04:00			`int nr;`
			`int alloc;`
			`int sorted;`
			`};`

Rename sha1_array to oid_array Since this structure handles an array of object IDs, rename it to struct oid_array. Also rename the accessor functions and the initialization constant. This commit was produced mechanically by providing non-Documentation files to the following Perl one-liners: perl -pi -E 's/struct sha1_array/struct oid_array/g' perl -pi -E 's/\bsha1_array_/oid_array_/g' perl -pi -E 's/SHA1_ARRAY_INIT/OID_ARRAY_INIT/g' Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2017-03-31 04:40:00 +03:00			`#define OID_ARRAY_INIT { NULL, 0, 0, 0 }`
bisect: refactor sha1_array into a generic sha1 list This is a generally useful abstraction, so let's let others make use of it. The refactoring is more or less a straight copy; however, functions and struct members have had their names changed to match string_list, which is the most similar data structure. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-20 01:34:33 +04:00
sha1-array: move doc to sha1-array.h Move the documentation from Documentation/technical/api-oid-array.txt to sha1-array.h as it's easier for the developers to find the usage information beside the code instead of looking for it in another doc file. Also documentation/technical/api-oid-array.txt is removed because the information it has is now redundant and it'll be hard to keep it up to date and synchronized with the documentation in the header file. Signed-off-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2019-11-18 00:04:44 +03:00			`/**`
			`* Add an item to the set. The object ID will be placed at the end of the array`
			`* (but note that some operations below may lose this ordering).`
			`*/`
Rename sha1_array to oid_array Since this structure handles an array of object IDs, rename it to struct oid_array. Also rename the accessor functions and the initialization constant. This commit was produced mechanically by providing non-Documentation files to the following Perl one-liners: perl -pi -E 's/struct sha1_array/struct oid_array/g' perl -pi -E 's/\bsha1_array_/oid_array_/g' perl -pi -E 's/SHA1_ARRAY_INIT/OID_ARRAY_INIT/g' Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2017-03-31 04:40:00 +03:00			`void oid_array_append(struct oid_array array, const struct object_id oid);`
sha1-array: move doc to sha1-array.h Move the documentation from Documentation/technical/api-oid-array.txt to sha1-array.h as it's easier for the developers to find the usage information beside the code instead of looking for it in another doc file. Also documentation/technical/api-oid-array.txt is removed because the information it has is now redundant and it'll be hard to keep it up to date and synchronized with the documentation in the header file. Signed-off-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2019-11-18 00:04:44 +03:00
			`/**`
			`* Perform a binary search of the array for a specific object ID. If found,`
			`* returns the offset (in number of elements) of the object ID. If not found,`
			`* returns a negative integer. If the array is not sorted, this function has`
			`* the side effect of sorting it.`
			`*/`
Rename sha1_array to oid_array Since this structure handles an array of object IDs, rename it to struct oid_array. Also rename the accessor functions and the initialization constant. This commit was produced mechanically by providing non-Documentation files to the following Perl one-liners: perl -pi -E 's/struct sha1_array/struct oid_array/g' perl -pi -E 's/\bsha1_array_/oid_array_/g' perl -pi -E 's/SHA1_ARRAY_INIT/OID_ARRAY_INIT/g' Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2017-03-31 04:40:00 +03:00			`int oid_array_lookup(struct oid_array array, const struct object_id oid);`
sha1-array: move doc to sha1-array.h Move the documentation from Documentation/technical/api-oid-array.txt to sha1-array.h as it's easier for the developers to find the usage information beside the code instead of looking for it in another doc file. Also documentation/technical/api-oid-array.txt is removed because the information it has is now redundant and it'll be hard to keep it up to date and synchronized with the documentation in the header file. Signed-off-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2019-11-18 00:04:44 +03:00
			`/**`
			`* Free all memory associated with the array and return it to the initial,`
			`* empty state.`
			`*/`
Rename sha1_array to oid_array Since this structure handles an array of object IDs, rename it to struct oid_array. Also rename the accessor functions and the initialization constant. This commit was produced mechanically by providing non-Documentation files to the following Perl one-liners: perl -pi -E 's/struct sha1_array/struct oid_array/g' perl -pi -E 's/\bsha1_array_/oid_array_/g' perl -pi -E 's/SHA1_ARRAY_INIT/OID_ARRAY_INIT/g' Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2017-03-31 04:40:00 +03:00			`void oid_array_clear(struct oid_array *array);`
bisect: refactor sha1_array into a generic sha1 list This is a generally useful abstraction, so let's let others make use of it. The refactoring is more or less a straight copy; however, functions and struct members have had their names changed to match string_list, which is the most similar data structure. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-20 01:34:33 +04:00
Convert sha1_array_for_each_unique and for_each_abbrev to object_id Make sha1_array_for_each_unique take a callback using struct object_id. Since one of these callbacks is an argument to for_each_abbrev, convert those as well. Rename various functions, replacing "sha1" with "oid". Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2017-03-31 04:39:59 +03:00			`typedef int (for_each_oid_fn)(const struct object_id oid,`
			`void *data);`
sha1-array: move doc to sha1-array.h Move the documentation from Documentation/technical/api-oid-array.txt to sha1-array.h as it's easier for the developers to find the usage information beside the code instead of looking for it in another doc file. Also documentation/technical/api-oid-array.txt is removed because the information it has is now redundant and it'll be hard to keep it up to date and synchronized with the documentation in the header file. Signed-off-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2019-11-18 00:04:44 +03:00			`/**`
			`* Iterate over each element of the list, executing the callback function for`
			`* each one. Does not sort the list, so any custom hash order is retained.`
			`* If the callback returns a non-zero value, the iteration ends immediately`
			`* and the callback's return is propagated; otherwise, 0 is returned.`
			`*/`
get_short_oid: sort ambiguous objects by type, then SHA-1 Change the output emitted when an ambiguous object is encountered so that we show tags first, then commits, followed by trees, and finally blobs. Within each type we show objects in hashcmp() order. Before this change the objects were only ordered by hashcmp(). The reason for doing this is that the output looks better as a result, e.g. the v2.17.0 tag before this change on "git show e8f2" would display: hint: The candidates are: hint: e8f2093055 tree hint: e8f21caf94 commit 2013-06-24 - bash prompt: print unique detached HEAD abbreviated object name hint: e8f21d02f7 blob hint: e8f21d577c blob hint: e8f25a3a50 tree hint: e8f26250fa commit 2017-02-03 - Merge pull request #996 from jeffhostetler/jeffhostetler/register_rename_src hint: e8f2650052 tag v2.17.0 hint: e8f2867228 blob hint: e8f28d537c tree hint: e8f2a35526 blob hint: e8f2bc0c06 commit 2015-05-10 - Documentation: note behavior for multiple remote.url entries hint: e8f2cf6ec0 tree Now we'll instead show: hint: e8f2650052 tag v2.17.0 hint: e8f21caf94 commit 2013-06-24 - bash prompt: print unique detached HEAD abbreviated object name hint: e8f26250fa commit 2017-02-03 - Merge pull request #996 from jeffhostetler/jeffhostetler/register_rename_src hint: e8f2bc0c06 commit 2015-05-10 - Documentation: note behavior for multiple remote.url entries hint: e8f2093055 tree hint: e8f25a3a50 tree hint: e8f28d537c tree hint: e8f2cf6ec0 tree hint: e8f21d02f7 blob hint: e8f21d577c blob hint: e8f2867228 blob hint: e8f2a35526 blob Since we show the commit data in the output that's nicely aligned once we sort by object type. The decision to show tags before commits is pretty arbitrary. I don't want to order by object_type since there tags come last after blobs, which doesn't make sense if we want to show the most important things first. I could display them after commits, but it's much less likely that we'll display a tag, so if there is one it makes sense to show it prominently at the top. A note on the implementation: Derrick rightly pointed out[1] that we're bending over backwards here in get_short_oid() to first de-duplicate the list, and then emit it, but could simply do it in one step. The reason for that is that oid_array_for_each_unique() doesn't actually require that the array be sorted by oid_array_sort(), it just needs to be sorted in some order that guarantees that all objects with the same ID are adjacent to one another, which (barring a hash collision, which'll be someone else's problem) the sort_ambiguous() function does. I agree that would be simpler for this code, and had forgotten why I initially wrote it like this[2]. But on further reflection I think it's better to do more work here just so we're not underhandedly using the oid-array API where we lie about the list being sorted. That would break any subsequent use of oid_array_lookup() in subtle ways. I could get around that by hacking the API itself to support this use-case and documenting it, which I did as a WIP patch in [3], but I think it's too much code smell just for this one call site. It's simpler for the API to just introduce a oid_array_for_each() function to eagerly spew out the list without sorting or de-duplication, and then do the de-duplication and sorting in two passes. 1. https://public-inbox.org/git/20180501130318.58251-1-dstolee@microsoft.com/ 2. https://public-inbox.org/git/876047ze9v.fsf@evledraar.gmail.com/ 3. https://public-inbox.org/git/874ljrzctc.fsf@evledraar.gmail.com/ Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-05-10 15:43:02 +03:00			`int oid_array_for_each(struct oid_array *array,`
			`for_each_oid_fn fn,`
			`void *data);`
sha1-array: move doc to sha1-array.h Move the documentation from Documentation/technical/api-oid-array.txt to sha1-array.h as it's easier for the developers to find the usage information beside the code instead of looking for it in another doc file. Also documentation/technical/api-oid-array.txt is removed because the information it has is now redundant and it'll be hard to keep it up to date and synchronized with the documentation in the header file. Signed-off-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2019-11-18 00:04:44 +03:00
			`/**`
			`* Iterate over each unique element of the list in sorted order, but otherwise`
			* behave like `oid_array_for_each`. If the array is not sorted, this function
			`* has the side effect of sorting it.`
			`*/`
Rename sha1_array to oid_array Since this structure handles an array of object IDs, rename it to struct oid_array. Also rename the accessor functions and the initialization constant. This commit was produced mechanically by providing non-Documentation files to the following Perl one-liners: perl -pi -E 's/struct sha1_array/struct oid_array/g' perl -pi -E 's/\bsha1_array_/oid_array_/g' perl -pi -E 's/SHA1_ARRAY_INIT/OID_ARRAY_INIT/g' Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2017-03-31 04:40:00 +03:00			`int oid_array_for_each_unique(struct oid_array *array,`
sha1-array.h: align function arguments The arguments weren't lined up with the opening parenthesis, after 910650d2 ("Rename sha1_array to oid_array", 2017-03-31) renamed the function. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-05-10 15:42:59 +03:00			`for_each_oid_fn fn,`
			`void *data);`
sha1-array: move doc to sha1-array.h Move the documentation from Documentation/technical/api-oid-array.txt to sha1-array.h as it's easier for the developers to find the usage information beside the code instead of looking for it in another doc file. Also documentation/technical/api-oid-array.txt is removed because the information it has is now redundant and it'll be hard to keep it up to date and synchronized with the documentation in the header file. Signed-off-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2019-11-18 00:04:44 +03:00
			`/**`
			* Apply the callback function `want` to each entry in the array, retaining
			`* only the entries for which the function returns true. Preserve the order`
			`* of the entries that are retained.`
			`*/`
sha1-array: provide oid_array_filter Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-11-29 03:27:48 +03:00			`void oid_array_filter(struct oid_array *array,`
			`for_each_oid_fn want,`
			`void *cbdata);`
receive-pack: eliminate duplicate .have refs When receiving a push, we advertise ref tips from any alternate repositories, in case that helps the client send a smaller pack. Since these refs don't actually exist in the destination repository, we don't transmit the real ref names, but instead use the pseudo-ref ".have". If your alternate has a large number of duplicate refs (for example, because it is aggregating objects from many related repositories, some of which will have the same tags and branch tips), then we will send each ".have $sha1" line multiple times. This is a pointless waste of bandwidth, as we are simply repeating the same fact to the client over and over. This patch eliminates duplicate .have refs early on. It does so efficiently by sorting the complete list and skipping duplicates. This has the side effect of re-ordering the .have lines by ascending sha1; this isn't a problem, though, as the original order was meaningless. There is a similar .have system in fetch-pack, but it does not suffer from the same problem. For each alternate ref we consider in fetch-pack, we actually open the object and mark it with the SEEN flag, so duplicates are automatically culled. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-20 01:34:46 +04:00
bisect: refactor sha1_array into a generic sha1 list This is a generally useful abstraction, so let's let others make use of it. The refactoring is more or less a straight copy; however, functions and struct members have had their names changed to match string_list, which is the most similar data structure. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-20 01:34:33 +04:00			`#endif /* SHA1_ARRAY_H */`