microsoft/git - git

Граф коммитов

Автор	SHA1	Сообщение	Дата
Johannes Schindelin	652891de4f	read_index_from(): avoid memory leak In `998330ac2e` (read-cache: look for shared index files next to the index, too, 2021-08-26), we added code that allocates memory to store the base path of a shared index, but we never released that memory. Reported by Coverity. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-06-16 13:22:03 -07:00
Junio C Hamano	113656eca6	Merge branch 'zh/read-cache-copy-name-entry-fix' Remove redundant copying (with index v3 and older) or possible over-reading beyond end of mmapped memory (with index v4) has been corrected. * zh/read-cache-copy-name-entry-fix: read-cache.c: reduce unnecessary cache entry name copying	2022-06-13 15:53:43 -07:00
ZheNing Hu	6d858341d2	read-cache.c: reduce unnecessary cache entry name copying `575fa8a3` (read-cache: read data in a hash-independent way, 2019-02-19) added a new code to copy from the on-disk data into the name member of the in-core cache entry, which is already done immediately after that in a way that takes prefix-compression into account. Remove this code, as it is not just unnecessary, but also can be reading beyond the on-disk data, when we are copying very long prefix string from the previous entry. Signed-off-by: ZheNing Hu <adlternative@gmail.com> [jc: rewrote the log message with Réne's findings] Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-06-06 10:37:06 -07:00
Junio C Hamano	c276c21da6	Merge branch 'ds/sparse-sparse-checkout' "sparse-checkout" learns to work well with the sparse-index feature. * ds/sparse-sparse-checkout: sparse-checkout: integrate with sparse index p2000: add test for 'git sparse-checkout [add\|set]' sparse-index: complete partial expansion sparse-index: partially expand directories sparse-checkout: --no-sparse-index needs a full index cache-tree: implement cache_tree_find_path() sparse-index: introduce partially-sparse indexes sparse-index: create expand_index() t1092: stress test 'git sparse-checkout set' t1092: refactor 'sparse-index contents' test	2022-06-03 14:30:35 -07:00
Derrick Stolee	9fadb373dd	sparse-index: introduce partially-sparse indexes A future change will present a temporary, in-memory mode where the index can both contain sparse directory entries but also not be completely collapsed to the smallest possible sparse directories. This will be necessary for modifying the sparse-checkout definition while using a sparse index. For now, convert the single-bit member 'sparse_index' in 'struct index_state' to be a an 'enum sparse_index_mode' with three modes: * INDEX_EXPANDED (0): No sparse directories exist. This is always the case for repositories that do not use cone-mode sparse-checkout. * INDEX_COLLAPSED: Sparse directories may exist. Files outside the sparse-checkout cone are reduced to sparse directory entries whenever possible. * INDEX_PARTIALLY_SPARSE: Sparse directories may exist. Some file entries outside the sparse-checkout cone may exist. Running convert_to_sparse() may further reduce those files to sparse directory entries. The main reason to store this extra information is to allow convert_to_sparse() to short-circuit when the index is already in INDEX_EXPANDED mode but to actually do the necessary work when in INDEX_PARTIALLY_SPARSE mode. The INDEX_PARTIALLY_SPARSE mode will be used in an upcoming change. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-23 11:08:21 -07:00
Victoria Dye	491df5f679	read-cache: set sparsity when index is new When the index read in 'do_read_index()' does not exist on-disk, mark the index "sparse" if the executing command does not require a full index and sparse index is otherwise enabled. Some commands (such as 'git stash -u') implicitly create a new index (when the 'GIT_INDEX_FILE' variable points to a non-existent file) and perform some operation on it. However, when this index is created, it isn't created with the same sparsity settings as the repo index. As a result, while these indexes may be sparse during the operation, they are always expanded before being written to disk. We can avoid that expansion by defaulting the index to "sparse", in which case it will only be expanded if the full index is needed. Note that the function 'set_new_index_sparsity()' is created despite having only a single caller because additional callers will be added in a subsequent patch. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-10 16:45:12 -07:00
Junio C Hamano	909d5b646e	Merge branch 'vd/mv-refresh-stat' "git mv" failed to refresh the cached stat information for the entry it moved. * vd/mv-refresh-stat: mv: refresh stat info for moved entry	2022-04-04 10:56:24 -07:00
Victoria Dye	b7f9130a06	mv: refresh stat info for moved entry Update the stat info of the moved index entry in 'rename_index_entry_at()' if the entry is up-to-date with the index. Internally, 'git mv' uses 'rename_index_entry_at()' to move the source index entry to the destination. However, it directly copies the stat info of the original cache entry, which will not reflect the 'ctime' of the file renaming operation that happened as part of the move. If a file is otherwise up-to-date with the index, that difference in 'ctime' will make the entry appear out-of-date until the next index-refreshing operation (e.g., 'git status'). Some commands, such as 'git reset', use the cached stat information to determine whether a file is up-to-date; if this information is incorrect, the command will fail when it should pass. In order to ensure a moved entry is evaluated as 'up-to-date' when appropriate, refresh the destination index entry's stat info in 'git mv' if and only if the file is up-to-date. Note that the test added in 't7001-mv.sh' requires a "sleep 1" to ensure the 'ctime' of the file creation will be definitively older than the 'ctime' of the renamed file in 'git mv'. Reported-by: Maximilian Reichel <reichemn@icloud.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-29 09:45:02 -07:00
Junio C Hamano	eb804cd405	Merge branch 'ns/core-fsyncmethod' Replace core.fsyncObjectFiles with two new configuration variables, core.fsync and core.fsyncMethod. * ns/core-fsyncmethod: core.fsync: documentation and user-friendly aggregate options core.fsync: new option to harden the index core.fsync: add configuration parsing core.fsync: introduce granular fsync control infrastructure core.fsyncmethod: add writeout-only mode wrapper: make inclusion of Windows csprng header tightly scoped	2022-03-25 16:38:24 -07:00
Junio C Hamano	430883a70c	Merge branch 'ab/object-file-api-updates' Object-file API shuffling. * ab/object-file-api-updates: object-file API: pass an enum to read_object_with_reference() object-file.c: add a literal version of write_object_file_prepare() object-file API: have hash_object_file() take "enum object_type" object API: rename hash_object_file_literally() to write_() object-file API: split up and simplify check_object_signature() object API users + docs: check <0, not !0 with check_object_signature() object API docs: move check_object_signature() docs to cache.h object API: correct "buf" v.s. "map" mismatch in .c and *.h object-file API: have write_object_file() take "enum object_type" object-file API: add a format_object_header() function object-file API: return "void", not "int" from hash_object_file() object-file.c: split up declaration of unrelated variables	2022-03-16 17:53:08 -07:00
Neeraj Singh	ba95e96d4c	core.fsync: new option to harden the index This commit introduces the new ability for the user to harden the index. In the event of a system crash, the index must be durable for the user to actually find a file that has been added to the repo and then deleted from the working tree. We use the presence of the COMMIT_LOCK flag and absence of the alternate_index_output as a proxy for determining whether we're updating the persistent index of the repo or some temporary index. We don't sync these temporary indexes. Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-10 15:10:22 -08:00
Neeraj Singh	020406eaa5	core.fsync: introduce granular fsync control infrastructure This commit introduces the infrastructure for the core.fsync configuration knob. The repository components we want to sync are identified by flags so that we can turn on or off syncing for specific components. If core.fsyncObjectFiles is set and the core.fsync configuration also includes FSYNC_COMPONENT_LOOSE_OBJECT, we will fsync any loose objects. This picks the strictest data integrity behavior if core.fsync and core.fsyncObjectFiles are set to conflicting values. This change introduces the currently unused fsync_component helper, which will be used by a later patch that adds fsyncing to the refs backend. Actual configuration and documentation of the fsync components list are in other patches in the series to separate review of the underlying mechanism from the policy of how it's configured. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-10 15:10:22 -08:00
Ævar Arnfjörð Bjarmason	c80d226a04	object-file API: have write_object_file() take "enum object_type" Change the write_object_file() function to take an "enum object_type" instead of a "const char type". Its callers either passed {commit,tree,blob,tag}_type and can pass the corresponding OBJ_ type instead, or were hardcoding strings like "blob". This avoids the back & forth fragility where the callers of write_object_file() would have the enum type, and convert it themselves via type_name(). We do have to now do that conversion ourselves before calling write_object_file_prepare(), but those codepaths will be similarly adjusted in subsequent commits. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-25 17:16:31 -08:00
Junio C Hamano	2f45f3e2bc	Merge branch 'vd/sparse-clean-etc' "git update-index", "git checkout-index", and "git clean" are taught to work better with the sparse checkout feature. * vd/sparse-clean-etc: update-index: reduce scope of index expansion in do_reupdate update-index: integrate with sparse index update-index: add tests for sparse-checkout compatibility checkout-index: integrate with sparse index checkout-index: add --ignore-skip-worktree-bits option checkout-index: expand sparse checkout compatibility tests clean: integrate with sparse index reset: reorder wildcard pathspec conditions reset: fix validation in sparse index test	2022-02-17 16:25:05 -08:00
Junio C Hamano	1b82b936e3	Merge branch 'js/sparse-vs-split-index' Mark in various places in the code that the sparse index and the split index features are mutually incompatible. * js/sparse-vs-split-index: split-index: it really is incompatible with the sparse index t1091: disable split index sparse-index: sparse index is disallowed when split index is active	2022-02-09 14:21:01 -08:00
Junio C Hamano	c70bc338e9	Merge branch 'ab/config-based-hooks-2' More "config-based hooks". * ab/config-based-hooks-2: run-command: remove old run_hook_{le,ve}() hook API receive-pack: convert push-to-checkout hook to hook.h read-cache: convert post-index-change to use hook.h commit: convert {pre-commit,prepare-commit-msg} hook to hook.h git-p4: use 'git hook' to run hooks send-email: use 'git hook run' for 'sendemail-validate' git hook run: add an --ignore-missing flag hooks: convert worktree 'post-checkout' hook to hook library hooks: convert non-worktree 'post-checkout' hook to hook library merge: convert post-merge to use hook.h am: convert applypatch-msg to use hook.h rebase: convert pre-rebase to use hook.h hook API: add a run_hooks_l() wrapper am: convert {pre,post}-applypatch to use hook.h gc: use hook library for pre-auto-gc hook hook API: add a run_hooks() wrapper hook: add 'run' subcommand	2022-02-09 14:21:00 -08:00
Johannes Schindelin	451b66c533	split-index: it really is incompatible with the sparse index ... at least for now. So let's error out if we are even trying to initialize the split index when the index is sparse, or when trying to write the split index extension for a sparse index. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-23 17:06:23 -08:00
Victoria Dye	c35e9f5ecd	update-index: integrate with sparse index Enable use of the sparse index with `update-index`. Most variations of `update-index` work without explicitly expanding the index or making any other updates in or outside of `update-index.c`. The one usage requiring additional changes is `--cacheinfo`; if a file inside a sparse directory was specified, the index would not be expanded until after the cache tree is invalidated, leading to a mismatch between the index and cache tree. This scenario is handled by rearranging `add_index_entry_with_check`, allowing `index_name_stage_pos` to expand the index before attempting to invalidate the relevant cache tree path, avoiding cache tree/index corruption. Signed-off-by: Victoria Dye <vdye@github.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-13 13:49:45 -08:00
Emily Shaffer	dbb1c61365	read-cache: convert post-index-change to use hook.h Move the post-index-change hook away from run-command.h to and over to the new hook.h library. This removes the last direct user of "run_hook_ve()" outside of run-command.c ("run_hook_le()" still uses it). So we can make the function static now. A subsequent commit will remove this code entirely when "run_hook_le()" itself goes away. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Acked-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-07 15:19:35 -08:00
Marc Strapetz	2ede073fd2	update-index: refresh should rewrite index in case of racy timestamps 'git update-index --refresh' and '--really-refresh' should force writing of the index file if racy timestamps have been encountered, as 'git status' already does [1]. Note that calling 'git update-index --refresh' still does not guarantee that there will be no more racy timestamps afterwards (the same holds true for 'git status'): - calling 'git update-index --refresh' immediately after touching and adding a file may still leave racy timestamps if all three operations occur within the racy-tolerance (usually 1 second unless USE_NSEC has been defined) - calling 'git update-index --refresh' for timestamps which are set into the future will leave them racy To guarantee that such racy timestamps will be resolved would require to wait until the system clock has passed beyond these timestamps and only then write the index file. Especially for future timestamps, this does not seem feasible because of possibly long delays/hangs. [1] https://lore.kernel.org/git/d3dd805c-7c1d-30a9-6574-a7bfcb7fc013@syntevo.com/ Signed-off-by: Marc Strapetz <marc.strapetz@syntevo.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-07 12:37:31 -08:00
Junio C Hamano	f0850875fd	Merge branch 'vd/sparse-reset' Various operating modes of "git reset" have been made to work better with the sparse index. * vd/sparse-reset: unpack-trees: improve performance of next_cache_entry reset: make --mixed sparse-aware reset: make sparse-aware (except --mixed) reset: integrate with sparse index reset: expand test coverage for sparse checkouts sparse-index: update command for expand/collapse test reset: preserve skip-worktree bit in mixed reset reset: rename is_missing to !is_in_reset_tree	2021-12-10 14:35:12 -08:00
Junio C Hamano	5396d7b298	Merge branch 'vd/sparse-sparsity-fix-on-read' Ensure that the sparseness of the in-core index matches the index.sparse configuration specified by the repository immediately after the on-disk index file is read. * vd/sparse-sparsity-fix-on-read: sparse-index: update do_read_index to ensure correct sparsity sparse-index: add ensure_correct_sparsity function sparse-index: avoid unnecessary cache tree clearing test-read-cache.c: prepare_repo_settings after config init	2021-12-10 14:35:01 -08:00
Victoria Dye	20ec2d034c	reset: make sparse-aware (except --mixed) Remove `ensure_full_index` guard on `prime_cache_tree` and update `prime_cache_tree_rec` to correctly reconstruct sparse directory entries in the cache tree. While processing a tree's entries, `prime_cache_tree_rec` must determine whether a directory entry is sparse or not by searching for it in the index (without expanding the index). If a matching sparse directory index entry is found, no subtrees are added to the cache tree entry and the entry count is set to 1 (representing the sparse directory itself). Otherwise, the tree is assumed to not be sparse and its subtrees are recursively added to the cache tree. Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-11-29 12:51:26 -08:00
Victoria Dye	7ca4fc8819	sparse-index: update do_read_index to ensure correct sparsity Unless `command_requires_full_index` forces index expansion, ensure in-core index sparsity matches config settings on read by calling `ensure_correct_sparsity`. This makes the behavior of the in-core index more consistent between different methods of updating sparsity: manually changing the `index.sparse` config setting vs. executing `git sparse-checkout --[no-]sparse-index init` Although index sparsity is normally updated with `git sparse-checkout init`, ensuring correct sparsity after a manual `index.sparse` change has some practical benefits: 1. It allows for command-by-command sparsity toggling with `-c index.sparse=<true\|false>`, e.g. when troubleshooting issues with the sparse index. 2. It prevents users from experiencing abnormal slowness after setting `index.sparse` to `true` due to use of a full index in all commands until the on-disk index is updated. Helped-by: Junio C Hamano <gitster@pobox.com> Co-authored-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Victoria Dye <vdye@github.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-11-24 16:32:39 -08:00
Junio C Hamano	6ffb5fc069	Merge branch 'rs/add-dry-run-without-objects' Stop "git add --dry-run" from creating new blob and tree objects. * rs/add-dry-run-without-objects: add: don't write objects with --dry-run	2021-10-25 16:06:57 -07:00
Junio C Hamano	a86ed75f32	Merge branch 'rs/make-verify-path-really-verify-again' Recent sparse-index work broke safety against attempts to add paths with trailing slashes to the index, which has been corrected. * rs/make-verify-path-really-verify-again: read-cache: let verify_path() reject trailing dir separators again read-cache: add verify_path_internal() t3905: show failure to ignore sub-repo	2021-10-18 15:47:58 -07:00
René Scharfe	e578d0311d	add: don't write objects with --dry-run When the option --dry-run/-n is given, "git add" doesn't change the index, but still writes out new object files. Only hash the latter without writing instead to make the run as dry as possible. Use this opportunity to also make the hash_flags variable unsigned, to match the index_path() parameter it is used as. Reported-by: git.mexon@spamgourmet.com Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-12 13:15:49 -07:00
Junio C Hamano	ed4d535342	Merge branch 'sg/test-split-index-fix' Test updates. * sg/test-split-index-fix: read-cache: fix GIT_TEST_SPLIT_INDEX tests: disable GIT_TEST_SPLIT_INDEX for sparse index tests read-cache: look for shared index files next to the index, too t1600-index: disable GIT_TEST_SPLIT_INDEX t1600-index: don't run git commands upstream of a pipe t1600-index: remove unnecessary redirection	2021-10-11 10:21:47 -07:00
René Scharfe	c8ad9d04c6	read-cache: let verify_path() reject trailing dir separators again `6e773527b6` (sparse-index: convert from full to sparse, 2021-03-30) made verify_path() accept trailing directory separators for directories, which is necessary for sparse directory entries. This clemency causes "git stash" to stumble over sub-repositories, though, and there may be more unintended side-effects. Avoid them by restoring the old verify_path() behavior and accepting trailing directory separators only in places that are supposed to handle sparse directory entries. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-07 17:52:26 -07:00
René Scharfe	2a1ae649a4	read-cache: add verify_path_internal() Turn verify_path() into an internal function that distinguishes between valid paths and those with trailing directory separators and rename it to verify_path_internal(). Provide a wrapper with the old behavior under the old name. No functional change intended. The new function will be used in the next patch. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-07 17:49:39 -07:00
Junio C Hamano	d8d33378ed	Merge branch 'ab/repo-settings-cleanup' Code cleanup. * ab/repo-settings-cleanup: repository.h: don't use a mix of int and bitfields repo-settings.c: simplify the setup read-cache & fetch-negotiator: check "enum" values in switch() environment.c: remove test-specific "ignore_untracked..." variable wrapper.c: add x{un,}setenv(), and use xsetenv() in environment.c	2021-10-06 13:40:11 -07:00
Ævar Arnfjörð Bjarmason	3050b6dfc7	repo-settings.c: simplify the setup Simplify the setup code in repo-settings.c in various ways, making the code shorter, easier to read, and requiring fewer hacks to do the same thing as it did before: Since `7211b9e753` (repo-settings: consolidate some config settings, 2019-08-13) we have memset() the whole "settings" structure to -1 in prepare_repo_settings(), and subsequently relied on the -1 value. Most of the fields did not need to be initialized to -1, and because we were doing that we had the enum labels "UNTRACKED_CACHE_UNSET" and "FETCH_NEGOTIATION_UNSET" purely to reflect the resulting state created this memset() in prepare_repo_settings(). No other code used or relied on them, more on that below. For the rest most of the subsequent "are we -1, then read xyz" can simply be removed by re-arranging what we read first. E.g. when setting the "index.version" setting we should have first read "feature.experimental", so that it (and "feature.manyfiles") can provide a default for our "index.version". Instead the code setting it, added when "feature.manyFiles"[1] was created, was using the UPDATE_DEFAULT_BOOL() macro added in an earlier commit[2]. That macro is now gone, since it was only needed for this pattern of reading things in the wrong order. This also fixes an (admittedly obscure) logic error where we'd conflate an explicit "-1" value in the config with our own earlier memset() -1. We can also remove the UPDATE_DEFAULT_BOOL() wrapper added in [3]. Using it is redundant to simply using the return value from repo_config_get_bool(), which is non-zero if the provided key exists in the config. Details on edge cases relating to the memset() to -1, continued from "more on that below" above: * UNTRACKED_CACHE_KEEP: In [4] the "unset" and "keep" handling for core.untrackedCache was consolidated. But it while we understand the "keep" value, we don't handle it differently than the case of any other unknown value. So let's retain UNTRACKED_CACHE_KEEP and remove the UNTRACKED_CACHE_UNSET setting (which was always implicitly UNTRACKED_CACHE_KEEP before). We don't need to inform any code after prepare_repo_settings() that the setting was "unset", as far as anyone else is concerned it's core.untrackedCache=keep. if "core.untrackedcache" isn't present in the config. * FETCH_NEGOTIATION_UNSET & FETCH_NEGOTIATION_NONE: Since these two two enum fields added in [5] don't rely on the memzero() setting them to "-1" anymore we don't have to provide them with explicit values. 1. `c6cc4c5afd` (repo-settings: create feature.manyFiles setting, 2019-08-13) 2. `31b1de6a09` (commit-graph: turn on commit-graph by default, 2019-08-13) 3. `31b1de6a09` (commit-graph: turn on commit-graph by default, 2019-08-13) 4. `ad0fb65999` (repo-settings: parse core.untrackedCache, 2019-08-13) 5. `aaf633c2ad` (repo-settings: create feature.experimental setting, 2019-08-13) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-22 13:15:00 -07:00
Ævar Arnfjörð Bjarmason	f1bee82873	read-cache & fetch-negotiator: check "enum" values in switch() Change tweak_untracked_cache() in "read-cache.c" to use a switch() to have the compiler assert that we checked all possible values in the "enum untracked_cache_setting" type, and likewise remove the "default" case in fetch_negotiator_init() in favor of checking for "FETCH_NEGOTIATION_UNSET" and "FETCH_NEGOTIATION_NONE". As will be discussed in a subsequent we'll only ever have either of these set to FETCH_NEGOTIATION_NONE, FETCH_NEGOTIATION_UNSET and UNTRACKED_CACHE_UNSET within the prepare_repo_settings() function itself. In preparation for fixing that code let's add a BUG() here to mark this as unreachable code. See `ad0fb65999` (repo-settings: parse core.untrackedCache, 2019-08-13) for when the "unset" and "keep" handling for core.untrackedCache was consolidated, and `aaf633c2ad` (repo-settings: create feature.experimental setting, 2019-08-13) for the addition of the "default" pattern in "fetch-negotiator.c". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-22 13:15:00 -07:00
SZEDER Gábor	e8ffd034c8	read-cache: fix GIT_TEST_SPLIT_INDEX Running tests with GIT_TEST_SPLIT_INDEX=1 is supposed to turn on the split index feature and trigger index splitting (mostly) randomly. Alas, this has been broken since `6e37c8ed3c` (read-cache.c: fix writing "link" index ext with null base oid, 2019-02-13), and GIT_TEST_SPLIT_INDEX=1 hasn't triggered any index splitting since then. This patch makes GIT_TEST_SPLIT_INDEX work again, though it doesn't restore the pre-6e37c8ed3c behavior. To understand the bug, the fix, and the behavior change we first have to look at how GIT_TEST_SPLIT_INDEX used to work before 6e37c8ed3c: There are two places where we check the value of GIT_TEST_SPLIT_INDEX, and before `6e37c8ed3c` they worked like this: 1) In the lower-level do_write_index(), where, if GIT_TEST_SPLIT_INDEX is enabled, we call init_split_index(). This call merely allocates and zero-initializes 'istate->split_index', but does nothing else (i.e. doesn't fill the base/shared index with cache entries, doesn't actually write a shared index file, etc.). Pertinent to this issue, the hash of the base index remains all zeroed out. 2) In the higher-level write_locked_index(), but only when 'istate->split_index' has already been initialized. Then, if GIT_TEST_SPLIT_INDEX is enabled, it randomly sets the flag that triggers index splitting later in this function. This randomness comes from the first byte of the hash of the base index via an 'if ((first_byte & 15) < 6)' condition. However, if 'istate->split_index' hasn't been initialized (i.e. it is still NULL), then write_locked_index() just calls do_write_locked_index(), which internally calls the above mentioned do_write_index(). This means that while GIT_TEST_SPLIT_INDEX=1 usually triggered index splitting randomly, the first two index writes were always deterministic (though I suspect this was unintentional): - The initial index write never splits the index. During the first index write write_locked_index() is called with 'istate->split_index' still uninitialized, so the check in 2) is not executed. It still calls do_write_index(), though, which then executes the check in 1). The resulting all zero base index hash then leads to the 'link' extension being written to '.git/index', though a shared index file is not written: $ rm .git/index $ GIT_TEST_SPLIT_INDEX=1 git update-index --add file $ test-tool dump-split-index .git/index own c6ef71168597caec8553c83d9d0048f1ef416170 base 0000000000000000000000000000000000000000 100644 d00491fd7e5bb6fa28c517a0bb32b8b506539d4d 0 file replacements: deletions: $ ls -l .git/sharedindex.* ls: cannot access '.git/sharedindex.': No such file or directory - The second index write always splits the index. When the index written in the previous point is read, 'istate->split_index' is initialized because of the presence of the 'link' extension. So during the second write write_locked_index() does run the check in 2), and the first byte of the all zero base index hash always fulfills the randomness condition, which in turn always triggers the index splitting. - Subsequent index writes will find the 'link' extension with a real non-zero base index hash, so from then on the check in 2) is executed and the first byte of the base index hash is as random as it gets (coming from the SHA-1 of index data including timestamps and inodes...). All this worked until `6e37c8ed3c` came along, and stopped writing the 'link' extension if the hash of the base index was all zero: $ rm .git/index $ GIT_TEST_SPLIT_INDEX=1 git update-index --add file $ test-tool dump-split-index .git/index own abbd6f6458d5dee73ae8e210ca15a68a390c6fd7 not a split index $ ls -l .git/sharedindex. ls: cannot access '.git/sharedindex.*': No such file or directory So, since the first index write with GIT_TEST_SPLIT_INDEX=1 doesn't write a 'link' extension, in the second index write 'istate->split_index' remains uninitialized, and the check in 2) is not executed, and ultimately the index is never split. Fix this by modifying write_locked_index() to make sure to check GIT_TEST_SPLIT_INDEX even if 'istate->split_index' is still uninitialized, and initialize it if necessary. The check for GIT_TEST_SPLIT_INDEX and separate init_split_index() call in do_write_index() thus becomes unnecessary, so remove it. Furthermore, add a test to 't1700-split-index.sh' to make sure that GIT_TEST_SPLIT_INDEX=1 will keep working (though only check the index splitting on the first index write, because after that it will be random). Note that this change does not restore the pre-6e37c8ed3c behaviour, as it will deterministically split the index already on the first index write. Since GIT_TEST_SPLIT_INDEX is purely a developer aid, there is no backwards compatibility issue here. The new behaviour did trigger test failures in 't0003-attributes.sh' and 't1600-index.sh', though, which have been fixed in preparatory patches in this series. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-07 23:28:04 -07:00
SZEDER Gábor	998330ac2e	read-cache: look for shared index files next to the index, too When reading a split index git always looks for its referenced shared base index in the gitdir of the current repository, even when reading an alternate index specified via GIT_INDEX_FILE, and even when that alternate index file is the "main" '.git/index' file of an other repository. However, if that split index and its referenced shared index files were written by a git command running entirely in that other repository, then, naturally, the shared index file is written to that other repository's gitdir. Consequently, a git command attempting to read that shared index file while running in a different repository won't be able find it and will error out. I'm not sure in what use case it is necessary to read the index of one repository by a git command running in a different repository, but it is certainly possible to do so, and in fact the test 'bare repository: check that --cached honors index' in 't0003-attributes.sh' does exactly that. If GIT_TEST_SPLIT_INDEX=1 were to split the index in just the right moment [1], then this test would indeed fail, because the referenced shared index file could not be found. Let's look for the referenced shared index file not only in the gitdir of the current directory, but, if the shared index is not there, right next to the split index as well. [1] We haven't seen this issue trigger a failure in t0003 yet, because: - While GIT_TEST_SPLIT_INDEX=1 is supposed to trigger index splitting randomly, the first index write has always been deterministic and it has never split the index. - That alternate index file in the other repository is written only once in the entire test script, so it's never split. However, the next patch will fix GIT_TEST_SPLIT_INDEX, and while doing so it will slightly change its behavior to always split the index already on the first index write, and t0003 would always fail without this patch. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-07 23:23:47 -07:00
Derrick Stolee	ce7a9f0141	sparse-index: add SPARSE_INDEX_MEMORY_ONLY flag The convert_to_sparse() method checks for the GIT_TEST_SPARSE_INDEX environment variable or the "index.sparse" config setting before converting the index to a sparse one. This is for ease of use since all current consumers are preparing to compress the index before writing it to disk. If these settings are not enabled, then convert_to_sparse() silently returns without doing anything. We will add a consumer in the next change that wants to use the sparse index as an in-memory data structure, regardless of whether the on-disk format should be sparse. To that end, create the SPARSE_INDEX_MEMORY_ONLY flag that will skip these config checks when enabled. All current consumers are modified to pass '0' in the new 'flags' parameter. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-07 22:41:10 -07:00
Junio C Hamano	31f9acf9ce	Merge branch 'ah/plugleaks' Leak plugging. * ah/plugleaks: reset: clear_unpack_trees_porcelain to plug leak builtin/rebase: fix options.strategy memory lifecycle builtin/merge: free found_ref when done builtin/mv: free or UNLEAK multiple pointers at end of cmd_mv convert: release strbuf to avoid leak read-cache: call diff_setup_done to avoid leak ref-filter: also free head for ATOM_HEAD to avoid leak diffcore-rename: move old_dir/new_dir definition to plug leak builtin/for-each-repo: remove unnecessary argv copy to plug leak builtin/submodule--helper: release unused strbuf to avoid leak environment: move strbuf into block to plug leak fmt-merge-msg: free newly allocated temporary strings when done	2021-08-04 13:28:52 -07:00
Junio C Hamano	8230107f33	Merge branch 'jt/bulk-prefetch' "git read-tree" had a codepath where blobs are fetched one-by-one from the promisor remote, which has been corrected to fetch in bulk. * jt/bulk-prefetch: cache-tree: prefetch in partial clone read-tree unpack-trees: refactor prefetching code	2021-08-02 14:06:42 -07:00
Junio C Hamano	b271a3034f	Merge branch 'ds/status-with-sparse-index' "git status" codepath learned to work with sparsely populated index without hydrating it fully. * ds/status-with-sparse-index: t1092: document bad sparse-checkout behavior fsmonitor: integrate with sparse index wt-status: expand added sparse directory entries status: use sparse-index throughout status: skip sparse-checkout percentage with sparse-index diff-lib: handle index diffs with sparse dirs dir.c: accept a directory as part of cone-mode patterns unpack-trees: unpack sparse directory entries unpack-trees: rename unpack_nondirectories() unpack-trees: compare sparse directories correctly unpack-trees: preserve cache_bottom t1092: add tests for status/add and sparse files t1092: expand repository data shape t1092: replace incorrect 'echo' with 'cat' sparse-index: include EXTENDED flag when expanding sparse-index: skip indexes with unmerged entries	2021-07-28 13:18:02 -07:00
Andrzej Hunt	8d833e9337	read-cache: call diff_setup_done to avoid leak repo_diff_setup() calls through to diff.c's static prep_parse_options(), which in turn allocates a new array into diff_opts.parseopts. diff_setup_done() is responsible for freeing that array, and has the benefit of verifying diff_opts too - hence we add a call to diff_setup_done() to avoid leaking parseopts. Output from the leak as found while running t0090 with LSAN: Direct leak of 7120 byte(s) in 1 object(s) allocated from: #0 0x49a82d in malloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3 #1 0xa8bf89 in do_xmalloc wrapper.c:41:8 #2 0x7a7bae in prep_parse_options diff.c:5636:2 #3 0x7a7bae in repo_diff_setup diff.c:4611:2 #4 0x93716c in repo_index_has_changes read-cache.c:2518:3 #5 0x872233 in unclean merge-ort-wrappers.c:12:14 #6 0x872233 in merge_ort_recursive merge-ort-wrappers.c:53:6 #7 0x5d5b11 in try_merge_strategy builtin/merge.c:752:12 #8 0x5d0b6b in cmd_merge builtin/merge.c:1666:9 #9 0x4ce83e in run_builtin git.c:475:11 #10 0x4ccafe in handle_builtin git.c:729:3 #11 0x4cb01c in run_argv git.c:818:4 #12 0x4cb01c in cmd_main git.c:949:19 #13 0x6bdc2d in main common-main.c:52:11 #14 0x7f551eb51349 in __libc_start_main (/lib64/libc.so.6+0x24349) SUMMARY: AddressSanitizer: 7120 byte(s) leaked in 1 allocation(s) Signed-off-by: Andrzej Hunt <andrzej@ahunt.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-26 12:19:20 -07:00
Jonathan Tan	b2896d2739	unpack-trees: refactor prefetching code Refactor the prefetching code in unpack-trees.c into its own function, because it will be used elsewhere in a subsequent commit. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-23 14:21:57 -07:00
Junio C Hamano	a93c6fd677	Merge branch 'ew/mmap-failures' Error message update. * ew/mmap-failures: xmmap: inform Linux users of tuning knobs on ENOMEM	2021-07-16 17:42:47 -07:00
Derrick Stolee	d76723ee53	status: use sparse-index throughout By testing 'git -c core.fsmonitor= status -uno', we can check for the simplest index operations that can be made sparse-aware. The necessary implementation details are already integrated with sparse-checkout, so modify command_requires_full_index to be zero for cmd_status(). In refresh_index(), we loop through the index entries to refresh their stat() information. However, sparse directories have no stat() information to populate. Ignore these entries. This allows 'git status' to no longer expand a sparse index to a full one. This is further tested by dropping the "-uno" option and adding an untracked file into the worktree. The performance test p2000-sparse-checkout-operations.sh demonstrates these improvements: Test HEAD~1 HEAD ----------------------------------------------------------------------------- 2000.2: git status (full-index-v3) 0.31(0.30+0.05) 0.31(0.29+0.06) +0.0% 2000.3: git status (full-index-v4) 0.31(0.29+0.07) 0.34(0.30+0.08) +9.7% 2000.4: git status (sparse-index-v3) 2.35(2.28+0.10) 0.04(0.04+0.05) -98.3% 2000.5: git status (sparse-index-v4) 2.35(2.24+0.15) 0.05(0.04+0.06) -97.9% Note that since HEAD~1 was expanding the sparse index by parsing trees, it was artificially slower than the full index case. Thus, the 98% improvement is misleading, and instead we should celebrate the 0.34s to 0.05s improvement of 85%. This is more indicative of the peformance gains we are expecting by using a sparse index. Note: we are dropping the assignment of core.fsmonitor here. This is not necessary for the test script as we are not altering the config any other way. Correct integration with FS Monitor will be validated in later changes. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 13:42:49 -07:00
Junio C Hamano	2134c3fb1f	Merge branch 'ab/progress-cleanup' Code clean-up. * ab/progress-cleanup: read-cache.c: don't guard calls to progress.c API	2021-07-08 13:15:05 -07:00
Eric Wong	dc05929411	xmmap: inform Linux users of tuning knobs on ENOMEM Linux users may benefit from additional information on how to avoid ENOMEM from mmap despite the system having enough RAM to accomodate them. We can't reliably unmap pack windows to work around the issue since malloc and other library routines may mmap without our knowledge. Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-06-29 23:14:25 -07:00
Ævar Arnfjörð Bjarmason	b7b793d1e7	read-cache.c: don't guard calls to progress.c API Don't guard the calls to the progress.c API with "if (progress)". The API itself will check this. This doesn't change any behavior, but makes this code consistent with the rest of the codebase. See `ae9af12287` (status: show progress bar if refreshing the index takes too long, 2018-09-15) for the commit that added the pattern we're changing here. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-06-08 10:09:13 +09:00
Derrick Stolee	f6e2cd0625	read-cache: delete unused hashing methods These methods were marked as MAYBE_UNUSED in the previous change to avoid a complicated diff. Delete them entirely, since we now use the hashfile API instead of this custom hashing code. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-05-19 16:41:21 +09:00
Derrick Stolee	410334ed52	read-cache: use hashfile instead of git_hash_ctx The do_write_index() method in read-cache.c has its own hashing logic and buffering mechanism. Specifically, the ce_write() method was introduced by `4990aadc` (Speed up index file writing by chunking it nicely, 2005-04-20) and similar mechanisms were introduced a few months later in `c38138cd` (git-pack-objects: write the pack files with a SHA1 csum, 2005-06-26). Based on the timing, in the early days of the Git codebase, I figured that these roughly equivalent code paths were never unified only because it got lost in the shuffle. The hashfile API has since been used extensively in other file formats, such as pack-indexes, multi-pack-indexes, and commit-graphs. Therefore, it seems prudent to unify the index writing code to use the same mechanism. I discovered this disparity while trying to create a new index format that uses the chunk-format API. That API uses a hashfile as its base, so it is incompatible with the custom code in read-cache.c. This rewrite is rather straightforward. It replaces all writes to the temporary file with writes to the hashfile struct. This takes care of many of the direct interactions with the_hash_algo. There are still some git_hash_ctx uses remaining: the extension headers are hashed for use in the End of Index Entries (EOIE) extension. This use of the git_hash_ctx is left as-is. There are multiple reasons to not use a hashfile here, including the fact that the data is not actually writing to a file, just a hash computation. These hashes do not block our adoption of the chunk-format API in a future change to the index, so leave it as-is. The internals of the algorithms are mostly identical. Previously, the hashfile API used a smaller 8KB buffer instead of the 128KB buffer from read-cache.c. The previous change already unified these sizes. There is one subtle point: we do not pass the CSUM_FSYNC to the finalize_hashfile() method, which differs from most consumers of the hashfile API. The extra fsync() call indicated by this flag causes a significant peformance degradation that is noticeable for quick commands that write the index, such as "git add". Other consumers can absorb this cost with their more complicated data structure organization, and further writing structures such as pack-files and commit-graphs is rarely in the critical path for common user interactions. Some static methods become orphaned in this diff, so I marked them as MAYBE_UNUSED. The diff is much harder to read if they are deleted during this change. Instead, they will be deleted in the following change. In addition to the test suite passing, I computed indexes using the previous binaries and the binaries compiled after this change, and found the index data to be exactly equal. Finally, I did extensive performance testing of "git update-index --force-write" on repos of various sizes, including one with over 2 million paths at HEAD. These tests demonstrated less than 1% difference in behavior. As expected, the performance should be considered unchanged. The previous changes to increase the hashfile buffer size from 8K to 128K ensured this change would not create a peformance regression. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-05-19 16:41:21 +09:00
Junio C Hamano	a737e1f1d2	Merge branch 'mt/parallel-checkout-part-3' The final part of "parallel checkout". * mt/parallel-checkout-part-3: ci: run test round with parallel-checkout enabled parallel-checkout: add tests related to .gitattributes t0028: extract encoding helpers to lib-encoding.sh parallel-checkout: add tests related to path collisions parallel-checkout: add tests for basic operations checkout-index: add parallel checkout support builtin/checkout.c: complete parallel checkout support make_transient_cache_entry(): optionally alloc from mem_pool	2021-05-16 21:05:23 +09:00
Junio C Hamano	aaa3c8065d	Merge branch 'bc/hash-transition-interop-part-1' SHA-256 transition. * bc/hash-transition-interop-part-1: hex: print objects using the hash algorithm member hex: default to the_hash_algo on zero algorithm value builtin/pack-objects: avoid using struct object_id for pack hash commit-graph: don't store file hashes as struct object_id builtin/show-index: set the algorithm for object IDs hash: provide per-algorithm null OIDs hash: set, copy, and use algo field in struct object_id builtin/pack-redundant: avoid casting buffers to struct object_id Use the final_oid_fn to finalize hashing of object IDs hash: add a function to finalize object IDs http-push: set algorithm when reading object ID Always use oidread to read into struct object_id hash: add an algo member to struct object_id	2021-05-10 16:59:46 +09:00

1 2 3 4 5 ...

720 Коммитов