microsoft/git - git

Commit graph

Author	SHA1	Message	Date
Victoria Dye	0f329b9ae4	unpack-trees: preserve index sparsity When unpacking trees, set the default sparsity of the resultant index based on repo settings and 'is_sparse_index_allowed()'. Normally, when executing 'unpack_trees', the output index is marked sparse when (and only when) it unpacks a sparse directory. However, an index may be "sparse" even if it contains no sparse directories - when all files fall inside the sparse-checkout definition or otherwise have SKIP_WORKTREE disabled. Therefore, the output index may be marked "full" even when it is "sparse", resulting in unnecessary 'ensure_full_index' calls when writing to disk. Avoid this by setting the "default" index sparsity to match what is expected for the repository. As a consequence of this fix, the (non-merge) 'read-tree' performed when applying a stash with untracked files no longer expands the index. Update the corresponding test in 't1092'. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-10 16:45:13 -07:00
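A minimal C sketch of the sparsity default described in this commit. `struct toy_index`, `struct toy_entry`, `toy_unpack_old()` and `toy_unpack_new()` are hypothetical simplifications, not Git's `struct index_state` or `unpack_trees()`. The `sparse_allowed` parameter is an assumed stand-in for the combined repo-settings and `is_sparse_index_allowed()` check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not Git's real structures. */
struct toy_index {
	bool sparse;          /* would the index be written as sparse? */
};

struct toy_entry {
	const char *name;
	bool is_sparse_dir;   /* entry is a sparse directory (name ends in '/') */
};

/*
 * Old model: the output index is "full" by default and becomes sparse
 * only if a sparse directory entry is actually unpacked.
 */
static void toy_unpack_old(struct toy_index *out,
			   const struct toy_entry *e, size_t n)
{
	out->sparse = false;
	for (size_t i = 0; i < n; i++)
		if (e[i].is_sparse_dir)
			out->sparse = true;
}

/*
 * New model: start from what the repository settings allow, so an index
 * whose files are all in-cone (no sparse directories) stays sparse.
 */
static void toy_unpack_new(struct toy_index *out, bool sparse_allowed,
			   const struct toy_entry *e, size_t n)
{
	out->sparse = sparse_allowed;
	for (size_t i = 0; i < n; i++)
		if (e[i].is_sparse_dir)
			out->sparse = true;
}
```

With only in-cone files, the old model reports a full index (which is what led to the unnecessary `ensure_full_index` call on write), while the new model keeps it sparse.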
Victoria Dye	99430aa12c	Revert "unpack-trees: improve performance of next_cache_entry" This reverts commit `f2a454e0a5` (unpack-trees: improve performance of next_cache_entry, 2021-11-29). The "hint" value was originally needed to improve performance in 'git reset -- <pathspec>' caused by 'cache_bottom' lagging behind its correct value when using a sparse index. The 'cache_bottom' tracking has since been corrected, removing the need for an additional "pseudo-cache_bottom" tracking variable. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-17 09:13:21 -07:00
Victoria Dye	bfc763df77	unpack-trees: increment cache_bottom for sparse directories Correct tracking of the 'cache_bottom' for cases where sparse directories are present in the index. BACKGROUND ---------- The 'unpack_trees_options.cache_bottom' is a variable that tracks the in-progress "bottom" of the cache as 'unpack_trees()' iterates through the contents of the index. Most importantly, this value informs the sequential return values of 'next_cache_entry()' which, in the "diff cache" usage of 'unpack_callback()', are either unpacked as-is or are passed into the diff machinery. The 'cache_bottom' is intended to track the position of the first entry in the index that has not yet been diffed or unpacked. It is advanced in two main ways: either it is incremented when an index entry is marked as "used" (in 'mark_ce_used()'), indicating that it was unpacked or diffed, or when a directory is unpacked, in which case it is increased by an amount equaling the number of index entries inside that tree. In `17a1bb570b` (unpack-trees: preserve cache_bottom, 2021-07-14), it was identified that sparse directories posed a problem to the above 'cache_bottom' advancement logic - because a sparse directory was both an index entry that could be "used" and a directory that can be unpacked, the 'cache_bottom' would be incremented too many times. To solve this problem, the 'mark_ce_used()' advancement of 'cache_bottom' was skipped for sparse directories. INCORRECT CACHE_BOTTOM TRACKING ------------------------------- Skipping the 'cache_bottom' advancement for sparse directories in 'mark_ce_used()' breaks down in two cases: 1. When the 'unpack_trees()' operation is not a "cache diff" (because the directory contents-based incrementing of 'cache_bottom' does not happen). 2. When a cache diff is performed with a pathspec (because 'unpack_index_entry()' will unpack a sparse directory not matched by the pathspec without performing the directory contents-based increment). 
The former luckily does not appear to affect 'git' behavior, likely because 'cache_bottom' is largely unused (non-"cache diff" 'unpack_trees()' uses 'find_index_entry()' - rather than 'next_cache_entry()' - to find the index entries to unpack). The latter, however, causes 'cache_bottom' to "lag behind" its intended position by an amount equal to the number of sparse directories unpacked so far with 'unpack_index_entry()'. If a repository is structured such that any sparse directories are ordered lexicographically after any pathspec-matching directories, though, this issue won't present any adverse behavior. This was the case with the 't1092-sparse-checkout-compatibility.sh' tests before the addition of the 'before/' sparse directory (ordered before the in-cone 'deep/' directory), therefore sidestepping the issue. Once the 'before/' directory was added, though, 'cache_bottom' began to lag behind its intended position, causing 'next_cache_entry()' to return index entries it had already processed and, ultimately, an incorrect diff. CORRECTING CACHE_BOTTOM ----------------------- The problems observed in 't1092' come from 'cache_bottom' lagging behind in cases where the cache tree-based advancement doesn't occur. To solve this, then, the fix in `17a1bb570b` is "reversed"; rather than skipping 'cache_bottom' advancement in 'mark_ce_used()', we skip the directory contents-based advancement for sparse directories. Now, every index entry can be accounted for in 'cache_bottom': * if you're working with a single index entry, 'cache_bottom' is incremented in 'mark_ce_used()' * if you're working with a directory that contains index entries (but is not one itself), 'cache_bottom' is incremented by the number of entries in that directory. Finally, change the 'test_expect_failure' tests in 't1092' failing due to this bug back to 'test_expect_success'. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-17 09:13:21 -07:00
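The accounting change can be modelled in a few lines of C. This is a hypothetical simplification: `toy_mark_ce_used()` stands in for the increment in `mark_ce_used()`, `toy_advance_for_dir()` for the directory-contents-based advance, and the `fixed` field toggles between the behavior before and after this commit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical model of 'cache_bottom' bookkeeping. */
enum toy_kind { TOY_FILE, TOY_SPARSE_DIR };

struct toy_opts {
	size_t cache_bottom;
	bool fixed;  /* true: after this commit, false: after 17a1bb570b */
};

/* Models mark_ce_used(): one index entry was unpacked or diffed. */
static void toy_mark_ce_used(struct toy_opts *o, enum toy_kind kind)
{
	/* Before the fix, sparse directories were skipped here. */
	if (o->fixed || kind != TOY_SPARSE_DIR)
		o->cache_bottom++;
}

/*
 * Models the directory-contents-based advance: a tree was unpacked and
 * cache_bottom jumps past the index entries it covers.  For a sparse
 * directory, the only covered entry is the sparse directory itself.
 */
static void toy_advance_for_dir(struct toy_opts *o, enum toy_kind kind,
				size_t entries_inside)
{
	/* After the fix, sparse directories are skipped here instead. */
	if (!o->fixed || kind != TOY_SPARSE_DIR)
		o->cache_bottom += entries_inside;
}
```

In the pathspec case, where a sparse directory is unpacked by `unpack_index_entry()` without the directory-based advance, only the fixed model moves `cache_bottom` past it. In a full cache diff, where both steps run, the two models end at the same position, which is why the old logic looked correct there.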
Junio C Hamano	190f9bf62a	Merge branch 'vd/sparse-read-tree' "git read-tree" has been made to be aware of the sparse-index feature. * vd/sparse-read-tree: read-tree: make three-way merge sparse-aware read-tree: make two-way merge sparse-aware read-tree: narrow scope of index expansion for '--prefix' read-tree: integrate with sparse index read-tree: expand sparse checkout test coverage read-tree: explicitly disallow prefixes with a leading '/' status: fix nested sparse directory diff in sparse index sparse-index: prevent repo root from becoming sparse	2022-03-16 17:53:08 -07:00
Victoria Dye	f27c170f64	read-tree: make three-way merge sparse-aware Enable use of 'merged_sparse_dir' in 'threeway_merge'. As with two-way merge, the contents of each conflicted sparse directory are merged without referencing the index, avoiding sparse index expansion. As with two-way merge, the 't/t1092-sparse-checkout-compatibility.sh' test 'read-tree --merge with edit/edit conflicts in sparse directories' confirms that three-way merges with edit/edit changes (both with and without conflicts) inside a sparse directory result in the correct index state or error message. To ensure the index is not unnecessarily expanded, add three-way merge cases to 'sparse index is not expanded: read-tree'. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-01 12:36:01 -08:00
Victoria Dye	ab81047a6c	read-tree: make two-way merge sparse-aware Enable two-way merge with 'git read-tree' without expanding the sparse index. When in a sparse index, a two-way merge will trivially succeed as long as there are not changes to the same sparse directory in multiple trees (i.e., sparse directory-level "edit-edit" conflicts). If there are such conflicts, the merge will fail despite the possibility that individual files could merge cleanly. In order to resolve these "edit-edit" conflicts, "conflicted" sparse directories are - rather than rejected - merged by traversing their associated trees by OID. For each child of the sparse directory: 1. Files are merged as normal (see Documentation/git-read-tree.txt for details). 2. Subdirectories are treated as sparse directories and merged in 'twoway_merge'. If there are no conflicts, they are merged according to the rules in Documentation/git-read-tree.txt; otherwise, the subdirectory is recursively traversed and merged. This process allows sparse directories to be individually merged at the necessary depth without expanding a full index. The 't/t1092-sparse-checkout-compatibility.sh' test 'read-tree --merge with edit/edit conflicts in sparse directories' tests two-way merges with 1) changes inside sparse directories that do not conflict and 2) changes that do conflict (with the correct file(s) reported in the error message). Additionally, add two-way merge cases to 'sparse index is not expanded: read-tree' to confirm that the index is not expanded regardless of whether edit/edit conflicts are present in a sparse directory. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-01 12:36:01 -08:00
Victoria Dye	7497039241	read-tree: narrow scope of index expansion for '--prefix' When 'git read-tree' is provided with a prefix, expand the index only if the prefix is equivalent to a sparse directory or contained within one. If the index is not expanded in these cases, 'ce_in_traverse_path' will indicate that the relevant sparse directory is not in the prefix/traverse path, skipping past it and not unpacking the appropriate tree(s). If the prefix is in-cone, its sparse subdirectories (if any) will be traversed correctly without index expansion. The behavior of 'git read-tree' with prefixes 1) inside of cone, 2) equal to a sparse directory, and 3) inside a sparse directory are all tested as part of the 't/t1092-sparse-checkout-compatibility.sh' test 'read-tree --prefix', ensuring that the sparse index case works the way it did prior to this change as well as matching non-sparse index sparse-checkout. Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-01 12:36:01 -08:00
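A sketch of the expansion decision, assuming that sparse directory names and prefixes both end in '/'. `toy_prefix_needs_expansion()` is a hypothetical helper written for illustration, not a function in Git.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical: should 'git read-tree --prefix=<prefix>' expand the sparse
 * index?  Only if the prefix equals a sparse directory or lies inside one;
 * an in-cone prefix (even one containing sparse subdirectories) does not
 * need expansion.
 */
static bool toy_prefix_needs_expansion(const char *prefix,
				       const char *const *sparse_dirs,
				       size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		size_t len = strlen(sparse_dirs[i]);

		/* prefix starts with "<sparse dir>/" (or is exactly it) */
		if (!strncmp(prefix, sparse_dirs[i], len))
			return true;
	}
	return false;
}
```

The trailing '/' on both sides keeps a prefix such as "folder10/" from matching the sparse directory "folder1/".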
Elijah Newren	26b5d6b0e5	unpack-trees: fix accidental loss of user changes For sparse-checkouts, we don't want unpack-trees to error out on files that are missing from the worktree, so there has traditionally been logic to make it skip the verify_uptodate() check for these. Unfortunately, it was skipping the verify_uptodate() check for files that were expected to become SKIP_WORKTREE. For files that were not already SKIP_WORKTREE, that can cause us to later delete the file in apply_sparse_checkout(). Only skip the check for files that were already SKIP_WORKTREE as well to avoid lightly discarding important changes users may have made to files. Note 1: unpack-trees.c is already a bit complex, and the logic around CE_SKIP_WORKTREE and CE_NEW_SKIP_WORKTREE in that file are no exception. I also tried just replacing CE_NEW_SKIP_WORKTREE with CE_SKIP_WORKTREE in the verify_uptodate() check instead of checking for both flags, and found that it also fixed this bug and passed all the tests. I also attempted to devise a few testcases that might trip either variant of my fix and was unable to find any problems. It may be that just checking CE_SKIP_WORKTREE is a better fix, but I'm not sure. I thought it was a bit safer to strictly reduce the number of cases where we skip the up-to-date check rather than just toggling which kind of cases skip it, and thus went with the current variant of the fix. Note 2: I also wondered if verify_absent() might have a similar bug, but despite my attempts to try to devise a testcase that would trigger such a thing, I couldn't find any problematic testcases. Thus, this patch makes no attempt to apply similar changes to verify_absent() and verify_absent_if_directory(). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-14 14:42:20 -08:00
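The two variants weighed in this commit can be written as C predicates. The flag names and bit values below are hypothetical; Git's real CE_SKIP_WORKTREE and CE_NEW_SKIP_WORKTREE are defined in its own headers with their own values.

```c
#include <stdbool.h>

/* Hypothetical flag values for illustration only. */
#define TOY_SKIP_WORKTREE     (1u << 0)  /* entry is skip-worktree now */
#define TOY_NEW_SKIP_WORKTREE (1u << 1)  /* entry will become skip-worktree */

/* Before the fix: skip the up-to-date check for entries *becoming* sparse. */
static bool toy_skip_uptodate_check_old(unsigned flags)
{
	return flags & TOY_NEW_SKIP_WORKTREE;
}

/*
 * After the fix: skip it only if the entry is *already* skip-worktree too,
 * so local edits to a file that is about to leave the sparse cone are
 * still checked instead of being deleted later.
 */
static bool toy_skip_uptodate_check_new(unsigned flags)
{
	unsigned both = TOY_SKIP_WORKTREE | TOY_NEW_SKIP_WORKTREE;

	return (flags & both) == both;
}
```

Requiring both bits strictly narrows the set of skipped cases, which matches the commit's stated preference for reducing, rather than swapping, the cases that bypass the check.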
Junio C Hamano	da81d473fc	Merge branch 'en/keep-cwd' Many git commands that deal with working tree files try to remove a directory that becomes empty (i.e. "git switch" from a branch that has the directory to another branch that does not would attempt to remove all files in the directory and the directory itself). This drops users into an unfamiliar situation if the command was run in a subdirectory that becomes subject to removal due to the command. The commands have been taught to keep an empty directory if it is the directory they were started in to avoid surprising users. * en/keep-cwd: t2501: simplify the tests since we can now assume desired behavior dir: new flag to remove_dir_recurse() to spare the original_cwd dir: avoid incidentally removing the original_cwd in remove_path() stash: do not attempt to remove startup_info->original_cwd rebase: do not attempt to remove startup_info->original_cwd clean: do not attempt to remove startup_info->original_cwd symlinks: do not include startup_info->original_cwd in dir removal unpack-trees: add special cwd handling unpack-trees: refuse to remove startup_info->original_cwd setup: introduce startup_info->original_cwd t2501: add various tests for removing the current working directory	2022-01-05 14:01:28 -08:00
Junio C Hamano	e1d9288267	Merge branch 'ds/sparse-deep-pattern-checkout-fix' The sparse-index/sparse-checkout feature had a bug in its use of the matching code to determine which path is in or outside the sparse checkout patterns. * ds/sparse-deep-pattern-checkout-fix: unpack-trees: use traverse_path instead of name t1092: add deeper changes during a checkout	2021-12-15 09:39:50 -08:00
Junio C Hamano	f0850875fd	Merge branch 'vd/sparse-reset' Various operating modes of "git reset" have been made to work better with the sparse index. * vd/sparse-reset: unpack-trees: improve performance of next_cache_entry reset: make --mixed sparse-aware reset: make sparse-aware (except --mixed) reset: integrate with sparse index reset: expand test coverage for sparse checkouts sparse-index: update command for expand/collapse test reset: preserve skip-worktree bit in mixed reset reset: rename is_missing to !is_in_reset_tree	2021-12-10 14:35:12 -08:00
Elijah Newren	0b0ee3388c	unpack-trees: add special cwd handling When running commands such as `git reset --hard` from a subdirectory, if that subdirectory is in the way of adding needed files, bail with an error message. Note that this change looks kind of like it duplicates the new lines of code from the previous commit in verify_clean_subdirectory(). However, when we are preserving untracked files, we would rather any error messages about untracked files being in the way take precedence over error messages about a subdirectory that happens to be the_original_cwd being in the way. But in the UNPACK_RESET_OVERWRITE_UNTRACKED case, there is no untracked checking to be done, so we simply add a special case near the top of verify_absent_1. Acked-by: Derrick Stolee <stolee@gmail.com> Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-12-09 13:33:12 -08:00
Elijah Newren	b817e54533	unpack-trees: refuse to remove startup_info->original_cwd In the past, when a directory needs to be removed to make room for a file, we have always errored out when that directory contains any untracked (but not ignored) files. Add an extra condition on that: also error out if the directory is the current working directory we inherited from our parent process. Acked-by: Derrick Stolee <stolee@gmail.com> Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-12-09 13:33:12 -08:00
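A hedged sketch of the added condition. `toy_may_remove_dir()` is hypothetical, and the sketch assumes `original_cwd` is a path directly comparable to `dir`, or NULL when there is nothing to protect.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical predicate: may a directory be removed to make room for a
 * file?  Removal is refused if it holds untracked (not ignored) files, or,
 * after this change, if it is the working directory inherited from the
 * parent process.
 */
static bool toy_may_remove_dir(const char *dir, bool has_untracked,
			       const char *original_cwd)
{
	if (has_untracked)
		return false;
	if (original_cwd && !strcmp(dir, original_cwd))
		return false;
	return true;
}
```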
Derrick Stolee	8c5de0d265	unpack-trees: use traverse_path instead of name The sparse_dir_matches_path() method compares a cache entry that is a sparse directory entry against a 'struct traverse_info info' and a 'struct name_entry p' to see if the cache entry has exactly the right name for those other inputs. This method was introduced in `523506d` (unpack-trees: unpack sparse directory entries, 2021-07-14), but included a significant mistake. The path comparisons used 'info->name' instead of 'info->traverse_path'. Since 'info->name' only stores a single tree entry name while 'info->traverse_path' stores the full path from root, this method does not work when 'info' is in a subdirectory of a directory. Replacing the right strings and their corresponding lengths make the method work properly. The previous change included a failing test that exposes this issue. That test now passes. The critical detail is that as we go deep into unpack_trees(), the logic for merging a sparse directory entry with a tree entry during 'git checkout' relies on this sparse_dir_matches_path() in order to avoid calling traverse_trees_recursive() during unpack_callback() in this hunk: if (!is_sparse_directory_entry(src[0], names, info) && traverse_trees_recursive(n, dirmask, mask & ~dirmask, names, info) < 0) { return -1; } For deep paths, the short-circuit never occurred and traverse_trees_recursive() was being called incorrectly and that was causing other strange issues. Specifically, the error message from the now-passing test previously included this: error: Your local changes to the following files would be overwritten by checkout: deep/deeper1/deepest2/a deep/deeper1/deepest3/a Please commit your changes or stash them before you switch branches. Aborting These messages occurred because the 'current' cache entry in twoway_merge() was showing as NULL because the index did not contain entries for the paths contained within the sparse directory entries. 
We instead had 'oldtree' given as the entry at HEAD and 'newtree' as the entry in the target tree. This led to reject_merge() listing these paths. Now that sparse_dir_matches_path() works the same for deep paths as it does for shallow depths, the rest of the logic kicks in to properly handle modifying the sparse directory entries as designed. Reported-by: Gustave Granroth <gus.gran@gmail.com> Reported-by: Mike Marcelais <michmarc@exchange.microsoft.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-12-06 09:24:54 -08:00
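The difference between `info->name` and `info->traverse_path` can be reproduced with a simplified matcher. `toy_sparse_dir_matches()` is a hypothetical stand-in for `sparse_dir_matches_path()` that takes the parent path as a plain string without a trailing slash.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical: does the sparse directory entry 'ce_name' (for example
 * "deep/deeper1/deepest2/") name exactly the tree entry 'name' found under
 * 'parent'?  'parent' must be the full path from the root, which is what
 * traverse_path holds, not just the last path component.
 */
static bool toy_sparse_dir_matches(const char *ce_name, const char *parent,
				   const char *name)
{
	size_t plen = strlen(parent), nlen = strlen(name);
	size_t ce_len = strlen(ce_name);

	if (!plen) /* top level: "<name>/" */
		return ce_len == nlen + 1 &&
		       !strncmp(ce_name, name, nlen) && ce_name[nlen] == '/';

	/* nested: "<parent>/<name>/" */
	return ce_len == plen + nlen + 2 &&
	       !strncmp(ce_name, parent, plen) && ce_name[plen] == '/' &&
	       !strncmp(ce_name + plen + 1, name, nlen) &&
	       ce_name[ce_len - 1] == '/';
}
```

Passing only the last component ("deeper1") as the parent, which mirrors the old use of `info->name`, makes the match fail whenever the parent is itself nested inside another directory.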
Victoria Dye	f2a454e0a5	unpack-trees: improve performance of next_cache_entry To find the first non-unpacked cache entry, `next_cache_entry` iterates through index, starting at `cache_bottom`. The performance of this in full indexes is helped by `cache_bottom` advancing with each invocation of `mark_ce_used` (called by `unpack_index_entry`). However, the presence of sparse directories can prevent the `cache_bottom` from advancing in a sparse index case, effectively forcing `next_cache_entry` to search from the beginning of the index each time it is called. The `cache_bottom` must be preserved for the sparse index (see `17a1bb570b` (unpack-trees: preserve cache_bottom, 2021-07-14)). Therefore, to retain the benefit `cache_bottom` provides in non-sparse index cases, a separate `hint` position indicates the first position `next_cache_entry` should search, updated each execution with a new position. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-11-29 12:51:26 -08:00
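A sketch of the `hint` idea over a simplified index. `struct toy_index` and `toy_next_cache_entry()` are hypothetical; the real function inspects `struct cache_entry` flags rather than a `bool` array, and this change was later reverted (see 99430aa12c above).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified index: only the "already unpacked" bit matters. */
struct toy_index {
	const bool *used;   /* used[i]: entry i was already unpacked */
	size_t nr;
};

/*
 * Return the first unused entry at or after max(cache_bottom, *hint), or
 * -1 if there is none.  '*hint' remembers where the previous search stopped,
 * so repeated calls do not rescan from a 'cache_bottom' that is not advancing.
 */
static long toy_next_cache_entry(const struct toy_index *idx,
				 size_t cache_bottom, size_t *hint)
{
	size_t pos = cache_bottom > *hint ? cache_bottom : *hint;

	for (; pos < idx->nr; pos++) {
		if (!idx->used[pos]) {
			*hint = pos;
			return (long)pos;
		}
	}
	*hint = pos;
	return -1;
}
```

Taking the maximum of the two positions means the hint can only speed up the search; it never makes it start before `cache_bottom`.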
Junio C Hamano	bfa646c2cb	Merge branch 'ab/unpack-trees-leakfix' Leakfix. * ab/unpack-trees-leakfix: sequencer: fix a memory leak in do_reset() sequencer: add a "goto cleanup" to do_reset() unpack-trees: don't leak memory in verify_clean_subdirectory()	2021-10-25 16:06:56 -07:00
Junio C Hamano	a7c2daa06d	Merge branch 'en/removing-untracked-fixes' Various fixes in code paths that move untracked files away to make room. * en/removing-untracked-fixes: Documentation: call out commands that nuke untracked files/directories Comment important codepaths regarding nuking untracked files/dirs unpack-trees: avoid nuking untracked dir in way of locally deleted file unpack-trees: avoid nuking untracked dir in way of unmerged file Change unpack_trees' 'reset' flag into an enum Remove ignored files by default when they are in the way unpack-trees: make dir an internal-only struct unpack-trees: introduce preserve_ignored to unpack_trees_options read-tree, merge-recursive: overwrite ignored files by default checkout, read-tree: fix leak of unpack_trees_options.dir t2500: add various tests for nuking untracked files	2021-10-13 15:15:57 -07:00
Ævar Arnfjörð Bjarmason	e5a917fcf4	unpack-trees: don't leak memory in verify_clean_subdirectory() Fix two different but related memory leaks in verify_clean_subdirectory(). We leaked both the "pathbuf" if read_directory() returned non-zero, and we never cleaned up our own "struct dir_struct" either. * "pathbuf": When the read_directory() call followed by the free(pathbuf) was added in `c81935348b` (Fix switching to a branch with D/F when current branch has file D., 2007-03-15) we didn't bother to free() before we called die(). But when this code was later libified in `203a2fe117` (Allow callers of unpack_trees() to handle failure, 2008-02-07) we started to leak as we returned data to the caller. This fixes that memory leak, which can be observed under SANITIZE=leak with e.g. the "t1001-read-tree-m-2way.sh" test. * "struct dir_struct": We've leaked the dir_struct ever since this code was added back in `c81935348b`. When that commit was written there wasn't an equivalent of dir_clear(). Since it was added in `270be81604` (dir.c: provide clear_directory() for reclaiming dir_struct memory, 2013-01-06) we've omitted freeing the memory allocated here. This memory leak could also be observed under SANITIZE=leak and the "t1001-read-tree-m-2way.sh" test. This makes all the tests in "t1001-read-tree-m-2way.sh" pass under "GIT_TEST_PASSING_SANITIZE_LEAK=true"; we'd previously die in tests 25, 26 & 28. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-07 15:29:02 -07:00
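The fix follows the usual single-exit cleanup pattern in C. Everything below is a hypothetical stand-in (`struct toy_dir`, `toy_read_directory()`, `toy_dir_clear()`, `toy_verify_clean_subdirectory()`), not Git's `read_directory()`, `dir_clear()` or `verify_clean_subdirectory()`; the point is only that both allocations are released on every return path.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a dir_struct that owns heap memory. */
struct toy_dir {
	char *found;   /* e.g. the first untracked path discovered */
};

static void toy_dir_clear(struct toy_dir *d)
{
	free(d->found);
	d->found = NULL;
}

/* Hypothetical: pretend "dirty/" contains one untracked file. */
static int toy_read_directory(struct toy_dir *d, const char *path)
{
	static const char hit[] = "dirty/untracked.txt";

	if (strcmp(path, "dirty/") != 0)
		return 0;
	d->found = malloc(sizeof(hit));
	if (d->found)
		memcpy(d->found, hit, sizeof(hit));
	return 1;
}

/* Returns -1 if 'path' contains untracked files, 0 otherwise. */
static int toy_verify_clean_subdirectory(const char *path)
{
	struct toy_dir d = { NULL };
	size_t len = strlen(path);
	char *pathbuf = malloc(len + 2);
	int ret = 0;

	if (!pathbuf)
		return -1;	/* nothing allocated yet, nothing to free */
	memcpy(pathbuf, path, len);
	pathbuf[len] = '/';
	pathbuf[len + 1] = '\0';

	/* Record the result instead of returning early, so cleanup runs. */
	if (toy_read_directory(&d, pathbuf))
		ret = -1;

	free(pathbuf);
	toy_dir_clear(&d);
	return ret;
}
```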
Elijah Newren	56d06fe4aa	unpack-trees: avoid nuking untracked dir in way of locally deleted file Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-27 13:38:37 -07:00
Elijah Newren	1fdd51aa13	unpack-trees: avoid nuking untracked dir in way of unmerged file Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-27 13:38:37 -07:00
Elijah Newren	480d3d6bf9	Change unpack_trees' 'reset' flag into an enum Traditionally, unpack_trees_options->reset was used to signal that it was okay to delete any untracked files in the way. This was used by `git read-tree --reset`, but then started appearing in other places as well. However, many of the other uses should not be deleting untracked files in the way. Change this value to an enum so that a value of 1 (i.e. "true") can be split into two: UNPACK_RESET_PROTECT_UNTRACKED, UNPACK_RESET_OVERWRITE_UNTRACKED In order to catch accidental misuses (i.e. where folks call it the way they traditionally used to), define the special enum value of UNPACK_RESET_INVALID = 1 which will trigger a BUG(). Modify existing callers so that read-tree --reset reset --hard checkout --force continue using the UNPACK_RESET_OVERWRITE_UNTRACKED logic, while other callers, including am checkout without --force stash (though currently dead code; reset always had a value of 0) numerous callers from rebase/sequencer to reset_head() will use the new UNPACK_RESET_PROTECT_UNTRACKED value. Also, note that it has been reported that 'git checkout <treeish> <pathspec>' currently also allows overwriting untracked files[1]. That case should also be fixed, but it does not use unpack_trees() and thus is outside the scope of the current changes. [1] https://lore.kernel.org/git/15dad590-087e-5a48-9238-5d2826950506@gmail.com/ Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-27 13:38:37 -07:00
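A sketch of the enum-with-a-trap pattern this commit uses. The `TOY_RESET_*` names are hypothetical stand-ins for the `UNPACK_RESET_*` values named above, and `assert()` takes the place of `BUG()`.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * The value 1 is deliberately reserved as INVALID, so old code that still
 * sets "reset = 1" is caught instead of silently picking a behavior.
 */
enum toy_reset_type {
	TOY_RESET_NONE = 0,
	TOY_RESET_INVALID = 1,
	TOY_RESET_PROTECT_UNTRACKED,
	TOY_RESET_OVERWRITE_UNTRACKED,
};

/* May an untracked file that is in the way be deleted? */
static bool toy_may_overwrite_untracked(enum toy_reset_type reset)
{
	/* Git triggers BUG() for this value; this sketch uses assert(). */
	assert(reset != TOY_RESET_INVALID);
	return reset == TOY_RESET_OVERWRITE_UNTRACKED;
}
```

Under this scheme, `read-tree --reset`, `reset --hard` and `checkout --force` would map to the overwrite value, and the other callers listed above to the protect value.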
Elijah Newren	c42e0b6409	unpack-trees: make dir an internal-only struct Avoid accidental misuse or confusion over ownership by clearly making unpack_trees_options.dir an internal-only variable. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-27 13:38:37 -07:00
Elijah Newren	04988c8d18	unpack-trees: introduce preserve_ignored to unpack_trees_options Currently, every caller of unpack_trees() that wants to ensure ignored files are overwritten by default needs to: * allocate unpack_trees_options.dir * flip the DIR_SHOW_IGNORED flag in unpack_trees_options.dir->flags * call setup_standard_excludes AND then after the call to unpack_trees() needs to * call dir_clear() * deallocate unpack_trees_options.dir That's a fair amount of boilerplate, and every caller uses identical code. Make this easier by instead introducing a new boolean value where the default value (0) does what we want so that new callers of unpack_trees() automatically get the appropriate behavior. And move all the handling of unpack_trees_options.dir into unpack_trees() itself. While preserve_ignored = 0 is the behavior we feel is the appropriate default, we defer fixing commands to use the appropriate default until a later commit. So, this commit introduces several locations where we manually set preserve_ignored=1. This makes it clear where code paths were previously preserving ignored files when they should not have been; a future commit will flip these to instead use a value of 0 to get the behavior we want. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-27 13:38:37 -07:00
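A simplified model of the new default. `struct toy_unpack_opts` and `toy_may_overwrite()` are hypothetical, and the sketch ignores the `reset` mode entirely, treating untracked files as always protected.

```c
#include <stdbool.h>

/* Hypothetical subset of the options struct. */
struct toy_unpack_opts {
	bool preserve_ignored;  /* false (the default): ignored files may go */
};

enum toy_path_state { TOY_UNTRACKED, TOY_IGNORED };

/*
 * May a file that is in the way be overwritten?  Ignored files are kept
 * only when preserve_ignored is set, so a zero-initialized options struct
 * gets the intended default with no dir_struct setup by the caller.
 */
static bool toy_may_overwrite(const struct toy_unpack_opts *o,
			      enum toy_path_state st)
{
	if (st == TOY_UNTRACKED)
		return false;
	return !o->preserve_ignored;
}
```

Choosing a flag whose zero value is the desired behavior is what lets new callers get it "automatically", as the commit message puts it.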
Junio C Hamano	dc89c34d9e	Merge branch 'ds/sparse-index-ignored-files' In cone mode, the sparse-index code path learned to remove ignored files (like build artifacts) outside the sparse cone, allowing the entire directory outside the sparse cone to be removed, which is especially useful when the sparse patterns change. * ds/sparse-index-ignored-files: sparse-checkout: clear tracked sparse dirs sparse-index: add SPARSE_INDEX_MEMORY_ONLY flag attr: be careful about sparse directories sparse-checkout: create helper methods sparse-index: use WRITE_TREE_MISSING_OK sparse-index: silently return when cache tree fails unpack-trees: fix nested sparse-dir search sparse-index: silently return when not using cone-mode patterns t7519: rewrite sparse index test	2021-09-20 15:20:44 -07:00
Junio C Hamano	fd0d7036e0	Merge branch 'ab/retire-advice-config' Code clean up to migrate callers from older advice_config[] based API to newer advice_if_enabled() and advice_enabled() API. * ab/retire-advice-config: advice: move advice.graftFileDeprecated squashing to commit.[ch] advice: remove use of global advice_add_embedded_repo advice: remove read uses of most global `advice_` variables advice: add enum variants for missing advice variables	2021-09-10 11:46:29 -07:00
Derrick Stolee	72d84ea347	unpack-trees: fix nested sparse-dir search The iterated search in find_cache_entry() was recently modified to include a loop that searches backwards for a sparse directory entry that matches the given traverse_info and name_entry. However, the string comparison failed to actually concatenate those two strings, so this failed to find a sparse directory when it was not a top-level directory. This caused some errors in rare cases where a 'git checkout' spanned a diff that modified files within the sparse directory entry, but we could not correctly find the entry. Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Helped-by: René Scharfe <l.s.r@web.de> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-07 22:41:09 -07:00
Matheus Tavares	7a132c628e	checkout: make delayed checkout respect --quiet and --no-progress The 'Filtering contents...' progress report from delayed checkout is displayed even when checkout and clone are invoked with --quiet or --no-progress. Furthermore, it is displayed unconditionally, without first checking whether stdout is a tty. Let's fix these issues and also add some regression tests for the two code paths that currently use delayed checkout: unpack_trees.c:check_updates() and builtin/checkout.c:checkout_worktree(). Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-08-26 23:15:33 -07:00
Ben Boeckel	ed9bff0817	advice: remove read uses of most global `advice_` variables In `c4a09cc9cc` (Merge branch 'hw/advise-ng', 2020-03-25), a new API for accessing advice variables was introduced and deprecated `advice_config` in favor of a new array, `advice_setting`. This patch ports all but two uses which read the status of the global `advice_` variables over to the new `advice_enabled` API. We'll deal with advice_add_embedded_repo and advice_graft_file_deprecated separately. Signed-off-by: Ben Boeckel <mathstuf@gmail.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-08-25 12:07:52 -07:00
Junio C Hamano	506d2a354a	Merge branch 'ds/commit-and-checkout-with-sparse-index' "git checkout" and "git commit" learn to work without unnecessarily expanding sparse indexes. * ds/commit-and-checkout-with-sparse-index: unpack-trees: resolve sparse-directory/file conflicts t1092: document bad 'git checkout' behavior checkout: stop expanding sparse indexes sparse-index: recompute cache-tree commit: integrate with sparse-index p2000: compress repo names p2000: add 'git checkout -' test and decrease depth	2021-08-04 13:28:53 -07:00
Junio C Hamano	8230107f33	Merge branch 'jt/bulk-prefetch' "git read-tree" had a codepath where blobs are fetched one-by-one from the promisor remote, which has been corrected to fetch in bulk. * jt/bulk-prefetch: cache-tree: prefetch in partial clone read-tree unpack-trees: refactor prefetching code	2021-08-02 14:06:42 -07:00
Jonathan Tan	b2896d2739	unpack-trees: refactor prefetching code Refactor the prefetching code in unpack-trees.c into its own function, because it will be used elsewhere in a subsequent commit. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-23 14:21:57 -07:00
Derrick Stolee	e05cdb17e8	unpack-trees: resolve sparse-directory/file conflicts When running unpack_trees() with a sparse index, we attempt to operate on the index without expanding the sparse directory entries. Thus, we operate by manipulating entire directories and passing them to the unpack function. In the case of the 'git checkout' command, this is the twoway_merge() function. There are several cases in twoway_merge() that handle different situations. One new one to add is the case of a directory/file conflict where the directory is sparse. Before the sparse index, such a conflict would appear as a list of file additions and deletions. Now, twoway_merge() initializes 'current', 'oldtree', and 'newtree' from src[0], src[1], and src[2], then sets 'oldtree' to NULL because it is equal to the df_conflict_entry. The way to determine that we have a directory/file conflict is to test that 'current' and 'newtree' disagree on being sparse directory entries. When we are in this case, we want to resolve the situation by calling merged_entry(). This allows replacing the 'current' entry with the 'newtree' entry. This is important for cases where we want to run 'git checkout' across the conflict and have the new HEAD represent the new file type at that path. The first NEEDSWORK comment dropped in t1092 demonstrates this necessary behavior. However, we still are in a confusing state when 'current' corresponds to a staged change within a sparse directory that is not present at HEAD. This should be atypical, because it requires adding a change outside of the sparse-checkout cone, but it is possible. Since we are unable to determine that this is a staged change within twoway_merge(), we cannot add a case to reject the merge at this point. I believe this is due to the use of df_conflict_entry in the place of 'oldtree' instead of using the value at HEAD, which would provide some perspective to this decision. 
Any change that would allow this differentiation for staged entries would need to involve information further up in unpack_trees(). That work should be done, sometime, because we are further confusing the behavior of a directory/file conflict when staging a change in the directory. The two cases 'checkout behaves oddly with df-conflict-?' in t1092 demonstrate that even without a sparse-checkout, Git is not consistent in its behavior. Neither of the two options seems correct, either. This change makes the sparse-index behave differently than the typical sparse-checkout case, but it does match the full checkout behavior in the df-conflict-2 case. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-20 14:59:11 -07:00
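The directory/file check in this commit comes down to one question: do 'current' and 'newtree' disagree on being sparse directory entries? The following minimal C sketch shows only that predicate. It uses a hypothetical, simplified `struct sketch_entry` instead of Git's real `struct cache_entry` and its sparse-directory mode test. It does not model merged_entry(), and it does not cover the staged-change case that the message leaves open.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an index entry (not Git's struct). */
struct sketch_entry {
	const char *name;  /* sparse directories keep a trailing '/' */
	int is_sparse_dir; /* nonzero for a sparse-directory entry */
};

/*
 * Returns nonzero when both entries exist and exactly one of them is a
 * sparse directory, i.e. the directory/file situation the commit resolves
 * by taking 'newtree'.
 */
static int is_sparse_df_conflict(const struct sketch_entry *current,
				 const struct sketch_entry *newtree)
{
	if (!current || !newtree)
		return 0;
	/* Normalize both flags to 0/1 before comparing them. */
	return !current->is_sparse_dir != !newtree->is_sparse_dir;
}
```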
Derrick Stolee	523506df51	unpack-trees: unpack sparse directory entries During unpack_callback(), index entries are compared against tree entries. These are matched according to names and types. One goal is to decide if we should recurse into subtrees or simply operate on one index entry. In the case of a sparse-directory entry, we do not want to recurse into that subtree and instead simply compare the trees. In some cases, we might want to perform a merge operation on the entry, such as during 'git checkout <commit>' which wants to replace a sparse tree entry with the tree for that path at the target commit. We extend the logic within unpack_single_entry() to create a sparse-directory entry in this case, and then that is sent to call_unpack_fn(). There are some subtleties in this process. For instance, we need to update find_cache_entry() to allow finding a sparse-directory entry that exactly matches a given path. Use the new helper method sparse_dir_matches_path() for this. We also need to ignore conflict markers in the case that the entries correspond to directories and we already have a sparse directory entry. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 13:42:49 -07:00
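The matching rule attributed to sparse_dir_matches_path() can be sketched as follows. A sparse-directory entry such as `a/b/` matches the tree path `b` under the prefix `a/` only when its name is exactly the prefix, then the path, then '/'. This is an illustration under assumptions: the helper name, its string-based signature, and the use of plain C strings in place of Git's `traverse_info` and `name_entry` structures are all hypothetical.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: does the sparse-directory entry 'ce_name'
 * (which always ends in '/') name exactly the tree at base + path?
 * 'base' is the already-traversed prefix, e.g. "a/", or "" at the root.
 */
static int sketch_sparse_dir_matches_path(const char *ce_name,
					  const char *base,
					  const char *path)
{
	size_t ce_len = strlen(ce_name);
	size_t base_len = strlen(base);
	size_t path_len = strlen(path);

	/* Invariant: sparse-directory names carry a trailing separator. */
	assert(ce_len > 0 && ce_name[ce_len - 1] == '/');

	/* Exact length, matching prefix, matching final path component. */
	return ce_len == base_len + path_len + 1 &&
	       !strncmp(ce_name, base, base_len) &&
	       !strncmp(ce_name + base_len, path, path_len);
}
```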
Derrick Stolee	bd6a3fd7f1	unpack-trees: rename unpack_nondirectories() In the next change, we will use this method to unpack a sparse directory entry, so change the name to unpack_single_entry() so these entries apply. The new name reflects that we will not recurse into trees in order to resolve the conflicts. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 13:42:49 -07:00
Derrick Stolee	cd807a5cda	unpack-trees: compare sparse directories correctly As we further integrate the sparse-index into unpack-trees, we need to ensure that we compare sparse directory entries correctly with other entries. This affects searching for an exact path as well as sorting index entries. Sparse directory entries contain the trailing directory separator. This is important for the sorting, in particular. Thus, within do_compare_entry() we stop using S_IFREG in all cases, since sparse directories should use S_IFDIR to indicate that the comparison should treat the entry name as a directory. Within compare_entry(), it first calls do_compare_entry() to check the leading portion of the name. When the input path is a directory name, we could match exactly already. Thus, we should return 0 if we have an exact string match on a sparse directory entry. The final check is a length comparison between the strings. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 13:42:48 -07:00
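The trailing separator matters for sorting, as the following sketch shows. It is a tree-order name comparison in which a directory name compares as if it ended in '/'. The function `sketch_name_compare()` is hypothetical. It only illustrates the S_IFREG versus S_IFDIR distinction described above and is not Git's do_compare_entry().

```c
#include <assert.h>
#include <string.h>

/*
 * Compare the common prefix byte-wise. If one name is a prefix of the
 * other, a directory name is treated as if it ended in '/', and a file
 * name as if it ended in '\0'. Returns -1, 0 or 1.
 */
static int sketch_name_compare(const char *name1, size_t len1, int is_dir1,
			       const char *name2, size_t len2, int is_dir2)
{
	size_t len = len1 < len2 ? len1 : len2;
	int cmp = memcmp(name1, name2, len);
	unsigned char c1, c2;

	if (cmp)
		return cmp < 0 ? -1 : 1;
	c1 = len < len1 ? (unsigned char)name1[len] : (is_dir1 ? '/' : '\0');
	c2 = len < len2 ? (unsigned char)name2[len] : (is_dir2 ? '/' : '\0');
	return c1 < c2 ? -1 : c1 > c2 ? 1 : 0;
}
```

Under this rule a directory `foo` sorts after the file `foo.c`, because '/' (0x2F) is greater than '.' (0x2E). A file named `foo` sorts before `foo.c`.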
Derrick Stolee	17a1bb570b	unpack-trees: preserve cache_bottom The cache_bottom member of 'struct unpack_trees_options' is used to track the range of index entries corresponding to a node of the cache tree. While recursing with traverse_by_cache_tree(), this value is preserved on the call stack using a local and then restored as that method returns. The mark_ce_used() method normally modifies the cache_bottom member when it refers to the marked cache entry. However, sparse directory entries are stored as nodes in the cache-tree data structure as of `2de37c53` (cache-tree: integrate with sparse directory entries, 2021-03-30). Thus, the cache_bottom will be modified as the cache-tree walk advances. Do not update it as well within mark_ce_used(). Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 13:42:48 -07:00
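The cache_bottom bookkeeping in mark_ce_used() can be modeled with a few integers. This is a simplified sketch, not Git's code. `struct sketch_ce` and `struct sketch_opts` are hypothetical stand-ins, with `used` playing the role of the CE_UNPACKED flag. The early return for sparse directories reflects the rule this commit introduces.

```c
#include <assert.h>

/* Hypothetical, simplified model of index entries and unpack options. */
struct sketch_ce {
	int used;          /* stands in for the CE_UNPACKED flag */
	int is_sparse_dir; /* nonzero for a sparse-directory entry */
};

struct sketch_opts {
	struct sketch_ce *cache;
	int cache_nr;
	int cache_bottom; /* first entry not yet unpacked */
};

/*
 * Mark entry 'pos' as used. If it sits at cache_bottom, slide the bottom
 * past every consecutive used entry. Sparse directories return early,
 * on the assumption that the cache-tree walk advances cache_bottom for them.
 */
static void sketch_mark_ce_used(struct sketch_opts *o, int pos)
{
	assert(pos >= 0 && pos < o->cache_nr);
	o->cache[pos].used = 1;
	if (o->cache[pos].is_sparse_dir)
		return;
	if (pos == o->cache_bottom) {
		int bottom = o->cache_bottom;

		while (bottom < o->cache_nr && o->cache[bottom].used)
			bottom++;
		o->cache_bottom = bottom;
	}
}
```

Marking a run of regular entries slides the bottom forward. Once a sparse directory is marked, though, the bottom stays on it unless something else, such as the cache-tree walk, advances it. Marking later entries then no longer moves the bottom.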
Junio C Hamano	a737e1f1d2	Merge branch 'mt/parallel-checkout-part-3' The final part of "parallel checkout". * mt/parallel-checkout-part-3: ci: run test round with parallel-checkout enabled parallel-checkout: add tests related to .gitattributes t0028: extract encoding helpers to lib-encoding.sh parallel-checkout: add tests related to path collisions parallel-checkout: add tests for basic operations checkout-index: add parallel checkout support builtin/checkout.c: complete parallel checkout support make_transient_cache_entry(): optionally alloc from mem_pool	2021-05-16 21:05:23 +09:00
Matheus Tavares	9616882780	make_transient_cache_entry(): optionally alloc from mem_pool Allow make_transient_cache_entry() to optionally receive a mem_pool struct in which it should allocate the entry. This will be used in the following patch, to store some transient entries which should persist until parallel checkout finishes. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-05-05 12:25:25 +09:00
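Below is a minimal sketch of the optional-pool idea. It assumes a hypothetical fixed-size bump allocator (`struct sketch_pool`) in place of Git's real `mem_pool` API, whose functions are not reproduced here. Entries taken from the pool are released together when the pool's buffer is freed. Entries allocated without a pool must be freed individually.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bump allocator standing in for Git's mem_pool. */
struct sketch_pool {
	char *buf;   /* must be suitably aligned, e.g. from malloc() */
	size_t used;
	size_t cap;
};

static void *sketch_pool_alloc(struct sketch_pool *pool, size_t n)
{
	size_t align = _Alignof(max_align_t);
	size_t start = (pool->used + align - 1) / align * align;

	if (start > pool->cap || n > pool->cap - start)
		return NULL; /* a real pool would grow here */
	pool->used = start + n;
	return pool->buf + start;
}

struct sketch_entry {
	size_t namelen;
	char name[]; /* flexible array member, NUL-terminated */
};

/* Allocate from 'pool' when one is given, otherwise with calloc(). */
static struct sketch_entry *sketch_make_transient_entry(const char *name,
							 struct sketch_pool *pool)
{
	size_t len = strlen(name);
	size_t size = sizeof(struct sketch_entry) + len + 1;
	struct sketch_entry *ce;

	if (pool) {
		ce = sketch_pool_alloc(pool, size);
		if (ce)
			memset(ce, 0, size);
	} else {
		ce = calloc(1, size);
	}
	if (!ce)
		return NULL;
	ce->namelen = len;
	memcpy(ce->name, name, len + 1);
	return ce;
}
```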
Junio C Hamano	a1cac26cc6	Merge branch 'mt/parallel-checkout-part-2' The checkout machinery has been taught to perform the actual write-out of the files in parallel when able. * mt/parallel-checkout-part-2: parallel-checkout: add design documentation parallel-checkout: support progress displaying parallel-checkout: add configuration options parallel-checkout: make it truly parallel unpack-trees: add basic support for parallel checkout	2021-04-30 13:50:26 +09:00
Junio C Hamano	8e97852919	Merge branch 'ds/sparse-index-protections' Builds on top of the sparse-index infrastructure to mark operations that are not ready to work with the sparse index, causing them to fall back on the fully-populated index that they have always worked with. * ds/sparse-index-protections: (47 commits) name-hash: use expand_to_path() sparse-index: expand_to_path() name-hash: don't add directories to name_hash revision: ensure full index resolve-undo: ensure full index read-cache: ensure full index pathspec: ensure full index merge-recursive: ensure full index entry: ensure full index dir: ensure full index update-index: ensure full index stash: ensure full index rm: ensure full index merge-index: ensure full index ls-files: ensure full index grep: ensure full index fsck: ensure full index difftool: ensure full index commit: ensure full index checkout: ensure full index ...	2021-04-30 13:50:26 +09:00
Matheus Tavares	1c4d6f46be	parallel-checkout: support progress displaying Original-patch-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-19 11:57:05 -07:00
Matheus Tavares	7531e4b66e	parallel-checkout: add configuration options Make parallel checkout configurable by introducing two new settings: checkout.workers and checkout.thresholdForParallelism. The first defines the number of workers (where one means sequential checkout), and the second defines the minimum number of entries to attempt parallel checkout. To decide the default value for checkout.workers, the parallel version was benchmarked during three operations in the linux repo, with cold cache: cloning v5.8, checking out v5.8 from v2.6.15 (checkout I) and checking out v5.8 from v5.7 (checkout II). The four tables below show the mean run times and standard deviations for 5 runs in: a local file system on SSD, a local file system on HDD, a Linux NFS server, and Amazon EFS (all on Linux). Each parallel checkout test was executed with the number of workers that brings the best overall results in that environment.

Local SSD:

             Sequential              10 workers             Speedup
Clone        8.805 s ± 0.043 s       3.564 s ± 0.041 s      2.47 ± 0.03
Checkout I   9.678 s ± 0.057 s       4.486 s ± 0.050 s      2.16 ± 0.03
Checkout II  5.034 s ± 0.072 s       3.021 s ± 0.038 s      1.67 ± 0.03

Local HDD:

             Sequential              10 workers             Speedup
Clone        32.288 s ± 0.580 s      30.724 s ± 0.522 s     1.05 ± 0.03
Checkout I   54.172 s ± 7.119 s      54.429 s ± 6.738 s     1.00 ± 0.18
Checkout II  40.465 s ± 2.402 s      38.682 s ± 1.365 s     1.05 ± 0.07

Linux NFS server (v4.1, on EBS, single availability zone):

             Sequential              32 workers             Speedup
Clone        240.368 s ± 6.347 s     57.349 s ± 0.870 s     4.19 ± 0.13
Checkout I   242.862 s ± 2.215 s     58.700 s ± 0.904 s     4.14 ± 0.07
Checkout II  65.751 s ± 1.577 s      23.820 s ± 0.407 s     2.76 ± 0.08

EFS (v4.1, replicated over multiple availability zones):

             Sequential              32 workers             Speedup
Clone        922.321 s ± 2.274 s     210.453 s ± 3.412 s    4.38 ± 0.07
Checkout I   1011.300 s ± 7.346 s    297.828 s ± 0.964 s    3.40 ± 0.03
Checkout II  294.104 s ± 1.836 s     126.017 s ± 1.190 s    2.33 ± 0.03

The above benchmarks show that parallel checkout is most effective on repositories located on an SSD or over a distributed file system. For local file systems on spinning disks, and/or older machines, the parallelism does not always improve performance. For this reason, the default value for checkout.workers is one, a.k.a. sequential checkout. To decide the default value for checkout.thresholdForParallelism, another benchmark was executed in the "Local SSD" setup, where parallel checkout proved beneficial. This time, we compared the runtime of a `git checkout -f`, with and without parallelism, after randomly removing an increasing number of files from the Linux working tree. The "sequential fallback" column below corresponds to the executions where checkout.workers was 10 but checkout.thresholdForParallelism was equal to the number of to-be-updated files plus one (so that we end up writing sequentially). Each test case was sampled 15 times, and each sample had a randomly different set of files removed. Here are the results:

             sequential fallback     10 workers             speedup
10 files     772.3 ms ± 12.6 ms      769.0 ms ± 13.6 ms     1.00 ± 0.02
20 files     780.5 ms ± 15.8 ms      775.2 ms ± 9.2 ms      1.01 ± 0.02
50 files     806.2 ms ± 13.8 ms      767.4 ms ± 8.5 ms      1.05 ± 0.02
100 files    833.7 ms ± 21.4 ms      750.5 ms ± 16.8 ms     1.11 ± 0.04
200 files    897.6 ms ± 30.9 ms      730.5 ms ± 14.7 ms     1.23 ± 0.05
500 files    1035.4 ms ± 48.0 ms     677.1 ms ± 22.3 ms     1.53 ± 0.09
1000 files   1244.6 ms ± 35.6 ms     654.0 ms ± 38.3 ms     1.90 ± 0.12
2000 files   1488.8 ms ± 53.4 ms     658.8 ms ± 23.8 ms     2.26 ± 0.12

From the above numbers, 100 files seems to be a reasonable default value for the threshold setting. Note: Up to 1000 files, we observe a drop in the execution time of the parallel code with an increase in the number of files. This is rather odd behavior, but it was observed in multiple repetitions. Above 1000 files, the execution time increases according to the number of files, as one would expect. About the test environments: Local SSD tests were executed on an i7-7700HQ (4 cores with hyper-threading) running Manjaro Linux. Local HDD tests were executed on an Intel(R) Xeon(R) E3-1230 (also 4 cores with hyper-threading), HDD Seagate Barracuda 7200.14 SATA 3.1, running Debian. NFS and EFS tests were executed on an Amazon EC2 c5n.xlarge instance, with 4 vCPUs. The Linux NFS server was running on an m6g.large instance with 2 vCPUs and a 1 TB EBS GP2 volume. Before each timing, the linux repository was removed (or checked out back to its previous state), and `sync && sysctl vm.drop_caches=3` was executed. Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-19 11:57:05 -07:00
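The interaction between the two settings can be sketched as a small decision function. It is hypothetical and not Git's implementation. It assumes `workers` has already been resolved to a positive count. Capping the worker count at the number of entries is this sketch's own choice. With the threshold set to the number of files plus one, the function chooses sequential checkout, which matches how the "sequential fallback" column above was produced.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper modeled on checkout.workers ('workers') and
 * checkout.thresholdForParallelism ('threshold'). Returns how many
 * workers to use for 'nr_entries' queued entries; 1 means sequential.
 */
static int sketch_checkout_workers(int workers, size_t threshold,
				   size_t nr_entries)
{
	assert(workers >= 1); /* assumed already resolved to a real count */

	if (workers == 1 || nr_entries < threshold)
		return 1; /* sequential checkout */
	/* Sketch's own choice: never start more workers than entries. */
	if ((size_t)workers > nr_entries)
		return (int)nr_entries;
	return workers;
}
```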
Matheus Tavares	04155bdad8	unpack-trees: add basic support for parallel checkout This new interface allows us to enqueue some of the entries being checked out to later uncompress them, apply in-process filters, and write out the files in parallel. For now, the parallel checkout machinery is enabled by default and there is no user configuration, but run_parallel_checkout() just writes the queued entries in sequence (without spawning additional workers). The next patch will actually implement the parallelism and, later, we will make it configurable. Note that, to avoid potential data races, not all entries are eligible for parallel checkout. Also, paths that collide on disk (e.g. case-sensitive paths in case-insensitive file systems) are detected by the parallel checkout code and skipped, so that they can be safely handled sequentially later. The collision detection works as follows: - If the collision was at basename (e.g. 'a/b' and 'a/B'), the framework detects it by looking for EEXIST and EISDIR errors after an open(O_CREAT | O_EXCL) failure. - If the collision was at dirname (e.g. 'a/b' and 'A'), it is detected at the has_dirs_only_path() check, which is done for the leading path of each item in the parallel checkout queue. Both verifications rely on the fact that, before enqueueing an entry for parallel checkout, checkout_entry() makes sure that there is no file at the entry's path and that its leading components are all real directories. So, any later change in these conditions indicates that there was a collision (either between two parallel-eligible entries or between an eligible and an ineligible one). After all parallel-eligible entries have been processed, the collided (and thus, skipped) entries are sequentially fed to checkout_entry() again. This is similar to the way the current code deals with collisions, overwriting the previously checked out entries with the subsequent ones. 
The only difference is that, since we no longer create the files in the same order that they appear on index, we are not able to determine which of the colliding entries will survive on disk (for the classic code, it is always the last entry). Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-19 11:57:05 -07:00
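The basename collision check can be sketched with POSIX open(). checkout_entry() has already verified that nothing exists at the path. So if an exclusive create then fails with EEXIST or EISDIR, another entry reached that path first. The enum and function names below are hypothetical. The sketch leaves out the has_dirs_only_path() check used for dirname collisions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

enum sketch_create_result {
	SKETCH_CREATED,   /* file created; caller may write to it */
	SKETCH_COLLISION, /* something already exists at this path */
	SKETCH_ERROR      /* any other failure */
};

/*
 * Try to create 'path' exclusively. On success the open descriptor is
 * stored in *fd_out. EEXIST or EISDIR is reported as a collision, so the
 * entry can be skipped now and retried sequentially later.
 */
static enum sketch_create_result sketch_create_exclusive(const char *path,
							  int *fd_out)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);

	if (fd >= 0) {
		*fd_out = fd;
		return SKETCH_CREATED;
	}
	if (errno == EEXIST || errno == EISDIR)
		return SKETCH_COLLISION;
	return SKETCH_ERROR;
}
```

Creating the same path twice yields SKETCH_CREATED and then SKETCH_COLLISION. That is the kind of signal a second, colliding entry would trigger.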
Junio C Hamano	c47679d040	Merge branch 'mt/parallel-checkout-part-1' Preparatory API changes for parallel checkout. * mt/parallel-checkout-part-1: entry: add checkout_entry_ca() taking preloaded conv_attrs entry: move conv_attrs lookup up to checkout_entry() entry: extract update_ce_after_write() from write_entry() entry: make fstat_output() and read_blob_entry() public entry: extract a header file for entry.c functions convert: add classification for conv_attrs struct convert: add get_stream_filter_ca() variant convert: add [async_]convert_to_working_tree_ca() variants convert: make convert_attrs() and convert structs public	2021-04-02 14:43:14 -07:00
Junio C Hamano	9210c68d2a	Merge branch 'mt/checkout-remove-nofollow' When "git checkout" removes a path that does not exist in the commit it is checking out, it wasn't careful enough not to follow symbolic links, which has been corrected. * mt/checkout-remove-nofollow: checkout: don't follow symlinks when removing entries symlinks: update comment on threaded_check_leading_path()	2021-03-30 14:35:36 -07:00
Derrick Stolee	13e1331247	unpack-trees: allow sparse directories The index_pos_by_traverse_info() currently throws a BUG() when a directory entry exists exactly in the index. We need to consider that it is possible to have a directory in a sparse index as long as that entry is itself marked with the skip-worktree bit. The 'pos' variable is assigned a negative value if an exact match is not found. Since a directory name can be an exact match, it is no longer an error to have a nonnegative 'pos' value. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-30 12:57:47 -07:00
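The negative-position convention mentioned above is the usual binary-search encoding of an insertion point as -pos - 1. The following hedged C sketch works over a plain sorted array of names, a hypothetical stand-in for the index ordered with strcmp(). It shows how a sparse directory name such as `folder1/` can now be an exact, nonnegative match.

```c
#include <assert.h>
#include <string.h>

/*
 * Binary search over sorted names. Returns the position of an exact
 * match, or -(insertion point) - 1 when there is none, so any negative
 * result means "not found".
 */
static int sketch_index_name_pos(const char *const *names, int nr,
				 const char *name)
{
	int lo = 0, hi = nr;

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(name, names[mid]);

		if (!cmp)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -lo - 1;
}
```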
Derrick Stolee	6863df3550	unpack-trees: ensure full index The next change will translate full indexes into sparse indexes at write time. The existing logic provides a way for every sparse index to be expanded to a full index at read time. However, there are cases where an index is written and then continues to be used in-memory to perform further updates. unpack_trees() is frequently called after such a write. In particular, commands like 'git reset' do this double-update of the index. Ensure that we have a full index when entering unpack_trees(), but only when command_requires_full_index is true. This is always true at the moment, but we will later relax that after unpack_trees() is updated to handle sparse directory entries. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-30 12:57:46 -07:00
Matheus Tavares	d052cc0382	entry: extract a header file for entry.c functions The declarations of entry.c's public functions and structures currently reside in cache.h. Although not many, they contribute to the size of cache.h and, when changed, cause the unnecessary recompilation of modules that don't really use these functions. So let's move them to a new entry.h header. While at it let's also move a comment related to checkout_entry() from entry.c to entry.h as it's more useful to describe the function there. Original-patch-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-23 10:34:05 -07:00
Junio C Hamano	f5c73f69fd	Merge branch 'dl/stash-show-untracked' "git stash show" learned to optionally show untracked part of the stash. * dl/stash-show-untracked: stash show: learn stash.showIncludeUntracked stash show: teach --include-untracked and --only-untracked	2021-03-22 14:00:24 -07:00
Junio C Hamano	1dd4e74522	Merge branch 'js/fsmonitor-unpack-fix' The data structure used by fsmonitor interface was not properly duplicated during an in-core merge, leading to use-after-free etc. * js/fsmonitor-unpack-fix: fsmonitor: do not forget to release the token in `discard_index()` fsmonitor: fix memory corruption in some corner cases	2021-03-19 15:25:37 -07:00

542 Commits