Граф коммитов

558 Коммитов

Автор SHA1 Сообщение Дата
Junio C Hamano bd72a08d6c Merge branch 'ds/sparse-cone'
Management of sparsely checked-out working tree has gained a
dedicated "sparse-checkout" command.

* ds/sparse-cone: (21 commits)
  sparse-checkout: improve OS ls compatibility
  sparse-checkout: respect core.ignoreCase in cone mode
  sparse-checkout: check for dirty status
  sparse-checkout: update working directory in-process for 'init'
  sparse-checkout: cone mode should not interact with .gitignore
  sparse-checkout: write using lockfile
  sparse-checkout: use in-process update for disable subcommand
  sparse-checkout: update working directory in-process
  sparse-checkout: sanitize for nested folders
  unpack-trees: add progress to clear_ce_flags()
  unpack-trees: hash less in cone mode
  sparse-checkout: init and set in cone mode
  sparse-checkout: use hashmaps for cone patterns
  sparse-checkout: add 'cone' mode
  trace2: add region in clear_ce_flags
  sparse-checkout: create 'disable' subcommand
  sparse-checkout: add '--stdin' option to set subcommand
  sparse-checkout: 'set' subcommand
  clone: add --sparse mode
  sparse-checkout: create 'init' subcommand
  ...
2019-12-25 11:21:58 -08:00
Junio C Hamano 6836d2fe06 dir.c: use st_add3() for allocation size
When preparing a manufactured dirent instance, we add a length of
path to the size of struct to decide how many bytes to allocate.
Make sure this addition does not wrap-around to cause us
underallocate.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-20 09:55:53 -08:00
Elijah Newren c847dfafee dir: consolidate similar code in treat_directory()
Both the DIR_SKIP_NESTED_GIT and DIR_NO_GITLINKS cases were checking for
whether a path was actually a nonbare repository.  That code could be
shared, with just the result of how to act differing between the two
cases.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-19 13:45:47 -08:00
Elijah Newren 777b420347 dir: synchronize treat_leading_path() and read_directory_recursive()
Our optimization to avoid calling into read_directory_recursive() when
all pathspecs have a common leading directory mean that we need to match
the logic that read_directory_recursive() would use if we had just
called it from the root.  Since it does more than call treat_path() we
need to copy that same logic.

Alternatively, we could try to change treat_path to return path_recurse
for an untracked directory under the given special circumstances that
this logic checks for, but a simple switch results in many test failures
such as 'git clean -d' not wiping out untracked but empty directories.
To work around that, we'd need the caller of treat_path to check for
path_recurse and sometimes special case it into path_untracked.  In
other words, we'd still have extra logic in both places.

Needing to duplicate logic like this means it is guaranteed someone will
eventually need to make further changes and forget to update both
locations.  It is tempting to just nuke the leading_directory special
casing to avoid such bugs and simplify the code, but unpack_trees'
verify_clean_subdirectory() also calls read_directory() and does so with
a non-empty leading path, so I'm hesitant to try to restructure further.
Add obnoxious warnings to treat_leading_path() and
read_directory_recursive() to try to warn people of such problems.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-19 13:45:47 -08:00
Elijah Newren b9670c1f5e dir: fix checks on common prefix directory
Many years ago, the directory traversing logic had an optimization that
would always recurse into any directory that was a common prefix of all
the pathspecs without walking the leading directories to get down to
the desired directory.  Thus,
   git ls-files -o .git/                        # case A
would notice that .git/ was a common prefix of all pathspecs (since
it is the only pathspec listed), and then traverse into it and start
showing unknown files under that directory.  Unfortunately, .git/ is not
a directory we should be traversing into, which made this optimization
problematic.  This also affected cases like
   git ls-files -o --exclude-standard t/        # case B
where t/ was in the .gitignore file and thus isn't interesting and
shouldn't be recursed into.  It also affected cases like
   git ls-files -o --directory untracked_dir/   # case C
where untracked_dir/ is indeed untracked and thus interesting, but the
--directory flag means we only want to show the directory itself, not
recurse into it and start listing untracked files below it.

The case B class of bugs were noted and fixed in commits 16e2cfa909
("read_directory(): further split treat_path()", 2010-01-08) and
48ffef966c ("ls-files: fix overeager pathspec optimization",
2010-01-08), with the idea being that we first wanted to check whether
the common prefix was interesting.  The former patch noted that
treat_path() couldn't be used when checking the common prefix because
treat_path() requires a dir_entry() and we haven't read any directories
at the point we are checking the common prefix.  So, that patch split
treat_one_path() out of treat_path().  The latter patch then created a
new treat_leading_path() which duplicated by hand the bits of
treat_path() that couldn't be broken out and then called
treat_one_path() for the remainder.  There were three problems with this
approach:

  * The duplicated logic in treat_leading_path() accidentally missed the
    check for special paths (such as is_dot_or_dotdot and matching
    ".git"), causing case A types of bugs to continue to be an issue.
  * The treat_leading_path() logic assumed we should traverse into
    anything where path_treatment was not path_none, i.e. it perpetuated
    class C types of bugs.
  * It meant we had split logic that needed to kept in sync, running the
    risk that people introduced new inconsistencies (such as in commit
    be8a84c526, which we reverted earlier in this series, or in commit
    df5bcdf83a which we'll fix in a subsequent commit)

Fix most these problems by making treat_leading_path() not only loop
over each leading path component, but calling treat_path() directly on
each.  To do so, we have to create a synthetic dir_entry, but that only
takes a few lines.  Then, pay attention to the path_treatment result we
get from treat_path() and don't treat path_excluded, path_untracked, and
path_recurse all the same as path_recurse.

This leaves one remaining problem, the new inconsistency from commit
df5bcdf83a.  That will be addressed in a subsequent commit.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-19 13:45:47 -08:00
Junio C Hamano 26c816a67d Merge branch 'hw/doc-in-header'
* hw/doc-in-header:
  trace2: move doc to trace2.h
  submodule-config: move doc to submodule-config.h
  tree-walk: move doc to tree-walk.h
  trace: move doc to trace.h
  run-command: move doc to run-command.h
  parse-options: add link to doc file in parse-options.h
  credential: move doc to credential.h
  argv-array: move doc to argv-array.h
  cache: move doc to cache.h
  sigchain: move doc to sigchain.h
  pathspec: move doc to pathspec.h
  revision: move doc to revision.h
  attr: move doc to attr.h
  refs: move doc to refs.h
  remote: move doc to remote.h and refspec.h
  sha1-array: move doc to sha1-array.h
  merge: move doc to ll-merge.h
  graph: move doc to graph.h and graph.c
  dir: move doc to dir.h
  diff: move doc to diff.h and diffcore.h
2019-12-16 13:08:39 -08:00
Derrick Stolee 190a65f9db sparse-checkout: respect core.ignoreCase in cone mode
When a user uses the sparse-checkout feature in cone mode, they
add patterns using "git sparse-checkout set <dir1> <dir2> ..."
or by using "--stdin" to provide the directories line-by-line over
stdin. This behaviour naturally looks a lot like the way a user
would type "git add <dir1> <dir2> ..."

If core.ignoreCase is enabled, then "git add" will match the input
using a case-insensitive match. Do the same for the sparse-checkout
feature.

Perform case-insensitive checks while updating the skip-worktree
bits during unpack_trees(). This is done by changing the hash
algorithm and hashmap comparison methods to optionally use case-
insensitive methods.

When this is enabled, there is a small performance cost in the
hashing algorithm. To tease out the worst possible case, the
following was run on a repo with a deep directory structure:

	git ls-tree -d -r --name-only HEAD |
		git sparse-checkout set --stdin

The 'set' command was timed with core.ignoreCase disabled or
enabled. For the repo with a deep history, the numbers were

	core.ignoreCase=false: 62s
	core.ignoreCase=true:  74s (+19.3%)

For reproducibility, the equivalent test on the Linux kernel
repository had these numbers:

	core.ignoreCase=false: 3.1s
	core.ignoreCase=true:  3.6s (+16%)

Now, this is not an entirely fair comparison, as most users
will define their sparse cone using more shallow directories,
and the performance improvement from eb42feca97 ("unpack-trees:
hash less in cone mode" 2019-11-21) can remove most of the
hash cost. For a more realistic test, drop the "-r" from the
ls-tree command to store only the first-level directories.
In that case, the Linux kernel repository takes 0.2-0.25s in
each case, and the deep repository takes one second, plus or
minus 0.05s, in each case.

Thus, we _can_ demonstrate a cost to this change, but it is
unlikely to matter to any reasonable sparse-checkout cone.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-13 12:01:02 -08:00
Elijah Newren c5c4eddd56 dir: break part of read_directory_recursive() out for reuse
Create an add_path_to_appropriate_result_list() function from the code
at the end of read_directory_recursive() so we can use it elsewhere.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-11 12:23:24 -08:00
Elijah Newren 072a231016 dir: exit before wildcard fall-through if there is no wildcard
The DO_MATCH_LEADING_PATHSPEC had a fall-through case for if there was a
wildcard, noting that we don't yet have enough information to determine
if a further paths under the current directory might match due to the
presence of wildcards.  But if we have no wildcards in our pathspec,
then we shouldn't get to that fall-through case.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-11 12:23:23 -08:00
Elijah Newren 2f5d3847d4 dir: remove stray quote character in comment
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-11 12:23:23 -08:00
Elijah Newren a2b13367fe Revert "dir.c: make 'git-status --ignored' work within leading directories"
Commit be8a84c526 ("dir.c: make 'git-status --ignored' work within
leading directories", 2013-04-15) noted that
   git status --ignored <SOMEPATH>
would not list ignored files and directories within <SOMEPATH> if
<SOMEPATH> was untracked, and modified the behavior to make it show
them.  However, it did so via a hack that broke consistency; it would
show paths under <SOMEPATH> differently than a simple
   git status --ignored | grep <SOMEPATH>
would show them.  A correct fix is slightly more involved, and
complicated slightly by this hack, so we revert this commit (but keep
corrected versions of the testcases) and will later fix the original
bug with a subsequent patch.

Some history may be helpful:

A very, very similar case to the commit we are reverting was raised in
commit 48ffef966c ("ls-files: fix overeager pathspec optimization",
2010-01-08); but it actually went in somewhat the opposite direction.  In
that commit, it mentioned how
   git ls-files -o --exclude-standard t/
used to show untracked files under t/ even when t/ was ignored, and then
changed the behavior to stop showing untracked files under an ignored
directory.  More importantly, this commit considered keeping this
behavior but noted that it would be inconsistent with the behavior when
multiple pathspecs were specified and thus rejected it.

The reason for this whole inconsistency when one pathspec is specified
versus zero or two is because common prefixes of pathspecs are sent
through a different set of checks (in treat_leading_path()) than normal
file/directory traversal (those go through read_directory_recursive()
and treat_path()).  As such, for consistency, one needs to check that
both codepaths produce the same result.

Revert commit be8a84c526, except instead
of removing the testcase it added, modify it to check for correct and
consistent behavior.  A subsequent patch in this series will fix the
testcase.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-11 12:23:23 -08:00
Derrick Stolee eb42feca97 unpack-trees: hash less in cone mode
The sparse-checkout feature in "cone mode" can use the fact that
the recursive patterns are "connected" to the root via parent
patterns to decide if a directory is entirely contained in the
sparse-checkout or entirely removed.

In these cases, we can skip hashing the paths within those
directories and simply set the skipworktree bit to the correct
value.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22 16:11:44 +09:00
Derrick Stolee af09ce24a9 sparse-checkout: init and set in cone mode
To make the cone pattern set easy to use, update the behavior of
'git sparse-checkout (init|set)'.

Add '--cone' flag to 'git sparse-checkout init' to set the config
option 'core.sparseCheckoutCone=true'.

When running 'git sparse-checkout set' in cone mode, a user only
needs to supply a list of recursive folder matches. Git will
automatically add the necessary parent matches for the leading
directories.

When testing 'git sparse-checkout set' in cone mode, check the
error stream to ensure we do not see any errors. Specifically,
we want to avoid the warning that the patterns do not match
the cone-mode patterns.

Helped-by: Eric Wong <e@80x24.org>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22 16:11:44 +09:00
Derrick Stolee 96cc8ab531 sparse-checkout: use hashmaps for cone patterns
The parent and recursive patterns allowed by the "cone mode"
option in sparse-checkout are restrictive enough that we
can avoid using the regex parsing. Everything is based on
prefix matches, so we can use hashsets to store the prefixes
from the sparse-checkout file. When checking a path, we can
strip path entries from the path and check the hashset for
an exact match.

As a test, I created a cone-mode sparse-checkout file for the
Linux repository that actually includes every file. This was
constructed by taking every folder in the Linux repo and creating
the pattern pairs here:

	/$folder/
	!/$folder/*/

This resulted in a sparse-checkout file sith 8,296 patterns.
Running 'git read-tree -mu HEAD' on this file had the following
performance:

    core.sparseCheckout=false: 0.21 s (0.00 s)
     core.sparseCheckout=true: 3.75 s (3.50 s)
 core.sparseCheckoutCone=true: 0.23 s (0.01 s)

The times in parentheses above correspond to the time spent
in the first clear_ce_flags() call, according to the trace2
performance traces.

While this example is contrived, it demonstrates how these
patterns can slow the sparse-checkout feature.

Helped-by: Eric Wong <e@80x24.org>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22 16:11:44 +09:00
Heba Waly 266f03eccd dir: move doc to dir.h
Move the documentation from Documentation/technical/api-directory-listing.txt
to dir.h as it's easier for the developers to find the usage information
beside the code instead of looking for it in another doc file.

Also documentation/technical/api-directory-listing.txt is removed because
the information it has is now redundant and it'll be hard to keep it up to
date and synchronized with the documentation in the header files.

Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-18 15:21:28 +09:00
Elijah Newren 15beaaa3d1 Fix spelling errors in code comments
Reported-by: Jens Schleusener <Jens.Schleusener@fossies.org>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-10 16:00:54 +09:00
Junio C Hamano aafb75452b Merge branch 'en/clean-nested-with-ignored'
"git clean" fixes.

* en/clean-nested-with-ignored:
  dir: special case check for the possibility that pathspec is NULL
  clean: fix theoretical path corruption
  clean: rewrap overly long line
  clean: avoid removing untracked files in a nested git repository
  clean: disambiguate the definition of -d
  git-clean.txt: do not claim we will delete files with -n/--dry-run
  dir: add commentary explaining match_pathspec_item's return value
  dir: if our pathspec might match files under a dir, recurse into it
  dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule case
  dir: also check directories for matching pathspecs
  dir: fix off-by-one error in match_pathspec_item
  dir: fix typo in comment
  t7300: add testcases showing failure to clean specified pathspecs
2019-10-11 14:24:46 +09:00
Elijah Newren 69f272b922 dir: special case check for the possibility that pathspec is NULL
Commits 404ebceda0 ("dir: also check directories for matching
pathspecs", 2019-09-17) and 89a1f4aaf7 ("dir: if our pathspec might
match files under a dir, recurse into it", 2019-09-17) added calls to
match_pathspec() and do_match_pathspec() passing along their pathspec
parameter.  Both match_pathspec() and do_match_pathspec() assume the
pathspec argument they are given is non-NULL.  It turns out that
unpack-tree.c's verify_clean_subdirectory() calls read_directory() with
pathspec == NULL, and it is possible on case insensitive filesystems for
that NULL to make it to these new calls to match_pathspec() and
do_match_pathspec().  Add appropriate checks on the NULLness of pathspec
to avoid a segfault.

In case the negation throws anyone off (one of the calls was to
do_match_pathspec() while the other was to !match_pathspec(), yet no
negation of the NULLness of pathspec is used), there are two ways to
understand the differences:
  * The code already handled the pathspec == NULL cases before this
    series, and this series only tried to change behavior when there was
    a pathspec, thus we only want to go into the if-block if pathspec is
    non-NULL.
  * One of the calls is for whether to recurse into a subdirectory, the
    other is for after we've recursed into it for whether we want to
    remove the subdirectory itself (i.e. the subdirectory didn't match
    but something under it could have).  That difference in situation
    leads to the slight differences in logic used (well, that and the
    slightly unusual fact that we don't want empty pathspecs to remove
    untracked directories by default).

Denton found and analyzed one issue and provided the patch for the
match_pathspec() call, SZEDER figured out why the issue only reproduced
for some folks and not others and provided the testcase, and I looked
through the remainder of the series and noted the do_match_pathspec()
call that should have the same check.

Co-authored-by: Denton Liu <liu.denton@gmail.com>
Co-authored-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-02 12:06:58 +09:00
Junio C Hamano 9755f70fe6 Merge branch 'ds/include-exclude'
The internal code originally invented for ".gitignore" processing
got reshuffled and renamed to make it less tied to "excluding" and
stress more that it is about "matching", as it has been reused for
things like sparse checkout specification that want to check if a
path is "included".

* ds/include-exclude:
  unpack-trees: rename 'is_excluded_from_list()'
  treewide: rename 'exclude' methods to 'pattern'
  treewide: rename 'EXCL_FLAG_' to 'PATTERN_FLAG_'
  treewide: rename 'struct exclude_list' to 'struct pattern_list'
  treewide: rename 'struct exclude' to 'struct path_pattern'
2019-09-30 13:19:32 +09:00
Elijah Newren 09487f2cba clean: avoid removing untracked files in a nested git repository
Users expect files in a nested git repository to be left alone unless
sufficiently forced (with two -f's).  Unfortunately, in certain
circumstances, git would delete both tracked (and possibly dirty) files
and untracked files within a nested repository.  To explain how this
happens, let's contrast a couple cases.  First, take the following
example setup (which assumes we are already within a git repo):

   git init nested
   cd nested
   >tracked
   git add tracked
   git commit -m init
   >untracked
   cd ..

In this setup, everything works as expected; running 'git clean -fd'
will result in fill_directory() returning the following paths:
   nested/
   nested/tracked
   nested/untracked
and then correct_untracked_entries() would notice this can be compressed
to
   nested/
and then since "nested/" is a directory, we would call
remove_dirs("nested/", ...), which would
check is_nonbare_repository_dir() and then decide to skip it.

However, if someone also creates an ignored file:
   >nested/ignored
then running 'git clean -fd' would result in fill_directory() returning
the same paths:
   nested/
   nested/tracked
   nested/untracked
but correct_untracked_entries() will notice that we had ignored entries
under nested/ and thus simplify this list to
   nested/tracked
   nested/untracked
Since these are not directories, we do not call remove_dirs() which was
the only place that had the is_nonbare_repository_dir() safety check --
resulting in us deleting both the untracked file and the tracked (and
possibly dirty) file.

One possible fix for this issue would be walking the parent directories
of each path and checking if they represent nonbare repositories, but
that would be wasteful.  Even if we added caching of some sort, it's
still a waste because we should have been able to check that "nested/"
represented a nonbare repository before even descending into it in the
first place.  Add a DIR_SKIP_NESTED_GIT flag to dir_struct.flags and use
it to prevent fill_directory() and friends from descending into nested
git repos.

With this change, we also modify two regression tests added in commit
91479b9c72 ("t7300: add tests to document behavior of clean and nested
git", 2015-06-15).  That commit, nor its series, nor the six previous
iterations of that series on the mailing list discussed why those tests
coded the expectation they did.  In fact, it appears their purpose was
simply to test _existing_ behavior to make sure that the performance
changes didn't change the behavior.  However, these two tests directly
contradicted the manpage's claims that two -f's were required to delete
files/directories under a nested git repository.  While one could argue
that the user gave an explicit path which matched files/directories that
were within a nested repository, there's a slippery slope that becomes
very difficult for users to understand once you go down that route (e.g.
what if they specified "git clean -f -d '*.c'"?)  It would also be hard
to explain what the exact behavior was; avoid such problems by making it
really simple.

Also, clean up some grammar errors describing this functionality in the
git-clean manpage.

Finally, there are still a couple bugs with -ffd not cleaning out enough
(e.g.  missing the nested .git) and with -ffdX possibly cleaning out the
wrong files (paying attention to outer .gitignore instead of inner).
This patch does not address these cases at all (and does not change the
behavior relative to those flags), it only fixes the handling when given
a single -f.  See
https://public-inbox.org/git/20190905212043.GC32087@szeder.dev/ for more
discussion of the -ffd[X?] bugs.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-17 12:20:35 -07:00
Elijah Newren 29b577b960 dir: add commentary explaining match_pathspec_item's return value
The way match_pathspec_item() handles names and pathspecs with trailing
slash characters, in conjunction with special options like
DO_MATCH_DIRECTORY and DO_MATCH_LEADING_PATHSPEC were non-obvious, and
broken until this patch series.  Add a table in a comment explaining the
intent of how these work.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-17 12:20:35 -07:00
Elijah Newren 89a1f4aaf7 dir: if our pathspec might match files under a dir, recurse into it
For git clean, if a directory is entirely untracked and the user did not
specify -d (corresponding to DIR_SHOW_IGNORED_TOO), then we usually do
not want to remove that directory and thus do not recurse into it.
However, if the user manually specified specific (or even globbed) paths
somewhere under that directory to remove, then we need to recurse into
the directory to make sure we remove the relevant paths under that
directory as the user requested.

Note that this does not mean that the recursed-into directory will be
added to dir->entries for later removal; as of a few commits earlier in
this series, there is another more strict match check that is run after
returning from a recursed-into directory before deciding to add it to the
list of entries.  Therefore, this will only result in files underneath
the given directory which match one of the pathspecs being added to the
entries list.

Two notes of potential interest to future readers:

  * If we wanted to only recurse into a directory when it is specifically
    matched rather than matched-via-glob (e.g. '*.c'), then we could do
    so via making the final non-zero return in match_pathspec_item be
    MATCHED_RECURSIVELY instead of MATCHED_RECURSIVELY_LEADING_PATHSPEC.
    (Note that the relative order of MATCHED_RECURSIVELY_LEADING_PATHSPEC
    and MATCHED_RECURSIVELY are important for such a change.)  I was
    leaving open that possibility while writing an RFC asking for the
    behavior we want, but even though we don't want it, that knowledge
    might help you understand the code flow better.

  * There is a growing amount of logic in read_directory_recursive() for
    deciding whether to recurse into a subdirectory.  However, there is a
    comment immediately preceding this logic that says to recurse if
    instructed by treat_path().   It may be better for the logic in
    read_directory_recursive() to ultimately be moved to treat_path() (or
    another function it calls, such as treat_directory()), but I have
    left that for someone else to tackle in the future.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-17 12:20:35 -07:00
Elijah Newren a3d89d8f76 dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule case
The specific checks done in match_pathspec_item for the DO_MATCH_SUBMODULE
case are useful for other cases which have nothing to do with submodules.
Rename this constant; a subsequent commit will make use of this change.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-17 12:20:35 -07:00
Elijah Newren 404ebceda0 dir: also check directories for matching pathspecs
Even if a directory doesn't match a pathspec, it is possible, depending
on the precise pathspecs, that some file underneath it might.  So we
special case and recurse into the directory for such situations.  However,
we previously always added any untracked directory that we recursed into
to the list of untracked paths, regardless of whether the directory
itself matched the pathspec.

For the case of git-clean and a set of pathspecs of "dir/file" and "more",
this caused a problem because we'd end up with dir entries for both of
  "dir"
  "dir/file"
Then correct_untracked_entries() would try to helpfully prune duplicates
for us by removing "dir/file" since it's under "dir", leaving us with
  "dir"
Since the original pathspec only had "dir/file", the only entry left
doesn't match and leaves nothing to be removed.  (Note that if only one
pathspec was specified, e.g. only "dir/file", then the common_prefix_len
optimizations in fill_directory would cause us to bypass this problem,
making it appear in simple tests that we could correctly remove manually
specified pathspecs.)

Fix this by actually checking whether the directory we are about to add
to the list of dir entries actually matches the pathspec; only do this
matching check after we have already returned from recursing into the
directory.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-17 12:20:35 -07:00
Elijah Newren a5e916c745 dir: fix off-by-one error in match_pathspec_item
For a pathspec like 'foo/bar' comparing against a path named "foo/",
namelen will be 4, and match[namelen] will be 'b'.  The correct location
of the directory separator is namelen-1.

However, other callers of match_pathspec_item() such as builtin/grep.c's
submodule_path_match() will compare against a path named "foo" instead of
"foo/".  It might be better to change all the callers to be consistent,
as discussed at
   https://public-inbox.org/git/xmqq7e6cdnkr.fsf@gitster-ct.c.googlers.com/
and
   https://public-inbox.org/git/CABPp-BERWUPCPq-9fVW1LNocqkrfsoF4BPj3gJd9+En43vEkTQ@mail.gmail.com/
but there are many cases to audit, so for now just make sure we handle
both cases with and without a trailing slash.

The reason the code worked despite this sometimes-off-by-one error was
that the subsequent code immediately checked whether the first matchlen
characters matched (which they do) and then bailed and return
MATCHED_RECURSIVELY anyway since wildmatch doesn't have the ability to
check if "name" can be matched as a directory (or prefix) against the
pathspec.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-17 12:20:35 -07:00
Elijah Newren bbbb6b0b89 dir: fix typo in comment
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-17 12:20:34 -07:00
Derrick Stolee 468ce99b77 unpack-trees: rename 'is_excluded_from_list()'
The first consumer of pattern-matching filenames was the
.gitignore feature. In that context, storing a list of patterns
as a 'struct exclude_list'  makes sense. However, the
sparse-checkout feature then adopted these structures and methods,
but with the opposite meaning: these patterns match the files
that should be included!

Now that this library is renamed to use 'struct pattern_list'
and 'struct pattern', we can now rename the method used by
the sparse-checkout feature to determine which paths should
appear in the working directory.

The method is_excluded_from_list() is only used by the
sparse-checkout logic in unpack-trees and list-objects-filter.
The confusing part is that it returned 1 for "excluded" (i.e.
it matches the list of exclusions) but that really manes that
the path matched the list of patterns for _inclusion_ in the
working directory.

Rename the method to be path_matches_pattern_list() and have
it return an explicit 'enum pattern_match_result'. Here, the
values MATCHED = 1, UNMATCHED = 0, and UNDECIDED = -1 agree
with the previous integer values. This shift allows future
consumers to better understand what the retur values mean,
and provides more type checking for handling those values.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-05 14:05:12 -07:00
Derrick Stolee 65edd96aec treewide: rename 'exclude' methods to 'pattern'
The first consumer of pattern-matching filenames was the
.gitignore feature. In that context, storing a list of patterns
as a 'struct exclude_list'  makes sense. However, the
sparse-checkout feature then adopted these structures and methods,
but with the opposite meaning: these patterns match the files
that should be included!

It would be clearer to rename this entire library as a "pattern
matching" library, and the callers apply exclusion/inclusion
logic accordingly based on their needs.

This commit renames several methods defined in dir.h to make
more sense with the renamed 'struct exclude_list' to 'struct
pattern_list' and 'struct exclude' to 'struct path_pattern':

 * last_exclude_matching() -> last_matching_pattern()
 * parse_exclude() -> parse_path_pattern()

In addition, the word 'exclude' was replaced with 'pattern'
in the methods below:

 * add_exclude_list()
 * add_excludes_from_file_to_list()
 * add_excludes_from_file()
 * add_excludes_from_blob_to_list()
 * add_exclude()
 * clear_exclude_list()

A few methods with the word "exclude" remain. These will
be handled seperately. In particular, the method
"is_excluded()" is concretely about the .gitignore file
relative to a specific directory. This is the important
boundary between library and consumer: is_excluded() cares
about .gitignore, but is_excluded() calls
last_matching_pattern() to make that decision.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-05 14:05:12 -07:00
Derrick Stolee 4ff89ee52c treewide: rename 'EXCL_FLAG_' to 'PATTERN_FLAG_'
The first consumer of pattern-matching filenames was the
.gitignore feature. In that context, storing a list of patterns
as a 'struct exclude_list'  makes sense. However, the
sparse-checkout feature then adopted these structures and methods,
but with the opposite meaning: these patterns match the files
that should be included!

It would be clearer to rename this entire library as a "pattern
matching" library, and the callers apply exclusion/inclusion
logic accordingly based on their needs.

This commit replaces 'EXCL_FLAG_' to 'PATTERN_FLAG_' in the
names of the flags used on 'struct path_pattern'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-05 14:05:12 -07:00
Derrick Stolee caa3d55444 treewide: rename 'struct exclude_list' to 'struct pattern_list'
The first consumer of pattern-matching filenames was the
.gitignore feature. In that context, storing a list of patterns
as a 'struct exclude_list'  makes sense. However, the
sparse-checkout feature then adopted these structures and methods,
but with the opposite meaning: these patterns match the files
that should be included!

It would be clearer to rename this entire library as a "pattern
matching" library, and the callers apply exclusion/inclusion
logic accordingly based on their needs.

This commit renames 'struct exclude_list' to 'struct pattern_list'
and renames several variables called 'el' to 'pl'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-05 14:05:11 -07:00
Derrick Stolee ab8db61390 treewide: rename 'struct exclude' to 'struct path_pattern'
The first consumer of pattern-matching filenames was the
.gitignore feature. In that context, storing a list of patterns
as a list of 'struct exclude' items makes sense. However, the
sparse-checkout feature then adopted these structures and methods,
but with the opposite meaning: these patterns match the files
that should be included!

It would be clearer to rename this entire library as a "pattern
matching" library, and the callers apply exclusion/inclusion
logic accordingly based on their needs.

This commit renames 'struct exclude' to 'struct path_pattern'
and renames several variable names to match. 'struct pattern'
was already taken by attr.c, and this more completely describes
that the patterns are specific to file paths.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-05 14:05:11 -07:00
René Scharfe 568a05c5ec cleanup: fix possible overflow errors in binary search, part 2
Calculating the sum of two array indexes to find the midpoint between
them can overflow, i.e. code like this is unsafe for big arrays:

	mid = (first + last) >> 1;

Make sure the intermediate value stays within the boundaries instead,
like this:

	mid = first + ((last - first) >> 1);

The loop condition of the binary search makes sure that 'last' is
always greater than 'first', so this is safe as long as 'first' is
not negative.  And that can be verified easily using the pre-context
of each change, except for name-hash.c, so add an assertion to that
effect there.

The unsafe calculations were found with:

	git grep '(.*+.*) *>> *1'

This is a continuation of 19716b21a4 (cleanup: fix possible overflow
errors in binary search, 2017-10-08).

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-13 11:28:53 -07:00
Junio C Hamano 5b5def9a99 Merge branch 'jk/untracked-cache-more-fixes'
Code clean-up.

* jk/untracked-cache-more-fixes:
  untracked-cache: simplify parsing by dropping "len"
  untracked-cache: simplify parsing by dropping "next"
  untracked-cache: be defensive about missing NULs in index
2019-05-09 00:37:28 +09:00
Junio C Hamano 0b179f3175 Merge branch 'nd/sha1-name-c-wo-the-repository'
Further code clean-up to allow the lowest level of name-to-object
mapping layer to work with a passed-in repository other than the
default one.

* nd/sha1-name-c-wo-the-repository: (34 commits)
  sha1-name.c: remove the_repo from get_oid_mb()
  sha1-name.c: remove the_repo from other get_oid_*
  sha1-name.c: remove the_repo from maybe_die_on_misspelt_object_name
  submodule-config.c: use repo_get_oid for reading .gitmodules
  sha1-name.c: add repo_get_oid()
  sha1-name.c: remove the_repo from get_oid_with_context_1()
  sha1-name.c: remove the_repo from resolve_relative_path()
  sha1-name.c: remove the_repo from diagnose_invalid_index_path()
  sha1-name.c: remove the_repo from handle_one_ref()
  sha1-name.c: remove the_repo from get_oid_1()
  sha1-name.c: remove the_repo from get_oid_basic()
  sha1-name.c: remove the_repo from get_describe_name()
  sha1-name.c: remove the_repo from get_oid_oneline()
  sha1-name.c: add repo_interpret_branch_name()
  sha1-name.c: remove the_repo from interpret_branch_mark()
  sha1-name.c: remove the_repo from interpret_nth_prior_checkout()
  sha1-name.c: remove the_repo from get_short_oid()
  sha1-name.c: add repo_for_each_abbrev()
  sha1-name.c: store and use repo in struct disambiguate_state
  sha1-name.c: add repo_find_unique_abbrev_r()
  ...
2019-05-09 00:37:25 +09:00
Junio C Hamano 4ab701b2ee Merge branch 'km/empty-repo-is-still-a-repo'
Running "git add" on a repository created inside the current
repository is an explicit indication that the user wants to add it
as a submodule, but when the HEAD of the inner repository is on an
unborn branch, it cannot be added as a submodule.  Worse, the files
in its working tree can be added as if they are a part of the outer
repository, which is not what the user wants.  These problems are
being addressed.

* km/empty-repo-is-still-a-repo:
  add: error appropriately on repository with no commits
  dir: do not traverse repositories with no commits
  submodule: refuse to add repository with no commits
2019-05-09 00:37:23 +09:00
Junio C Hamano 0830eac14c Merge branch 'js/untracked-cache-allocfix'
An underallocation in the code to read the untracked cache
extension has been corrected.

* js/untracked-cache-allocfix:
  untracked cache: fix off-by-one
2019-04-25 16:41:22 +09:00
Junio C Hamano d4e568b2a3 Merge branch 'bc/hash-transition-16'
Conversion from unsigned char[20] to struct object_id continues.

* bc/hash-transition-16: (35 commits)
  gitweb: make hash size independent
  Git.pm: make hash size independent
  read-cache: read data in a hash-independent way
  dir: make untracked cache extension hash size independent
  builtin/difftool: use parse_oid_hex
  refspec: make hash size independent
  archive: convert struct archiver_args to object_id
  builtin/get-tar-commit-id: make hash size independent
  get-tar-commit-id: parse comment record
  hash: add a function to lookup hash algorithm by length
  remote-curl: make hash size independent
  http: replace sha1_to_hex
  http: compute hash of downloaded objects using the_hash_algo
  http: replace hard-coded constant with the_hash_algo
  http-walker: replace sha1_to_hex
  http-push: remove remaining uses of sha1_to_hex
  http-backend: allow 64-character hex names
  http-push: convert to use the_hash_algo
  builtin/pull: make hash-size independent
  builtin/am: make hash size independent
  ...
2019-04-25 16:41:17 +09:00
Jeff King 08bf354de7 untracked-cache: simplify parsing by dropping "len"
The code which parses untracked-cache extensions from disk keeps a "len"
variable, which is the size of the string we are parsing. But since we
now have an "end of string" variable, we can just use that to get the
length when we need it. This eliminates the need to keep "len" up to
date (and removes the possibility of any errors where "len" and "eos"
get out of sync).

As a bonus, it means we are not storing a string length in an "int",
which is a potential source of overflows (though in this case it seems
fairly unlikely for that to cause any memory problems).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-19 14:33:21 +09:00
Jeff King b511d6d569 untracked-cache: simplify parsing by dropping "next"
When we parse an on-disk untracked cache, we have two pointers, "data"
and "next". As we parse, we point "next" to the end of an element, and
then later update "data" to match.

But we actually don't need two pointers. Each parsing step can just
update "data" directly from other variables we hold (and we don't have
to worry about bailing in an intermediate state, since any parsing
failure causes us to immediately discard "data" and return).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-19 14:30:58 +09:00
Jeff King c6909f9959 untracked-cache: be defensive about missing NULs in index
The on-disk format for the untracked-cache extension contains
NUL-terminated filenames. We parse these from the mmap'd file using
string functions like strlen(). This works fine in the normal case, but
if we see a malformed or corrupted index, we might read off the end of
our mmap.

Instead, let's use memchr() to find the trailing NUL within the bytes we
know are available, and return an error if it's missing.

Note that we can further simplify by folding another range check into
our conditional. After we find the end of the string, we set "next" to
the byte after the string and treat it as an error if there are no such
bytes left. That saves us from having to do a range check at the
beginning of each subsequent string (and works because there is always
data after each string). We can do both range checks together by
checking "!eos" (we didn't find a NUL) and "eos == end" (it was on the
last available byte, meaning there's nothing after). This replaces the
existing "next > end" checks.

Note also that the decode_varint() calls have a similar problem (we
don't even pass them "end"; they just keep parsing). These are probably
OK in practice since varints have a finite length (we stop parsing when
we'd overflow a uintmax_t), so the worst case is that we'd overflow into
reading the trailing bytes of the index.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-19 14:27:07 +09:00
Nguyễn Thái Ngọc Duy 0488481e79 sha1-name.c: remove the_repo from diagnose_invalid_index_path()
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-16 18:56:53 +09:00
Johannes Schindelin 3a7b45a623 untracked cache: fix off-by-one
In f9e6c64958 (untracked cache: load from UNTR index extension,
2015-03-08), code was added to read back the untracked cache from an
index extension.

Probably in the endeavor to avoid the `calloc()` implied by
`FLEX_ALLOC_STR()` (it is hard to know why exactly, the commit message
of that commit is a bit parsimonious with information), it calls
`malloc()` manually and then `memcpy()`s the bits and pieces into place.

It allocates the size of `struct untracked_cache_dir` plus the string
length of the untracked file name, then copies the information in two
steps: first the fixed-size metadata, then the name. And here lies the
rub: it includes the trailing NUL byte in the name.

If `FLEX_ARRAY` is defined as 0, this results in a buffer overrun.

To fix this, let's just add 1, for the trailing NUL byte. Technically,
this overallocates on platforms where `FLEX_ARRAY` is 1, but it should
not matter much in reality, as `malloc()` usually overallocates anyway,
unless the size to allocate aligns exactly with some internal chunk size
(see below for more on that).

The real strange thing is that neither valgrind nor DrMemory catches
this bug. In this developer's tests, a `memcpy()` (but not a
`memset()`!) could write up to 4 bytes after the allocated memory range
before valgrind would start reporting an issue.

However, when running Git built with nedmalloc as allocator, under rare
conditions (and inconsistently at that), this bug triggered an `abort()`
because nedmalloc rounds up the size to be `malloc()`ed to a multiple of
a certain chunk size, then adds a few bytes to be used for storing some
internal state. If there is no rounding up to do (because the size is
already a multiple of that chunk size), and if the buffer is overrun as
in the code patched in this commit, the internal state is corrupted.

The scenario that triggered this here bug fix entailed a git.git
checkout with an extra copy of the source code in an untracked
subdirectory, meaning that there was an untracked subdirectory called
"thunderbird-patch-inline" whose name's length is exactly 24 bytes,
which, added to the size of above-mentioned `struct untracked_cache_dir`
that weighs in with 104 bytes on a 64-bit system, amounts to 128,
aligning perfectly with nedmalloc's chunk size.

As there is no obvious way to trigger this bug reliably, on all
platforms supported by Git, and as the bug is obvious enough, this patch
comes without a regression test.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-12 10:49:40 +09:00
Kyle Meyer b22827045e dir: do not traverse repositories with no commits
When treat_directory() encounters a directory that is not in the index
and DIR_NO_GITLINKS is unset, it calls resolve_gitlink_ref() to decide
if a directory looks like a repository, in which case the directory
won't be traversed.  As a result, 'status -uall' and 'ls-files -o'
will show only the directory, even when there are untracked files
within the directory.

For the unusual case where a repository doesn't have a commit checked
out, resolve_gitlink_ref() returns -1 because HEAD cannot be resolved,
and the directory is treated as a normal directory (i.e. traversal
does not stop at the repository boundary).  The status and ls-files
commands above list untracked files within the repository rather than
showing only the top-level directory.  And if 'git add' is called on a
repository with no commit checked out, any untracked files under the
repository are added as blobs in the top-level project, a behavior
that is unlikely to be what the caller intended.

The above case is a corner case in an already unusual situation of the
working tree containing a repository that is not a tracked submodule,
but we might as well treat anything that looks like a repository
consistently.  Loosen the "looks like a repository" criteria in
treat_directory() by replacing resolve_gitlink_ref() with
is_nonbare_repository_dir(), one of the checks that is performed
downstream when resolve_gitlink_ref() is called.

As the required update to t3700-add shows, calling 'git add' on a
repository with no commit checked out will now raise an error.  While
this is the desired behavior, note that the output isn't yet
appropriate.  The next commit will improve this output.

Signed-off-by: Kyle Meyer <kyle@kyleam.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-10 12:52:49 +09:00
brian m. carlson 3899b88b49 dir: make untracked cache extension hash size independent
Instead of using a struct with a flex array member to read and write the
untracked cache extension, use a shorter, fixed-length struct and add
the name and hash data explicitly.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01 11:57:39 +09:00
Jeff King c5c33504c9 report_path_error(): drop unused prefix parameter
This hasn't been used since 17ddc66e70 (convert report_path_error to
take struct pathspec, 2013-07-14), as the names in the struct will have
already been prefixed when they were parsed.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-20 18:34:09 +09:00
Junio C Hamano 7589e63648 Merge branch 'nd/the-index-final'
The assumption to work on the single "in-core index" instance has
been reduced from the library-ish part of the codebase.

* nd/the-index-final:
  cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch
  read-cache.c: remove the_* from index_has_changes()
  merge-recursive.c: remove implicit dependency on the_repository
  merge-recursive.c: remove implicit dependency on the_index
  sha1-name.c: remove implicit dependency on the_index
  read-cache.c: replace update_index_if_able with repo_&
  read-cache.c: kill read_index()
  checkout: avoid the_index when possible
  repository.c: replace hold_locked_index() with repo_hold_locked_index()
  notes-utils.c: remove the_repository references
  grep: use grep_opt->repo instead of explict repo argument
2019-02-06 22:05:23 -08:00
Nguyễn Thái Ngọc Duy f8adbec9fe cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch
By default, index compat macros are off from now on, because they
could hide the_index dependency.

Only those in builtin can use it.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-24 11:55:06 -08:00
Nguyễn Thái Ngọc Duy 22af33bece dir.c: move, rename and export match_attrs()
The function will be reused for matching attributes in pathspec when
walking trees (currently it's used for matching pathspec when walking
a list). pathspec.c would be a more neutral place for this.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-19 10:50:33 +09:00
Jeff King 8a2c174677 pathspec: handle non-terminated strings with :(attr)
The pathspec code always takes names to be matched as a
name/namelen pair, but match_attrs() never looks at namelen,
and just treats "name" like a NUL-terminated string, passing
it to git_check_attr().

This usually works anyway. Every caller passes a
NUL-terminated string, and in all but one the passed-in
length is the same as the length of the string (the
exception is dir_path_match(), which may pass a smaller
length to drop a trailing slash). So we won't currently ever
read random memory, and the one case I found actually
happens to work correctly because the attr code can handle
the trailing slash itself.

But it's still worth addressing, as the function interface
implies that the name does not have to be NUL-terminated,
making this an accident waiting to happen.

Since teaching git_check_attr() to take a ptr/len pair would
be a big refactor, we'll just allocate a new string. We can
do this only when necessary, which avoids paying the cost
for most callers.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-02 20:49:53 +09:00
Junio C Hamano 769af0fd9e Merge branch 'jk/cocci'
spatch transformation to replace boolean uses of !hashcmp() to
newly introduced oideq() is added, and applied, to regain
performance lost due to support of multiple hash algorithms.

* jk/cocci:
  show_dirstat: simplify same-content check
  read-cache: use oideq() in ce_compare functions
  convert hashmap comparison functions to oideq()
  convert "hashcmp() != 0" to "!hasheq()"
  convert "oidcmp() != 0" to "!oideq()"
  convert "hashcmp() == 0" to hasheq()
  convert "oidcmp() == 0" to oideq()
  introduce hasheq() and oideq()
  coccinelle: use <...> for function exclusion
2018-09-17 13:53:57 -07:00
Junio C Hamano 7e794d0a3f Merge branch 'nd/unpack-trees-with-cache-tree'
The unpack_trees() API used in checking out a branch and merging
walks one or more trees along with the index.  When the cache-tree
in the index tells us that we are walking a tree whose flattened
contents is known (i.e. matches a span in the index), as linearly
scanning a span in the index is much more efficient than having to
open tree objects recursively and listing their entries, the walk
can be optimized, which is done in this topic.

* nd/unpack-trees-with-cache-tree:
  Document update for nd/unpack-trees-with-cache-tree
  cache-tree: verify valid cache-tree in the test suite
  unpack-trees: add missing cache invalidation
  unpack-trees: reuse (still valid) cache-tree from src_index
  unpack-trees: reduce malloc in cache-tree walk
  unpack-trees: optimize walking same trees with cache-tree
  unpack-trees: add performance tracing
  trace.h: support nested performance tracing
2018-09-17 13:53:53 -07:00
Jeff King 9001dc2a74 convert "oidcmp() != 0" to "!oideq()"
This is the flip side of the previous two patches: checking
for a non-zero oidcmp() can be more strictly expressed as
inequality. Like those patches, we write "!= 0" in the
coccinelle transformation, which covers by isomorphism the
more common:

  if (oidcmp(E1, E2))

As with the previous two patches, this patch can be achieved
almost entirely by running "make coccicheck"; the only
differences are manual line-wrap fixes to match the original
code.

There is one thing to note for anybody replicating this,
though: coccinelle 1.0.4 seems to miss the case in
builtin/tag.c, even though it's basically the same as all
the others. Running with 1.0.7 does catch this, so
presumably it's just a coccinelle bug that was fixed in the
interim.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-29 11:32:49 -07:00
Junio C Hamano dc0f6f9e1d Merge branch 'nd/no-the-index'
The more library-ish parts of the codebase learned to work on the
in-core index-state instance that is passed in by their callers,
instead of always working on the singleton "the_index" instance.

* nd/no-the-index: (24 commits)
  blame.c: remove implicit dependency on the_index
  apply.c: remove implicit dependency on the_index
  apply.c: make init_apply_state() take a struct repository
  apply.c: pass struct apply_state to more functions
  resolve-undo.c: use the right index instead of the_index
  archive-*.c: use the right repository
  archive.c: avoid access to the_index
  grep: use the right index instead of the_index
  attr: remove index from git_attr_set_direction()
  entry.c: use the right index instead of the_index
  submodule.c: use the right index instead of the_index
  pathspec.c: use the right index instead of the_index
  unpack-trees: avoid the_index in verify_absent()
  unpack-trees: convert clear_ce_flags* to avoid the_index
  unpack-trees: don't shadow global var the_index
  unpack-trees: add a note about path invalidation
  unpack-trees: remove 'extern' on function declaration
  ls-files: correct index argument to get_convert_attr_ascii()
  preload-index.c: use the right index instead of the_index
  dir.c: remove an implicit dependency on the_index in pathspec code
  ...
2018-08-20 11:33:53 -07:00
Nguyễn Thái Ngọc Duy c46c406ae1 trace.h: support nested performance tracing
Performance measurements are listed right now as a flat list, which is
fine when we measure big blocks. But when we start adding more and
more measurements, some of them could be just part of a bigger
measurement and a flat list gives a wrong impression that they are
executed at the same level instead of nested.

Add trace_performance_enter() and trace_performance_leave() to allow
indent these nested measurements. For now it does not help much
because the only nested thing is (lazy) name hash initialization
(e.g. called in diff-index from "git status"). This will help more
because I'm going to add some more tracing that's actually nested.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-18 09:47:46 -07:00
Junio C Hamano 4bea8485e3 Merge branch 'nd/i18n'
Many more strings are prepared for l10n.

* nd/i18n: (23 commits)
  transport-helper.c: mark more strings for translation
  transport.c: mark more strings for translation
  sha1-file.c: mark more strings for translation
  sequencer.c: mark more strings for translation
  replace-object.c: mark more strings for translation
  refspec.c: mark more strings for translation
  refs.c: mark more strings for translation
  pkt-line.c: mark more strings for translation
  object.c: mark more strings for translation
  exec-cmd.c: mark more strings for translation
  environment.c: mark more strings for translation
  dir.c: mark more strings for translation
  convert.c: mark more strings for translation
  connect.c: mark more strings for translation
  config.c: mark more strings for translation
  commit-graph.c: mark more strings for translation
  builtin/replace.c: mark more strings for translation
  builtin/pack-objects.c: mark more strings for translation
  builtin/grep.c: mark strings for translation
  builtin/config.c: mark more strings for translation
  ...
2018-08-15 15:08:23 -07:00
Nguyễn Thái Ngọc Duy 6d2df284e7 dir.c: remove an implicit dependency on the_index in pathspec code
Make the match_patchspec API and friends take an index_state instead
of assuming the_index in dir.c. All external call sites are converted
blindly to keep the patch simple and retain current behavior.
Individual call sites may receive further updates to use the right
index instead of the_index.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 14:14:42 -07:00
Nguyễn Thái Ngọc Duy 7a400a2c02 attr: remove an implicit dependency on the_index
Make the attr API take an index_state instead of assuming the_index in
attr code. All call sites are converted blindly to keep the patch
simple and retain current behavior. Individual call sites may receive
further updates to use the right index instead of the_index.

There is one ugly temporary workaround added in attr.c that needs some
more explanation.

Commit c24f3abace (apply: file commited with CRLF should roundtrip
diff and apply - 2017-08-19) forces one convert_to_git() call to NOT
read the index at all. But what do you know, we read it anyway by
falling back to the_index. When "istate" from convert_to_git is now
propagated down to read_attr_from_array() we will hit segfault
somewhere inside read_blob_data_from_index.

The right way of dealing with this is to kill "use_index" variable and
only follow "istate" but at this stage we are not ready for that:
while most git_attr_set_direction() calls just passes the_index to be
assigned to use_index, unpack-trees passes a different one which is
used by entry.c code, which has no way to know what index to use if we
delete use_index. So this has to be done later.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 14:14:42 -07:00
Nguyễn Thái Ngọc Duy a80897c1e9 dir.c: mark more strings for translation
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-23 11:19:10 -07:00
Nguyễn Thái Ngọc Duy 1a07e59c3e Update messages in preparation for i18n
Many messages will be marked for translation in the following
commits. This commit updates some of them to be more consistent and
reduce diff noise in those commits. Changes are

- keep the first letter of die(), error() and warning() in lowercase
- no full stop in die(), error() or warning() if it's single sentence
  messages
- indentation
- some messages are turned to BUG(), or prefixed with "BUG:" and will
  not be marked for i18n
- some messages are improved to give more information
- some messages are broken down by sentence to be i18n friendly
  (on the same token, combine multiple warning() into one big string)
- the trailing \n is converted to printf_ln if possible, or deleted
  if not redundant
- errno_errno() is used instead of explicit strerror()

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-23 11:19:09 -07:00
Junio C Hamano 3c5b6ee92e Merge branch 'tz/exclude-doc-smallfixes'
Doc updates.

* tz/exclude-doc-smallfixes:
  dir.c: fix typos in core.excludesfile comment
  gitignore.txt: clarify default core.excludesfile path
2018-07-18 12:20:34 -07:00
Junio C Hamano 00624d608c Merge branch 'sb/object-store-grafts'
The conversion to pass "the_repository" and then "a_repository"
throughout the object access API continues.

* sb/object-store-grafts:
  commit: allow lookup_commit_graft to handle arbitrary repositories
  commit: allow prepare_commit_graft to handle arbitrary repositories
  shallow: migrate shallow information into the object parser
  path.c: migrate global git_path_* to take a repository argument
  cache: convert get_graft_file to handle arbitrary repositories
  commit: convert read_graft_file to handle arbitrary repositories
  commit: convert register_commit_graft to handle arbitrary repositories
  commit: convert commit_graft_pos() to handle arbitrary repositories
  shallow: add repository argument to is_repository_shallow
  shallow: add repository argument to check_shallow_file_for_update
  shallow: add repository argument to register_shallow
  shallow: add repository argument to set_alternate_shallow_file
  commit: add repository argument to lookup_commit_graft
  commit: add repository argument to prepare_commit_graft
  commit: add repository argument to read_graft_file
  commit: add repository argument to register_commit_graft
  commit: add repository argument to commit_graft_pos
  object: move grafts to object parser
  object-store: move object access functions to object-store.h
2018-07-18 12:20:28 -07:00
Todd Zullinger 51d1863168 dir.c: fix typos in core.excludesfile comment
Make it easier to find references to core.excludesfile and the default
$XDG_CONFIG_HOME/git/ignore path.

Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-27 12:17:17 -07:00
Junio C Hamano f35f43f565 Merge branch 'jk/ewah-bounds-check'
The code to read compressed bitmap was not careful to avoid reading
past the end of the file, which has been corrected.

* jk/ewah-bounds-check:
  ewah: adjust callers of ewah_read_mmap()
  ewah_read_mmap: bounds-check mmap reads
2018-06-18 11:23:22 -07:00
Jeff King 1140bf01ec ewah: adjust callers of ewah_read_mmap()
The return value of ewah_read_mmap() is now an ssize_t,
since we could (in theory) process up to 32GB of data. This
would never happen in practice, but a corrupt or malicious
.bitmap or index file could convince us to do so.

Let's make sure that we don't stuff the value into an int,
which would cause us to incorrectly move our pointer
forward.  We'd always move too little, since negative values
are used for reporting errors. So the worst case is just
that we end up reporting a corrupt file, not an
out-of-bounds read.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-18 09:13:57 -07:00
Junio C Hamano 42c8ce1c49 Merge branch 'bc/object-id'
Conversion from uchar[20] to struct object_id continues.

* bc/object-id: (42 commits)
  merge-one-file: compute empty blob object ID
  add--interactive: compute the empty tree value
  Update shell scripts to compute empty tree object ID
  sha1_file: only expose empty object constants through git_hash_algo
  dir: use the_hash_algo for empty blob object ID
  sequencer: use the_hash_algo for empty tree object ID
  cache-tree: use is_empty_tree_oid
  sha1_file: convert cached object code to struct object_id
  builtin/reset: convert use of EMPTY_TREE_SHA1_BIN
  builtin/receive-pack: convert one use of EMPTY_TREE_SHA1_HEX
  wt-status: convert two uses of EMPTY_TREE_SHA1_HEX
  submodule: convert several uses of EMPTY_TREE_SHA1_HEX
  sequencer: convert one use of EMPTY_TREE_SHA1_HEX
  merge: convert empty tree constant to the_hash_algo
  builtin/merge: switch tree functions to use object_id
  builtin/am: convert uses of EMPTY_TREE_SHA1_BIN to the_hash_algo
  sha1-file: add functions for hex empty tree and blob OIDs
  builtin/receive-pack: avoid hard-coded constants for push certs
  diff: specify abbreviation size in terms of the_hash_algo
  upload-pack: replace use of several hard-coded constants
  ...
2018-05-30 14:04:10 +09:00
Junio C Hamano 7913f53b56 Sync with Git 2.17.1
* maint: (25 commits)
  Git 2.17.1
  Git 2.16.4
  Git 2.15.2
  Git 2.14.4
  Git 2.13.7
  fsck: complain when .gitmodules is a symlink
  index-pack: check .gitmodules files with --strict
  unpack-objects: call fsck_finish() after fscking objects
  fsck: call fsck_finish() after fscking objects
  fsck: check .gitmodules content
  fsck: handle promisor objects in .gitmodules check
  fsck: detect gitmodules files
  fsck: actually fsck blob data
  fsck: simplify ".git" check
  index-pack: make fsck error message more specific
  verify_path: disallow symlinks in .gitmodules
  update-index: stat updated files earlier
  verify_dotfile: mention case-insensitivity in comment
  verify_path: drop clever fallthrough
  skip_prefix: add case-insensitive variant
  ...
2018-05-29 17:10:05 +09:00
Junio C Hamano 68f95b26e4 Sync with Git 2.16.4
* maint-2.16:
  Git 2.16.4
  Git 2.15.2
  Git 2.14.4
  Git 2.13.7
  verify_path: disallow symlinks in .gitmodules
  update-index: stat updated files earlier
  verify_dotfile: mention case-insensitivity in comment
  verify_path: drop clever fallthrough
  skip_prefix: add case-insensitive variant
  is_{hfs,ntfs}_dotgitmodules: add tests
  is_ntfs_dotgit: match other .git files
  is_hfs_dotgit: match other .git files
  is_ntfs_dotgit: use a size_t for traversing string
  submodule-config: verify submodule names as paths
2018-05-22 14:25:26 +09:00
Stefan Beller cbd53a2193 object-store: move object access functions to object-store.h
This should make these functions easier to find and cache.h less
overwhelming to read.

In particular, this moves:
- read_object_file
- oid_object_info
- write_object_file

As a result, most of the codebase needs to #include object-store.h.
In this patch the #include is only added to files that would fail to
compile otherwise.  It would be better to #include wherever
identifiers from the header are used.  That can happen later
when we have better tooling for it.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-16 11:42:03 +09:00
Junio C Hamano 0c7ecb7c31 Merge branch 'sb/submodule-move-nested'
Moving a submodule that itself has submodule in it with "git mv"
forgot to make necessary adjustment to the nested sub-submodules;
now the codepath learned to recurse into the submodules.

* sb/submodule-move-nested:
  submodule: fixup nested submodules after moving the submodule
  submodule-config: remove submodule_from_cache
  submodule-config: add repository argument to submodule_from_{name, path}
  submodule-config: allow submodule_free to handle arbitrary repositories
  grep: remove "repo" arg from non-supporting funcs
  submodule.h: drop declaration of connect_work_tree_and_git_dir
2018-05-08 15:59:17 +09:00
brian m. carlson ba2df7519a dir: use the_hash_algo for empty blob object ID
To ensure that we are hash algorithm agnostic, use the_hash_algo to look
up the object ID for the empty blob instead of using the empty_tree_oid
variable.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-02 13:59:53 +09:00
brian m. carlson 70c369cde0 dir: convert struct untracked_cache_dir to object_id
Convert the exclude_sha1 member of struct untracked_cache_dir and rename
it to exclude_oid.  Eliminate several hard-coded integral constants, and
update a function name that referred to SHA-1.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-02 13:59:51 +09:00
Stefan Beller da62f786d2 submodule: fixup nested submodules after moving the submodule
connect_work_tree_and_git_dir is used to connect a submodule worktree with
its git directory and vice versa after events that require a reconnection
such as moving around the working tree. As submodules can have nested
submodules themselves, we'd also want to fix the nested submodules when
asked to. Add an option to recurse into the nested submodules and connect
them as well.

As submodules are identified by their name (which determines their git
directory in relation to their superproject's git directory) internally
and by their path in the working tree of the superproject, we need to
make sure that the mapping of name <-> path is kept intact. We can do
that in the git-mv command by writing out the gitmodules file first
and then forcing a reload of the submodule config machinery.

Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-29 09:44:51 -07:00
brian m. carlson b4f5aca40e sha1_file: convert read_sha1_file to struct object_id
Convert read_sha1_file to take a pointer to struct object_id and rename
it read_object_file.  Do the same for read_sha1_file_extended.

Convert one use in grep.c to use the new function without any other code
change, since the pointer being passed is a void pointer that is already
initialized with a pointer to struct object_id.  Update the declaration
and definitions of the modified functions, and apply the following
semantic patch to convert the remaining callers:

@@
expression E1, E2, E3;
@@
- read_sha1_file(E1.hash, E2, E3)
+ read_object_file(&E1, E2, E3)

@@
expression E1, E2, E3;
@@
- read_sha1_file(E1->hash, E2, E3)
+ read_object_file(E1, E2, E3)

@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1.hash, E2, E3, E4)
+ read_object_file_extended(&E1, E2, E3, E4)

@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1->hash, E2, E3, E4)
+ read_object_file_extended(E1, E2, E3, E4)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:50 -07:00
Junio C Hamano cdda65acae Merge branch 'bp/untracked-cache-noflush'
Writing out the index file when the only thing that changed in it
is the untracked cache information is often wasteful, and this has
been optimized out.

* bp/untracked-cache-noflush:
  untracked cache: use git_env_bool() not getenv() for customization
  dir.c: don't flag the index as dirty for changes to the untracked cache
2018-03-08 12:36:30 -08:00
Junio C Hamano 026336cb27 untracked cache: use git_env_bool() not getenv() for customization
GIT_DISABLE_UNTRACKED_CACHE and GIT_TEST_UNTRACKED_CACHE are only
sensed for their presense by using getenv(); use git_env_bool()
instead so that GIT_DISABLE_UNTRACKED_CACHE=false would work as
naïvely expected.

Also rename GIT_TEST_UNTRACKED_CACHE to GIT_FORCE_UNTRACKED_CACHE
to express what it does more honestly.  Forcing its use may be one
useful thing to do while testing the feature, but testing does not
have to be the only use of the knob.

While at it, avoid repeated calls to git_env_bool() by capturing the
return value from the first call in a static variable.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-28 13:27:10 -08:00
Junio C Hamano cf44c1e0a2 Merge branch 'nd/fix-untracked-cache-invalidation'
Some bugs around "untracked cache" feature have been fixed.

* nd/fix-untracked-cache-invalidation:
  dir.c: ignore paths containing .git when invalidating untracked cache
  dir.c: stop ignoring opendir() error in open_cached_dir()
  dir.c: fix missing dir invalidation in untracked code
  dir.c: avoid stat() in valid_cached_dir()
  status: add a failing test showing a core.untrackedCache bug
2018-02-27 10:33:50 -08:00
Junio C Hamano 090dbea684 Merge branch 'nd/trace-index-ops'
* nd/trace-index-ops:
  trace: measure where the time is spent in the index-heavy operations
2018-02-15 14:55:44 -08:00
Junio C Hamano 8be8342b4c Merge branch 'po/object-id'
Conversion from uchar[20] to struct object_id continues.

* po/object-id:
  sha1_file: rename hash_sha1_file_literally
  sha1_file: convert write_loose_object to object_id
  sha1_file: convert force_object_loose to object_id
  sha1_file: convert write_sha1_file to object_id
  notes: convert write_notes_tree to object_id
  notes: convert combine_notes_* to object_id
  commit: convert commit_tree* to object_id
  match-trees: convert splice_tree to object_id
  cache: clear whole hash buffer with oidclr
  sha1_file: convert hash_sha1_file to object_id
  dir: convert struct sha1_stat to use object_id
  sha1_file: convert pretend_sha1_file to object_id
2018-02-15 14:55:43 -08:00
Nguyễn Thái Ngọc Duy 0cacebf099 dir.c: ignore paths containing .git when invalidating untracked cache
read_directory() code ignores all paths named ".git" even if it's not
a valid git repository. See treat_path() for details. Since ".git" is
basically invisible to read_directory(), when we are asked to
invalidate a path that contains ".git", we can safely ignore it
because the slow path would not consider it anyway.

This helps when fsmonitor is used and we have a real ".git" repo at
worktree top. Occasionally .git/index will be updated and if the
fsmonitor hook does not filter it, untracked cache is asked to
invalidate the path ".git/index".

Without this patch, we invalidate the root directory unncessarily,
which:

- makes read_directory() fall back to slow path for root directory
  (slower)

- makes the index dirty (because UNTR extension is updated). Depending
  on the index size, writing it down could also be slow.

A note about the new "safe_path" knob. Since this new check could be
relatively expensive, avoid it when we know it's not needed. If the
path comes from the index, it can't contain ".git". If it does
contain, we may be screwed up at many more levels, not just this one.

Noticed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-07 12:27:02 -08:00
Ben Peart fc9ecbeb93 dir.c: don't flag the index as dirty for changes to the untracked cache
The untracked cache saves its current state in the UNTR index extension.
Currently, _any_ change to that state causes the index to be flagged as dirty
and written out to disk.  Unfortunately, the cost to write out the index can
exceed the savings gained by using the untracked cache.  Since it is a cache
that can be updated from the current state of the working directory, there is
no functional requirement that the index be written out for every change to the
untracked cache.

Update the untracked cache logic so that it no longer forces the index to be
written to disk except in the case where the extension is being turned on or
off.  When some other git command requires the index to be written to disk, the
untracked cache will take advantage of that to save it's updated state as well.
This results in a performance win when looked at over common sequences of git
commands (ie such as a status followed by add, commit, etc).

After this patch, all the logic to track statistics for the untracked cache
could be removed as it is only used by debug tracing used to debug the untracked
cache.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-05 12:55:49 -08:00
Nguyễn Thái Ngọc Duy ca54d9baa4 trace: measure where the time is spent in the index-heavy operations
All the known heavy code blocks are measured (except object database
access). This should help identify if an optimization is effective or
not. An unoptimized git-status would give something like below:

    0.001791141 s: read cache ...
    0.004011363 s: preload index
    0.000516161 s: refresh index
    0.003139257 s: git command: ... 'status' '--porcelain=2'
    0.006788129 s: diff-files
    0.002090267 s: diff-index
    0.001885735 s: initialize name hash
    0.032013138 s: read directory
    0.051781209 s: git command: './git' 'status'

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-02 11:20:16 -08:00
Nguyễn Thái Ngọc Duy b673155074 dir.c: stop ignoring opendir() error in open_cached_dir()
A follow-up to the recently fixed bugs in the untracked
invalidation. If opendir() fails it should show a warning, perhaps
this should die, but if this ever happens the error is probably
recoverable for the user, and dying would just make things worse.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-02 10:16:23 -08:00
Patryk Obara f070faccc1 sha1_file: convert hash_sha1_file to object_id
Convert the declaration and definition of hash_sha1_file to use
struct object_id and adjust all function calls.

Rename this function to hash_object_file.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-30 10:42:36 -08:00
Patryk Obara 4b33e60201 dir: convert struct sha1_stat to use object_id
Convert the declaration of struct sha1_stat. Adjust all usages of this
struct and replace hash{clr,cmp,cpy} with oid{clr,cmp,cpy} wherever
possible.  Rename it to struct oid_stat.

Rename static function load_sha1_stat to load_oid_stat.

Remove macro EMPTY_BLOB_SHA1_BIN, as it's no longer used.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-30 10:42:36 -08:00
Nguyễn Thái Ngọc Duy b640313110 dir.c: fix missing dir invalidation in untracked code
Let's start with how create a new directory cache after the last one
becomes invalid (e.g. because its dir mtime has changed...). In
open_cached_dir():

1. We start out with valid_cached_dir() returning false, which should
   call invalidate_directory() to put a directory state back to
   initial state, no untracked entries (untracked_nr zero), no sub
   directory traversal (dirs[].recurse zero).

2. Since the cache cannot be used, we go the slow path opendir() and
   go through items one by one via readdir(). All the directories on
   disk will be added back to the cache (if not already exist in
   dirs[]) and its flag "recurse" gets changed to one to note that
   it's part of the cached dir travesal next time.

3. By the time we reach close_cached_dir() we should have a good
   subdir list in dirs[]. Those with "recurse" flag set are the ones
   present in the on-disk directory. The directory is now marked
   "valid".

Next time read_directory() is called, since the directory is marked
valid, it will skip readdir(), go fast path and traverse through
dirs[] array instead.

Steps one and two need some tight cooperation. If a subdir is removed,
readdir() will not find it and of course we cannot examine/invalidate
it. To make sure removed directories on disk are gone from the cache,
step one must make sure recurse flag of all subdirs are zero.

But that's not true. If "valid" flag is already false, there is a
chance we go straight to the end of valid_cached_dir() without calling
invalidate_directory(). Or we fail to meet the "if (untracked-valid)"
condition and skip over the invalidate_directory().

After step 3, we mark the cache valid. Any stale subdir with incorrect
recurse flag becomes a real subdir next time we traverse the directory
using dirs[] array.

We could avoid this by making sure invalidate_directory() is always
called (therefore dirs[].recurse cleared) at the beginning of
open_cached_dir(). Which is what this patch does.

As to how we get into this situation, the key in the test is this
command

    git checkout master

where "one/file" is replaced with "one" in the index. This index
update triggers untracked_cache_invalidate_path(), which clears valid
flag of the root directory while keeping "recurse" flag on the subdir
"one" on. On the next git-status, we go through steps 1-3 above and
save an incorrect cache on disk. The second git-status blindly follows
the bad cache data and shows the problem.

This is arguably because of a bad design where "recurse" flag plays
double roles: whether a directory should be saved on disk, and whether
it is part of a directory traversal.

We need to keep recurse flag set at "checkout master" because of the
first role: we need to keep subdir caches (dir "two" for example has
not been touched at all, no reason to throw its cache away).

As long as we make sure to ignore/reset "recurse" flag at the
beginning of a directory traversal, we're good. But maybe eventually
we should separate these two roles.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-24 12:40:14 -08:00
Nguyễn Thái Ngọc Duy 2523c4be85 dir.c: avoid stat() in valid_cached_dir()
stat() may follow a symlink and return stat data of the link's target
instead of the link itself. We are concerned about the link itself.

It's kind of hard to demonstrate the bug. I think when path->buf is a
symlink, we most likely find that its target's stat data does not
match our cached one, which means we ignore the cache and fall back to
slow path.

This is performance issue, not correctness (though we could still
catch it by verifying test-dump-untracked-cache. The less unlikely
case is, link target stat data matches the cached version and we
incorrectly go fast path, ignoring real data on disk. A test for this
may involve manipulating stat data, which may be not portable.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-24 12:40:13 -08:00
SZEDER Gábor f919ffebed Use MOVE_ARRAY
Use the helper macro MOVE_ARRAY to move arrays.  This is shorter and
safer, as it automatically infers the size of elements.

Patch generated by Coccinelle and contrib/coccinelle/array.cocci in
Travis CI's static analysis build job.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-22 11:32:51 -08:00
Junio C Hamano 61061abba7 Merge branch 'jh/object-filtering'
In preparation for implementing narrow/partial clone, the object
walking machinery has been taught a way to tell it to "filter" some
objects from enumeration.

* jh/object-filtering:
  rev-list: support --no-filter argument
  list-objects-filter-options: support --no-filter
  list-objects-filter-options: fix 'keword' typo in comment
  pack-objects: add list-objects filtering
  rev-list: add list-objects filtering support
  list-objects: filter objects in traverse_commit_list
  oidset: add iterator methods to oidset
  oidmap: add oidmap iterator methods
  dir: allow exclusions from blob in addition to file
2017-12-27 11:16:21 -08:00
Jeff Hostetler 578d81d0c4 dir: allow exclusions from blob in addition to file
Refactor add_excludes() to separate the reading of the
exclude file into a buffer and the parsing of the buffer
into exclude_list items.

Add add_excludes_from_blob_to_list() to allow an exclude
file be specified with an OID without assuming a local
worktree or index exists.

Refactor read_skip_worktree_file_from_index() and add
do_read_blob() to eliminate duplication of preliminary
processing of blob contents.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-22 14:11:56 +09:00
Junio C Hamano e05336bdda Merge branch 'bp/fsmonitor'
We learned to talk to watchman to speed up "git status" and other
operations that need to see which paths have been modified.

* bp/fsmonitor:
  fsmonitor: preserve utf8 filenames in fsmonitor-watchman log
  fsmonitor: read entirety of watchman output
  fsmonitor: MINGW support for watchman integration
  fsmonitor: add a performance test
  fsmonitor: add a sample integration script for Watchman
  fsmonitor: add test cases for fsmonitor extension
  split-index: disable the fsmonitor extension when running the split index test
  fsmonitor: add a test tool to dump the index extension
  update-index: add fsmonitor support to update-index
  ls-files: Add support in ls-files to display the fsmonitor valid bit
  fsmonitor: add documentation for the fsmonitor extension.
  fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
  update-index: add a new --force-write-index option
  preload-index: add override to enable testing preload-index
  bswap: add 64 bit endianness helper get_be64
2017-11-21 14:07:50 +09:00
Junio C Hamano d8df70f273 Merge branch 'jm/status-ignored-files-list'
The set of paths output from "git status --ignored" was tied
closely with its "--untracked=<mode>" option, but now it can be
controlled more flexibly.  Most notably, a directory that is
ignored because it is listed to be ignored in the ignore/exclude
mechanism can be handled differently from a directory that ends up
to be ignored only because all files in it are ignored.

* jm/status-ignored-files-list:
  status: test ignored modes
  status: document options to show matching ignored files
  status: report matching ignored and normal untracked
  status: add option to show ignored files differently
2017-11-13 14:44:59 +09:00
Junio C Hamano e7e456f500 Merge branch 'bc/object-id'
Conversion from uchar[20] to struct object_id continues.

* bc/object-id: (25 commits)
  refs/files-backend: convert static functions to object_id
  refs: convert read_raw_ref backends to struct object_id
  refs: convert peel_object to struct object_id
  refs: convert resolve_ref_unsafe to struct object_id
  worktree: convert struct worktree to object_id
  refs: convert resolve_gitlink_ref to struct object_id
  Convert remaining callers of resolve_gitlink_ref to object_id
  sha1_file: convert index_path and index_fd to struct object_id
  refs: convert reflog_expire parameter to struct object_id
  refs: convert read_ref_at to struct object_id
  refs: convert peel_ref to struct object_id
  builtin/pack-objects: convert to struct object_id
  pack-bitmap: convert traverse_bitmap_commit_list to object_id
  refs: convert dwim_log to struct object_id
  builtin/reflog: convert remaining unsigned char uses to object_id
  refs: convert dwim_ref and expand_ref to struct object_id
  refs: convert read_ref and read_ref_full to object_id
  refs: convert resolve_refdup and refs_resolve_refdup to struct object_id
  Convert check_connected to use struct object_id
  refs: update ref transactions to use struct object_id
  ...
2017-11-06 14:24:27 +09:00
Junio C Hamano da7996aaf7 Merge branch 'js/submodule-in-excluded'
"git status --ignored -u" did not stop at a working tree of a
separate project that is embedded in an ignored directory and
listed files in that other project, instead of just showing the
directory itself as ignored.

* js/submodule-in-excluded:
  status: do not get confused by submodules in excluded directories
2017-11-06 13:11:26 +09:00
Jameson Miller 07966ed19e status: report matching ignored and normal untracked
Teach status command to handle `--ignored=matching` with
`--untracked-files=normal`

Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-31 11:54:21 +09:00
Jameson Miller eec0f7f2b7 status: add option to show ignored files differently
Teach the status command more flexibility in how ignored files are
reported. Currently, the reporting of ignored files and untracked
files are linked. You cannot control how ignored files are reported
independently of how untracked files are reported (i.e. `all` vs
`normal`). This makes it impossible to show untracked files with the
`all` option, but show ignored files with the `normal` option.

This work 1) adds the ability to control the reporting of ignored
files independently of untracked files and 2) introduces the concept
of status reporting ignored paths that explicitly match an ignored
pattern. There are 2 benefits to these changes: 1) if a consumer needs
all untracked files but not all ignored files, there is a performance
benefit to not scanning all contents of an ignored directory and 2)
returning ignored files that explicitly match a path allow a consumer
to make more informed decisions about when a status result might be
stale.

This commit implements --ignored=matching with --untracked-files=all.
The following commit will implement --ignored=matching with
--untracked=files=normal.

As an example of where this flexibility could be useful is that our
application (Visual Studio) runs the status command and presents the
output. It shows all untracked files individually (e.g. using the
'--untracked-files==all' option), and would like to know about which
paths are ignored. It uses information about ignored paths to make
decisions about when the status result might have changed.
Additionally, many projects place build output into directories inside
a repository's working directory (e.g. in "bin/" and "obj/"
directories). Normal usage is to explicitly ignore these 2 directory
names in the .gitignore file (rather than or in addition to the *.obj
pattern).If an application could know that these directories are
explicitly ignored, it could infer that all contents are ignored as
well and make better informed decisions about files in these
directories. It could infer that any changes under these paths would
not affect the output of status. Additionally, there can be a
significant performance benefit by avoiding scanning through ignored
directories.

When status is set to report matching ignored files, it has the
following behavior. Ignored files and directories that explicitly
match an exclude pattern are reported. If an ignored directory matches
an exclude pattern, then the path of the directory is returned. If a
directory does not match an exclude pattern, but all of its contents
are ignored, then the contained files are reported instead of the
directory.

Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-31 11:54:21 +09:00
Johannes Schindelin fadb4820c4 status: do not get confused by submodules in excluded directories
We meticulously pass the `exclude` flag to the `treat_directory()`
function so that we can indicate that files in it are excluded rather
than untracked when recursing.

But we did not yet treat submodules the same way.

Because of that, `git status --ignored --untracked` with a submodule
`submodule` in a gitignored `tracked/` would show the submodule in the
"Untracked files" section, e.g.

	On branch master
	Untracked files:
	  (use "git add <file>..." to include in what will be committed)

		tracked/submodule/

	Ignored files:
	  (use "git add -f <file>..." to include in what will be committed)

		tracked/submodule/initial.t

Instead, we would want it to show the submodule in the "Ignored files"
section:

	On branch master
	Ignored files:
	  (use "git add -f <file>..." to include in what will be committed)

		tracked/submodule/

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-26 11:29:06 +09:00
brian m. carlson a98e6101f0 refs: convert resolve_gitlink_ref to struct object_id
Convert the declaration and definition of resolve_gitlink_ref to use
struct object_id and apply the following semantic patch:

@@
expression E1, E2, E3;
@@
- resolve_gitlink_ref(E1, E2, E3.hash)
+ resolve_gitlink_ref(E1, E2, &E3)

@@
expression E1, E2, E3;
@@
- resolve_gitlink_ref(E1, E2, E3->hash)
+ resolve_gitlink_ref(E1, E2, E3)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-16 11:05:51 +09:00
brian m. carlson 1053fe829c Convert remaining callers of resolve_gitlink_ref to object_id
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-16 11:05:51 +09:00
Ben Peart 883e248b8a fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
When the index is read from disk, the fsmonitor index extension is used
to flag the last known potentially dirty index entries. The registered
core.fsmonitor command is called with the time the index was last
updated and returns the list of files changed since that time. This list
is used to flag any additional dirty cache entries and untracked cache
directories.

We can then use this valid state to speed up preload_index(),
ie_match_stat(), and refresh_cache_ent() as they do not need to lstat()
files to detect potential changes for those entries marked
CE_FSMONITOR_VALID.

In addition, if the untracked cache is turned on valid_cached_dir() can
skip checking directories for new or changed files as fsmonitor will
invalidate the cache only for those directories that have been
identified as having potential changes.

To keep the CE_FSMONITOR_VALID state accurate during git operations;
when git updates a cache entry to match the current state on disk,
it will now set the CE_FSMONITOR_VALID bit.

Inversely, anytime git changes a cache entry, the CE_FSMONITOR_VALID bit
is cleared and the corresponding untracked cache directory is marked
invalid.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-01 17:23:01 +09:00
Jameson Miller 5aaa7fd39a Improve performance of git status --ignored
Improve the performance of the directory listing logic when it wants to list
non-empty ignored directories. In order to show non-empty ignored directories,
the existing logic will recursively iterate through all contents of an ignored
directory. This change introduces the optimization to stop iterating through
the contents once it finds the first file. This can have a significant
improvement in 'git status --ignored' performance in repositories with a large
number of files in ignored directories.

For an example of the performance difference on an example repository with
196,000 files in 400 ignored directories:

| Command                    |  Time (s) |
| -------------------------- | --------- |
| git status                 |   1.2     |
| git status --ignored (old) |   3.9     |
| git status --ignored (new) |   1.4     |

Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-19 12:28:06 +09:00