A more structured way to obtain execution trace has been added.
* jh/trace2:
trace2: add for_each macros to clang-format
trace2: t/helper/test-trace2, t0210.sh, t0211.sh, t0212.sh
trace2:data: add subverb for rebase
trace2:data: add subverb to reset command
trace2:data: add subverb to checkout command
trace2:data: pack-objects: add trace2 regions
trace2:data: add trace2 instrumentation to index read/write
trace2:data: add trace2 hook classification
trace2:data: add trace2 transport child classification
trace2:data: add trace2 sub-process classification
trace2:data: add editor/pager child classification
trace2:data: add trace2 regions to wt-status
trace2: collect Windows-specific process information
trace2: create new combined trace facility
trace2: Documentation/technical/api-trace2.txt
Four new configuration variables {author,committer}.{name,email}
have been introduced to override user.{name,email} in more specific
cases.
* wh/author-committer-ident-config:
config: allow giving separate author and committer idents
"git checkout --no-overlay" can be used to trigger a new mode of
checking out paths out of the tree-ish, that allows paths that
match the pathspec that are in the current index and working tree
and are not in the tree-ish.
* tg/checkout-no-overlay:
revert "checkout: introduce checkout.overlayMode config"
checkout: introduce checkout.overlayMode config
checkout: introduce --{,no-}overlay option
checkout: factor out mark_cache_entry_for_checkout function
checkout: clarify comment
read-cache: add invalidate parameter to remove_marked_cache_entries
entry: support CE_WT_REMOVE flag in checkout_entry
entry: factor out unlink_entry function
move worktree tests to t24*
Create a new unified tracing facility for git. The eventual intent is to
replace the current trace_printf* and trace_performance* routines with a
unified set of git_trace2* routines.
In addition to the usual printf-style API, trace2 provides higer-level
event verbs with fixed-fields allowing structured data to be written.
This makes post-processing and analysis easier for external tools.
Trace2 defines 3 output targets. These are set using the environment
variables "GIT_TR2", "GIT_TR2_PERF", and "GIT_TR2_EVENT". These may be
set to "1" or to an absolute pathname (just like the current GIT_TRACE).
* GIT_TR2 is intended to be a replacement for GIT_TRACE and logs command
summary data.
* GIT_TR2_PERF is intended as a replacement for GIT_TRACE_PERFORMANCE.
It extends the output with columns for the command process, thread,
repo, absolute and relative elapsed times. It reports events for
child process start/stop, thread start/stop, and per-thread function
nesting.
* GIT_TR2_EVENT is a new structured format. It writes event data as a
series of JSON records.
Calls to trace2 functions log to any of the 3 output targets enabled
without the need to call different trace_printf* or trace_performance*
routines.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A new date format "--date=human" that morphs its output depending
on how far the time is from the current time has been introduced.
"--date=auto" can be used to use this new format when the output is
going to the pager or to the terminal and otherwise the default
format.
* lt/date-human:
Add `human` date format tests.
Add `human` format to test-tool
Add 'human' date format documentation
Replace the proposed 'auto' mode with 'auto:'
Add 'human' date format
Code cleanup.
* jk/unused-parameter-cleanup:
convert: drop path parameter from actual conversion functions
convert: drop len parameter from conversion checks
config: drop unused parameter from maybe_remove_section()
show_date_relative(): drop unused "tz" parameter
column: drop unused "opts" parameter in item_length()
create_bundle(): drop unused "header" parameter
apply: drop unused "def" parameter from find_name_gnu()
match-trees: drop unused path parameter from score functions
The assumption to work on the single "in-core index" instance has
been reduced from the library-ish part of the codebase.
* nd/the-index-final:
cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch
read-cache.c: remove the_* from index_has_changes()
merge-recursive.c: remove implicit dependency on the_repository
merge-recursive.c: remove implicit dependency on the_index
sha1-name.c: remove implicit dependency on the_index
read-cache.c: replace update_index_if_able with repo_&
read-cache.c: kill read_index()
checkout: avoid the_index when possible
repository.c: replace hold_locked_index() with repo_hold_locked_index()
notes-utils.c: remove the_repository references
grep: use grep_opt->repo instead of explict repo argument
More code in "git bisect" has been rewritten in C.
* tt/bisect-in-c:
bisect--helper: `bisect_start` shell function partially in C
bisect--helper: `get_terms` & `bisect_terms` shell function in C
bisect--helper: `bisect_next_check` shell function in C
bisect--helper: `check_and_set_terms` shell function in C
wrapper: move is_empty_file() and rename it as is_empty_or_missing_file()
bisect--helper: `bisect_write` shell function in C
bisect--helper: `bisect_reset` shell function in C
"git cat-file --batch" reported a dangling symbolic link by
mistake, when it wanted to report that a given name is ambiguous.
* dt/cat-file-batch-ambiguous:
t1512: test ambiguous cat-file --batch and --batch-output
Do not print 'dangling' for cat-file in case of ambiguity
"git add --ignore-errors" did not work as advertised and instead
worked as an unintended synonym for "git add --renormalize", which
has been fixed.
* jk/add-ignore-errors-bit-assignment-fix:
add: use separate ADD_CACHE_RENORMALIZE flag
The author.email, author.name, committer.email and committer.name
settings are analogous to the GIT_AUTHOR_* and GIT_COMMITTER_*
environment variables, but for the git config system. This allows them
to be set separately for each repository.
Git supports setting different authorship and committer
information with environment variables. However, environment variables
are set in the shell, so if different authorship and committer
information is needed for different repositories an external tool is
required.
This adds support to git config for author.email, author.name,
committer.email and committer.name settings so this information
can be set per repository.
Also, it generalizes the fmt_ident function so it can handle author vs
committer identification.
Signed-off-by: William Hubbs <williamh@gentoo.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code to walk tree objects has been taught that we may be
working with object names that are not computed with SHA-1.
* bc/tree-walk-oid:
cache: make oidcpy always copy GIT_MAX_RAWSZ bytes
tree-walk: store object_id in a separate member
match-trees: use hashcpy to splice trees
match-trees: compute buffer offset correctly when splicing
tree-walk: copy object ID before use
Add sha-256 hash and plug it through the code to allow building Git
with the "NewHash".
* bc/sha-256:
hash: add an SHA-256 implementation using OpenSSL
sha256: add an SHA-256 implementation using libgcrypt
Add a base implementation of SHA-256 support
commit-graph: convert to using the_hash_algo
t/helper: add a test helper to compute hash speed
sha1-file: add a constant for hash block size
t: make the sha1 test-tool helper generic
t: add basic tests for our SHA-1 implementation
cache: make hashcmp and hasheq work with larger hashes
hex: introduce functions to print arbitrary hashes
sha1-file: provide functions to look up hash algorithms
sha1-file: rename algorithm to "sha1"
Add the human format support to the test tool so that
GIT_TEST_DATE_NOW can be used to specify the current time.
The get_time() helper function was created and and checks the
GIT_TEST_DATE_NOW environment variable. If GIT_TEST_DATE_NOW is set,
then that date is used instead of the date returned by by
gettimeofday().
All calls to gettimeofday() were replaced by calls to get_time().
Renamed occurances of TEST_DATE_NOW to GIT_TEST_DATE_NOW since the
variable is now used in the get binary and not just in the test-tool.
Signed-off-by: Stephen P. Smith <ischis2@cox.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The timestamp we receive is in epoch time, so there's no need for a
timezone parameter to interpret it. The matching show_date() uses "tz"
to show dates in author local time, but relative dates show only the
absolute time difference. The author's location is irrelevant, barring
relativistic effects from using Git close to the speed of light.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By default, index compat macros are off from now on, because they
could hide the_index dependency.
Only those in builtin can use it.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The return values -1 and -2 from get_oid could mean two different
things, depending on whether they were from an enum returned by
get_tree_entry_follow_symlinks, or from a different code path. This
caused 'dangling' to be printed from a git cat-file in the case of an
ambiguous (-2) result.
Unify the results of get_oid* and get_tree_entry_follow_symlinks to be
one common type, with unambiguous values.
Signed-off-by: David Turner <novalis@novalis.org>
Reported-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds --date=human, which skips the timezone if it matches the
current time-zone, and doesn't print the whole date if that matches (ie
skip printing year for dates that are "this year", but also skip the
whole date itself if it's in the last few days and we can just say what
weekday it was).
For really recent dates (same day), use the relative date stamp, while
for old dates (year doesn't match), don't bother with time and timezone.
Also add 'auto' date mode, which defaults to human if we're using the
pager. So you can do
git config --add log.date auto
and your "git log" commands will show the human-legible format unless
you're scripting things.
Note that this time format still shows the timezone for recent enough
events (but not so recent that they show up as relative dates). You can
combine it with the "-local" suffix to never show timezones for an even
more simplified view.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Stephen P. Smith <ischis2@cox.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 9472935d81 (add: introduce "--renormalize", 2017-11-16) taught
git-add to pass HASH_RENORMALIZE to add_to_index(), which then passes
the flag along to index_path(). However, the flags taken by
add_to_index() and the ones taken by index_path() are distinct
namespaces. We cannot take HASH_* flags in add_to_index(), because they
overlap with the ADD_CACHE_* flags we already take (in this case,
HASH_RENORMALIZE conflicts with ADD_CACHE_IGNORE_ERRORS).
We can solve this by adding a new ADD_CACHE_RENORMALIZE flag, and using
it to set HASH_RENORMALIZE within add_to_index(). In order to make it
clear that these two flags come from distinct sets, let's also change
the name "newflags" in the function to "hash_flags".
Reported-by: Dmitriy Smirnov <dmitriy.smirnov@jetbrains.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are some situations in which we want to store an object ID into
struct object_id without the_hash_algo necessarily being set correctly.
One such case is when cloning a repository, where we must read refs from
the remote side without having a repository from which to read the
preferred algorithm.
In this cases, we may have the_hash_algo set to SHA-1, which is the
default, but read refs into struct object_id that are SHA-256. When
copying these values, we will want to copy them completely, not just the
first 20 bytes. Consequently, make sure that oidcpy copies the maximum
number of bytes at all times, regardless of the setting of
the_hash_algo.
Since oidcpy and hashcpy are no longer functionally identical, remove
the Cocinelle object_id transformations that convert from one into the
other.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This kills the_index dependency in get_oid_with_context() but for
get_oid() and friends, they still assume the_repository (which also
means the_index).
Unfortunately the widespread use of get_oid() will make it hard to
make the conversion now. We probably will add repo_get_oid() at some
point and limit the use of get_oid() in builtin/ instead of forcing
all get_oid() call sites to carry struct repository.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
read_index() shares the same problem as hold_locked_index(): it
assumes $GIT_DIR/index. Move all call sites to repo_read_index()
instead. read_index_preload() and read_index_unmerged() are also
killed as a consequence.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
hold_locked_index() assumes the index path at $GIT_DIR/index. This is
not good for places that take an arbitrary index_state instead of
the_index, which is basically everywhere except builtin/.
Replace it with repo_hold_locked_index(). hold_locked_index() remains
as a wrapper around repo_hold_locked_index() to reduce changes in builtin/
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As with the open/map/close functions for loose objects that were
recently converted, the functions for parsing the loose object stream
use the name "sha1" and a bare "unsigned char *". Let's fix that so that
unpack_sha1_header() becomes unpack_loose_header(), etc.
These conversions are less clear-cut than the file access functions.
You could argue that the they are parsing Git's canonical object format
(i.e., "type size\0contents", over which we compute the hash), which is
not strictly tied to loose storage. But in practice these functions are
used only for loose objects, and using the term "loose_header" (instead
of "object_header") distinguishes it from the object header found in
packfiles (which contains the same information in a different format).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit abef9020e3 (sha1_file: convert sha1_object_info* to object_id,
2018-03-12) renamed the function to oid_object_info(), but missed some
comments which mention it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When marking cache entries for removal, and later removing them all at
once using 'remove_marked_cache_entries()', cache entries currently
have to be invalidated manually in the cache tree and in the untracked
cache.
Add an invalidate flag to the function. With the flag set, the
function will take care of invalidating the path in the cache tree and
in the untracked cache.
Note that the current callsites already do the invalidation properly
in other places, so we're just passing 0 from there to keep the status
quo.
This will be useful in a subsequent commit.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Factor out the 'unlink_entry()' function from unpack-trees.c to
entry.c. It will be used in other places as well in subsequent
steps.
As it's no longer a static function, also move the documentation to
the header file to make it more discoverable.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
is_empty_file() can help to refactor a lot of code. This will be very
helpful in porting "git bisect" to C.
Suggested-by: Torsten Bögershausen <tboegi@web.de>
Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
SHA-1 is weak and we need to transition to a new hash function. For
some time, we have referred to this new function as NewHash. Recently,
we decided to pick SHA-256 as NewHash. The reasons behind the choice of
SHA-256 are outlined in the thread starting at [1] and in the commit
history for the hash function transition document.
Add a basic implementation of SHA-256 based off libtomcrypt, which is in
the public domain. Optimize it and restructure it to meet our coding
standards. Pull in the update and final functions from the SHA-1 block
implementation, as we know these function correctly with all compilers.
This implementation is slower than SHA-1, but more performant
implementations will be introduced in future commits.
Wire up SHA-256 in the list of hash algorithms, and add a test that the
algorithm works correctly.
Note that with this patch, it is still not possible to switch to using
SHA-256 in Git. Additional patches are needed to prepare the code to
handle a larger hash algorithm and further test fixes are needed.
[1] https://public-inbox.org/git/20180609224913.GC38834@genre.crustytoothpaste.net/
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is one place we need the hash algorithm block size: the HMAC code
for push certs. Expose this constant in struct git_hash_algo and expose
values for SHA-1 and for the largest value of any hash.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 183a638b7d ("hashcmp: assert constant hash size", 2018-08-23), we
modified hashcmp to assert that the hash size was always 20 to help it
optimize and inline calls to memcmp. In a future series, we replaced
many calls to hashcmp and oidcmp with calls to hasheq and oideq to
improve inlining further.
However, we want to support hash algorithms other than SHA-1, namely
SHA-256. When doing so, we must handle the case where these values are
32 bytes long as well as 20. Adjust hashcmp to handle two cases:
20-byte matches, and maximum-size matches. Therefore, when we include
SHA-256, we'll automatically handle it properly, while at the same time
teaching the compiler that there are only two possible options to
consider. This will allow the compiler to write the most efficient
possible code.
Copy similar code into hasheq and perform an identical transformation.
At least with GCC 8.2.0, making hasheq defer to hashcmp when there are
two branches prevents the compiler from inlining the comparison, while
the code in this patch is inlined properly. Add a comment to avoid an
accidental performance regression from well-intentioned refactoring.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, we have functions that turn an arbitrary SHA-1 value or an
object ID into hex format, either using a static buffer or with a
user-provided buffer. Add variants of these functions that can handle
an arbitrary hash algorithm, specified by constant. Update the
documentation as well.
While we're at it, remove the "extern" declaration from this family of
functions, since it's not needed and our style now recommends against
it.
We use the variant taking the algorithm structure pointer as the
internal variant, since taking an algorithm pointer is the easiest way
to handle all of the variants in use.
Note that we maintain these functions because there are hashes which
must change based on the hash algorithm in use but are not object IDs
(such as pack checksums).
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One of the problems with "git checkout" is that it does so many
different things and could confuse people specially when we fail to
handle ambiguation correctly.
One way to help with that is tell the user what sort of operation is
actually carried out. When switching branches, we always print
something unless --quiet, either
- "HEAD is now at ..."
- "Reset branch ..."
- "Already on ..."
- "Switched to and reset ..."
- "Switched to a new branch ..."
- "Switched to branch ..."
Checking out paths however is silent. Print something so that if we
got the user intention wrong, they won't waste too much time to find
that out. For the remaining cases of checkout we now print either
- "Checked out ... paths out of the index"
- "Checked out ... paths out of <abbrev hash>"
Since the purpose of printing this is to help disambiguate. Only do it
when "--" is missing.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The helper function to refresh the cached stat information in the
in-core index has learned to perform the lstat() part of the
operation in parallel on multi-core platforms.
* bp/refresh-index-using-preload:
refresh_index: remove unnecessary calls to preload_index()
speed up refresh_index() by utilizing preload_index()
The submodule support has been updated to read from the blob at
HEAD:.gitmodules when the .gitmodules file is missing from the
working tree.
* ao/submodule-wo-gitmodules-checked-out:
t/helper: add test-submodule-nested-repo-config
submodule: support reading .gitmodules when it's not in the working tree
submodule: add a helper to check if it is safe to write to .gitmodules
t7506: clean up .gitmodules properly before setting up new scenario
submodule: use the 'submodule--helper config' command
submodule--helper: add a new 'config' subcommand
t7411: be nicer to future tests and really clean things up
t7411: merge tests 5 and 6
submodule: factor out a config_set_in_gitmodules_file_gently function
submodule: add a print_config_from_gitmodules() helper
A fourth class of configuration files (in addition to the
traditional "system wide", "per user in the $HOME directory" and
"per repository in the $GIT_DIR/config") has been introduced so
that different worktrees that share the same repository (hence the
same $GIT_DIR/config file) can use different customization.
* nd/per-worktree-config:
worktree: add per-worktree config files
t1300: extract and use test_cmp_config()
Rewrite of the remaining "rebase -i" machinery in C.
* ag/rebase-i-in-c:
rebase -i: move rebase--helper modes to rebase--interactive
rebase -i: remove git-rebase--interactive.sh
rebase--interactive2: rewrite the submodes of interactive rebase in C
rebase -i: implement the main part of interactive rebase as a builtin
rebase -i: rewrite init_basic_state() in C
rebase -i: rewrite write_basic_state() in C
rebase -i: rewrite the rest of init_revisions_and_shortrevisions() in C
rebase -i: implement the logic to initialize $revisions in C
rebase -i: remove unused modes and functions
rebase -i: rewrite complete_action() in C
t3404: todo list with commented-out commands only aborts
sequencer: change the way skip_unnecessary_picks() returns its result
sequencer: refactor append_todo_help() to write its message to a buffer
rebase -i: rewrite checkout_onto() in C
rebase -i: rewrite setup_reflog_action() in C
sequencer: add a new function to silence a command, except if it fails
rebase -i: rewrite the edit-todo functionality in C
editor: add a function to launch the sequence editor
rebase -i: rewrite append_todo_help() in C
sequencer: make three functions and an enum from sequencer.c public
Speed up refresh_index() by utilizing preload_index() to do most of the work
spread across multiple threads. This works because most cache entries will
get marked CE_UPTODATE so that refresh_cache_ent() can bail out early when
called from within refresh_index().
On a Windows repo with ~200K files, this drops refresh times from 6.64
seconds to 2.87 seconds for a savings of 57%.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The codepath to support the experimental split-index mode had
remaining "racily clean" issues fixed.
* sg/split-index-racefix:
split-index: BUG() when cache entry refers to non-existing shared entry
split-index: smudge and add racily clean cache entries to split index
split-index: don't compare cached data of entries already marked for split index
split-index: count the number of deleted entries
t1700-split-index: date back files to avoid racy situations
split-index: add tests to demonstrate the racy split index problem
t1700-split-index: document why FSMONITOR is disabled in this test script
A new repo extension is added, worktreeConfig. When it is present:
- Repository config reading by default includes $GIT_DIR/config _and_
$GIT_DIR/config.worktree. "config" file remains shared in multiple
worktree setup.
- The special treatment for core.bare and core.worktree, to stay
effective only in main worktree, is gone. These config settings are
supposed to be in config.worktree.
This extension is most useful in multiple worktree setup because you
now have an option to store per-worktree config (which is either
.git/config.worktree for main worktree, or
.git/worktrees/xx/config.worktree for linked ones).
This extension can be used in single worktree mode, even though it's
pretty much useless (but this can happen after you remove all linked
worktrees and move back to single worktree).
"git config" reads from both "config" and "config.worktree" by default
(i.e. without either --user, --file...) when this extension is
present. Default writes still go to "config", not "config.worktree". A
new option --worktree is added for that (*).
Since a new repo extension is introduced, existing git binaries should
refuse to access to the repo (both from main and linked worktrees). So
they will not misread the config file (i.e. skip the config.worktree
part). They may still accidentally write to the config file anyway if
they use with "git config --file <path>".
This design places a bet on the assumption that the majority of config
variables are shared so it is the default mode. A safer move would be
default writes go to per-worktree file, so that accidental changes are
isolated.
(*) "git config --worktree" points back to "config" file when this
extension is not present and there is only one worktree so that it
works in any both single and multiple worktree setups.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git status" learns to show progress bar when refreshing the index
takes a long time.
* nd/status-refresh-progress:
status: show progress bar if refreshing the index takes too long
Various codepaths in the core-ish part learn to work on an
arbitrary in-core index structure, not necessarily the default
instance "the_index".
* nd/the-index: (23 commits)
revision.c: reduce implicit dependency the_repository
revision.c: remove implicit dependency on the_index
ws.c: remove implicit dependency on the_index
tree-diff.c: remove implicit dependency on the_index
submodule.c: remove implicit dependency on the_index
line-range.c: remove implicit dependency on the_index
userdiff.c: remove implicit dependency on the_index
rerere.c: remove implicit dependency on the_index
sha1-file.c: remove implicit dependency on the_index
patch-ids.c: remove implicit dependency on the_index
merge.c: remove implicit dependency on the_index
merge-blobs.c: remove implicit dependency on the_index
ll-merge.c: remove implicit dependency on the_index
diff-lib.c: remove implicit dependency on the_index
read-cache.c: remove implicit dependency on the_index
diff.c: remove implicit dependency on the_index
grep.c: remove implicit dependency on the_index
diff.c: remove the_index dependency in textconv() functions
blame.c: rename "repo" argument to "r"
combine-diff.c: remove implicit dependency on the_index
...
Ever since the split index feature was introduced [1], refreshing a
split index is prone to a variant of the classic racy git problem.
Consider the following sequence of commands updating the split index
when the shared index contains a racily clean cache entry, i.e. an
entry whose cached stat data matches with the corresponding file in
the worktree and the cached mtime matches that of the index:
echo "cached content" >file
git update-index --split-index --add file
echo "dirty worktree" >file # size stays the same!
# ... wait ...
git update-index --add other-file
Normally, when a non-split index is updated, then do_write_index()
(the function responsible for writing all kinds of indexes, "regular",
split, and shared) recognizes racily clean cache entries, and writes
them with smudged stat data, i.e. with file size set to 0. When
subsequent git commands read the index, they will notice that the
smudged stat data doesn't match with the file in the worktree, and
then go on to check the file's content and notice its dirtiness.
In the above example, however, in the second 'git update-index'
prepare_to_write_split_index() decides which cache entries stored only
in the shared index should be replaced in the new split index. Alas,
this function never looks out for racily clean cache entries, and
since the file's stat data in the worktree hasn't changed since the
shared index was written, it won't be replaced in the new split index.
Consequently, do_write_index() doesn't even get this racily clean
cache entry, and can't smudge its stat data. Subsequent git commands
will then see that the index has more recent mtime than the file and
that the (not smudged) cached stat data still matches with the file in
the worktree, and, ultimately, will erroneously consider the file
clean.
Modify prepare_to_write_split_index() to recognize racily clean cache
entries, and mark them to be added to the split index. Note that
there are two places where it should check raciness: first those cache
entries that are only stored in the shared index, and then those that
have been copied by unpack_trees() from the shared index while it
constructed a new index. This way do_write_index() will get these
racily clean cache entries as well, and will then write them with
smudged stat data to the new split index.
This change makes all tests in 't1701-racy-split-index.sh' pass, so
flip the two 'test_expect_failure' tests to success. Also add the '#'
(as in nr. of trial) to those tests' description that were omitted
when the tests expected failure.
Note that after this change if the index is split when it contains a
racily clean cache entry, then a smudged cache entry will be written
both to the new shared and to the new split indexes. This doesn't
affect regular git commands: as far as they are concerned this is just
an entry in the split index replacing an outdated entry in the shared
index. It did affect a few tests in 't1700-split-index.sh', though,
because they actually check which entries are stored in the split
index; a previous patch in this series has already made the necessary
adjustments in 't1700'. And racily clean cache entries and index
splitting are rare enough to not worry about the resulting duplicated
smudged cache entries, and the additional complexity required to
prevent them is not worth it.
Several tests failed occasionally when the test suite was run with
'GIT_TEST_SPLIT_INDEX=yes'. Here are those that I managed to trace
back to this racy split index problem, starting with those failing
more frequently, with a link to a failing Travis CI build job for
each. The highlighted line [2] shows when the racy file was written,
which is not always in the failing test but in a preceeding setup
test.
t3903-stash.sh:
https://travis-ci.org/git/git/jobs/385542084#L5858
t4024-diff-optimize-common.sh:
https://travis-ci.org/git/git/jobs/386531969#L3174
t4015-diff-whitespace.sh:
https://travis-ci.org/git/git/jobs/360797600#L8215
t2200-add-update.sh:
https://travis-ci.org/git/git/jobs/382543426#L3051
t0090-cache-tree.sh:
https://travis-ci.org/git/git/jobs/416583010#L3679
There might be others, e.g. perhaps 't1000-read-tree-m-3way.sh' and
others using 'lib-read-tree-m-3way.sh', but I couldn't confirm yet.
[1] In the branch leading to the merge commit v2.1.0-rc0~45 (Merge
branch 'nd/split-index', 2014-07-16).
[2] Note that those highlighted lines are in the 'after failure' fold,
and your browser might unhelpfully fold it up before you could
take a good look.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a helper function named is_writing_gitmodules_ok() to verify
that the .gitmodules file is safe to write.
The function name follows the scheme of is_staging_gitmodules_ok().
The two symbolic constants GITMODULES_INDEX and GITMODULES_HEAD are used
to get help from the C preprocessor in preventing typos, especially for
future users.
This is in preparation for a future change which teaches git how to read
.gitmodules from the index or from the current branch if the file is not
available in the working tree.
The rationale behind the check is that writing to .gitmodules requires
the file to be present in the working tree, unless a brand new
.gitmodules is being created (in which case the .gitmodules file would
not exist at all: neither in the working tree nor in the index or in the
current branch).
Expose the functionality also via a "submodule-helper config
--check-writeable" command, as git scripts may want to perform the check
before modifying submodules configuration.
Signed-off-by: Antonio Ospite <ao2@ao2.it>
Signed-off-by: Junio C Hamano <gitster@pobox.com>