Simplify the code for moving members inside of an array and make it more
robust by using the helper macro MOVE_ARRAY. It calculates the size
based on the specified number of elements for us and supports NULL
pointers when that number is zero. Raw memmove(3) calls with NULL can
cause the compiler to (over-eagerly) optimize out later NULL checks.
This patch was generated with contrib/coccinelle/array.cocci and spatch
(Coccinelle).
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The split index code did not honor core.sharedrepository setting
correctly.
* cc/shared-index-permfix:
t1700: make sure split-index respects core.sharedrepository
t1301: move modebits() to test-lib-functions.sh
read-cache: use shared perms when writing shared index
Since f6ecc62dbf (write_shared_index(): use tempfile module, 2015-08-10)
write_shared_index() has been using mks_tempfile() to create the
temporary file that will become the shared index.
But even before that, it looks like the functions used to create this
file didn't call adjust_shared_perm(), which means that the shared
index file has always been created with 600 permissions regardless
of the shared permission settings.
Because of that, on repositories created with `git init --shared=all`
and using the split index feature, one gets an error like:
fatal: .git/sharedindex.a52f910b489bc462f187ab572ba0086f7b5157de: index file open failed: Permission denied
when another user performs any operation that reads the shared index.
Call adjust_shared_perm() on the temporary file created by
mks_tempfile() ourselves to adjust the permission bits.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A common pattern to free a piece of memory and assign NULL to the
pointer that used to point at it has been replaced with a new
FREE_AND_NULL() macro.
* ab/free-and-null:
*.[ch] refactoring: make use of the FREE_AND_NULL() macro
coccinelle: make use of the "expression" FREE_AND_NULL() rule
coccinelle: add a rule to make "expression" code use FREE_AND_NULL()
coccinelle: make use of the "type" FREE_AND_NULL() rule
coccinelle: add a rule to make "type" code use FREE_AND_NULL()
git-compat-util: add a FREE_AND_NULL() wrapper around free(ptr); ptr = NULL
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.
* bw/config-h:
config: don't implicitly use gitdir or commondir
config: respect commondir
setup: teach discover_git_directory to respect the commondir
config: don't include config.h by default
config: remove git_config_iter
config: create config.h
Apply the result of the just-added coccinelle rule. This manually
excludes a few occurrences, mostly things that resulted in many
FREE_AND_NULL() on one line, that'll be manually fixed in a subsequent
change.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop including config.h by default in cache.h. Instead only include
config.h in those files which require use of the config system.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The timestamp of the index file is now taken after the file is
closed, to help Windows, on which a stale timestamp is reported by
fstat() on a file that is opened for writing and data was written
but not yet closed.
* jh/close-index-before-stat:
read-cache: close index.lock in do_write_index
The timestamp of the index file is now taken after the file is
closed, to help Windows, on which a stale timestamp is reported by
fstat() on a file that is opened for writing and data was written
but not yet closed.
* jh/close-index-before-stat:
read-cache: close index.lock in do_write_index
When "git checkout", "git merge", etc. manipulates the in-core
index, various pieces of information in the index extensions are
discarded from the original state, as it is usually not the case
that they are kept up-to-date and in-sync with the operation on the
main index. The untracked cache extension is copied across these
operations now, which would speed up "git status" (as long as the
cache is properly invalidated).
* dt/unpack-save-untracked-cache-extension:
unpack-trees: preserve index extensions
Plug some leaks and updates internal API used to implement the
split index feature to make it easier to avoid such a leak in the
future.
* nd/split-index-unshare:
p3400: add perf tests for rebasing many changes
split-index: add and use unshare_split_index()
Make git checkout (and other unpack_tree operations) preserve the
untracked cache. This is valuable for two reasons:
1. Often, an unpack_tree operation will not touch large parts of the
working tree, and thus most of the untracked cache will continue to be
valid.
2. Even if the untracked cache were entirely invalidated by such an
operation, the user has signaled their intention to have such a cache,
and we don't want to throw it away.
[jes: backed out the watchman-specific parts]
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code to update the cache-tree has been tightened so that we won't
accidentally write out any 0{40} entry in the tree object.
* jk/no-null-sha1-in-cache-tree:
cache-tree: reject entries with null sha1
When split-index is being used, we have two cache_entry arrays in
index_state->cache[] and index_state->split_index->base->cache[].
index_state->cache[] may share the same entries with base->cache[] so
we can quickly determine what entries are shared. This makes memory
management tricky, we can't free base->cache[] until we know
index_state->cache[] does not point to any of those entries.
unshare_split_index() is added for this purpose, to find shared
entries and either duplicate them in index_state->cache[], or discard
them. Either way it should be safe to free base->cache[] after
unshare_split_index().
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Peter Krefting <peter@softwolves.pp.se>
Signed-off-by: Jean-Noel Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach do_write_index() to close the index.lock file
before getting the mtime and updating the istate.timestamp
fields.
On Windows, a file's mtime is not updated until the file is
closed. On Linux, the mtime is set after the last flush.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The split-index code configuration code used an unsafe git_path()
function without copying its result out.
* cc/split-index-config:
read-cache: avoid using git_path() in freshen_shared_index()
"git checkout" that handles a lot of paths has been optimized by
reducing the number of unnecessary checks of paths in the
has_dir_name() function.
* jh/add-index-entry-optim:
read-cache: speed up has_dir_name (part 2)
read-cache: speed up has_dir_name (part 1)
read-cache: speed up add_index_entry during checkout
p0006-read-tree-checkout: perf test to time read-tree
read-cache: add strcmp_offset function
We generally disallow null sha1s from entering the index,
due to 4337b5856 (do not write null sha1s to on-disk index,
2012-07-28). However, we loosened that in 83bd7437c
(write_index: optionally allow broken null sha1s,
2013-08-27) so that tools like filter-branch could be used
to repair broken history.
However, we should make sure that these broken entries do
not get propagated into new trees. For most entries, we'd
catch them with the missing-object check (since presumably
the null sha1 does not exist in our object database). But
gitlink entries do not need reachability, so we may blindly
copy the entry into a bogus tree.
This patch rejects all null sha1s (with the same "invalid
entry" message that missing objects get) when building trees
from the index. It does so even for non-gitlinks, and even
when "write-tree" is given the --missing-ok flag. The null
sha1 is a special sentinel value that is already rejected in
trees by fsck; whether the object exists or not, it is an
error to put it in a tree.
Note that for this to work, we must also avoid reusing an
existing cache-tree that contains the null sha1. This patch
does so by just refusing to write out any cache tree when
the index contains a null sha1. This is blunter than we need
to be; we could just reject the subtree that contains the
offending entry. But it's not worth the complexity. The
behavior is unchanged unless you have a broken index entry,
and even then we'd refuse the whole index write unless the
emergency GIT_ALLOW_NULL_SHA1 is in use. And even then the
end result is only a performance drop (any write-tree will
have to generate the whole cache-tree from scratch).
The tests bear some explanation.
The existing test in t7009 doesn't catch this problem,
because our index-filter runs "git rm --cached", which will
try to rewrite the updated index and barf on the bogus
entry. So we never even make it to write-tree. The new test
there adds a noop index-filter, which does show the problem.
The new tests in t1601 are slightly redundant with what
filter-branch is doing under the hood in t7009. But as
they're much more direct, they're easier to reason about.
And should filter-branch ever change or go away, we'd want
to make sure that these plumbing commands behave sanely.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When performing an interactive rebase in split-index mode,
the commit message that one should rework when squashing commits
can contain some garbage instead of the usual concatenation of
both of the commit messages.
The code uses git_path() to compute the shared index filename, and
passes it to check_and_freshen_file() as its argument; there is no
guarantee that the rotating pathname buffer passed as argument will
stay valid during the life of this call. Make our own copy before
calling the function and pass the copy as its argument to avoid this
risky pattern.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach has_dir_name() to see if the path of the new item
is greater than the last path in the index array before
attempting to search for it.
has_dir_name() is looking for file/directory collisions
in the index and has to consider each sub-directory
prefix in turn. This can cause multiple binary searches
for each path.
During operations like checkout, merge_working_tree()
populates the new index in sorted order, so we expect
to be able to append in many cases.
This commit is part 2 of 2. This commit handles the
additional possible short-cuts as we look at each
sub-directory prefix.
The net-net gains for add_index_entry_with_check() and
both had_dir_name() commits are best seen for very
large repos.
Here are results for an INFLATED version of linux.git
with 1M files.
$ GIT_PERF_REPO=/mnt/test/linux_inflated.git/ ./run upstream/base HEAD ./p0006-read-tree-checkout.sh
Test upstream/base HEAD
0006.2: read-tree br_base br_ballast (1043893) 3.79(3.63+0.15) 2.68(2.52+0.15) -29.3%
0006.3: switch between br_base br_ballast (1043893) 7.55(6.58+0.44) 6.03(4.60+0.43) -20.1%
0006.4: switch between br_ballast br_ballast_plus_1 (1043893) 10.84(9.26+0.59) 8.44(7.06+0.65) -22.1%
0006.5: switch between aliases (1043893) 10.93(9.39+0.58) 10.24(7.04+0.63) -6.3%
Here are results for a synthetic repo with 4.2M files.
$ GIT_PERF_REPO=~/work/gfw/t/perf/repos/gen-many-files-10.4.3.git/ ./run HEAD~3 HEAD ./p0006-read-tree-checkout.sh
Test HEAD~3 HEAD
0006.2: read-tree br_base br_ballast (4194305) 29.96(19.26+10.50) 23.76(13.42+10.12) -20.7%
0006.3: switch between br_base br_ballast (4194305) 56.95(36.08+16.83) 45.54(25.94+15.68) -20.0%
0006.4: switch between br_ballast br_ballast_plus_1 (4194305) 90.94(51.50+31.52) 78.22(39.39+30.70) -14.0%
0006.5: switch between aliases (4194305) 93.72(51.63+34.09) 77.94(39.00+30.88) -16.8%
Results for medium repos (like linux.git) are mixed and have
more variance (probably do to disk IO unrelated to this test.
$ GIT_PERF_REPO=/mnt/test/linux.git/ ./run HEAD~3 HEAD ./p0006-read-tree-checkout.sh
Test HEAD~3 HEAD
0006.2: read-tree br_base br_ballast (57994) 0.25(0.21+0.03) 0.20(0.17+0.02) -20.0%
0006.3: switch between br_base br_ballast (57994) 10.67(6.06+2.92) 10.51(5.94+2.91) -1.5%
0006.4: switch between br_ballast br_ballast_plus_1 (57994) 0.59(0.47+0.16) 0.52(0.40+0.13) -11.9%
0006.5: switch between aliases (57994) 0.59(0.44+0.17) 0.51(0.38+0.14) -13.6%
$ GIT_PERF_REPO=/mnt/test/linux.git/ ./run HEAD~3 HEAD ./p0006-read-tree-checkout.sh
Test HEAD~3 HEAD
0006.2: read-tree br_base br_ballast (57994) 0.24(0.21+0.02) 0.21(0.18+0.02) -12.5%
0006.3: switch between br_base br_ballast (57994) 10.42(5.98+2.91) 10.66(5.86+3.09) +2.3%
0006.4: switch between br_ballast br_ballast_plus_1 (57994) 0.59(0.49+0.13) 0.53(0.37+0.16) -10.2%
0006.5: switch between aliases (57994) 0.59(0.43+0.17) 0.50(0.37+0.14) -15.3%
Results for smaller repos (like git.git) are not significant.
$ ./run HEAD~3 HEAD ./p0006-read-tree-checkout.sh
Test HEAD~3 HEAD
0006.2: read-tree br_base br_ballast (3043) 0.01(0.00+0.00) 0.01(0.00+0.00) +0.0%
0006.3: switch between br_base br_ballast (3043) 0.31(0.17+0.11) 0.29(0.19+0.08) -6.5%
0006.4: switch between br_ballast br_ballast_plus_1 (3043) 0.03(0.02+0.00) 0.03(0.02+0.00) +0.0%
0006.5: switch between aliases (3043) 0.03(0.02+0.00) 0.03(0.02+0.00) +0.0%
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach has_dir_name() to see if the path of the new item
is greater than the last path in the index array before
attempting to search for it.
has_dir_name() is looking for file/directory collisions
in the index and has to consider each sub-directory
prefix in turn. This can cause multiple binary searches
for each path.
During operations like checkout, merge_working_tree()
populates the new index in sorted order, so we expect
to be able to append in many cases.
This commit is part 1 of 2. This commit handles the top
of has_dir_name() and the trivial optimization.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach add_index_entry_with_check() to see if the path
of the new item is greater than the last path in the
index array before attempting to search for it.
During checkout, merge_working_tree() populates the new
index in sorted order, so this change will save a binary
lookups per file. This preserves the original behavior
but simply checks the last element before starting the
search.
This helps performance on very large repositories.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add strcmp_offset() function to also return the offset of the
first change.
Add unit test and helper to verify.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach git to skip verification of the SHA1-1 checksum at the end of
the index file in verify_hdr() which is called from read_index()
unless the "force_verify_index_checksum" global variable is set.
Teach fsck to force this verification.
The checksum verification is for detecting disk corruption, and for
small projects, the time it takes to compute SHA-1 is not that
significant, but for gigantic repositories this calculation adds
significant time to every command.
These effect can be seen using t/perf/p0002-read-cache.sh:
Test HEAD~1 HEAD
--------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1000 times 0.66(0.44+0.20) 0.30(0.27+0.02) -54.5%
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The experimental "split index" feature has gained a few
configuration variables to make it easier to use.
* cc/split-index-config: (22 commits)
Documentation/git-update-index: explain splitIndex.*
Documentation/config: add splitIndex.sharedIndexExpire
read-cache: use freshen_shared_index() in read_index_from()
read-cache: refactor read_index_from()
t1700: test shared index file expiration
read-cache: unlink old sharedindex files
config: add git_config_get_expiry() from gc.c
read-cache: touch shared index files when used
sha1_file: make check_and_freshen_file() non static
Documentation/config: add splitIndex.maxPercentChange
t1700: add tests for splitIndex.maxPercentChange
read-cache: regenerate shared index if necessary
config: add git_config_get_max_percent_split_change()
Documentation/git-update-index: talk about core.splitIndex config var
Documentation/config: add information for core.splitIndex
t1700: add tests for core.splitIndex
update-index: warn in case of split-index incoherency
read-cache: add and then use tweak_split_index()
split-index: add {add,remove}_split_index() functions
config: add git_config_get_split_index()
...
This way a share index file will not be garbage collected if
we still read from an index it is based from.
As we need to read the current index before creating a new
one, the tests have to be adjusted, so that we don't expect
an old shared index file to be deleted right away when we
create a new one.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It looks better and is simpler to review when we don't compute
the same things many times in the function.
It will also help make the following commit simpler.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Everytime split index is turned on, it creates a "sharedindex.XXXX"
file in the git directory. This change makes sure that shared index
files that haven't been used for a long time are removed when a new
shared index file is created.
The new "splitIndex.sharedIndexExpire" config variable is created
to tell the delay after which an unused shared index file can be
deleted. It defaults to "2.weeks.ago".
A previous commit made sure that each time a split index file is
created the mtime of the shared index file it references is updated.
This makes sure that recently used shared index file will not be
deleted.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a split-index file is created, let's update the mtime of the
shared index file that the split-index file is referencing.
In a following commit we will make shared index file expire
depending on their mtime, so updating the mtime makes sure that
the shared index file will not be deleted soon.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When writing a new split-index and there is a big number of cache
entries in the split-index compared to the shared index, it is a
good idea to regenerate the shared index.
By default when the ratio reaches 20%, we will push back all
the entries from the split-index into a new shared index file
instead of just creating a new split-index file.
The threshold can be configured using the
"splitIndex.maxPercentChange" config variable.
We need to adjust the existing tests in t1700 by setting
"splitIndex.maxPercentChange" to 100 at the beginning of t1700,
as the existing tests are assuming that the shared index is
regenerated only when `git update-index --split-index` is used.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This will make us use the split-index feature or not depending
on the value of the "core.splitIndex" config variable.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Do this by moving the existing documentation from
read-cache.c to cache.h.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Improve constness of the index_state parameter to the
'read_blob_data_from_index' function.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The codeflow of setting NOATIME and CLOEXEC on file descriptors Git
opens has been simplified.
We may want to drop the tip one, but we'll see.
* jc/git-open-cloexec:
sha1_file: stop opening files with O_NOATIME
git_open_cloexec(): use fcntl(2) w/ FD_CLOEXEC fallback
git_open(): untangle possible NOATIME and CLOEXEC interactions
Callers of the hold_locked_index() function pass 0 when they want to
prepare to write a new version of the index file without wishing to
die or emit an error message when the request fails (e.g. somebody
else already held the lock), and pass 1 when they want the call to
die upon failure.
This option is called LOCK_DIE_ON_ERROR by the underlying lockfile
API, and the hold_locked_index() function translates the paramter to
LOCK_DIE_ON_ERROR when calling the hold_lock_file_for_update().
Replace these hardcoded '1' with LOCK_DIE_ON_ERROR and stop
translating. Callers other than the ones that are replaced with
this change pass '0' to the function; no behaviour change is
intended with this patch.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
Among the callers of hold_locked_index() that passes 0:
- diff.c::refresh_index_quietly() at the end of "git diff" is an
opportunistic update; it leaks the lockfile structure but it is
just before the program exits and nobody should care.
- builtin/describe.c::cmd_describe(),
builtin/commit.c::cmd_status(),
sequencer.c::read_and_refresh_cache() are all opportunistic
updates and they are OK.
- builtin/update-index.c::cmd_update_index() takes a lock upfront
but we may end up not needing to update the index (i.e. the
entries may be fully up-to-date), in which case we do not need to
issue an error upon failure to acquire the lock. We do diagnose
and die if we indeed need to update, so it is OK.
- wt-status.c::require_clean_work_tree() IS BUGGY. It asks
silence, does not check the returned value. Compare with
callsites like cmd_describe() and cmd_status() to notice that it
is wrong to call update_index_if_able() unconditionally.
The way we structured the fallback/retry mechanism for opening with
O_NOATIME and O_CLOEXEC meant that if we failed due to lack of
support to open the file with O_NOATIME option (i.e. EINVAL), we
would still try to drop O_CLOEXEC first and retry, and then drop
O_NOATIME. A platform on which O_NOATIME is defined in the header
without support from the kernel wouldn't have a chance to open with
O_CLOEXEC option due to this code structure.
Arguably, O_CLOEXEC is more important than O_NOATIME, as the latter
is mostly about performance, while the former can affect correctness.
Instead use O_CLOEXEC to open the file, and then use fcntl(2) to set
O_NOATIME on the resulting file descriptor. open(2) itself does not
cause atime to be updated according to Linus [*1*].
The helper to do the former can be usable in the codepath in
ce_compare_data() that was recently added to open a file descriptor
with O_CLOEXEC; use it while we are at it.
*1* <CA+55aFw83E+zOd+z5h-CA-3NhrLjVr-anL6pubrSWttYx3zu8g@mail.gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This fixes "convert: add filter.<driver>.process option" (edcc8581) on
Windows.
Consider the case of a file that requires filtering and is present in
branch A but not in branch B. If A is the current HEAD and we checkout B
then the following happens:
1. ce_compare_data() opens the file
2. index_fd() detects that the file requires to run a clean filter and
calls index_stream_convert_blob()
4. index_stream_convert_blob() calls convert_to_git_filter_fd()
5. convert_to_git_filter_fd() calls apply_filter() which creates a
new long running filter process (in case it is the first file
of this kind to be filtered)
6. The new filter process inherits all file handles. This is the
default on Linux/OSX and is explicitly defined in the
`CreateProcessW` call in `mingw.c` on Windows.
7. ce_compare_data() closes the file
8. Git unlinks the file as it is not present in B
The unlink operation does not work on Windows because the filter process
has still an open handle to the file. On Linux/OSX the unlink operation
succeeds but the file descriptors still leak into the child process.
Fix this problem by opening files in read-cache with the O_CLOEXEC flag
to ensure that the file descriptor does not remain open in a newly
spawned process similar to 05d1ed6148 ("mingw: ensure temporary file
handles are not inherited by child processes", 2016-08-22).
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git add --chmod=+x <pathspec>" added recently only toggled the
executable bit for paths that are either new or modified. This has
been corrected to flip the executable bit for all paths that match
the given pathspec.
* tg/add-chmod+x-fix:
t3700-add: do not check working tree file mode without POSIXPERM
t3700-add: create subdirectory gently
add: modify already added files when --chmod is given
read-cache: introduce chmod_index_entry
update-index: add test for chmod flags
When the chmod option was added to git add, it was hooked up to the diff
machinery, meaning that it only works when the version in the index
differs from the version on disk.
As the option was supposed to mirror the chmod option in update-index,
which always changes the mode in the index, regardless of the status of
the file, make sure the option behaves the same way in git add.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As there are chmod options for both add and update-index, introduce a
new chmod_index_entry function to do the work. Use it in update-index,
while it will be used in add in the next patch.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert struct cache_entry to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib, plus
the actual change to the struct:
@@
struct cache_entry E1;
@@
- E1.sha1
+ E1.oid.hash
@@
struct cache_entry *E1;
@@
- E1->sha1
+ E1->oid.hash
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git merge" with renormalization did not work well with
merge-recursive, due to "safer crlf" conversion kicking in when it
shouldn't.
* jc/renormalize-merge-kill-safer-crlf:
merge: avoid "safer crlf" during recording of merge results
convert: unify the "auto" handling of CRLF
When merge_recursive() decides what the correct blob object merge
result for a path should be, it uses update_file_flags() helper
function to write it out to a working tree file and then calls
add_cacheinfo(). The add_cacheinfo() function in turn calls
make_cache_entry() to create a new cache entry to replace the
higher-stage entries for the path that represents the conflict.
The make_cache_entry() function calls refresh_cache_entry() to fill
in the cached stat information. To mark a cache entry as
up-to-date, the data is re-read from the file in the working tree,
and goes through convert_to_git() conversion to be compared with the
blob object name the new cache entry records.
It is important to note that this happens while the higher-stage
entries, which are going to be replaced with the new entry, are
still in the index. Unfortunately, the convert_to_git() conversion
has a misguided "safer crlf" mechanism baked in, and looks at the
existing cache entry for the path to decide how to convert the
contents in the working tree file. If our side (i.e. stage#2)
records a text blob with CRLF in it, even when the system is
configured to record LF in blobs and convert them to CRLF upon
checkout (and back to LF upon checkin), the "safer crlf" mechanism
stops us doing so.
This especially poses a problem during a renormalizing merge, where
the merge result for the path is computed by first "normalizing" the
blobs involved in the merge by using convert_to_working_tree()
followed by convert_to_git() with "safer crlf" disabled. The merge
result that is computed correctly and fed to add_cacheinfo() via
update_file_flags() does _not_ match what refresh_cache_entry() sees
by converting the working tree file via convert_to_git().
We can work this around by not refreshing the new cache entry in
make_cache_entry() called by add_cacheinfo(). After add_cacheinfo()
adds the new entry, we can call refresh_cache_entry() on that,
knowing that addition of this new cache entry would have removed the
stale cache entries that had CRLF in stage #2 that were carried over
before the renormalizing merge started and will not interfere with
the correct recording of the result.
The test update was taken from a series by Torsten Bögershausen
that attempted to fix this with a different approach.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Torsten Bögershausen <tboegi@web.de>
The executable bit will not be detected (and therefore will not be
set) for paths in a repository with `core.filemode` set to false,
though the users may still wish to add files as executable for
compatibility with other users who _do_ have `core.filemode`
functionality. For example, Windows users adding shell scripts may
wish to add them as executable for compatibility with users on
non-Windows.
Although this can be done with a plumbing command
(`git update-index --add --chmod=+x foo`), teaching the `git-add`
command allows users to set a file executable with a command that
they're already familiar with.
Signed-off-by: Edward Thomson <ethomson@edwardthomson.com>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we know that mtime on directory as given by the environment
is usable for the purpose of untracked cache, we may want the
untracked cache to be always used without any mtime test or
kernel name check being performed.
Also when we know that mtime is not usable for the purpose of
untracked cache, for example because the repo is shared over a
network file system, we may want the untracked-cache to be
automatically removed from the index.
Allow the user to express such preference by setting the
'core.untrackedCache' configuration variable, which can take
'keep', 'false', or 'true' and default to 'keep'.
When read_index_from() is called, it now adds or removes the
untracked cache in the index to respect the value of this
variable. So it does nothing if the value is `keep` or if the
variable is unset; it adds the untracked cache if the value is
`true`; and it removes the cache if the value is `false`.
`git update-index --[no-|force-]untracked-cache` still adds the
untracked cache to, or removes it, from the index, but this
shows a warning if it goes against the value of
core.untrackedCache, because the next time the index is read
the untracked cache will be added or removed if the
configuration is set to do so.
Also `--untracked-cache` used to check that the underlying
operating system and file system change `st_mtime` field of a
directory if files are added or deleted in that directory. But
because those tests take a long time, `--untracked-cache` no
longer performs them. Instead, there is now
`--test-untracked-cache` to perform the tests. This change
makes `--untracked-cache` the same as `--force-untracked-cache`.
This last change is backward incompatible and should be
mentioned in the release notes.
Helped-by: Duy Nguyen <pclouds@gmail.com>
Helped-by: Torsten Bögershausen <tboegi@web.de>
Helped-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
read-cache: Duy'sfixup
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Paths that have been told the index about with "add -N" are not
quite yet in the index, but a few commands behaved as if they
already are in a harmful way.
* nd/ita-cleanup:
grep: make it clear i-t-a entries are ignored
add and use a convenience macro ce_intent_to_add()
blame: remove obsolete comment
The name-hash subsystem that is used to cope with case insensitive
filesystems keeps track of directories and their on-filesystem
cases for all the paths in the index by holding a pointer to a
randomly chosen cache entry that is inside the directory (for its
ce->ce_name component). This pointer was not updated even when the
cache entry was removed from the index, leading to use after free.
This was fixed by recording the path for each directory instead of
borrowing cache entries and restructuring the API somewhat.
* dt/name-hash-dir-entry-fix:
name-hash: don't reuse cache_entry in dir_entry
After switching to use the tempfile module in commit f6ecc62d
(write_shared_index(): use tempfile module), no declarations from
sigchain.h are used in read-cache.c anymore. Thus, remove the #include.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop reusing cache_entry in dir_entry; doing so causes a
use-after-free bug.
During merges, we free entries that we no longer need in the
destination index. But those entries might have also been stored in
the dir_entry cache, and when a later call to add_to_index found them,
they would be used after being freed.
To prevent this, change dir_entry to store a copy of the name instead
of a pointer to a cache_entry. This entails some refactoring of code
that expects the cache_entry.
Keith McGuigan <kmcguigan@twitter.com> diagnosed this bug and wrote
the initial patch, but this version does not use any of Keith's code.
Helped-by: Keith McGuigan <kmcguigan@twitter.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "lockfile" API has been rebuilt on top of a new "tempfile" API.
* mh/tempfile:
credential-cache--daemon: use tempfile module
credential-cache--daemon: delete socket from main()
gc: use tempfile module to handle gc.pid file
lock_repo_for_gc(): compute the path to "gc.pid" only once
diff: use tempfile module
setup_temporary_shallow(): use tempfile module
write_shared_index(): use tempfile module
register_tempfile(): new function to handle an existing temporary file
tempfile: add several functions for creating temporary files
prepare_tempfile_object(): new function, extracted from create_tempfile()
tempfile: a new module for handling temporary files
commit_lock_file(): use get_locked_file_path()
lockfile: add accessor get_lock_file_path()
lockfile: add accessors get_lock_file_fd() and get_lock_file_fp()
create_bundle(): duplicate file descriptor to avoid closing it twice
lockfile: move documentation to lockfile.h and lockfile.c
We are about to move those members, so change client code to read them
through accessor functions.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Hotfix for the 'untracked-cache' topic that is already in 'master'.
* nd/untracked-cache:
read-cache: fix untracked cache invalidation when split-index is used
The configuration reader/writer uses mmap(2) interface to access
the files; when we find a directory, it barfed with "Out of memory?".
* jk/diagnose-config-mmap-failure:
xmmap(): drop "Out of memory?"
config.c: rewrite ENODEV into EISDIR when mmap fails
config.c: avoid xmmap error messages
config.c: fix mmap leak when writing config
read-cache.c: drop PROT_WRITE from mmap of index
Before this change, t7063.17 fails. The actual action though happens at
t7063.16 where the entry "two" is added back to index after being
removed in the .13. Here we expect a directory invalidate at .16 and
none at .17 where untracked cache is refreshed. But things do not go as
expected when GIT_TEST_SPLIT_INDEX is set.
The different behavior that happens at .16 when split index is used: the
entry "two", when deleted at .13, is simply marked "deleted". When .16
executes, the entry resurfaces from the version in base index. This
happens in merge_base_index() where add_index_entry() is called to add
"two" back from the base index.
This is where the bug comes from. The add_index_entry() is called with
ADD_CACHE_KEEP_CACHE_TREE flag because this version of "two" is not new,
it does not break either cache-tree or untracked cache. The code should
check this flag and not invalidate untracked cache. This causes a second
invalidation violates test expectation. The fix is obvious.
Noticed-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Once upon a time, git's in-memory representation of a cache
entry actually pointed to the mmap'd on-disk data. So in
520fc24 (Allow writing to the private index file mapping.,
2005-04-26), we specified PROT_WRITE so that we could tweak
the entries while we run (in our own MAP_PRIVATE copy-on-write
version, of course).
Later, 7a51ed6 (Make on-disk index representation separate
from in-core one, 2008-01-14) stopped doing this; we copy
the data into our in-core representation, and then drop the
mmap immediately. We can therefore drop the PROT_WRITE flag.
It's probably not hurting anything as it is, but it's
potentially confusing.
Note that we could also mark the mapping as "const" to
verify that we never write to it. However, we don't
typically do that for our other maps, as it then requires
casting to munmap() it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach the index to optionally remember already seen untracked files
to speed up "git status" in a working tree with tons of cruft.
* nd/untracked-cache: (24 commits)
git-status.txt: advertisement for untracked cache
untracked cache: guard and disable on system changes
mingw32: add uname()
t7063: tests for untracked cache
update-index: test the system before enabling untracked cache
update-index: manually enable or disable untracked cache
status: enable untracked cache
untracked-cache: temporarily disable with $GIT_DISABLE_UNTRACKED_CACHE
untracked cache: mark index dirty if untracked cache is updated
untracked cache: print stats with $GIT_TRACE_UNTRACKED_STATS
untracked cache: avoid racy timestamps
read-cache.c: split racy stat test to a separate function
untracked cache: invalidate at index addition or removal
untracked cache: load from UNTR index extension
untracked cache: save to an index extension
ewah: add convenient wrapper ewah_serialize_strbuf()
untracked cache: don't open non-existent .gitignore
untracked cache: mark what dirs should be recursed/saved
untracked cache: record/validate dir mtime and reuse cached output
untracked cache: make a wrapper around {open,read,close}dir()
...
* sb/leaks:
http: release the memory of a http pack request as well
read-cache: fix memleak
add_to_index(): free unused cache-entry
commit.c: fix a memory leak
http-push: remove unneeded cleanup
merge-recursive: fix memleaks
merge-blobs.c: fix a memleak
builtin/apply.c: fix a memleak
update-index: fix a memleak
read-cache: free cache entry in add_to_index in case of early return
The split-index mode introduced at v2.3.0-rc0~41 was broken in the
codepath to protect us against a broken reimplementation of Git
that writes an invalid index with duplicated index entries, etc.
* tg/fix-check-order-with-split-index:
read-cache: fix reading of split index
`ce` is allocated in make_cache_entry and should be freed if it is not
used any more. refresh_cache_entry as a wrapper around refresh_cache_ent
will either return
- the `ce` given as the parameter, when it was up-to-date;
- a new updated cache entry which is allocated to new memory; or
- a NULL when refreshing failed.
In the latter two cases, the original cache-entry `ce` is not used
and needs to be freed. The rule can be expressed as "if the return
value from refresh is different from the original ce, ce is no
longer used."
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We allocate a cache-entry pretty early in the function and then
decide either not to do anything when we are pretending to add, or
add it and then get an error (another possibility is obviously to
succeed).
When pretending or failing to add, we forgot to free the
cache-entry.
Noticed during a discussion on Stefan's patch to change the coding
style without fixing the issue ;-)
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This frees `ce` would be leaking in the error path.
Additionally a free is moved towards the return. This helps code
readability as we often have this pattern of freeing resources just
before return/exit and not in between the code.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The split index extension uses ewah bitmaps to mark index entries as
deleted, instead of removing them from the index directly. This can
result in an on-disk index, in which entries of stage #0 and higher
stages appear, which are removed later when the index bases are merged.
15999d0 read_index_from(): catch out of order entries when reading an
index file introduces a check which checks if the entries are in order
after each index entry is read in do_read_index. This check may however
fail when a split index is read.
Fix this by moving checking the index after we know there is no split
index or after the split index bases are successfully merged instead.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a directory is updated within the same second that its timestamp
is last saved, we cannot realize the directory has been updated by
checking timestamps. Assume the worst (something is update). See
29e4d36 (Racy GIT - 2005-12-20) for more information.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ideally we should implement untracked_cache_remove_from_index() and
untracked_cache_add_to_index() so that they update untracked cache
right away instead of invalidating it and wait for read_directory()
next time to deal with it. But that may need some more work in
unpack-trees.c. So stay simple as the first step.
The new call in add_index_entry_with_check() may look strange because
new calls usually stay close to cache_tree_invalidate_path(). We do it
a bit later than c_t_i_p() in this function because if it's about
replacing the entry with the same name, we don't care (but cache-tree
does).
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"update-index --refresh" used to leak when an entry cannot be
refreshed for whatever reason.
* sb/plug-leak-in-make-cache-entry:
read-cache.c: free cache entry when refreshing fails
This fixes a memory leak when building the cache entries as
refresh_cache_entry may decide to return NULL, but it does not
free the cache entry structure which was passed in as an argument.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point of disallowing ".git" in the index is that we
would never want to accidentally overwrite files in the
repository directory. But this means we need to respect the
filesystem's idea of when two paths are equal. The prior
commit added a helper to make such a comparison for NTFS
and FAT32; let's use it in verify_path().
We make this check optional for two reasons:
1. It restricts the set of allowable filenames, which is
unnecessary for people who are not on NTFS nor FAT32.
In practice this probably doesn't matter, though, as
the restricted names are rather obscure and almost
certainly would never come up in practice.
2. It has a minor performance penalty for every path we
insert into the index.
This patch ties the check to the core.protectNTFS config
option. Though this is expected to be most useful on Windows,
we allow it to be set everywhere, as NTFS may be mounted on
other platforms. The variable does default to on for Windows,
though.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point of disallowing ".git" in the index is that we
would never want to accidentally overwrite files in the
repository directory. But this means we need to respect the
filesystem's idea of when two paths are equal. The prior
commit added a helper to make such a comparison for HFS+;
let's use it in verify_path.
We make this check optional for two reasons:
1. It restricts the set of allowable filenames, which is
unnecessary for people who are not on HFS+. In practice
this probably doesn't matter, though, as the restricted
names are rather obscure and almost certainly would
never come up in practice.
2. It has a minor performance penalty for every path we
insert into the index.
This patch ties the check to the core.protectHFS config
option. Though this is expected to be most useful on OS X,
we allow it to be set everywhere, as HFS+ may be mounted on
other platforms. The variable does default to on for OS X,
though.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not allow ".git" to enter into the index as a path
component, because checking out the result to the working
tree may causes confusion for subsequent git commands.
However, on case-insensitive file systems, ".Git" or ".GIT"
is the same. We should catch and prevent those, too.
Note that technically we could allow this for repos on
case-sensitive filesystems. But there's not much point. It's
unlikely that anybody cares, and it creates a repository
that is unexpectedly non-portable to other systems.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the interface declaration for the functions in lockfile.c from
cache.h to a new file, lockfile.h. Add #includes where necessary (and
remove some redundant includes of cache.h by files that already
include builtin.h).
Move the documentation of the lock_file state diagram from lockfile.c
to the new header file.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
lockfile.c contains the general API for locking any file. Code
specifically about the index file doesn't belong here.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit_locked_index(), when writing to an alternate index file,
duplicates (poorly) the code in commit_lock_file(). And anyway, it
shouldn't have to know so much about the internal workings of lockfile
objects. So extract a new function commit_lock_file_to() that does the
work common to the two functions, and call it from both
commit_lock_file() and commit_locked_index().
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For now, we still make sure to allocate at least PATH_MAX characters
for the strbuf because resolve_symlink() doesn't know how to expand
the space for its return value. (That will be fixed in a moment.)
Another alternative would be to just use a strbuf as scratch space in
lock_file() but then store a pointer to the naked string in struct
lock_file. But lock_file objects are often reused. By reusing the
same strbuf, we can avoid having to reallocate the string most times
when a lock_file object is reused.
Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because remove_lock_file() can be called any time by the signal
handler, it is important that any lock_file objects that are in the
lock_file_list are always in a valid state. And since lock_file
objects are often reused (but are never removed from lock_file_list),
that means we have to be careful whenever mutating a lock_file object
to always keep it in a well-defined state.
This was formerly not the case, because part of the state was encoded
by setting lk->filename to the empty string vs. a valid filename. It
is wrong to assume that this string can be updated atomically; for
example, even
strcpy(lk->filename, value)
is unsafe. But the old code was even more reckless; for example,
strcpy(lk->filename, path);
if (!(flags & LOCK_NODEREF))
resolve_symlink(lk->filename, max_path_len);
strcat(lk->filename, ".lock");
During the call to resolve_symlink(), lk->filename contained the name
of the file that was being locked, not the name of the lockfile. If a
signal were raised during that interval, then the signal handler would
have deleted the valuable file!
We could probably continue to use the filename field to encode the
state by being careful to write characters 1..N-1 of the filename
first, and then overwrite the NUL at filename[0] with the first
character of the filename, but that would be awkward and error-prone.
So, instead of using the filename field to determine whether the
lock_file object is active, add a new field "lock_file::active" for
this purpose. Be careful to set this field only when filename really
contains the name of a file that should be deleted on cleanup.
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Suggested-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A broken reimplementation of Git could write an invalid index that
records both stage #0 and higher stage entries for the same path.
Notice and reject such an index, as there is no sensible fallback
(we do not know if the broken tool wanted to resolve and forgot to
remove higher stage entries, or if it wanted to unresolve and
forgot to remove the stage#0 entry).
* jp/index-with-corrupt-stages:
read_index_unmerged(): remove unnecessary loop index adjustment
read_index_from(): catch out of order entries when reading an index file
Update git_config() users with callback functions for a very narrow
scope with calls to config-set API that lets us query a single
variable.
* ta/config-set-2:
builtin/apply.c: replace `git_config()` with `git_config_get_string_const()`
merge-recursive.c: replace `git_config()` with `git_config_get_int()`
ll-merge.c: refactor `read_merge_config()` to use `git_config_string()`
fast-import.c: replace `git_config()` with `git_config_get_*()` family
branch.c: replace `git_config()` with `git_config_get_string()
alias.c: replace `git_config()` with `git_config_get_string()`
imap-send.c: replace `git_config()` with `git_config_get_*()` family
pager.c: replace `git_config()` with `git_config_get_value()`
builtin/gc.c: replace `git_config()` with `git_config_get_*()` family
rerere.c: replace `git_config()` with `git_config_get_*()` family
fetchpack.c: replace `git_config()` with `git_config_get_*()` family
archive.c: replace `git_config()` with `git_config_get_bool()` family
read-cache.c: replace `git_config()` with `git_config_get_*()` family
http-backend.c: replace `git_config()` with `git_config_get_bool()` family
daemon.c: replace `git_config()` with `git_config_get_bool()` family
"git add x" where x that used to be a directory has become a
symbolic link to a directory misbehaved.
* rs/refresh-beyond-symlink:
read-cache: check for leading symlinks when refreshing index