This split-index mode is designed to keep write cost proportional to
the number of changes the user has made, not the size of the work
tree. (Read cost is another matter, to be dealt separately.)
This mode stores index info in a pair of $GIT_DIR/index and
$GIT_DIR/sharedindex.<SHA-1>. sharedindex is large and unchanged over
time while "index" is smaller and updated often. Format details are in
index-format.txt, although not everything is implemented in this
patch.
Shared indexes are not automatically removed, because it's unclear if
the shared index is needed by any (even temporary) indexes by just
looking at it. After a while you'll collect stale shared indexes. The
good news is one shared index is useable for long, until
$GIT_DIR/index becomes too big and sluggish that the new shared index
must be created.
The safest way to clean shared indexes is to turn off split index
mode, so shared files are all garbage, delete them all, then turn on
split index mode again.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also update SHA-1 after writing. If we do not do that, the second
read_index() will see "initialized" variable already set and not read
.git/index again, which is fine, except istate->sha1 now has a stale
value.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Other fill_stat_cache_info() is on new entries, which should set
CE_ENTRY_ADDED in cache_changed, so we're safe.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Improvements to our hash table to get it to meet the needs of the
msysgit fscache project, with some nice performance improvements.
* kb/fast-hashmap:
name-hash: retire unused index_name_exists()
hashmap.h: use 'unsigned int' for hash-codes everywhere
test-hashmap.c: drop unnecessary #includes
.gitignore: test-hashmap is a generated file
read-cache.c: fix memory leaks caused by removed cache entries
builtin/update-index.c: cleanup update_one
fix 'git update-index --verbose --again' output
remove old hash.[ch] implementation
name-hash.c: remove cache entries instead of marking them CE_UNHASHED
name-hash.c: use new hash map implementation for cache entries
name-hash.c: remove unreferenced directory entries
name-hash.c: use new hash map implementation for directories
diffcore-rename.c: use new hash map implementation
diffcore-rename.c: simplify finding exact renames
diffcore-rename.c: move code around to prepare for the next patch
buitin/describe.c: use new hash map implementation
add a hashtable implementation that supports O(1) removal
submodule: don't access the .gitmodules cache entry after removing it
Some buffers created with PATH_MAX length are not checked when being
written, and can overflow if PATH_MAX is not big enough to hold the
path.
Replace those buffers by strbufs so that their size is automatically
grown if necessary. They are created as static local variables to avoid
reallocating memory on each call. Note that prefix_filename() returns
this static buffer so each callers should copy or use the string
immediately (this is currently true).
Reported-by: Wataru Noguchi <wnoguchi.0727@gmail.com>
Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a rather longstanding corner-case bug in twoway "reset to
there" merge, which is most often seen in "git am --abort".
* jk/two-way-merge-corner-case-fix:
t1005: add test for "read-tree --reset -u A B"
t1005: reindent
unpack-trees: fix "read-tree -u --reset A B" with conflicted index
The new hashmap implementation supports remove, so really remove unused
cache entries from the name hashmap instead of just marking them.
The CE_UNHASHED flag and CE_STATE_MASK are no longer needed.
Keep the CE_HASHED flag to prevent adding entries twice.
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Note: the "ce->next = NULL;" in unpack-trees.c::do_add_entry can safely be
removed, as ce->next (now ce->ent.next) is always properly initialized in
name-hash.c::hash_index_entry.
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we call "read-tree --reset -u HEAD ORIG_HEAD", the first thing we
do with the index is to call read_cache_unmerged. Originally that
would read the index, leaving aside any unmerged entries. However, as
of d1a43f2 (reset --hard/read-tree --reset -u: remove unmerged new
paths, 2008-10-15), it actually creates a new cache entry to serve as
a placeholder, so that we later know to update the working tree.
However, we later noticed that the sha1 of that unmerged entry was
just copied from some higher stage, leaving you with random content in
the index. That was fixed by e11d7b5 ("reset --merge": fix unmerged
case, 2009-12-31), which instead puts the null sha1 into the newly
created entry, and sets a CE_CONFLICTED flag. At the same time, it
teaches the unpack-trees machinery to pay attention to this flag, so
that oneway_merge throws away the current value.
However, it did not update the code paths for twoway_merge, which is
where we end up in the two-way read-tree with --reset. We notice that
the HEAD and ORIG_HEAD versions are the same, and say "oh, we can just
reuse the current version". But that's not true. The current version
is bogus.
Notice this case and make sure we do not keep the bogus entry; either
we do not have that path in the tree we are moving to (i.e. remove
it), or we want to have the cache entry we created for the tree we are
moving to (i.e. resolve by explicitly saying the "newtree" version is
what we want).
[jc: this is from the almost year-old $gmane/212316]
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Each caller of index_name_exists() knows whether it is looking for a
directory or a file, and can avoid the unnecessary indirection of
index_name_exists() by instead calling index_dir_exists() or
index_file_exists() directly.
Invoking the appropriate search function explicitly will allow a
subsequent patch to relieve callers of the artificial burden of having
to add a trailing '/' to the pathname given to index_dir_exists().
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before overwriting the destination index, first let's discard its
contents.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Tested-by: Лежанкин Иван <abyss.7@gmail.com> wrote:
Reviewed-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I attempted to make index_state->cache[] a "const struct cache_entry **"
to find out how existing entries in index are modified and where. The
question I have is what do we do if we really need to keep track of on-disk
changes in the index. The result is
- diff-lib.c: setting CE_UPTODATE
- name-hash.c: setting CE_HASHED
- preload-index.c, read-cache.c, unpack-trees.c and
builtin/update-index: obvious
- entry.c: write_entry() may refresh the checked out entry via
fill_stat_cache_info(). This causes "non-const struct cache_entry
*" in builtin/apply.c, builtin/checkout-index.c and
builtin/checkout.c
- builtin/ls-files.c: --with-tree changes stagemask and may set
CE_UPDATE
Of these, write_entry() and its call sites are probably most
interesting because it modifies on-disk info. But this is stat info
and can be retrieved via refresh, at least for porcelain
commands. Other just uses ce_flags for local purposes.
So, keeping track of "dirty" entries is just a matter of setting a
flag in index modification functions exposed by read-cache.c. Except
unpack-trees, the rest of the code base does not do anything funny
behind read-cache's back.
The actual patch is less valueable than the summary above. But if
anyone wants to re-identify the above sites. Applying this patch, then
this:
diff --git a/cache.h b/cache.h
index 430d021..1692891 100644
--- a/cache.h
+++ b/cache.h
@@ -267,7 +267,7 @@ static inline unsigned int canon_mode(unsigned int mode)
#define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)
struct index_state {
- struct cache_entry **cache;
+ const struct cache_entry **cache;
unsigned int version;
unsigned int cache_nr, cache_alloc, cache_changed;
struct string_list *resolve_undo;
will help quickly identify them without bogus warnings.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If o->merge is set, the struct traverse_info member conflicts is shifted
left in unpack_callback, then passed through traverse_trees_recursive
to unpack_nondirectories, where it is shifted right before use. Stop
the shifting and just pass the conflict bit mask as is. Rename the
member to df_conflicts to prove that it isn't used anywhere else.
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The merge functions duplicate entries as needed and they don't free
them. Release them in unpack_nondirectories, the same function
where they were allocated, after we're done.
As suggested by Felipe, use the same loop style (zero-based for loop)
for freeing as for allocating.
Improved-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the type merge_fn_t to accept the array of cache_entry pointers
as const pointers to const pointers. This documents the fact that the
merge functions don't modify the cache_entry contents or replace any of
the pointers in the array.
Only a single cast is necessary in unpack_nondirectories because adding
two const modifiers at once is not allowed in C. The cast is safe in
that it doesn't mask any modfication; call_unpack_fn only needs the
array for reading.
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add const to struct cache_entry pointers throughout the tree which are
only used for reading. This allows callers to pass in const pointers.
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Duplicate the merge entry right away and work with that instead of
modifying the entry we got and duplicating it only at the end of
the function. Then mark that pointer const to document that we
don't modify the referenced cache_entry.
This change is safe because all existing merge functions call
merged_entry just before returning (or not at all), i.e. they don't
care about changes to the referenced cache_entry after the call.
unpack_nondirectories and unpack_index_entry, which call the merge
functions through call_unpack_fn, aren't interested in such changes
neither.
The change complicates merged_entry a bit because we have to free the
copy if we error out, but allows callers to pass a const pointer.
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While we're add it, mark the struct cache_entry pointer of add_entry
const because we only read from it and this allows callers to pass in
const pointers.
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new command "git check-ignore" for debugging .gitignore
files.
The variable names may want to get cleaned up but that can be done
in-tree.
* as/check-ignore:
clean.c, ls-files.c: respect encapsulation of exclude_list_groups
t0008: avoid brace expansion
add git-check-ignore sub-command
setup.c: document get_pathspec()
add.c: extract new die_if_path_beyond_symlink() for reuse
add.c: extract check_path_for_gitlink() from treat_gitlinks() for reuse
pathspec.c: rename newly public functions for clarity
add.c: move pathspec matchers into new pathspec.c for reuse
add.c: remove unused argument from validate_pathspec()
dir.c: improve docs for match_pathspec() and match_pathspec_depth()
dir.c: provide clear_directory() for reclaiming dir_struct memory
dir.c: keep track of where patterns came from
dir.c: use a single struct exclude_list per source of excludes
Conflicts:
builtin/ls-files.c
dir.c
Refactor and generally clean up the directory traversal API
implementation.
* as/dir-c-cleanup:
dir.c: rename free_excludes() to clear_exclude_list()
dir.c: refactor is_path_excluded()
dir.c: refactor is_excluded()
dir.c: refactor is_excluded_from_list()
dir.c: rename excluded() to is_excluded()
dir.c: rename excluded_from_list() to is_excluded_from_list()
dir.c: rename path_excluded() to is_path_excluded()
dir.c: rename cryptic 'which' variable to more consistent name
Improve documentation and comments regarding directory traversal API
api-directory-listing.txt: update to match code
Previously each exclude_list could potentially contain patterns
from multiple sources. For example dir->exclude_list[EXC_FILE]
would typically contain patterns from .git/info/exclude and
core.excludesfile, and dir->exclude_list[EXC_DIRS] could contain
patterns from multiple per-directory .gitignore files during
directory traversal (i.e. when dir->exclude_stack was more than
one item deep).
We split these composite exclude_lists up into three groups of
exclude_lists (EXC_CMDL / EXC_DIRS / EXC_FILE as before), so that each
exclude_list now contains patterns from a single source. This will
allow us to cleanly track the origin of each pattern simply by adding
a src field to struct exclude_list, rather than to struct exclude,
which would make memory management of the source string tricky in the
EXC_DIRS case where its contents are dynamically generated.
Similarly, by moving the filebuf member from struct exclude_stack to
struct exclude_list, it allows us to track and subsequently free
memory buffers allocated during the parsing of all exclude files,
rather than only tracking buffers allocated for files in the EXC_DIRS
group.
Signed-off-by: Adam Spiers <git@adamspiers.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is clearer to use a 'clear_' prefix for functions which empty
and deallocate the contents of a data structure without freeing
the structure itself, and a 'free_' prefix for functions which
also free the structure itself.
http://article.gmane.org/gmane.comp.version-control.git/206128
Signed-off-by: Adam Spiers <git@adamspiers.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Continue adopting clearer names for exclude functions. This 'is_*'
naming pattern for functions returning booleans was discussed here:
http://thread.gmane.org/gmane.comp.version-control.git/204661/focus=204924
Also adjust their callers as necessary.
Signed-off-by: Adam Spiers <git@adamspiers.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Start adopting clearer names for exclude functions. This 'is_*'
naming pattern for functions returning booleans was agreed here:
http://thread.gmane.org/gmane.comp.version-control.git/204661/focus=204924
Signed-off-by: Adam Spiers <git@adamspiers.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Although the subject line of 613f027 (read-tree -u one-way merge fix
to check out locally modified paths., 2006-05-15) mentions "read-tree
-u", it did not seem to check whether -u was in effect. Not checking
whether -u is in effect makes e.g. "read-tree --reset" lstat() the
worktree, even though the worktree stat should not matter for that
operation.
This speeds up e.g. "git reset" a little on the linux-2.6 repo (best
of five, warm cache):
Before After
real 0m0.288s 0m0.233s
user 0m0.190s 0m0.150s
sys 0m0.090s 0m0.080s
Signed-off-by: Martin von Zweigbergk <martinvonz@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Split lower bits of ce_flags field and creates a new ce_namelen
field in the in-core index structure.
* tg/ce-namelen-field:
Strip namelen out of ce_flags into a ce_namelen field
"git ls-files --exclude=t -i" did not consider anything under t/ as
excluded, as it did not pay attention to exclusion of leading paths
while walking the index. Other two users of excluded() are also
updated.
* jc/ls-files-i-dir:
dir.c: make excluded() file scope static
unpack-trees.c: use path_excluded() in check_ok_to_remove()
builtin/add.c: use path_excluded()
path_excluded(): update API to less cache-entry centric
ls-files -i: micro-optimize path_excluded()
ls-files -i: pay attention to exclusion of leading paths
Strip the name length from the ce_flags field and move it
into its own ce_namelen field in struct cache_entry. This
will both give us a tiny bit of a performance enhancement
when working with long pathnames and is a refactoring for
more readability of the code.
It enhances readability, by making it more clear what
is a flag, and where the length is stored and make it clear
which functions use stages in comparisions and which only
use the length.
It also makes CE_NAMEMASK private, so that users don't
mistakenly write the name length in the flags.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace strlen(ce->name) with ce_namelen() in a couple
of places which gives us some additional bits of
performance.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git ls-files --exclude=t -i" did not consider anything under t/
as excluded, as it did not pay attention to exclusion of leading
paths while walking the index. Other two users of excluded() are
also updated.
* jc/ls-files-i-dir:
dir.c: make excluded() file scope static
unpack-trees.c: use path_excluded() in check_ok_to_remove()
builtin/add.c: use path_excluded()
path_excluded(): update API to less cache-entry centric
ls-files -i: micro-optimize path_excluded()
ls-files -i: pay attention to exclusion of leading paths
This function is responsible for determining if a path that is not
tracked is ignored and allow "checkout" to overwrite it as needed.
It used excluded() without checking if higher level directory in the
path is ignored; correct it to use path_excluded() for this check.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
* There are uses of lower-level interface excluded_from_list() in
the codepath for narrow-checkout hack; they are supposed to be
already checking each level as they descend, and are not touched
with this patch.
By Jens Lehmann (1) and Johannes Sixt (1)
* maint:
Consistently use "superproject" instead of "supermodule"
t3404: begin "exchange commits with -p" test with correct preconditions
We fairly consistently say "superproject" and never "supermodule" these
days. But there are seven occurrences of "supermodule" left in the current
work tree. Three appear in Release Notes for 1.5.3 and 1.7.7, three in
test names and one in a C-code comment.
Replace all occurrences of "supermodule" outside of the Release Notes
(which shouldn't be changed after the fact) with "superproject" for
consistency.
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many error/warning messages had extra trailing newlines that are
unnecessary.
By Pete Wyckoff
* pw/message-cleanup:
remove blank filename in error message
remove superfluous newlines in error messages
Trivially shrinks the on-disk size of the index file to save both I/O and
checksum overhead.
The topic should give a solid base to build on further updates, with the
code refactoring in its earlier parts, and the backward compatibility
mechanism in its later parts.
* jc/index-v4:
index-v4: document the entry format
unpack-trees: preserve the index file version of original
update-index: upgrade/downgrade on-disk index version
read-cache.c: write prefix-compressed names in the index
read-cache.c: read prefix-compressed names in index on-disk version v4
read-cache.c: move code to copy incore to ondisk cache to a helper function
read-cache.c: move code to copy ondisk to incore cache to a helper function
read-cache.c: report the header version we do not understand
read-cache.c: make create_from_disk() report number of bytes it consumed
read-cache.c: allow unaligned mapping of the index file
cache.h: hide on-disk index details
varint: make it available outside the context of pack
The error handling routines add a newline. Remove
the duplicate ones in error messages.
Signed-off-by: Pete Wyckoff <pw@padd.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Otherwise "git checkout $other_branch" (or even "git checkout HEAD")
would end up writing the index out in the default format.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The allocations made by unpack_nondirectories() using create_ce_entry()
are never freed.
In the non-merge case, we duplicate them using add_entry() and later
only look at the first allocated element (src[0]), perhaps even only
by mistake. Split out the actual addition from add_entry() into the
new helper do_add_entry() and call this non-duplicating function
instead of add_entry() to avoid the leak.
Valgrind reports this for the command "git archive v1.7.9" without
the patch:
==13372== LEAK SUMMARY:
==13372== definitely lost: 230,986 bytes in 2,325 blocks
==13372== indirectly lost: 0 bytes in 0 blocks
==13372== possibly lost: 98 bytes in 1 blocks
==13372== still reachable: 2,259,198 bytes in 3,243 blocks
==13372== suppressed: 0 bytes in 0 blocks
And with the patch applied:
==13375== LEAK SUMMARY:
==13375== definitely lost: 65 bytes in 1 blocks
==13375== indirectly lost: 0 bytes in 0 blocks
==13375== possibly lost: 0 bytes in 0 blocks
==13375== still reachable: 2,364,417 bytes in 3,245 blocks
==13375== suppressed: 0 bytes in 0 blocks
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
src[0] points to the index entry in the merge case and to the first
tree to unpack in the non-merge case. We only want to mark the index
entry, so check first if we're merging.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
tree_entry_len() does not simply take two random arguments and return
a tree length. The two pointers must point to a tree item structure,
or struct name_entry. Passing random pointers will return incorrect
value.
Force callers to pass struct name_entry instead of two pointers (with
hope that they don't manually construct struct name_entry themselves)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* nd/maint-sparse-errors:
Add explanation why we do not allow to sparse checkout to empty working tree
sparse checkout: show error messages when worktree shaping fails
* mg/maint-doc-sparse-checkout:
git-read-tree.txt: correct sparse-checkout and skip-worktree description
git-read-tree.txt: language and typography fixes
unpack-trees: print "Aborting" to stderr
* jc/diff-index-unpack:
diff-index: pass pathspec down to unpack-trees machinery
unpack-trees: allow pruning with pathspec
traverse_trees(): allow pruning with pathspec