Now that the `packed-refs` backend doesn't use `ref_cache`, there is
nobody left who might want to store peeled values of references in
`ref_cache`. So remove that feature.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that everything has been changed to read what it needs directly
out of the `packed-refs` file, `packed_ref_store` doesn't need to
maintain a `ref_cache` at all. So get rid of it.
First of all, this will save a lot of memory and lots of little
allocations. Instead of needing to store complicated parsed data
structures in memory, we just mmap the file (potentially sharing
memory with other processes) and parse only what we need.
Moreover, since the mmapped access to the file reads only the parts of
the file that it needs, this might save reading all of the data from
disk at all (at least if the file starts out sorted).
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're about to stop storing packed refs in a `ref_cache`. That means
that the only way we have left to optimize `peel_ref()` is by checking
whether the reference being peeled is the one currently being iterated
over (in `current_ref_iter`), and if so, using `ref_iterator_peel()`.
But this can be done generically; it doesn't have to be implemented
per-backend.
So implement `refs_peel_ref()` in `refs.c` and remove the `peel_ref()`
method from the refs API.
This removes the last callers of a couple of functions, so delete
them. More cleanup to come...
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of reading the reference from the `ref_cache`, read it
directly from the mmapped buffer.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have an efficient way to iterate, in order, over the
mmapped contents of the `packed-refs` file, we can use that directly
to implement reference iteration for the `packed_ref_store`, rather
than iterating over the `ref_cache`. This is the next step towards
getting rid of the `ref_cache` entirely.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It doesn't actually matter now, because the references are only
iterated over to fill the associated `ref_cache`, which itself puts
them in the correct order. But we want to get rid of the `ref_cache`,
so we want to be able to iterate directly over the `packed-refs`
buffer, and then the iteration will need to be ordered correctly.
In fact, we already write the `packed-refs` file sorted, but it is
possible that other Git clients don't get it right. So let's not
assume that a `packed-refs` file is sorted unless it is explicitly
declared to be so via a `sorted` trait in its header line.
If it is *not* declared to be sorted, then scan quickly through the
file to check. If it is found to be out of order, then sort the
records into a new memory-only copy. This checking and sorting is done
quickly, without parsing the full file contents. However, it needs a
little bit of care to avoid reading past the end of the buffer even if
the `packed-refs` file is corrupt.
Since *we* always write the file correctly sorted, include that trait
when we write or rewrite a `packed-refs` file. This means that the
scan described in the previous paragraph should only have to be done
for `packed-refs` files that were written by older versions of the Git
command-line client, or by other clients that haven't yet learned to
write the `sorted` trait.
If `packed-refs` was already sorted, then (if the system allows it) we
can use the mmapped file contents directly. But if the system doesn't
allow a file that is currently mmapped to be replaced using
`rename()`, then it would be bad for us to keep the file mmapped for
any longer than necessary. So, on such systems, always make a copy of
the file contents, either as part of the sorting process, or
afterwards.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Keep a copy of the `packed-refs` file contents in memory for as long
as a `packed_ref_cache` object is in use:
* If the system allows it, keep the `packed-refs` file mmapped.
* If not (either because the system doesn't support `mmap()` at all,
or because a file that is currently mmapped cannot be replaced via
`rename()`), then make a copy of the file's contents in
heap-allocated space, and keep that around instead.
We base the choice of behavior on a new build-time switch,
`MMAP_PREVENTS_DELETE`. By default, this switch is set for Windows
variants.
After this commit, `MMAP_NONE` and `MMAP_TEMPORARY` are still handled
identically. But the next commit will introduce a difference.
This whole change is still pointless, because we only read the
`packed-refs` file contents immediately after instantiating the
`packed_ref_cache`. But that will soon change.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
No code has been changed. This will make subsequent patches more
self-contained.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a reference is broken, suppress its peeled value.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new `mmapped_ref_iterator`, which can iterate over the
references in an mmapped `packed-refs` file directly. Use this
iterator from `read_packed_refs()` to fill the packed refs cache.
Note that we are not yet willing to promise that the new iterator
generates its output in order. That doesn't matter for now, because
the packed refs cache doesn't care what order it is filled.
This change adds a lot of boilerplate without providing any obvious
benefits. The benefits will come soon, when we get rid of the
`ref_cache` for packed references altogether.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rather than store the peeling state (i.e., the one defined by traits
in the `packed-refs` file header line) in a local variable in
`read_packed_refs()`, store it permanently in `packed_ref_cache`. This
will be needed when we stop reading all packed refs at once.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of copying data from the `packed-refs` file one line at time
and then processing it, process the data in place as much as possible.
Also, instead of processing one line per iteration of the main loop,
process a reference line plus its corresponding peeled line (if
present) together.
Note that this change slightly tightens up the parsing of the
`packed-refs` file. Previously, the parser would have accepted
multiple "peeled" lines for a single reference (ignoring all but the
last one). Now it would reject that.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The old code parsed the traits in the `packed-refs` header by looking
for the string " trait " (i.e., the name of the trait with a space on
either side) in the header line. This is fragile, because if any other
implementation of Git forgets to write the trailing space, the last
trait would silently be ignored (and the error might never be
noticed).
So instead, use `string_list_split_in_place()` to split the traits
into tokens then use `unsorted_string_list_has_string()` to look for
the tokens we are interested in. This means that we can read the
traits correctly even if the header line is missing a trailing
space (or indeed, if it is missing the space after the colon, or if it
has multiple spaces somewhere).
However, older Git clients (and perhaps other Git implementations)
still require the surrounding spaces, so we still have to output the
header with a trailing space.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This tightens up the parsing a bit; previously, stray header-looking
lines would have been processed.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's still done in a pretty stupid way, involving more data copying
than necessary. That will improve in future commits.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract some helper functions for reporting errors. While we're at it,
prevent them from spewing unlimited output to the terminal. These
functions will soon have more callers.
These functions accept the problematic line as a `(ptr, len)` pair
rather than a NUL-terminated string, and `die_invalid_line()` checks
for an EOL itself, because these calling conventions will be
convenient for future callers. (Efficiency is not a concern here
because these functions are only ever called if the `packed-refs` file
is corrupt.)
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the underlying iterator is ordered, then `prefix_ref_iterator` can
stop as soon as it sees a refname that comes after the prefix. This
will rarely make a big difference now, because `ref_cache_iterator`
only iterates over the directory containing the prefix (and usually
the prefix will span a whole directory anyway). But if *hint, hint* a
future reference backend doesn't itself know where to stop the
iteration, then this optimization will be a big win.
Note that there is no guarantee that the underlying iterator doesn't
include output preceding the prefix, so we have to skip over any
unwanted references before we get to the ones that we want.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
References are iterated over in order by refname, but reflogs are not.
Some consumers of reference iteration care about the difference. Teach
each `ref_iterator` to keep track of whether its output is ordered.
`overlay_ref_iterator` is one of the picky consumers. Add a sanity
check in `overlay_ref_iterator_begin()` to verify that its inputs are
ordered.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the deletion steps unexpectedly fail, it is less bad to leave a
reference without its reflog than it is to leave a reflog without its
reference, since the latter is an invalid repository state.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now the outside world interacts with the packed ref store only via the
generic refs API plus a few lock-related functions. This allows us to
delete some functions that are no longer used, thereby completing the
encapsulation of the packed ref store.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When processing a `files_ref_store` transaction, it is sometimes
necessary to delete some references from the "packed-refs" file. Do
that using a reference transaction conducted against the
`packed_ref_store`.
This change further decouples `files_ref_store` from
`packed_ref_store`. It also fixes multiple problems, including the two
revealed by test cases added in the previous commit.
First, the old code didn't obtain the `packed-refs` lock until
`files_transaction_finish()`. This means that a failure to acquire the
`packed-refs` lock (e.g., due to contention with another process)
wasn't detected until it was too late (problems like this are supposed
to be detected in the "prepare" phase). The new code acquires the
`packed-refs` lock in `files_transaction_prepare()`, the same stage of
the processing when the loose reference locks are being acquired,
removing another reason why the "prepare" phase might succeed and the
"finish" phase might nevertheless fail.
Second, the old code deleted the loose version of a reference before
deleting any packed version of the same reference. This left a moment
when another process might think that the packed version of the
reference is current, which is incorrect. (Even worse, the packed
version of the reference can be arbitrarily old, and might even point
at an object that has since been garbage-collected.)
Third, if a reference deletion fails to acquire the `packed-refs` lock
altogether, then the old code might leave the repository in the
incorrect state (possibly corrupt) described in the previous
paragraph.
Now we activate the new "packed-refs" file (sans any references that
are being deleted) *before* deleting the corresponding loose
references. But we hold the "packed-refs" lock until after the loose
references have been finalized, thus preventing a simultaneous
"pack-refs" process from packing the loose version of the reference in
the time gap, which would otherwise defeat our attempt to delete it.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, a loose reference is deleted even before locking the
`packed-refs` file, let alone deleting any packed version of the
reference. This leads to two problems, demonstrated by two new tests:
* While a reference is being deleted, other processes might see the
old, packed value of the reference for a moment before the packed
version is deleted. Normally this would be hard to observe, but we
can prolong the window by locking the `packed-refs` file externally
before running `update-ref`, then unlocking it before `update-ref`'s
attempt to acquire the lock times out.
* If the `packed-refs` file is locked so long that `update-ref` fails
to lock it, then the reference can be left permanently in the
incorrect state described in the previous point.
In a moment, both problems will be fixed.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use a `packed_ref_store` transaction in the implementation of
`files_initial_transaction_commit()` rather than using internal
features of the packed ref store. This further decouples
`files_ref_store` from `packed_ref_store`.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At least since v1.7, the elements of the `refs_to_prune` linked list
have been leaked. Fix the leak by teaching `prune_refs()` to free the
list elements as it processes them.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the packed reference store supports transactions, we can use
a transaction to write the packed versions of references that we want
to pack. This decreases the coupling between `files_ref_store` and
`packed_ref_store`.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement `packed_delete_refs()` using a reference transaction. This
means that `files_delete_refs()` can use `refs_delete_refs()` instead
of `repack_without_refs()` to delete any packed references, decreasing
the coupling between the classes.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement the methods needed to support reference transactions for
the packed-refs backend. The new methods are not yet used.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`packed_ref_store` is going to want to store some transaction-wide
data, so make a place for it.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The old code incremented the packed ref cache reference count when
acquiring the packed-refs lock, and decremented the count when
releasing the lock. This is unnecessary because:
* Another process cannot change the packed-refs file because it is
locked.
* When we ourselves change the packed-refs file, we do so by first
modifying the packed ref-cache, and then writing the data from the
ref-cache to disk. So the packed ref-cache remains fresh because any
changes that we plan to make to the file are made in the cache first
anyway.
So there is no reason for the cache to become stale.
Moreover, the extra reference count causes a problem if we
intentionally clear the packed refs cache, as we sometimes need to do
if we change the cache in anticipation of writing a change to disk,
but then the write to disk fails. In that case, `packed_refs_unlock()`
would have no easy way to find the cache whose reference count it
needs to decrement.
This whole issue will soon become moot due to upcoming changes that
avoid changing the in-memory cache as part of updating the packed-refs
on disk, but this change makes that transition easier.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Killing "git merge --edit" before the editor returns control left
the repository in a state with MERGE_MSG but without MERGE_HEAD,
which incorrectly tells the subsequent "git commit" that there was
a squash merge in progress. This has been fixed.
* mg/killed-merge:
merge: save merge state earlier
merge: split write_merge_state in two
merge: clarify call chain
Documentation/git-merge: explain --continue
The code to acquire a lock on a reference (e.g. while accepting a
push from a client) used to immediately fail when the reference is
already locked---now it waits for a very short while and retries,
which can make it succeed if the lock holder was holding it during
a read-only operation.
* mh/ref-lock-entry:
refs: retry acquiring reference locks for 100ms
"[gc] rerereResolved = 5.days" used to be invalid, as the variable
is defined to take an integer counting the number of days. It now
is allowed.
* jc/cutoff-config:
rerere: allow approxidate in gc.rerereResolved/gc.rerereUnresolved
rerere: represent time duration in timestamp_t internally
t4200: parameterize "rerere gc" custom expiry test
t4200: gather "rerere gc" together
t4200: make "rerere gc" test more robust
t4200: give us a clean slate after "rerere gc" tests
We used to spend more than necessary cycles allocating and freeing
piece of memory while writing each index entry out. This has been
optimized.
* kw/write-index-reduce-alloc:
read-cache: avoid allocating every ondisk entry when writing
read-cache: fix memory leak in do_write_index
perf: add test for writing the index
Code clean-up to avoid mixing values read from the .gitmodules file
and values read from the .git/config file.
* bw/submodule-config-cleanup:
submodule: remove gitmodules_config
unpack-trees: improve loading of .gitmodules
submodule-config: lazy-load a repository's .gitmodules file
submodule-config: move submodule-config functions to submodule-config.c
submodule-config: remove support for overlaying repository config
diff: stop allowing diff to have submodules configured in .git/config
submodule: remove submodule_config callback routine
unpack-trees: don't respect submodule.update
submodule: don't rely on overlayed config when setting diffopts
fetch: don't overlay config with submodule-config
submodule--helper: don't overlay config in update-clone
submodule--helper: don't overlay config in remote_submodule_branch
add, reset: ensure submodules can be added or reset
submodule: don't use submodule_from_name
t7411: check configuration parsing errors
"gitweb" shows a link to visit the 'raw' contents of blbos in the
history overview page.
* js/gitweb-raw-blob-link-in-history:
gitweb: add 'raw' blob_plain link in history overview
"git apply" that is used as a better "patch -p1" failed to apply a
taken from a file with CRLF line endings to a file with CRLF line
endings. The root cause was because it misused convert_to_git()
that tried to do "safe-crlf" processing by looking at the index
entry at the same path, which is a nonsense---in that mode, "apply"
is not working on the data in (or derived from) the index at all.
This has been fixed.
* tb/apply-with-crlf:
apply: file commited with CRLF should roundtrip diff and apply
convert: add SAFE_CRLF_KEEP_CRLF
Test update to improve coverage for "git stash" operations.
* jt/stash-tests:
stash: add a test for stashing in a detached state
stash: add a test for when apply fails during stash branch
stash: add a test for stash create with no files
"git interpret-trailers" has been taught a "--parse" and a few
other options to make it easier for scripts to grab existing
trailer lines from a commit log message.
* jk/trailers-parse:
doc/interpret-trailers: fix "the this" typo
pretty: support normalization options for %(trailers)
t4205: refactor %(trailers) tests
pretty: move trailer formatting to trailer.c
interpret-trailers: add --parse convenience option
interpret-trailers: add an option to unfold values
interpret-trailers: add an option to show only existing trailers
interpret-trailers: add an option to show only the trailers
trailer: put process_trailers() options into a struct
"git interpret-trailers" learned to take the trailer specifications
from the command line that overrides the configured values.
* pb/trailers-from-command-line:
interpret-trailers: fix documentation typo
interpret-trailers: add options for actions
trailers: introduce struct new_trailer_item
trailers: export action enums and corresponding lookup functions
A handful of bugfixes and an improvement to "diff --color-moved".
* jt/diff-color-move-fix:
diff: define block by number of alphanumeric chars
diff: respect MIN_BLOCK_LENGTH for last block
diff: avoid redundantly clearing a flag
"git diff" has been taught to optionally paint new lines that are
the same as deleted lines elsewhere differently from genuinely new
lines.
* sb/diff-color-move: (25 commits)
diff: document the new --color-moved setting
diff.c: add dimming to moved line detection
diff.c: color moved lines differently, plain mode
diff.c: color moved lines differently
diff.c: buffer all output if asked to
diff.c: emit_diff_symbol learns about DIFF_SYMBOL_SUMMARY
diff.c: emit_diff_symbol learns about DIFF_SYMBOL_STAT_SEP
diff.c: convert word diffing to use emit_diff_symbol
diff.c: convert show_stats to use emit_diff_symbol
diff.c: convert emit_binary_diff_body to use emit_diff_symbol
submodule.c: migrate diff output to use emit_diff_symbol
diff.c: emit_diff_symbol learns DIFF_SYMBOL_REWRITE_DIFF
diff.c: emit_diff_symbol learns about DIFF_SYMBOL_BINARY_FILES
diff.c: emit_diff_symbol learns DIFF_SYMBOL_HEADER
diff.c: emit_diff_symbol learns DIFF_SYMBOL_FILEPAIR_{PLUS, MINUS}
diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_INCOMPLETE
diff.c: emit_diff_symbol learns DIFF_SYMBOL_WORDS[_PORCELAIN]
diff.c: migrate emit_line_checked to use emit_diff_symbol
diff.c: emit_diff_symbol learns DIFF_SYMBOL_NO_LF_EOF
diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_FRAGINFO
...