The contents of the .gitattributes files may evolve over time, but "git
check-attr" always checks attributes against them in the working tree
and/or in the index. It may be beneficial to optionally allow the users
to check attributes taken from a commit other than HEAD against paths.
Add a new flag `--source` which will allow users to check the
attributes against a commit (actually any tree-ish would do). When the
user uses this flag, we go through the stack of .gitattributes files but
instead of checking the current working tree and/or in the index, we
check the blobs from the provided tree-ish object. This allows the
command to also be used in bare repositories.
Since we use a tree-ish object, the user can pass "--source
HEAD:subdirectory" and all the attributes will be looked up as if
subdirectory was the root directory of the repository.
We cannot simply use the `<rev>:<path>` syntax without the `--source`
flag, similar to how it is used in `git show` because any non-flag
parameter before `--` is treated as an attribute and any parameter after
`--` is treated as a pathname.
The change involves creating a new function `read_attr_from_blob`, which
given the path reads the blob for the path against the provided source and
parses the attributes line by line. This function is plugged into
`read_attr()` function wherein we go through the stack of attributes
files.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Toon Claes <toon@iotcl.com>
Co-authored-by: toon@iotcl.com
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The null stream filter unsurprisingly does not look at its "filter"
argument, since it just eats bytes. But we can't drop it, since it has
to conform to the same virtual interface that real filters do. Mark the
unused parameter to appease -Wunused-parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As reported in [1] the "UNUSED(var)" macro introduced in
2174b8c75de (Merge branch 'jk/unused-annotation' into next,
2022-08-24) breaks coccinelle's parsing of our sources in files where
it occurs.
Let's instead partially go with the approach suggested in [2] of
making this not take an argument. As noted in [1] "coccinelle" will
ignore such tokens in argument lists that it doesn't know about, and
it's less of a surprise to syntax highlighters.
This undoes the "help us notice when a parameter marked as unused is
actually use" part of 9b24034754 (git-compat-util: add UNUSED macro,
2022-08-19), a subsequent commit will further tweak the macro to
implement a replacement for that functionality.
1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The start_async(), etc, functions need a "proc" callback that conforms
to a particular interface. Not every callback needs every parameter
(e.g., the caller might not even ask to open an input descriptor, in
which case there is no point in the callback looking at it). Let's mark
these for -Wunused-parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The callback passed to git_config() must conform to a particular
interface. But most callbacks don't actually look at the extra "void
*data" parameter. Let's mark the unused parameters to make
-Wunused-parameter happy.
Note there's one unusual case here in get_remote_default() where we
actually ignore the "value" parameter. That's because it's only checking
whether the option is found at all, and not parsing its value.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The warning about converting line endings is extremely confusing. Its
two sentences each use the word "will" without specifying a timeframe,
which makes it sound like both sentences are referring to the same
timeframe. On top of that, it uses the term "original line endings"
without saying whether "original" means LF or CRLF.
Rephrase the warning to be clear about when the line endings will be
changed and what they will be changed to.
On a platform whose native line endings are not CRLF (e.g. Linux), the
"git add" step in the following sequence triggers the warning in
question:
$ git config core.autocrlf true
$ echo 'Hello world!' >hello.txt
$ git add hello.txt
warning: LF will be replaced by CRLF in hello.txt
The file will have its original line endings in your working directory
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Object-file API shuffling.
* ab/object-file-api-updates:
object-file API: pass an enum to read_object_with_reference()
object-file.c: add a literal version of write_object_file_prepare()
object-file API: have hash_object_file() take "enum object_type"
object API: rename hash_object_file_literally() to write_*()
object-file API: split up and simplify check_object_signature()
object API users + docs: check <0, not !0 with check_object_signature()
object API docs: move check_object_signature() docs to cache.h
object API: correct "buf" v.s. "map" mismatch in *.c and *.h
object-file API: have write_object_file() take "enum object_type"
object-file API: add a format_object_header() function
object-file API: return "void", not "int" from hash_object_file()
object-file.c: split up declaration of unrelated variables
Change the hash_object_file() function to take an "enum
object_type".
Since a preceding commit all of its callers are passing either
"{commit,tree,blob,tag}_type", or the result of a call to type_name(),
the parse_object() caller that would pass NULL is now using
stream_object_signature().
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the "struct stream_filter_vtbl" and "struct stream_filter"
assignments in convert.c to use designated initializers.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The clean/smudge conversion code path has been prepared to better
work on platforms where ulong is narrower than size_t.
* mc/clean-smudge-with-llp64:
clean/smudge: allow clean filters to process extremely large files
odb: guard against data loss checking out a huge file
git-compat-util: introduce more size_t helpers
odb: teach read_blob_entry to use size_t
t1051: introduce a smudge filter test for extremely large files
test-lib: add prerequisite for 64-bit platforms
test-tool genzeros: generate large amounts of data more efficiently
test-genzeros: allow more than 2G zeros in Windows
The filter system allows for alterations to file contents when they're
moved between the database and the worktree. We already made sure that
it is possible for smudge filters to produce contents that are larger
than `unsigned long` can represent (which matters on systems where
`unsigned long` is narrower than `size_t`, most notably 64-bit Windows).
Now we make sure that clean filters can _consume_ contents that are
larger than that.
Note that this commit only allows clean filters' _input_ to be larger
than can be represented by `unsigned long`.
This change makes only a very minute dent into the much larger project
to teach Git to use `size_t` instead of `unsigned long` wherever
appropriate.
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Matt Cooper <vtbassmatt@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
apply_multi_file_filter and async_query_available_blobs both query
subprocess output using subprocess_read_status, which writes data into
the identically named filter_status strbuf. We add a strbuf_release to
avoid leaking their contents.
Leak output seen when running t0021 with LSAN:
Direct leak of 24 byte(s) in 1 object(s) allocated from:
#0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
#1 0xa8c2b5 in xrealloc wrapper.c:126:8
#2 0x9ff99d in strbuf_grow strbuf.c:98:2
#3 0x9ff99d in strbuf_addbuf strbuf.c:304:2
#4 0xa101d6 in subprocess_read_status sub-process.c:45:5
#5 0x77793c in apply_multi_file_filter convert.c:886:8
#6 0x77793c in apply_filter convert.c:1042:10
#7 0x77a0b5 in convert_to_git_filter_fd convert.c:1492:7
#8 0x8b48cd in index_stream_convert_blob object-file.c:2156:2
#9 0x8b48cd in index_fd object-file.c:2248:9
#10 0x597411 in hash_fd builtin/hash-object.c:43:9
#11 0x596be1 in hash_object builtin/hash-object.c:59:2
#12 0x596be1 in cmd_hash_object builtin/hash-object.c:153:3
#13 0x4ce83e in run_builtin git.c:475:11
#14 0x4ccafe in handle_builtin git.c:729:3
#15 0x4cb01c in run_argv git.c:818:4
#16 0x4cb01c in cmd_main git.c:949:19
#17 0x6bdc2d in main common-main.c:52:11
#18 0x7f42acf79349 in __libc_start_main (/lib64/libc.so.6+0x24349)
SUMMARY: AddressSanitizer: 24 byte(s) leaked in 1 allocation(s).
Direct leak of 120 byte(s) in 5 object(s) allocated from:
#0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
#1 0xa8c295 in xrealloc wrapper.c:126:8
#2 0x9ff97d in strbuf_grow strbuf.c:98:2
#3 0x9ff97d in strbuf_addbuf strbuf.c:304:2
#4 0xa101b6 in subprocess_read_status sub-process.c:45:5
#5 0x775c73 in async_query_available_blobs convert.c:960:8
#6 0x80029d in finish_delayed_checkout entry.c:183:9
#7 0xa65d1e in check_updates unpack-trees.c:493:10
#8 0xa5f469 in unpack_trees unpack-trees.c:1747:8
#9 0x525971 in checkout builtin/clone.c:815:6
#10 0x525971 in cmd_clone builtin/clone.c:1409:8
#11 0x4ce83e in run_builtin git.c:475:11
#12 0x4ccafe in handle_builtin git.c:729:3
#13 0x4cb01c in run_argv git.c:818:4
#14 0x4cb01c in cmd_main git.c:949:19
#15 0x6bdc2d in main common-main.c:52:11
#16 0x7fa253fce349 in __libc_start_main (/lib64/libc.so.6+0x24349)
SUMMARY: AddressSanitizer: 120 byte(s) leaked in 5 allocation(s).
Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Builds on top of the sparse-index infrastructure to mark operations
that are not ready to mark with the sparse index, causing them to
fall back on fully-populated index that they always have worked with.
* ds/sparse-index-protections: (47 commits)
name-hash: use expand_to_path()
sparse-index: expand_to_path()
name-hash: don't add directories to name_hash
revision: ensure full index
resolve-undo: ensure full index
read-cache: ensure full index
pathspec: ensure full index
merge-recursive: ensure full index
entry: ensure full index
dir: ensure full index
update-index: ensure full index
stash: ensure full index
rm: ensure full index
merge-index: ensure full index
ls-files: ensure full index
grep: ensure full index
fsck: ensure full index
difftool: ensure full index
commit: ensure full index
checkout: ensure full index
...
Several methods specify that they take a 'struct index_state' pointer
with the 'const' qualifier because they intend to only query the data,
not change it. However, we will be introducing a step very low in the
method stack that might modify a sparse-index to become a full index in
the case that our queries venture inside a sparse-directory entry.
This change only removes the 'const' qualifiers that are necessary for
the following change which will actually modify the implementation of
index_name_stage_pos().
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A simple IPC interface gets introduced to build services like
fsmonitor on top.
* jh/simple-ipc:
t0052: add simple-ipc tests and t/helper/test-simple-ipc tool
simple-ipc: add Unix domain socket implementation
unix-stream-server: create unix domain socket under lock
unix-socket: disallow chdir() when creating unix domain sockets
unix-socket: add backlog size option to unix_stream_listen()
unix-socket: eliminate static unix_stream_socket() helper function
simple-ipc: add win32 implementation
simple-ipc: design documentation for new IPC mechanism
pkt-line: add options argument to read_packetized_to_strbuf()
pkt-line: add PACKET_READ_GENTLE_ON_READ_ERROR option
pkt-line: do not issue flush packets in write_packetized_*()
pkt-line: eliminate the need for static buffer in packet_write_gently()
Preparatory API changes for parallel checkout.
* mt/parallel-checkout-part-1:
entry: add checkout_entry_ca() taking preloaded conv_attrs
entry: move conv_attrs lookup up to checkout_entry()
entry: extract update_ce_after_write() from write_entry()
entry: make fstat_output() and read_blob_entry() public
entry: extract a header file for entry.c functions
convert: add classification for conv_attrs struct
convert: add get_stream_filter_ca() variant
convert: add [async_]convert_to_working_tree_ca() variants
convert: make convert_attrs() and convert structs public
We had a code to diagnose and die cleanly when a required
clean/smudge filter is missing, but an assert before that
unnecessarily fired, hiding the end-user facing die() message.
* mt/cleanly-die-upon-missing-required-filter:
convert: fail gracefully upon missing clean cmd on required filter
Create `enum conv_attrs_classification` to express the different ways
that attributes are handled for a blob during checkout.
This will be used in a later commit when deciding whether to add a file
to the parallel or delayed queue during checkout. For now, we can also
use it in get_stream_filter_ca() to simplify the function (as the
classifying logic is the same).
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Like the previous patch, we will also need to call get_stream_filter()
with a precomputed `struct conv_attrs`, when we add support for parallel
checkout workers. So add the _ca() variant which takes the conversion
attributes struct as a parameter.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Separate the attribute gathering from the actual conversion by adding
_ca() variants of the conversion functions. These variants receive a
precomputed 'struct conv_attrs', not relying, thus, on an index state.
They will be used in a future patch adding parallel checkout support,
for two reasons:
- We will already load the conversion attributes in checkout_entry(),
before conversion, to decide whether a path is eligible for parallel
checkout. Therefore, it would be wasteful to load them again later,
for the actual conversion.
- The parallel workers will be responsible for reading, converting and
writing blobs to the working tree. They won't have access to the main
process' index state, so they cannot load the attributes. Instead,
they will receive the preloaded ones and call the _ca() variant of
the conversion functions. Furthermore, the attributes machinery is
optimized to handle paths in sequential order, so it's better to leave
it for the main process, anyway.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move convert_attrs() declaration from convert.c to convert.h, together
with the conv_attrs struct and the crlf_action enum. This function and
the data structures will be used outside convert.c in the upcoming
parallel checkout implementation. Note that crlf_action is renamed to
convert_crlf_action, which is more appropriate for the global namespace.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update the calling sequence of `read_packetized_to_strbuf()` to take
an options argument and not assume a fixed set of options. Update the
only existing caller accordingly to explicitly pass the
formerly-assumed flags.
The `read_packetized_to_strbuf()` function calls `packet_read()` with
a fixed set of assumed options (`PACKET_READ_GENTLE_ON_EOF`). This
assumption has been fine for the single existing caller
`apply_multi_file_filter()` in `convert.c`.
In a later commit we would like to add other callers to
`read_packetized_to_strbuf()` that need a different set of options.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the `packet_flush_gently()` call in `write_packetized_from_buf() and
`write_packetized_from_fd()` and require the caller to call it if desired.
Rename both functions to `write_packetized_from_*_no_flush()` to prevent
later merge accidents.
`write_packetized_from_buf()` currently only has one caller:
`apply_multi_file_filter()` in `convert.c`. It always wants a flush packet
to be written after writing the payload.
However, we are about to introduce a caller that wants to write many
packets before a final flush packet, so let's make the caller responsible
for emitting the flush packet.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add and apply a semantic patch for converting code that open-codes
CALLOC_ARRAY to use it instead. It shortens the code and infers the
element size automatically.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The gitattributes documentation mentions that either the clean cmd or
the smudge cmd can be left unspecified in a filter definition. However,
when the filter is marked as 'required', the absence of any one of these
two should be treated as an error. Git already fails under these
circumstances, but not always in a pleasant way: omitting a clean cmd in
a required filter triggers an assertion error which leaves the user with
a quite verbose message:
git: convert.c:1459: convert_to_git_filter_fd: Assertion "ca.drv->clean || ca.drv->process" failed.
This assertion is not really necessary, as the apply_filter() call below
it already performs the same check. And when this condition is not met,
the function returns 0, making the caller die() with a much nicer
message. (Also note that die()-ing here is the right behavior as
`would_convert_to_git_filter_fd() == true` is a precondition to use
convert_to_git_filter_fd(), and the former is only true when the filter
is required.) So remove the assertion and add two regression tests to
make sure that git fails nicely when either the smudge or clean command
is missing on a required filter.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The crlf_action parameter hasn't been used since a0ad53c181 (convert:
Correct NNO tests and missing `LF will be replaced by CRLF`,
2016-08-13), where that part of the function was hoisted out to a
separate will_convert_lf_to_crlf() helper. Let's drop the useless
parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The child_process structure has an embedded strvec for formulating
the command line argument list these days, but code that predates
the wide use of it prepared a separate char *argv[] array and
manually set the child_process.argv pointer point at it.
Teach these old-style code to lose the separate argv[] array.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We return the length to a subset of a string using an "int *"
out-parameter. This is fine most of the time, as we'd expect config keys
to be relatively short, but it could behave oddly if we had a gigantic
config key. A more appropriate type is size_t.
Let's switch over, which lets our callers use size_t as appropriate
(they are bound by our type because they must pass the out-parameter as
a pointer). This is mostly just a cleanup to make it clear this code
handles long strings correctly. In practice, our config parser already
chokes on long key names (because of a similar int/size_t mixup!).
When doing an int/size_t conversion, we have to be careful that nobody
was trying to assign a negative value to the variable. I manually
confirmed that for each case here. They tend to just feed the result to
xmemdupz() or similar; in a few cases I adjusted the parameter types for
helper functions to make sure the size_t is preserved.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have the codebase wired up to pass any additional metadata
to filters, let's collect the additional metadata that we'd like to
pass.
The two main places we pass this metadata are checkouts and archives.
In these two situations, reading HEAD isn't a valid option, since HEAD
isn't updated for checkouts until after the working tree is written and
archives can accept an arbitrary tree. In other situations, HEAD will
usually reflect the refname of the branch in current use.
We pass a smaller amount of data in other cases, such as git cat-file,
where we can really only logically know about the blob.
This commit updates only the parts of the checkout code where we don't
use unpack_trees. That function and callers of it will be handled in a
future commit.
In the archive code, we leak a small amount of memory, since nothing we
pass in the archiver argument structure is freed.
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a variety of situations where a filter process can make use of
some additional metadata. For example, some people find the ident
filter too limiting and would like to include the commit or the branch
in their smudged files. This information isn't available during
checkout as HEAD hasn't been updated at that point, and it wouldn't be
available in archives either.
Let's add a way to pass this metadata down to the filter. We pass the
blob we're operating on, the treeish (preferring the commit over the
tree if one exists), and the ref we're operating on. Note that we won't
pass this information in all cases, such as when renormalizing or when
we're performing diffs, since it doesn't make sense in those cases.
The data we currently get from the filter process looks like the
following:
command=smudge
pathname=git.c
0000
With this change, we'll get data more like this:
command=smudge
pathname=git.c
refname=refs/tags/v2.25.1
treeish=c522f061d551c9bb8684a7c3859b2ece4499b56b
blob=7be7ad34bd053884ec48923706e70c81719a8660
0000
There are a couple things to note about this approach. For operations
like checkout, treeish will always be a commit, since we cannot check
out individual trees, but for other operations, like archive, we can end
up operating on only a particular tree, so we'll provide only a tree as
the treeish. Similar comments apply for refname, since there are a
variety of cases in which we won't have a ref.
This commit wires up the code to print this information, but doesn't
pass any of it at this point. In a future commit, we'll have various
code paths pass the actual useful data down.
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some codepaths were given a repository instance as a parameter to
work in the repository, but passed the_repository instance to its
callees, which has been cleaned up (somewhat).
* mt/use-passed-repo-more-in-funcs:
sha1-file: allow check_object_signature() to handle any repo
sha1-file: pass git_hash_algo to hash_object_file()
sha1-file: pass git_hash_algo to write_object_file_prepare()
streaming: allow open_istream() to handle any repo
pack-check: use given repo's hash_algo at verify_packfile()
cache-tree: use given repo's hash_algo at verify_one()
diff: make diff_populate_filespec() honor its repo argument
Allow hash_object_file() to work on arbitrary repos by introducing a
git_hash_algo parameter. Change callers which have a struct repository
pointer in their scope to pass on the git_hash_algo from the said repo.
For all other callers, pass on the_hash_algo, which was already being
used internally at hash_object_file(). This functionality will be used
in the following patch to make check_object_signature() be able to work
on arbitrary repos (which, in turn, will be used to fix an
inconsistency at object.c:parse_object()).
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use skip_iprefix() to parse "UTF" case-insensitively instead of checking
with istarts_with(), building an upper-case version and then using
skip_prefix() on it. This gets rid of duplicate code and of a small
allocation.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code to skip "UTF" and "UTF-" prefix, when computing an advice
message, did not work correctly when the prefix was "UTF", which
has been fixed.
* rs/convert-fix-utf-without-dash:
convert: fix handling of dashless UTF prefix in validate_encoding()
Strip "UTF" and an optional dash from the start of 'upper' without
passing a NULL pointer to skip_prefix() in the second call, as it cannot
handle that.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When applying multiple patches with git am, or when rebasing using the
am backend, it's possible that one of our patches has updated a
gitattributes file. Currently, we cache this information, so if a
file in a subsequent patch has attributes applied, the file will be
written out with the attributes in place as of the time we started the
rebase or am operation, not with the attributes applied by the previous
patch. This problem does not occur when using the -m or -i flags to
rebase.
To ensure we write the correct data into the working tree, expire the
cache after each patch that touches a path ending in ".gitattributes".
Since we load these attributes in multiple separate files, we must
expire them accordingly.
Verify that both the am and rebase code paths work correctly, including
the conflict marker size with am -3.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the "clean" filter can reduce the size of a huge file in the
working tree down to a small "token" (a la Git LFS), there is no
point in allocating a huge scratch area upfront, but the buffer is
sized based on the original file size. The convert mechanism now
allocates very minimum and reallocates as it receives the output
from the clean filter process.
* jh/resize-convert-scratch-buffer:
convert: avoid malloc of original file size
We write the output of a "clean" filter into a strbuf. Rather than
growing the strbuf dynamically as we read its output, we make the
initial allocation as large as the original input file. This is a good
guess when the filter is just tweaking a few bytes, but it's disastrous
when the point of the filter is to condense a very large file into a
short identifier (e.g., the way git-lfs and git-annex do). We may ask to
allocate many gigabytes, causing the allocation to fail and Git to
die().
Instead, let's just let strbuf do its usual growth.
When the clean filter does output something around the same size as the
worktree file, the buffer will need to be reallocated until it fits,
starting at 8192 and doubling in size. Benchmarking indicates that
reallocation is not a significant overhead for outputs up to a
few MB in size.
Signed-off-by: Joey Hess <id@joeyh.name>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code cleanup.
* jk/unused-parameter-cleanup:
convert: drop path parameter from actual conversion functions
convert: drop len parameter from conversion checks
config: drop unused parameter from maybe_remove_section()
show_date_relative(): drop unused "tz" parameter
column: drop unused "opts" parameter in item_length()
create_bundle(): drop unused "header" parameter
apply: drop unused "def" parameter from find_name_gnu()
match-trees: drop unused path parameter from score functions
The assumption to work on the single "in-core index" instance has
been reduced from the library-ish part of the codebase.
* nd/the-index-final:
cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch
read-cache.c: remove the_* from index_has_changes()
merge-recursive.c: remove implicit dependency on the_repository
merge-recursive.c: remove implicit dependency on the_index
sha1-name.c: remove implicit dependency on the_index
read-cache.c: replace update_index_if_able with repo_&
read-cache.c: kill read_index()
checkout: avoid the_index when possible
repository.c: replace hold_locked_index() with repo_hold_locked_index()
notes-utils.c: remove the_repository references
grep: use grep_opt->repo instead of explict repo argument
The caller is responsible for looking up the attributes,
after which point we no longer care about the path at which
the content is found.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We've already extracted the useful information into our text_stat
struct, so the length is no longer needed.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By default, index compat macros are off from now on, because they
could hide the_index dependency.
Only those in builtin can use it.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>