A mechanism to pack unreachable objects into a "cruft pack",
instead of ejecting them into loose form to be reclaimed later, has
been introduced.
* tb/cruft-packs:
sha1-file.c: don't freshen cruft packs
builtin/gc.c: conditionally avoid pruning objects via loose
builtin/repack.c: add cruft packs to MIDX during geometric repack
builtin/repack.c: use named flags for existing_packs
builtin/repack.c: allow configuring cruft pack generation
builtin/repack.c: support generating a cruft pack
builtin/pack-objects.c: --cruft with expiration
reachable: report precise timestamps from objects in cruft packs
reachable: add options to add_unseen_recent_objects_to_traversal
builtin/pack-objects.c: --cruft without expiration
builtin/pack-objects.c: return from create_object_entry()
t/helper: add 'pack-mtimes' test-tool
pack-mtimes: support writing pack .mtimes files
chunk-format.h: extract oid_version()
pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles'
pack-mtimes: support reading .mtimes files
Documentation/technical: add cruft-packs.txt
Disable the "do not remove the directory the user started Git in"
logic when Git cannot tell where that directory is. Earlier we
refused to run in such a case.
* kl/setup-in-unreadable-worktree:
setup: don't die if realpath(3) fails on getcwd(3)
A workflow change for translators are being proposed.
* jx/l10n-workflow-change:
l10n: Document the new l10n workflow
Makefile: add "po-init" rule to initialize po/XX.po
Makefile: add "po-update" rule to update po/XX.po
po/git.pot: don't check in result of "make pot"
po/git.pot: this is now a generated file
Makefile: remove duplicate and unwanted files in FOUND_SOURCE_FILES
i18n CI: stop allowing non-ASCII source messages in po/git.pot
Makefile: have "make pot" not "reset --hard"
Makefile: generate "po/git.pot" from stable LOCALIZED_C
Makefile: sort source files before feeding to xgettext
Teach "git repack --geometric" work better with "--keep-pack" and
avoid corrupting the repository when packsize limit is used.
* tb/geom-repack-with-keep-and-max:
builtin/repack.c: ensure that `names` is sorted
t7703: demonstrate object corruption with pack.packSizeLimit
repack: respect --keep-pack with geometric repack
"sparse-checkout" learns to work well with the sparse-index
feature.
* ds/sparse-sparse-checkout:
sparse-checkout: integrate with sparse index
p2000: add test for 'git sparse-checkout [add|set]'
sparse-index: complete partial expansion
sparse-index: partially expand directories
sparse-checkout: --no-sparse-index needs a full index
cache-tree: implement cache_tree_find_path()
sparse-index: introduce partially-sparse indexes
sparse-index: create expand_index()
t1092: stress test 'git sparse-checkout set'
t1092: refactor 'sparse-index contents' test
The multi-pack-index code did not protect the packfile it is going
to depend on from getting removed while in use, which has been
corrected.
* tb/midx-race-in-pack-objects:
builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects
builtin/pack-objects.c: ensure included `--stdin-packs` exist
builtin/pack-objects.c: avoid redundant NULL check
pack-bitmap.c: check preferred pack validity when opening MIDX bitmap
Preliminary code refactoring around transport and bundle code.
* ds/bundle-uri:
bundle.h: make "fd" version of read_bundle_header() public
remote: allow relative_url() to return an absolute url
remote: move relative_url()
http: make http_get_file() external
fetch-pack: move --keep=* option filling to a function
fetch-pack: add a deref_without_lazy_fetch_extended()
dir API: add a generalized path_match_flags() function
connect.c: refactor sending of agent & object-format
Introduce a filesystem-dependent mechanism to optimize the way the
bits for many loose object files are ensured to hit the disk
platter.
* ns/batch-fsync:
core.fsyncmethod: performance tests for batch mode
t/perf: add iteration setup mechanism to perf-lib
core.fsyncmethod: tests for batch mode
test-lib-functions: add parsing helpers for ls-files and ls-tree
core.fsync: use batch mode and sync loose objects by default on Windows
unpack-objects: use the bulk-checkin infrastructure
update-index: use the bulk-checkin infrastructure
builtin/add: add ODB transaction around add_files_to_cache
cache-tree: use ODB transaction around writing a tree
core.fsyncmethod: batched disk flushes for loose-objects
bulk-checkin: rebrand plug/unplug APIs as 'odb transactions'
bulk-checkin: rename 'state' variable and separate 'plugged' boolean
Deprecate non-cone mode of the sparse-checkout feature.
* en/sparse-cone-becomes-default:
Documentation: some sparsity wording clarifications
git-sparse-checkout.txt: mark non-cone mode as deprecated
git-sparse-checkout.txt: flesh out pattern set sections a bit
git-sparse-checkout.txt: add a new EXAMPLES section
git-sparse-checkout.txt: shuffle some sections and mark as internal
git-sparse-checkout.txt: update docs for deprecation of 'init'
git-sparse-checkout.txt: wording updates for the cone mode default
sparse-checkout: make --cone the default
tests: stop assuming --no-cone is the default mode for sparse-checkout
Fixes real problems noticed by gcc 12 and works around false
positives.
* js/ci-gcc-12-fixes:
dir.c: avoid "exceeds maximum object size" error with GCC v12.x
nedmalloc: avoid new compile error
compat/win32/syslog: fix use-after-realloc
"git add -i" was rewritten in C some time ago and has been in
testing; the reimplementation is now exposed to general public by
default.
* js/use-builtin-add-i:
add -i: default to the built-in implementation
t2016: require the PERL prereq only when necessary
The tests that ensured merges stop when interfering local changes
are present did not make sure that local changes are preserved; now
they do.
* jc/t6424-failing-merge-preserve-local-changes:
t6424: make sure a failed merge preserves local changes
With the new http.curloptResolve configuration, the CURLOPT_RESOLVE
mechanism that allows cURL based applications to use pre-resolved
IP addresses for the requests is exposed to the scripts.
* cc/http-curlopt-resolve:
http: add custom hostname to IP address resolutions
In http.c, the run_active_slot() function allows the given "slot" to
make progress by calling step_active_slots() in a loop repeatedly,
and the loop is not left until the request held in the slot
completes.
Ages ago, we used to use the slot->in_use member to get out of the
loop, which misbehaved when the request in "slot" completes (at
which time, the result of the request is copied away from the slot,
and the in_use member is cleared, making the slot ready to be
reused), and the "slot" gets reused to service a different request
(at which time, the "slot" becomes in_use again, even though it is
for a different request). The loop terminating condition mistakenly
thought that the original request has yet to be completed.
Today's code, after baa7b67d (HTTP slot reuse fixes, 2006-03-10)
fixed this issue, uses a separate "slot->finished" member that is
set in run_active_slot() to point to an on-stack variable, and the
code that completes the request in finish_active_slot() clears the
on-stack variable via the pointer to signal that the particular
request held by the slot has completed. It also clears the in_use
member (as before that fix), so that the slot itself can safely be
reused for an unrelated request.
One thing that is not quite clean in this arrangement is that,
unless the slot gets reused, at which point the finished member is
reset to NULL, the member keeps the value of &finished, which
becomes a dangling pointer into the stack when run_active_slot()
returns. Clear the finished member before the control leaves the
function, which has a side effect of unconfusing compilers like
recent GCC 12 that is over-eager to warn against such an assignment.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't bother to freshen objects stored in a cruft pack individually
by updating the `.mtimes` file. This is because we can't portably `mmap`
and write into the middle of a file (i.e., to update the mtime of just
one object). Instead, we would have to rewrite the entire `.mtimes` file
which may incur some wasted effort especially if there a lot of cruft
objects and they are freshened infrequently.
Instead, force the freshening code to avoid an optimizing write by
writing out the object loose and letting it pick up a current mtime.
This works because we prefer the mtime of the loose copy of an object
when both a loose and packed one exist (whether or not the packed copy
comes from a cruft pack or not).
This could certainly do with a test and/or be included earlier in this
series/PR, but I want to wait until after I have a chance to clean up
the overly-repetitive nature of the cruft pack tests in general.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Expose the new `git repack --cruft` mode from `git gc` via a new opt-in
flag. When invoked like `git gc --cruft`, `git gc` will avoid exploding
unreachable objects as loose ones, and instead create a cruft pack and
`.mtimes` file.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using cruft packs, the following race can occur when a geometric
repack that writes a MIDX bitmap takes place afterwords:
- First, create an unreachable object and do an all-into-one cruft
repack which stores that object in the repository's cruft pack.
- Then make that object reachable.
- Finally, do a geometric repack and write a MIDX bitmap.
Assuming that we are sufficiently unlucky as to select a commit from the
MIDX which reaches that object for bitmapping, then the `git
multi-pack-index` process will complain that that object is missing.
The reason is because we don't include cruft packs in the MIDX when
doing a geometric repack. Since the "make that object reachable" doesn't
necessarily mean that we'll create a new copy of that object in one of
the packs that will get rolled up as part of a geometric repack, it's
possible that the MIDX won't see any copies of that now-reachable
object.
Of course, it's desirable to avoid including cruft packs in the MIDX
because it causes the MIDX to store a bunch of objects which are likely
to get thrown away. But excluding that pack does open us up to the above
race.
This patch demonstrates the bug, and resolves it by including cruft
packs in the MIDX even when doing a geometric repack.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use the `util` pointer for items in the `existing_packs` string list
to indicate which packs are going to be deleted. Since that has so far
been the only use of that `util` pointer, we just set it to 0 or 1.
But we're going to add an additional state to this field in the next
patch, so prepare for that by adding a #define for the first bit so we
can more expressively inspect the flags state.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In servers which set the pack.window configuration to a large value, we
can wind up spending quite a lot of time finding new bases when breaking
delta chains between reachable and unreachable objects while generating
a cruft pack.
Introduce a handful of `repack.cruft*` configuration variables to
control the parameters used by pack-objects when generating a cruft
pack.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Expose a way to split the contents of a repository into a main and cruft
pack when doing an all-into-one repack with `git repack --cruft -d`, and
a complementary configuration variable.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a previous patch, pack-objects learned how to generate a cruft pack
so long as no objects are dropped.
This patch teaches pack-objects to handle the case where a non-never
`--cruft-expiration` value is passed. This case is slightly more
complicated than before, because we want pack-objects to save
unreachable objects which would have been pruned when there is another
recent (i.e., non-prunable) unreachable object which reaches the other.
We'll call these objects "unreachable but reachable-from-recent".
Here is how pack-objects handles `--cruft-expiration`:
- Instead of adding all objects outside of the kept pack(s) into the
packing list, only handle the ones whose mtime is within the grace
period.
- Construct a reachability traversal whose tips are the
unreachable-but-recent objects.
- Then, walk along that traversal, stopping if we reach an object in
the kept pack. At each step along the traversal, we add the object
we are visiting to the packing list.
In the majority of these cases, any object we visit in this traversal
will already be in our packing list. But we will sometimes encounter
reachable-from-recent cruft objects, which we want to retain even if
they aged out of the grace period.
The most subtle point of this process is that we actually don't need to
bother to update the rescued object's mtime. Even though we will write
an .mtimes file with a value that is older than the expiration window,
it will continue to survive cruft repacks so long as any objects which
reach it haven't aged out.
That is, a future repack will also exclude that object from the initial
packing list, only to discover it later on when doing the reachability
traversal.
Finally, stopping early once an object is found in a kept pack is safe
to do because the kept packs ordinarily represent which packs will
survive after repacking. Assuming that it _isn't_ safe to halt a
traversal early would mean that there is some ancestor object which is
missing, which implies repository corruption (i.e., the complete set of
reachable objects isn't present).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When generating a cruft pack, the caller within pack-objects will want
to know the precise timestamps of cruft objects (i.e., their
corresponding values in the .mtimes table) rather than the mtime of the
cruft pack itself.
Teach add_recent_packed() to lookup each object's precise mtime from the
.mtimes file if one exists (indicated by the is_cruft bit on the
packed_git structure).
A couple of small things worth noting here:
- load_pack_mtimes() needs to be called before asking for
nth_packed_mtime(), and that call is done lazily here. That function
exits early if the .mtimes file has already been opened and parsed,
so only the first call is slow.
- Checking the is_cruft bit can be done without any extra work on the
caller's behalf, since it is set up for us automatically as a
side-effect of calling add_packed_git() (just like the 'pack_keep'
and 'pack_promisor' bits).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function behaves very similarly to what we will need in
pack-objects in order to implement cruft packs with expiration. But it
is lacking a couple of things. Namely, it needs:
- a mechanism to communicate the timestamps of individual recent
objects to some external caller
- and, in the case of packed objects, our future caller will also want
to know the originating pack, as well as the offset within that pack
at which the object can be found
- finally, it needs a way to skip over packs which are marked as kept
in-core.
To address the first two, add a callback interface in this patch which
reports the time of each recent object, as well as a (packed_git,
off_t) pair for packed objects.
Likewise, add a new option to the packed object iterators to skip over
packs which are marked as kept in core. This option will become
implicitly tested in a future patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach `pack-objects` how to generate a cruft pack when no objects are
dropped (i.e., `--cruft-expiration=never`). Later patches will teach
`pack-objects` how to generate a cruft pack that prunes objects.
When generating a cruft pack which does not prune objects, we want to
collect all unreachable objects into a single pack (noting and updating
their mtimes as we accumulate them). Ordinary use will pass the result
of a `git repack -A` as a kept pack, so when this patch says "kept
pack", readers should think "reachable objects".
Generating a non-expiring cruft packs works as follows:
- Callers provide a list of every pack they know about, and indicate
which packs are about to be removed.
- All packs which are going to be removed (we'll call these the
redundant ones) are marked as kept in-core.
Any packs the caller did not mention (but are known to the
`pack-objects` process) are also marked as kept in-core. Packs not
mentioned by the caller are assumed to be unknown to them, i.e.,
they entered the repository after the caller decided which packs
should be kept and which should be discarded.
Since we do not want to include objects in these "unknown" packs
(because we don't know which of their objects are or aren't
reachable), these are also marked as kept in-core.
- Then, we enumerate all objects in the repository, and add them to
our packing list if they do not appear in an in-core kept pack.
This results in a new cruft pack which contains all known objects that
aren't included in the kept packs. When the kept pack is the result of
`git repack -A`, the resulting pack contains all unreachable objects.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A new caller in the next commit will want to immediately modify the
object_entry structure created by create_object_entry(). Instead of
forcing that caller to wastefully look-up the entry we just created,
return it from create_object_entry() instead.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the next patch, we will implement and test support for writing a
cruft pack via a special mode of `git pack-objects`. To make sure that
objects are written with the correct timestamps, and a new test-tool
that can dump the object names and corresponding timestamps from a given
`.mtimes` file.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the `.mtimes` format is defined, supplement the pack-write API
to be able to conditionally write an `.mtimes` file along with a pack by
setting an additional flag and passing an oidmap that contains the
timestamps corresponding to each object in the pack.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are three definitions of an identical function which converts
`the_hash_algo` into either 1 (for SHA-1) or 2 (for SHA-256). There is a
copy of this function for writing both the commit-graph and
multi-pack-index file, and another inline definition used to write the
.rev header.
Consolidate these into a single definition in chunk-format.h. It's not
clear that this is the best header to define this function in, but it
should do for now.
(Worth noting, the .rev caller expects a 4-byte unsigned, but the other
two callers work with a single unsigned byte. The consolidated version
uses the latter type, and lets the compiler widen it when required).
Another caller will be added in a subsequent patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This structure will be used to communicate the per-object mtimes when
writing a cruft pack. Here, we need the full packing_data structure
because the mtime information is stored in an array there, not on the
individual object_entry's themselves (to avoid paying the overhead in
structure width for operations which do not generate a cruft pack).
We haven't passed this information down before because one of the two
callers (in bulk-checkin.c) does not have a packing_data structure at
all. In that case (where no cruft pack will be generated), NULL is
passed instead.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To store the individual mtimes of objects in a cruft pack, introduce a
new `.mtimes` format that can optionally accompany a single pack in the
repository.
The format is defined in Documentation/technical/pack-format.txt, and
stores a 4-byte network order timestamp for each object in name (index)
order.
This patch prepares for cruft packs by defining the `.mtimes` format,
and introducing a basic API that callers can use to read out individual
mtimes.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git remote -v" now shows the list-objects-filter used during
fetching from the remote, if available.
* ac/remote-v-with-object-list-filters:
builtin/remote.c: teach `-v` to list filters for promisor remotes
With a recent update to refuse access to repositories of other
people by default, "sudo make install" and "sudo git describe"
stopped working. This series intends to loosen it while keeping
the safety.
* cb/path-owner-check-with-sudo:
t0034: add negative tests and allow git init to mostly work under sudo
git-compat-util: avoid failing dir ownership checks if running privileged
t: regression git needs safe.directory when using sudo
"git -c branch.autosetupmerge=simple branch $A $B" will set the $B
as $A's upstream only when $A and $B shares the same name, and "git
-c push.default=simple" on branch $A would push to update the
branch $A at the remote $B came from. Also more places use the
sole remote, if exists, before defaulting to 'origin'.
* tk/simple-autosetupmerge:
push: new config option "push.autoSetupRemote" supports "simple" push
push: default to single remote even when not named origin
branch: new autosetupmerge option 'simple' for matching branches
Change the "flow" of how translators interact with the l10n repository
at [1] to adjust it for a new workflow of not having a po/git.pot file
in-tree at all, and to not commit line numbers to the po/*.po files
that we do track in tree.
The current workflow was added in a combination of dce37b66fb (l10n:
initial git.pot for 1.7.10 upcoming release, 2012-02-13) and
271ce198cd (Update l10n guide, 2012-02-29).
As noted in preceding commits I think that it came about due to
technical debt I'd left behind in how the "po/git.pot" file was
created, and a mis-impression that the file:line comments were needed
as anything more than a transitory translation aid.
As the updated po/README.md shows the new workflow is substantially
the same, the difference is that translators no longer need to
initially pull from the l10n coordinator for a new po/git.pot, they
can simply use git.git's canonical source repository.
The l10n coordinator is still expected to announce a release to
translate, which presumably would always be Junio's latest release
tag. I'm not certain if this part of the process is actually
important. I.e. the delta translation-wise between that tag and
"master" is usually pretty small, so perhaps translators can just work
on "master" instead.
1. https://github.com/git-l10n/git-po/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The core translation is the minimum set of work that must be done for a
new language translation.
There are over 5000 messages in the template message file "po/git.pot"
that need to be translated. It is not a piece of cake for such a huge
workload. So we used to define a small set of messages called "core
translation" that a new l10n contributor must complete before sending
pull request to the l10n coordinator.
By pulling in some parts of the git-po-helper[^1] logic, we add a new
rule to create this core translation message "po/git-core.pot":
make po/git-core.pot
To help new l10n contributors to initialized their "po/XX.pot" from
"po/git-core.pot", we also add new rules "po-init":
make po-init PO_FILE=po/XX.po
[^1]: https://github.com/git-l10n/git-po-helper/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since there is no longer a "po/git.pot" file in tree, a l10n team leader
has to run several commands to update their "po/XX.po" file:
$ make pot
$ msgmerge --add-location --backup=off -U po/XX.po po/git.pot
To make this process easier, add a new rule so that l10n team leaders
can update their "po/XX.po" with one command. E.g.:
$ make po-update PO_FILE=po/zh_CN.po
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the "po/git.pot" file from being tracked, which started with
dce37b66fb (l10n: initial git.pot for 1.7.10 upcoming release,
2012-02-13).
The reason the po/git.pot started being checked in was because the
po/*.po files were changed a schema where we'd generate them from a
known-good snapshot of po/git.pot, instead of each translator running
"make pot" themselves.
This makes sense, but we don't need to carry this file in-tree just to
achieve that aim, and doing so has resulted in a significant amount of
"diff churn" since this method of doing it was introduced:
$ git log -p --oneline -- po/git.pot|wc -l
553743
We can instead let l10n contributors to generate "po/git.pot" in runtime
to update their own "po/XX.po", and the l10n coordinator can check
pull requests using CI pipeline.
This reverts to the schema introduced initially in cd5513a716 (i18n:
Makefile: "pot" target to extract messages marked for translation,
2011-02-22).
The actual "git rm" of po/git.pot was in preceding commit to make this
change easier to review, and to preempt the mailing list from blocking
it due to it being too large.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We no longer keep track of the contents of this file.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We get source files saved in "$(FOUND_SOURCE_FILES)" by running the
command "git ls-files" or the command "find". We tried to have the
both commands return the same list of files, but apparently the "find"
command will return more files, such as the generated headers. We can
filter out these generated headers to get closer results.
In addition to this, "$(FOUND_SOURCE_FILES)" may contain duplicate
files. E.g. "git-ls-files" may have duplicate entries for the same file
in different staging areas if there are unresolved conflicts in the
working tree. For this case, we can reduce duplicate entries by passing
the option "--deduplicate" to git-ls-files.
Junio reported that when running "make" in a working tree with
unresolved conflicts, "make" may report warnings like below:
Makefile:xxxx: target '.build/pot/po/FOO.c.po' given more than once
in the same rule
The duplicate targets are introduced by the following pattern rule we
added in the preceding commit for incremental build of "po/git.pot".
$(LOCALIZED_C_GEN_PO): .build/pot/po/%.po: %
Although we have resolved this issue by sorting to create a unique
$(LOCALIZED_C), other targets may benefit from this. Such as: tags,
cscope.out, etc.
Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commit we moved away from using xgettext(1) to both
generate the po/git.pot, and to merge the incrementally generated
po/git.pot+ file as we sourced translations from C, shell and Perl.
Doing it this way, which dates back to my initial
implementation[1][2][3] was conflating two things: With xgettext(1)
the --from-code both controls what encoding is specified in the
po/git.pot's header, and what encoding we allow in source messages.
We don't ever want to allow non-ASCII in *source messages*, and doing
so has hid e.g. a buggy message introduced in
a6226fd772 (submodule--helper: convert the bulk of cmd_add() to C,
2021-08-10) from us, we'd warn about it before, but only when running
"make pot", but the operation would still succeed. Now we'll error out
on it when running "make pot".
Since the preceding Makefile changes made this easy: let's add a "make
check-pot" target with the same prerequisites as the "po/git.pot"
target, but without changing the file "po/git.pot". Running it as part
of the "static-analysis" CI target will ensure that we catch any such
issues in the future. E.g.:
$ make check-pot
XGETTEXT .build/pot/po/builtin/submodule--helper.c.po
xgettext: Non-ASCII string at builtin/submodule--helper.c:3381.
Please specify the source encoding through --from-code.
make: *** [.build/pot/po/builtin/submodule--helper.c.po] Error 1
1. cd5513a716 (i18n: Makefile: "pot" target to extract messages
marked for translation, 2011-02-22)
2. adc3b2b276 (Makefile: add xgettext target for *.sh files,
2011-05-14)
3. 5e9637c629 (i18n: add infrastructure for translating Git with
gettext, 2011-11-18)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before commit fc0fd5b23b (Makefile: help gettext tools to cope with our
custom PRItime format, 2017-07-20), we'd consider source files as-is
with gettext, but because we need to understand PRItime in the same way
that gettext itself understands PRIuMAX, we'd first check if we had a
clean checkout, then munge all of the processed files in-place with
"sed", generate "po/git.pot", and then finally "reset --hard" to undo
our changes.
By generating "pot" snippets in ".build/pot/po" for each source file
and rewriting certain source files with PRItime macros to temporary
files in ".build/pot/po", we can avoid running "make pot" by altering
files in place and doing a "reset --hard" afterwards.
This speed of "make pot" is slower than before on an initial run,
because we run "xgettext" many times (once per source file), but it
can be boosted by parallelization. It is *much* faster for incremental
runs, and will allow us to implement related targets in subsequent
commits.
When the "pot" target was originally added in cd5513a716 (i18n:
Makefile: "pot" target to extract messages marked for translation,
2011-02-22) it behaved like a "normal" target. I.e. we'd skip the
re-generation of the po/git.pot if nothing had to be done.
Then after po/git.pot was checked in in dce37b66fb (l10n: initial
git.pot for 1.7.10 upcoming release, 2012-02-13) the target was broken
until 1f31963e92 (i18n: treat "make pot" as an explicitly-invoked
target, 2014-08-22) when it was made to depend on "FORCE". I.e. the
Makefile's dependency resolution inherently can't handle incremental
building when the target file may be updated by git (or something else
external to "make"). But this case no longer applies, so FORCE is no
longer needed.
That out of the way, the main logic change here is getting rid of the
"reset --hard":
We'll generate intermediate ".build/pot/po/%.po" files from "%", which
is handy to see at a glance what strings (if any) in a given file are
marked for translation:
$ make .build/pot/po/pretty.c.po
[...]
$ cat .build/pot/po/pretty.c.po
#: pretty.c:1051
msgid "unable to parse --pretty format"
msgstr ""
$
For these C source files which contain the PRItime macros, we will
create temporary munged "*.c" files in a tree in ".build/pot/po"
corresponding to our source tree, and have "xgettext" consider those.
The rule needs to be careful to "(cd .build/pot/po && ...)", because
otherwise the comments in the po/git.pot file wouldn't refer to the
correct source locations (they'd be prefixed with ".build/pot/po").
These temporary munged "*.c” files will be removed immediately after
the corresponding po files are generated, because some development tools
cannot ignore the duplicate source files in the ".build" directory
according to the ".gitignore" file, and that may cause trouble.
The output of the generated po/git.pot file is changed in one minor
way: Because we're using msgcat(1) instead of xgettext(1) to
concatenate the output we'll now disambiguate where "TRANSLATORS"
comments come from, in cases where a message is the same in N files,
and either only one has a "TRANSLATORS" comment, or they're
different. E.g. for the "Your edited hunk[...]" message we'll now
apply this change (comment content elided):
+#. #-#-#-#-# add-patch.c.po #-#-#-#-#
#. TRANSLATORS: do not translate [y/n]
[...]
+#. #-#-#-#-# git-add--interactive.perl.po #-#-#-#-#
#. TRANSLATORS: do not translate [y/n]
[...]
#: add-patch.c:1253 git-add--interactive.perl:1244
msgid ""
"Your edited hunk does not apply. Edit again (saying \"no\" discards!) [y/n]? "
msgstr ""
There are six such changes, and they all make the context more
understandable, as msgcat(1) is better at handling these edge cases
than xgettext(1)'s previously used "--join-existing" flag.
But filenames in the above disambiguation lines of extracted-comments
have an extra ".po" extension compared to the filenames at the file
locations. While we could rename the intermediate ".build/pot/po/%.po"
files without the ".po" extension to use more intuitive filenames in
the disambiguation lines of extracted-comments, but that will confuse
developer tools with lots of invalid C or other source files in
".build/pot/po" directory.
The addition of "--omit-header" option for xgettext makes the "pot"
snippets in ".build/pot/po/*.po" smaller. But as we'll see in a
subsequent commit this header behavior has been hiding an
encoding-related bug from us, so let's carry it forward instead of
re-generating it with xgettext(1).
The "po/git.pot" file should have a header entry, because a proper
header entry will increase the speed of creating a new po file using
msginit and set a proper "POT-Creation-Date:" field in the header
entry of a "po/XX.po" file. We use xgettext to generate a separate
header file at ".build/pot/git.header" from "/dev/null", and use this
header to assemble "po/git.pot".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Different users may generate a different message template file
"po/git.pot". This is because the POT file is generated from
"$(LOCALIZED_C)", which is supposed to list all the sources that we
extract the strings to be translated from. But "$(LOCALIZED_C)"
includes "$(C_OBJ)", which only lists the source files used in the
current build for a specific platform and specific compiler
conditions.
Instead of using "$(C_OBJ)", we use "$(FOUND_C_SOURCES)", which lists
all source files we keep track of (or ship in a tarball extract), to
form a stable "LOCALIZED_C". We also add "$(SCALAR_SOURCES)", which
is part of "$(C_OBJ)" but not included in "$(FOUND_C_SOURCES)".
With this update, the newly generated "po/git.pot" will have 30 new
entries coming from the following C source files:
* compat/fsmonitor/fsm-listen-win32.c
* compat/mingw.c
* compat/regex/regcomp.c
* compat/simple-ipc/ipc-win32.c
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We will feed xgettext with more C source files and in different order
in subsequent commit. To generate a stable "po/git.pot" regardless of
the number and order of input source files, we sort the c, perl, and
shell source files in groups before feeding them to xgettext.
Ævar suggested that we should not pass the option "--sort-by-file" to
xgettext to sort the translatable strings, as it will mix the three
groups of source files (c, perl and shell) in the file "po/git.pot",
and change the order of translatable strings in the same line of a file.
With this update, the newly generated "po/git.pot" will have the same
entries while in a different order.
With the help of a custom diff driver as shown below,
git config --global diff.gettext-fmt.textconv \
"msgcat --no-location --sort-by-file"
and appending a new entry "*.pot diff=gettext-fmt" to git attributes,
we can see that there are no substantial changes in "po/git.pot".
We won't checkin the newly generated "po/git.pot", because we will
remove it from tree in a later commit.
Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>