The commit 29de20504e (Makefile: fix default regex settings on
Darwin, 2013-05-11) fixed t0070-fundamental.sh under Darwin (macOS) by
adopting Git's regex library. However, this library is compiled with
NO_MBSUPPORT, which causes git-grep to work incorrectly on multibyte
(e.g. UTF-8) files. Current macOS versions pass t0070-fundamental.sh
with the native macOS regex library, which also supports multibyte
characters.
Adjust the Makefile to use the native regex library, and call
setlocale(3) to set CTYPE according to the user's preference.
The setlocale call is required on all platforms, but on platforms
supporting gettext(3), setlocale was called as a side-effect of
initializing gettext. Therefore, move the CTYPE setlocale call from
gettext.c to common-main.c and the corresponding locale.h include
into git-compat-util.h.
Thanks to the global initialization of CTYPE setlocale, the test-tool
regex command now works correctly with supported multibyte regexes, and
is used to set the MB_REGEX test prerequisite by assessing a platform's
support for them.
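
As a rough illustration (not the code in this commit), the following
sketch shows why the CTYPE setlocale call matters for a multibyte-aware
regex library. The helper names regex_matches() and
multibyte_regex_works() are hypothetical; only setlocale(3), regcomp(3),
regexec(3) and regfree(3) are real library calls.

```c
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if `pattern` (POSIX extended syntax)
 * matches `text`, 0 if it does not, and -1 if the pattern does not compile.
 */
static int regex_matches(const char *pattern, const char *text)
{
	regex_t re;
	int ret;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB))
		return -1;
	ret = regexec(&re, text, 0, NULL, 0) == 0;
	regfree(&re);
	return ret;
}

/*
 * Hypothetical probe in the spirit of the MB_REGEX prerequisite: adopt the
 * user's LC_CTYPE, then check whether "." consumes one whole multibyte
 * character rather than a single byte.
 */
static int multibyte_regex_works(void)
{
	/* "" selects the locale from the environment (LC_ALL, LC_CTYPE, LANG). */
	setlocale(LC_CTYPE, "");
	/* U+00E9 is two bytes in UTF-8; "^.$" matches only if it is one char. */
	return regex_matches("^.$", "\xc3\xa9") == 1;
}
```

Without the setlocale call the process stays in the "C" locale, so the
probe is expected to fail even when the library itself supports
multibyte characters.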
Signed-off-by: Diomidis Spinellis <dds@aueb.gr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add missing __attribute__((format)) function attributes to various
"static" functions that take printf arguments.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
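
For reference, a minimal sketch of what such an annotation looks like on
a file-local printf-style function. PRINTF_LIKE() and format_msg() are
hypothetical names used only for illustration; the attribute itself is
the GCC/Clang format attribute named in the message above.

```c
#include <stdarg.h>
#include <stdio.h>

/* GCC and Clang understand the attribute; other compilers see nothing. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, first_arg) \
	__attribute__((format(printf, fmt_idx, first_arg)))
#else
#define PRINTF_LIKE(fmt_idx, first_arg)
#endif

/*
 * Hypothetical static helper: formats into buf and returns vsnprintf()'s
 * result. Argument 3 is the format string, variadic arguments start at 4.
 */
PRINTF_LIKE(3, 4)
static int format_msg(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return ret;
}
```

With the attribute in place, a call such as format_msg(buf, len, "%d",
"text") can be diagnosed at compile time (e.g. under -Wformat) instead
of misbehaving at run time.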
Get rid of "GETTEXT_POISON" support altogether, which may or may
not be controversial.
* ab/detox-gettext-tests:
tests: remove uses of GIT_TEST_GETTEXT_POISON=false
tests: remove support for GIT_TEST_GETTEXT_POISON
ci: remove GETTEXT_POISON jobs
This removes the ability to inject "poison" gettext() messages via the
GIT_TEST_GETTEXT_POISON special test setup.
I initially added this as a compile-time option in bb946bba76 (i18n:
add GETTEXT_POISON to simulate unfriendly translator, 2011-02-22), and
most recently modified to be toggleable at runtime in
6cdccfce1e (i18n: make GETTEXT_POISON a runtime option, 2018-11-08).
The reason for its removal is that the trade-off of maintaining it
vs. what it's getting us has long since flipped. When gettext was
integrated in 5e9637c629 (i18n: add infrastructure for translating
Git with gettext, 2011-11-18) there was understandable concern on the
Git ML that in marking messages for translation en-masse we'd
inadvertently mark plumbing messages. The GETTEXT_POISON facility was
a way to smoke those out via our test suite.
Nowadays however we're done (or almost entirely done) with any marking
of messages for translation. New messages are usually marked by their
authors, who'll know whether it makes sense to translate them or
not. If not, any errors in marking the messages are much more likely to
be spotted in review than in the initial deluge of i18n patches in
the 2011-2012 era.
So let's just remove this. This leaves the test suite in a state where
we still have a lot of test_i18n, C_LOCALE_OUTPUT
etc. uses. Subsequent commits will remove those too.
The change to t/lib-rebase.sh is a selective revert of the relevant
part of f2d17068fd (i18n: rebase-interactive: mark comments of squash
for translation, 2016-06-17), and the comment in
t/t3406-rebase-message.sh is from c7108bf9ed (i18n: rebase: mark
messages for translation, 2012-07-25).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mostly remove the comment I added in 5e9637c629 (i18n: add
infrastructure for translating Git with gettext, 2011-11-18). Since
then we had a fix in 9c0495d23e (gettext.c: detect the vsnprintf bug
at runtime, 2013-12-01) so we're not running with the "set back to C
locale" hack on any modern system.
So having more than 1/4 of the file taken up by a digression about a
glibc bug that mostly doesn't happen to anyone anymore is just a
needless distraction. Shorten the comment to make a brief mention of
the bug, and where to find more info by looking at the git history for
this now-removed comment.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many GIT_TEST_* environment variables control various aspects of
how our tests are run, but a few followed "non-empty is true, empty
or unset is false" while others followed the usual "there are a few
ways to spell true, like yes, on, etc., and also ways to spell
false, like no, off, etc." convention.
* ab/test-env:
env--helper: mark a file-local symbol as static
tests: make GIT_TEST_FAIL_PREREQS a boolean
tests: replace test_tristate with "git env--helper"
tests README: re-flow a previously changed paragraph
tests: make GIT_TEST_GETTEXT_POISON a boolean
t6040 test: stop using global "script" variable
config.c: refactor die_bad_number() to not call gettext() early
env--helper: new undocumented builtin wrapping git_env_*()
config tests: simplify include cycle test
On native Windows, Git exclusively uses UTF-8 for console output (both
with MinTTY and native Win32 Console). Gettext uses `setlocale()` to
determine the output encoding for translated text, however, MSVCRT's
`setlocale()` does not support UTF-8. As a result, translated text is
encoded in system encoding (as per `GetACP()`), and non-ASCII chars are
mangled in console output.
Side note: There is actually a code page for UTF-8: 65001. In practice,
it does not work as expected at least on Windows 7, though, so we cannot
use it in Git. Besides, if we overrode the code page, any process
spawned from Git would inherit that code page (as opposed to the code
page configured for the current user), which would quite possibly break
e.g. diff or merge helpers. So we really cannot override the code page.
In `init_gettext_charset()`, Git calls gettext's
`bind_textdomain_codeset()` with the character set obtained via
`locale_charset()`; let's override that latter function to force the
encoding to UTF-8 on native Windows.
In Git for Windows' SDK, there is a `libcharset.h`, so we define
`HAVE_LIBCHARSET_H` in the MINGW-specific section in
`config.mak.uname`; we therefore need to add the override before that
conditionally-compiled code block.
Rather than simply defining `locale_charset()` to return the string
`"UTF-8"`, though, we are careful not to break `LC_ALL=C`: the
`ab/no-kwset` patch series, for example, needs to have a way to prevent
Git from expecting UTF-8-encoded input.
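
A minimal sketch of such an override, under stated assumptions: the
function names charset_for_locale_name() and forced_locale_charset() are
hypothetical, and the policy shown (honor an explicit codeset after the
dot, keep "C"/"POSIX" as-is, otherwise default to UTF-8) is one
plausible reading of the description above rather than the exact code
added by this patch.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical policy: map a locale name such as "C", "de_DE" or
 * "ja_JP.eucJP" to the charset gettext should emit.
 */
static const char *charset_for_locale_name(const char *name)
{
	const char *dot;

	if (!name || !*name)
		return "UTF-8";		/* nothing requested: force UTF-8 */
	dot = strchr(name, '.');
	if (dot && dot[1])
		return dot + 1;		/* explicit codeset wins */
	if (!strcmp(name, "C") || !strcmp(name, "POSIX"))
		return name;		/* do not break LC_ALL=C */
	return "UTF-8";
}

/* Consult the usual variables in order of precedence. */
static const char *forced_locale_charset(void)
{
	const char *env = getenv("LC_ALL");

	if (!env || !*env)
		env = getenv("LC_CTYPE");
	if (!env || !*env)
		env = getenv("LANG");
	return charset_for_locale_name(env);
}
```

The value returned by forced_locale_charset() would then be what gets
passed to `bind_textdomain_codeset()`.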
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the GIT_TEST_GETTEXT_POISON variable from being "non-empty?" to
being a more standard boolean variable.
Since it needed to be checked in both C code and shellscript (via test
-n) it was one of the remaining shellscript-like variables. Now that
we have "env--helper" we can change that.
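
To make the difference between the two conventions concrete, here is a
simplified sketch. env_is_nonempty() and parse_bool_value() are
hypothetical stand-ins, not the git_env_bool() implementation; in
particular, this sketch only accepts "0" and "1" as numeric spellings.

```c
#include <ctype.h>
#include <stddef.h>

/* Old convention: any non-empty value counts as true, even "false". */
static int env_is_nonempty(const char *value)
{
	return value != NULL && *value != '\0';
}

static int equals_ignoring_case(const char *a, const char *b)
{
	for (; *a && *b; a++, b++)
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
	return *a == *b;	/* both strings must end together */
}

/*
 * Boolean convention: 1 for true, 0 for false, -1 for an invalid
 * spelling, and `fallback` when the variable is unset (value == NULL).
 */
static int parse_bool_value(const char *value, int fallback)
{
	static const char *const truthy[] = { "true", "yes", "on", "1" };
	static const char *const falsy[] = { "false", "no", "off", "0", "" };
	size_t i;

	if (!value)
		return fallback;
	for (i = 0; i < sizeof(truthy) / sizeof(truthy[0]); i++)
		if (equals_ignoring_case(value, truthy[i]))
			return 1;
	for (i = 0; i < sizeof(falsy) / sizeof(falsy[0]); i++)
		if (equals_ignoring_case(value, falsy[i]))
			return 0;
	return -1;
}
```

Under the old rule "false" and "YesPlease" are both true; under the
boolean rule the former is false and the latter is rejected, which is
the breakage discussed below.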
There's a couple of tricky edge cases that arise because we're using
git_env_bool() early, and the config-reading "env--helper".
If GIT_TEST_GETTEXT_POISON is set to an invalid value die_bad_number()
will die, but to do so it would usually call gettext(). Let's detect
the special case of GIT_TEST_GETTEXT_POISON and always emit that
message in the C locale, lest we infinitely loop.
As seen in the updated tests in t0017-env-helper.sh there's also a
caveat related to "env--helper" needing to read the config for trace2
purposes.
Since the C_LOCALE_OUTPUT prerequisite is lazy and relies on
"env--helper" we could get invalid results if we failed to read the
config (e.g. because we'd loop on includes) when combined with
e.g. "test_i18ngrep" wanting to check with "env--helper" if
GIT_TEST_GETTEXT_POISON was true or not.
I'm crossing my fingers and hoping that a test similar to the one I
removed in the earlier "config tests: simplify include cycle test"
change in this series won't happen again, and testing for this
explicitly in "env--helper"'s own tests.
This change breaks existing uses of
e.g. GIT_TEST_GETTEXT_POISON=YesPlease, which we've documented in
po/README and other places. As noted in [1] we might want to consider
also accepting "YesPlease" in "env--helper" as a special-case.
But as the lack of uproar over 6cdccfce1e ("i18n: make GETTEXT_POISON
a runtime option", 2018-11-08) demonstrates, the audience for this
option is a really narrow set of git developers, who shouldn't have
much trouble modifying their test scripts, so I think it's better to
deal with that minor headache now and make all the relevant GIT_TEST_*
variables boolean in the same way than carry the "YesPlease"
special-case forward.
1. https://public-inbox.org/git/xmqqtvckm3h8.fsf@gitster-ct.c.googlers.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the GETTEXT_POISON compile-time + runtime GIT_GETTEXT_POISON
test parameter to only be a GIT_TEST_GETTEXT_POISON=<non-empty?>
runtime parameter, to be consistent with other parameters documented
in "Running tests with special setups" in t/README.
When I added GETTEXT_POISON in bb946bba76 ("i18n: add GETTEXT_POISON
to simulate unfriendly translator", 2011-02-22) I was concerned with
ensuring that the _() function would get constant folded if NO_GETTEXT
was defined, and likewise that GETTEXT_POISON would be compiled out
unless it was defined.
But as the benchmark in [1] shows, doing a one-off runtime
getenv("GIT_TEST_[...]") is trivial, and since GETTEXT_POISON was
originally added the GIT_TEST_* env variables have become the common
idiom for turning on special test setups.
So change GETTEXT_POISON to work the same way. Now the
GETTEXT_POISON=YesPlease compile-time option is gone, and running the
tests with GIT_TEST_GETTEXT_POISON=[YesPlease|] can be toggled on/off
without recompiling.
This allows for conditionally amending tests to test with/without
poison, similar to what 859fdc0c3c ("commit-graph: define
GIT_TEST_COMMIT_GRAPH", 2018-08-29) did for GIT_TEST_COMMIT_GRAPH. Do
some of that; e.g. we now always run the t0205-gettext-poison.sh test.
I did enough there to remove the GETTEXT_POISON prerequisite, but its
inverse C_LOCALE_OUTPUT is still around, and surely some tests using
it can be converted to e.g. always set GIT_TEST_GETTEXT_POISON=.
Notes on the implementation:
* We still compile a dedicated GETTEXT_POISON build in Travis
CI. Perhaps this should be revisited and integrated into the
"linux-gcc" build, see ae59a4e44f ("travis: run tests with
GIT_TEST_SPLIT_INDEX", 2018-01-07) for prior art in that area. Then
again maybe not, see [2].
* We now skip a test in t0000-basic.sh under
GIT_TEST_GETTEXT_POISON=YesPlease that wasn't skipped before. This
test relies on C locale output, but due to an edge case in how the
previous implementation of GETTEXT_POISON worked (reading it from
GIT-BUILD-OPTIONS) wasn't enabling poison correctly. Now it does,
and needs to be skipped.
* The getenv() function is not reentrant, so out of paranoia about
  code of the form:

      printf(_("%s"), getenv("some-env"));

  call use_gettext_poison() in our early setup in git_setup_gettext()
  so we populate the "poison_requested" variable in a codepath that
  won't suffer from that race condition.
* We error out in the Makefile if you're still saying
GETTEXT_POISON=YesPlease to prompt users to change their
invocation.
* We should not print out poisoned messages during the test
initialization itself to keep it more readable, so the test library
hides the variable if set in $GIT_TEST_GETTEXT_POISON_ORIG during
setup. See [3].
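
The getenv() caching described in the notes above can be sketched as
follows. This is a simplified illustration rather than the patch
itself: translate() is a hypothetical stand-in for _(), it returns the
message unchanged instead of calling gettext(), and the "non-empty is
true" check mirrors the convention this commit describes.

```c
#include <stdlib.h>

/* -1 means "not looked up yet"; afterwards it is 0 or 1. */
static int poison_requested = -1;

/*
 * Read the environment once and cache the answer, so that later calls
 * never invoke getenv() while another getenv() result is still in use.
 */
static int use_gettext_poison(void)
{
	if (poison_requested == -1) {
		const char *v = getenv("GIT_TEST_GETTEXT_POISON");
		poison_requested = v && *v;
	}
	return poison_requested;
}

/* Hypothetical stand-in for _(): gibberish when poisoned, else msgid. */
static const char *translate(const char *msgid)
{
	return use_gettext_poison() ? "# GETTEXT POISON #" : msgid;
}
```

Calling use_gettext_poison() once during early setup is enough to
populate the cache before any message is formatted.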
See also [4] for more on the motivation behind this patch, and the
history of the GETTEXT_POISON facility.
1. https://public-inbox.org/git/871s8gd32p.fsf@evledraar.gmail.com/
2. https://public-inbox.org/git/20181102163725.GY30222@szeder.dev/
3. https://public-inbox.org/git/20181022202241.18629-2-szeder.dev@gmail.com/
4. https://public-inbox.org/git/878t2pd6yu.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* js/runtime-prefix:
Avoid multiple PREFIX definitions
git_setup_gettext: plug memory leak
gettext: avoid initialization if the locale dir is not present
A build-time option has been added to allow Git to be told to refer
to its associated files relative to the main binary, in the same
way that has been possible on Windows for quite some time, for
Linux, BSDs and Darwin.
* dj/runtime-prefix:
Makefile: quote $INSTLIBDIR when passing it to sed
Makefile: remove unused @@PERLLIBDIR@@ substitution variable
mingw/msvc: use the new-style RUNTIME_PREFIX helper
exec_cmd: provide a new-style RUNTIME_PREFIX helper for Windows
exec_cmd: RUNTIME_PREFIX on some POSIX systems
Makefile: add Perl runtime prefix support
Makefile: generate Perl header from template file
The system_path() function returns a freshly-allocated string. We need
to release it.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The runtime of a simple `git.exe version` call on Windows is currently
dominated by the gettext setup, adding a whopping ~150ms to the ~210ms
total.
Given that this cost is added to each and every git.exe invocation that goes
through common-main's invocation of git_setup_gettext(), and given that
scripts have to call git.exe dozens, if not hundreds, of times, this is
a substantial performance penalty.
This is particularly pointless when considering that Git for Windows
ships without localization (to keep the installer's size to a bearable
~34MB): all that time setting up gettext is for naught.
To be clear, Git for Windows *needs* to be compiled with localization,
for the following reasons:
- to allow users to copy add-on localization in case they want it, and
- to fix the nasty error message
BUG: your vsnprintf is broken (returned -1)
by using libgettext's override of vsnprintf() that does not share the
behavior of msvcrt.dll's version of vsnprintf().
So let's be smart about it and skip setting up gettext if the locale
directory is not even present.
Since localization might be missing for not-yet-supported locales, this
will not break anything.
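
The idea can be sketched roughly like this, assuming a POSIX-style
stat(2). Both function names are hypothetical and the expensive part of
the setup is only indicated by a comment.

```c
#include <sys/stat.h>

/* Returns 1 if `path` exists and is a directory, 0 otherwise. */
static int is_directory(const char *path)
{
	struct stat st;

	return !stat(path, &st) && S_ISDIR(st.st_mode);
}

/*
 * Hypothetical setup routine: returns 1 if gettext would be
 * initialized, 0 if the whole step was skipped.
 */
static int setup_gettext_if_present(const char *podir)
{
	if (!is_directory(podir))
		return 0;	/* no translations shipped: skip the setup */
	/* bindtextdomain(), setlocale(), textdomain() would go here. */
	return 1;
}
```

A single stat() call is cheap compared to the gettext initialization it
guards, which is the whole point of the check.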
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Enable Git to resolve its own binary location using a variety of
OS-specific and generic methods, including:
- procfs via "/proc/self/exe" (Linux)
- _NSGetExecutablePath (Darwin)
- KERN_PROC_PATHNAME sysctl on BSDs.
- argv0, if absolute (all, including Windows).
This is used to enable RUNTIME_PREFIX support for non-Windows systems,
notably Linux and Darwin. When configured with RUNTIME_PREFIX, Git will
do a best-effort resolution of its executable path and automatically use
this as its "exec_path" for relative helper and data lookups, unless
explicitly overridden.
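
As an illustration of the resolution order, here is a sketch covering
only two of the methods listed above: procfs and an absolute argv0.
resolve_executable() and copy_bytes() are hypothetical names, the fixed
buffer size is an arbitrary assumption, and the Darwin and BSD methods
are omitted.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static char *copy_bytes(const char *src, size_t len)
{
	char *out = malloc(len + 1);

	if (out) {
		memcpy(out, src, len);
		out[len] = '\0';
	}
	return out;
}

/*
 * Best-effort lookup of the running binary. Returns a freshly
 * allocated string the caller must free(), or NULL if nothing worked.
 */
static char *resolve_executable(const char *argv0)
{
	char buf[4096];	/* arbitrary limit chosen for this sketch */
	ssize_t len = readlink("/proc/self/exe", buf, sizeof(buf) - 1);

	if (len > 0)	/* procfs is available (e.g. on Linux) */
		return copy_bytes(buf, (size_t)len);
	if (argv0 && argv0[0] == '/')	/* generic fallback */
		return copy_bytes(argv0, strlen(argv0));
	return NULL;
}
```

The directory part of the returned path is what a RUNTIME_PREFIX build
would then use as the base for relative lookups.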
Small incidental formatting cleanup of "exec_cmd.c".
Signed-off-by: Dan Jacques <dnj@google.com>
Thanks-to: Robbie Iannucci <iannucci@google.com>
Thanks-to: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function returns true if git is running under a UTF-8
locale. pcre in the next patch will need this.
is_encoding_utf8() is used instead of strcmp() to catch both "utf-8"
and "utf8" suffixes.
When built with no gettext support, we peek in several env variables
to detect UTF-8. pcre library might support utf-8 even if libc is
built without locale support. The peeking code is a copy from
compat/regex/regcomp.c.
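
A sketch of the environment-peeking fallback, for illustration only.
The function names below are hypothetical (is_encoding_utf8_sketch()
merely imitates what is_encoding_utf8() is described as doing), and
locale modifiers such as "@euro" are not handled.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

static int same_ignoring_case(const char *a, const char *b)
{
	for (; *a && *b; a++, b++)
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
	return *a == *b;
}

/* Accept both spellings, which a plain strcmp() would not. */
static int is_encoding_utf8_sketch(const char *name)
{
	return same_ignoring_case(name, "utf-8") ||
	       same_ignoring_case(name, "utf8");
}

/* "en_US.UTF-8" -> 1, "de_DE.utf8" -> 1, "C" -> 0 */
static int locale_name_is_utf8(const char *locale)
{
	const char *dot = locale ? strchr(locale, '.') : NULL;

	return dot != NULL && is_encoding_utf8_sketch(dot + 1);
}

/* Peek at the usual variables in order of precedence. */
static int env_requests_utf8(void)
{
	const char *env = getenv("LC_ALL");

	if (!env || !*env)
		env = getenv("LC_CTYPE");
	if (!env || !*env)
		env = getenv("LANG");
	return locale_name_is_utf8(env);
}
```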
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This feeds the format directly to strftime. Besides being a
little more flexible, the main advantage is that your system
strftime may know more about your locale's preferred format
(e.g., how to spell the days of the week).
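
For illustration, passing a caller-supplied format straight to
strftime(3) can look like the sketch below. format_utc() is a
hypothetical helper and uses gmtime() for simplicity; it is not the
code added here.

```c
#include <stddef.h>
#include <time.h>

/*
 * Format `t` (seconds since the epoch, interpreted as UTC) according
 * to `fmt`. Returns the number of bytes written, or 0 on failure.
 */
static size_t format_utc(char *buf, size_t len, const char *fmt, time_t t)
{
	struct tm *tm = gmtime(&t);

	if (!tm)
		return 0;
	return strftime(buf, len, fmt, tm);
}
```

Locale-dependent conversions such as "%A" only produce localized names
if the process has set LC_TIME from the environment beforehand.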
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Calling setlocale(LC_MESSAGES, ...) directly from http.c, without
including <locale.h>, was causing compilation warnings. Move the
helper function to gettext.c that already includes the header and
where locale-related issues are handled.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Bug 6530 [1] in glibc causes "git show v0.99.6~1" to fail with error
"your vsnprintf is broken". The workaround avoids that, but it
corrupts system error messages in non-C locales.
The bug has been fixed since 2.17. We could determine the running glibc
version with gnu_get_libc_version(), but the version is not a sure way to
detect the bug because downstream may backport the fix to older versions.
Do a runtime test that imitates the call flow that leads to "your
vsnprintf is broken". Only enable the workaround if the test fails.
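
Such a runtime probe can be sketched as follows. The names
probe_vsnprintf() and init_ctype_locale() are hypothetical, and the
probe string is an assumption: any string containing a byte that is
invalid in UTF-8 (here 0xE5, written as \345) should exercise the same
path.

```c
#include <locale.h>
#include <stdarg.h>
#include <stdio.h>

/* Returns what vsnprintf() returns; negative means the bug is present. */
static int probe_vsnprintf(const char *fmt, ...)
{
	char buf[26];
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = vsnprintf(buf, sizeof(buf), fmt, ap);
	va_end(ap);
	return ret;
}

/*
 * Adopt the user's LC_CTYPE, but fall back to the "C" locale if a
 * precision-limited "%s" with a non-UTF-8 byte makes vsnprintf() fail.
 */
static void init_ctype_locale(void)
{
	setlocale(LC_CTYPE, "");
	if (probe_vsnprintf("%.*s", 13, "David_K\345gedal") < 0)
		setlocale(LC_CTYPE, "C");
}
```

Because the probe runs against the actual libc, it also gives the right
answer for distributions that backported the fix.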
Tested on Gentoo Linux, glibc 2.16.0 and 2.17, amd64.
[1] http://sourceware.org/bugzilla/show_bug.cgi?id=6530
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fetch does printf("%-*s", width, "foo") where "foo" can be a utf-8
string, but width is in bytes, not columns. For ASCII it's fine as one
byte takes one column. For utf-8, this may result in a misaligned ref
summary table.
Introduce gettext_width() function that returns the string length in
columns (currently only supports utf-8 locales). Make the code use
TRANSPORT_SUMMARY(x) where the length is compensated properly in
non-English locales.
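
The compensation can be illustrated with the sketch below. It is a
simplification, not gettext_width(): utf8_codepoints() and
padded_width() are hypothetical names, and counting one column per code
point ignores double-width and combining characters.

```c
#include <stdio.h>
#include <string.h>

/* Count code points by skipping UTF-8 continuation bytes (10xxxxxx). */
static int utf8_codepoints(const char *s)
{
	int n = 0;

	for (; *s; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	return n;
}

/*
 * printf() field widths count bytes, so widen the field by the number
 * of "extra" bytes to end up with `columns` columns on screen.
 */
static int padded_width(const char *s, int columns)
{
	return columns + (int)strlen(s) - utf8_codepoints(s);
}
```

A caller would then write printf("%-*s", padded_width(s, 10), s)
instead of printf("%-*s", 10, s).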
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the skeleton implementation of i18n in Git to one that can show
localized strings to users for our C, Shell and Perl programs using
either GNU libintl or the Solaris gettext implementation.
This new internationalization support is enabled by default. If
gettext isn't available, or if Git is compiled with
NO_GETTEXT=YesPlease, Git falls back on its current behavior of
showing interface messages in English. When using the autoconf script
we'll auto-detect if the gettext libraries are installed and act
appropriately.
This change is somewhat large because as well as adding a C, Shell and
Perl i18n interface we're adding a lot of tests for them, and for
those tests to work we need a skeleton PO file to actually test
translations. A minimal Icelandic translation is included for this
purpose. Icelandic includes multi-byte characters which makes it easy
to test various edge cases, and it's a language I happen to
understand.
The rest of the commit message goes into detail about various
sub-parts of this commit.
= Installation
Gettext .mo files will be installed and looked for in the standard
$(prefix)/share/locale path. GIT_TEXTDOMAINDIR can also be set to
override that, but that's only intended to be used to test Git itself.
= Perl
Perl code that's to be localized should use the new Git::I18N
module. It imports a __ function into the caller's package by default.
Instead of using the high level Locale::TextDomain interface I've
opted to use the low-level (equivalent to the C interface)
Locale::Messages module, which Locale::TextDomain itself uses.
Locale::TextDomain does a lot of redundant work we don't need, and
some of it would potentially introduce bugs. It tries to set the
$TEXTDOMAIN based on package of the caller, and has its own
hardcoded paths where it'll search for messages.
I found it easier just to completely avoid it rather than try to
circumvent its behavior. In any case, this is an issue wholly
internal to Git::I18N. Its guts can be changed later if that's deemed
necessary.
See <AANLkTilYD_NyIZMyj9dHtVk-ylVBfvyxpCC7982LWnVd@mail.gmail.com> for
a further elaboration on this topic.
= Shell
Shell code that's to be localized should use the git-sh-i18n
library. It's basically just a wrapper for the system's gettext.sh.
If gettext.sh isn't available we'll fall back on gettext(1) if it's
available. The latter is available without the former on Solaris,
which has its own non-GNU gettext implementation. We also need to
emulate eval_gettext() there.
If neither is present we'll use a dumb printf(1) fall-through
wrapper.
= About libcharset.h and langinfo.h
We use libcharset to query the character set of the current locale if
it's available. I.e. we'll use it instead of nl_langinfo if
HAVE_LIBCHARSET_H is set.
The GNU gettext manual recommends using langinfo.h's
nl_langinfo(CODESET) to acquire the current character set, but on
systems that have libcharset.h's locale_charset() using the latter is
either saner, or the only option on those systems.
GNU and Solaris have a nl_langinfo(CODESET), FreeBSD can use either,
but MinGW and some others need to use libcharset.h's locale_charset()
instead.
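
In code, the selection between the two can be sketched as below.
current_charset() is a hypothetical wrapper; the preprocessor switch
follows the HAVE_LIBCHARSET_H convention described above, and the
sketch assumes <langinfo.h> exists wherever libcharset is not used.

```c
#include <locale.h>

#ifdef HAVE_LIBCHARSET_H
#include <libcharset.h>		/* provides locale_charset() */
#else
#include <langinfo.h>
/* Fall back to the interface recommended by the GNU gettext manual. */
#define locale_charset() nl_langinfo(CODESET)
#endif

/* Returns the character set name of the user's LC_CTYPE locale. */
static const char *current_charset(void)
{
	setlocale(LC_CTYPE, "");
	return locale_charset();
}
```

The result is the kind of value one would hand to
bind_textdomain_codeset() so that translations come out in the
encoding the terminal expects.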
= Credits
This patch is based on work by Jeff Epler <jepler@unpythonic.net> who
did the initial Makefile / C work, and a lot of comments from the Git
mailing list, including Jonathan Nieder, Jakub Narebski, Johannes
Sixt, Erik Faye-Lund, Peter Krefting, Junio C Hamano, Thomas Rast and
others.
[jc: squashed a small Makefile fix from Ramsay]
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tweak the GETTEXT_POISON facility so it is activated at run time
instead of compile time. If the GIT_GETTEXT_POISON environment
variable is set, _(msg) will result in gibberish as before; but if the
GIT_GETTEXT_POISON variable is not set, it will return the message for
human-readable output. So the behavior of mistranslated and
untranslated git can be compared without rebuilding git in between.
For simplicity we always set the GIT_GETTEXT_POISON variable in tests.
This does not affect builds without the GETTEXT_POISON compile-time
option set, so non-i18n git will not be slowed down.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>