Граф коммитов

248 Коммитов

Автор SHA1 Сообщение Дата
Brandon Williams f9704c2d82 diff: convert fill_filespec to struct object_id
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-02 09:36:07 +09:00
Brandon Williams 1c41c82bc4 grep: convert to struct object_id
Convert the remaining parts of grep to use struct object_id.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-02 09:36:06 +09:00
Ævar Arnfjörð Bjarmason 94da9193a6 grep: add support for PCRE v2
Add support for v2 of the PCRE API. This is a new major version of
PCRE that came out in early 2015[1].

The regular expression syntax is the same, but while the API is
similar, pretty much every function is either renamed or takes
different arguments. Thus using it via entirely new functions makes
sense, as opposed to trying to e.g. have one compile_pcre_pattern()
that would call either PCRE v1 or v2 functions.

Git can now be compiled with either USE_LIBPCRE1=YesPlease or
USE_LIBPCRE2=YesPlease, with USE_LIBPCRE=YesPlease currently being a
synonym for the former. Providing both is a compile-time error.

With earlier patches to enable JIT for PCRE v1 the performance of the
release versions of both libraries is almost exactly the same, with
PCRE v2 being around 1% slower.

However after I reported this to the pcre-dev mailing list[2] I got a
lot of help with the API use from Zoltán Herczeg, he subsequently
optimized some of the JIT functionality in v2 of the library.

Running the p7820-grep-engines.sh performance test against the latest
Subversion trunk of both, with both them and git compiled as -O3, and
the test run against linux.git, gives the following results. Just the
/perl/ tests shown:

    $ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_COMMAND='grep -q LIBPCRE2 Makefile && make -j8 USE_LIBPCRE2=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre2/inst/lib || make -j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~5 HEAD~ HEAD p7820-grep-engines.sh
    [...]
    Test                                            HEAD~5            HEAD~                    HEAD
    -----------------------------------------------------------------------------------------------------------------
    7820.3: perl grep 'how.to'                      0.31(1.10+0.48)   0.21(0.35+0.56) -32.3%   0.21(0.34+0.55) -32.3%
    7820.7: perl grep '^how to'                     0.56(2.70+0.40)   0.24(0.64+0.52) -57.1%   0.20(0.28+0.60) -64.3%
    7820.11: perl grep '[how] to'                   0.56(2.66+0.38)   0.29(0.95+0.45) -48.2%   0.23(0.45+0.54) -58.9%
    7820.15: perl grep '(e.t[^ ]*|v.ry) rare'       1.02(5.77+0.42)   0.31(1.02+0.54) -69.6%   0.23(0.50+0.54) -77.5%
    7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'          0.38(1.57+0.42)   0.27(0.85+0.46) -28.9%   0.21(0.33+0.57) -44.7%

See commit ("perf: add a comparison test of grep regex engines",
2017-04-19) for details on the machine the above test run was executed
on.

Here HEAD~2 is git with PCRE v1 without JIT, HEAD~ is PCRE v1 with
JIT, and HEAD is PCRE v2 (also with JIT). See previous commits of mine
mentioning p7820-grep-engines.sh for more details on the test setup.

For ease of readability, a different run just of HEAD~ (PCRE v1 with
JIT v.s. PCRE v2), again with just the /perl/ tests shown:

    [...]
    Test                                            HEAD~             HEAD
    ----------------------------------------------------------------------------------------
    7820.3: perl grep 'how.to'                      0.21(0.42+0.52)   0.21(0.31+0.58) +0.0%
    7820.7: perl grep '^how to'                     0.25(0.65+0.50)   0.20(0.31+0.57) -20.0%
    7820.11: perl grep '[how] to'                   0.30(0.90+0.50)   0.23(0.46+0.53) -23.3%
    7820.15: perl grep '(e.t[^ ]*|v.ry) rare'       0.30(1.19+0.38)   0.23(0.51+0.51) -23.3%
    7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'          0.27(0.84+0.48)   0.21(0.34+0.57) -22.2%

I.e. the two are either neck-to-neck, but PCRE v2 usually pulls ahead,
when it does it's around 20% faster.

A brief note on thread safety: As noted in pcre2api(3) & pcre2jit(3)
the compiled pattern can be shared between threads, but not some of
the JIT context, however the grep threading support does all pattern &
JIT compilation in separate threads, so this code doesn't need to
concern itself with thread safety.

See commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) for the
initial addition of PCRE v1. This change follows some of the same
patterns it did (and which were discussed on list at the time),
e.g. mocking up types with typedef instead of ifdef-ing them out when
USE_LIBPCRE2 isn't defined. This adds some trivial memory use to the
program, but makes the code look nicer.

1. https://lists.exim.org/lurker/message/20150105.162835.0666407a.en.html
2. https://lists.exim.org/lurker/thread/20170419.172322.833ee099.en.html

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-02 08:29:05 +09:00
Ævar Arnfjörð Bjarmason e87de7cab4 grep: un-break building with PCRE < 8.32
Amend my change earlier in this series ("grep: add support for the
PCRE v1 JIT API", 2017-04-11) to un-break the build on PCRE v1
versions earlier than 8.32.

The JIT support was added in version 8.20 released on 2011-10-21, but
it wasn't until 8.32 released on 2012-11-30 that the fast code path to
use the JIT via pcre_jit_exec() was added[1] (see also [2]).

This means that versions 8.20 through 8.31 could still use the JIT,
but supporting it on those versions would add to the already verbose
macro soup around JIT support it, and I don't expect that the use-case
of compiling a brand new git against a 5 year old PCRE is particularly
common, and if someone does that they can just get the existing
pre-JIT slow codepath.

So just take the easy way out and disable the JIT on any version older
than 8.32.

The reason this change isn't part of the initial change PCRE JIT
support is to have a cleaner history showing which parts of the
implementation are only used for ancient PCRE versions. This also
makes it easier to revert this change if we ever decide to stop
supporting those old versions.

1. http://www.pcre.org/original/changelog.txt ("28. Introducing a
   native interface for JIT. Through this interface, the
   compiled[...]")
2. https://bugs.exim.org/show_bug.cgi?id=2121

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-26 12:59:05 +09:00
Ævar Arnfjörð Bjarmason fbaceaac47 grep: add support for the PCRE v1 JIT API
Change the grep PCRE v1 code to use JIT when available. When PCRE
support was initially added in commit 63e7e9d8b6 ("git-grep: Learn
PCRE", 2011-05-09) PCRE had no JIT support, it was integrated into
8.20 released on 2011-10-21.

Enabling JIT support usually improves performance by more than
40%. The pattern compilation times are relatively slower, but those
relative numbers are tiny, and are easily made back in all but the
most trivial cases of grep. Detailed benchmarks & overview of
compilation times is at: http://sljit.sourceforge.net/pcre.html

With this change the difference in a t/perf/p7820-grep-engines.sh run
is, with just the /perl/ tests shown:

    $ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~ HEAD p7820-grep-engines.sh
    Test                                           HEAD~             HEAD
    ---------------------------------------------------------------------------------------
    7820.3: perl grep 'how.to'                      0.35(1.11+0.43)   0.23(0.42+0.46) -34.3%
    7820.7: perl grep '^how to'                     0.64(2.71+0.36)   0.27(0.66+0.44) -57.8%
    7820.11: perl grep '[how] to'                   0.63(2.51+0.42)   0.33(0.98+0.39) -47.6%
    7820.15: perl grep '(e.t[^ ]*|v.ry) rare'       1.17(5.61+0.35)   0.34(1.08+0.46) -70.9%
    7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'          0.43(1.52+0.44)   0.30(0.88+0.42) -30.2%

The conditional support for JIT is implemented as suggested in the
pcrejit(3) man page. E.g. defining PCRE_STUDY_JIT_COMPILE to 0 if it's
not present.

The implementation is relatively verbose because even if
PCRE_CONFIG_JIT is defined only a call to pcre_config() can determine
if the JIT is available, and if so the faster pcre_jit_exec() function
should be called instead of pcre_exec(), and a different (but not
complimentary!) function needs to be called to free pcre1_extra_info.

There's no graceful fallback if pcre_jit_stack_alloc() fails under
PCRE_CONFIG_JIT, instead the program will simply abort. I don't think
this is worth handling gracefully, it'll only fail in cases where
malloc() doesn't work, in which case we're screwed anyway.

That there's no assignment of `p->pcre1_jit_on = 0` when
PCRE_CONFIG_JIT isn't defined isn't a bug. The create_grep_pat()
function allocates the grep_pat allocates it with calloc(), so it's
guaranteed to be 0 when PCRE_CONFIG_JIT isn't defined.

I you're bisecting and find this change, check that your PCRE isn't
older than 8.32. This change intentionally broke really old versions
of PCRE, but that's fixed in follow-up commits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-26 12:59:05 +09:00
Ævar Arnfjörð Bjarmason 543f1c0cb0 grep: move is_fixed() earlier to avoid forward declaration
Move the is_fixed() function which are currently only used in
compile_regexp() earlier so it can be used in the PCRE family of
functions in a later change.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-26 12:52:37 +09:00
Ævar Arnfjörð Bjarmason 6d4b5747f0 grep: change internal *pcre* variable & function names to be *pcre1*
Change the internal PCRE variable & function names to have a "1"
suffix. This is for preparation for libpcre2 support, where having
non-versioned names would be confusing.

An earlier change in this series ("grep: change the internal PCRE
macro names to be PCRE1", 2017-04-07) elaborates on the motivations
behind this change.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-26 12:52:37 +09:00
Ævar Arnfjörð Bjarmason 3485bea157 grep: change the internal PCRE macro names to be PCRE1
Change the internal USE_LIBPCRE define, & build options flag to use a
naming convention ending in PCRE1, without changing the long-standing
USE_LIBPCRE Makefile flag which enables this code.

This is for preparation for libpcre2 support where having things like
USE_LIBPCRE and USE_LIBPCRE2 in any more places than we absolutely
need to for backwards compatibility with old Makefile arguments would
be confusing.

In some ways it would be better to change everything that now uses
USE_LIBPCRE to use USE_LIBPCRE1, and to make specifying
USE_LIBPCRE (or --with-pcre) an error. This would impose a one-time
burden on packagers of git to s/USE_LIBPCRE/USE_LIBPCRE1/ in their
build scripts.

However I'd like to leave the door open to making
USE_LIBPCRE=YesPlease eventually mean USE_LIBPCRE2=YesPlease,
i.e. once PCRE v2 is ubiquitous enough that it makes sense to make it
the default.

This code and the USE_LIBPCRE Makefile argument was added in commit
63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09). At the time there was
no indication that the PCRE project would release an entirely new &
incompatible API around 3 years later.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-26 12:52:37 +09:00
Ævar Arnfjörð Bjarmason 219e65b65c grep: factor test for \0 in grep patterns into a function
Factor the test for \0 in grep patterns into a function. Since commit
9eceddeec6 ("Use kwset in grep", 2011-08-21) any pattern containing a
\0 is considered fixed as regcomp() can't handle it.

This change makes later changes that make use of either has_null() or
is_fixed() (but not both) smaller.

While I'm at it make the comment conform to the style guide, i.e. add
an opening "/*\n".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-26 12:52:37 +09:00
Ævar Arnfjörð Bjarmason e0b9f8ae09 grep: remove redundant regflags assignments
Remove redundant assignments to the "regflags" variable. This variable
is only used set under GREP_PATTERN_TYPE_ERE, so there's no need to
un-set it under GREP_PATTERN_TYPE_{FIXED,BRE,PCRE}.

Back in 5010cb5fcc[1], we did do "opt.regflags &= ~REG_EXTENDED" upon
seeing "-G" on the command line and flipped the bit on upon seeing
"-E", but I think that was perfectly sensible and it would have been a
bug if we didn't.  They were part of the command line parsing that
could have seen "-E" on the command line earlier.

When cca2c172 ("git-grep: do not die upon -F/-P when
grep.extendedRegexp is set.", 2011-05-09) switched the command line
parsing to "read into a 'tentatively this is what we saw the last'
variable and then finally commit just once", we didn't touch
opt.regflags for PCRE and FIXED, but we still had to flip regflags
between BRE and ERE, because parsing of grep.extendedregexp
configuration variable directly touched opt.regflags back then, which
was done by b22520a3 ("grep: allow -E and -n to be turned on by
default via configuration", 2011-03-30).

When 84befcd0 ("grep: add a grep.patternType configuration setting",
2012-08-03) introduced extended_regexp_option field, we stopped
flipping regflags while reading the configuration, and that was when
we should have noticed and stopped dropping REG_EXTENDED bit in the
"now we can commit what type to use" helper function.

There is no reason to do this anymore, so stop doing it, more to
reduce "wait this is used under fixed/BRE/PCRE how?" confusion when
reading the code, than to to save ourselves trivial CPU cycles by
removing one assignment.

1. "built-in "git grep"", 2006-04-30.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-26 12:52:37 +09:00
Jeff King 1a168e5c86 convert unchecked snprintf into xsnprintf
These calls to snprintf should always succeed, because their
input is small and fixed. Let's use xsnprintf to make sure
this is the case (and to make auditing for actual truncation
easier).

These could be candidates for turning into heap buffers, but
they fall into a few broad categories that make it not worth
doing:

  - formatting single numbers is simple enough that we can
    see the result should fit

  - the size of a sha1 is likewise well-known, and I didn't
    want to cause unnecessary conflicts with the ongoing
    process to convert these constants to GIT_MAX_HEXSZ

  - the interface for curl_errorstr is dictated by curl

Signed-off-by: Jeff King <peff@peff.net>
2017-03-30 14:59:50 -07:00
Brandon Williams 379642bcd8 grep: set default output method
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-17 12:18:41 -07:00
Brandon Williams 4538eef564 grep: add submodules as a grep source type
Add `GREP_SOURCE_SUBMODULE` as a grep_source type and cases for this new
type in the various switch statements in grep.c.

When initializing a grep_source with type `GREP_SOURCE_SUBMODULE` the
identifier can either be NULL (to indicate that the working tree will be
used) or a SHA1 (the REV of the submodule to be grep'd).  If the
identifier is a SHA1 then we want to fall through to the
`GREP_SOURCE_SHA1` case to handle the copying of the SHA1.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-22 11:47:33 -08:00
Junio C Hamano 6a67695268 Merge branch 'js/regexec-buf'
Some codepaths in "git diff" used regexec(3) on a buffer that was
mmap(2)ed, which may not have a terminating NUL, leading to a read
beyond the end of the mapped region.  This was fixed by introducing
a regexec_buf() helper that takes a <ptr,len> pair with REG_STARTEND
extension.

* js/regexec-buf:
  regex: use regexec_buf()
  regex: add regexec_buf() that can work on a non NUL-terminated string
  regex: -G<pattern> feeds a non NUL-terminated string to regexec() and fails
2016-09-26 16:09:19 -07:00
Johannes Schindelin b7d36ffca0 regex: use regexec_buf()
The new regexec_buf() function operates on buffers with an explicitly
specified length, rather than NUL-terminated strings.

We need to use this function whenever the buffer we want to pass to
regexec(3) may have been mmap(2)ed (and is hence not NUL-terminated).

Note: the original motivation for this patch was to fix a bug where
`git diff -G <regex>` would crash. This patch converts more callers,
though, some of which allocated to construct NUL-terminated strings,
or worse, modified buffers to temporarily insert NULs while calling
regexec(3).  By converting them to use regexec_buf(), the code has
become much cleaner.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-21 13:56:15 -07:00
Junio C Hamano 1a5f1a3f25 Merge branch 'js/am-3-merge-recursive-direct'
"git am -3" calls "git merge-recursive" when it needs to fall back
to a three-way merge; this call has been turned into an internal
subroutine call instead of spawning a separate subprocess.

* js/am-3-merge-recursive-direct:
  merge-recursive: flush output buffer even when erroring out
  merge_trees(): ensure that the callers release output buffer
  merge-recursive: offer an option to retain the output in 'obuf'
  merge-recursive: write the commit title in one go
  merge-recursive: flush output buffer before printing error messages
  am -3: use merge_recursive() directly again
  merge-recursive: switch to returning errors instead of dying
  merge-recursive: handle return values indicating errors
  merge-recursive: allow write_tree_from_memory() to error out
  merge-recursive: avoid returning a wholesale struct
  merge_recursive: abort properly upon errors
  prepare the builtins for a libified merge_recursive()
  merge-recursive: clarify code in was_tracked()
  die(_("BUG")): avoid translating bug messages
  die("bug"): report bugs consistently
  t5520: verify that `pull --rebase` shows the helpful advice when failing
2016-08-10 12:33:20 -07:00
Junio C Hamano b422d99658 Merge branch 'jc/grep-commandline-vs-configuration'
"git -c grep.patternType=extended log --basic-regexp" misbehaved
because the internal API to access the grep machinery was not
designed well.

* jc/grep-commandline-vs-configuration:
  grep: further simplify setting the pattern type
2016-08-04 14:39:18 -07:00
Johannes Schindelin ef1177d18e die("bug"): report bugs consistently
The vast majority of error messages in Git's source code which report a
bug use the convention to prefix the message with "BUG:".

As part of cleaning up merge-recursive to stop die()ing except in case of
detected bugs, let's just make the remainder of the bug reports consistent
with the de facto rule.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-26 11:13:44 -07:00
Junio C Hamano 8465541e8c grep: further simplify setting the pattern type
When c5c31d33 (grep: move pattern-type bits support to top-level
grep.[ch], 2012-10-03) introduced grep_commit_pattern_type() helper
function, the intention was to allow the users of grep API to having
to fiddle only with .pattern_type_option (which can be set to "fixed",
"basic", "extended", and "pcre"), and then immediately before compiling
the pattern strings for use, call grep_commit_pattern_type() to have
it prepare various bits in the grep_opt structure (like .fixed,
.regflags, etc.).

However, grep_set_pattern_type_option() helper function the grep API
internally uses were left as an external function by mistake.  This
function shouldn't have been made callable by the users of the API.

Later when the grep API was used in revision traversal machinery,
the caller then mistakenly started calling the function around
34a4ae55 (log --grep: use the same helper to set -E/-F options as
"git grep", 2012-10-03), instead of setting the .pattern_type_option
field and letting the grep_commit_pattern_type() to take care of the
details.

This caused an unnecessary bug that made a configured
grep.patternType take precedence over the command line options
(e.g. --basic-regexp, --fixed-strings) in "git log" family of
commands.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-25 09:16:18 -07:00
Junio C Hamano a883c31af6 Merge branch 'nd/icase'
"git grep -i" has been taught to fold case in non-ascii locales
correctly.

* nd/icase:
  grep.c: reuse "icase" variable
  diffcore-pickaxe: support case insensitive match on non-ascii
  diffcore-pickaxe: Add regcomp_or_die()
  grep/pcre: support utf-8
  gettext: add is_utf8_locale()
  grep/pcre: prepare locale-dependent tables for icase matching
  grep: rewrite an if/else condition to avoid duplicate expression
  grep/icase: avoid kwsset when -F is specified
  grep/icase: avoid kwsset on literal non-ascii strings
  test-regex: expose full regcomp() to the command line
  test-regex: isolate the bug test code
  grep: break down an "if" stmt in preparation for next changes
2016-07-19 13:22:17 -07:00
Nguyễn Thái Ngọc Duy 695f95ba5d grep.c: reuse "icase" variable
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01 12:44:57 -07:00
Nguyễn Thái Ngọc Duy 18547aacf5 grep/pcre: support utf-8
In the previous change in this function, we add locale support for
single-byte encodings only. It looks like pcre only supports utf-* as
multibyte encodings, the others are left in the cold (which is
fine).

We need to enable PCRE_UTF8 so pcre can find character boundary
correctly. It's needed for case folding (when --ignore-case is used)
or '*', '+' or similar syntax is used.

The "has_non_ascii()" check is to be on the conservative side. If
there's non-ascii in the pattern, the searched content could still be
in utf-8, but we can treat it just like a byte stream and everything
should work. If we force utf-8 based on locale only and pcre validates
utf-8 and the file content is in non-utf8 encoding, things break.

Noticed-by: Plamen Totev <plamen.totev@abv.bg>
Helped-by: Plamen Totev <plamen.totev@abv.bg>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01 12:44:57 -07:00
Nguyễn Thái Ngọc Duy 9d9babb84d grep/pcre: prepare locale-dependent tables for icase matching
The default tables are usually built with C locale and only suitable
for LANG=C or similar.  This should make case insensitive search work
correctly for all single-byte charsets.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01 12:44:57 -07:00
Nguyễn Thái Ngọc Duy e944d9d932 grep: rewrite an if/else condition to avoid duplicate expression
"!icase || ascii_only" is repeated twice in this if/else chain as this
series evolves. Rewrite it (and basically revert the first if
condition back to before the "grep: break down an "if" stmt..." commit).

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01 12:44:57 -07:00
Nguyễn Thái Ngọc Duy 793dc676e0 grep/icase: avoid kwsset when -F is specified
Similar to the previous commit, we can't use kws on icase search
outside ascii range. But we can't simply pass the pattern to
regcomp/pcre like the previous commit because it may contain regex
special characters, so we need to quote the regex first.

To avoid misquote traps that could lead to undefined behavior, we
always stick to basic regex engine in this case. We don't need fancy
features for grepping a literal string anyway.

basic_regex_quote_buf() assumes that if the pattern is in a multibyte
encoding, ascii chars must be unambiguously encoded as single
bytes. This is true at least for UTF-8. For others, let's wait until
people yell up. Chances are nobody uses multibyte, non utf-8 charsets
anymore.

Noticed-by: Plamen Totev <plamen.totev@abv.bg>
Helped-by: René Scharfe <l.s.r@web.de>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01 12:44:30 -07:00
Nguyễn Thái Ngọc Duy 5c1ebcca4d grep/icase: avoid kwsset on literal non-ascii strings
When we detect the pattern is just a literal string, we avoid heavy
regex engine and use fast substring search implemented in kwsset.c.
But kws uses git-ctype which is locale-independent so it does not know
how to fold case properly outside ascii range. Let regcomp or pcre
take care of this case instead. Slower, but accurate.

Noticed-by: Plamen Totev <plamen.totev@abv.bg>
Helped-by: René Scharfe <l.s.r@web.de>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-27 07:31:35 -07:00
Nguyễn Thái Ngọc Duy 60452a30f5 grep: break down an "if" stmt in preparation for next changes
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-27 07:31:35 -07:00
Junio C Hamano d15c05a5d0 Merge branch 'rs/xdiff-hunk-with-func-line'
"git show -W" (extend hunks to cover the entire function, delimited
by lines that match the "funcname" pattern) used to show the entire
file when a change added an entire function at the end of the file,
which has been fixed.

* rs/xdiff-hunk-with-func-line:
  xdiff: fix merging of appended hunk with -W
  grep: -W: don't extend context to trailing empty lines
  t7810: add test for grep -W and trailing empty context lines
  xdiff: don't trim common tail with -W
  xdiff: -W: don't include common trailing empty lines in context
  xdiff: ignore empty lines before added functions with -W
  xdiff: handle appended chunks better with -W
  xdiff: factor out match_func_rec()
  t4051: rewrite, add more tests
2016-06-20 11:01:04 -07:00
René Scharfe 4aa2c4753d grep: -W: don't extend context to trailing empty lines
Empty lines between functions are shown by grep -W, as it considers them
to be part of the function preceding them.  They are not interesting in
most languages.  The previous patches stopped showing them for diff -W.

Stop showing empty lines trailing a function with grep -W.  Grep scans
the lines of a buffer from top to bottom and prints matching lines
immediately.  Thus we need to peek ahead in order to determine if an
empty line is part of a function body and worth showing or not.

Remember how far ahead we peeked in order to avoid having to do so
repeatedly when handling multiple consecutive empty lines.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-31 13:08:56 -07:00
Nguyễn Thái Ngọc Duy 7645d8f119 grep.c: use error_errno()
While at there, improve the error message a bit (what operation failed?)

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-09 12:29:08 -07:00
Jeff King 3733e69464 use xmallocz to avoid size arithmetic
We frequently allocate strings as xmalloc(len + 1), where
the extra 1 is for the NUL terminator. This can be done more
simply with xmallocz, which also checks for integer
overflow.

There's no case where switching xmalloc(n+1) to xmallocz(n)
is wrong; the result is the same length, and malloc made no
guarantees about what was in the buffer anyway. But in some
cases, we can stop manually placing NUL at the end of the
allocated buffer. But that's only safe if it's clear that
the contents will always fill the buffer.

In each case where this patch does so, I manually examined
the control flow, and I tried to err on the side of caution.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22 14:51:09 -08:00
Jeff King 7ce4fb948c color: add color_set helper for copying raw colors
To set up default colors, we sometimes strcpy() from the
default string literals into our color buffers. This isn't a
bug (assuming the destination is COLOR_MAXLEN bytes), but
makes it harder to audit the code for problematic strcpy
calls.

Let's introduce a color_set which copies under the
assumption that there are COLOR_MAXLEN bytes in the
destination (of course you can call it on a smaller buffer,
so this isn't providing a huge amount of safety, but it's
more convenient than calling xsnprintf yourself).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-05 11:08:05 -07:00
Jeff King 19bdd3e7e1 grep: use xsnprintf to format failure message
This looks at first glance like the sprintf can overflow our
buffer, but it's actually fine; the p->origin string is
something constant and small, like "command line" or "-e
option".

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-25 10:18:18 -07:00
Junio C Hamano 092c4be7f5 Merge branch 'jk/blame-commit-label'
"git blame HEAD -- missing" failed to correctly say "HEAD" when it
tried to say "No such path 'missing' in HEAD".

* jk/blame-commit-label:
  blame.c: fix garbled error message
  use xstrdup_or_null to replace ternary conditionals
  builtin/commit.c: use xstrdup_or_null instead of envdup
  builtin/apply.c: use xstrdup_or_null instead of null_strdup
  git-compat-util: add xstrdup_or_null helper
2015-02-11 13:39:50 -08:00
Jeff King 8c53f0719b use xstrdup_or_null to replace ternary conditionals
This replaces "x ? xstrdup(x) : NULL" with xstrdup_or_null(x).
The change is fairly mechanical, with the exception of
resolve_refdup, which can eliminate a temporary variable.

There are still a few hits grepping for "?.*xstrdup", but
these are of slightly different forms and cannot be
converted (e.g., "x ? xstrdup(x->foo) : NULL").

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-01-13 10:05:48 -08:00
Junio C Hamano bf1f639ea2 Merge branch 'rs/grep-color-words'
Allow painting or not painting (partial) matches in context lines
when showing "grep -C<num>" output in color.

* rs/grep-color-words:
  grep: add color.grep.matchcontext and color.grep.matchselected
2014-10-31 11:49:47 -07:00
René Scharfe 79a77109d3 grep: add color.grep.matchcontext and color.grep.matchselected
The config option color.grep.match can be used to specify the highlighting
color for matching strings.  Add the options matchContext and matchSelected
to allow different colors to be specified for matching strings in the
context vs. in selected lines.  This is similar to the ms and mc specifiers
in GNU grep's environment variable GREP_COLORS.

Tests are from Zoltan Klinger's earlier attempt to solve the same
issue in a different way.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-28 10:33:50 -07:00
Jeff King f6c5a2968c color_parse: do not mention variable name in error message
Originally the color-parsing function was used only for
config variables. It made sense to pass the variable name so
that the die() message could be something like:

  $ git -c color.branch.plain=bogus branch
  fatal: bad color value 'bogus' for variable 'color.branch.plain'

These days we call it in other contexts, and the resulting
error messages are a little confusing:

  $ git log --pretty='%C(bogus)'
  fatal: bad color value 'bogus' for variable '--pretty format'

  $ git config --get-color foo.bar bogus
  fatal: bad color value 'bogus' for variable 'command line'

This patch teaches color_parse to complain only about the
value, and then return an error code. Config callers can
then propagate that up to the config parser, which mentions
the variable name. Other callers can provide a custom
message. After this patch these three cases now look like:

  $ git -c color.branch.plain=bogus branch
  error: invalid color value: bogus
  fatal: unable to parse 'color.branch.plain' from command-line config

  $ git log --pretty='%C(bogus)'
  error: invalid color value: bogus
  fatal: unable to parse --pretty format

  $ git config --get-color foo.bar bogus
  error: invalid color value: bogus
  fatal: unable to parse default color value

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-14 11:01:21 -07:00
Junio C Hamano 52df9173fa Merge branch 'as/grep-fullname-config'
Add a configuration variable to force --full-name to be default for
"git grep".

This may cause regressions on scripted users that do not expect
this new behaviour.

* as/grep-fullname-config:
  grep: add grep.fullName config variable
2014-06-03 12:06:39 -07:00
Andreas Schwab 6453f7b348 grep: add grep.fullName config variable
This configuration variable sets the default for the --full-name option.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-20 12:38:00 -07:00
Junio C Hamano 6f6be80ef1 Merge branch 'rs/grep-h-c'
"git grep" learns to handle combination of "-h (no header)" and "-c
(counts)".

* rs/grep-h-c:
  grep: support -h (no header) with --count
  t7810: add missing variables to tests in loop
2014-03-18 13:51:20 -07:00
René Scharfe f76d947ae1 grep: support -h (no header) with --count
Suppress printing the header (filename) with -h even if in -c/--count
mode.  GNU grep and OpenBSD's grep do the same.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-11 15:05:28 -07:00
Sun He 50546b15ed Use hashcpy() when copying object names
We invented hashcpy() to keep the abstraction of "object name"
behind it.  Use it instead of calling memcpy() with hard-coded
20-byte length when moving object names between pieces of memory.

Leave ppc/sha1.c as-is, because the function is about the SHA-1 hash
algorithm whose output is and will always be 20 bytes.

Helped-by: Michael Haggerty <mhagger@alum.mit.edu>
Helped-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Sun He <sunheehnus@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-06 14:03:12 -08:00
Jeff King 335ec3bf41 grep: allow to use textconv filters
Recently and not so recently, we made sure that log/grep type operations
use textconv filters when a userfacing diff would do the same:

ef90ab6 (pickaxe: use textconv for -S counting, 2012-10-28)
b1c2f57 (diff_grep: use textconv buffers for add/deleted files, 2012-10-28)
0508fe5 (combine-diff: respect textconv attributes, 2011-05-23)

"git grep" currently does not use textconv filters at all, that is
neither for displaying the match and context nor for the actual grepping,
even when requested by --textconv.

Introduce an option "--textconv" which makes git grep use any configured
textconv filters for grepping and output purposes. It is off by default.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-10 10:27:31 -07:00
Antoine Pelisse 3ce3ffb840 fix clang -Wtautological-compare with unsigned enum
Create a GREP_HEADER_FIELD_MIN so we can check that the field value is
sane and silence the clang warning.

Clang warning happens because the enum is unsigned (this is
implementation-defined, and there is no negative fields) and the check
is then tautological.

Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: John Keeping <john@keeping.me.uk>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-25 07:35:55 -08:00
Jeff King e034d1bb92 Merge branch 'nd/grep-true-path'
"git grep -e pattern <tree>" asked the attribute system to read
"<tree>:.gitattributes" file in the working tree, which was
nonsense.

* nd/grep-true-path:
  grep: stop looking at random places for .gitattributes
2012-10-29 04:13:16 -04:00
Nguyễn Thái Ngọc Duy 55c61688ea grep: stop looking at random places for .gitattributes
grep searches for .gitattributes using "name" field in struct
grep_source but that field is not real on-disk path name. For example,
"grep pattern rev" fills the field with "rev:path", and Git looks for
.gitattributes in the (non-existent but exploitable) path "rev:path"
instead of "path".

This patch passes real paths down to grep_source_load_driver() when:

 - grep on work tree
 - grep on the index
 - grep a commit (or a tag if it points to a commit)

so that these cases look up .gitattributes at proper paths.
.gitattributes lookup is disabled in all other cases.

Initial-work-by: Jeff King <peff@peff.net>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-12 08:24:44 -07:00
Junio C Hamano 918d4e1c90 revisions: initialize revs->grep_filter using grep_init()
Instead of using the hand-rolled initialization sequence,
use grep_init() to populate the necessary bits.  This opens
the door to allow the calling commands to optionally read
grep.* configuration variables via git_config() if they
want to.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-09 23:21:29 -07:00
Junio C Hamano c5c31d3381 grep: move pattern-type bits support to top-level grep.[ch]
Switching between -E/-G/-P/-F correctly needs a lot more than just
flipping opt->regflags bit these days, and we have a nice helper
function buried in builtin/grep.c for the sole use of "git grep".

Extract it so that "log --grep" family can also use it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-09 23:21:29 -07:00
Junio C Hamano 7687a0541e grep: move the configuration parsing logic to grep.[ch]
The configuration handling is a library-ish part of this program,
that is not specific to "git grep" command.  It should be reusable
by "log" and others.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-09 16:17:50 -07:00
Junio C Hamano baa6378ff2 log --grep-reflog: reject the option without -g
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-29 12:07:04 -07:00
Nguyễn Thái Ngọc Duy 72fd13f71c revision: add --grep-reflog to filter commits by reflog messages
Similar to --author/--committer which filters commits by author and
committer header fields. --grep-reflog adds a fake "reflog" header to
commit and a grep filter to search on that line.

All rules to --author/--committer apply except no timestamp stripping.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-29 11:41:14 -07:00
Nguyễn Thái Ngọc Duy ad4813b3c2 grep: prepare for new header field filter
grep supports only author and committer headers, which have the same
special treatment that later headers may or may not have. Check for
field type and only strip_timestamp() when the field is either author
or committer.

GREP_HEADER_FIELD_MAX is put in the grep_header_field enum to be
calculated automatically, correctly, as long as it's at the end of the
enum.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-29 11:40:58 -07:00
Junio C Hamano 3083301ead grep.c: make two symbols really file-scope static this time
Adding a declaration at the beginning is not sufficient for obvious
reasons. The definition has to be made static.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-20 14:20:09 -07:00
Junio C Hamano 07a7d656dd grep.c: mark private file-scope symbols as static
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-15 23:35:39 -07:00
Junio C Hamano 13e4fc7e01 log --grep/--author: honor --all-match honored for multiple --grep patterns
When we have both header expression (which has to be an OR node by
construction) and a pattern expression (which could be anything), we
create a new top-level OR node to bind them together, and the
resulting expression structure looks like this:

             OR
        /          \
       /            \
   pattern            OR
     / \           /     \
    .....    committer    OR
                         /   \
                     author   TRUE

The three elements on the top-level backbone that are inspected by
the "all-match" logic are "pattern", "committer" and "author".  When
there are more than one elements in the "pattern", the top-level
node of the "pattern" part of the subtree is an OR, and that node is
inspected by "all-match".

The result ends up ignoring the "--all-match" given from the command
line.  A match on either side of the pattern is considered a match,
hence:

        git log --grep=A --grep=B --author=C --all-match

shows the same "authored by C and has either A or B" that is correct
only when run without "--all-match".

Fix this by turning the resulting expression around when "--all-match"
is in effect, like this:

              OR
          /        \
         /          \
        /              OR
    committer        /    \
                 author    \
                           pattern

The set of nodes on the top-level backbone in the resulting
expression becomes "committer", "author", and the nodes that are on
the top-level backbone of the "pattern" subexpression.  This makes
the "all-match" logic inspect the same nodes in "pattern" as the
case without the author and/or the committer restriction, and makes
the earlier "log" example to show "authored by C and has A and has
B", which is what the command line expects.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-14 10:12:56 -07:00
Junio C Hamano 17bf35a3c7 grep: teach --debug option to dump the parse tree
Our "grep" allows complex boolean expressions to be formed to match
each individual line with operators like --and, '(', ')' and --not.
Introduce the "--debug" option to show the parse tree to help people
who want to debug and enhance it.

Also "log" learns "--grep-debug" option to do the same.  The command
line parser to the log family is a lot more limited than the general
"git grep" parser, but it has special handling for header matching
(e.g. "--author"), and a parse tree is valuable when working on it.

Note that "--all-match" is *not* any individual node in the parse
tree.  It is an instruction to the evaluator to check all the nodes
in the top-level backbone have matched and reject a document as
non-matching otherwise.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-14 10:10:35 -07:00
Junio C Hamano 2147cb2762 Merge branch 'rs/maint-grep-F' into maint
"git grep -e '$pattern'", unlike the case where the patterns are read from
a file, did not treat individual lines in the given pattern argument as
separate regular expressions as it should.

By René Scharfe
* rs/maint-grep-F:
  grep: stop leaking line strings with -f
  grep: support newline separated pattern list
  grep: factor out do_append_grep_pat()
  grep: factor out create_grep_pat()
2012-06-01 13:01:41 -07:00
René Scharfe 526a858a99 grep: support newline separated pattern list
Currently, patterns that contain newline characters don't match anything
when given to git grep.  Regular grep(1) interprets patterns as lists of
newline separated search strings instead.

Implement this functionality by creating and inserting extra grep_pat
structures for patterns consisting of multiple lines when appending to
the pattern lists.  For simplicity, all pattern strings are duplicated.
The original pattern is truncated in place to make it contain only the
first line.

Requested-by: Torne (Richard Coles) <torne@google.com>
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-20 15:25:46 -07:00
René Scharfe 2b3873ff34 grep: factor out do_append_grep_pat()
Add do_append_grep_pat() as a shared function for adding patterns to
the header pattern list and the general pattern list.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-20 15:12:25 -07:00
René Scharfe fc45675110 grep: factor out create_grep_pat()
Add create_grep_pat(), a shared helper for all grep pattern allocation
and initialization needs.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-20 15:12:22 -07:00
Junio C Hamano f5f37461a9 Merge branch 'ah/maint-grep-double-init' into maint
By Angus Hammond
* ah/maint-grep-double-init:
  grep.c: remove redundant line of code
2012-05-11 11:16:09 -07:00
Angus Hammond 2385f24625 grep.c: remove redundant line of code
Signed-off-by: Angus Hammond <angusgh@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-07 11:25:04 -07:00
Junio C Hamano 1e4d0875ac Merge branch 'jc/pickaxe-ignore-case'
By Junio C Hamano (2) and Ramsay Jones (1)
* jc/pickaxe-ignore-case:
  ctype.c: Fix a sparse warning
  pickaxe: allow -i to search in patch case-insensitively
  grep: use static trans-case table
2012-03-07 12:12:59 -08:00
Junio C Hamano 0f871cf56e grep: use static trans-case table
In order to prepare the kwset machinery for a case-insensitive search, we
used to use a static table of 256 elements and filled it every time before
calling kwsalloc().  Because the kwset machinery will never modify this
table, just allocate a single instance globally and fill it at the compile
time.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-28 14:29:37 -08:00
Junio C Hamano 4d06691eec Sync with 1.7.8.5 2012-02-26 16:42:35 -08:00
Michał Kiedrowicz fba4f1259d grep -P: Fix matching ^ and $
When "git grep" is run with -P/--perl-regexp, it doesn't match ^ and $ at
the beginning/end of the line.  This is because PCRE normally matches ^
and $ at the beginning/end of the whole text, not for each line, and "git
grep" passes a large chunk of text (possibly containing many lines) to
pcre_exec() and then splits the text into lines.

This makes "git grep -P" behave differently from "git grep -E" and also
from "grep -P" and "pcregrep":

	$ cat file
	a
	 b
	$ git grep --no-index -P '^ ' file
	$ git grep --no-index -E '^ ' file
	file: b
	$ grep -c -P '^ ' file
	 b
	$ pcregrep -c '^ ' file
	 b

Reported-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-26 16:34:03 -08:00
Jeff King 08265798e1 grep: load file data after checking binary-ness
Usually we load each file to grep into memory, check whether
it's binary, and then either grep it (the default) or not
(if "-I" was given).

In the "-I" case, we can skip loading the file entirely if
it is marked as binary via gitattributes. On my giant
3-gigabyte media repository, doing "git grep -I foo" went
from:

  real    0m0.712s
  user    0m0.044s
  sys     0m4.780s

to:

  real    0m0.026s
  user    0m0.016s
  sys     0m0.020s

Obviously this is an extreme example. The repo is almost
entirely binary files, and you can see that we spent all of
our time asking the kernel to read() the data. However, with
a cold disk cache, even avoiding a few binary files can have
an impact.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 10:36:08 -08:00
Jeff King 41b59bfcb1 grep: respect diff attributes for binary-ness
There is currently no way for users to tell git-grep that a
particular path is or is not a binary file; instead, grep
always relies on its auto-detection (or the user specifying
"-a" to treat all binary-looking files like text).

This patch teaches git-grep to use the same attribute lookup
that is used by git-diff. We could add a new "grep" flag,
but that is unnecessarily complex and unlikely to be useful.
Despite the name, the "-diff" attribute (or "diff=foo" and
the associated diff.foo.binary config option) are really
about describing the contents of the path. It's simply
historical that diff was the only thing that cared about
these attributes in the past.

And if this simple approach turns out to be insufficient, we
still have a backwards-compatible path forward: we can add a
separate "grep" attribute, and fall back to respecting
"diff" if it is unset.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 10:36:08 -08:00
Jeff King 94ad9d9e07 grep: cache userdiff_driver in grep_source
Right now, grep only uses the userdiff_driver for one thing:
looking up funcname patterns for "-p" and "-W".  As new uses
for userdiff drivers are added to the grep code, we want to
minimize attribute lookups, which can be expensive.

It might seem at first that this would also optimize multiple
lookups when the funcname pattern for a file is needed
multiple times. However, the compiled funcname pattern is
already cached in struct grep_opt's "priv" member, so
multiple lookups are already suppressed.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 10:36:08 -08:00
Jeff King c876d6da88 grep: drop grep_buffer's "name" parameter
Before the grep_source interface existed, grep_buffer was
used by two types of callers:

  1. Ones which pulled a file into a buffer, and then wanted
     to supply the file's name for the output (i.e.,
     git grep).

  2. Ones which really just wanted to grep a buffer (i.e.,
     git log --grep).

Callers in set (1) should now be using grep_source. Callers
in set (2) always pass NULL for the "name" parameter of
grep_buffer. We can therefore get rid of this now-useless
parameter.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 10:36:08 -08:00
Jeff King e1327023ea grep: refactor the concept of "grep source" into an object
The main interface to the low-level grep code is
grep_buffer, which takes a pointer to a buffer and a size.
This is convenient and flexible (we use it to grep commit
bodies, files on disk, and blobs by sha1), but it makes it
hard to pass extra information about what we are grepping
(either for correctness, like overriding binary
auto-detection, or for optimizations, like lazily loading
blob contents).

Instead, let's encapsulate the idea of a "grep source",
including the buffer, its size, and where the data is coming
from. This is similar to the diff_filespec structure used by
the diff code (unsurprising, since future patches will
implement some of the same optimizations found there).

The diffstat is slightly scarier than the actual patch
content. Most of the modified lines are simply replacing
access to raw variables with their counterparts that are now
in a "struct grep_source". Most of the added lines were
taken from builtin/grep.c, which partially abstracted the
idea of grep sources (for file vs sha1 sources).

Instead of dropping the now-redundant code, this patch
leaves builtin/grep.c using the traditional grep_buffer
interface (which now wraps the grep_source interface). That
makes it easy to test that there is no change of behavior
(yet).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 10:36:07 -08:00
Jeff King b3aeb285d0 grep: move sha1-reading mutex into low-level code
The multi-threaded git-grep code needs to serialize access
to the thread-unsafe read_sha1_file call. It does this with
a mutex that is local to builtin/grep.c.

Let's instead push this down into grep.c, where it can be
used by both builtin/grep.c and grep.c. This will let us
safely teach the low-level grep.c code tricks that involve
reading from the object db.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 10:36:07 -08:00
Jeff King 78db6ea9dc grep: make locking flag global
The low-level grep code traditionally didn't care about
threading, as it doesn't do any threading itself and didn't
call out to other non-thread-safe code.  That changed with
0579f91 (grep: enable threading with -p and -W using lazy
attribute lookup, 2011-12-12), which pushed the lookup of
funcname attributes (which is not thread-safe) into the
low-level grep code.

As a result, the low-level code learned about a new global
"grep_attr_mutex" to serialize access to the attribute code.
A multi-threaded caller (e.g., builtin/grep.c) is expected
to initialize the mutex and set "use_threads" in the
grep_opt structure. The low-level code only uses the lock if
use_threads is set.

However, putting the use_threads flag into the grep_opt
struct is not the most logical place. Whether threading is
in use is not something that matters for each call to
grep_buffer, but is instead global to the whole program
(i.e., if any thread is doing multi-threaded grep, every
other thread, even if it thinks it is doing its own
single-threaded grep, would need to use the locking).  In
practice, this distinction isn't a problem for us, because
the only user of multi-threaded grep is "git-grep", which
does nothing except call grep.

This patch turns the opt->use_threads flag into a global
flag. More important than the nit-picking semantic argument
above is that this means that the locking functions don't
need to actually have access to a grep_opt to know whether
to lock. Which in turn can make adding new locks simpler, as
we don't need to pass around a grep_opt.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 10:36:07 -08:00
Thomas Rast 0579f91dd7 grep: enable threading with -p and -W using lazy attribute lookup
Lazily load the userdiff attributes in match_funcname().  Use a
separate mutex around this loading to protect the (not thread-safe)
attributes machinery.  This lets us re-enable threading with -p and
-W while reducing the overhead caused by looking up attributes.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-16 15:47:10 -08:00
Thomas Rast b8ffedca6f grep: load funcname patterns for -W
git-grep avoids loading the funcname patterns unless they are needed.
ba8ea74 (grep: add option to show whole function as context,
2011-08-01) forgot to extend this test also to the new funcbody
feature.  Do so.

The catch is that we also have to disable threading when using
userdiff, as explained in grep_threads_ok().  So we must be careful to
introduce the same test there.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-12 15:45:42 -08:00
Junio C Hamano 8a8895baaf Merge branch 'fk/use-kwset-pickaxe-grep-f'
* fk/use-kwset-pickaxe-grep-f:
  obstack: Fix portability issues
  Use kwset in grep
  Use kwset in pickaxe
  Adapt the kwset code to Git
  Add string search routines from GNU grep
  Add obstack.[ch] from EGLIBC 2.10
2011-09-02 10:00:38 -07:00
Junio C Hamano f946b465d7 Merge branch 'jk/color-and-pager'
* jk/color-and-pager:
  want_color: automatically fallback to color.ui
  diff: don't load color config in plumbing
  config: refactor get_colorbool function
  color: delay auto-color decision until point of use
  git_config_colorbool: refactor stdout_is_tty handling
  diff: refactor COLOR_DIFF from a flag into an int
  setup_pager: set GIT_PAGER_IN_USE
  t7006: use test_config helpers
  test-lib: add helper functions for config
  t7006: modernize calls to unset

Conflicts:
	builtin/commit.c
	parse-options.c
2011-08-28 21:19:16 -07:00
Fredrik Kuivinen 9eceddeec6 Use kwset in grep
Benchmarks for the hot cache case:

before:
$ perf stat --repeat=5 git grep qwerty > /dev/null

Performance counter stats for 'git grep qwerty' (5 runs):

        3,478,085 cache-misses             #      2.322 M/sec   ( +-   2.690% )
       11,356,177 cache-references         #      7.582 M/sec   ( +-   2.598% )
        3,872,184 branch-misses            #      0.363 %       ( +-   0.258% )
    1,067,367,848 branches                 #    712.673 M/sec   ( +-   2.622% )
    3,828,370,782 instructions             #      0.947 IPC     ( +-   0.033% )
    4,043,832,831 cycles                   #   2700.037 M/sec   ( +-   0.167% )
            8,518 page-faults              #      0.006 M/sec   ( +-   3.648% )
              847 CPU-migrations           #      0.001 M/sec   ( +-   3.262% )
            6,546 context-switches         #      0.004 M/sec   ( +-   2.292% )
      1497.695495 task-clock-msecs         #      3.303 CPUs    ( +-   2.550% )

       0.453394396  seconds time elapsed   ( +-   0.912% )

after:
$ perf stat --repeat=5 git grep qwerty > /dev/null

Performance counter stats for 'git grep qwerty' (5 runs):

        2,989,918 cache-misses             #      3.166 M/sec   ( +-   5.013% )
       10,986,041 cache-references         #     11.633 M/sec   ( +-   4.899% )  (scaled from 95.06%)
        3,511,993 branch-misses            #      1.422 %       ( +-   0.785% )
      246,893,561 branches                 #    261.433 M/sec   ( +-   3.967% )
    1,392,727,757 instructions             #      0.564 IPC     ( +-   0.040% )
    2,468,142,397 cycles                   #   2613.494 M/sec   ( +-   0.110% )
            7,747 page-faults              #      0.008 M/sec   ( +-   3.995% )
              897 CPU-migrations           #      0.001 M/sec   ( +-   2.383% )
            6,535 context-switches         #      0.007 M/sec   ( +-   1.993% )
       944.384228 task-clock-msecs         #      3.177 CPUs    ( +-   0.268% )

       0.297257643  seconds time elapsed   ( +-   0.450% )

So we gain about 35% by using the kwset code.

As a side effect of using kwset two grep tests are fixed by this
patch. The first is fixed because kwset can deal with case-insensitive
search containing NULs, something strcasestr cannot do. The second one
is fixed because we consider patterns containing NULs as fixed strings
(regcomp cannot accept patterns with NULs).

Signed-off-by: Fredrik Kuivinen <frekui@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-20 22:33:58 -07:00
Jeff King daa0c3d971 color: delay auto-color decision until point of use
When we read a color value either from a config file or from
the command line, we use git_config_colorbool to convert it
from the tristate always/never/auto into a single yes/no
boolean value.

This has some timing implications with respect to starting
a pager.

If we start (or decide not to start) the pager before
checking the colorbool, everything is fine. Either isatty(1)
will give us the right information, or we will properly
check for pager_in_use().

However, if we decide to start a pager after we have checked
the colorbool, things are not so simple. If stdout is a tty,
then we will have already decided to use color. However, the
user may also have configured color.pager not to use color
with the pager. In this case, we need to actually turn off
color. Unfortunately, the pager code has no idea which color
variables were turned on (and there are many of them
throughout the code, and they may even have been manipulated
after the colorbool selection by something like "--color" on
the command line).

This bug can be seen any time a pager is started after
config and command line options are checked. This has
affected "git diff" since 89d07f7 (diff: don't run pager if
user asked for a diff style exit code, 2007-08-12). It has
also affect the log family since 1fda91b (Fix 'git log'
early pager startup error case, 2010-08-24).

This patch splits the notion of parsing a colorbool and
actually checking the configuration. The "use_color"
variables now have an additional possible value,
GIT_COLOR_AUTO. Users of the variable should use the new
"want_color()" wrapper, which will lazily determine and
cache the auto-color decision.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-19 15:51:34 -07:00
René Scharfe ba8ea7496f grep: add option to show whole function as context
Add a new option, -W, to show the whole surrounding function of a match.

It uses the same regular expressions as -p and diff to find the beginning
of sections.

Currently it will not display comments in front of a function, but those
that are following one.  Despite this shortcoming it is already useful,
e.g. to simply see a more complete applicable context or to extract whole
functions.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-01 16:09:15 -07:00
René Scharfe 1d84f72ef1 grep: add --heading
With --heading, the filename is printed once before matches from that
file instead of at the start of each line, giving more screen space to
the actual search results.

This option is taken from ack (http://betterthangrep.com/).  And now
git grep can dress up like it:

	$ git config alias.ack "grep --break --heading --line-number"

	$ git ack -e --heading
	Documentation/git-grep.txt
	154:--heading::

	t/t7810-grep.sh
	785:test_expect_success 'grep --heading' '
	786:    git grep --heading -e char -e lo_w hello.c hello_world >actual &&
	808:    git grep --break --heading -n --color \

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-05 18:15:27 -07:00
René Scharfe a8f0e7649e grep: add --break
With --break, an empty line is printed between matches from different
files, increasing readability.  This option is taken from ack
(http://betterthangrep.com/).

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-05 18:15:26 -07:00
René Scharfe 08303c3636 grep: fix coloring of hunk marks between files
Commit 431d6e7b (grep: enable threading for context line printing)
split the printing of the "--\n" mark between results from different
files out into two places: show_line() in grep.c for the non-threaded
case and work_done() in builtin/grep.c for the threaded case.  Commit
55f638bd (grep: Colorize filename, line number, and separator) updated
the former, but not the latter, so the separators between files are
not colored if threads are used.

This patch merges the two.  In the threaded case, hunk marks are now
printed by show_line() for every file, including the first one, and the
very first mark is simply skipped in work_done().  This ensures that the
output is properly colored and works just as well.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-05 18:10:50 -07:00
Michał Kiedrowicz 63e7e9d8b6 git-grep: Learn PCRE
This patch teaches git-grep the --perl-regexp/-P options (naming
borrowed from GNU grep) in order to allow specifying PCRE regexes on the
command line.

PCRE has a number of features which make them more handy to use than
POSIX regexes, like consistent escaping rules, extended character
classes, ungreedy matching etc.

git isn't build with PCRE support automatically. USE_LIBPCRE environment
variable must be enabled (like `make USE_LIBPCRE=YesPlease`).

Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09 16:29:33 -07:00
Michał Kiedrowicz a30c148aa7 grep: Extract compile_regexp_failed() from compile_regexp()
This simplifies compile_regexp() a little and allows re-using error
handling code.

Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09 16:28:53 -07:00
Michał Kiedrowicz 8997da3820 grep: Fix a typo in a comment
Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09 16:28:16 -07:00
Michał Kiedrowicz 97e7778422 grep: Put calls to fixmatch() and regmatch() into patmatch()
Both match_one_pattern() and look_ahead() use fixmatch() and regmatch()
in the same way. They really want to match a pattern againt a string,
but now they need to know if the pattern is fixed or regexp.

This change cleans this up by introducing patmatch() (from "pattern
match") and also simplifies inserting other ways of matching a string.

Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-05 08:38:12 -07:00
Junio C Hamano 5aaeb733f5 log --author: take union of multiple "author" requests
In the olden days,

    log --author=me --committer=him --grep=this --grep=that

used to be turned into:

    (OR (HEADER-AUTHOR me)
        (HEADER-COMMITTER him)
        (PATTERN this)
        (PATTERN that))

showing my patches that do not have any "this" nor "that", which was
totally useless.

80235ba ("log --author=me --grep=it" should find intersection, not union,
2010-01-17) improved it greatly to turn the same into:

    (ALL-MATCH
      (HEADER-AUTHOR me)
      (HEADER-COMMITTER him)
      (OR (PATTERN this) (PATTERN that)))

That is, "show only patches by me and committed by him, that have either
this or that", which is a lot more natural thing to ask.

We however need to be a bit more clever when the user asks more than one
"author" (or "committer"); because a commit has only one author (and one
committer), they ought to be interpreted as asking for union to be useful.
The current implementation simply added another author/committer pattern
at the same top-level for ALL-MATCH to insist on matching all, finding
nothing.

Turn

    log --author=me --author=her \
    	--committer=him --committer=you \
	--grep=this --grep=that

into

    (ALL-MATCH
      (OR (HEADER-AUTHOR me) (HEADER-AUTHOR her))
      (OR (HEADER-COMMITTER him) (HEADER-COMMITTER you))
      (OR (PATTERN this) (PATTERN that)))

instead.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-09-13 01:11:55 -07:00
Junio C Hamano 95ce9ce296 grep: move logic to compile header pattern into a separate helper
The callers should be queuing only GREP_PATTERN_HEAD elements to the
header_list queue; simplify the switch and guard it with an assert.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-09-12 19:56:21 -07:00
René Scharfe ed40a0951c grep: support NUL chars in search strings for -F
Search patterns in a file specified with -f can contain NUL characters.
The current code ignores all characters on a line after a NUL.

Pass the actual length of the line all the way from the pattern file to
fixmatch() and use it for case-sensitive fixed string matching.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-24 11:22:07 -07:00
René Scharfe f96e56733a grep: use REG_STARTEND for all matching if available
Refactor REG_STARTEND handling inlook_ahead() into a new helper,
regmatch(), and use it for line matching, too.  This allows regex
matching beyond NUL characters if regexec() supports the flag.  NUL
characters themselves are not matched in any way, though.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-24 11:22:07 -07:00
René Scharfe 52d799a79f grep: continue case insensitive fixed string search after NUL chars
Functions for C strings, like strcasestr(), can't see beyond NUL
characters.  Check if there is such an obstacle on the line and try
again behind it.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-24 11:22:07 -07:00
René Scharfe 1baddf4b37 grep: use memmem() for fixed string search
Allow searching beyond NUL characters by using memmem() instead of
strstr().

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-24 11:22:06 -07:00
René Scharfe 321ffcc055 grep: --name-only over binary
As with the option -c/--count, git grep with the option -l/--name-only
should work the same with binary files as with text files because
there is no danger of messing up the terminal with control characters
from the contents of matching files.  GNU grep does the same.

Move the check for ->name_only before the one for binary_match_only,
thus making the latter irrelevant for git grep -l.

Reported-by: Dmitry Potapov <dpotapov@gmail.com>
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-24 11:22:06 -07:00
René Scharfe c30c10cff1 grep: --count over binary
The intent of showing the message "Binary file xyz matches" for
binary files is to avoid annoying users by potentially messing up
their terminals by printing control characters.  In --count mode,
this precaution isn't necessary.

Display counts of matches if -c/--count was specified, even if -a
was not given.  GNU grep does the same.

Moving the check for ->count before the code for handling binary
file also avoids printing context lines if --count and -[ABC] were
used together, so we can remove the part of the comment that
mentions this behaviour.  Again, GNU grep does the same.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-24 11:22:06 -07:00
René Scharfe 64fcec78b5 grep: grep: refactor handling of binary mode options
Turn the switch inside-out and add labels for each possible value
of ->binary.  This makes the code easier to read and avoids calling
buffer_is_binary() if the option -a was given.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-24 11:22:06 -07:00
Junio C Hamano 07b838f087 Merge branch 'rs/threaded-grep-context'
* rs/threaded-grep-context:
  grep: enable threading for context line printing

Conflicts:
	grep.c
2010-04-03 12:28:39 -07:00
Junio C Hamano f1aa782a3b Merge branch 'ml/color-grep'
* ml/color-grep:
  grep: Colorize selected, context, and function lines
  grep: Colorize filename, line number, and separator
  Add GIT_COLOR_BOLD_* and GIT_COLOR_BG_*
2010-03-20 11:29:36 -07:00
René Scharfe 431d6e7bc8 grep: enable threading for context line printing
If context lines are to be printed, grep separates them with hunk marks
("--\n").  These marks are printed between matches from different files,
too.  They are not printed before the first file, though.

Threading was disabled when context line printing was enabled because
avoiding to print the mark before the first line was an unsolved
synchronisation problem.  This patch separates the code for printing
hunk marks for the threaded and the unthreaded case, allowing threading
to be turned on together with the common -ABC options.

->show_hunk_mark, which controls printing of hunk marks between files in
show_line(), is now set in grep_buffer_1(), but only if some results
have already been printed and threading is disabled.  The threaded case
is handled in work_done().

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-03-15 15:26:35 -07:00
Mark Lodato 00588bb5cd grep: Colorize selected, context, and function lines
Colorize non-matching text of selected lines, context lines, and
function name lines.  The default for all three is no color, but they
can be configured using color.grep.<slot>.  The first two are similar
to the corresponding options in GNU grep, except that GNU grep applies
the color to the entire line, not just non-matching text.

Signed-off-by: Mark Lodato <lodatom@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-03-08 00:30:59 -08:00
Mark Lodato 55f638bdc6 grep: Colorize filename, line number, and separator
Colorize the filename, line number, and separator in git grep output, as
GNU grep does.  The colors are customizable through color.grep.<slot>.
The default is to only color the separator (in cyan), since this gives
the biggest legibility increase without overwhelming the user with
colors.  GNU grep also defaults cyan for the separator, but defaults to
magenta for the filename and to green for the line number, as well.

There is one difference from GNU grep: When a binary file matches
without -a, GNU grep does not color the <file> in "Binary file <file>
matches", but we do.

Like GNU grep, if --null is given, the null separators are not colored.

For config.txt, use a a sub-list to describe the slots, rather than
a single paragraph with parentheses, since this is much more readable.

Remove the cast to int for `rm_eo - rm_so` since it is not necessary.

Signed-off-by: Mark Lodato <lodatom@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-03-08 00:30:44 -08:00
Junio C Hamano 6b45b8c088 Merge branch 'jc/grep-author-all-match-implicit'
* jc/grep-author-all-match-implicit:
  "log --author=me --grep=it" should find intersection, not union
2010-03-02 12:44:06 -08:00
René Scharfe 79286102ce grep: simplify assignment of ->fixed
After 885d211e, the value of the ->fixed pattern option only depends on
the grep option of the same name.  Regex flags don't matter any more,
because fixed mode and regex mode are strictly separated.  Thus we can
simply copy the value from struct grep_opt to struct grep_pat, as we do
already for ->word_regexp and ->ignore_case.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-03 12:03:40 -08:00
Junio C Hamano b62cb17a65 Merge branch 'fk/threaded-grep'
* fk/threaded-grep:
  Threaded grep
  grep: expose "status-only" feature via -q
2010-01-28 00:46:45 -08:00
Benjamin Kramer 24072c0256 grep: use REG_STARTEND (if available) to speed up regexec
BSD and glibc have an extension to regexec which takes a buffer + length pair
instead of a NUL-terminated string. Since we already have the length computed
this can save us a strlen call inside regexec.

Signed-off-by: Benjamin Kramer <benny.kra@googlemail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-26 10:44:10 -08:00
Fredrik Kuivinen 5b594f457a Threaded grep
Make git grep use threads when it is available.

The results below are best of five runs in the Linux repository (on a
box with two cores).

With the patch:

git grep qwerty
1.58user 0.55system 0:01.16elapsed 183%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+800outputs (0major+5774minor)pagefaults 0swaps

Without:

git grep qwerty
1.59user 0.43system 0:02.02elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+800outputs (0major+3716minor)pagefaults 0swaps

And with a pattern with quite a few matches:

With the patch:

$ /usr/bin/time git grep void
5.61user 0.56system 0:03.44elapsed 179%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+800outputs (0major+5587minor)pagefaults 0swaps

Without:

$ /usr/bin/time git grep void
5.36user 0.51system 0:05.87elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+800outputs (0major+3693minor)pagefaults 0swaps

In either case we gain about 40% by the threading.

Signed-off-by: Fredrik Kuivinen <frekui@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-26 09:20:07 -08:00
Junio C Hamano 80235ba79e "log --author=me --grep=it" should find intersection, not union
Historically, any grep filter in "git log" family of commands were taken
as restricting to commits with any of the words in the commit log message.
However, the user almost always want to find commits "done by this person
on that topic".  With "--all-match" option, a series of grep patterns can
be turned into a requirement that all of them must produce a match, but
that makes it impossible to ask for "done by me, on either this or that"
with:

	log --author=me --committer=him --grep=this --grep=that

because it will require both "this" and "that" to appear.

Change the "header" parser of grep library to treat the headers specially,
and parse it as:

	(all-match-OR (HEADER-AUTHOR me)
		      (HEADER-COMMITTER him)
		      (OR
		      	(PATTERN this)
			(PATTERN that) ) )

Even though the "log" command line parser doesn't give direct access to
the extended grep syntax to group terms with parentheses, this change will
cover the majority of the case the users would want.

This incidentally revealed that one test in t7002 was bogus.  It ran:

	log --author=Thor --grep=Thu --format='%s'

and expected (wrongly) "Thu" to match "Thursday" in the author/committer
date, but that would never match, as the timestamp in raw commit buffer
does not have the name of the day-of-the-week.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-25 19:28:13 -08:00
Junio C Hamano 885d211e71 grep: rip out pessimization to use fixmatch()
Even when running without the -F (--fixed-strings) option, we checked the
pattern and used fixmatch() codepath when it does not contain any regex
magic.  Finding fixed strings with strstr() surely must be faster than
running the regular expression crud.

Not so.  It turns out that on some libc implementations, using the
regcomp()/regexec() pair is a lot faster than running strstr() and
strcasestr() the fixmatch() codepath uses.  Drop the optimization and use
the fixmatch() codepath only when the user explicitly asked for it with
the -F option.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-13 01:05:04 -08:00
Junio C Hamano e2d2e383d8 Merge branch 'jc/maint-1.6.4-grep-lookahead' into jc/maint-grep-lookahead
* jc/maint-1.6.4-grep-lookahead:
  grep: optimize built-in grep by skipping lines that do not hit

This needs to be an evil merge as fixmatch() changed signature since
5183bf6 (grep: Allow case insensitive search of fixed-strings,
2009-11-06).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-12 00:58:13 -08:00
Junio C Hamano a26345b608 grep: optimize built-in grep by skipping lines that do not hit
The internal "grep" engine we use checks for hits line-by-line, instead of
letting the underlying regexec()/fixmatch() routines scan for the first
match from the rest of the buffer.  This was a major source of overhead
compared to the external grep.

Introduce a "look-ahead" mechanism to find the next line that would
potentially match by using regexec()/fixmatch() in the remainder of the
text to skip unmatching lines, and use it when the query criteria is
simple enough (i.e. punt for an advanced grep boolean expression like
"lines that have both X and Y but not Z" for now) and we are not running
under "-v" (aka "--invert-match") option.

Note that "-L" (aka "--files-without-match") is not a reason to disable
this optimization.  Under the option, we are interested if the file has
any hit at all, and that is what we determine reliably with or without the
optimization.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-12 00:47:50 -08:00
Brian Collins 5183bf6727 grep: Allow case insensitive search of fixed-strings
"git grep" currently an error when you combine the -F and -i flags.
This isn't in line with how GNU grep handles it.

This patch allows the simultaneous use of those flags.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Brian Collins <bricollins@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-11-16 16:06:46 -08:00
René Scharfe ed24e401e0 grep: simplify -p output
It was found a bit too loud to show == separators between the function
headers.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-02 21:36:42 -07:00
René Scharfe 60ecac98ed grep -p: support user defined regular expressions
Respect the userdiff attributes and config settings when looking for
lines with function definitions in git grep -p.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-01 19:16:50 -07:00
René Scharfe 2944e4e614 grep: add option -p/--show-function
The new option -p instructs git grep to print the previous function
definition as a context line, similar to diff -p.  Such context lines
are marked with an equal sign instead of a dash.  This option
complements the existing context options -A, -B, -C.

Function definitions are detected using the same heuristic that diff
uses.  User defined regular expressions are not supported, yet.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-01 19:16:49 -07:00
René Scharfe 49de321698 grep: handle pre context lines on demand
Factor out pre context line handling into the new function
show_pre_context() and change the algorithm to rewind by looking for
newline characters and roll forward again, instead of maintaining an
array of line beginnings and ends.

This is slower for hits, but the cost for non-matching lines becomes
zero.  Normally, there are far more non-matching lines, so the time
spent in total decreases.

Before this patch (current Linux kernel repo, best of five runs):

	$ time git grep --no-ext-grep -B1 memset >/dev/null

	real	0m2.134s
	user	0m1.932s
	sys	0m0.196s

	$ time git grep --no-ext-grep -B1000 memset >/dev/null

	real	0m12.059s
	user	0m11.837s
	sys	0m0.224s

The same with this patch:

	$ time git grep --no-ext-grep -B1 memset >/dev/null

	real	0m2.117s
	user	0m1.892s
	sys	0m0.228s

	$ time git grep --no-ext-grep -B1000 memset >/dev/null

	real	0m2.986s
	user	0m2.696s
	sys	0m0.288s

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-01 19:16:48 -07:00
René Scharfe 046802d015 grep: print context hunk marks between files
Print a hunk mark before matches from a new file are shown, in addition
to the current behaviour of printing them if lines have been skipped.

The result is easier to read, as (presumably unrelated) matches from
different files are separated by a hunk mark.  GNU grep does the same.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-01 19:16:46 -07:00
René Scharfe 5dd06d3879 grep: move context hunk mark handling into show_line()
Move last_shown into struct grep_opt, to make it available in
show_line(), and then make the function handle the printing of hunk
marks for context lines in a central place.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-01 19:16:45 -07:00
René Scharfe 84201eae77 grep: fix empty word-regexp matches
The command "git grep -w ''" dies as soon as it encounters an empty line,
reporting (wrongly) that "regexp returned nonsense".  The first hunk of
this patch relaxes the sanity check that is responsible for that,
allowing matches to start at the end.

The second hunk complements it by making sure that empty matches are
rejected if -w was specified, as they are not really words.

GNU grep does the same:

	$ echo foo | grep -c ''
	1
	$ echo foo | grep -c -w ''
	0

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-06-03 11:32:29 -07:00
René Scharfe 1f5b9cc40e grep: fix colouring of matches with zero length
If a zero-length match is encountered, break out of loop and show the rest
of the line uncoloured.  Otherwise we'd be looping forever, trying to make
progress by advancing the pointer by zero characters.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-06-01 22:30:39 -07:00
René Scharfe dbb6a4ada6 grep: fix word-regexp at the beginning of lines
After bol is forwarded, it doesn't represent the beginning of the line
any more.  This means that the beginning-of-line marker (^) mustn't match,
i.e. the regex flag REG_NOTBOL needs to be set.

This bug was introduced by fb62eb7fab
("grep -w: forward to next possible position after rejected match").

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-23 16:29:05 -07:00
René Scharfe e701fadb9e grep: fix word-regexp colouring
As noticed by Dmitry Gryazin: When a pattern is found but it doesn't
start and end at word boundaries, bol is forwarded to after the match and
the pattern is searched again.  When a pattern is finally found between
word boundaries, the match offsets are off by the number of characters
that have been skipped.

This patch corrects the offsets to be relative to the value of bol as
passed to match_one_pattern() by its caller.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-20 18:49:20 -07:00
Junio C Hamano b79376cdf3 Merge branch 'maint'
* maint:
  grep: fix segfault when "git grep '('" is given
  Documentation: fix a grammatical error in api-builtin.txt
  builtin-merge: fix a typo in an error message
2009-04-28 00:46:39 -07:00
Junio C Hamano 2254da06a5 Merge branch 'maint-1.6.1' into maint
* maint-1.6.1:
  grep: fix segfault when "git grep '('" is given
  Documentation: fix a grammatical error in api-builtin.txt
  builtin-merge: fix a typo in an error message
2009-04-28 00:46:25 -07:00
Junio C Hamano 3e73cb2f48 Merge branch 'maint-1.6.0' into maint-1.6.1
* maint-1.6.0:
  grep: fix segfault when "git grep '('" is given
  Documentation: fix a grammatical error in api-builtin.txt
  builtin-merge: fix a typo in an error message
2009-04-28 00:46:20 -07:00
Linus Torvalds c922b01f54 grep: fix segfault when "git grep '('" is given
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-27 17:28:18 -07:00
Michele Ballabio ba150a3fdc git log: avoid segfault with --all-match
Avoid a segfault when the command

	git log --all-match

was issued, by ignoring the option.

Signed-off-by: Michele Ballabio <barra_cuda@katamail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-18 19:10:40 -07:00
Junio C Hamano 747a322bcc grep: cast printf %.*s "precision" argument explicitly to int
On some systems, regoff_t that is the type of rm_so/rm_eo members are
wider than int; %.*s precision specifier expects an int, so use an explicit
cast.

A breakage reported on Darwin by Brian Gernhardt should be fixed with
this patch.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-08 18:22:44 -07:00
René Scharfe 7e8f59d577 grep: color patterns in output
Coloring matches makes them easier to spot in the output.

Add two options and two parameters: color.grep (to turn coloring on
or off), color.grep.match (to set the color of matches), --color
and --no-color (to turn coloring on or off, respectively).

The output of external greps is not changed.

This patch is based on earlier ones by Nguyễn Thái Ngọc Duy and
Thiago Alves.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-07 11:34:59 -08:00
René Scharfe 79212772ce grep: add pmatch and eflags arguments to match_one_pattern()
Push pmatch and eflags to the callers of match_one_pattern(), which
allows them to specify regex execution flags and to get the location
of a match.

Since we only use the first element of the matches array and aren't
interested in submatches, no provision is made for callers to
provide a larger array.

eflags are ignored for fixed patterns, but that's OK, since they
only have a meaning in connection with regular expressions
containing ^ or $.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-07 11:34:57 -08:00
René Scharfe d7eb527d73 grep: remove grep_opt argument from match_expr_eval()
The only use of the struct grep_opt argument of match_expr_eval()
is to pass the option word_regexp to match_one_pattern().  By adding
a pattern flag for it we can reduce the number of function arguments
of these two functions, as a cleanup and preparation for adding more
in the next patch.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-07 11:34:56 -08:00
René Scharfe 252d560d21 grep: micro-optimize hit collection for AND nodes
In addition to returning if an expression matches a line,
match_expr_eval() updates the expression's hit flag if the parameter
collect_hits is set.  It never sets collect_hits for children of AND
nodes, though, so their hit flag will never be updated.  Because of
that we can return early if the first child didn't match, no matter
if collect_hits is set or not.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-07 11:34:53 -08:00
René Scharfe f9b7cce61c Add is_regex_special()
Add is_regex_special(), a character class macro for chars that have a
special meaning in regular expressions.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-17 18:30:41 -08:00
René Scharfe 8cc3299262 Change NUL char handling of isspecial()
Replace isspecial() by the new macro is_glob_special(), which is more,
well, specialized.  The former included the NUL char in its character
class, while the letter only included characters that are special to
file name globbing.

The new name contains underscores because they enhance readability
considerably now that it's made up of three words.  Renaming the
function is necessary to document its changed scope.

The call sites of isspecial() are updated to check explicitly for NUL.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-17 18:30:37 -08:00
René Scharfe c822255cfc grep: don't call regexec() for fixed strings
Add the new flag "fixed" to struct grep_pat and set it if the pattern
is doesn't contain any regex control characters in addition to if the
flag -F/--fixed-strings was specified.

This gives a nice speed up on msysgit, where regexec() seems to be
extra slow.  Before (best of five runs):

	$ time git grep grep v1.6.1 >/dev/null

	real    0m0.552s
	user    0m0.000s
	sys     0m0.000s

	$ time git grep -F grep v1.6.1 >/dev/null

	real    0m0.170s
	user    0m0.000s
	sys     0m0.015s

With the patch:

	$ time git grep grep v1.6.1 >/dev/null

	real    0m0.173s
	user    0m0.000s
	sys     0m0.000s

The difference is much smaller on Linux, but still measurable.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-09 21:35:56 -08:00
René Scharfe fb62eb7fab grep -w: forward to next possible position after rejected match
grep -w accepts matches between non-word characters, only.  If a match
from regexec() doesn't meet this criteria, grep continues its search
after the first character of that match.

We can be a bit smarter here and skip all positions that follow a word
character first, as they can't match our criteria.  This way we can
consume characters quite cheaply and don't need to special-case the
handling of the beginning of a line.

Here's a contrived example command on msysgit (best of five runs):

	$ time git grep -w ...... v1.6.1 >/dev/null

	real    0m1.611s
	user    0m0.000s
	sys     0m0.015s

With the patch it's quite a bit faster:

	$ time git grep -w ...... v1.6.1 >/dev/null

	real    0m1.179s
	user    0m0.000s
	sys     0m0.015s

More common search patterns will gain a lot less, but it's a nice clean
up anyway.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-09 21:33:35 -08:00
Alexander Potashev d75307084d remove trailing LF in die() messages
LF at the end of format strings given to die() is redundant because
die already adds one on its own.

Signed-off-by: Alexander Potashev <aspotashev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-05 13:01:01 -08:00
Junio C Hamano 8bb4646dae Merge branch 'maint'
* maint:
  Fix non-literal format in printf-style calls
  git-submodule: Avoid printing a spurious message.
  git ls-remote: make usage string match manpage
  Makefile: help people who run 'make check' by mistake
2008-11-11 14:49:50 -08:00
Daniel Lowe 9db56f71b9 Fix non-literal format in printf-style calls
These were found using gcc 4.3.2-1ubuntu11 with the warning:

    warning: format not a string literal and no format arguments

Incorporated suggestions from Brandon Casey <casey@nrlssc.navy.mil>.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-11 14:43:59 -08:00
Raphael Zimmerer 83caecca2f git grep: Add "-z/--null" option as in GNU's grep.
Here's a trivial patch that adds "-z" and "--null" options to "git
grep". It was discussed on the mailing-list that git's "-z"
convention should be used instead of GNU grep's "-Z".
So things like 'git grep -l -z "$FOO" | xargs -0 sed -i "s/$FOO/$BOO/"'
do work now.

Signed-off-by: Raphael Zimmerer <killekulla@rdrz.de>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2008-10-01 09:14:54 -07:00
Junio C Hamano a4d7d2c6db log --author/--committer: really match only with name part
When we tried to find commits done by AUTHOR, the first implementation
tried to pattern match a line with "^author .*AUTHOR", which later was
enhanced to strip leading caret and look for "^author AUTHOR" when the
search pattern was anchored at the left end (i.e. --author="^AUTHOR").

This had a few problems:

 * When looking for fixed strings (e.g. "git log -F --author=x --grep=y"),
   the regexp internally used "^author .*x" would never match anything;

 * To match at the end (e.g. "git log --author='google.com>$'"), the
   generated regexp has to also match the trailing timestamp part the
   commit header lines have.  Also, in order to determine if the '$' at
   the end means "match at the end of the line" or just a literal dollar
   sign (probably backslash-quoted), we would need to parse the regexp
   ourselves.

An earlier alternative tried to make sure that a line matches "^author "
(to limit by field name) and the user supplied pattern at the same time.
While it solved the -F problem by introducing a special override for
matching the "^author ", it did not solve the trailing timestamp nor tail
match problem.  It also would have matched every commit if --author=author
was asked for, not because the author's email part had this string, but
because every commit header line that talks about the author begins with
that field name, regardleses of who wrote it.

Instead of piling more hacks on top of hacks, this rethinks the grep
machinery that is used to look for strings in the commit header, and makes
sure that (1) field name matches literally at the beginning of the line,
followed by a SP, and (2) the user supplied pattern is matched against the
remainder of the line, excluding the trailing timestamp data.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-09-04 22:21:56 -07:00
Johannes Schindelin 6bfce93e04 Move buffer_is_binary() to xdiff-interface.h
We already have two instances where we want to determine if a buffer
contains binary data as opposed to text.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-04 23:07:00 -07:00
Junio C Hamano 85023577a8 simplify inclusion of system header files.
This is a mechanical clean-up of the way *.c files include
system header files.

 (1) sources under compat/, platform sha-1 implementations, and
     xdelta code are exempt from the following rules;

 (2) the first #include must be "git-compat-util.h" or one of
     our own header file that includes it first (e.g. config.h,
     builtin.h, pkt-line.h);

 (3) system headers that are included in "git-compat-util.h"
     need not be included in individual C source files.

 (4) "git-compat-util.h" does not have to include subsystem
     specific header files (e.g. expat.h).

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-20 09:51:35 -08:00
Junio C Hamano 0ab7befa31 grep --all-match
This lets you say:

	git grep --all-match -e A -e B -e C

to find lines that match A or B or C but limit the matches from
the files that have all of A, B and C.

This is different from

	git grep -e A --and -e B --and -e C

in that the latter looks for a single line that has all of these
at the same time.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-27 23:59:09 -07:00
Junio C Hamano a3f5d02edb grep: fix --fixed-strings combined with expression.
"git grep --fixed-strings -e GIT --and -e VERSION .gitignore"
misbehaved because we did not notice this needs to grab lines
that have the given two fixed strings at the same time.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-27 16:42:53 -07:00
Junio C Hamano b48fb5b6a9 grep: free expressions and patterns when done.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-27 16:27:10 -07:00
Junio C Hamano 480c1ca6fd Update grep internal for grepping only in head/body
This further updates the built-in grep engine so that we can say
something like "this pattern should match only in head".  This
can be used to simplify grepping in the log messages.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-20 12:39:46 -07:00
Junio C Hamano 83b5d2f5b0 builtin-grep: make pieces of it available as library.
This makes three functions and associated option structures from
builtin-grep available from other parts of the system.

 * options to drive built-in grep engine is stored in struct
   grep_opt;

 * pattern strings and extended grep expressions are added to
   struct grep_opt with append_grep_pattern();

 * when finished calling append_grep_pattern(), call
   compile_grep_patterns() to prepare for execution;

 * call grep_buffer() to find matches in the in-core buffer.

This also adds an internal option "status_only" to grep_opt,
which suppresses any output from grep_buffer().  Callers of the
function as library can use it to check if there is a match
without producing any output.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-20 11:14:38 -07:00