"git shortlog" has been taught to group commits by the contents of
the trailer lines, like "Reviewed-by:", "Coauthored-by:", etc.
* jk/shortlog-group-by-trailer:
shortlog: allow multiple groups to be specified
shortlog: parse trailer idents
shortlog: rename parse_stdin_ident()
shortlog: de-duplicate trailer values
shortlog: match commit trailers with --group
trailer: add interface for iterating over commit trailers
shortlog: add grouping option
shortlog: change "author" variables to "ident"
The trailer code knows how to parse out the trailers and re-format them,
but there's no easy way to iterate over the trailers (you can use
trailer_info, but you have to then do a bunch of extra parsing).
Let's add an iteration interface that makes this easy to do.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The child_process structure has an embedded strvec for formulating
the command line argument list these days, but code that predates
the wide use of it prepared a separate char *argv[] array and
manually set the child_process.argv pointer point at it.
Teach these old-style code to lose the separate argv[] array.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By default trailer lines are terminated by linebreaks ('\n'). By
specifying the new 'separator' option they will instead be separated by
user provided string and have separator semantics rather than terminator
semantics. The separator string can contain the literal formatting codes
%n and %xNN allowing it to be things that are otherwise hard to type
such as %x00, or comma and end-parenthesis which would break parsing.
E.g:
$ git log --pretty='%(trailers:key=Reviewed-by,valueonly,separator=%x00)'
Signed-off-by: Anders Waldenborg <anders@0x63.nu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the new "key=" option to %(trailers) it often makes little sense to
show the key, as it by definition already is knows which trailer is
printed there. This new "valueonly" option makes it omit the key when
printing trailers.
E.g.:
$ git show -s --pretty='%s%n%(trailers:key=Signed-off-by,valueonly)' aaaa88182
will show:
> upload-pack: fix broken if/else chain in config callback
> Jeff King <peff@peff.net>
> Junio C Hamano <gitster@pobox.com>
Signed-off-by: Anders Waldenborg <anders@0x63.nu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adds a new "key=X" option to "%(trailers)" which will cause it to only
print trailer lines which match any of the specified keys.
Signed-off-by: Anders Waldenborg <anders@0x63.nu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Even with the newly-tightened "---" parser, it's still
possible for a commit message to trigger a false positive if
it contains something like "--- foo". If the caller knows
that it has only a single commit message, it can now tell us
with the "--no-divider" option, eliminating any false
positives.
If we were designing this from scratch, I'd probably make
this the default. But we've advertised the "---" behavior in
the documentation since interpret-trailers has existed.
Since it's meant to be scripted, breaking that would be a
bad idea.
Note that the logic is in the underlying trailer.c code,
which is used elsewhere. The default there will keep the
current behavior, but many callers will benefit from setting
this new option. That's left for future patches.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The interpret-trailers command accepts not only raw commit
messages, but it also can manipulate trailers in
format-patch output. That means it must find the "---"
boundary separating the commit message from the patch.
However, it does so by looking for any line starting with
"---", regardless of whether there is further content.
This is overly lax compared to the parsing done in
mailinfo.c's patchbreak(), and may cause false positives
(e.g., t/perf output tables uses dashes; if you cut and
paste them into your commit message, it fools the parser).
We could try to reuse patchbreak() here, but it actually has
several heuristics that are not of interest to us (e.g.,
matching "diff -" without a three-dash separator or even a
CVS "Index:" line). We're not interested in taking in
whatever random cruft people may send, but rather handling
git-formatted patches.
Note that the existing documentation was written in a loose
way, so technically we are changing the behavior from what
it said. But this should implement the original intent in a
more accurate way.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of the trailer code has an "opts" struct which is
filled in by the caller. We don't pass it down to
trailer_info_get(), which does the initial parsing, because
there hasn't yet been a need to do so.
Let's start passing it down in preparation for adding new
options. Note that there's a single caller which doesn't
otherwise have such an options struct. Since it's just one
caller (that we'd have to modify anyway), let's not bother
with any special treatment like accepting a NULL options
struct, and just have it allocate one with the defaults.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We store the length of the trailers list in a size_t. So on
a 64-bit system with a 32-bit int, in the unlikely case that
we manage to actually allocate a list with 2^31 entries,
we'd loop forever trying to iterate over it (our "int" would
wrap to negative before exceeding info->trailer_nr).
This probably doesn't matter in practice. Each entry is at
least a pointer plus a non-empty string, so even without
malloc overhead or the memory to hold the original string
we're parsing from, you'd need to allocate tens of
gigabytes. But it's easy enough to do it right.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many of the string-parsing functions inside trailer.c return
integer offsets into the string (e.g., to point to the end
of the trailer block). Several of these use an "int" to
return or store the offsets. On a system where "size_t" is
much larger than "int" (e.g., most 64-bit ones), it's easy
to feed a gigantic commit message that results in a negative
offset. This can result in us reading memory before the
string (if the int is used as an index) or far after (if
it's implicitly cast to a size_t by passing to a strbuf
function).
Let's fix this by using size_t for all string offsets. Note
that several of the functions need ssize_t, since they use
"-1" as a sentinel value. The interactions here can be
pretty subtle. E.g., end_of_title in find_trailer_start()
does not itself need to be signed, but it is compared to the
result of last_line(), which is. That promotes the latter to
unsigned, and the ">=" does not behave as you might expect.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In d8193743e0 (usage.c: add BUG() function, 2017-05-12), a new macro
was introduced to use for reporting bugs instead of die(). It was then
subsequently used to convert one single caller in 588a538ae5
(setup_git_env: convert die("BUG") to BUG(), 2017-05-12).
The cover letter of the patch series containing this patch
(cf 20170513032414.mfrwabt4hovujde2@sigill.intra.peff.net) is not
terribly clear why only one call site was converted, or what the plan
is for other, similar calls to die() to report bugs.
Let's just convert all remaining ones in one fell swoop.
This trick was performed by this invocation:
sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c)
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename C++ keyword in order to bring the codebase closer to being able
to be compiled with a C++ compiler.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename C++ keyword in order to bring the codebase closer to being able
to be compiled with a C++ compiler.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit taught the tempfile code to give up
ownership over tempfiles that have been renamed or deleted.
That makes it possible to use a stack variable like this:
struct tempfile t;
create_tempfile(&t, ...);
...
if (!err)
rename_tempfile(&t, ...);
else
delete_tempfile(&t);
But doing it this way has a high potential for creating
memory errors. The tempfile we pass to create_tempfile()
ends up on a global linked list, and it's not safe for it to
go out of scope until we've called one of those two
deactivation functions.
Imagine that we add an early return from the function that
forgets to call delete_tempfile(). With a static or heap
tempfile variable, the worst case is that the tempfile hangs
around until the program exits (and some functions like
setup_shallow_temporary rely on this intentionally, creating
a tempfile and then leaving it for later cleanup).
But with a stack variable as above, this is a serious memory
error: the variable goes out of scope and may be filled with
garbage by the time the tempfile code looks at it. Let's
see if we can make it harder to get this wrong.
Since many callers need to allocate arbitrary numbers of
tempfiles, we can't rely on static storage as a general
solution. So we need to turn to the heap. We could just ask
all callers to pass us a heap variable, but that puts the
burden on them to call free() at the right time.
Instead, let's have the tempfile code handle the heap
allocation _and_ the deallocation (when the tempfile is
deactivated and removed from the list).
This changes the return value of all of the creation
functions. For the cleanup functions (delete and rename),
we'll add one extra bit of safety: instead of taking a
tempfile pointer, we'll take a pointer-to-pointer and set it
to NULL after freeing the object. This makes it safe to
double-call functions like delete_tempfile(), as the second
call treats the NULL input as a noop. Several callsites
follow this pattern.
The resulting patch does have a fair bit of noise, as each
caller needs to be converted to handle:
1. Storing a pointer instead of the struct itself.
2. Passing the pointer instead of taking the struct
address.
3. Handling a "struct tempfile *" return instead of a file
descriptor.
We could play games to make this less noisy. For example, by
defining the tempfile like this:
struct tempfile {
struct heap_allocated_part_of_tempfile {
int fd;
...etc
} *actual_data;
}
Callers would continue to have a "struct tempfile", and it
would be "active" only when the inner pointer was non-NULL.
But that just makes things more awkward in the long run.
There aren't that many callers, so we can simply bite
the bullet and adjust all of them. And the compiler makes it
easy for us to find them all.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git interpret-trailers" has been taught a "--parse" and a few
other options to make it easier for scripts to grab existing
trailer lines from a commit log message.
* jk/trailers-parse:
doc/interpret-trailers: fix "the this" typo
pretty: support normalization options for %(trailers)
t4205: refactor %(trailers) tests
pretty: move trailer formatting to trailer.c
interpret-trailers: add --parse convenience option
interpret-trailers: add an option to unfold values
interpret-trailers: add an option to show only existing trailers
interpret-trailers: add an option to show only the trailers
trailer: put process_trailers() options into a struct
The interpret-trailers command recently learned some options
to make its output easier to parse (for a caller whose only
interested in picking out the trailer values). But it's not
very efficient for asking for the trailers of many commits
in a single invocation.
We already have "%(trailers)" to do that, but it doesn't
know about unfolding or omitting non-trailers. Let's plumb
those options through, so you can have the best of both.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The next commit will add many features to the %(trailer)
placeholder in pretty.c. We'll need to access some internal
functions of trailer.c for that, so our options are either:
1. expose those functions publicly
or
2. make an entry point into trailer.c to do the formatting
Doing (2) ends up exposing less surface area, though do note
that caveats in the docstring of the new function.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point of "--only-trailers" is to give a caller an output
that's easy for them to parse. Getting rid of the
non-trailer material helps, but we still may see more
complicated syntax like whitespace continuation. Let's add
an option to unfold any continuation, giving the output as a
single "key: value" line per trailer.
As a bonus, this could be used even without --only-trailers
to clean up unusual formatting in the incoming data.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It can be useful to invoke interpret-trailers for the
primary purpose of parsing existing trailers. But in that
case, we don't want to apply existing ifMissing or ifExists
rules from the config. Let's add a special mode where we
avoid applying those rules. Coupled with --only-trailers,
this gives us a reasonable parsing tool.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In theory it's easy for any reader who wants to parse
trailers to do so. But there are a lot of subtle corner
cases around what counts as a trailer, when the trailer
block begins and ends, etc. Since interpret-trailers already
has our parsing logic, let's let callers ask it to just
output the trailers.
They still have to parse the "key: value" lines, but at
least they can ignore all of the other corner cases.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow using non-default values for trailers without having to set
them up in .gitconfig first. For example, if you have the following
configuration
trailer.signed-off-by.where = end
you may use "--where before" when a patch author forgets his
Signed-off-by and provides it in a separate email. Likewise for
--if-exists and --if-missing
Reverting to the behavior specified by .gitconfig is done with
--no-where, --no-if-exists and --no-if-missing.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This will provide a place to store the current state of the
--where, --if-exists and --if-missing options.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already have two options and are about to add a few more.
To avoid having a huge number of boolean arguments, let's
convert to an options struct which can be passed in.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Separate the mechanical changes out of the next patch. The functions
are changed to take a pointer to enum, because struct conf_info is not
going to be public.
Set the default values explicitly in default_conf_info, since they are
not anymore close to default_conf_info and it's not obvious which
constant has value 0. With the next patches, in fact, the values will
not be zero anymore!
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop including config.h by default in cache.h. Instead only include
config.h in those files which require use of the config system.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a function that, taking a string, describes the position of its
trailer block (if available) and the contents thereof, and make trailer
use it. This makes it easier for other Git components, in the future, to
interpret trailer blocks in the same way as trailer.
In a subsequent patch, another component will be made to use this.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
trailer.c currently splits lines while processing a buffer (and also
rejoins lines when needing to invoke ignore_non_trailer).
Avoid such line splitting, except when generating the strings
corresponding to trailers (for ease of use by clients - a subsequent
patch will allow other components to obtain the layout of a trailer
block in a buffer, including the trailers themselves). The main purpose
of this is to make it easy to return pointers into the original buffer
(for a subsequent patch), but this also significantly reduces the number
of memory allocations required.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make ignore_non_trailer take a buf/len pair instead of struct strbuf.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, a line is interpreted to be a trailer line if it contains a
separator. Make parsing stricter by requiring the text on the left of
the separator, if not the empty string, to be of the "<token><optional
whitespace>" form.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update "interpret-trailers" machinery and teaches it that people in
real world write all sorts of crufts in the "trailer" that was
originally designed to have the neat-o "Mail-Header: like thing"
and nothing else.
* jt/trailer-with-cruft:
trailer: support values folded to multiple lines
trailer: forbid leading whitespace in trailers
trailer: allow non-trailers in trailer block
trailer: clarify failure modes in parse_trailer
trailer: make args have their own struct
trailer: streamline trailer item create and add
trailer: use list.h for doubly-linked list
trailer: improve const correctness
Currently, interpret-trailers requires that a trailer be only on 1 line.
For example:
a: first line
second line
would be interpreted as one trailer line followed by one non-trailer line.
Make interpret-trailers support RFC 822-style folding, treating those
lines as one logical trailer.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, interpret-trailers allows leading whitespace in trailer
lines. This leads to false positives, especially for quoted lines or
bullet lists.
Forbid leading whitespace in trailers.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, interpret-trailers requires all lines of a trailer block to
be trailers (or comments) - if not it would not identify that block as a
trailer block, and thus create its own trailer block, inserting a blank
line. For example:
echo -e "\nSigned-off-by: x\nnot trailer" |
git interpret-trailers --trailer "c: d"
would result in:
Signed-off-by: x
not trailer
c: d
Relax the definition of a trailer block to require that the trailers (i)
are all trailers, or (ii) contain at least one Git-generated trailer and
consists of at least 25% trailers.
Signed-off-by: x
not trailer
c: d
(i) is the existing functionality. (ii) allows arbitrary lines to be
included in trailer blocks, like those in [1], and still allow
interpret-trailers to be used.
[1]
e7d316a02f
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The parse_trailer function has a few modes of operation, all depending
on whether the separator is present in its input, and if yes, the
separator's position. Some of these modes are failure modes, and these
failure modes are handled differently depending on whether the trailer
line was sourced from a file or from a command-line argument.
Extract a function to find the separator, allowing the invokers of
parse_trailer to determine how to handle the failure modes instead of
making parse_trailer do it. In this function, also take in the list of
separators, so that we can distinguish between command line arguments
(which allow '=' as separator) and file input (which does not allow '='
as separator).
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Improve type safety by making arguments (whether from configuration or
from the command line) have their own "struct arg_item" type, separate
from the "struct trailer_item" type used to represent the trailers in
the buffer being manipulated.
This change also prepares "struct trailer_item" to be further
differentiated from "struct arg_item" in future patches.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, creation and addition (to a list) of trailer items are spread
across multiple functions. Streamline this by only having 2 functions:
one to parse the user-supplied string, and one to add the parsed
information to a list.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace the existing handwritten implementation of a doubly-linked list
in trailer.c with the functions and macros from list.h. This
significantly simplifies the code.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change "const char *" to "char *" in struct trailer_item and in the
return value of apply_command (since those strings are owned strings).
Change "struct conf_info *" to "const struct conf_info *" (since that
struct is not modified).
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
d64ea0f83b ("git-compat-util: add xstrdup_or_null helper",
2015-01-12) added a handy wrapper that allows us to get a duplicate
of a string or NULL if the original is NULL, but a handful of
codepath predate its introduction or just weren't aware of it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The vast majority of error messages in Git's source code which report a
bug use the convention to prefix the message with "BUG:".
As part of cleaning up merge-recursive to stop die()ing except in case of
detected bugs, let's just make the remainder of the bug reports consistent
with the de facto rule.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a command line option --in-place to support in-place editing akin to
sed -i. This allows to write commands like the following:
git interpret-trailers --trailer "X: Y" a.txt > b.txt && mv b.txt a.txt
in a more concise way:
git interpret-trailers --trailer "X: Y" --in-place a.txt
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use fprintf instead of printf in trailer.c in order to allow printing
to a file other than stdout. This will be needed to support in-place
editing in git interpret-trailers.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "interpret-trailers" helper mistook a multi-paragraph title of
a commit log message with a colon in it as the end of the trailer
block.
* cc/trailers-corner-case-fix:
trailer: support multiline title
We currently ignore the first line passed to `git interpret-trailers`,
when looking for the beginning of the trailers.
Unfortunately this does not work well when a commit is created with a
line break in the title, using for example the following command:
git commit -m 'place of
code: change we made'
That's why instead of ignoring only the first line, it is better to
ignore the first paragraph.
Signed-off-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"interpret-trailers" helper mistook a single-liner log message that
has a colon as the end of existing trailer.
* cc/trailers-corner-case-fix:
trailer: retitle a test and correct an in-comment message
trailer: ignore first line of message
When looking for the start of the trailers in the message
we are passed, we should ignore the first line of the message.
The reason is that if we are passed a patch or commit message
then the first line should be the patch title.
If we are passed only trailers we can expect that they start
with an empty line that can be ignored too.
This way we can properly process commit messages that have
only one line with something that looks like a trailer, for
example like "area of code: change we made".
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we read from a trailer.*.command sub-program, the
current code uses run_command followed by a pipe read, which
can result in deadlock (though in practice you would have to
have a large trailer for this to be a problem). The current
code also leaks the file descriptor for the pipe to the
sub-command.
Instead, let's use capture_command, which makes this simpler
(and we can get rid of our custom helper).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>