git/builtin
Jeff King 845de33a5b cat-file: avoid noop calls to sha1_object_info_extended
It is not unreasonable to ask cat-file for a batch-check
format of simply "%(objectname)". At first glance this seems
like a noop (you are generally already feeding the object
names on stdin!), but it has a few uses:

  1. With --batch-all-objects, you can generate a listing of
     the sha1s present in the repository, without any input.

  2. You do not have to feed sha1s; you can feed arbitrary
     sha1 expressions and have git resolve them en masse.

  3. You can even feed a raw sha1, with the result that git
     will tell you whether we actually have the object or
     not.

In case 3, the call to sha1_object_info is useful; it tells
us whether the object exists or not (technically we could
swap this out for has_sha1_file, but the cost is roughly the
same).

In case 2, the existence check is of debatable value. A
mass-resolution might prefer performance to safety (against
outputting a value for a corrupted ref, for example).
However, the object lookup cost is likely to be small
compared to the resolution cost. And since we have provided
that safety in the past, the conservative choice is to keep
it.

In case 1, though, the object lookup is a definite noop; we
know about the object because we found it in the object
database. There is no new information gained by making the
call.

This patch detects that case and optimizes out the call.
Here are best-of-five timings for linux.git:

  [before]
  $ time git cat-file --buffer \
                      --batch-all-objects \
                      --batch-check='%(objectname)'
  real    0m2.117s
  user    0m2.044s
  sys     0m0.072s

  [after]
  $ time git cat-file --buffer \
                      --batch-all-objects \
                      --batch-check='%(objectname)'
  real    0m1.230s
  user    0m1.176s
  sys     0m0.052s

There are two implementation details to note here.

One is that we detect the noop case by seeing that "struct
object_info" does not request any information. But besides
object existence, there is one other piece of information
which sha1_object_info may fill in: whether the object is
cached, loose, or packed. We don't currently provide that
information in the output, but if we were to do so later,
we'd need to take note and disable the optimization in that
case.
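
As a rough illustration of the first note (not the code from this
patch), the check can be sketched as follows. The names "struct
info_request", request_is_empty(), lookup_object() and batch_one() are
hypothetical stand-ins, and the sketch assumes that every field of the
request is a pointer that is non-NULL only when that piece of
information is wanted:

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for git's "struct object_info": each non-NULL
 * pointer means "please fill in this piece of information".
 */
struct info_request {
	int *typep;
	unsigned long *sizep;
	unsigned long *disk_sizep;
};

/* Returns 1 when the caller asked for nothing at all. */
static int request_is_empty(const struct info_request *req)
{
	return !req->typep && !req->sizep && !req->disk_sizep;
}

/* Counts lookups so the effect of the shortcut can be observed. */
static int lookups;

/* Hypothetical object lookup; a real one would consult the object database. */
static int lookup_object(const char *name, struct info_request *req)
{
	(void)name;
	lookups++;
	if (req->typep)
		*req->typep = 0;
	if (req->sizep)
		*req->sizep = 0;
	if (req->disk_sizep)
		*req->disk_sizep = 0;
	return 0;
}

/*
 * Handle one object. When the name came from enumerating the object
 * database (all_objects) and nothing is requested, the lookup cannot
 * tell us anything new, so skip it.
 */
static int batch_one(const char *name, struct info_request *req, int all_objects)
{
	if (all_objects && request_is_empty(req))
		return 0;
	return lookup_object(name, req);
}
```

A new field added to the request later (the cached/loose/packed
status, say) would have to be added to request_is_empty() as well,
which is the caveat described above.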

And that leads to the second note. If we were to output
that information, a better implementation would be to
remember where we saw the object in --batch-all-objects in
the first place, and avoid looking it up again by sha1.

In fact, we could probably squeeze out some extra
performance for less-trivial cases, too, by remembering the
pack location where we saw the object, and going directly
there to find its information (like type, size, etc). That
would in theory make this optimization unnecessary.

I didn't pursue that path here for two reasons:

  1. It's non-trivial to implement, and has memory
     implications. Because we sort and de-dup the list of
     output sha1s, we'd have to record the pack information
     for each object, too.

  2. It doesn't save as much as you might hope. It saves the
     find_pack_entry() call, but getting the size and type
     for deltified objects requires walking down the delta
     chain (for the real type) or reading the delta data
     header (for the size). These costs tend to dominate the
     non-trivial cases.
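
To make the memory implication in reason 1 concrete, here is a minimal
sketch of what a sorted, de-duplicated list would have to carry if it
remembered pack locations. The "struct seen_object" layout and the
sort_and_dedup() helper are assumptions for illustration only, not
git's actual data structures:

```c
#include <stdlib.h>
#include <string.h>

#define HASH_LEN 20 /* size of a raw SHA-1 */

/*
 * Hypothetical list entry. Today only the name would be needed; the
 * last two fields are the extra per-object cost of remembering where
 * the object was seen.
 */
struct seen_object {
	unsigned char sha1[HASH_LEN];
	const void *pack; /* hypothetical: which pack, NULL if loose */
	long offset;      /* hypothetical: position inside that pack */
};

static int cmp_seen(const void *a, const void *b)
{
	const struct seen_object *x = a, *y = b;
	return memcmp(x->sha1, y->sha1, HASH_LEN);
}

/*
 * Sort by object name and drop duplicate names in place. Returns the
 * new count. Which duplicate survives is unspecified, since qsort()
 * is not guaranteed to be stable.
 */
static size_t sort_and_dedup(struct seen_object *list, size_t nr)
{
	size_t i, out = 0;

	if (!nr)
		return 0;
	qsort(list, nr, sizeof(*list), cmp_seen);
	for (i = 0; i < nr; i++) {
		if (out && !memcmp(list[out - 1].sha1, list[i].sha1, HASH_LEN))
			continue;
		list[out++] = list[i];
	}
	return out;
}
```

Every entry grows beyond the bare 20-byte name, and that cost is paid
for every object in the repository.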

By contrast, this optimization is easy and self-contained,
and speeds up a real-world case I've used.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-18 14:17:38 -07:00
add.c Merge branch 'jc/add-u-A-default-to-top' into maint 2015-11-05 12:18:12 -08:00
am.c Merge branch 'jc/am-3-fallback-regression-fix' into maint 2015-11-03 15:32:39 -08:00
annotate.c annotate: use argv_array 2014-07-16 11:10:11 -07:00
apply.c Merge branch 'gb/apply-comment-typofix' 2015-09-14 11:44:44 -07:00
archive.c
bisect--helper.c
blame.c Merge branch 'mk/blame-error-message' into maint 2015-11-03 15:32:43 -08:00
branch.c strbuf: make stripspace() part of strbuf 2015-10-16 09:45:15 -07:00
bundle.c bundle: verify arguments more strictly 2015-05-08 10:52:11 -07:00
cat-file.c cat-file: avoid noop calls to sha1_object_info_extended 2016-05-18 14:17:38 -07:00
check-attr.c usage: do not insist that standard input must come from a file 2015-10-16 15:27:52 -07:00
check-ignore.c usage: do not insist that standard input must come from a file 2015-10-16 15:27:52 -07:00
check-mailmap.c standardize usage info string format 2015-01-14 09:32:04 -08:00
check-ref-format.c standardize usage info string format 2015-01-14 09:32:04 -08:00
checkout-index.c prefix_path(): unconditionally free results in the callers 2015-05-05 10:31:51 -07:00
checkout.c Merge branch 'jc/calloc-pathspec' into maint 2015-09-03 19:18:00 -07:00
clean.c Merge branch 'rs/janitorial' into maint 2015-06-16 14:33:47 -07:00
clone.c Merge branch 'nd/clone-linked-checkout' into maint 2015-11-05 12:18:08 -08:00
column.c standardize usage info string format 2015-01-14 09:32:04 -08:00
commit-tree.c usage: do not insist that standard input must come from a file 2015-10-16 15:27:52 -07:00
commit.c allow hooks to ignore their standard input stream 2015-11-16 08:59:19 -05:00
config.c get_urlmatch: avoid useless strbuf write 2015-08-20 13:16:50 -07:00
count-objects.c prepare_packed_git(): refactor garbage reporting in pack directory 2015-08-17 09:14:59 -07:00
credential.c
describe.c Merge branch 'sg/describe-contains' 2015-08-31 15:39:10 -07:00
diff-files.c standardize usage info string format 2015-01-14 09:32:04 -08:00
diff-index.c standardize usage info string format 2015-01-14 09:32:04 -08:00
diff-tree.c standardize usage info string format 2015-01-14 09:32:04 -08:00
diff.c lockfile.h: extract new header file for the functions in lockfile.c 2014-10-01 13:56:14 -07:00
fast-export.c refs: move the remaining ref module declarations to refs.h 2015-06-22 13:17:12 -07:00
fetch-pack.c standardize usage info string format 2015-01-14 09:32:04 -08:00
fetch.c Merge branch 'mh/get-remote-group-fix' into maint 2015-09-03 19:17:48 -07:00
fmt-merge-msg.c Merge branch 'rs/pop-commit' into maint 2015-12-11 11:14:13 -08:00
for-each-ref.c Merge branch 'mh/reporting-broken-refs-from-for-each-ref' into maint 2015-08-03 10:41:31 -07:00
fsck.c Merge branch 'jc/fsck-dropped-errors' into maint 2015-10-16 14:32:50 -07:00
gc.c Merge branch 'dk/gc-idx-wo-pack' into maint 2015-12-04 11:33:08 -08:00
get-tar-commit-id.c usage: do not insist that standard input must come from a file 2015-10-16 15:27:52 -07:00
grep.c Merge branch 'ps/grep-help-all-callback-arg' 2015-04-20 15:28:34 -07:00
hash-object.c usage: do not insist that standard input must come from a file 2015-10-16 15:27:52 -07:00
help.c Merge branch 'sb/leaks' 2015-03-20 13:11:53 -07:00
index-pack.c Merge branch 'jc/finalize-temp-file' 2015-08-19 14:48:55 -07:00
init-db.c write_file(): drop caller-supplied LF from calls to create a one-liner file 2015-08-25 12:49:19 -07:00
interpret-trailers.c trailer: add interpret-trailers command 2014-10-13 13:55:27 -07:00
log.c builtin/log.c: minor reformat 2015-08-25 13:11:21 -07:00
ls-files.c ps_matched: xcalloc() takes nmemb and then element size 2015-08-20 09:57:38 -07:00
ls-remote.c ls-remote.txt: delete unsupported option 2015-09-28 11:07:04 -07:00
ls-tree.c ls-tree: disable negative pathspec because it's not supported 2014-12-01 11:33:45 -08:00
mailinfo.c standardize usage info string format 2015-01-14 09:32:04 -08:00
mailsplit.c mailsplit: remove unnecessary unlink(2) call 2014-10-07 10:49:57 -07:00
merge-base.c standardize usage info string format 2015-01-14 09:32:04 -08:00
merge-file.c Merge branch 'jk/merge-file-exit-code' into maint 2015-11-03 15:32:41 -08:00
merge-index.c standardize usage info string format 2015-01-14 09:32:04 -08:00
merge-ours.c
merge-recursive.c
merge-tree.c react to errors in xdi_diff 2015-09-28 14:57:10 -07:00
merge.c Merge branch 'rs/pop-commit' into maint 2015-12-11 11:14:13 -08:00
mktag.c usage: do not insist that standard input must come from a file 2015-10-16 15:27:52 -07:00
mktree.c builtin/mktree.c: use ALLOC_GROW() in append_to_tree() 2014-03-03 14:54:45 -08:00
mv.c standardize usage info string format 2015-01-14 09:32:04 -08:00
name-rev.c name_ref(): rewrite to take an object_id argument 2015-05-25 12:19:29 -07:00
notes.c strbuf: make stripspace() part of strbuf 2015-10-16 09:45:15 -07:00
pack-objects.c Merge branch 'maint-2.5' into maint-2.6 2016-03-17 11:26:18 -07:00
pack-redundant.c standardize usage info string format 2015-01-14 09:32:04 -08:00
pack-refs.c standardize usage info string format 2015-01-14 09:32:04 -08:00
patch-id.c usage: do not insist that standard input must come from a file 2015-10-16 15:27:52 -07:00
prune-packed.c standardize usage info string format 2015-01-14 09:32:04 -08:00
prune.c Merge branch 'jk/repository-extension' into maint 2015-11-03 15:32:25 -08:00
pull.c Merge branch 'pt/pull-builtin' into maint 2015-10-16 14:32:32 -07:00
push.c push: add a config option push.gpgSign for default signed pushes 2015-08-19 12:58:58 -07:00
read-tree.c Merge branch 'ah/read-tree-usage-string' 2015-09-01 16:31:16 -07:00
receive-pack.c Merge branch 'jx/do-not-crash-receive-pack-wo-head' into maint 2015-08-19 14:41:26 -07:00
reflog.c Merge branch 'rs/pop-commit' into maint 2015-12-11 11:14:13 -08:00
remote-ext.c use skip_prefix() to avoid more magic numbers 2014-10-07 11:09:16 -07:00
remote-fd.c
remote.c remote.c: drop extraneous local variable from migrate_file 2015-08-10 15:37:12 -07:00
repack.c Merge branch 'jk/repository-extension' into maint 2015-11-03 15:32:25 -08:00
replace.c Merge branch 'mh/replace-refs' 2015-08-03 11:01:10 -07:00
rerere.c Sync with v2.5.4 2015-09-28 19:16:54 -07:00
reset.c memoize common git-path "constant" files 2015-08-10 15:37:14 -07:00
rev-list.c Merge branch 'maint-2.4' into maint-2.5 2016-03-17 11:24:14 -07:00
rev-parse.c use pop_commit() for consuming the first entry of a struct commit_list 2015-10-26 14:06:46 -07:00
revert.c standardize usage info string format 2015-01-14 09:32:04 -08:00
rm.c use file_exists() to check if a file exists in the worktree 2015-05-20 13:49:10 -07:00
send-pack.c push: add a config option push.gpgSign for default signed pushes 2015-08-19 12:58:58 -07:00
shortlog.c convert "enum date_mode" into a struct 2015-06-29 11:39:07 -07:00
show-branch.c Merge branch 'rs/show-branch-argv-array' into maint 2015-12-11 11:14:14 -08:00
show-ref.c usage: do not insist that standard input must come from a file 2015-10-16 15:27:52 -07:00
stripspace.c Merge branch 'jc/usage-stdin' into maint 2015-11-03 15:32:38 -08:00
symbolic-ref.c symbolic-ref: propagate error code from create_symref() 2015-12-21 12:03:03 -08:00
tag.c strbuf: make stripspace() part of strbuf 2015-10-16 09:45:15 -07:00
unpack-file.c
unpack-objects.c usage: do not insist that standard input must come from a file 2015-10-16 15:27:52 -07:00
update-index.c Merge branch 'nd/untracked-cache' 2015-05-26 13:24:46 -07:00
update-ref.c tag, update-ref: improve description of option "create-reflog" 2015-09-11 09:50:02 -07:00
update-server-info.c
upload-archive.c
var.c
verify-commit.c verify-commit: add option to print raw gpg status information 2015-06-22 14:20:47 -07:00
verify-pack.c standardize usage info string format 2015-01-14 09:32:04 -08:00
verify-tag.c verify-tag: add option to print raw gpg status information 2015-06-22 14:20:47 -07:00
worktree.c Merge branch 'es/worktree-add' into maint 2015-11-04 14:20:44 -08:00
write-tree.c