git/Documentation
Elijah Newren 37a2514364 diffcore-rename: use directory rename guided basename comparisons
A previous commit noted that it is very common for people to move files
across directories while keeping their filename the same.  The last few
commits took advantage of this and showed that we can accelerate rename
detection significantly using basenames; since files with the same
basename serve as likely rename candidates, we can check those first and
remove them from the rename candidate pool if they are sufficiently
similar.

Unfortunately, the previous optimization was limited by the fact that
the remaining basenames after exact rename detection are not always
unique.  Many repositories have hundreds of build files with the same
name (e.g. Makefile, .gitignore, build.gradle, etc.), and may even have
hundreds of source files with the same name.  (For example, the linux
kernel has 100 setup.c, 87 irq.c, and 112 core.c files.  A repository at
$DAYJOB has a lot of ObjectFactory.java and Plugin.java files).

For these files with non-unique basenames, we are faced with the task of
attempting to determine or guess which directory they may have been
relocated to.  Such a task is precisely the job of directory rename
detection.  However, there are two catches: (1) the directory rename
detection code has traditionally been part of the merge machinery rather
than diffcore-rename.c, and (2) directory rename detection currently
runs after regular rename detection is complete.  The 1st catch is just
an implementation issue that can be overcome by some code shuffling.
The 2nd requires us to add a further approximation: we only have access
to exact renames at this point, so we need to do directory rename
detection based on just exact renames.  In some cases we won't have
exact renames, in which case this extra optimization won't apply.  We
also choose to not apply the optimization unless we know that the
underlying directory was removed, which will require extra data to be
passed in to diffcore_rename_extended().  Also, even if we get a
prediction about which directory a file may have relocated to, we will
still need to check to see if there is a file in the predicted
directory, and then compare the two files to see if they meet the higher
min_basename_score threshold required for marking the two files as
renames.

This commit introduces an idx_possible_rename() function which will
do this directory rename detection for us and give us the index within
rename_dst of the resulting filename.  For now, this function is
hardcoded to return -1 (not found) and just hooks up how its results
would be used once we have a more complete implementation in place.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-26 17:53:11 -08:00
..
RelNotes The seventh batch 2021-02-10 14:48:33 -08:00
config
howto
technical
.gitattributes
.gitignore
CodingGuidelines
Makefile
MyFirstContribution.txt
MyFirstObjectWalk.txt
SubmittingPatches
asciidoc.conf
asciidoctor-extensions.rb
blame-options.txt
build-docdep.perl
cat-texi.perl
cmd-list.perl
config.txt
date-formats.txt
diff-format.txt
diff-generate-patch.txt
diff-options.txt
doc-diff
docbook-xsl.css
docbook.xsl
everyday.txto
fetch-options.txt
fix-texi.perl
git-add.txt
git-am.txt
git-annotate.txt
git-apply.txt
git-archimport.txt
git-archive.txt
git-bisect-lk2009.txt
git-bisect.txt
git-blame.txt
git-branch.txt
git-bugreport.txt
git-bundle.txt
git-cat-file.txt
git-check-attr.txt
git-check-ignore.txt
git-check-mailmap.txt
git-check-ref-format.txt
git-checkout-index.txt
git-checkout.txt
git-cherry-pick.txt
git-cherry.txt
git-citool.txt
git-clean.txt
git-clone.txt
git-column.txt
git-commit-graph.txt
git-commit-tree.txt
git-commit.txt
git-config.txt
git-count-objects.txt
git-credential-cache--daemon.txt
git-credential-cache.txt
git-credential-store.txt
git-credential.txt
git-cvsexportcommit.txt
git-cvsimport.txt
git-cvsserver.txt
git-daemon.txt
git-describe.txt
git-diff-files.txt
git-diff-index.txt
git-diff-tree.txt
git-diff.txt
git-difftool.txt
git-fast-export.txt
git-fast-import.txt
git-fetch-pack.txt
git-fetch.txt
git-filter-branch.txt
git-fmt-merge-msg.txt
git-for-each-ref.txt
git-for-each-repo.txt
git-format-patch.txt
git-fsck-objects.txt
git-fsck.txt
git-gc.txt
git-get-tar-commit-id.txt
git-grep.txt
git-gui.txt
git-hash-object.txt
git-help.txt
git-http-backend.txt
git-http-fetch.txt
git-http-push.txt
git-imap-send.txt
git-index-pack.txt
git-init-db.txt
git-init.txt
git-instaweb.txt
git-interpret-trailers.txt
git-log.txt
git-ls-files.txt
git-ls-remote.txt
git-ls-tree.txt
git-mailinfo.txt
git-mailsplit.txt
git-maintenance.txt
git-merge-base.txt
git-merge-file.txt
git-merge-index.txt
git-merge-one-file.txt
git-merge-tree.txt
git-merge.txt
git-mergetool--lib.txt
git-mergetool.txt
git-mktag.txt
git-mktree.txt
git-multi-pack-index.txt
git-mv.txt
git-name-rev.txt
git-notes.txt
git-p4.txt
git-pack-objects.txt
git-pack-redundant.txt
git-pack-refs.txt
git-patch-id.txt
git-prune-packed.txt
git-prune.txt
git-pull.txt
git-push.txt
git-quiltimport.txt
git-range-diff.txt
git-read-tree.txt
git-rebase.txt
git-receive-pack.txt
git-reflog.txt
git-remote-ext.txt
git-remote-fd.txt
git-remote-helpers.txto
git-remote.txt
git-repack.txt
git-replace.txt
git-request-pull.txt
git-rerere.txt
git-reset.txt
git-restore.txt
git-rev-list.txt
git-rev-parse.txt
git-revert.txt
git-rm.txt
git-send-email.txt
git-send-pack.txt
git-sh-i18n--envsubst.txt
git-sh-i18n.txt
git-sh-setup.txt
git-shell.txt
git-shortlog.txt
git-show-branch.txt
git-show-index.txt
git-show-ref.txt
git-show.txt
git-sparse-checkout.txt
git-stage.txt
git-stash.txt
git-status.txt
git-stripspace.txt
git-submodule.txt
git-svn.txt
git-switch.txt
git-symbolic-ref.txt
git-tag.txt
git-tools.txt
git-unpack-file.txt
git-unpack-objects.txt
git-update-index.txt
git-update-ref.txt
git-update-server-info.txt
git-upload-archive.txt
git-upload-pack.txt
git-var.txt
git-verify-commit.txt
git-verify-pack.txt
git-verify-tag.txt
git-web--browse.txt
git-whatchanged.txt
git-worktree.txt
git-write-tree.txt
git.txt
gitattributes.txt
gitcli.txt
gitcore-tutorial.txt
gitcredentials.txt
gitcvs-migration.txt
gitdiffcore.txt diffcore-rename: use directory rename guided basename comparisons 2021-02-26 17:53:11 -08:00
giteveryday.txt
gitfaq.txt
gitglossary.txt
githooks.txt
gitignore.txt
gitk.txt
gitmailmap.txt
gitmodules.txt
gitnamespaces.txt
gitremote-helpers.txt
gitrepository-layout.txt
gitrevisions.txt
gitsubmodules.txt
gittutorial-2.txt
gittutorial.txt
gitweb.conf.txt
gitweb.txt
gitworkflows.txt
glossary-content.txt
howto-index.sh
i18n.txt
install-doc-quick.sh
install-webdoc.sh
line-range-format.txt
line-range-options.txt
lint-gitlink.perl
manpage-base-url.xsl.in
manpage-bold-literal.xsl
manpage-normal.xsl
manpage-quote-apos.xsl
manpage.xsl
merge-options.txt
merge-strategies.txt
object-format-disclaimer.txt
pretty-formats.txt
pretty-options.txt
pull-fetch-param.txt
ref-reachability-filters.txt
rev-list-description.txt
rev-list-options.txt
revisions.txt
sequencer.txt
signoff-option.txt
texi.xsl
trace2-target-values.txt
transfer-data-leaks.txt
urls-remotes.txt
urls.txt
user-manual.conf
user-manual.txt