Граф коммитов

16898 Коммитов

Автор SHA1 Сообщение Дата
Derrick Stolee 0c18b70081 maintenance: add [un]register subcommands
In preparation for launching background maintenance from the 'git
maintenance' builtin, create register/unregister subcommands. These
commands update the new 'maintenance.repos' config option in the global
config so the background maintenance job knows which repositories to
maintain.

These commands allow users to add a repository to the background
maintenance list without disrupting the actual maintenance mechanism.

For example, a user can run 'git maintenance register' when no
background maintenance is running and it will not start the background
maintenance. A later update to start running background maintenance will
then pick up this repository automatically.

The opposite example is that a user can run 'git maintenance unregister'
to remove the current repository from background maintenance without
halting maintenance for other repositories.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:59:44 -07:00
Derrick Stolee 4950b2a2b5 for-each-repo: run subcommands on configured repos
It can be helpful to store a list of repositories in global or system
config and then iterate Git commands on that list. Create a new builtin
that makes this process simple for experts. We will use this builtin to
run scheduled maintenance on all configured repositories in a future
change.

The test is very simple, but does highlight that the "--" argument is
optional.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:59:44 -07:00
Derrick Stolee b08ff1fee0 maintenance: add --schedule option and config
Maintenance currently triggers when certain data-size thresholds are
met, such as number of pack-files or loose objects. Users may want to
run certain maintenance tasks based on frequency instead. For example,
a user may want to perform a 'prefetch' task every hour, or 'gc' task
every day. To help these users, update the 'git maintenance run' command
to include a '--schedule=<frequency>' option. The allowed frequencies
are 'hourly', 'daily', and 'weekly'. These values are also allowed in a
new config value 'maintenance.<task>.schedule'.

The 'git maintenance run --schedule=<frequency>' checks the '*.schedule'
config value for each enabled task to see if the configured frequency is
at least as frequent as the frequency from the '--schedule' argument. We
use the following order, for full clarity:

	'hourly' > 'daily' > 'weekly'

Use new 'enum schedule_priority' to track these values numerically.

The following cron table would run the scheduled tasks with the correct
frequencies:

  0 1-23 * * *    git -C <repo> maintenance run --schedule=hourly
  0 0    * * 1-6  git -C <repo> maintenance run --schedule=daily
  0 0    * * 0    git -C <repo> maintenance run --schedule=weekly

This cron schedule will run --schedule=hourly every hour except at
midnight. This avoids a concurrent run with the --schedule=daily that
runs at midnight every day except the first day of the week. This avoids
a concurrent run with the --schedule=weekly that runs at midnight on
the first day of the week. Since --schedule=daily also runs the
'hourly' tasks and --schedule=weekly runs the 'hourly' and 'daily'
tasks, we will still see all tasks run with the proper frequencies.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:59:44 -07:00
Derrick Stolee 1942d48380 maintenance: optionally skip --auto process
Some commands run 'git maintenance run --auto --[no-]quiet' after doing
their normal work, as a way to keep repositories clean as they are used.
Currently, users who do not want this maintenance to occur would set the
'gc.auto' config option to 0 to avoid the 'gc' task from running.
However, this does not stop the extra process invocation. On Windows,
this extra process invocation can be more expensive than necessary.

Allow users to drop this extra process by setting 'maintenance.auto' to
'false'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:59:44 -07:00
Derrick Stolee e841a79a13 maintenance: add incremental-repack auto condition
The incremental-repack task updates the multi-pack-index by deleting pack-
files that have been replaced with new packs, then repacking a batch of
small pack-files into a larger pack-file. This incremental repack is faster
than rewriting all object data, but is slower than some other
maintenance activities.

The 'maintenance.incremental-repack.auto' config option specifies how many
pack-files should exist outside of the multi-pack-index before running
the step. These pack-files could be created by 'git fetch' commands or
by the loose-objects task. The default value is 10.

Setting the option to zero disables the task with the '--auto' option,
and a negative value makes the task run every time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:05 -07:00
Derrick Stolee a13e3d0ec8 maintenance: auto-size incremental-repack batch
When repacking during the 'incremental-repack' task, we use the
--batch-size option in 'git multi-pack-index repack'. The initial setting
used --batch-size=0 to repack everything into a single pack-file. This is
not sustainable for a large repository. The amount of work required is
also likely to use too many system resources for a background job.

Update the 'incremental-repack' task by dynamically computing a
--batch-size option based on the current pack-file structure.

The dynamic default size is computed with this idea in mind for a client
repository that was cloned from a very large remote: there is likely one
"big" pack-file that was created at clone time. Thus, do not try
repacking it as it is likely packed efficiently by the server.

Instead, we select the second-largest pack-file, and create a batch size
that is one larger than that pack-file. If there are three or more
pack-files, then this guarantees that at least two will be combined into
a new pack-file.

Of course, this means that the second-largest pack-file size is likely
to grow over time and may eventually surpass the initially-cloned
pack-file. Recall that the pack-file batch is selected in a greedy
manner: the packs are considered from oldest to newest and are selected
if they have size smaller than the batch size until the total selected
size is larger than the batch size. Thus, that oldest "clone" pack will
be first to repack after the new data creates a pack larger than that.

We also want to place some limits on how large these pack-files become,
in order to bound the amount of time spent repacking. A maximum
batch-size of two gigabytes means that large repositories will never be
packed into a single pack-file using this job, but also that repack is
rather expensive. This is a trade-off that is valuable to have if the
maintenance is being run automatically or in the background. Users who
truly want to optimize for space and performance (and are willing to pay
the upfront cost of a full repack) can use the 'gc' task to do so.

Create a test for this two gigabyte limit by creating an EXPENSIVE test
that generates two pack-files of roughly 2.5 gigabytes in size, then
performs an incremental repack. Check that the --batch-size argument in
the subcommand uses the hard-coded maximum.

Helped-by: Chris Torek <chris.torek@gmail.com>
Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:05 -07:00
Derrick Stolee 52fe41ff1c maintenance: add incremental-repack task
The previous change cleaned up loose objects using the
'loose-objects' that can be run safely in the background. Add a
similar job that performs similar cleanups for pack-files.

One issue with running 'git repack' is that it is designed to
repack all pack-files into a single pack-file. While this is the
most space-efficient way to store object data, it is not time or
memory efficient. This becomes extremely important if the repo is
so large that a user struggles to store two copies of the pack on
their disk.

Instead, perform an "incremental" repack by collecting a few small
pack-files into a new pack-file. The multi-pack-index facilitates
this process ever since 'git multi-pack-index expire' was added in
19575c7 (multi-pack-index: implement 'expire' subcommand,
2019-06-10) and 'git multi-pack-index repack' was added in ce1e4a1
(midx: implement midx_repack(), 2019-06-10).

The 'incremental-repack' task runs the following steps:

1. 'git multi-pack-index write' creates a multi-pack-index file if
   one did not exist, and otherwise will update the multi-pack-index
   with any new pack-files that appeared since the last write. This
   is particularly relevant with the background fetch job.

   When the multi-pack-index sees two copies of the same object, it
   stores the offset data into the newer pack-file. This means that
   some old pack-files could become "unreferenced" which I will use
   to mean "a pack-file that is in the pack-file list of the
   multi-pack-index but none of the objects in the multi-pack-index
   reference a location inside that pack-file."

2. 'git multi-pack-index expire' deletes any unreferenced pack-files
   and updaes the multi-pack-index to drop those pack-files from the
   list. This is safe to do as concurrent Git processes will see the
   multi-pack-index and not open those packs when looking for object
   contents. (Similar to the 'loose-objects' job, there are some Git
   commands that open pack-files regardless of the multi-pack-index,
   but they are rarely used. Further, a user that self-selects to
   use background operations would likely refrain from using those
   commands.)

3. 'git multi-pack-index repack --bacth-size=<size>' collects a set
   of pack-files that are listed in the multi-pack-index and creates
   a new pack-file containing the objects whose offsets are listed
   by the multi-pack-index to be in those objects. The set of pack-
   files is selected greedily by sorting the pack-files by modified
   time and adding a pack-file to the set if its "expected size" is
   smaller than the batch size until the total expected size of the
   selected pack-files is at least the batch size. The "expected
   size" is calculated by taking the size of the pack-file divided
   by the number of objects in the pack-file and multiplied by the
   number of objects from the multi-pack-index with offset in that
   pack-file. The expected size approximates how much data from that
   pack-file will contribute to the resulting pack-file size. The
   intention is that the resulting pack-file will be close in size
   to the provided batch size.

   The next run of the incremental-repack task will delete these
   repacked pack-files during the 'expire' step.

   In this version, the batch size is set to "0" which ignores the
   size restrictions when selecting the pack-files. It instead
   selects all pack-files and repacks all packed objects into a
   single pack-file. This will be updated in the next change, but
   it requires doing some calculations that are better isolated to
   a separate change.

These steps are based on a similar background maintenance step in
Scalar (and VFS for Git) [1]. This was incredibly effective for
users of the Windows OS repository. After using the same VFS for Git
repository for over a year, some users had _thousands_ of pack-files
that combined to up to 250 GB of data. We noticed a few users were
running into the open file descriptor limits (due in part to a bug
in the multi-pack-index fixed by af96fe3 (midx: add packs to
packed_git linked list, 2019-04-29).

These pack-files were mostly small since they contained the commits
and trees that were pushed to the origin in a given hour. The GVFS
protocol includes a "prefetch" step that asks for pre-computed pack-
files containing commits and trees by timestamp. These pack-files
were grouped into "daily" pack-files once a day for up to 30 days.
If a user did not request prefetch packs for over 30 days, then they
would get the entire history of commits and trees in a new, large
pack-file. This led to a large number of pack-files that had poor
delta compression.

By running this pack-file maintenance step once per day, these repos
with thousands of packs spanning 200+ GB dropped to dozens of pack-
files spanning 30-50 GB. This was done all without removing objects
from the system and using a constant batch size of two gigabytes.
Once the work was done to reduce the pack-files to small sizes, the
batch size of two gigabytes means that not every run triggers a
repack operation, so the following run will not expire a pack-file.
This has kept these repos in a "clean" state.

[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/PackfileMaintenanceStep.cs

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:04 -07:00
Derrick Stolee efdd2f0d4c midx: use start_delayed_progress()
Now that the multi-pack-index may be written as part of auto maintenance
at the end of a command, reduce the progress output when the operations
are quick. Use start_delayed_progress() instead of start_progress().

Update t5319-multi-pack-index.sh to use GIT_PROGRESS_DELAY=0 now that
the progress indicators are conditional.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:04 -07:00
Derrick Stolee 3e220e6069 maintenance: create auto condition for loose-objects
The loose-objects task deletes loose objects that already exist in a
pack-file, then place the remaining loose objects into a new pack-file.
If this step runs all the time, then we risk creating pack-files with
very few objects with every 'git commit' process. To prevent
overwhelming the packs directory with small pack-files, place a minimum
number of objects to justify the task.

The 'maintenance.loose-objects.auto' config option specifies a minimum
number of loose objects to justify the task to run under the '--auto'
option. This defaults to 100 loose objects. Setting the value to zero
will prevent the step from running under '--auto' while a negative value
will force it to run every time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:04 -07:00
Derrick Stolee 252cfb7cb8 maintenance: add loose-objects task
One goal of background maintenance jobs is to allow a user to
disable auto-gc (gc.auto=0) but keep their repository in a clean
state. Without any cleanup, loose objects will clutter the object
database and slow operations. In addition, the loose objects will
take up extra space because they are not stored with deltas against
similar objects.

Create a 'loose-objects' task for the 'git maintenance run' command.
This helps clean up loose objects without disrupting concurrent Git
commands using the following sequence of events:

1. Run 'git prune-packed' to delete any loose objects that exist
   in a pack-file. Concurrent commands will prefer the packed
   version of the object to the loose version. (Of course, there
   are exceptions for commands that specifically care about the
   location of an object. These are rare for a user to run on
   purpose, and we hope a user that has selected background
   maintenance will not be trying to do foreground maintenance.)

2. Run 'git pack-objects' on a batch of loose objects. These
   objects are grouped by scanning the loose object directories in
   lexicographic order until listing all loose objects -or-
   reaching 50,000 objects. This is more than enough if the loose
   objects are created only by a user doing normal development.
   We noticed users with _millions_ of loose objects because VFS
   for Git downloads blobs on-demand when a file read operation
   requires populating a virtual file.

This step is based on a similar step in Scalar [1] and VFS for Git.
[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/LooseObjectsStep.cs

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:04 -07:00
Derrick Stolee 28cb5e66dd maintenance: add prefetch task
When working with very large repositories, an incremental 'git fetch'
command can download a large amount of data. If there are many other
users pushing to a common repo, then this data can rival the initial
pack-file size of a 'git clone' of a medium-size repo.

Users may want to keep the data on their local repos as close as
possible to the data on the remote repos by fetching periodically in
the background. This can break up a large daily fetch into several
smaller hourly fetches.

The task is called "prefetch" because it is work done in advance
of a foreground fetch to make that 'git fetch' command much faster.

However, if we simply ran 'git fetch <remote>' in the background,
then the user running a foreground 'git fetch <remote>' would lose
some important feedback when a new branch appears or an existing
branch updates. This is especially true if a remote branch is
force-updated and this isn't noticed by the user because it occurred
in the background. Further, the functionality of 'git push
--force-with-lease' becomes suspect.

When running 'git fetch <remote> <options>' in the background, use
the following options for careful updating:

1. --no-tags prevents getting a new tag when a user wants to see
   the new tags appear in their foreground fetches.

2. --refmap= removes the configured refspec which usually updates
   refs/remotes/<remote>/* with the refs advertised by the remote.
   While this looks confusing, this was documented and tested by
   b40a50264a (fetch: document and test --refmap="", 2020-01-21),
   including this sentence in the documentation:

	Providing an empty `<refspec>` to the `--refmap` option
	causes Git to ignore the configured refspecs and rely
	entirely on the refspecs supplied as command-line arguments.

3. By adding a new refspec "+refs/heads/*:refs/prefetch/<remote>/*"
   we can ensure that we actually load the new values somewhere in
   our refspace while not updating refs/heads or refs/remotes. By
   storing these refs here, the commit-graph job will update the
   commit-graph with the commits from these hidden refs.

4. --prune will delete the refs/prefetch/<remote> refs that no
   longer appear on the remote.

5. --no-write-fetch-head prevents updating FETCH_HEAD.

We've been using this step as a critical background job in Scalar
[1] (and VFS for Git). This solved a pain point that was showing up
in user reports: fetching was a pain! Users do not like waiting to
download the data that was created while they were away from their
machines. After implementing background fetch, the foreground fetch
commands sped up significantly because they mostly just update refs
and download a small amount of new data. The effect is especially
dramatic when paried with --no-show-forced-udpates (through
fetch.showForcedUpdates=false).

[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/FetchStep.cs

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:04 -07:00
Derrick Stolee 916d0626c2 maintenance: use pointers to check --auto
The 'git maintenance run' command has an '--auto' option. This is used
by other Git commands such as 'git commit' or 'git fetch' to check if
maintenance should be run after adding data to the repository.

Previously, this --auto option was only used to add the argument to the
'git gc' command as part of the 'gc' task. We will be expanding the
other tasks to perform a check to see if they should do work as part of
the --auto flag, when they are enabled by config.

First, update the 'gc' task to perform the auto check inside the
maintenance process. This prevents running an extra 'git gc --auto'
command when not needed. It also shows a model for other tasks.

Second, use the 'auto_condition' function pointer as a signal for
whether we enable the maintenance task under '--auto'. For instance, we
do not want to enable the 'fetch' task in '--auto' mode, so that
function pointer will remain NULL.

Now that we are not automatically calling 'git gc', a test in
t5514-fetch-multiple.sh must be changed to watch for 'git maintenance'
instead.

We continue to pass the '--auto' option to the 'git gc' command when
necessary, because of the gc.autoDetach config option changes behavior.
Likely, we will want to absorb the daemonizing behavior implied by
gc.autoDetach as a maintenance.autoDetach config option.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee 65d655b52d maintenance: create maintenance.<task>.enabled config
Currently, a normal run of "git maintenance run" will only run the 'gc'
task, as it is the only one enabled. This is mostly for backwards-
compatible reasons since "git maintenance run --auto" commands replaced
previous "git gc --auto" commands after some Git processes. Users could
manually run specific maintenance tasks by calling "git maintenance run
--task=<task>" directly.

Allow users to customize which steps are run automatically using config.
The 'maintenance.<task>.enabled' option then can turn on these other
tasks (or turn off the 'gc' task).

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee 090511bc0b maintenance: add --task option
A user may want to only run certain maintenance tasks in a certain
order. Add the --task=<task> option, which allows a user to specify an
ordered list of tasks to run. These cannot be run multiple times,
however.

Here is where our array of maintenance_task pointers becomes critical.
We can sort the array of pointers based on the task order, but we do not
want to move the struct data itself in order to preserve the hashmap
references. We use the hashmap to match the --task=<task> arguments into
the task struct data.

Keep in mind that the 'enabled' member of the maintenance_task struct is
a placeholder for a future 'maintenance.<task>.enabled' config option.
Thus, we use the 'enabled' member to specify which tasks are run when
the user does not specify any --task=<task> arguments. The 'enabled'
member should be ignored if --task=<task> appears.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee 663b2b1b90 maintenance: add commit-graph task
The first new task in the 'git maintenance' builtin is the
'commit-graph' task. This updates the commit-graph file
incrementally with the command

	git commit-graph write --reachable --split

By writing an incremental commit-graph file using the "--split"
option we minimize the disruption from this operation. The default
behavior is to merge layers until the new "top" layer is less than
half the size of the layer below. This provides quick writes most
of the time, with the longer writes following a power law
distribution.

Most importantly, concurrent Git processes only look at the
commit-graph-chain file for a very short amount of time, so they
will verly likely not be holding a handle to the file when we try
to replace it. (This only matters on Windows.)

If a concurrent process reads the old commit-graph-chain file, but
our job expires some of the .graph files before they can be read,
then those processes will see a warning message (but not fail).
This could be avoided by a future update to use the --expire-time
argument when writing the commit-graph.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee a95ce12430 maintenance: replace run_auto_gc()
The run_auto_gc() method is used in several places to trigger a check
for repo maintenance after some Git commands, such as 'git commit' or
'git fetch'.

To allow for extra customization of this maintenance activity, replace
the 'git gc --auto [--quiet]' call with one to 'git maintenance run
--auto [--quiet]'. As we extend the maintenance builtin with other
steps, users will be able to select different maintenance activities.

Rename run_auto_gc() to run_auto_maintenance() to be clearer what is
happening on this call, and to expose all callers in the current diff.
Rewrite the method to use a struct child_process to simplify the calls
slightly.

Since 'git fetch' already allows disabling the 'git gc --auto'
subprocess, add an equivalent option with a different name to be more
descriptive of the new behavior: '--[no-]maintenance'. Update the
documentation to include these options at the same time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee 3ddaad0e06 maintenance: add --quiet option
Maintenance activities are commonly used as steps in larger scripts.
Providing a '--quiet' option allows those scripts to be less noisy when
run on a terminal window. Turn this mode on by default when stderr is
not a terminal.

Pipe the option to the 'git gc' child process.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee 2057d75038 maintenance: create basic maintenance runner
The 'gc' builtin is our current entrypoint for automatically maintaining
a repository. This one tool does many operations, such as repacking the
repository, packing refs, and rewriting the commit-graph file. The name
implies it performs "garbage collection" which means several different
things, and some users may not want to use this operation that rewrites
the entire object database.

Create a new 'maintenance' builtin that will become a more general-
purpose command. To start, it will only support the 'run' subcommand,
but will later expand to add subcommands for scheduling maintenance in
the background.

For now, the 'maintenance' builtin is a thin shim over the 'gc' builtin.
In fact, the only option is the '--auto' toggle, which is handed
directly to the 'gc' builtin. The current change is isolated to this
simple operation to prevent more interesting logic from being lost in
all of the boilerplate of adding a new builtin.

Use existing builtin/gc.c file because we want to share code between the
two builtins. It is possible that we will have 'maintenance' replace the
'gc' builtin entirely at some point, leaving 'git gc' as an alias for
some specific arguments to 'git maintenance run'.

Create a new test_subcommand helper that allows us to test if a certain
subcommand was run. It requires storing the GIT_TRACE2_EVENT logs in a
file. A negation mode is available that will be used in later tests.

Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:04 -07:00
Junio C Hamano 887952b8c6 fetch: optionally allow disabling FETCH_HEAD update
If you run fetch but record the result in remote-tracking branches,
and either if you do nothing with the fetched refs (e.g. you are
merely mirroring) or if you always work from the remote-tracking
refs (e.g. you fetch and then merge origin/branchname separately),
you can get away with having no FETCH_HEAD at all.

Teach "git fetch" a command line option "--[no-]write-fetch-head".
The default is to write FETCH_HEAD, and the option is primarily
meant to be used with the "--no-" prefix to override this default,
because there is no matching fetch.writeFetchHEAD configuration
variable to flip the default to off (in which case, the positive
form may become necessary to defeat it).

Note that under "--dry-run" mode, FETCH_HEAD is never written;
otherwise you'd see list of objects in the file that you do not
actually have.  Passing `--write-fetch-head` does not force `git
fetch` to write the file.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-18 12:56:57 -07:00
Junio C Hamano a555b514cd Merge branch 'so/log-diff-merges-opt'
Earlier, to countermand the implicit "-m" option when the
"--first-parent" option is used with "git log", we added the
"--[no-]diff-merges" option in the jk/log-fp-implies-m topic.  To
leave the door open to allow the "--diff-merges" option to take
values that instructs how patches for merge commits should be
computed (e.g. "cc"? "-p against first parent?"), redefine
"--diff-merges" to take non-optional value, and implement "off"
that means the same thing as "--no-diff-merges".

* so/log-diff-merges-opt:
  t/t4013: add test for --diff-merges=off
  doc/git-log: describe --diff-merges=off
  revision: change "--diff-merges" option to require parameter
2020-08-17 17:02:50 -07:00
Junio C Hamano eca8c62a50 Merge branch 'jk/log-fp-implies-m'
"git log --first-parent -p" showed patches only for single-parent
commits on the first-parent chain; the "--first-parent" option has
been made to imply "-m".  Use "--no-diff-merges" to restore the
previous behaviour to omit patches for merge commits.

* jk/log-fp-implies-m:
  doc/git-log: clarify handling of merge commit diffs
  doc/git-log: move "-t" into diff-options list
  doc/git-log: drop "-r" diff option
  doc/git-log: move "Diff Formatting" from rev-list-options
  log: enable "-m" automatically with "--first-parent"
  revision: add "--no-diff-merges" option to counteract "-m"
  log: drop "--cc implies -m" logic
2020-08-17 17:02:49 -07:00
Junio C Hamano 07f14d31ab Merge branch 'es/test-cmp-typocatcher'
Test framework update.

* es/test-cmp-typocatcher:
  test_cmp: diagnose incorrect arguments
2020-08-17 17:02:47 -07:00
Junio C Hamano ca81676a10 Merge branch 'rp/apply-cached-with-i-t-a'
Recent versions of "git diff-files" shows a diff between the index
and the working tree for "intent-to-add" paths as a "new file"
patch; "git apply --cached" should be able to take "git diff-files"
and should act as an equivalent to "git add" for the path, but the
command failed to do so for such a path.

* rp/apply-cached-with-i-t-a:
  t4140: test apply with i-t-a paths
  apply: make i-t-a entries never match worktree
  apply: allow "new file" patches on i-t-a entries
2020-08-17 17:02:46 -07:00
Junio C Hamano 47f0f94bc7 Merge branch 'al/bisect-first-parent'
"git bisect" learns the "--first-parent" option to find the first
breakage along the first-parent chain.

* al/bisect-first-parent:
  bisect: combine args passed to find_bisection()
  bisect: introduce first-parent flag
  cmd_bisect__helper: defer parsing no-checkout flag
  rev-list: allow bisect and first-parent flags
  t6030: modernize "git bisect run" tests
2020-08-17 17:02:45 -07:00
Junio C Hamano 95c687bf85 Merge branch 'hn/reftable-prep-part-2'
Further preliminary change to refs API.

* hn/reftable-prep-part-2:
  Make HEAD a PSEUDOREF rather than PER_WORKTREE.
  Modify pseudo refs through ref backend storage
  t1400: use git rev-parse for testing PSEUDOREF existence
2020-08-17 17:02:42 -07:00
Junio C Hamano a00bda2b2f Merge branch 'dd/send-email-config'
Stop when "sendmail.*" configuration variables are defined, which
could be a mistaken attempt to define "sendemail.*" variables.

* dd/send-email-config:
  git-send-email: die if sendmail.* config is set
2020-08-17 17:02:41 -07:00
Junio C Hamano 5676db2612 Merge branch 'ps/ref-transaction-hook'
The logic to find the ref transaction hook script attempted to
cache the path to the found hook without realizing that it needed
to keep a copied value, as the API it used returned a transitory
buffer space.  This has been corrected.

* ps/ref-transaction-hook:
  t1416: avoid hard-coded sha1 ids
  refs: fix interleaving hook calls with reference-transaction hook
2020-08-17 17:02:41 -07:00
Junio C Hamano 0a41a89306 Merge branch 'ma/test-quote-cleanup'
Test cleanup.

* ma/test-quote-cleanup:
  t4104: modernize and simplify quoting
  t: don't spuriously close and reopen quotes
2020-08-13 14:13:39 -07:00
Junio C Hamano d1a8a8979d Merge branch 'jt/has_object'
A new helper function has_object() has been introduced to make it
easier to mark object existence checks that do and don't want to
trigger lazy fetches, and a few such checks are converted using it.

* jt/has_object:
  fsck: do not lazy fetch known non-promisor object
  pack-objects: no fetch when allow-{any,promisor}
  apply: do not lazy fetch when applying binary
  sha1-file: introduce no-lazy-fetch has_object()
2020-08-13 14:13:39 -07:00
Junio C Hamano 73a9255166 Merge branch 'tb/upload-pack-filters'
The component to respond to "git fetch" request is made more
configurable to selectively allow or reject object filtering
specification used for partial cloning.

* tb/upload-pack-filters:
  t5616: use test_i18ngrep for upload-pack errors
  upload-pack.c: introduce 'uploadpackfilter.tree.maxDepth'
  upload-pack.c: allow banning certain object filter(s)
  list_objects_filter_options: introduce 'list_object_filter_config_name'
2020-08-11 18:04:13 -07:00
Junio C Hamano e0ad9574dd Merge branch 'bc/sha-256-part-3'
The final leg of SHA-256 transition.

* bc/sha-256-part-3: (39 commits)
  t: remove test_oid_init in tests
  docs: add documentation for extensions.objectFormat
  ci: run tests with SHA-256
  t: make SHA1 prerequisite depend on default hash
  t: allow testing different hash algorithms via environment
  t: add test_oid option to select hash algorithm
  repository: enable SHA-256 support by default
  setup: add support for reading extensions.objectformat
  bundle: add new version for use with SHA-256
  builtin/verify-pack: implement an --object-format option
  http-fetch: set up git directory before parsing pack hashes
  t0410: mark test with SHA1 prerequisite
  t5308: make test work with SHA-256
  t9700: make hash size independent
  t9500: ensure that algorithm info is preserved in config
  t9350: make hash size independent
  t9301: make hash size independent
  t9300: use $ZERO_OID instead of hard-coded object ID
  t9300: abstract away SHA-1-specific constants
  t8011: make hash size independent
  ...
2020-08-11 18:04:11 -07:00
Sergey Organov 298889d3e2 t/t4013: add test for --diff-merges=off
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-11 14:21:45 -07:00
Jeff King 09b2aa30c9 t1416: avoid hard-coded sha1 ids
The test added by e5256c82e5 (refs: fix interleaving hook calls with
reference-transaction hook, 2020-08-07) uses hard-coded sha1 object ids
in its expected output. This causes it to fail when run with
GIT_TEST_DEFAULT_HASH=sha256.

Let's make use of the oid variables we define earlier, as the rest of
the nearby tests do.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-11 14:13:00 -07:00
Junio C Hamano 4339259d5f Merge branch 'en/eol-attrs-gotchas'
All "mergy" operations that internally use the merge-recursive
machinery should honor the merge.renormalize configuration, but
many of them didn't.

* en/eol-attrs-gotchas:
  checkout: support renormalization with checkout -m <paths>
  merge: make merge.renormalize work for all uses of merge machinery
  t6038: remove problematic test
  t6038: make tests fail for the right reason
2020-08-10 10:24:02 -07:00
Junio C Hamano 5b53175b7a Merge branch 'ma/t1450-quotefix'
Test fix.

* ma/t1450-quotefix:
  t1450: fix quoting of NUL byte when corrupting pack
2020-08-10 10:23:59 -07:00
Junio C Hamano 46b225f153 Merge branch 'jk/strvec'
The argv_array API is useful for not just managing argv but any
"vector" (NULL-terminated array) of strings, and has seen adoption
to a certain degree.  It has been renamed to "strvec" to reduce the
barrier to adoption.

* jk/strvec:
  strvec: rename struct fields
  strvec: drop argv_array compatibility layer
  strvec: update documention to avoid argv_array
  strvec: fix indentation in renamed calls
  strvec: convert remaining callers away from argv_array name
  strvec: convert more callers away from argv_array name
  strvec: convert builtin/ callers away from argv_array name
  quote: rename sq_dequote_to_argv_array to mention strvec
  strvec: rename files from argv-array to strvec
  argv-array: rename to strvec
  argv-array: use size_t for count and alloc
2020-08-10 10:23:57 -07:00
Eric Sunshine d572f52a64 test_cmp: diagnose incorrect arguments
Under normal circumstances, if a test author misspells a filename passed
to test_cmp(), the error is quickly discovered when the test fails
unexpectedly due to test_cmp() being unable to find the file. However,
if the test is expected to fail, as with test_expect_failure(), a
misspelled filename as argument to test_cmp() will go unnoticed since
the test will indeed fail, but for the wrong reason. Make it easier for
test authors to discover such problems early by sanity-checking the
arguments to test_cmp(). To avoid penalizing all clients of test_cmp()
in the general case, only check for missing files if the comparison
fails.

While at it, make test_cmp_bin() sanity-check its arguments, as well.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-09 12:13:02 -07:00
Raymond E. Pasco 4c025c667e t4140: test apply with i-t-a paths
apply --cached (as used by add -p) should accept creation and deletion
patches to intent-to-add paths in the index. apply --index, however,
should always fail because an intent-to-add path never matches the
worktree (by definition).

Based-on-patch-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Raymond E. Pasco <ray@ameretat.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-09 11:00:46 -07:00
Aaron Lipman e8861ffc20 bisect: introduce first-parent flag
Upon seeing a merge commit when bisecting, this option may be used to
follow only the first parent.

In detecting regressions introduced through the merging of a branch, the
merge commit will be identified as introduction of the bug and its
ancestors will be ignored.

This option is particularly useful in avoiding false positives when a
merged branch contained broken or non-buildable commits, but the merge
itself was OK.

Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-07 15:13:03 -07:00
Aaron Lipman 0fe305a5d3 rev-list: allow bisect and first-parent flags
Add first_parent_only parameter to find_bisection(), removing the
barrier that prevented combining the --bisect and --first-parent flags
when using git rev-list

Based-on-patch-by: Tiago Botelho <tiagonbotelho@hotmail.com>
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-07 15:11:59 -07:00
Aaron Lipman 15a4802a69 t6030: modernize "git bisect run" tests
Enforce consistent styling for tests on "git bisect run":
- Use "write_script" to abstract away platform-specific details.
- Favor current whitespace conventions.
- While at it, change "introduced" to "added" in the comments to make
  them read better.

Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-07 15:11:59 -07:00
Patrick Steinhardt e5256c82e5 refs: fix interleaving hook calls with reference-transaction hook
In order to not repeatedly search for the reference-transaction hook in
case it's getting called multiple times, we use a caching mechanism to
only call `find_hook()` once. What was missed though is that the return
value of `find_hook()` actually comes from a static strbuf, which means
it will get overwritten when calling `find_hook()` again. As a result,
we may call the wrong hook with parameters of the reference-transaction
hook.

This scenario was spotted in the wild when executing a git-push(1) with
multiple references, where there are interleaving calls to both the
update and the reference-transaction hook. While initial calls to the
reference-transaction hook work as expected, it will stop working after
the next invocation of the update hook. The result is that we now start
calling the update hook with parameters and stdin of the
reference-transaction hook.

This commit fixes the issue by storing a copy of `find_hook()`'s return
value in the cache.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-07 12:27:41 -07:00
Martin Ågren 289218de2b t4104: modernize and simplify quoting
Drop whitespace in the value of `$test_description` and in a test body
and use `test_write_lines`.

Stop defining `$u` with a trailing space just so that we can tuck it in
like `git foo $u$more...` and get minimal whitespace in the command:
`git foo $u $more...` is more readable at the "cost" of an empty `$u`
yielding `git foo  something...`.

Finally, avoid using single quotes within the test scripts to repeatedly
close and reopen the quotes that wrap the test scripts (see the previous
commit). This "unnecessary" quoting does mean that the verbose test
output shows the interpolated values, i.e., the shell code we're
running. But the downside is that the source of the script does *not*
show the shell code we're eventually executing, leaving the reader to
reason about what we really do and whether there are any quoting issues.
(There aren't.)

Where we run through loops to generate several "identical but different"
tests, the test message contains the interpolated variables we're
looping on, meaning one can always identify exactly which instance has
failed, even if the verbose test output shows the exact same test body
several times.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-06 15:14:34 -07:00
Martin Ågren c76b84a121 t: don't spuriously close and reopen quotes
In the test scripts, the recommended style is, e.g.:

    test_expect_success 'name' '
        do-something somehow &&
        do-some-more testing
    '

When using this style, any single quote in the multi-line test section
is actually closing the lone single quotes that surround it.

It can be a non-issue in practice:

    test_expect_success 'sed a little' '
        sed -e 's/hi/lo/' in >out # "ok": no whitespace in s/hi/lo/
    '

Or it can be a bug in the test, e.g., because variable interpolation
happens before the test even begins executing:

    v=abc

    test_expect_success 'variable interpolation' '
        v=def &&
        echo '"$v"' # abc
    '

Change several such in-test single quotes to use double quotes instead
or, in a few cases, drop them altogether. These were identified using
some crude grepping. We're not fixing any test bugs here, but we're
hopefully making these tests slightly easier to grok and to maintain.

There are legitimate use cases for closing a quote and opening a new
one, e.g., both '\'' and '"'"' can be used to produce a literal single
quote. I'm not touching any of those here.

In t9401, tuck the redirecting ">" to the filename while we're touching
those lines.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-06 15:14:32 -07:00
Jonathan Tan 3318238db9 apply: do not lazy fetch when applying binary
When applying a binary patch, as an optimization, "apply" checks if the
postimage is already present. During this fetch, it is perfectly
expected for the postimage not to be present, so there is no need to
lazy-fetch missing objects. Teach "apply" not to lazy-fetch in this
case.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-06 13:01:02 -07:00
Jeff King 6cc275ea56 t5616: use test_i18ngrep for upload-pack errors
The tests added to t5616 in 6dd3456a8c (upload-pack.c: allow banning
certain object filter(s), 2020-08-03) can fail racily, but only with
GETTEXT_POISON enabled.

The tests in question look something like this:

  test_must_fail ok=sigpipe git clone --filter=blob:none ... 2>err &&
  grep "filter blob:none not supported' err

The remote upload-pack process writes that error message both as an ERR
packet, but also via a die() message. In theory we should see the
message twice in the "err" file. The client relays the message from the
packet to its stderr (with a "remote error:" prefix), and because this
is a local-system clone, upload-pack's stderr goes to the same place.

But because clone may be writing to the pipe when upload-pack calls
die(), it may get SIGPIPE and fail to relay the message. That's why we
need our "ok=sigpipe" trick. But our grep should still work reliably in
that case. Either:

  - we got SIGPIPE on the client, which means upload-pack completed its
    die(), and we'll see that version of the message.

  - the client didn't get SIGPIPE, and so it successfully relays the
    message.

In theory we'd see both copies of the message in the second case. But
now always! As soon as the client sees ERR, it exits and we run grep.
But we have no guarantee that the upload-pack process has exited at this
point, or even written its die() message. We might only see the client
version of the message.

Normally that's OK. We only need to see one or the other to pass the
test. But now consider GETTEXT_POISON. upload-pack doesn't translate the
die() message nor the ERR packet. But once the client receives it, it
calls:

  die(_("remote error: %s"), buffer + 4);

That message _is_ marked for translation. Normally we'd just replace the
"remote error:" portion of it, but in GETTEXT_POISON mode, we replace
the whole thing with "# GETTEXT POISON #" and don't include the "%s"
part at all. So the whole text from the ERR packet is dropped, and so we
may racily see a test failure if upload-pack's die() call wasn't yet
written.

We can fix it by using test_i18ngrep, which just makes this grep a noop
in the poison mode.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-05 09:37:19 -07:00
Junio C Hamano 5b137e8441 Merge branch 'jt/pretend-object-never-come-from-elsewhere'
The pretend-object mechanism checks if the given object already
exists in the object store before deciding to keep the data
in-core, but the check would have triggered lazy fetching of such
an object from a promissor remote.

* jt/pretend-object-never-come-from-elsewhere:
  sha1-file: make pretend_object_file() not prefetch
2020-08-04 13:53:58 -07:00
Junio C Hamano 5c454b3825 Merge branch 'jt/pack-objects-prefetch-in-batch'
While packing many objects in a repository with a promissor remote,
lazily fetching missing objects from the promissor remote one by
one may be inefficient---the code now attempts to fetch all the
missing objects in batch (obviously this won't work for a lazy
clone that lazily fetches tree objects as you cannot even enumerate
what blobs are missing until you learn which trees are missing).

* jt/pack-objects-prefetch-in-batch:
  pack-objects: prefetch objects to be packed
  pack-objects: refactor to oid_object_info_extended
2020-08-04 13:53:57 -07:00
Taylor Blau 5b01a4e8ff upload-pack.c: introduce 'uploadpackfilter.tree.maxDepth'
In b79cf959b2 (upload-pack.c: allow banning certain object filter(s),
2020-02-26), we introduced functionality to disallow certain object
filters from being chosen from within 'git upload-pack'. Traditionally,
administrators use this functionality to disallow filters that are known
to perform slowly, for e.g., those that do not have bitmap-level
filtering.

In the past, the '--filter=tree:<n>' was one such filter that does not
have bitmap-level filtering support, and so was likely to be banned by
administrators.

However, in the previous couple of commits, we introduced bitmap-level
filtering for the case when 'n' is equal to '0', i.e., as if we had a
'--filter=tree:none' choice.

While it would be sufficient to simply write

  $ git config uploadpackfilter.tree.allow true

(since it would allow all values of 'n'), we would like to be able to
allow this filter for certain values of 'n', i.e., those no greater than
some pre-specified maximum.

In order to do this, introduce a new configuration key, as follows:

  $ git config uploadpackfilter.tree.maxDepth <m>

where '<m>' specifies the maximum allowed value of 'n' in the filter
'tree:n'. Administrators who wish to allow for only the value '0' can
write:

  $ git config uploadpackfilter.tree.allow true
  $ git config uploadpackfilter.tree.maxDepth 0

which allows '--filter=tree:0', but no other values.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-03 18:03:46 -07:00
Taylor Blau 6dd3456a8c upload-pack.c: allow banning certain object filter(s)
Git clients may ask the server for a partial set of objects, where the
set of objects being requested is refined by one or more object filters.
Server administrators can configure 'git upload-pack' to allow or ban
these filters by setting the 'uploadpack.allowFilter' variable to
'true' or 'false', respectively.

However, administrators using bitmaps may wish to allow certain kinds of
object filters, but ban others. Specifically, they may wish to allow
object filters that can be optimized by the use of bitmaps, while
rejecting other object filters which aren't and represent a perceived
performance degradation (as well as an increased load factor on the
server).

Allow configuring 'git upload-pack' to support object filters on a
case-by-case basis by introducing two new configuration variables:

  - 'uploadpackfilter.allow'
  - 'uploadpackfilter.<kind>.allow'

where '<kind>' may be one of 'blobNone', 'blobLimit', 'tree', and so on.

Setting the second configuration variable for any valid value of
'<kind>' explicitly allows or disallows restricting that kind of object
filter.

If a client requests the object filter <kind> and the respective
configuration value is not set, 'git upload-pack' will default to the
value of 'uploadpackfilter.allow', which itself defaults to 'true' to
maintain backwards compatibility. Note that this differs from
'uploadpack.allowfilter', which controls whether or not the 'filter'
capability is advertised.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-03 18:03:41 -07:00