microsoft/git - git

Граф коммитов

Автор	SHA1	Сообщение	Дата
Emily Shaffer	540267304d	run-command: allow stdin for run_processes_parallel While it makes sense not to inherit stdin from the parent process to avoid deadlocking, it's not necessary to completely ban stdin to children. An informed user should be able to configure stdin safely. By setting `some_child.process.no_stdin=1` before calling `get_next_task()` we provide a reasonable default behavior but enable users to set up stdin streaming for themselves during the callback. `some_child.process.stdout_to_stderr`, however, remains unmodifiable by `get_next_task()` - the rest of the run_processes_parallel() API depends on child output in stderr. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-08 12:50:03 -08:00
Ævar Arnfjörð Bjarmason	5123e6e7bd	run-command.c: remove dead assignment in while-loop Remove code that's been unused since it was added in `c553c72eed` (run-command: add an asynchronous parallel child processor, 2015-12-15). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-08 12:50:03 -08:00
Andrei Rybak	b39a84185e	*: fix typos which duplicate a word Fix typos in code comments which repeat various words. Most of the cases are simple in that they repeat a word that usually cannot be repeated in a grammatically correct sentence. Just remove the incorrectly duplicated word in these cases and rewrap text, if needed. A tricky case is usage of "that that", which is sometimes grammatically correct. However, an instance of this in "t7527-builtin-fsmonitor.sh" doesn't need two words "that", because there is only one daemon being discussed, so replace the second "that" with "the". Reword code comment "entries exist on on-disk index" in function update_one in file cache-tree.c, by replacing incorrect preposition "on" with "in". Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-01-08 10:28:34 +09:00
Junio C Hamano	4e09e0dae6	Merge branch 'sx/pthread-error-check-fix' Correct pthread API usage. * sx/pthread-error-check-fix: maintenance: compare output of pthread functions for inequality with 0	2022-12-19 11:46:17 +09:00
Seija	786e67611d	maintenance: compare output of pthread functions for inequality with 0 The documentation for pthread_create and pthread_sigmask state that: "On success, pthread_create() returns 0; on error, it returns an error number" As such, we ought to check for an error by seeing if the output is not 0. Checking for "less than" is a mistake as the error code numbers can be greater than 0. Signed-off-by: Seija <doremylover123@gmail.com> Acked-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-05 10:15:54 +09:00
Taylor Blau	be4ac3b197	Merge branch 'rs/no-more-run-command-v' Simplify the run-command API. * rs/no-more-run-command-v: replace and remove run_command_v_opt() replace and remove run_command_v_opt_cd_env_tr2() replace and remove run_command_v_opt_tr2() replace and remove run_command_v_opt_cd_env() use child_process members "args" and "env" directly use child_process member "args" instead of string array variable sequencer: simplify building argument list in do_exec() bisect--helper: factor out do_bisect_run() bisect: simplify building "checkout" argument list am: simplify building "show" argument list run-command: fix return value comment merge: remove always-the-same "verbose" arguments	2022-11-08 17:15:12 -05:00
René Scharfe	ddbb47fde9	replace and remove run_command_v_opt() Replace the remaining calls of run_command_v_opt() with run_command() calls and explict struct child_process variables. This is more verbose, but not by much overall. The code becomes more flexible, e.g. it's easy to extend to conditionally add a new argument. Then remove the now unused function and its own flag names, simplifying the run-command API. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2022-10-30 14:04:51 -04:00
René Scharfe	ef249b398e	replace and remove run_command_v_opt_cd_env_tr2() The convenience function run_command_v_opt_cd_env_tr2() has no external callers left. Inline it and remove it from the API. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2022-10-30 14:04:50 -04:00
René Scharfe	d82dbbd849	replace and remove run_command_v_opt_tr2() The convenience function run_command_v_opt_tr2() is only used by a single caller. Use struct child_process and run_command() directly instead and remove the underused function. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2022-10-30 14:04:48 -04:00
René Scharfe	eb5b6b57d0	replace and remove run_command_v_opt_cd_env() run_command_v_opt_cd_env() is only used in an example in a comment. Use the struct child_process member "env" and run_command() directly instead and then remove the unused convenience function. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2022-10-30 14:04:47 -04:00
Ævar Arnfjörð Bjarmason	0b0ab95f17	run-command.c: remove "max_processes", add "const" to signal() handler As with the *_fn members removed in a preceding commit, let's not copy the "processes" member of the "struct run_process_parallel_opts" over to the "struct parallel_processes". In this case we need the number of processes for the kill_children() function, which will be called from a signal handler. To do that adjust this code added in `c553c72eed` (run-command: add an asynchronous parallel child processor, 2015-12-15) so that we use a dedicated "struct parallel_processes_for_signal" for passing data to the signal handler, in addition to the "struct parallel_process" it'll now have access to our "opts" variable. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-12 14:12:42 -07:00
Ævar Arnfjörð Bjarmason	d1610eef3f	run-command.c: pass "opts" further down, and use "opts->processes" Continue the migration away from the "max_processes" member of "struct parallel_processes" to the "processes" member of the "struct run_process_parallel_opts", in this case we needed to pass the "opts" further down into pp_cleanup() and pp_buffer_stderr(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-12 14:12:42 -07:00
Ævar Arnfjörð Bjarmason	9f3df6c048	run-command.c: use "opts->processes", not "pp->max_processes" Neither the "processes" nor "max_processes" members ever change after their initialization, and they're always equivalent, but some existing code used "pp->max_processes" when we were already passing the "opts" to the function, let's use the "opts" directly instead. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-12 14:12:42 -07:00
Ævar Arnfjörð Bjarmason	2aa8d2259f	run-command.c: don't copy "data" to "struct parallel_processes" As with the *_fn members removed in a preceding commit, let's not copy the "data" member of the "struct run_process_parallel_opts" over to the "struct parallel_processes". Now that we're passing the "opts" down there's no reason to do so. This makes the code easier to follow, as we have a "const" attribute on the "struct run_process_parallel_opts", but not "struct parallel_processes". We do not alter the "ungroup" argument, so storing it in the non-const structure would make this control flow less obvious. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-12 14:12:42 -07:00
Ævar Arnfjörð Bjarmason	357f8e6e18	run-command.c: don't copy "ungroup" to "struct parallel_processes" As with the *_fn members removed in the preceding commit, let's not copy the "ungroup" member of the "struct run_process_parallel_opts" over to the "struct parallel_processes". Now that we're passing the "opts" down there's no reason to do so. This makes the code easier to follow, as we have a "const" attribute on the "struct run_process_parallel_opts", but not "struct parallel_processes". We do not alter the "ungroup" argument, so storing it in the non-const structure would make this control flow less obvious. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-12 14:12:41 -07:00
Ævar Arnfjörð Bjarmason	fa93951d79	run-command.c: don't copy _fn to "struct parallel_processes" The only remaining reason for copying the callbacks in the "struct run_process_parallel_opts" over to the "struct parallel_processes" was to avoid two if/else statements in case the "start_failure" and "task_finished" callbacks were NULL. Let's handle those cases in pp_start_one() and pp_collect_finished() instead, and avoid the default_ stub functions, and the need to copy this data around. Organizing the code like this made more sense before the "struct run_parallel_parallel_opts" existed, as we'd have needed to pass each of these as a separate parameter. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-12 14:12:41 -07:00
Ævar Arnfjörð Bjarmason	e39c9de860	run-command.c: make "struct parallel_processes" const if possible Add a "const" to two "struct parallel_processes" parameters where we're not modifying anything in "pp". For kill_children() we'll call it from both the signal handler, and from run_processes_parallel() itself. Adding a "const" there makes it clear that we don't need to modify any state when killing our children. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-12 14:12:41 -07:00
Ævar Arnfjörð Bjarmason	36d69bf77e	run-command API: move _tr2() users to "run_processes_parallel()" Have the users of the "run_processes_parallel_tr2()" function use "run_processes_parallel()" instead. In preceding commits the latter was refactored to take a "struct run_process_parallel_opts" argument, since the only reason for "run_processes_parallel_tr2()" to exist was to take arguments that are now a part of that struct we can do away with it. See `ee4512ed48` (trace2: create new combined trace facility, 2019-02-22) for the addition of the "_tr2()" variant of the function, it was used by every caller except "t/helper/test-run-command.c".. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-12 14:12:41 -07:00
Ævar Arnfjörð Bjarmason	6e5ba0bae4	run-command API: have run_process_parallel() take an "opts" struct As noted in `fd3aaf53f7` (run-command: add an "ungroup" option to run_process_parallel(), 2022-06-07) which added the "ungroup" passing it to "run_process_parallel()" via the global "run_processes_parallel_ungroup" variable was a compromise to get the smallest possible regression fix for "maint" at the time. This follow-up to that is a start at passing that parameter and others via a new "struct run_process_parallel_opts", as the earlier version[1] of what became `fd3aaf53f7` did. Since we need to change all of the occurrences of "n" to "opt->SOMETHING" let's take the opportunity and rename the terse "n" to "processes". We could also have picked "max_processes", "jobs", "threads" etc., but as the API is named "run_processes_parallel()" let's go with "processes". Since the new "run_processes_parallel()" function is able to take an optional "tr2_category" and "tr2_label" via the struct we can at this point migrate all of the users of "run_processes_parallel_tr2()" over to it. But let's not migrate all the API users yet, only the two users that passed the "ungroup" parameter via the "run_processes_parallel_ungroup" global 1. https://lore.kernel.org/git/cover-v2-0.8-00000000000-20220518T195858Z-avarab@gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-12 14:12:41 -07:00
Ævar Arnfjörð Bjarmason	c333e6f3a8	run-command.c: use designated init for pp_init(), add "const" Use a designated initializer to initialize those parts of pp_init() that don't need any conditionals for their initialization, this sets us on a path to pp_init() itself into mostly a validation and allocation function. Since we're doing that we can add "const" to some of the members of the "struct parallel_processes", which helps to clarify and self-document this code. E.g. we never alter the "data" pointer we pass t user callbacks, nor (after the preceding change to stop invoking online_cpus()) do we change "max_processes", the same goes for the "ungroup" option. We can also do away with a call to strbuf_init() in favor of macro initialization, and to rely on other fields being NULL'd or zero'd. Making members of a struct "const" rather that the pointer to the struct itself is usually painful, as e.g. it precludes us from incrementally setting up the structure. In this case we only set it up with the assignment in run_process_parallel() and pp_init(), and don't pass the struct pointer around as "const", so making individual members "const" is worth the potential hassle for extra safety. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-12 14:12:41 -07:00
Ævar Arnfjörð Bjarmason	51243f9f0f	run-command API: don't fall back on online_cpus() When a "jobs = 0" is passed let's BUG() out rather than fall back on online_cpus(). The default behavior was added when this API was implemented in `c553c72eed` (run-command: add an asynchronous parallel child processor, 2015-12-15). Most of our code in-tree that scales up to "online_cpus()" by default calls that function by itself. Keeping this default behavior just for the sake of two callers means that we'd need to maintain this one spot where we're second-guessing the config passed down into pp_init(). The preceding commit has an overview of the API callers that passed "jobs = 0". There were only two of them (actually three, but they resolved to these two config parsing codepaths). The "fetch.parallel" caller already had a test for the "fetch.parallel=0" case added in `0353c68818` (fetch: do not run a redundant fetch from submodule, 2022-05-16), but there was no such test for "submodule.fetchJobs". Let's add one here. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-12 14:12:41 -07:00
Ævar Arnfjörð Bjarmason	6a48b428b4	run-command API: make "n" parameter a "size_t" Make the "n" variable added in `c553c72eed` (run-command: add an asynchronous parallel child processor, 2015-12-15) a "size_t". As we'll see in a subsequent commit we do pass "0" here, but never "jobs < 0". We could have made it an "unsigned int", but as we're having to change this let's not leave another case in the codebase where a size_t and "unsigned int" size differ on some platforms. In this case it's likely to never matter, but it's easier to not need to worry about it. After this and preceding changes: make run-command.o DEVOPTS=extra-all CFLAGS=-Wno-unused-parameter Only has one (and new) -Wsigned-compare warning relevant to a comparison about our "n" or "{nr,max}_processes": About using our "n" (size_t) in the same expression as online_cpus() (int). A subsequent commit will adjust & deal with online_cpus() and that warning. The only users of the "n" parameter are: * builtin/fetch.c: defaults to 1, reads from the "fetch.parallel" config. As seen in the code that parses the config added in `d54dea77db` (fetch: let --jobs=<n> parallelize --multiple, too, 2019-10-05) will die if the git_config_int() return value is < 0. It will however pass us n = 0, as we'll see in a subsequent commit. * submodule.c: defaults to 1, reads from "submodule.fetchJobs" config. Read via code originally added in `a028a1930c` (fetching submodules: respect `submodule.fetchJobs` config option, 2016-02-29). It now piggy-backs on the the submodule.fetchJobs code and validation added in `f20e7c1ea2` (submodule: remove submodule.fetchjobs from submodule-config parsing, 2017-08-02). Like builtin/fetch.c it will die if the git_config_int() return value is < 0, but like builtin/fetch.c it will pass us n = 0. * builtin/submodule--helper.c: defaults to 1. Read via code originally added in `2335b870fa` (submodule update: expose parallelism to the user, 2016-02-29). Since `f20e7c1ea2` (submodule: remove submodule.fetchjobs from submodule-config parsing, 2017-08-02) it shares a config parser and semantics with the submodule.c caller. * hook.c: hardcoded to 1, see `96e7225b31` (hook: add 'run' subcommand, 2021-12-22). * t/helper/test-run-command.c: can be -1 after parsing the arguments, but will then be overridden to online_cpus() before passing it to this API. See `be5d88e112` (test-tool run-command: learn to run (parts of) the testsuite, 2019-10-04). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-12 14:12:40 -07:00
Ævar Arnfjörð Bjarmason	7dd5762d9f	run-command API: have "run_processes_parallel{,_tr2}()" return void Change the "run_processes_parallel{,_tr2}()" functions to return void, instead of int. Ever since `c553c72eed` (run-command: add an asynchronous parallel child processor, 2015-12-15) they have unconditionally returned 0. To get a "real" return value out of this function the caller needs to get it via the "task_finished_fn" callback, see the example in hook.c added in `96e7225b31` (hook: add 'run' subcommand, 2021-12-22). So the "result = " and "if (!result)" code added to "builtin/fetch.c" `d54dea77db` (fetch: let --jobs=<n> parallelize --multiple, too, 2019-10-05) has always been redundant, we always took that "if" path. Likewise the "ret =" in "t/helper/test-run-command.c" added in `be5d88e112` (test-tool run-command: learn to run (parts of) the testsuite, 2019-10-04) wasn't used, instead we got the return value from the "if (suite.failed.nr > 0)" block seen in the context. Subsequent commits will alter this API interface, getting rid of this always-zero return value makes it easier to understand those changes. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-12 14:12:40 -07:00
Jeff King	716c1f649e	pipe_command(): mark stdin descriptor as non-blocking Our pipe_command() helper lets you both write to and read from a child process on its stdin/stdout. It's supposed to work without deadlocks because we use poll() to check when descriptors are ready for reading or writing. But there's a bug: if both the data to be written and the data to be read back exceed the pipe buffer, we'll deadlock. The issue is that the code assumes that if you have, say, a 2MB buffer to write and poll() tells you that the pipe descriptor is ready for writing, that calling: write(cmd->in, buf, 210241024); will do a partial write, filling the pipe buffer and then returning what it did write. And that is what it would do on a socket, but not for a pipe. When writing to a pipe, at least on Linux, it will block waiting for the child process to read() more. And now we have a potential deadlock, because the child may be writing back to us, waiting for us to read() ourselves. An easy way to trigger this is: git -c add.interactive.useBuiltin=true \ -c interactive.diffFilter=cat \ checkout -p HEAD~200 The diff against HEAD~200 will be big, and the filter wants to write all of it back to us (obviously this is a dummy filter, but in the real world something like diff-highlight would similarly stream back a big output). If you set add.interactive.useBuiltin to false, the problem goes away, because now we're not using pipe_command() anymore (instead, that part happens in perl). But this isn't a bug in the interactive code at all. It's the underlying pipe_command() code which is broken, and has been all along. We presumably didn't notice because most calls only do input _or_ output, not both. And the few that do both, like gpg calls, may have large inputs or outputs, but never both at the same time (e.g., consider signing, which has a large payload but a small signature comes back). The obvious fix is to put the descriptor into non-blocking mode, and indeed, that makes the problem go away. Callers shouldn't need to care, because they never see the descriptor (they hand us a buffer to feed into it). The included test fails reliably on Linux without this patch. Curiously, it doesn't fail in our Windows CI environment, but has been reported to do so for individual developers. It should pass in any environment after this patch (courtesy of the compat/ layers added in the last few commits). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-08-17 09:21:41 -07:00
Jeff King	c6d3cce6f3	pipe_command(): handle ENOSPC when writing to a pipe When write() to a non-blocking pipe fails because the buffer is full, POSIX says we should see EAGAIN. But our mingw_write() compat layer on Windows actually returns ENOSPC for this case. This is probably something we want to correct, but given that we don't plan to use non-blocking descriptors in a lot of places, we can work around it by just catching ENOSPC alongside EAGAIN. If we ever do fix mingw_write(), then this patch can be reverted. We don't actually use a non-blocking pipe yet, so this is still just preparation. Helped-by: René Scharfe <l.s.r@web.de> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-08-17 09:21:41 -07:00
Jeff King	14eab817e4	pipe_command(): avoid xwrite() for writing to pipe If xwrite() sees an EAGAIN response, it will loop forever until the write succeeds (or encounters a real error). This is due to `ef1cf0167a` (xwrite: poll on non-blocking FDs, 2016-06-26), with the idea that we won't be surprised by a descriptor unexpectedly set as non-blocking. But that will make things awkward when we do want a non-blocking descriptor, and a future patch will switch pipe_command() to using one. In that case, looping on EAGAIN is bad, because the process on the other end of the pipe may be waiting on us before doing another read() on the pipe, which would mean we deadlock. In practice we're not supposed to ever see EAGAIN here, since poll() will have just told us the descriptor is ready for writing. But our Windows emulation of poll() will always return "ready" for writing to a pipe descriptor! This is due to `94f4d01932` (mingw: workaround for hangs when sending STDIN, 2020-02-17). Our best bet in that case is to keep handling other descriptors, as any read() we do may allow the child command to make forward progress (i.e., its write() finishes, and then it read()s from its stdin, freeing up space in the pipe buffer). This means we might busy-loop between poll() and write() on Windows if the child command is slow to read our input, but it's much better than the alternative of deadlocking. In practice, this busy-looping should be rare: - for small inputs, we'll just write the whole thing in a single write() anyway, non-blocking or not - for larger inputs where the child reads input and then processes it before writing (e.g., gpg verifying a signature), we may make a few extra write() calls that get EAGAIN during the initial write, but once it has taken in the whole input, we'll correctly block waiting to read back the data. - for larger inputs where the child process is streaming output back (like a diff filter), we'll likewise see some extra EAGAINs, but most of them will be followed immediately by a read(), which will let the child command make forward progress. Of course it won't happen at all for now, since we don't yet use a non-blocking pipe. This is just preparation for when we do. Helped-by: René Scharfe <l.s.r@web.de> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-08-17 09:21:40 -07:00
Junio C Hamano	da4827056a	Merge branch 'js/wait-or-whine-can-fail' We used to log an error return from wait_or_whine() as process termination of the waited child, which was incorrect. * js/wait-or-whine-can-fail: run-command: don't spam trace2_child_exit()	2022-06-13 15:53:44 -07:00
Junio C Hamano	1a7f6be5b1	Merge branch 'ab/hooks-regression-fix' In Git 2.36 we revamped the way how hooks are invoked. One change that is end-user visible is that the output of a hook is no longer directly connected to the standard output of "git" that spawns the hook, which was noticed post release. This is getting corrected. * ab/hooks-regression-fix: hook API: fix v2.36.0 regression: hooks should be connected to a TTY run-command: add an "ungroup" option to run_process_parallel()	2022-06-13 15:53:41 -07:00
Josh Steadmon	ce3986bb22	run-command: don't spam trace2_child_exit() In rare cases[1], wait_or_whine() cannot determine a child process's status (and will return -1 in this case). This can cause Git to issue trace2 child_exit events despite the fact that the child may still be running. In pathological cases, we've seen > 80 million exit events in our trace logs for a single child process. Fix this by only issuing trace2 events in finish_command_in_signal() if we get a value other than -1 from wait_or_whine(). This can lead to missing child_exit events in such a case, but that is preferable to duplicating events on a scale that threatens to fill the user's filesystem with invalid trace logs. [1]: This can happen when: * waitpid() returns -1 and errno != EINTR * waitpid() returns an invalid PID * the status set by waitpid() has neither the WIFEXITED() nor WIFSIGNALED() flags Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-06-07 12:48:19 -07:00
Ævar Arnfjörð Bjarmason	fd3aaf53f7	run-command: add an "ungroup" option to run_process_parallel() Extend the parallel execution API added in `c553c72eed` (run-command: add an asynchronous parallel child processor, 2015-12-15) to support a mode where the stdout and stderr of the processes isn't captured and output in a deterministic order, instead we'll leave it to the kernel and stdio to sort it out. This gives the API same functionality as GNU parallel's --ungroup option. As we'll see in a subsequent commit the main reason to want this is to support stdout and stderr being connected to the TTY in the case of jobs=1, demonstrated here with GNU parallel: $ parallel --ungroup 'test -t {} && echo TTY \|\| echo NTTY' ::: 1 2 TTY TTY $ parallel 'test -t {} && echo TTY \|\| echo NTTY' ::: 1 2 NTTY NTTY Another is as GNU parallel's documentation notes a potential for optimization. As demonstrated in next commit our results with "git hook run" will be similar, but generally speaking this shows that if you want to run processes in parallel where the exact order isn't important this can be a lot faster: $ hyperfine -r 3 -L o ,--ungroup 'parallel {o} seq ::: 10000000 >/dev/null ' Benchmark 1: parallel seq ::: 10000000 >/dev/null Time (mean ± σ): 220.2 ms ± 9.3 ms [User: 124.9 ms, System: 96.1 ms] Range (min … max): 212.3 ms … 230.5 ms 3 runs Benchmark 2: parallel --ungroup seq ::: 10000000 >/dev/null Time (mean ± σ): 154.7 ms ± 0.9 ms [User: 136.2 ms, System: 25.1 ms] Range (min … max): 153.9 ms … 155.7 ms 3 runs Summary 'parallel --ungroup seq ::: 10000000 >/dev/null ' ran 1.42 ± 0.06 times faster than 'parallel seq ::: 10000000 >/dev/null ' A large part of the juggling in the API is to make the API safer for its maintenance and consumers alike. For the maintenance of the API we e.g. avoid malloc()-ing the "pp->pfd", ensuring that SANITIZE=address and other similar tools will catch any unexpected misuse. For API consumers we take pains to never pass the non-NULL "out" buffer to an API user that provided the "ungroup" option. The resulting code in t/helper/test-run-command.c isn't typical of such a user, i.e. they'd typically use one mode or the other, and would know whether they'd provided "ungroup" or not. We could also avoid the strbuf_init() for "buffered_output" by having "struct parallel_processes" use a static PARALLEL_PROCESSES_INIT initializer, but let's leave that cleanup for later. Using a global "run_processes_parallel_ungroup" variable to enable this option is rather nasty, but is being done here to produce as minimal of a change as possible for a subsequent regression fix. This change is extracted from a larger initial version[1] which ends up with a better end-state for the API, but in doing so needed to modify all existing callers of the API. Let's defer that for now, and narrowly focus on what we need for fixing the regression in the subsequent commit. It's safe to do this with a global variable because: A) hook.c is the only user of it that sets it to non-zero, and before we'll get any other API users we'll refactor away this method of passing in the option, i.e. re-roll [1]. B) Even if hook.c wasn't the only user we don't have callers of this API that concurrently invoke this parallel process starting API itself in parallel. As noted above "A" && "B" are rather nasty, and we don't want to live with those caveats long-term, but for now they should be an acceptable compromise. 1. https://lore.kernel.org/git/cover-v2-0.8-00000000000-20220518T195858Z-avarab@gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-06-07 10:01:41 -07:00
Ævar Arnfjörð Bjarmason	b3193252c4	run-command API users: use "env" not "env_array" in comments & names Follow-up on a preceding commit which changed all references to the "env_array" when referring to the "struct child_process" member. These changes are all unnecessary for the compiler, but help the code's human readers. All the comments that referred to "env_array" have now been updated, as well as function names and variables that had "env_array" in their name, they now refer to "env". In addition the "out" name for the submodule.h prototype was inconsistent with the function definition's use of "env_array" in submodule.c. Both of them use "env" now. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-06-02 14:31:27 -07:00
Ævar Arnfjörð Bjarmason	29fda24dd1	run-command API: rename "env_array" to "env" Start following-up on the rename mentioned in `c7c4bdeccf` (run-command API: remove "env" member, always use "env_array", 2021-11-25) of "env_array" to "env". The "env_array" name was picked in `19a583dc39` (run-command: add env_array, an optional argv_array for env, 2014-10-19) because "env" was taken. Let's not forever keep the oddity of "_array" for this "struct strvec", but not for its "args" sibling. This commit is almost entirely made with a coccinelle rule[1]. The only manual change here is in run-command.h to rename the struct member itself and to change "env_array" to "env" in the CHILD_PROCESS_INIT initializer. The rest of this is all a result of applying [1]: make contrib/coccinelle/run_command.cocci.patch * patch -p1 <contrib/coccinelle/run_command.cocci.patch * git add -u 1. cat contrib/coccinelle/run_command.pending.cocci @@ struct child_process E; @@ - E.env_array + E.env @@ struct child_process *E; @@ - E->env_array + E->env I've avoided changing any comments and derived variable names here, that will all be done in the next commit. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-06-02 14:31:16 -07:00
Junio C Hamano	c70bc338e9	Merge branch 'ab/config-based-hooks-2' More "config-based hooks". * ab/config-based-hooks-2: run-command: remove old run_hook_{le,ve}() hook API receive-pack: convert push-to-checkout hook to hook.h read-cache: convert post-index-change to use hook.h commit: convert {pre-commit,prepare-commit-msg} hook to hook.h git-p4: use 'git hook' to run hooks send-email: use 'git hook run' for 'sendemail-validate' git hook run: add an --ignore-missing flag hooks: convert worktree 'post-checkout' hook to hook library hooks: convert non-worktree 'post-checkout' hook to hook library merge: convert post-merge to use hook.h am: convert applypatch-msg to use hook.h rebase: convert pre-rebase to use hook.h hook API: add a run_hooks_l() wrapper am: convert {pre,post}-applypatch to use hook.h gc: use hook library for pre-auto-gc hook hook API: add a run_hooks() wrapper hook: add 'run' subcommand	2022-02-09 14:21:00 -08:00
Junio C Hamano	4b51386bbf	Merge branch 'ab/usage-die-message' Code clean-up to hide vreportf() from public API. * ab/usage-die-message: config API: use get_error_routine(), not vreportf() usage.c + gc: add and use a die_message_errno() gc: return from cmd_gc(), don't call exit() usage.c API users: use die_message() for error() + exit 128 usage.c API users: use die_message() for "fatal :" + exit 128 usage.c: add a die_message() routine	2022-01-10 11:52:53 -08:00
Emily Shaffer	95ba86a203	run-command: remove old run_hook_{le,ve}() hook API The new hook.h library has replaced all run-command.h hook-related functionality. So let's delete this dead code. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Acked-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-07 15:19:35 -08:00
Emily Shaffer	dbb1c61365	read-cache: convert post-index-change to use hook.h Move the post-index-change hook away from run-command.h to and over to the new hook.h library. This removes the last direct user of "run_hook_ve()" outside of run-command.c ("run_hook_le()" still uses it). So we can make the function static now. A subsequent commit will remove this code entirely when "run_hook_le()" itself goes away. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Acked-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-07 15:19:35 -08:00
Junio C Hamano	832ec72c3e	Merge branch 'ab/run-command' API clean-up. * ab/run-command: run-command API: remove "env" member, always use "env_array" difftool: use "env_array" to simplify memory management run-command API: remove "argv" member, always use "args" run-command API users: use strvec_push(), not argv construction run-command API users: use strvec_pushl(), not argv construction run-command tests: use strvec_pushv(), not argv assignment run-command API users: use strvec_pushv(), not argv assignment upload-archive: use regular "struct child_process" pattern worktree: stop being overly intimate with run_command() internals	2021-12-15 09:39:47 -08:00
Junio C Hamano	97991dfab7	Merge branch 'jk/t7006-sigpipe-tests-fix' The function to cull a child process and determine the exit status had two separate code paths for normal callers and callers in a signal handler, and the latter did not yield correct value when the child has caught a signal. The handling of the exit status has been unified for these two code paths. An existing test with flakiness has also been corrected. * jk/t7006-sigpipe-tests-fix: t7006: simplify exit-code checks for sigpipe tests t7006: clean up SIGPIPE handling in trace2 tests run-command: unify signal and regular logic for wait_or_whine()	2021-12-10 14:35:13 -08:00
Ævar Arnfjörð Bjarmason	e081a7c3b7	usage.c API users: use die_message() for "fatal :" + exit 128 Change code that printed its own "fatal: " message and exited with a status code of 128 to use the die_message() function added in a preceding commit. This change also demonstrates why the return value of die_message_routine() needed to be that of "report_fn". We have callers such as the run-command.c::child_err_spew() which would like to replace its error routine with the return value of "get_die_message_routine()". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-12-07 13:25:15 -08:00
Ævar Arnfjörð Bjarmason	c7c4bdeccf	run-command API: remove "env" member, always use "env_array" Remove the "env" member from "struct child_process" in favor of always using the "env_array". As with the preceding removal of "argv" in favor of "args" this gets rid of current and future oddities around memory management at the API boundary (see the amended API docs). For some of the conversions we can replace patterns like: child.env = env->v; With: strvec_pushv(&child.env_array, env->v); But for others we need to guard the strvec_pushv() with a NULL check, since we're not passing in the "v" member of a "struct strvec", e.g. in the case of tmp_objdir_env()'s return value. Ideally we'd rename the "env_array" member to simply "env" as a follow-up, since it and "args" are now inconsistent in not having an "_array" suffix, and seemingly without any good reason, unless we look at the history of how they came to be. But as we've currently got 122 in-tree hits for a "git grep env_array" let's leave that for now (and possibly forever). Doing that rename would be too disruptive. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-11-25 22:15:08 -08:00
Ævar Arnfjörð Bjarmason	d3b2159712	run-command API: remove "argv" member, always use "args" Remove the "argv" member from the run-command API, ever since "args" was added in `c460c0ecdc` (run-command: store an optional argv_array, 2014-05-15) being able to provide either "argv" or "args" has led to some confusion and bugs. If we hadn't gone in that direction and only had an "argv" our problems wouldn't have been solved either, as noted in [1] (and in the documentation amended here) it comes with inherent memory management issues: The caller would have to hang on to the "argv" until the run-command API was finished. If the "argv" was an argument to main() this wasn't an issue, but if it it was manually constructed using the API might be painful. We also have a recent report[2] of a user of the API segfaulting, which is a direct result of it being complex to use. This commit addresses the root cause of that bug. This change is larger than I'd like, but there's no easy way to avoid it that wouldn't involve even more verbose intermediate steps. We use the "argv" as the source of truth over the "args", so we need to change all parts of run-command.[ch] itself, as well as the trace2 logging at the same time. The resulting Windows-specific code in start_command() is a bit nasty, as we're now assigning to a strvec's "v" member, instead of to our own "argv". There was a suggestion of some alternate approaches in reply to an earlier version of this commit[3], but let's leave larger a larger and needless refactoring of this code for now. 1. http://lore.kernel.org/git/YT6BnnXeAWn8BycF@coredump.intra.peff.net 2. https://lore.kernel.org/git/20211120194048.12125-1-ematsumiya@suse.de/ 3. https://lore.kernel.org/git/patch-5.5-ea1011f7473-20211122T153605Z-avarab@gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-11-25 22:15:07 -08:00
Ævar Arnfjörð Bjarmason	6def0ff878	run-command API users: use strvec_pushv(), not argv assignment Migrate those run-command API users that assign directly to the "argv" member to use a strvec_pushv() of "args" instead. In these cases it did not make sense to further refactor these callers, e.g. daemon.c could be made to construct the arguments closer to handle(), but that would require moving the construction from its cmd_main() and pass "argv" through two intermediate functions. It would be possible for a change like this to introduce a regression if we were doing: cp.argv = argv; argv[1] = "foo"; And changed the code, as is being done here, to: strvec_pushv(&cp.args, argv); argv[1] = "foo"; But as viewing this change with the "-W" flag reveals none of these functions modify variable that's being pushed afterwards in a way that would introduce such a logic error. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-11-25 22:15:07 -08:00
Jeff King	96bfb2d8ce	run-command: unify signal and regular logic for wait_or_whine() Since `507d7804c0` (pager: don't use unsafe functions in signal handlers, 2015-09-04), we have a separate code path in wait_or_whine() for the case that we're in a signal handler. But that code path misses some of the cases handled by the main logic. This was improved in `be8fc53e36` (pager: properly log pager exit code when signalled, 2021-02-02), but that covered only case: actually returning the correct error code. But there are some other cases: - if waitpid() returns failure, we wouldn't notice and would look at uninitialized garbage in the status variable; it's not clear if it's possible to trigger this or not - if the process exited by signal, then we would still report "-1" rather than the correct signal code This latter case even had a test added in `be8fc53e36`, but it doesn't work reliably. It sets the pager command to: >pager-used; test-tool sigchain The latter command will die by signal, but because there are multiple commands, there will be a shell in between. And it's the shell whose waitpid() call will see the signal death, and it will then exit with code 143, which is what Git will see. To make matters even more confusing, some shells (such as bash) will realize that there's nothing for the shell to do after test-tool finishes, and will turn it into an exec. So the test was only checking what it thought when /bin/sh points to a shell like bash (we're relying on the shell used internally by Git to spawn sub-commands here, so even running the test under bash would not be enough). This patch adjusts the tests to explicitly call "exec" in the pager command, which produces a consistent outcome regardless of shell. Note that without the code change in this patch it _should_ fail reliably, but doesn't. That test, like its siblings, tries to trigger SIGPIPE in the git-log process writing to the pager, but only do so racily. That will be fixed in a follow-on patch. For the code change here, we have two options: - we can teach the in_signal code to handle WIFSIGNALED() - we can stop returning early when in_signal is set, and instead annotate individual calls that we need to skip in this case The former is a simpler patch, but means we're essentially duplicating all of the logic. So instead I went with the latter. The result is a bigger patch, and we do run the risk of new code being added but forgetting to handle in_signal. But in the long run it seems more maintainable. I've skipped any non-trivial calls for the in_signal case, like calling error(). We'll also skip the call to clear_child_for_cleanup(), as we were before. This is arguably the wrong thing to do, since we wouldn't want to try to clean it up again. But: - we can't call it as-is, because it calls free(), which we must avoid in a signal handler (we'd have to pass in_signal so it can skip the free() call) - we'll only go through the list of children to clean once, since our cleanup_children_on_signal() handler pops itself after running (and then re-raises, so eventually we'd just exit). So this cleanup only matters if a process is on the cleanup list _and_ it has a separate handler to clean itself up. Which is questionable in the first place (and AFAIK we do not do). - double-cleanup isn't actually that bad anyway. waitpid() will just return an error, which we won't even report because of in_signal. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-11-22 15:43:18 -08:00
Junio C Hamano	a73934c320	Merge branch 'vd/pthread-setspecific-g11-fix' One CI task based on Fedora image noticed a not-quite-kosher consturct recently, which has been corrected. * vd/pthread-setspecific-g11-fix: async_die_is_recursing: work around GCC v11.x issue on Fedora	2021-11-04 12:07:47 -07:00
Victoria Dye	4b540cf913	async_die_is_recursing: work around GCC v11.x issue on Fedora This fix corrects an issue found in the `dockerized(pedantic, fedora)` CI build, first appearing after the introduction of a new version of the Fedora docker image version. This image includes a version of `glibc` with the attribute `__attr_access_none` added to `pthread_setspecific` [1], the implementation of which only exists for GCC 11.X - the version included in the Fedora image. The attribute requires that the pointer provided in the second argument of `pthread_getspecific` must, if not NULL, be a pointer to a valid object. In the usage in `async_die_is_recursing`, `(void *)1` is not valid, causing the error. This fix imitates a workaround added in SELinux [2] by using the pointer to the static `async_die_counter` itself as the second argument to `pthread_setspecific`. This guaranteed non-NULL, valid pointer matches the intent of the current usage while not triggering the build error. [1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=a1561c3bbe8 [2] https://lore.kernel.org/all/20211021140519.6593-1-cgzones@googlemail.com/ Co-authored-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Victoria Dye <vdye@github.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-11-03 23:12:10 -07:00
Junio C Hamano	af303ee392	Merge branch 'jh/builtin-fsmonitor-part1' Built-in fsmonitor (part 1). * jh/builtin-fsmonitor-part1: t/helper/simple-ipc: convert test-simple-ipc to use start_bg_command run-command: create start_bg_command simple-ipc/ipc-win32: add Windows ACL to named pipe simple-ipc/ipc-win32: add trace2 debugging simple-ipc: move definition of ipc_active_state outside of ifdef simple-ipc: preparations for supporting binary messages. trace2: add trace2_child_ready() to report on background children	2021-10-13 15:15:58 -07:00
Ævar Arnfjörð Bjarmason	5e3aba33da	hook.[ch]: move find_hook() from run-command.c to hook.c Move the find_hook() function from run-command.c to a new hook.c library. This change establishes a stub library that's pretty pointless right now, but will see much wider use with Emily Shaffer's upcoming "configuration-based hooks" series. Eventually all the hook related code will live in hook.[ch]. Let's start that process by moving the simple find_hook() function over as-is. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-27 09:44:54 -07:00
Junio C Hamano	0a4cb1f1f2	Merge branch 'mr/bisect-in-c-4' Rewrite of "git bisect" in C continues. * mr/bisect-in-c-4: bisect--helper: retire `--bisect-next-check` subcommand bisect--helper: reimplement `bisect_run` shell function in C bisect--helper: reimplement `bisect_visualize()` shell function in C run-command: make `exists_in_PATH()` non-static t6030-bisect-porcelain: add test for bisect visualize t6030-bisect-porcelain: add tests to control bisect run exit cases	2021-09-23 13:44:48 -07:00
Junio C Hamano	c042ad5ad5	Merge branch 'js/run-command-close-packs' The run-command API has been updated so that the callers can easily ask the file descriptors open for packfiles to be closed immediately before spawning commands that may trigger auto-gc. * js/run-command-close-packs: Close object store closer to spawning child processes run_auto_maintenance(): implicitly close the object store run-command: offer to close the object store before running run-command: prettify the `RUN_COMMAND_*` flags pull: release packs before fetching commit-graph: when closing the graph, also release the slab	2021-09-20 15:20:45 -07:00
Jeff Hostetler	fdb1322651	run-command: create start_bg_command Create a variation of `run_command()` and `start_command()` to launch a command into the background and optionally wait for it to become "ready" before returning. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-20 08:57:58 -07:00

1 2 3 4 5 ...

287 Коммитов