microsoft/git - git

Commit graph

Author	SHA1	Message	Date
Derrick Stolee	f37b612cf7	setup: add discover_git_directory_reason() There are many reasons why discovering a Git directory may fail. In particular, `8959555cee` (setup_git_directory(): add an owner check for the top-level directory, 2022-03-02) added ownership checks as a security precaution. Callers attempting to set up a Git directory may want to inform the user about the reason for the failure. For that, expose the enum discovery_result from within setup.c and into cache.h where discover_git_directory() is defined. I initially wanted to change the return type of discover_git_directory() to be this enum, but several callers rely upon the "zero means success". The two problems with this are: 1. The zero value of the enum is actually GIT_DIR_NONE, so nonpositive results are errors. 2. There are multiple successful states, so some positive results are successful. Instead of updating all callers immediately, add a new method, discover_git_directory_reason(), and convert discover_git_directory() to be a thin shim on top of it. Because there are extra checks that discover_git_directory_reason() does after setup_git_directory_gently_1(), there are other modes that can be returned for failure states. Add these modes to the enum, but be sure to explicitly add them as BUG() states in the switch of setup_git_directory_gently(). Signed-off-by: Derrick Stolee <derrickstolee@github.com>	2022-07-12 12:28:23 +02:00
Neeraj Singh	3d7138022e	unpack-trees:virtualfilesystem: Improve efficiency of clear_ce_flags When the virtualfilesystem is enabled the previous implementation of clear_ce_flags would iterate all of the cache entries and query whether each one is in the virtual filesystem to determine whether to clear one of the SKIP_WORKTREE bits. For each cache entry, we would do a hash lookup for each parent directory in the is_included_in_virtualfilesystem function. The former approach is slow for a typical Windows OS enlistment with 3 million files where only a small percentage is in the virtual filesystem. The cost is O(n_index_entries * n_chars_per_path * n_parent_directories_per_path). In this change, we use the same approach as apply_virtualfilesystem, which iterates the set of entries in the virtualfilesystem and searches in the cache for the corresponding entries in order to clear their flags. This approach has a cost of O(n_virtual_filesystem_entries * n_chars_per_path * log(n_index_entries)). The apply_virtualfilesystem code was refactored a bit and modified to clear flags for all names that 'alias' a given virtual filesystem name when ignore_case is set. n_virtual_filesystem_entries is typically much less than n_index_entries, in which case the new approach is much faster. We wind up building the name hash for the index, but this occurs quickly thanks to the multi-threading. Signed-off-by: Neeraj Singh <neerajsi@ntdev.microsoft.com>	2022-07-12 12:28:14 +02:00
Jeff Hostetler	00ed2071c0	sha1-file: create shared-cache directory if it doesn't exist The config variable `gvfs.sharedCache` contains the pathname to an alternate <odb> that will be used by `gvfs-helper` to store dynamically-fetched missing objects. If this directory does not exist on disk, `prepare_alt_odb()` omits this directory from the in-memory list of alternates. This causes `git` commands (and `gvfs-helper` in particular) to fall back to `.git/objects` for storage of these objects. This disables the shared-cache and leads to poorer performance. Teach `alt_obj_usable()` and `prepare_alt_odb()` to match up the directory named in `gvfs.sharedCache` with an entry in `.git/objects/info/alternates` and force-create the `<odb>` root directory (and the associated `<odb>/pack` directory) if necessary. If the value of `gvfs.sharedCache` refers to a directory that is NOT listed as an alternate, create an in-memory alternate entry in the odb-list. (This is similar to how GIT_ALTERNATE_OBJECT_DIRECTORIES works.) This work happens the first time that `prepare_alt_odb()` is called. Furthermore, teach the `--shared-cache=<odb>` command line option in `gvfs-helper` (which runs after the first call to `prepare_alt_odb()`) to override the inherited shared-cache (and again, create the ODB directory if necessary). Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>	2022-07-12 12:28:12 +02:00
Jeff Hostetler	80b339ac75	gvfs-helper: create tool to fetch objects using the GVFS Protocol Create gvfs-helper. This is a helper tool to use the GVFS Protocol REST API to fetch objects and configuration data from a GVFS cache-server or Git server. This tool uses libcurl to send object requests to either server. This tool creates loose objects and/or packfiles. Create gvfs-helper-client. This code resides within git proper and uses the sub-process API to manage gvfs-helper as a long-running background process. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>	2022-07-12 12:28:12 +02:00
Kevin Willford	a176b041df	fsmonitor: check CE_FSMONITOR_VALID in ce_uptodate When using fsmonitor the CE_FSMONITOR_VALID flag should be checked when wanting to know if the entry has been updated. If the flag is set the entry should be considered up to date and the same as if the CE_UPTODATE is set. In order to trust the CE_FSMONITOR_VALID flag, the fsmonitor data needs to be refreshed when the fsmonitor bitmap is applied to the index in tweak_fsmonitor. Since the fsmonitor data is kept up to date for every command, some tests needed to be updated to take that into account. istate->untracked->use_fsmonitor was set in tweak_fsmonitor when the fsmonitor bitmap data was loaded and is now in refresh_fsmonitor since that is being called in tweak_fsmonitor. refresh_fsmonitor will only be called once and any other callers should be setting it when refreshing the fsmonitor data so that code can use the fsmonitor data when checking untracked files. When writing the index, fsmonitor_last_update is used to determine if the fsmonitor bitmap should be created and the extension data written to the index. When running through unpack-trees this is not copied to the result index. This makes the next git command that is run do all the work of lstating all files to determine what is clean, since all entries in the index are marked as dirty because no fsmonitor data was saved in the index extension. Copying the fsmonitor_last_update to the result index will cause the extension data for fsmonitor to be in the index for the next git command to use. Signed-off-by: Kevin Willford <Kevin.Willford@microsoft.com>	2022-07-12 12:28:10 +02:00
Ben Peart	b734f746a6	Add virtual file system settings and hook proc On index load, clear/set the skip worktree bits based on the virtual file system data. Use virtual file system data to update skip-worktree bit in unpack-trees. Use virtual file system data to exclude files and folders not explicitly requested. Update 2022-04-05: disable the "present-despite-SKIP_WORKTREE" file removal behavior when 'core.virtualfilesystem' is enabled. Signed-off-by: Ben Peart <benpeart@microsoft.com>	2022-07-12 12:26:34 +02:00
Ben Peart	56ba8237e0	gvfs: allow "virtualizing" objects The idea is to allow blob objects to be missing from the local repository, and to load them lazily on demand. After discussing this idea on the mailing list, we will rename the feature to "lazy clone" and work more on this. Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>	2022-07-12 12:26:32 +02:00
Kevin Willford	cd86be3b98	gvfs: add the core.gvfs config setting This does not do anything yet. The next patches will add various values for that config setting that correspond to the various features offered/required by GVFS. Signed-off-by: Kevin Willford <kewillf@microsoft.com>	2022-07-12 12:26:32 +02:00
Johannes Schindelin	e5f35b1f13	clean: do not traverse mount points It seems to be not exactly rare on Windows to install NTFS junction points (the equivalent of "bind mounts" on Linux/Unix) in worktrees, e.g. to map some development tools into a subdirectory. In such a scenario, it is pretty horrible if `git clean -dfx` traverses into the mapped directory and starts to "clean up". Let's just not do that. Let's make sure before we traverse into a directory that it is not a mount point (or junction). This addresses https://github.com/git-for-windows/git/issues/607 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2022-07-09 23:26:22 +02:00
Junio C Hamano	c276c21da6	Merge branch 'ds/sparse-sparse-checkout' "sparse-checkout" learns to work well with the sparse-index feature. * ds/sparse-sparse-checkout: sparse-checkout: integrate with sparse index p2000: add test for 'git sparse-checkout [add\|set]' sparse-index: complete partial expansion sparse-index: partially expand directories sparse-checkout: --no-sparse-index needs a full index cache-tree: implement cache_tree_find_path() sparse-index: introduce partially-sparse indexes sparse-index: create expand_index() t1092: stress test 'git sparse-checkout set' t1092: refactor 'sparse-index contents' test	2022-06-03 14:30:35 -07:00
Junio C Hamano	83937e9592	Merge branch 'ns/batch-fsync' Introduce a filesystem-dependent mechanism to optimize the way the bits for many loose object files are ensured to hit the disk platter. * ns/batch-fsync: core.fsyncmethod: performance tests for batch mode t/perf: add iteration setup mechanism to perf-lib core.fsyncmethod: tests for batch mode test-lib-functions: add parsing helpers for ls-files and ls-tree core.fsync: use batch mode and sync loose objects by default on Windows unpack-objects: use the bulk-checkin infrastructure update-index: use the bulk-checkin infrastructure builtin/add: add ODB transaction around add_files_to_cache cache-tree: use ODB transaction around writing a tree core.fsyncmethod: batched disk flushes for loose-objects bulk-checkin: rebrand plug/unplug APIs as 'odb transactions' bulk-checkin: rename 'state' variable and separate 'plugged' boolean	2022-06-03 14:30:34 -07:00
Derrick Stolee	9fadb373dd	sparse-index: introduce partially-sparse indexes A future change will present a temporary, in-memory mode where the index can both contain sparse directory entries but also not be completely collapsed to the smallest possible sparse directories. This will be necessary for modifying the sparse-checkout definition while using a sparse index. For now, convert the single-bit member 'sparse_index' in 'struct index_state' to be an 'enum sparse_index_mode' with three modes: * INDEX_EXPANDED (0): No sparse directories exist. This is always the case for repositories that do not use cone-mode sparse-checkout. * INDEX_COLLAPSED: Sparse directories may exist. Files outside the sparse-checkout cone are reduced to sparse directory entries whenever possible. * INDEX_PARTIALLY_SPARSE: Sparse directories may exist. Some file entries outside the sparse-checkout cone may exist. Running convert_to_sparse() may further reduce those files to sparse directory entries. The main reason to store this extra information is to allow convert_to_sparse() to short-circuit when the index is already in INDEX_EXPANDED mode but to actually do the necessary work when in INDEX_PARTIALLY_SPARSE mode. The INDEX_PARTIALLY_SPARSE mode will be used in an upcoming change. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-23 11:08:21 -07:00
Junio C Hamano	8b28e2e2e4	Merge branch 'ds/midx-normalize-pathname-before-comparison' The path taken by "git multi-pack-index" command from the end user was compared with path internally prepared by the tool without first normalizing, which led to duplicated paths not being noticed, which has been corrected. * ds/midx-normalize-pathname-before-comparison: cache: use const char * for get_object_directory() multi-pack-index: use --object-dir real path midx: use real paths in lookup_multi_pack_index()	2022-05-04 09:51:29 -07:00
Derrick Stolee	11f9e8de3d	cache: use const char * for get_object_directory() The get_object_directory() method returns the exact string stored at the_repository->objects->odb->path. The return type of "char *" implies that the caller must keep track of the buffer and free() it when complete. This causes significant problems later when the ODB is accessed. Use "const char *" as the return type to avoid this confusion. There are no current callers that care about the non-const definition. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-25 11:31:13 -07:00
Neeraj Singh	8a94d83349	core.fsync: use batch mode and sync loose objects by default on Windows Git for Windows has defaulted to core.fsyncObjectFiles=true since September 2017. We turn on syncing of loose object files with batch mode in upstream Git so that we can get broad coverage of the new code upstream. We don't actually do fsyncs in most of the test suite, since GIT_TEST_FSYNC is set to 0. However, we do exercise all of the surrounding batch mode code since GIT_TEST_FSYNC merely makes the maybe_fsync wrapper always appear to succeed. Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-06 13:13:26 -07:00
Neeraj Singh	c0f4752ed2	core.fsyncmethod: batched disk flushes for loose-objects When adding many objects to a repo with `core.fsync=loose-object`, the cost of fsync'ing each object file can become prohibitive. One major source of the cost of fsync is the implied flush of the hardware writeback cache within the disk drive. This commit introduces a new `core.fsyncMethod=batch` option that batches up hardware flushes. It hooks into the bulk-checkin odb-transaction functionality, takes advantage of tmp-objdir, and uses the writeout-only support code. When the new mode is enabled, we do the following for each new object: 1a. Create the object in a tmp-objdir. 2a. Issue a pagecache writeback request and wait for it to complete. At the end of the entire transaction when unplugging bulk checkin: 1b. Issue an fsync against a dummy file to flush the log and hardware writeback cache, which should by now have seen the tmp-objdir writes. 2b. Rename all of the tmp-objdir files to their final names. 3b. When updating the index and/or refs, we assume that Git will issue another fsync internal to that operation. This is not the default today, but the user now has the option of syncing the index and there is a separate patch series to implement syncing of refs. On a filesystem with a singular journal that is updated during name operations (e.g. create, link, rename, etc), such as NTFS, HFS+, or XFS we would expect the fsync to trigger a journal writeout so that this sequence is enough to ensure that the user's data is durable by the time the git command returns. This sequence also ensures that no object files appear in the main object store unless they are fsync-durable. Batch mode is only enabled if core.fsync includes loose-objects. If the legacy core.fsyncObjectFiles setting is enabled, but core.fsync does not include loose-objects, we will use file-by-file fsyncing. 
In step (1a) of the sequence, the tmp-objdir is created lazily to avoid work if no loose objects are ever added to the ODB. We use a tmp-objdir to maintain the invariant that no loose-objects are visible in the main ODB unless they are properly fsync-durable. This is important since future ODB operations that try to create an object with specific contents will silently drop the new data if an object with the target hash exists without checking that the loose-object contents match the hash. Only a full git-fsck would restore the ODB to a functional state where dataloss doesn't occur. In step (1b) of the sequence, we issue a fsync against a dummy file created specifically for the purpose. This method has a little higher cost than using one of the input object files, but makes adding new callers of this mechanism easier, since we don't need to figure out which object file is "last" or risk sharing violations by caching the fd of the last object file. _Performance numbers_: Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD. Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD. Windows - Same host as Linux, a preview version of Windows 11. Adding 500 files to the repo with 'git add'. Times reported in seconds.
object file syncing | Linux | Mac   | Windows
--------------------|-------|-------|--------
disabled            | 0.06  | 0.35  | 0.61
fsync               | 1.88  | 11.18 | 2.47
batch               | 0.15  | 0.41  | 1.53
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-06 13:13:01 -07:00
Junio C Hamano	fca85986bb	Merge branch 'ns/core-fsyncmethod' into ns/batch-fsync * ns/core-fsyncmethod: configure.ac: fix HAVE_SYNC_FILE_RANGE definition core.fsyncmethod: correctly camel-case warning message core.fsync: fix incorrect expression for default configuration core.fsync: documentation and user-friendly aggregate options core.fsync: new option to harden the index core.fsync: add configuration parsing core.fsync: introduce granular fsync control infrastructure core.fsyncmethod: add writeout-only mode wrapper: make inclusion of Windows csprng header tightly scoped	2022-04-06 13:01:54 -07:00
Junio C Hamano	439c1e6d5d	Merge branch 'jh/builtin-fsmonitor-part2' Built-in fsmonitor (part 2). * jh/builtin-fsmonitor-part2: (30 commits) t7527: test status with untracked-cache and fsmonitor--daemon fsmonitor: force update index after large responses fsmonitor--daemon: use a cookie file to sync with file system fsmonitor--daemon: periodically truncate list of modified files t/perf/p7519: add fsmonitor--daemon test cases t/perf/p7519: speed up test on Windows t/perf/p7519: fix coding style t/helper/test-chmtime: skip directories on Windows t/perf: avoid copying builtin fsmonitor files into test repo t7527: create test for fsmonitor--daemon t/helper/fsmonitor-client: create IPC client to talk to FSMonitor Daemon help: include fsmonitor--daemon feature flag in version info fsmonitor--daemon: implement handle_client callback compat/fsmonitor/fsm-listen-darwin: implement FSEvent listener on MacOS compat/fsmonitor/fsm-listen-darwin: add MacOS header files for FSEvent compat/fsmonitor/fsm-listen-win32: implement FSMonitor backend on Windows fsmonitor--daemon: create token-based changed path cache fsmonitor--daemon: define token-ids fsmonitor--daemon: add pathname classification fsmonitor--daemon: implement 'start' command ...	2022-04-04 10:56:24 -07:00
Junio C Hamano	27dd460799	Merge branch 'ns/core-fsyncmethod' A couple of fix-up to a topic that is now in 'master'. * ns/core-fsyncmethod: core.fsyncmethod: correctly camel-case warning message core.fsync: fix incorrect expression for default configuration	2022-04-04 10:56:22 -07:00
Neeraj Singh	e5ec440c98	core.fsync: fix incorrect expression for default configuration Commit `b9f5d035` (core.fsync: documentation and user-friendly aggregate options, 2022-03-15) introduced an incorrect value for FSYNC_COMPONENTS_DEFAULT. We need an AND-NOT rather than OR-NOT. Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-29 16:04:16 -07:00
Junio C Hamano	6e1a8952e9	Merge branch 'ps/fsync-refs' Updates to refs traditionally weren't fsync'ed, but we can configure using core.fsync variable to do so. * ps/fsync-refs: core.fsync: new option to harden references	2022-03-25 16:38:25 -07:00
Junio C Hamano	eb804cd405	Merge branch 'ns/core-fsyncmethod' Replace core.fsyncObjectFiles with two new configuration variables, core.fsync and core.fsyncMethod. * ns/core-fsyncmethod: core.fsync: documentation and user-friendly aggregate options core.fsync: new option to harden the index core.fsync: add configuration parsing core.fsync: introduce granular fsync control infrastructure core.fsyncmethod: add writeout-only mode wrapper: make inclusion of Windows csprng header tightly scoped	2022-03-25 16:38:24 -07:00
Jeff Hostetler	1e0ea5c431	fsmonitor: config settings are repository-specific Move fsmonitor config settings to a new and opaque `struct fsmonitor_settings` structure. Add a lazily-loaded pointer to this into `struct repo_settings`. Create an `enum fsmonitor_mode` type in `struct fsmonitor_settings` to represent the state of fsmonitor. This lets us represent which, if any, fsmonitor provider (hook or IPC) is enabled. Create `fsm_settings__get_*()` getters to lazily look up fsmonitor-related config settings. Get rid of the `core_fsmonitor` global variable. Move the code to look up the existing `core.fsmonitor` config value into the fsmonitor settings. Create a hook pathname variable in `struct fsmonitor_settings` and only set it when in hook mode. Extend the definition of `core.fsmonitor` to be either a boolean or a hook pathname. When true, the builtin FSMonitor is used. When false or unset, no FSMonitor (neither builtin nor hook) is used. The existing `core_fsmonitor` global variable was used to store the pathname to the fsmonitor hook and it was also used as a boolean to see if fsmonitor was enabled. This dual usage and global visibility leads to confusion when we add the IPC-based provider. So let's hide the details in fsmonitor-settings.c and let it decide which provider to use in the case of multiple settings. This avoids cluttering up repo-settings.c with these private details. A future commit in builtin-fsmonitor series will add the ability to disqualify worktrees for various reasons, such as being mounted from a remote volume, where fsmonitor should not be started. Having the config settings hidden in fsmonitor-settings.c allows such worktree restrictions to override the config values used. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-25 16:04:15 -07:00
Junio C Hamano	430883a70c	Merge branch 'ab/object-file-api-updates' Object-file API shuffling. * ab/object-file-api-updates: object-file API: pass an enum to read_object_with_reference() object-file.c: add a literal version of write_object_file_prepare() object-file API: have hash_object_file() take "enum object_type" object API: rename hash_object_file_literally() to write_*() object-file API: split up and simplify check_object_signature() object API users + docs: check <0, not !0 with check_object_signature() object API docs: move check_object_signature() docs to cache.h object API: correct "buf" v.s. "map" mismatch in *.c and *.h object-file API: have write_object_file() take "enum object_type" object-file API: add a format_object_header() function object-file API: return "void", not "int" from hash_object_file() object-file.c: split up declaration of unrelated variables	2022-03-16 17:53:08 -07:00
Patrick Steinhardt	bc22d845c4	core.fsync: new option to harden references When writing both loose and packed references to disk we first create a lockfile, write the updated values into that lockfile, and on commit we rename the file into place. According to filesystem developers, this behaviour is broken because applications should always sync data to disk before doing the final rename to ensure data consistency [1][2][3]. If applications fail to do this correctly, a hard crash of the machine can easily result in corrupted on-disk data. This kind of corruption can in fact be easily observed with Git when the machine hard-resets shortly after writing references to disk. On machines with ext4, this will likely lead to the "empty files" problem: the file has been renamed, but its data has not been synced to disk. The result is that the reference is corrupt, and in the worst case this can lead to data loss. Implement a new option to harden references so that users and admins can avoid this scenario by syncing locked loose and packed references to disk before we rename them into place. [1]: https://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/ [2]: https://btrfs.wiki.kernel.org/index.php/FAQ (What are the crash guarantees of overwrite-by-rename) [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/ext4.rst (see auto_da_alloc) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-15 13:30:58 -07:00
Junio C Hamano	0099792400	Merge branch 'ns/core-fsyncmethod' into ps/fsync-refs * ns/core-fsyncmethod: core.fsync: documentation and user-friendly aggregate options core.fsync: new option to harden the index core.fsync: add configuration parsing core.fsync: introduce granular fsync control infrastructure core.fsyncmethod: add writeout-only mode wrapper: make inclusion of Windows csprng header tightly scoped	2022-03-15 13:30:37 -07:00
Neeraj Singh	b9f5d0358d	core.fsync: documentation and user-friendly aggregate options This commit adds aggregate options for the core.fsync setting that are more user-friendly. These options are specified in terms of 'levels of safety', indicating which Git operations are considered to be sync points for durability. The new documentation is also included here in its entirety for ease of review. Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-15 12:32:55 -07:00
Neeraj Singh	ba95e96d4c	core.fsync: new option to harden the index This commit introduces the new ability for the user to harden the index. In the event of a system crash, the index must be durable for the user to actually find a file that has been added to the repo and then deleted from the working tree. We use the presence of the COMMIT_LOCK flag and absence of the alternate_index_output as a proxy for determining whether we're updating the persistent index of the repo or some temporary index. We don't sync these temporary indexes. Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-10 15:10:22 -08:00
Neeraj Singh	020406eaa5	core.fsync: introduce granular fsync control infrastructure This commit introduces the infrastructure for the core.fsync configuration knob. The repository components we want to sync are identified by flags so that we can turn on or off syncing for specific components. If core.fsyncObjectFiles is set and the core.fsync configuration also includes FSYNC_COMPONENT_LOOSE_OBJECT, we will fsync any loose objects. This picks the strictest data integrity behavior if core.fsync and core.fsyncObjectFiles are set to conflicting values. This change introduces the currently unused fsync_component helper, which will be used by a later patch that adds fsyncing to the refs backend. Actual configuration and documentation of the fsync components list are in other patches in the series to separate review of the underlying mechanism from the policy of how it's configured. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-10 15:10:22 -08:00
Neeraj Singh	abf38abec2	core.fsyncmethod: add writeout-only mode This commit introduces the `core.fsyncMethod` configuration knob, which can currently be set to `fsync` or `writeout-only`. The new writeout-only mode attempts to tell the operating system to flush its in-memory page cache to the storage hardware without issuing a CACHE_FLUSH command to the storage controller. Writeout-only fsync is significantly faster than a vanilla fsync on common hardware, since data is written to a disk-side cache rather than all the way to a durable medium. Later changes in this patch series will take advantage of this primitive to implement batching of hardware flushes. When git_fsync is called with FSYNC_WRITEOUT_ONLY, it may fail and the caller is expected to do an ordinary fsync as needed. On Apple platforms, the fsync system call does not issue a CACHE_FLUSH directive to the storage controller. This change updates fsync to do fcntl(F_FULLFSYNC) to make fsync actually durable. We maintain parity with existing behavior on Apple platforms by setting the default value of the new core.fsyncMethod option. Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-10 15:10:22 -08:00
Junio C Hamano	82386b4496	Merge branch 'en/present-despite-skipped' In sparse-checkouts, files mis-marked as missing from the working tree could lead to later problems. Such files were hard to discover, and harder to correct. Automatically detecting and correcting the marking of such files has been added to avoid these problems. * en/present-despite-skipped: repo_read_index: add config to expect files outside sparse patterns Accelerate clear_skip_worktree_from_present_files() by caching Update documentation related to sparsity and the skip-worktree bit repo_read_index: clear SKIP_WORKTREE bit from files present in worktree unpack-trees: fix accidental loss of user changes t1011: add testcase demonstrating accidental loss of user modifications	2022-03-09 13:38:23 -08:00
Elijah Newren	ecc7c8841d	repo_read_index: add config to expect files outside sparse patterns Typically with sparse checkouts, we expect files outside the sparsity patterns to be marked as SKIP_WORKTREE and be missing from the working tree. Sometimes this expectation would be violated however; including in cases such as: * users grabbing files from elsewhere and writing them to the worktree (perhaps by editing a cached copy in an editor, copying/renaming, or even untarring) * various git commands having incomplete or no support for the SKIP_WORKTREE bit[1,2] * users attempting to "abort" a sparse-checkout operation with a not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the working tree is not atomic)[3]. When the SKIP_WORKTREE bit in the index did not reflect the presence of the file in the working tree, it traditionally caused confusion and was difficult to detect and recover from. So, in a sparse checkout, since `af6a51875a` (repo_read_index: clear SKIP_WORKTREE bit from files present in worktree, 2022-01-14), Git automatically clears the SKIP_WORKTREE bit at index read time for entries corresponding to files that are present in the working tree. There is another workflow, however, where it is expected that paths outside the sparsity patterns appear to exist in the working tree and that they do not lose the SKIP_WORKTREE bit, at least until they get modified. A Git-aware virtual file system[4] takes advantage of its position as a file system driver to expose all files in the working tree, fetch them on demand using partial clone on access, and tell Git to pay attention to them on demand by updating the sparse checkout pattern on writes. This means that commands like "git status" only have to examine files that have potentially been modified, whereas commands like "ls" are able to show the entire codebase without requiring manual updates to the sparse checkout pattern. 
Thus since `af6a51875a`, Git with such Git-aware virtual file systems unsets the SKIP_WORKTREE bit for all files and commands like "git status" have to fetch and examine them all. Introduce a configuration setting sparse.expectFilesOutsideOfPatterns to allow limiting the tracked set of files to a small set once again. A Git-aware virtual file system or other application that wants to maintain files outside of the sparse checkout can set this in a repository to instruct Git not to check for the presence of SKIP_WORKTREE files. The setting defaults to false, so most users of sparse checkout will still get the benefit of an automatically updating index to recover from the variety of difficult issues detailed in `af6a51875a` for paths with SKIP_WORKTREE set despite the path being present. [1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/ [2] The three long paragraphs in the middle of https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/ [3] https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/ [4] such as the vfsd described in https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/ Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-01 23:37:48 -08:00
Ævar Arnfjörð Bjarmason	6aea6baeb3	object-file API: pass an enum to read_object_with_reference() Change the read_object_with_reference() function to take an "enum object_type". It was not prepared to handle an arbitrary "const char *type", as it was itself calling type_from_string(). Let's change the only caller that passes in user data to use type_from_string(), and convert the rest to use e.g. "OBJ_TREE" instead of "tree_type". The "cat-file" caller is not on the codepath that handles "--allow-unknown", so the type_from_string() there is safe. Its use of type_from_string() doesn't functionally differ from that of the pre-image. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-25 17:16:32 -08:00
Ævar Arnfjörð Bjarmason	44439c1c58	object-file API: have hash_object_file() take "enum object_type" Change the hash_object_file() function to take an "enum object_type". Since a preceding commit all of its callers are passing either "{commit,tree,blob,tag}_type", or the result of a call to type_name(), the parse_object() caller that would pass NULL is now using stream_object_signature(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-25 17:16:32 -08:00
Ævar Arnfjörð Bjarmason	0f156dbb04	object-file API: split up and simplify check_object_signature() Split up the check_object_signature() function into that non-streaming version (it accepts an already filled "buf"), and a new stream_object_signature() which will retrieve the object from storage, and hash it on-the-fly. All of the callers of check_object_signature() were effectively calling two different functions, if we go by cyclomatic complexity. I.e. they'd either take the early "if (map)" branch and return early, or not. This has been the case since the "if (map)" condition was added in `090ea12671` (parse_object: avoid putting whole blob in core, 2012-03-07). We can then further simplify the resulting check_object_signature() function since only one caller wanted to pass a non-NULL "buf" and a non-NULL "real_oidp". That "read_loose_object()" codepath used by "git fsck" can instead use hash_object_file() followed by oideq(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-25 17:16:31 -08:00
Ævar Arnfjörð Bjarmason	ee213de22d	object API users + docs: check <0, not !0 with check_object_signature() Change those users of the object API that misused check_object_signature() by assuming it returned any non-zero when the OID didn't match the expected value to check <0 instead. In practice all of this code worked before, but it wasn't consistent with the rest of the users of the API. Let's also clarify what the <0 return value means in API docs. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-25 17:16:31 -08:00
Ævar Arnfjörð Bjarmason	cdcaaec9a6	object API docs: move check_object_signature() docs to cache.h Move the API documentation for check_object_signature() to cache.h, where its prototype is declared. This is in preparation for adding a companion function. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-25 17:16:31 -08:00
Junio C Hamano	0a01df08c0	Merge branch 'ab/date-mode-release' Plug (some) memory leaks around parse_date_format(). * ab/date-mode-release: date API: add and use a date_mode_release() date API: add basic API docs date API: provide and use a DATE_MODE_INIT date API: create a date.h, split from cache.h cache.h: remove always unused show_date_human() declaration	2022-02-25 15:47:36 -08:00
Junio C Hamano	00e38ba6d8	Merge branch 'ab/auto-detect-zlib-compress2' The build procedure has been taught to notice older version of zlib and enable our replacement uncompress2() automatically. * ab/auto-detect-zlib-compress2: compat: auto-detect if zlib has uncompress2()	2022-02-16 15:14:30 -08:00
Ævar Arnfjörð Bjarmason	88c7b4c3c8	date API: create a date.h, split from cache.h Move the declaration of the date.c functions from cache.h, and adjust the relevant users to include the new date.h header. The show_ident_date() function belonged in pretty.h (it's defined in pretty.c), its two users outside of pretty.c didn't strictly need to include pretty.h, as they get it indirectly, but let's add it to them anyway. Similarly, the change to "builtin/{fast-import,show-branch,tag}.c" isn't needed as far as the compiler is concerned, but since they all use the "DATE_MODE()" macro we now define in date.h, let's have them include it. We could simply include this new header in "cache.h", but as this change shows these functions weren't common enough to warrant including in it in the first place. By moving them out of cache.h changes to this API will no longer cause a (mostly) full re-build of the project when "make" is run. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-16 09:40:00 -08:00
Ævar Arnfjörð Bjarmason	f6c71f81f9	cache.h: remove always unused show_date_human() declaration There has never been a show_date_human() function on the "master" branch in git.git. This declaration was added in `b841d4ff43` (Add `human` format to test-tool, 2019-01-28). A look at the ML history reveals that it was leftover cruft from an earlier version of that commit[1]. 1. https://lore.kernel.org/git/20190118061805.19086-5-ischis2@cox.net/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-16 09:40:00 -08:00
Junio C Hamano	ee52b35e50	Merge branch 'ms/update-index-racy' "git update-index --refresh" has been taught to deal better with racy timestamps (just like "git status" already does). * ms/update-index-racy: update-index: refresh should rewrite index in case of racy timestamps t7508: add tests capturing racy timestamp handling t7508: fix bogus mtime verification test-lib: introduce API for verifying file mtime	2022-02-05 09:42:32 -08:00
Junio C Hamano	008028a910	Merge branch 'ab/cat-file' Assorted updates to "git cat-file", especially "-h". * ab/cat-file: cat-file: s/_/-/ in typo'd usage_msg_optf() message cat-file: don't whitespace-pad "(...)" in SYNOPSIS and usage output cat-file: use GET_OID_ONLY_TO_DIE in --(textconv\|filters) object-name.c: don't have GET_OID_ONLY_TO_DIE imply *_QUIETLY cat-file: correct and improve usage information cat-file: fix remaining usage bugs cat-file: make --batch-all-objects a CMDMODE cat-file: move "usage" variable to cmd_cat_file() cat-file docs: fix SYNOPSIS and "-h" output parse-options API: add a usage_msg_optf() cat-file tests: test messaging on bad objects/paths cat-file tests: test bad usage	2022-02-05 09:42:31 -08:00
Ævar Arnfjörð Bjarmason	07564773c2	compat: auto-detect if zlib has uncompress2() We have a copy of uncompress2() implementation in compat/ so that we can build with an older version of zlib that lacks the function, and the build procedure selects if it is used via the NO_UNCOMPRESS2 $(MAKE) variable. This is yet another "annoying" knob the porters need to tweak on platforms that are not common enough to have the default set in the config.mak.uname file. Attempt to instead ask the system header <zlib.h> to decide if we need the compatibility implementation. This is a deviation from the way we have been handling the "compatibility" features so far, and if it can be done cleanly enough, it could work as a model for features that need compatibility definition we discover in the future. With that goal in mind, avoid expedient but ugly hacks, like shoving the code that is conditionally compiled into an unrelated .c file, which may not work in future cases---instead, take an approach that uses a file that is independently compiled and stands on its own. Compile and link compat/zlib-uncompress2.c file unconditionally, but conditionally hide the implementation behind #if/#endif when zlib version is 1.2.9 or newer, and unconditionally archive the resulting object file in the libgit.a to be picked up by the linker. There are a few things to note in the shape of the code base after this change: - We no longer use NO_UNCOMPRESS2 knob; if the system header <zlib.h> claims a version that is more recent than the library actually is, this would break, but it is easy to add it back when we find such a system. - The object file compat/zlib-uncompress2.o is always compiled and archived in libgit.a, just like a few other compat/ object files already are. - The inclusion of <zlib.h> is done in <git-compat-util.h>; we used to do so from <cache.h> which includes <git-compat-util.h> as the first thing it does, so from the *.c codes, there is no practical change. 
- Until objects in libgit.a that is already used gains a reference to the function, the reftable code will be the only one that wants it, so libgit.a on the linker command line needs to appear once more at the end to satisfy the mutual dependency. - Beat found a trick used by OpenSSL to avoid making the conditionally-compiled object truly empty (apparently because they had to deal with compilers that do not want to see an effectively empty input file). Our compat/zlib-uncompress2.c file borrows the same trick for portability. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Helped-by: Beat Bolli <dev+git@drbeat.li> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-26 09:05:55 -08:00
Junio C Hamano	453cef7455	Merge branch 'ma/header-dup-cleanup' Code clean-up. * ma/header-dup-cleanup: cache.h: drop duplicate `ensure_full_index()` declaration	2022-01-12 15:11:43 -08:00
Martin Ågren	97d6fb5a1f	cache.h: drop duplicate `ensure_full_index()` declaration There are two identical declarations of `ensure_full_index()` in cache.h. Commit `3964fc2aae` ("sparse-index: add guard to ensure full index", 2021-03-30) provided an empty implementation of `ensure_full_index()`, declaring it in a new file sparse-index.h. When commit `4300f8442a` ("sparse-index: implement ensure_full_index()", 2021-03-30) fleshed out the implementation, it added an identical declaration to cache.h. Then `118a2e8bde` ("cache: move ensure_full_index() to cache.h", 2021-04-01) favored having the declaration in cache.h. Because of the double declaration, at that point we could have just dropped the one in sparse-index.h, but instead it got moved to cache.h. As a result, cache.h contains the exact same function declaration twice. Drop the one under "/* Name hashing */", in favor of the one under "/* Initialize and use the cache information */". Signed-off-by: Martin Ågren <martin.agren@gmail.com> Acked-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-10 11:30:33 -08:00
Marc Strapetz	2ede073fd2	update-index: refresh should rewrite index in case of racy timestamps 'git update-index --refresh' and '--really-refresh' should force writing of the index file if racy timestamps have been encountered, as 'git status' already does [1]. Note that calling 'git update-index --refresh' still does not guarantee that there will be no more racy timestamps afterwards (the same holds true for 'git status'): - calling 'git update-index --refresh' immediately after touching and adding a file may still leave racy timestamps if all three operations occur within the racy-tolerance (usually 1 second unless USE_NSEC has been defined) - calling 'git update-index --refresh' for timestamps which are set into the future will leave them racy To guarantee that such racy timestamps will be resolved would require to wait until the system clock has passed beyond these timestamps and only then write the index file. Especially for future timestamps, this does not seem feasible because of possibly long delays/hangs. [1] https://lore.kernel.org/git/d3dd805c-7c1d-30a9-6574-a7bfcb7fc013@syntevo.com/ Signed-off-by: Marc Strapetz <marc.strapetz@syntevo.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-07 12:37:31 -08:00
Junio C Hamano	da81d473fc	Merge branch 'en/keep-cwd' Many git commands that deal with working tree files try to remove a directory that becomes empty (i.e. "git switch" from a branch that has the directory to another branch that does not would attempt to remove all files in the directory and the directory itself). This drops users into an unfamiliar situation if the command was run in a subdirectory that becomes subject to removal due to the command. The commands have been taught to keep an empty directory if it is the directory they were started in to avoid surprising users. * en/keep-cwd: t2501: simplify the tests since we can now assume desired behavior dir: new flag to remove_dir_recurse() to spare the original_cwd dir: avoid incidentally removing the original_cwd in remove_path() stash: do not attempt to remove startup_info->original_cwd rebase: do not attempt to remove startup_info->original_cwd clean: do not attempt to remove startup_info->original_cwd symlinks: do not include startup_info->original_cwd in dir removal unpack-trees: add special cwd handling unpack-trees: refuse to remove startup_info->original_cwd setup: introduce startup_info->original_cwd t2501: add various tests for removing the current working directory	2022-01-05 14:01:28 -08:00
Ævar Arnfjörð Bjarmason	245b948815	cat-file: use GET_OID_ONLY_TO_DIE in --(textconv\|filters) Change the cat_one_file() logic that calls get_oid_with_context() under --textconv and --filters to use the GET_OID_ONLY_TO_DIE flag, thus improving the error messaging emitted when e.g. <path> is missing but <rev> is not. To service the "cat-file" use-case we need to introduce a new "GET_OID_REQUIRE_PATH" flag, otherwise it would exit early as soon as a valid "HEAD" was resolved, but in the "cat-file" case being changed we always need a valid revision and path. This arguably makes the "<bad rev>:<bad path>" and "<bad rev>:<good (in HEAD) path>" use cases worse, as we won't quote the <path> component at the user anymore, but let's just use the existing logic "git log" et al use for now. We can improve the messaging for those cases as a follow-up for all callers. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-12-30 13:05:29 -08:00
Junio C Hamano	63a2e8b41e	Merge branch 'ew/test-wo-fsync' Allow running our tests while disabling fsync. * ew/test-wo-fsync: tests: disable fsync everywhere	2021-12-15 09:39:52 -08:00

2094 commits