Mirror of https://github.com/microsoft/git.git

Hydrate missing loose objects in check_and_freshen()

Hydrate missing loose objects in check_and_freshen() when running
virtualized. Add test cases to verify read-object hook works when
running virtualized.

This hook is called in check_and_freshen() rather than
check_and_freshen_local() to make the hook work also with alternates.

Helped-by: Kevin Willford <kewillf@microsoft.com>
Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>

Parent: 4d6aa2f1b3
Commit: df0309baaa

@@ -0,0 +1,102 @@
Read Object Process
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The read-object process enables Git to read all missing blobs with a
single process invocation for the entire life of a single Git command.
This is achieved by using a packet format (pkt-line, see technical/
protocol-common.txt) based protocol over standard input and standard
output as follows. All packets, except for the "*CONTENT" packets and
the "0000" flush packet, are considered text and therefore are
terminated by a LF.
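
The framing described above can be sketched in a few lines of Python. This is an illustration only and not part of the commit: the helper names `encode_pkt`, `encode_text` and `read_pkt` are hypothetical, and the 65516-byte payload limit is assumed from the general pkt-line format rather than stated in this document.

```python
import io

MAX_PAYLOAD = 65516   # assumed pkt-line payload limit (see protocol-common.txt)
FLUSH_PKT = b"0000"   # the flush packet carries no payload


def encode_pkt(payload: bytes) -> bytes:
    """Frame a payload: 4 hex digits giving the total length, then the payload."""
    if not payload:
        raise ValueError("empty payloads are not sent; use FLUSH_PKT instead")
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too large for one pkt-line")
    return b"%04x" % (len(payload) + 4) + payload


def encode_text(line: str) -> bytes:
    """Text packets are terminated by a LF before framing."""
    return encode_pkt(line.encode("utf-8") + b"\n")


def read_pkt(stream):
    """Read one packet; return its payload, or None for a flush packet."""
    header = stream.read(4)
    if not header:
        raise EOFError("peer closed the pipe")
    if len(header) != 4:
        raise ValueError("truncated packet header")
    size = int(header, 16)
    if size == 0:
        return None
    if size <= 4:
        raise ValueError("invalid packet size")
    payload = stream.read(size - 4)
    if len(payload) != size - 4:
        raise ValueError("truncated packet payload")
    return payload
```

For example, `encode_text("version=1")` yields `b"000eversion=1\n"`: ten payload bytes plus the four length digits give 14, which is `000e` in hexadecimal.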

Git starts the process when it encounters the first missing object that
needs to be retrieved. After the process is started, Git sends a welcome
message ("git-read-object-client"), a list of supported protocol version
numbers, and a flush packet. Git expects to read a welcome response
message ("git-read-object-server"), exactly one protocol version number
from the previously sent list, and a flush packet. All further
communication will be based on the selected version.

The remaining protocol description below documents "version=1". Please
note that "version=42" in the example below does not exist and is only
there to illustrate how the protocol would look with more than one
version.

After the version negotiation Git sends a list of all capabilities that
it supports and a flush packet. Git expects to read a list of desired
capabilities, which must be a subset of the supported capabilities list,
and a flush packet as response:
------------------------
packet: git> git-read-object-client
packet: git> version=1
packet: git> version=42
packet: git> 0000
packet: git< git-read-object-server
packet: git< version=1
packet: git< 0000
packet: git> capability=get
packet: git> capability=have
packet: git> capability=put
packet: git> capability=not-yet-invented
packet: git> 0000
packet: git< capability=get
packet: git< 0000
------------------------
The only supported capability in version 1 is "get"; the other
capabilities shown in the example above are illustrative only.
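
The handshake can be sketched from the helper's side as follows. This is a minimal Python illustration under stated assumptions, not the commit's implementation: `write_text`, `write_flush`, `read_text_list` and `handshake` are hypothetical names, and the streams are any binary file-like objects (in a real hook they would be the binary standard input and output).

```python
import io


def write_text(out, line):
    """Write one LF-terminated text packet."""
    data = line.encode("utf-8") + b"\n"
    out.write(b"%04x" % (len(data) + 4) + data)


def write_flush(out):
    """Write the flush packet and push buffered data to the peer."""
    out.write(b"0000")
    out.flush()


def read_text_list(inp):
    """Read text packets up to the next flush packet; return them as strings."""
    lines = []
    while True:
        header = inp.read(4)
        if len(header) != 4:
            raise EOFError("stream ended inside a packet list")
        size = int(header, 16)
        if size == 0:
            return lines
        if size <= 4:
            raise ValueError("invalid packet size")
        payload = inp.read(size - 4)
        if not payload.endswith(b"\n"):
            raise ValueError("text packets must end with LF")
        lines.append(payload[:-1].decode("utf-8"))


def handshake(inp, out):
    """Answer the welcome/version list, then the capability list."""
    hello = read_text_list(inp)
    if not hello or hello[0] != "git-read-object-client":
        raise ValueError("unexpected welcome message")
    if "version=1" not in hello[1:]:
        raise ValueError("version 1 was not offered")
    write_text(out, "git-read-object-server")
    write_text(out, "version=1")          # exactly one version from the list
    write_flush(out)

    offered = read_text_list(inp)
    if "capability=get" not in offered:
        raise ValueError("capability 'get' was not offered")
    write_text(out, "capability=get")     # a subset of what was offered
    write_flush(out)
```

Unlike the strict positional checks in the Perl example, this sketch accepts additional versions and capabilities in the lists and picks only the ones it knows.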

Afterwards Git sends a list of "key=value" pairs terminated with a flush
packet. The list will contain at least the command (based on the
supported capabilities) and the sha1 of the object to retrieve. Note
that the process must not send any response before it has received the
final flush packet.

When the process receives the "get" command, it should make the requested
object available in the git object store and then return success. Git will
then check the object store again and this time find it and proceed.
------------------------
packet: git> command=get
packet: git> sha1=0a214a649e1b3d5011e14a3dc227753f2bd2be05
packet: git> 0000
------------------------
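
Once the text lines of a request have been read up to the flush packet, they can be turned into a dictionary and validated. The Python sketch below assumes the lines are already decoded and stripped of their LF; `parse_request` is a hypothetical helper, and the 40-digit check mirrors the "sha1=" example above (the test helper in this commit also accepts up to 64 digits).

```python
import re

# 40 lowercase hex digits, as in the "sha1=" example above (assumption:
# a helper for repositories with longer object ids would widen this).
OID_RE = re.compile(r"[0-9a-f]{40}")


def parse_request(lines):
    """Turn the "key=value" lines of one request into a validated dict."""
    request = {}
    for line in lines:
        key, sep, value = line.partition("=")
        if not sep or not key:
            raise ValueError(f"not a key=value pair: {line!r}")
        request[key] = value
    if request.get("command") != "get":
        raise ValueError("unsupported command")
    if not OID_RE.fullmatch(request.get("sha1", "")):
        raise ValueError("missing or malformed sha1")
    return request
```

Parsing into a dictionary rather than by position keeps the helper tolerant of additional keys, since the list is only said to contain "at least" the command and the sha1.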

The process is expected to respond with a list of "key=value" pairs
terminated with a flush packet. If the process does not experience
problems then the list must contain a "success" status.
------------------------
packet: git< status=success
packet: git< 0000
------------------------

In case the process cannot or does not want to process the content, it
is expected to respond with an "error" status.
------------------------
packet: git< status=error
packet: git< 0000
------------------------

In case the process cannot or does not want to process the content as
well as any future content for the lifetime of the Git process, then it
is expected to respond with an "abort" status at any point in the
protocol.
------------------------
packet: git< status=abort
packet: git< 0000
------------------------
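
The three replies differ only in the status value, so a helper can build them with one function. A minimal Python sketch, where `status_reply` is a hypothetical name:

```python
VALID_STATUSES = ("success", "error", "abort")


def status_reply(status: str) -> bytes:
    """Return the bytes of a "status=..." text packet followed by a flush packet."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    line = f"status={status}\n".encode("ascii")
    return b"%04x" % (len(line) + 4) + line + b"0000"
```

A helper would write the returned bytes to its standard output and flush, so that Git is not left waiting on a buffered reply.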

Git neither stops nor restarts the process in case the "error"/"abort"
status is set.

If the process dies during the communication or does not adhere to the
protocol then Git will stop the process and restart it with the next
object that needs to be processed.
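
Git's reaction to each outcome can be modelled as a small state update. The Python sketch below is a simplified model of the behaviour described here, not the C code from this commit; `HelperState` and `handle_reply` are hypothetical names, and a status of `None` stands for a helper that died or broke the protocol.

```python
from dataclasses import dataclass


@dataclass
class HelperState:
    running: bool = True       # the helper process is alive
    get_enabled: bool = True   # Git may still send "get" requests to it


def handle_reply(state, status):
    """Update the state for one reply.

    Returns True when Git can expect the object to be in the object store.
    """
    if status == "success":
        return True
    if status == "error":
        pass                       # this object failed; the helper stays usable
    elif status == "abort":
        state.get_enabled = False  # no more "get" for this Git process
    else:
        state.running = False      # stopped; restarted for the next object
    return False
```

The distinction matters for helpers: "error" and "abort" leave the process running, while only a crash or protocol violation leads to a restart.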

After the read-object process has processed an object it is expected to
wait for the next "key=value" list containing a command. Git will close
the command pipe on exit. The process is expected to detect EOF and exit
gracefully on its own. Git will wait until the process has stopped.
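
The request loop with graceful EOF handling can be sketched as follows. This is a minimal Python illustration, assuming the handshake has already completed; `read_request`, `serve` and the `fetch` callback (which should return a true value once the object is in the object store) are hypothetical.

```python
import io


def read_request(inp):
    """Return the text lines of the next request, or None on a clean EOF."""
    lines = []
    while True:
        header = inp.read(4)
        if not header and not lines:
            return None                    # Git closed the command pipe
        if len(header) != 4:
            raise ValueError("truncated packet header")
        size = int(header, 16)
        if size == 0:
            return lines                   # flush packet ends the request
        if size <= 4:
            raise ValueError("invalid packet size")
        payload = inp.read(size - 4)
        lines.append(payload.decode("utf-8").rstrip("\n"))


def serve(inp, out, fetch):
    """Answer requests until EOF; return the number of requests handled."""
    handled = 0
    while True:
        request = read_request(inp)
        if request is None:
            return handled                 # exit gracefully on EOF
        fields = {k: v for k, _, v in (l.partition("=") for l in request)}
        ok = fields.get("command") == "get" and bool(fetch(fields.get("sha1", "")))
        line = b"status=success\n" if ok else b"status=error\n"
        out.write(b"%04x" % (len(line) + 4) + line + b"0000")
        out.flush()
        handled += 1
```

Only an EOF that arrives between requests is treated as a clean shutdown; an EOF in the middle of a request raises, since that indicates a broken peer rather than a normal exit.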

A long running read-object process demo implementation can be found in
`contrib/long-running-read-object/example.pl` located in the Git core
repository. If you develop your own long running process then the
`GIT_TRACE_PACKET` environment variable can be very helpful for
debugging (see linkgit:git[1]).

@@ -0,0 +1,114 @@
#!/usr/bin/perl
#
# Example implementation for the Git read-object protocol version 1
# See Documentation/technical/read-object-protocol.txt
#
# Allows you to test the ability for blobs to be pulled from a host git repo
# "on demand." Called when git needs a blob it couldn't find locally due to
# a lazy clone that only cloned the commits and trees.
#
# A lazy clone can be simulated via the following commands from the host repo
# you wish to create a lazy clone of:
#
#     cd /host_repo
#     git rev-parse HEAD
#     git init /guest_repo
#     git cat-file --batch-check --batch-all-objects | grep -v 'blob' |
#         cut -d' ' -f1 | git pack-objects /guest_repo/.git/objects/pack/noblobs
#     cd /guest_repo
#     git config core.virtualizeobjects true
#     git reset --hard <sha from rev-parse call above>
#
# Please note, this sample is a minimal skeleton. No proper error handling
# was implemented.
#

use strict;
use warnings;

#
# Point $DIR to the folder where your host git repo is located so we can pull
# missing objects from it
#
my $DIR = "/host_repo/.git/";

sub packet_bin_read {
	my $buffer;
	my $bytes_read = read STDIN, $buffer, 4;
	if ( $bytes_read == 0 ) {

		# EOF - Git stopped talking to us!
		exit();
	}
	elsif ( $bytes_read != 4 ) {
		die "invalid packet: '$buffer'";
	}
	my $pkt_size = hex($buffer);
	if ( $pkt_size == 0 ) {
		return ( 1, "" );
	}
	elsif ( $pkt_size > 4 ) {
		my $content_size = $pkt_size - 4;
		$bytes_read = read STDIN, $buffer, $content_size;
		if ( $bytes_read != $content_size ) {
			die "invalid packet ($content_size bytes expected; $bytes_read bytes read)";
		}
		return ( 0, $buffer );
	}
	else {
		die "invalid packet size: $pkt_size";
	}
}

sub packet_txt_read {
	my ( $res, $buf ) = packet_bin_read();
	unless ( $buf =~ s/\n$// ) {
		die "A non-binary line MUST be terminated by an LF.";
	}
	return ( $res, $buf );
}

sub packet_bin_write {
	my $buf = shift;
	print STDOUT sprintf( "%04x", length($buf) + 4 );
	print STDOUT $buf;
	STDOUT->flush();
}

sub packet_txt_write {
	packet_bin_write( $_[0] . "\n" );
}

sub packet_flush {
	print STDOUT sprintf( "%04x", 0 );
	STDOUT->flush();
}

( packet_txt_read() eq ( 0, "git-read-object-client" ) ) || die "bad initialize";
( packet_txt_read() eq ( 0, "version=1" ) ) || die "bad version";
( packet_bin_read() eq ( 1, "" ) ) || die "bad version end";

packet_txt_write("git-read-object-server");
packet_txt_write("version=1");
packet_flush();

( packet_txt_read() eq ( 0, "capability=get" ) ) || die "bad capability";
( packet_bin_read() eq ( 1, "" ) ) || die "bad capability end";

packet_txt_write("capability=get");
packet_flush();

while (1) {
	my ($command) = packet_txt_read() =~ /^command=([^=]+)$/;

	if ( $command eq "get" ) {
		my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
		packet_bin_read();

		system ('git --git-dir="' . $DIR . '" cat-file blob ' . $sha1 . ' | git -c core.virtualizeobjects=false hash-object -w --stdin >/dev/null 2>&1');
		packet_txt_write(($?) ? "status=error" : "status=success");
		packet_flush();
	} else {
		die "bad command '$command'";
	}
}

object-file.c (141 lines changed)

@@ -46,6 +46,9 @@
 #include "wrapper.h"
 #include "trace.h"
 #include "hook.h"
+#include "sigchain.h"
+#include "sub-process.h"
+#include "pkt-line.h"
 
 /* The maximum size for an object header. */
 #define MAX_HEADER_LEN 32
@@ -962,6 +965,115 @@ int has_alt_odb(struct repository *r)
 	return !!r->objects->odb->next;
 }
 
+#define CAP_GET (1u<<0)
+
+static int subprocess_map_initialized;
+static struct hashmap subprocess_map;
+
+struct read_object_process {
+	struct subprocess_entry subprocess;
+	unsigned int supported_capabilities;
+};
+
+static int start_read_object_fn(struct subprocess_entry *subprocess)
+{
+	struct read_object_process *entry = (struct read_object_process *)subprocess;
+	static int versions[] = {1, 0};
+	static struct subprocess_capability capabilities[] = {
+		{ "get", CAP_GET },
+		{ NULL, 0 }
+	};
+
+	return subprocess_handshake(subprocess, "git-read-object", versions,
+				    NULL, capabilities,
+				    &entry->supported_capabilities);
+}
+
+static int read_object_process(const struct object_id *oid)
+{
+	int err;
+	struct read_object_process *entry;
+	struct child_process *process;
+	struct strbuf status = STRBUF_INIT;
+	const char *cmd = find_hook("read-object");
+	uint64_t start;
+
+	start = getnanotime();
+
+	if (!subprocess_map_initialized) {
+		subprocess_map_initialized = 1;
+		hashmap_init(&subprocess_map, (hashmap_cmp_fn)cmd2process_cmp,
+			     NULL, 0);
+		entry = NULL;
+	} else {
+		entry = (struct read_object_process *) subprocess_find_entry(&subprocess_map, cmd);
+	}
+
+	if (!entry) {
+		entry = xmalloc(sizeof(*entry));
+		entry->supported_capabilities = 0;
+
+		if (subprocess_start(&subprocess_map, &entry->subprocess, cmd,
+				     start_read_object_fn)) {
+			free(entry);
+			return -1;
+		}
+	}
+	process = &entry->subprocess.process;
+
+	if (!(CAP_GET & entry->supported_capabilities))
+		return -1;
+
+	sigchain_push(SIGPIPE, SIG_IGN);
+
+	err = packet_write_fmt_gently(process->in, "command=get\n");
+	if (err)
+		goto done;
+
+	err = packet_write_fmt_gently(process->in, "sha1=%s\n", oid_to_hex(oid));
+	if (err)
+		goto done;
+
+	err = packet_flush_gently(process->in);
+	if (err)
+		goto done;
+
+	err = subprocess_read_status(process->out, &status);
+	err = err ? err : strcmp(status.buf, "success");
+
+done:
+	sigchain_pop(SIGPIPE);
+
+	if (err || errno == EPIPE) {
+		err = err ? err : errno;
+		if (!strcmp(status.buf, "error")) {
+			/* The process signaled a problem with the file. */
+		}
+		else if (!strcmp(status.buf, "abort")) {
+			/*
+			 * The process signaled a permanent problem. Don't try to read
+			 * objects with the same command for the lifetime of the current
+			 * Git process.
+			 */
+			entry->supported_capabilities &= ~CAP_GET;
+		}
+		else {
+			/*
+			 * Something went wrong with the read-object process.
+			 * Force shutdown and restart if needed.
+			 */
+			error("external process '%s' failed", cmd);
+			subprocess_stop(&subprocess_map,
+					(struct subprocess_entry *)entry);
+			free(entry);
+		}
+	}
+
+	trace_performance_since(start, "read_object_process");
+
+	return err;
+}
+
 /* Returns 1 if we have successfully freshened the file, 0 otherwise. */
 static int freshen_file(const char *fn)
 {
@@ -1012,8 +1124,19 @@ static int check_and_freshen_nonlocal(const struct object_id *oid, int freshen)
 
 static int check_and_freshen(const struct object_id *oid, int freshen)
 {
-	return check_and_freshen_local(oid, freshen) ||
+	int ret;
+	int tried_hook = 0;
+
+retry:
+	ret = check_and_freshen_local(oid, freshen) ||
 	       check_and_freshen_nonlocal(oid, freshen);
+	if (!ret && core_virtualize_objects && !tried_hook) {
+		tried_hook = 1;
+		if (!read_object_process(oid))
+			goto retry;
+	}
+
+	return ret;
 }
 
 int has_loose_object_nonlocal(const struct object_id *oid)
@@ -1555,20 +1678,6 @@ void disable_obj_read_lock(void)
 	pthread_mutex_destroy(&obj_read_mutex);
 }
 
-static int run_read_object_hook(const struct object_id *oid)
-{
-	struct run_hooks_opt opt = RUN_HOOKS_OPT_INIT;
-	int ret;
-	uint64_t start;
-
-	start = getnanotime();
-	strvec_push(&opt.args, oid_to_hex(oid));
-	ret = run_hooks_opt("read-object", &opt);
-	trace_performance_since(start, "run_read_object_hook");
-
-	return ret;
-}
-
 int fetch_if_missing = 1;
 
 static int do_oid_object_info_extended(struct repository *r,
@@ -1627,7 +1736,7 @@ retry:
 				break;
 			if (core_virtualize_objects && !tried_hook) {
 				tried_hook = 1;
-				if (!run_read_object_hook(oid))
+				if (!read_object_process(oid))
 					goto retry;
 			}
 		}
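
The retry logic added to check_and_freshen() asks the hook at most once per lookup. The Python sketch below models only that control flow and is not a translation of the C code: `check_object`, the `store` set and the `hydrate` callback are hypothetical stand-ins for the object store lookup and for read_object_process().

```python
def check_object(oid, store, virtualized, hydrate):
    """Look up oid; if missing and virtualized, ask the hook once and retry.

    store is a set of known object ids; hydrate(oid) returns True when the
    hook reports success (it is expected to have added oid to the store).
    """
    tried_hook = False
    while True:
        if oid in store:
            return True
        if not virtualized or tried_hook:
            return False          # nothing left to try
        tried_hook = True
        if not hydrate(oid):
            return False          # hook failed; report the object as missing
```

The `tried_hook` flag guarantees termination even when a hook reports success without actually providing the object: the second lookup fails and the function returns instead of looping.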

@@ -0,0 +1,114 @@
#!/usr/bin/perl
#
# Example implementation for the Git read-object protocol version 1
# See Documentation/technical/read-object-protocol.txt
#
# Allows you to test the ability for blobs to be pulled from a host git repo
# "on demand." Called when git needs a blob it couldn't find locally due to
# a lazy clone that only cloned the commits and trees.
#
# A lazy clone can be simulated via the following commands from the host repo
# you wish to create a lazy clone of:
#
#     cd /host_repo
#     git rev-parse HEAD
#     git init /guest_repo
#     git cat-file --batch-check --batch-all-objects | grep -v 'blob' |
#         cut -d' ' -f1 | git pack-objects /guest_repo/.git/objects/pack/noblobs
#     cd /guest_repo
#     git config core.virtualizeobjects true
#     git reset --hard <sha from rev-parse call above>
#
# Please note, this sample is a minimal skeleton. No proper error handling
# was implemented.
#

use strict;
use warnings;

#
# Point $DIR to the folder where your host git repo is located so we can pull
# missing objects from it
#
my $DIR = "../.git/";

sub packet_bin_read {
	my $buffer;
	my $bytes_read = read STDIN, $buffer, 4;
	if ( $bytes_read == 0 ) {

		# EOF - Git stopped talking to us!
		exit();
	}
	elsif ( $bytes_read != 4 ) {
		die "invalid packet: '$buffer'";
	}
	my $pkt_size = hex($buffer);
	if ( $pkt_size == 0 ) {
		return ( 1, "" );
	}
	elsif ( $pkt_size > 4 ) {
		my $content_size = $pkt_size - 4;
		$bytes_read = read STDIN, $buffer, $content_size;
		if ( $bytes_read != $content_size ) {
			die "invalid packet ($content_size bytes expected; $bytes_read bytes read)";
		}
		return ( 0, $buffer );
	}
	else {
		die "invalid packet size: $pkt_size";
	}
}

sub packet_txt_read {
	my ( $res, $buf ) = packet_bin_read();
	unless ( $buf =~ s/\n$// ) {
		die "A non-binary line MUST be terminated by an LF.";
	}
	return ( $res, $buf );
}

sub packet_bin_write {
	my $buf = shift;
	print STDOUT sprintf( "%04x", length($buf) + 4 );
	print STDOUT $buf;
	STDOUT->flush();
}

sub packet_txt_write {
	packet_bin_write( $_[0] . "\n" );
}

sub packet_flush {
	print STDOUT sprintf( "%04x", 0 );
	STDOUT->flush();
}

( packet_txt_read() eq ( 0, "git-read-object-client" ) ) || die "bad initialize";
( packet_txt_read() eq ( 0, "version=1" ) ) || die "bad version";
( packet_bin_read() eq ( 1, "" ) ) || die "bad version end";

packet_txt_write("git-read-object-server");
packet_txt_write("version=1");
packet_flush();

( packet_txt_read() eq ( 0, "capability=get" ) ) || die "bad capability";
( packet_bin_read() eq ( 1, "" ) ) || die "bad capability end";

packet_txt_write("capability=get");
packet_flush();

while (1) {
	my ($command) = packet_txt_read() =~ /^command=([^=]+)$/;

	if ( $command eq "get" ) {
		my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40,64})$/;
		packet_bin_read();

		system ('git --git-dir="' . $DIR . '" cat-file blob ' . $sha1 . ' | git -c core.virtualizeobjects=false hash-object -w --stdin >/dev/null 2>&1');
		packet_txt_write(($?) ? "status=error" : "status=success");
		packet_flush();
	} else {
		die "bad command '$command'";
	}
}

@@ -0,0 +1,30 @@
#!/bin/sh

test_description='tests for long running read-object process'

. ./test-lib.sh

test_expect_success 'setup host repo with a root commit' '
	test_commit zero &&
	hash1=$(git ls-tree HEAD | grep zero.t | cut -f1 | cut -d\  -f3)
'

test_expect_success 'blobs can be retrieved from the host repo' '
	git init guest-repo &&
	(cd guest-repo &&
	 mkdir -p .git/hooks &&
	 sed "1s|/usr/bin/perl|$PERL_PATH|" \
	   <$TEST_DIRECTORY/t0410/read-object \
	   >.git/hooks/read-object &&
	 chmod +x .git/hooks/read-object &&
	 git config core.virtualizeobjects true &&
	 git cat-file blob "$hash1")
'

test_expect_success 'invalid blobs generate errors' '
	(cd guest-repo &&
	 test_must_fail git cat-file blob "invalid")
'


test_done