Hydrate missing loose objects in check_and_freshen()

Hydrate missing loose objects in check_and_freshen() when running
virtualized. Add test cases to verify read-object hook works when
running virtualized.

The hook is called in check_and_freshen() rather than
check_and_freshen_local() so that it also works with alternates.

Helped-by: Kevin Willford <kewillf@microsoft.com>
Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>
Ben Peart 2017-03-15 18:43:05 +00:00, committed by Victoria Dye
Parent 4d6aa2f1b3
Commit df0309baaa
5 changed files: 485 additions and 16 deletions


@@ -0,0 +1,102 @@
Read Object Process
^^^^^^^^^^^^^^^^^^^^^^^^^^^
The read-object process enables Git to read all missing blobs with a
single process invocation for the entire life of a single Git command.
This is achieved with a protocol based on the pkt-line packet format (see
technical/protocol-common.txt), spoken over standard input and standard
output as follows. All packets, except for the "*CONTENT" packets and
the "0000" flush packet, are considered text and therefore are
terminated by a LF.
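As a rough illustration of the framing described above, a text packet could be produced as sketched below. The helpers `pkt_format_text` and `pkt_is_flush` are hypothetical names, not part of Git's pkt-line API. The sketch assumes the four-digit hex length counts the prefix itself plus the payload and the trailing LF, which matches the `length($buf) + 4` computation in the example implementation in this commit.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not Git's pkt-line.c API): format one text packet.
 * Writes the 4-hex-digit length prefix, the payload and a trailing LF into
 * `out` as a NUL-terminated string. Returns the packet length in bytes,
 * or -1 if the packet does not fit.
 */
static int pkt_format_text(char *out, size_t out_size, const char *payload)
{
	size_t len = strlen(payload) + 1 + 4; /* payload + LF + length prefix */

	/* Four hex digits cannot express more than 0xffff. */
	if (len > 0xffff || len + 1 > out_size)
		return -1;
	snprintf(out, out_size, "%04zx%s\n", len, payload);
	return (int)len;
}

/* A flush packet is the literal "0000"; it carries no payload and no LF. */
static int pkt_is_flush(const char *header)
{
	return memcmp(header, "0000", 4) == 0;
}
```

For instance, the payload `version=1` is nine bytes long, so the packet on the wire would be `000eversion=1` followed by a LF.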
Git starts the process when it encounters the first missing object that
needs to be retrieved. After the process is started, Git sends a welcome
message ("git-read-object-client"), a list of supported protocol version
numbers, and a flush packet. Git expects to read a welcome response
message ("git-read-object-server"), exactly one protocol version number
from the previously sent list, and a flush packet. All further
communication will be based on the selected version.
The remaining protocol description below documents "version=1". Please
note that "version=42" in the example below does not exist and is only
there to illustrate how the protocol would look with more than one
version.
After the version negotiation Git sends a list of all capabilities that
it supports and a flush packet. Git expects to read a list of desired
capabilities, which must be a subset of the supported capabilities list,
and a flush packet as response:
------------------------
packet: git> git-read-object-client
packet: git> version=1
packet: git> version=42
packet: git> 0000
packet: git< git-read-object-server
packet: git< version=1
packet: git< 0000
packet: git> capability=get
packet: git> capability=have
packet: git> capability=put
packet: git> capability=not-yet-invented
packet: git> 0000
packet: git< capability=get
packet: git< 0000
------------------------
The only supported capability in version 1 is "get".
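The capability exchange above might be handled along the following lines. This is a minimal sketch only: `parse_capability` and `known_caps` are hypothetical names, and the input is assumed to be a single "capability=<name>" line whose trailing LF has already been stripped. Names that are not known map to 0, so a side can ignore capabilities it does not understand (such as "not-yet-invented" in the example).

```c
#include <string.h>

#define CAP_GET (1u << 0)

struct capability {
	const char *name;
	unsigned int flag;
};

/* Version 1 of the protocol only defines "get". */
static const struct capability known_caps[] = {
	{ "get", CAP_GET },
	{ NULL, 0 }
};

/*
 * Hypothetical helper: map one "capability=<name>" line to its flag.
 * Returns 0 for lines that are not capability lines or that name an
 * unknown capability.
 */
static unsigned int parse_capability(const char *line)
{
	static const char prefix[] = "capability=";
	size_t i;

	if (strncmp(line, prefix, sizeof(prefix) - 1))
		return 0;
	line += sizeof(prefix) - 1;
	for (i = 0; known_caps[i].name; i++)
		if (!strcmp(line, known_caps[i].name))
			return known_caps[i].flag;
	return 0;
}
```

OR-ing the returned flags for every line received before the flush packet yields the set of capabilities both sides can rely on.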
Afterwards Git sends a list of "key=value" pairs terminated with a flush
packet. The list will contain at least the command (based on the
supported capabilities) and the sha1 of the object to retrieve. Note
that the process must not send any response before it has received the
final flush packet.
When the process receives the "get" command, it should make the requested
object available in the Git object store and then return success. Git will
then check the object store again, find the object this time, and proceed.
------------------------
packet: git> command=get
packet: git> sha1=0a214a649e1b3d5011e14a3dc227753f2bd2be05
packet: git> 0000
------------------------
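On the receiving side, the "sha1=" line from the request above could be validated before it is used. The sketch below is an assumption-laden illustration: `parse_sha1_line` is a hypothetical helper, the trailing LF is assumed to be stripped already, and only 40-digit lowercase hex names are accepted (the test helper in this commit also accepts up to 64 digits).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: check that `line` has the form "sha1=<40 hex digits>"
 * and return a pointer to the hex object name inside it, or NULL if the
 * line is malformed.
 */
static const char *parse_sha1_line(const char *line)
{
	static const char prefix[] = "sha1=";
	size_t i;

	if (strncmp(line, prefix, sizeof(prefix) - 1))
		return NULL;
	line += sizeof(prefix) - 1;
	if (strlen(line) != 40)
		return NULL;
	for (i = 0; i < 40; i++) {
		char c = line[i];
		if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')))
			return NULL;
	}
	return line;
}
```

Rejecting anything that is not plain hex matters for a process that, like the example script, passes the object name on to a shell command.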
The process is expected to respond with a list of "key=value" pairs
terminated with a flush packet. If the process does not experience
problems then the list must contain a "success" status.
------------------------
packet: git< status=success
packet: git< 0000
------------------------
In case the process cannot or does not want to retrieve the object, it
is expected to respond with an "error" status.
------------------------
packet: git< status=error
packet: git< 0000
------------------------
In case the process cannot or does not want to retrieve this object as
well as any future object for the lifetime of the Git process, then it
is expected to respond with an "abort" status at any point in the
protocol.
------------------------
packet: git< status=abort
packet: git< 0000
------------------------
Git neither stops nor restarts the process in case the "error"/"abort"
status is set.
If the process dies during the communication or does not adhere to the
protocol then Git will stop the process and restart it with the next
object that needs to be processed.
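The status handling described above can be summarized as a small classification step on Git's side. The enum and function names below are hypothetical and only restate the rules from this document; the argument is assumed to be the value that follows "status=".

```c
#include <string.h>

/* Hypothetical names for the four outcomes described in the text. */
enum read_object_action {
	ACTION_RETRY_LOOKUP,    /* "success": check the object store again */
	ACTION_FAIL_OBJECT,     /* "error": fail this object, keep the process */
	ACTION_DISABLE_GET,     /* "abort": no more "get" for this Git process */
	ACTION_RESTART_PROCESS  /* anything else: treat as a protocol violation */
};

static enum read_object_action classify_status(const char *status)
{
	if (!strcmp(status, "success"))
		return ACTION_RETRY_LOOKUP;
	if (!strcmp(status, "error"))
		return ACTION_FAIL_OBJECT;
	if (!strcmp(status, "abort"))
		return ACTION_DISABLE_GET;
	/* Unknown or missing status: stop the process, restart it later. */
	return ACTION_RESTART_PROCESS;
}
```

Only the last case leads to the process being stopped; "error" and "abort" leave it running, as stated above.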
After the read-object process has processed an object it is expected to
wait for the next "key=value" list containing a command. Git will close
the command pipe on exit. The process is expected to detect EOF and exit
gracefully on its own. Git will wait until the process has stopped.
A long-running read-object process demo implementation can be found in
`contrib/long-running-read-object/example.pl` located in the Git core
repository. If you develop your own long-running process then the
`GIT_TRACE_PACKET` environment variable can be very helpful for
debugging (see linkgit:git[1]).


@@ -0,0 +1,114 @@
#!/usr/bin/perl
#
# Example implementation for the Git read-object protocol version 1
# See Documentation/technical/read-object-protocol.txt
#
# Allows you to test the ability for blobs to be pulled from a host git repo
# "on demand." Called when git needs a blob it couldn't find locally due to
# a lazy clone that only cloned the commits and trees.
#
# A lazy clone can be simulated via the following commands from the host repo
# you wish to create a lazy clone of:
#
# cd /host_repo
# git rev-parse HEAD
# git init /guest_repo
# git cat-file --batch-check --batch-all-objects | grep -v 'blob' |
# cut -d' ' -f1 | git pack-objects /guest_repo/.git/objects/pack/noblobs
# cd /guest_repo
# git config core.virtualizeobjects true
# git reset --hard <sha from rev-parse call above>
#
# Please note, this sample is a minimal skeleton. No proper error handling
# was implemented.
#
use strict;
use warnings;
#
# Point $DIR to the folder where your host git repo is located so we can pull
# missing objects from it
#
my $DIR = "/host_repo/.git/";
sub packet_bin_read {
my $buffer;
my $bytes_read = read STDIN, $buffer, 4;
if ( $bytes_read == 0 ) {
# EOF - Git stopped talking to us!
exit();
}
elsif ( $bytes_read != 4 ) {
die "invalid packet: '$buffer'";
}
my $pkt_size = hex($buffer);
if ( $pkt_size == 0 ) {
return ( 1, "" );
}
elsif ( $pkt_size > 4 ) {
my $content_size = $pkt_size - 4;
$bytes_read = read STDIN, $buffer, $content_size;
if ( $bytes_read != $content_size ) {
die "invalid packet ($content_size bytes expected; $bytes_read bytes read)";
}
return ( 0, $buffer );
}
else {
die "invalid packet size: $pkt_size";
}
}
sub packet_txt_read {
my ( $res, $buf ) = packet_bin_read();
unless ( $buf =~ s/\n$// ) {
die "A non-binary line MUST be terminated by an LF.";
}
return ( $res, $buf );
}
sub packet_bin_write {
my $buf = shift;
print STDOUT sprintf( "%04x", length($buf) + 4 );
print STDOUT $buf;
STDOUT->flush();
}
sub packet_txt_write {
packet_bin_write( $_[0] . "\n" );
}
sub packet_flush {
print STDOUT sprintf( "%04x", 0 );
STDOUT->flush();
}
( packet_txt_read() eq ( 0, "git-read-object-client" ) ) || die "bad initialize";
( packet_txt_read() eq ( 0, "version=1" ) ) || die "bad version";
( packet_bin_read() eq ( 1, "" ) ) || die "bad version end";
packet_txt_write("git-read-object-server");
packet_txt_write("version=1");
packet_flush();
( packet_txt_read() eq ( 0, "capability=get" ) ) || die "bad capability";
( packet_bin_read() eq ( 1, "" ) ) || die "bad capability end";
packet_txt_write("capability=get");
packet_flush();
while (1) {
my ($command) = packet_txt_read() =~ /^command=([^=]+)$/;
if ( $command eq "get" ) {
my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
packet_bin_read();
system ('git --git-dir="' . $DIR . '" cat-file blob ' . $sha1 . ' | git -c core.virtualizeobjects=false hash-object -w --stdin >/dev/null 2>&1');
packet_txt_write(($?) ? "status=error" : "status=success");
packet_flush();
} else {
die "bad command '$command'";
}
}


@@ -46,6 +46,9 @@
#include "wrapper.h"
#include "trace.h"
#include "hook.h"
#include "sigchain.h"
#include "sub-process.h"
#include "pkt-line.h"
/* The maximum size for an object header. */
#define MAX_HEADER_LEN 32
@@ -962,6 +965,115 @@ int has_alt_odb(struct repository *r)
return !!r->objects->odb->next;
}
#define CAP_GET (1u<<0)
static int subprocess_map_initialized;
static struct hashmap subprocess_map;
struct read_object_process {
struct subprocess_entry subprocess;
unsigned int supported_capabilities;
};
static int start_read_object_fn(struct subprocess_entry *subprocess)
{
struct read_object_process *entry = (struct read_object_process *)subprocess;
static int versions[] = {1, 0};
static struct subprocess_capability capabilities[] = {
{ "get", CAP_GET },
{ NULL, 0 }
};
return subprocess_handshake(subprocess, "git-read-object", versions,
NULL, capabilities,
&entry->supported_capabilities);
}
static int read_object_process(const struct object_id *oid)
{
int err;
struct read_object_process *entry;
struct child_process *process;
struct strbuf status = STRBUF_INIT;
const char *cmd = find_hook("read-object");
uint64_t start;
start = getnanotime();
if (!subprocess_map_initialized) {
subprocess_map_initialized = 1;
hashmap_init(&subprocess_map, (hashmap_cmp_fn)cmd2process_cmp,
NULL, 0);
entry = NULL;
} else {
entry = (struct read_object_process *) subprocess_find_entry(&subprocess_map, cmd);
}
if (!entry) {
entry = xmalloc(sizeof(*entry));
entry->supported_capabilities = 0;
if (subprocess_start(&subprocess_map, &entry->subprocess, cmd,
start_read_object_fn)) {
free(entry);
return -1;
}
}
process = &entry->subprocess.process;
if (!(CAP_GET & entry->supported_capabilities))
return -1;
sigchain_push(SIGPIPE, SIG_IGN);
err = packet_write_fmt_gently(process->in, "command=get\n");
if (err)
goto done;
err = packet_write_fmt_gently(process->in, "sha1=%s\n", oid_to_hex(oid));
if (err)
goto done;
err = packet_flush_gently(process->in);
if (err)
goto done;
err = subprocess_read_status(process->out, &status);
err = err ? err : strcmp(status.buf, "success");
done:
sigchain_pop(SIGPIPE);
if (err || errno == EPIPE) {
err = err ? err : errno;
if (!strcmp(status.buf, "error")) {
/* The process signaled a problem with the file. */
}
else if (!strcmp(status.buf, "abort")) {
/*
* The process signaled a permanent problem. Don't try to read
* objects with the same command for the lifetime of the current
* Git process.
*/
entry->supported_capabilities &= ~CAP_GET;
}
else {
/*
* Something went wrong with the read-object process.
* Force shutdown and restart if needed.
*/
error("external process '%s' failed", cmd);
subprocess_stop(&subprocess_map,
(struct subprocess_entry *)entry);
free(entry);
}
}
trace_performance_since(start, "read_object_process");
return err;
}
/* Returns 1 if we have successfully freshened the file, 0 otherwise. */
static int freshen_file(const char *fn)
{
@@ -1012,8 +1124,19 @@ static int check_and_freshen_nonlocal(const struct object_id *oid, int freshen)
static int check_and_freshen(const struct object_id *oid, int freshen)
{
-return check_and_freshen_local(oid, freshen) ||
+int ret;
+int tried_hook = 0;
+retry:
+ret = check_and_freshen_local(oid, freshen) ||
check_and_freshen_nonlocal(oid, freshen);
+if (!ret && core_virtualize_objects && !tried_hook) {
+tried_hook = 1;
+if (!read_object_process(oid))
+goto retry;
+}
+return ret;
}
int has_loose_object_nonlocal(const struct object_id *oid)
@@ -1555,20 +1678,6 @@ void disable_obj_read_lock(void)
pthread_mutex_destroy(&obj_read_mutex);
}
-static int run_read_object_hook(const struct object_id *oid)
-{
-struct run_hooks_opt opt = RUN_HOOKS_OPT_INIT;
-int ret;
-uint64_t start;
-start = getnanotime();
-strvec_push(&opt.args, oid_to_hex(oid));
-ret = run_hooks_opt("read-object", &opt);
-trace_performance_since(start, "run_read_object_hook");
-return ret;
-}
int fetch_if_missing = 1;
static int do_oid_object_info_extended(struct repository *r,
@@ -1627,7 +1736,7 @@ retry:
break;
if (core_virtualize_objects && !tried_hook) {
tried_hook = 1;
-if (!run_read_object_hook(oid))
+if (!read_object_process(oid))
goto retry;
}
}

t/t0410/read-object Executable file

@@ -0,0 +1,114 @@
#!/usr/bin/perl
#
# Example implementation for the Git read-object protocol version 1
# See Documentation/technical/read-object-protocol.txt
#
# Allows you to test the ability for blobs to be pulled from a host git repo
# "on demand." Called when git needs a blob it couldn't find locally due to
# a lazy clone that only cloned the commits and trees.
#
# A lazy clone can be simulated via the following commands from the host repo
# you wish to create a lazy clone of:
#
# cd /host_repo
# git rev-parse HEAD
# git init /guest_repo
# git cat-file --batch-check --batch-all-objects | grep -v 'blob' |
# cut -d' ' -f1 | git pack-objects /guest_repo/.git/objects/pack/noblobs
# cd /guest_repo
# git config core.virtualizeobjects true
# git reset --hard <sha from rev-parse call above>
#
# Please note, this sample is a minimal skeleton. No proper error handling
# was implemented.
#
use strict;
use warnings;
#
# Point $DIR to the folder where your host git repo is located so we can pull
# missing objects from it
#
my $DIR = "../.git/";
sub packet_bin_read {
my $buffer;
my $bytes_read = read STDIN, $buffer, 4;
if ( $bytes_read == 0 ) {
# EOF - Git stopped talking to us!
exit();
}
elsif ( $bytes_read != 4 ) {
die "invalid packet: '$buffer'";
}
my $pkt_size = hex($buffer);
if ( $pkt_size == 0 ) {
return ( 1, "" );
}
elsif ( $pkt_size > 4 ) {
my $content_size = $pkt_size - 4;
$bytes_read = read STDIN, $buffer, $content_size;
if ( $bytes_read != $content_size ) {
die "invalid packet ($content_size bytes expected; $bytes_read bytes read)";
}
return ( 0, $buffer );
}
else {
die "invalid packet size: $pkt_size";
}
}
sub packet_txt_read {
my ( $res, $buf ) = packet_bin_read();
unless ( $buf =~ s/\n$// ) {
die "A non-binary line MUST be terminated by an LF.";
}
return ( $res, $buf );
}
sub packet_bin_write {
my $buf = shift;
print STDOUT sprintf( "%04x", length($buf) + 4 );
print STDOUT $buf;
STDOUT->flush();
}
sub packet_txt_write {
packet_bin_write( $_[0] . "\n" );
}
sub packet_flush {
print STDOUT sprintf( "%04x", 0 );
STDOUT->flush();
}
( packet_txt_read() eq ( 0, "git-read-object-client" ) ) || die "bad initialize";
( packet_txt_read() eq ( 0, "version=1" ) ) || die "bad version";
( packet_bin_read() eq ( 1, "" ) ) || die "bad version end";
packet_txt_write("git-read-object-server");
packet_txt_write("version=1");
packet_flush();
( packet_txt_read() eq ( 0, "capability=get" ) ) || die "bad capability";
( packet_bin_read() eq ( 1, "" ) ) || die "bad capability end";
packet_txt_write("capability=get");
packet_flush();
while (1) {
my ($command) = packet_txt_read() =~ /^command=([^=]+)$/;
if ( $command eq "get" ) {
my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40,64})$/;
packet_bin_read();
system ('git --git-dir="' . $DIR . '" cat-file blob ' . $sha1 . ' | git -c core.virtualizeobjects=false hash-object -w --stdin >/dev/null 2>&1');
packet_txt_write(($?) ? "status=error" : "status=success");
packet_flush();
} else {
die "bad command '$command'";
}
}

t/t0411-read-object.sh Executable file

@@ -0,0 +1,30 @@
#!/bin/sh
test_description='tests for long running read-object process'
. ./test-lib.sh
test_expect_success 'setup host repo with a root commit' '
test_commit zero &&
hash1=$(git ls-tree HEAD | grep zero.t | cut -f1 | cut -d" " -f3)
'
test_expect_success 'blobs can be retrieved from the host repo' '
git init guest-repo &&
(cd guest-repo &&
mkdir -p .git/hooks &&
sed "1s|/usr/bin/perl|$PERL_PATH|" \
<$TEST_DIRECTORY/t0410/read-object \
>.git/hooks/read-object &&
chmod +x .git/hooks/read-object &&
git config core.virtualizeobjects true &&
git cat-file blob "$hash1")
'
test_expect_success 'invalid blobs generate errors' '
(cd guest-repo &&
test_must_fail git cat-file blob "invalid")
'
test_done