2015-08-10 12:47:41 +03:00
|
|
|
#ifndef TEMPFILE_H
|
|
|
|
#define TEMPFILE_H
|
|
|
|
|
tempfile: use list.h for linked list
The tempfile API keeps to-be-cleaned tempfiles in a
singly-linked list and never removes items from the list. A
future patch would like to start removing items, but removal
from a singly linked list is O(n), as we have to walk the
list to find the predecessor element. This means that a
process which takes "n" simultaneous lockfiles (for example,
an atomic transaction on "n" refs) may end up quadratic in
"n".
Before we start allowing items to be removed, it would be
nice to have a way to cover this case in linear time.
The simplest solution is to make an assumption about the
order in which tempfiles are added and removed from the
list. If both operations iterate over the tempfiles in the
same order, then by putting new items at the end of the list
our removal search will always find its items at the
beginning of the list. And indeed, that would work for the
case of refs. But it creates a hidden dependency between
unrelated parts of the code. If anybody changes the ref code
(or if we add a new caller that opens multiple simultaneous
tempfiles) they may unknowingly introduce a performance
regression.
Another solution is to use a better data structure. A
doubly-linked list works fine, and we already have an
implementation in list.h. But there's one snag: the elements
of "struct tempfile" are all marked as "volatile", since a
signal handler may interrupt us and iterate over the list at
any moment (even if we were in the middle of adding a new
entry).
We can declare a "volatile struct list_head", but we can't
actually use it with the normal list functions. The compiler
complains about passing a pointer-to-volatile via a regular
pointer argument. And rightfully so, as the sub-function
would potentially need different code to deal with the
volatile case.
That leaves us with a few options:
1. Drop the "volatile" modifier for the list items.
This is probably a bad idea. I checked the assembly
output from "gcc -O2", and the "volatile" really does
impact the order in which it updates memory.
2. Use macros instead of inline functions. The irony here
is that list.h is entirely implemented as trivial
inline functions. So we basically are already
generating custom code for each call. But sadly there's no
way in C to declare the inline function to take a more
generic type.
We could do so by switching the inline functions to
macros, but it does make the end result harder to read.
And it doesn't fully solve the problem (for instance,
the declaration of list_head needs to change so that
its "prev" and "next" pointers point to other volatile
structs).
3. Don't use list.h, and just make our own ad-hoc
doubly-linked list. It's not that much code to
implement the basics that we need here. But if we're
going to do so, why not add the few extra lines
required to model it after the actual list.h interface?
We can even reuse a few of the macro helpers.
So this patch takes option 3, but actually implements a
parallel "volatile list" interface in list.h, where it could
potentially be reused by other code. This implements just
enough for tempfile.c's use, though we could easily port
other functions later if need be.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-05 15:15:00 +03:00
|
|
|
#include "list.h"
|
2018-08-15 20:54:05 +03:00
|
|
|
#include "strbuf.h"
|
tempfile: use list.h for linked list
The tempfile API keeps to-be-cleaned tempfiles in a
singly-linked list and never removes items from the list. A
future patch would like to start removing items, but removal
from a singly linked list is O(n), as we have to walk the
list to find the predecessor element. This means that a
process which takes "n" simultaneous lockfiles (for example,
an atomic transaction on "n" refs) may end up quadratic in
"n".
Before we start allowing items to be removed, it would be
nice to have a way to cover this case in linear time.
The simplest solution is to make an assumption about the
order in which tempfiles are added and removed from the
list. If both operations iterate over the tempfiles in the
same order, then by putting new items at the end of the list
our removal search will always find its items at the
beginning of the list. And indeed, that would work for the
case of refs. But it creates a hidden dependency between
unrelated parts of the code. If anybody changes the ref code
(or if we add a new caller that opens multiple simultaneous
tempfiles) they may unknowingly introduce a performance
regression.
Another solution is to use a better data structure. A
doubly-linked list works fine, and we already have an
implementation in list.h. But there's one snag: the elements
of "struct tempfile" are all marked as "volatile", since a
signal handler may interrupt us and iterate over the list at
any moment (even if we were in the middle of adding a new
entry).
We can declare a "volatile struct list_head", but we can't
actually use it with the normal list functions. The compiler
complains about passing a pointer-to-volatile via a regular
pointer argument. And rightfully so, as the sub-function
would potentially need different code to deal with the
volatile case.
That leaves us with a few options:
1. Drop the "volatile" modifier for the list items.
This is probably a bad idea. I checked the assembly
output from "gcc -O2", and the "volatile" really does
impact the order in which it updates memory.
2. Use macros instead of inline functions. The irony here
is that list.h is entirely implemented as trivial
inline functions. So we basically are already
generating custom code for each call. But sadly there's no
way in C to declare the inline function to take a more
generic type.
We could do so by switching the inline functions to
macros, but it does make the end result harder to read.
And it doesn't fully solve the problem (for instance,
the declaration of list_head needs to change so that
its "prev" and "next" pointers point to other volatile
structs).
3. Don't use list.h, and just make our own ad-hoc
doubly-linked list. It's not that much code to
implement the basics that we need here. But if we're
going to do so, why not add the few extra lines
required to model it after the actual list.h interface?
We can even reuse a few of the macro helpers.
So this patch takes option 3, but actually implements a
parallel "volatile list" interface in list.h, where it could
potentially be reused by other code. This implements just
enough for tempfile.c's use, though we could easily port
other functions later if need be.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-05 15:15:00 +03:00
|
|
|
|
2015-08-10 12:47:41 +03:00
|
|
|
/*
|
|
|
|
* Handle temporary files.
|
|
|
|
*
|
|
|
|
* The tempfile API allows temporary files to be created, deleted, and
|
|
|
|
* atomically renamed. Temporary files that are still active when the
|
|
|
|
* program ends are cleaned up automatically. Lockfiles (see
|
|
|
|
* "lockfile.h") are built on top of this API.
|
|
|
|
*
|
|
|
|
*
|
|
|
|
* Calling sequence
|
|
|
|
* ----------------
|
|
|
|
*
|
|
|
|
* The caller:
|
|
|
|
*
|
|
|
|
* * Attempts to create a temporary file by calling
|
tempfile: auto-allocate tempfiles on heap
The previous commit taught the tempfile code to give up
ownership over tempfiles that have been renamed or deleted.
That makes it possible to use a stack variable like this:
struct tempfile t;
create_tempfile(&t, ...);
...
if (!err)
rename_tempfile(&t, ...);
else
delete_tempfile(&t);
But doing it this way has a high potential for creating
memory errors. The tempfile we pass to create_tempfile()
ends up on a global linked list, and it's not safe for it to
go out of scope until we've called one of those two
deactivation functions.
Imagine that we add an early return from the function that
forgets to call delete_tempfile(). With a static or heap
tempfile variable, the worst case is that the tempfile hangs
around until the program exits (and some functions like
setup_shallow_temporary rely on this intentionally, creating
a tempfile and then leaving it for later cleanup).
But with a stack variable as above, this is a serious memory
error: the variable goes out of scope and may be filled with
garbage by the time the tempfile code looks at it. Let's
see if we can make it harder to get this wrong.
Since many callers need to allocate arbitrary numbers of
tempfiles, we can't rely on static storage as a general
solution. So we need to turn to the heap. We could just ask
all callers to pass us a heap variable, but that puts the
burden on them to call free() at the right time.
Instead, let's have the tempfile code handle the heap
allocation _and_ the deallocation (when the tempfile is
deactivated and removed from the list).
This changes the return value of all of the creation
functions. For the cleanup functions (delete and rename),
we'll add one extra bit of safety: instead of taking a
tempfile pointer, we'll take a pointer-to-pointer and set it
to NULL after freeing the object. This makes it safe to
double-call functions like delete_tempfile(), as the second
call treats the NULL input as a noop. Several callsites
follow this pattern.
The resulting patch does have a fair bit of noise, as each
caller needs to be converted to handle:
1. Storing a pointer instead of the struct itself.
2. Passing the pointer instead of taking the struct
address.
3. Handling a "struct tempfile *" return instead of a file
descriptor.
We could play games to make this less noisy. For example, by
defining the tempfile like this:
struct tempfile {
struct heap_allocated_part_of_tempfile {
int fd;
...etc
} *actual_data;
}
Callers would continue to have a "struct tempfile", and it
would be "active" only when the inner pointer was non-NULL.
But that just makes things more awkward in the long run.
There aren't that many callers, so we can simply bite
the bullet and adjust all of them. And the compiler makes it
easy for us to find them all.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-05 15:15:08 +03:00
|
|
|
* `create_tempfile()`. The resources used for the temporary file are
|
|
|
|
* managed by the tempfile API.
|
2015-08-10 12:47:41 +03:00
|
|
|
*
|
|
|
|
* * Writes new content to the file by either:
|
|
|
|
*
|
tempfile: auto-allocate tempfiles on heap
The previous commit taught the tempfile code to give up
ownership over tempfiles that have been renamed or deleted.
That makes it possible to use a stack variable like this:
struct tempfile t;
create_tempfile(&t, ...);
...
if (!err)
rename_tempfile(&t, ...);
else
delete_tempfile(&t);
But doing it this way has a high potential for creating
memory errors. The tempfile we pass to create_tempfile()
ends up on a global linked list, and it's not safe for it to
go out of scope until we've called one of those two
deactivation functions.
Imagine that we add an early return from the function that
forgets to call delete_tempfile(). With a static or heap
tempfile variable, the worst case is that the tempfile hangs
around until the program exits (and some functions like
setup_shallow_temporary rely on this intentionally, creating
a tempfile and then leaving it for later cleanup).
But with a stack variable as above, this is a serious memory
error: the variable goes out of scope and may be filled with
garbage by the time the tempfile code looks at it. Let's
see if we can make it harder to get this wrong.
Since many callers need to allocate arbitrary numbers of
tempfiles, we can't rely on static storage as a general
solution. So we need to turn to the heap. We could just ask
all callers to pass us a heap variable, but that puts the
burden on them to call free() at the right time.
Instead, let's have the tempfile code handle the heap
allocation _and_ the deallocation (when the tempfile is
deactivated and removed from the list).
This changes the return value of all of the creation
functions. For the cleanup functions (delete and rename),
we'll add one extra bit of safety: instead of taking a
tempfile pointer, we'll take a pointer-to-pointer and set it
to NULL after freeing the object. This makes it safe to
double-call functions like delete_tempfile(), as the second
call treats the NULL input as a noop. Several callsites
follow this pattern.
The resulting patch does have a fair bit of noise, as each
caller needs to be converted to handle:
1. Storing a pointer instead of the struct itself.
2. Passing the pointer instead of taking the struct
address.
3. Handling a "struct tempfile *" return instead of a file
descriptor.
We could play games to make this less noisy. For example, by
defining the tempfile like this:
struct tempfile {
struct heap_allocated_part_of_tempfile {
int fd;
...etc
} *actual_data;
}
Callers would continue to have a "struct tempfile", and it
would be "active" only when the inner pointer was non-NULL.
But that just makes things more awkward in the long run.
There aren't that many callers, so we can simply bite
the bullet and adjust all of them. And the compiler makes it
easy for us to find them all.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-05 15:15:08 +03:00
|
|
|
* * writing to the `tempfile->fd` file descriptor
|
2015-08-10 12:47:41 +03:00
|
|
|
*
|
|
|
|
* * calling `fdopen_tempfile()` to get a `FILE` pointer for the
|
|
|
|
* open file and writing to the file using stdio.
|
|
|
|
*
|
tempfile: auto-allocate tempfiles on heap
The previous commit taught the tempfile code to give up
ownership over tempfiles that have been renamed or deleted.
That makes it possible to use a stack variable like this:
struct tempfile t;
create_tempfile(&t, ...);
...
if (!err)
rename_tempfile(&t, ...);
else
delete_tempfile(&t);
But doing it this way has a high potential for creating
memory errors. The tempfile we pass to create_tempfile()
ends up on a global linked list, and it's not safe for it to
go out of scope until we've called one of those two
deactivation functions.
Imagine that we add an early return from the function that
forgets to call delete_tempfile(). With a static or heap
tempfile variable, the worst case is that the tempfile hangs
around until the program exits (and some functions like
setup_shallow_temporary rely on this intentionally, creating
a tempfile and then leaving it for later cleanup).
But with a stack variable as above, this is a serious memory
error: the variable goes out of scope and may be filled with
garbage by the time the tempfile code looks at it. Let's
see if we can make it harder to get this wrong.
Since many callers need to allocate arbitrary numbers of
tempfiles, we can't rely on static storage as a general
solution. So we need to turn to the heap. We could just ask
all callers to pass us a heap variable, but that puts the
burden on them to call free() at the right time.
Instead, let's have the tempfile code handle the heap
allocation _and_ the deallocation (when the tempfile is
deactivated and removed from the list).
This changes the return value of all of the creation
functions. For the cleanup functions (delete and rename),
we'll add one extra bit of safety: instead of taking a
tempfile pointer, we'll take a pointer-to-pointer and set it
to NULL after freeing the object. This makes it safe to
double-call functions like delete_tempfile(), as the second
call treats the NULL input as a noop. Several callsites
follow this pattern.
The resulting patch does have a fair bit of noise, as each
caller needs to be converted to handle:
1. Storing a pointer instead of the struct itself.
2. Passing the pointer instead of taking the struct
address.
3. Handling a "struct tempfile *" return instead of a file
descriptor.
We could play games to make this less noisy. For example, by
defining the tempfile like this:
struct tempfile {
struct heap_allocated_part_of_tempfile {
int fd;
...etc
} *actual_data;
}
Callers would continue to have a "struct tempfile", and it
would be "active" only when the inner pointer was non-NULL.
But that just makes things more awkward in the long run.
There aren't that many callers, so we can simply bite
the bullet and adjust all of them. And the compiler makes it
easy for us to find them all.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-05 15:15:08 +03:00
|
|
|
* Note that the file descriptor created by create_tempfile()
|
2016-08-22 15:47:55 +03:00
|
|
|
* is marked O_CLOEXEC, so the new contents must be written by
|
|
|
|
* the current process, not any spawned one.
|
|
|
|
*
|
2015-08-10 12:47:41 +03:00
|
|
|
* When finished writing, the caller can:
|
|
|
|
*
|
|
|
|
* * Close the file descriptor and remove the temporary file by
|
|
|
|
* calling `delete_tempfile()`.
|
|
|
|
*
|
|
|
|
* * Close the temporary file and rename it atomically to a specified
|
|
|
|
* filename by calling `rename_tempfile()`. This relinquishes
|
|
|
|
* control of the file.
|
|
|
|
*
|
|
|
|
* * Close the file descriptor without removing or renaming the
|
tempfile: do not delete tempfile on failed close
When close_tempfile() fails, we delete the tempfile and
reset the fields of the tempfile struct. This makes it
easier for callers to return without cleaning up, but it
also makes this common pattern:
if (close_tempfile(tempfile))
return error_errno("error closing %s", tempfile->filename.buf);
wrong, because the "filename" field has been reset after the
failed close. And it's not easy to fix, as in many cases we
don't have another copy of the filename (e.g., if it was
created via one of the mks_tempfile functions, and we just
have the original template string).
Let's drop the feature that a failed close automatically
deletes the file. This puts the burden on the caller to do
the deletion themselves, but this isn't that big a deal.
Callers which do:
if (write(...) || close_tempfile(...)) {
delete_tempfile(...);
return -1;
}
already had to call delete when the write() failed, and so
aren't affected. Likewise, any caller which just calls die()
in the error path is OK; we'll delete the tempfile during
the atexit handler.
Because this patch changes the semantics of close_tempfile()
without changing its signature, all callers need to be
manually checked and converted to the new scheme. This patch
covers all in-tree callers, but there may be others for
not-yet-merged topics. To catch these, we rename the
function to close_tempfile_gently(), which will attract
compile-time attention to new callers. (Technically the
original could be considered "gentle" already in that it
didn't die() on errors, but this one is even more so).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-05 15:14:30 +03:00
|
|
|
* temporary file by calling `close_tempfile_gently()`, and later call
|
2015-08-10 12:47:41 +03:00
|
|
|
* `delete_tempfile()` or `rename_tempfile()`.
|
|
|
|
*
|
2017-09-05 15:15:04 +03:00
|
|
|
* After the temporary file is renamed or deleted, the `tempfile`
|
tempfile: auto-allocate tempfiles on heap
The previous commit taught the tempfile code to give up
ownership over tempfiles that have been renamed or deleted.
That makes it possible to use a stack variable like this:
struct tempfile t;
create_tempfile(&t, ...);
...
if (!err)
rename_tempfile(&t, ...);
else
delete_tempfile(&t);
But doing it this way has a high potential for creating
memory errors. The tempfile we pass to create_tempfile()
ends up on a global linked list, and it's not safe for it to
go out of scope until we've called one of those two
deactivation functions.
Imagine that we add an early return from the function that
forgets to call delete_tempfile(). With a static or heap
tempfile variable, the worst case is that the tempfile hangs
around until the program exits (and some functions like
setup_shallow_temporary rely on this intentionally, creating
a tempfile and then leaving it for later cleanup).
But with a stack variable as above, this is a serious memory
error: the variable goes out of scope and may be filled with
garbage by the time the tempfile code looks at it. Let's
see if we can make it harder to get this wrong.
Since many callers need to allocate arbitrary numbers of
tempfiles, we can't rely on static storage as a general
solution. So we need to turn to the heap. We could just ask
all callers to pass us a heap variable, but that puts the
burden on them to call free() at the right time.
Instead, let's have the tempfile code handle the heap
allocation _and_ the deallocation (when the tempfile is
deactivated and removed from the list).
This changes the return value of all of the creation
functions. For the cleanup functions (delete and rename),
we'll add one extra bit of safety: instead of taking a
tempfile pointer, we'll take a pointer-to-pointer and set it
to NULL after freeing the object. This makes it safe to
double-call functions like delete_tempfile(), as the second
call treats the NULL input as a noop. Several callsites
follow this pattern.
The resulting patch does have a fair bit of noise, as each
caller needs to be converted to handle:
1. Storing a pointer instead of the struct itself.
2. Passing the pointer instead of taking the struct
address.
3. Handling a "struct tempfile *" return instead of a file
descriptor.
We could play games to make this less noisy. For example, by
defining the tempfile like this:
struct tempfile {
struct heap_allocated_part_of_tempfile {
int fd;
...etc
} *actual_data;
}
Callers would continue to have a "struct tempfile", and it
would be "active" only when the inner pointer was non-NULL.
But that just makes things more awkward in the long run.
There aren't that many callers, so we can simply bite
the bullet and adjust all of them. And the compiler makes it
easy for us to find them all.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-05 15:15:08 +03:00
|
|
|
* object is no longer valid and should not be reused.
|
2015-08-10 12:47:41 +03:00
|
|
|
*
|
|
|
|
* If the program exits before `rename_tempfile()` or
|
|
|
|
* `delete_tempfile()` is called, an `atexit(3)` handler will close
|
|
|
|
* and remove the temporary file.
|
|
|
|
*
|
|
|
|
* If you need to close the file descriptor yourself, do so by calling
|
tempfile: do not delete tempfile on failed close
When close_tempfile() fails, we delete the tempfile and
reset the fields of the tempfile struct. This makes it
easier for callers to return without cleaning up, but it
also makes this common pattern:
if (close_tempfile(tempfile))
return error_errno("error closing %s", tempfile->filename.buf);
wrong, because the "filename" field has been reset after the
failed close. And it's not easy to fix, as in many cases we
don't have another copy of the filename (e.g., if it was
created via one of the mks_tempfile functions, and we just
have the original template string).
Let's drop the feature that a failed close automatically
deletes the file. This puts the burden on the caller to do
the deletion themselves, but this isn't that big a deal.
Callers which do:
if (write(...) || close_tempfile(...)) {
delete_tempfile(...);
return -1;
}
already had to call delete when the write() failed, and so
aren't affected. Likewise, any caller which just calls die()
in the error path is OK; we'll delete the tempfile during
the atexit handler.
Because this patch changes the semantics of close_tempfile()
without changing its signature, all callers need to be
manually checked and converted to the new scheme. This patch
covers all in-tree callers, but there may be others for
not-yet-merged topics. To catch these, we rename the
function to close_tempfile_gently(), which will attract
compile-time attention to new callers. (Technically the
original could be considered "gentle" already in that it
didn't die() on errors, but this one is even more so).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-05 15:14:30 +03:00
|
|
|
* `close_tempfile_gently()`. You should never call `close(2)` or `fclose(3)`
|
2015-08-10 12:47:41 +03:00
|
|
|
* yourself, otherwise the `struct tempfile` structure would still
|
|
|
|
* think that the file descriptor needs to be closed, and a later
|
|
|
|
* cleanup would result in duplicate calls to `close(2)`. Worse yet,
|
|
|
|
* if you close and then later open another file descriptor for a
|
|
|
|
* completely different purpose, then the unrelated file descriptor
|
|
|
|
* might get closed.
|
|
|
|
*
|
|
|
|
*
|
|
|
|
* Error handling
|
|
|
|
* --------------
|
|
|
|
*
|
tempfile: auto-allocate tempfiles on heap
The previous commit taught the tempfile code to give up
ownership over tempfiles that have been renamed or deleted.
That makes it possible to use a stack variable like this:
struct tempfile t;
create_tempfile(&t, ...);
...
if (!err)
rename_tempfile(&t, ...);
else
delete_tempfile(&t);
But doing it this way has a high potential for creating
memory errors. The tempfile we pass to create_tempfile()
ends up on a global linked list, and it's not safe for it to
go out of scope until we've called one of those two
deactivation functions.
Imagine that we add an early return from the function that
forgets to call delete_tempfile(). With a static or heap
tempfile variable, the worst case is that the tempfile hangs
around until the program exits (and some functions like
setup_shallow_temporary rely on this intentionally, creating
a tempfile and then leaving it for later cleanup).
But with a stack variable as above, this is a serious memory
error: the variable goes out of scope and may be filled with
garbage by the time the tempfile code looks at it. Let's
see if we can make it harder to get this wrong.
Since many callers need to allocate arbitrary numbers of
tempfiles, we can't rely on static storage as a general
solution. So we need to turn to the heap. We could just ask
all callers to pass us a heap variable, but that puts the
burden on them to call free() at the right time.
Instead, let's have the tempfile code handle the heap
allocation _and_ the deallocation (when the tempfile is
deactivated and removed from the list).
This changes the return value of all of the creation
functions. For the cleanup functions (delete and rename),
we'll add one extra bit of safety: instead of taking a
tempfile pointer, we'll take a pointer-to-pointer and set it
to NULL after freeing the object. This makes it safe to
double-call functions like delete_tempfile(), as the second
call treats the NULL input as a noop. Several callsites
follow this pattern.
The resulting patch does have a fair bit of noise, as each
caller needs to be converted to handle:
1. Storing a pointer instead of the struct itself.
2. Passing the pointer instead of taking the struct
address.
3. Handling a "struct tempfile *" return instead of a file
descriptor.
We could play games to make this less noisy. For example, by
defining the tempfile like this:
struct tempfile {
struct heap_allocated_part_of_tempfile {
int fd;
...etc
} *actual_data;
}
Callers would continue to have a "struct tempfile", and it
would be "active" only when the inner pointer was non-NULL.
But that just makes things more awkward in the long run.
There aren't that many callers, so we can simply bite
the bullet and adjust all of them. And the compiler makes it
easy for us to find them all.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-05 15:15:08 +03:00
|
|
|
* `create_tempfile()` returns an allocated tempfile on success or NULL
|
|
|
|
* on failure. On errors, `errno` describes the reason for failure.
|
2015-08-10 12:47:41 +03:00
|
|
|
*
|
2017-10-05 23:32:06 +03:00
|
|
|
* `rename_tempfile()` and `close_tempfile_gently()` return 0 on success.
|
|
|
|
* On failure they set `errno` appropriately and return -1.
|
|
|
|
* `delete_tempfile()` and `rename` (but not `close`) do their best to
|
|
|
|
* delete the temporary file before returning.
|
2015-08-10 12:47:41 +03:00
|
|
|
*/
|
|
|
|
|
|
|
|
struct tempfile {
|
tempfile: use list.h for linked list
The tempfile API keeps to-be-cleaned tempfiles in a
singly-linked list and never removes items from the list. A
future patch would like to start removing items, but removal
from a singly linked list is O(n), as we have to walk the
list to find the predecessor element. This means that a
process which takes "n" simultaneous lockfiles (for example,
an atomic transaction on "n" refs) may end up quadratic in
"n".
Before we start allowing items to be removed, it would be
nice to have a way to cover this case in linear time.
The simplest solution is to make an assumption about the
order in which tempfiles are added and removed from the
list. If both operations iterate over the tempfiles in the
same order, then by putting new items at the end of the list
our removal search will always find its items at the
beginning of the list. And indeed, that would work for the
case of refs. But it creates a hidden dependency between
unrelated parts of the code. If anybody changes the ref code
(or if we add a new caller that opens multiple simultaneous
tempfiles) they may unknowingly introduce a performance
regression.
Another solution is to use a better data structure. A
doubly-linked list works fine, and we already have an
implementation in list.h. But there's one snag: the elements
of "struct tempfile" are all marked as "volatile", since a
signal handler may interrupt us and iterate over the list at
any moment (even if we were in the middle of adding a new
entry).
We can declare a "volatile struct list_head", but we can't
actually use it with the normal list functions. The compiler
complains about passing a pointer-to-volatile via a regular
pointer argument. And rightfully so, as the sub-function
would potentially need different code to deal with the
volatile case.
That leaves us with a few options:
1. Drop the "volatile" modifier for the list items.
This is probably a bad idea. I checked the assembly
output from "gcc -O2", and the "volatile" really does
impact the order in which it updates memory.
2. Use macros instead of inline functions. The irony here
is that list.h is entirely implemented as trivial
inline functions. So we basically are already
generating custom code for each call. But sadly there's no
way in C to declare the inline function to take a more
generic type.
We could do so by switching the inline functions to
macros, but it does make the end result harder to read.
And it doesn't fully solve the problem (for instance,
the declaration of list_head needs to change so that
its "prev" and "next" pointers point to other volatile
structs).
3. Don't use list.h, and just make our own ad-hoc
doubly-linked list. It's not that much code to
implement the basics that we need here. But if we're
going to do so, why not add the few extra lines
required to model it after the actual list.h interface?
We can even reuse a few of the macro helpers.
So this patch takes option 3, but actually implements a
parallel "volatile list" interface in list.h, where it could
potentially be reused by other code. This implements just
enough for tempfile.c's use, though we could easily port
other functions later if need be.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-05 15:15:00 +03:00
|
|
|
volatile struct volatile_list_head list;
|
2015-08-10 12:47:41 +03:00
|
|
|
volatile int fd;
|
|
|
|
FILE *volatile fp;
|
|
|
|
volatile pid_t owner;
|
|
|
|
struct strbuf filename;
|
2022-08-27 01:46:29 +03:00
|
|
|
char *directory;
|
2015-08-10 12:47:41 +03:00
|
|
|
};
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Attempt to create a temporary file at the specified `path`. Return
|
tempfile: auto-allocate tempfiles on heap
The previous commit taught the tempfile code to give up
ownership over tempfiles that have been renamed or deleted.
That makes it possible to use a stack variable like this:
struct tempfile t;
create_tempfile(&t, ...);
...
if (!err)
rename_tempfile(&t, ...);
else
delete_tempfile(&t);
But doing it this way has a high potential for creating
memory errors. The tempfile we pass to create_tempfile()
ends up on a global linked list, and it's not safe for it to
go out of scope until we've called one of those two
deactivation functions.
Imagine that we add an early return from the function that
forgets to call delete_tempfile(). With a static or heap
tempfile variable, the worst case is that the tempfile hangs
around until the program exits (and some functions like
setup_shallow_temporary rely on this intentionally, creating
a tempfile and then leaving it for later cleanup).
But with a stack variable as above, this is a serious memory
error: the variable goes out of scope and may be filled with
garbage by the time the tempfile code looks at it. Let's
see if we can make it harder to get this wrong.
Since many callers need to allocate arbitrary numbers of
tempfiles, we can't rely on static storage as a general
solution. So we need to turn to the heap. We could just ask
all callers to pass us a heap variable, but that puts the
burden on them to call free() at the right time.
Instead, let's have the tempfile code handle the heap
allocation _and_ the deallocation (when the tempfile is
deactivated and removed from the list).
This changes the return value of all of the creation
functions. For the cleanup functions (delete and rename),
we'll add one extra bit of safety: instead of taking a
tempfile pointer, we'll take a pointer-to-pointer and set it
to NULL after freeing the object. This makes it safe to
double-call functions like delete_tempfile(), as the second
call treats the NULL input as a noop. Several callsites
follow this pattern.
The resulting patch does have a fair bit of noise, as each
caller needs to be converted to handle:
1. Storing a pointer instead of the struct itself.
2. Passing the pointer instead of taking the struct
address.
3. Handling a "struct tempfile *" return instead of a file
descriptor.
We could play games to make this less noisy. For example, by
defining the tempfile like this:
struct tempfile {
struct heap_allocated_part_of_tempfile {
int fd;
...etc
} *actual_data;
}
Callers would continue to have a "struct tempfile", and it
would be "active" only when the inner pointer was non-NULL.
But that just makes things more awkward in the long run.
There aren't that many callers, so we can simply bite
the bullet and adjust all of them. And the compiler makes it
easy for us to find them all.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-05 15:15:08 +03:00
|
|
|
* a tempfile (whose "fd" member can be used for writing to it), or
|
|
|
|
* NULL on error. It is an error if a file already exists at that path.
|
2020-04-27 19:27:54 +03:00
|
|
|
* Note that `mode` will be further modified by the umask, and possibly
|
|
|
|
* `core.sharedRepository`, so it is not guaranteed to have the given
|
|
|
|
* mode.
|
2015-08-10 12:47:41 +03:00
|
|
|
*/
|
2020-04-27 19:27:54 +03:00
|
|
|
struct tempfile *create_tempfile_mode(const char *path, int mode);
|
|
|
|
|
|
|
|
static inline struct tempfile *create_tempfile(const char *path)
|
|
|
|
{
|
|
|
|
return create_tempfile_mode(path, 0666);
|
|
|
|
}
|
2015-08-10 12:47:41 +03:00
|
|
|
|
2015-08-10 12:47:44 +03:00
|
|
|
/*
|
|
|
|
* Register an existing file as a tempfile, meaning that it will be
|
|
|
|
* deleted when the program exits. The tempfile is considered closed,
|
|
|
|
* but it can be worked with like any other closed tempfile (for
|
|
|
|
* example, it can be opened using reopen_tempfile()).
|
|
|
|
*/
|
2019-04-29 11:28:14 +03:00
|
|
|
struct tempfile *register_tempfile(const char *path);
|
2015-08-10 12:47:44 +03:00
|
|
|
|
2015-08-10 12:47:43 +03:00
|
|
|
|
|
|
|
/*
|
|
|
|
* mks_tempfile functions
|
|
|
|
*
|
|
|
|
* The following functions attempt to create and open temporary files
|
|
|
|
* with names derived automatically from a template, in the manner of
|
|
|
|
* mkstemps(), and arrange for them to be deleted if the program ends
|
|
|
|
* before they are deleted explicitly. There is a whole family of such
|
|
|
|
* functions, named according to the following pattern:
|
|
|
|
*
|
|
|
|
* x?mks_tempfile_t?s?m?()
|
|
|
|
*
|
|
|
|
* The optional letters have the following meanings:
|
|
|
|
*
|
|
|
|
* x - die if the temporary file cannot be created.
|
|
|
|
*
|
|
|
|
* t - create the temporary file under $TMPDIR (as opposed to
|
|
|
|
* relative to the current directory). When these variants are
|
|
|
|
* used, template should be the pattern for the filename alone,
|
|
|
|
* without a path.
|
|
|
|
*
|
|
|
|
* s - template includes a suffix that is suffixlen characters long.
|
|
|
|
*
|
|
|
|
* m - the temporary file should be created with the specified mode
|
|
|
|
* (otherwise, the mode is set to 0600).
|
|
|
|
*
|
|
|
|
* None of these functions modify template. If the caller wants to
|
|
|
|
* know the (absolute) path of the file that was created, it can be
|
|
|
|
* read from tempfile->filename.
|
|
|
|
*
|
tempfile: auto-allocate tempfiles on heap
The previous commit taught the tempfile code to give up
ownership over tempfiles that have been renamed or deleted.
That makes it possible to use a stack variable like this:
struct tempfile t;
create_tempfile(&t, ...);
...
if (!err)
rename_tempfile(&t, ...);
else
delete_tempfile(&t);
But doing it this way has a high potential for creating
memory errors. The tempfile we pass to create_tempfile()
ends up on a global linked list, and it's not safe for it to
go out of scope until we've called one of those two
deactivation functions.
Imagine that we add an early return from the function that
forgets to call delete_tempfile(). With a static or heap
tempfile variable, the worst case is that the tempfile hangs
around until the program exits (and some functions like
setup_shallow_temporary rely on this intentionally, creating
a tempfile and then leaving it for later cleanup).
But with a stack variable as above, this is a serious memory
error: the variable goes out of scope and may be filled with
garbage by the time the tempfile code looks at it. Let's
see if we can make it harder to get this wrong.
Since many callers need to allocate arbitrary numbers of
tempfiles, we can't rely on static storage as a general
solution. So we need to turn to the heap. We could just ask
all callers to pass us a heap variable, but that puts the
burden on them to call free() at the right time.
Instead, let's have the tempfile code handle the heap
allocation _and_ the deallocation (when the tempfile is
deactivated and removed from the list).
This changes the return value of all of the creation
functions. For the cleanup functions (delete and rename),
we'll add one extra bit of safety: instead of taking a
tempfile pointer, we'll take a pointer-to-pointer and set it
to NULL after freeing the object. This makes it safe to
double-call functions like delete_tempfile(), as the second
call treats the NULL input as a noop. Several callsites
follow this pattern.
The resulting patch does have a fair bit of noise, as each
caller needs to be converted to handle:
1. Storing a pointer instead of the struct itself.
2. Passing the pointer instead of taking the struct
address.
3. Handling a "struct tempfile *" return instead of a file
descriptor.
We could play games to make this less noisy. For example, by
defining the tempfile like this:
struct tempfile {
struct heap_allocated_part_of_tempfile {
int fd;
...etc
} *actual_data;
}
Callers would continue to have a "struct tempfile", and it
would be "active" only when the inner pointer was non-NULL.
But that just makes things more awkward in the long run.
There aren't that many callers, so we can simply bite
the bullet and adjust all of them. And the compiler makes it
easy for us to find them all.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-05 15:15:08 +03:00
|
|
|
* On success, the functions return a tempfile whose "fd" member is open
|
|
|
|
* for writing the temporary file. On errors, they return NULL and set
|
|
|
|
* errno appropriately (except for the "x" variants, which die() on
|
|
|
|
* errors).
|
2015-08-10 12:47:43 +03:00
|
|
|
*/
|
|
|
|
|
|
|
|
/* See "mks_tempfile functions" above. */
|
2019-04-29 11:28:14 +03:00
|
|
|
struct tempfile *mks_tempfile_sm(const char *filename_template,
|
2019-04-29 11:28:23 +03:00
|
|
|
int suffixlen, int mode);
|
2015-08-10 12:47:43 +03:00
|
|
|
|
|
|
|
/* See "mks_tempfile functions" above. */
|
2018-02-14 21:59:57 +03:00
|
|
|
static inline struct tempfile *mks_tempfile_s(const char *filename_template,
|
tempfile: auto-allocate tempfiles on heap
The previous commit taught the tempfile code to give up
ownership over tempfiles that have been renamed or deleted.
That makes it possible to use a stack variable like this:
struct tempfile t;
create_tempfile(&t, ...);
...
if (!err)
rename_tempfile(&t, ...);
else
delete_tempfile(&t);
But doing it this way has a high potential for creating
memory errors. The tempfile we pass to create_tempfile()
ends up on a global linked list, and it's not safe for it to
go out of scope until we've called one of those two
deactivation functions.
Imagine that we add an early return from the function that
forgets to call delete_tempfile(). With a static or heap
tempfile variable, the worst case is that the tempfile hangs
around until the program exits (and some functions like
setup_shallow_temporary rely on this intentionally, creating
a tempfile and then leaving it for later cleanup).
But with a stack variable as above, this is a serious memory
error: the variable goes out of scope and may be filled with
garbage by the time the tempfile code looks at it. Let's
see if we can make it harder to get this wrong.
Since many callers need to allocate arbitrary numbers of
tempfiles, we can't rely on static storage as a general
solution. So we need to turn to the heap. We could just ask
all callers to pass us a heap variable, but that puts the
burden on them to call free() at the right time.
Instead, let's have the tempfile code handle the heap
allocation _and_ the deallocation (when the tempfile is
deactivated and removed from the list).
This changes the return value of all of the creation
functions. For the cleanup functions (delete and rename),
we'll add one extra bit of safety: instead of taking a
tempfile pointer, we'll take a pointer-to-pointer and set it
to NULL after freeing the object. This makes it safe to
double-call functions like delete_tempfile(), as the second
call treats the NULL input as a noop. Several callsites
follow this pattern.
The resulting patch does have a fair bit of noise, as each
caller needs to be converted to handle:
1. Storing a pointer instead of the struct itself.
2. Passing the pointer instead of taking the struct
address.
3. Handling a "struct tempfile *" return instead of a file
descriptor.
We could play games to make this less noisy. For example, by
defining the tempfile like this:
struct tempfile {
struct heap_allocated_part_of_tempfile {
int fd;
...etc
} *actual_data;
}
Callers would continue to have a "struct tempfile", and it
would be "active" only when the inner pointer was non-NULL.
But that just makes things more awkward in the long run.
There aren't that many callers, so we can simply bite
the bullet and adjust all of them. And the compiler makes it
easy for us to find them all.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-05 15:15:08 +03:00
|
|
|
int suffixlen)
|
2015-08-10 12:47:43 +03:00
|
|
|
{
|
2018-02-14 21:59:57 +03:00
|
|
|
return mks_tempfile_sm(filename_template, suffixlen, 0600);
|
2015-08-10 12:47:43 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
/* See "mks_tempfile functions" above. */
|
2018-02-14 21:59:57 +03:00
|
|
|
static inline struct tempfile *mks_tempfile_m(const char *filename_template, int mode)
|
2015-08-10 12:47:43 +03:00
|
|
|
{
|
2018-02-14 21:59:57 +03:00
|
|
|
return mks_tempfile_sm(filename_template, 0, mode);
|
2015-08-10 12:47:43 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
/* See "mks_tempfile functions" above. */
|
2018-02-14 21:59:57 +03:00
|
|
|
static inline struct tempfile *mks_tempfile(const char *filename_template)
|
2015-08-10 12:47:43 +03:00
|
|
|
{
|
2018-02-14 21:59:57 +03:00
|
|
|
return mks_tempfile_sm(filename_template, 0, 0600);
|
2015-08-10 12:47:43 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
/* See "mks_tempfile functions" above. */
|
2019-04-29 11:28:14 +03:00
|
|
|
struct tempfile *mks_tempfile_tsm(const char *filename_template,
|
2019-04-29 11:28:23 +03:00
|
|
|
int suffixlen, int mode);
|
2015-08-10 12:47:43 +03:00
|
|
|
|
|
|
|
/* See "mks_tempfile functions" above. */
|
2018-02-14 21:59:57 +03:00
|
|
|
static inline struct tempfile *mks_tempfile_ts(const char *filename_template,
|
tempfile: auto-allocate tempfiles on heap
The previous commit taught the tempfile code to give up
ownership over tempfiles that have been renamed or deleted.
That makes it possible to use a stack variable like this:
struct tempfile t;
create_tempfile(&t, ...);
...
if (!err)
rename_tempfile(&t, ...);
else
delete_tempfile(&t);
But doing it this way has a high potential for creating
memory errors. The tempfile we pass to create_tempfile()
ends up on a global linked list, and it's not safe for it to
go out of scope until we've called one of those two
deactivation functions.
Imagine that we add an early return from the function that
forgets to call delete_tempfile(). With a static or heap
tempfile variable, the worst case is that the tempfile hangs
around until the program exits (and some functions like
setup_shallow_temporary rely on this intentionally, creating
a tempfile and then leaving it for later cleanup).
But with a stack variable as above, this is a serious memory
error: the variable goes out of scope and may be filled with
garbage by the time the tempfile code looks at it. Let's
see if we can make it harder to get this wrong.
Since many callers need to allocate arbitrary numbers of
tempfiles, we can't rely on static storage as a general
solution. So we need to turn to the heap. We could just ask
all callers to pass us a heap variable, but that puts the
burden on them to call free() at the right time.
Instead, let's have the tempfile code handle the heap
allocation _and_ the deallocation (when the tempfile is
deactivated and removed from the list).
This changes the return value of all of the creation
functions. For the cleanup functions (delete and rename),
we'll add one extra bit of safety: instead of taking a
tempfile pointer, we'll take a pointer-to-pointer and set it
to NULL after freeing the object. This makes it safe to
double-call functions like delete_tempfile(), as the second
call treats the NULL input as a noop. Several callsites
follow this pattern.
The resulting patch does have a fair bit of noise, as each
caller needs to be converted to handle:
1. Storing a pointer instead of the struct itself.
2. Passing the pointer instead of taking the struct
address.
3. Handling a "struct tempfile *" return instead of a file
descriptor.
We could play games to make this less noisy. For example, by
defining the tempfile like this:
struct tempfile {
struct heap_allocated_part_of_tempfile {
int fd;
...etc
} *actual_data;
}
Callers would continue to have a "struct tempfile", and it
would be "active" only when the inner pointer was non-NULL.
But that just makes things more awkward in the long run.
There aren't that many callers, so we can simply bite
the bullet and adjust all of them. And the compiler makes it
easy for us to find them all.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-05 15:15:08 +03:00
|
|
|
int suffixlen)
|
2015-08-10 12:47:43 +03:00
|
|
|
{
|
2018-02-14 21:59:57 +03:00
|
|
|
return mks_tempfile_tsm(filename_template, suffixlen, 0600);
|
2015-08-10 12:47:43 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
/* See "mks_tempfile functions" above. */
|
2018-02-14 21:59:57 +03:00
|
|
|
static inline struct tempfile *mks_tempfile_tm(const char *filename_template, int mode)
|
2015-08-10 12:47:43 +03:00
|
|
|
{
|
2018-02-14 21:59:57 +03:00
|
|
|
return mks_tempfile_tsm(filename_template, 0, mode);
|
2015-08-10 12:47:43 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
/* See "mks_tempfile functions" above. */
|
2018-02-14 21:59:57 +03:00
|
|
|
static inline struct tempfile *mks_tempfile_t(const char *filename_template)
|
2015-08-10 12:47:43 +03:00
|
|
|
{
|
2018-02-14 21:59:57 +03:00
|
|
|
return mks_tempfile_tsm(filename_template, 0, 0600);
|
2015-08-10 12:47:43 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
/* See "mks_tempfile functions" above. */
|
2019-04-29 11:28:14 +03:00
|
|
|
struct tempfile *xmks_tempfile_m(const char *filename_template, int mode);
|
2015-08-10 12:47:43 +03:00
|
|
|
|
|
|
|
/* See "mks_tempfile functions" above. */
|
2018-02-14 21:59:57 +03:00
|
|
|
static inline struct tempfile *xmks_tempfile(const char *filename_template)
|
2015-08-10 12:47:43 +03:00
|
|
|
{
|
2018-02-14 21:59:57 +03:00
|
|
|
return xmks_tempfile_m(filename_template, 0600);
|
2015-08-10 12:47:43 +03:00
|
|
|
}
|
|
|
|
|
2022-04-20 23:26:09 +03:00
|
|
|
/*
|
|
|
|
* Attempt to create a temporary directory in $TMPDIR and to create and
|
|
|
|
* open a file in that new directory. Derive the directory name from the
|
|
|
|
* template in the manner of mkdtemp(). Arrange for directory and file
|
|
|
|
* to be deleted if the program exits before they are deleted
|
|
|
|
* explicitly. On success return a tempfile whose "filename" member
|
|
|
|
* contains the full path of the file and its "fd" member is open for
|
|
|
|
* writing the file. On error return NULL and set errno appropriately.
|
|
|
|
*/
|
|
|
|
struct tempfile *mks_tempfile_dt(const char *directory_template,
|
|
|
|
const char *filename);
|
|
|
|
|
2015-08-10 12:47:41 +03:00
|
|
|
/*
|
|
|
|
* Associate a stdio stream with the temporary file (which must still
|
|
|
|
* be open). Return `NULL` (*without* deleting the file) on error. The
|
tempfile: do not delete tempfile on failed close
When close_tempfile() fails, we delete the tempfile and
reset the fields of the tempfile struct. This makes it
easier for callers to return without cleaning up, but it
also makes this common pattern:
if (close_tempfile(tempfile))
return error_errno("error closing %s", tempfile->filename.buf);
wrong, because the "filename" field has been reset after the
failed close. And it's not easy to fix, as in many cases we
don't have another copy of the filename (e.g., if it was
created via one of the mks_tempfile functions, and we just
have the original template string).
Let's drop the feature that a failed close automatically
deletes the file. This puts the burden on the caller to do
the deletion themselves, but this isn't that big a deal.
Callers which do:
if (write(...) || close_tempfile(...)) {
delete_tempfile(...);
return -1;
}
already had to call delete when the write() failed, and so
aren't affected. Likewise, any caller which just calls die()
in the error path is OK; we'll delete the tempfile during
the atexit handler.
Because this patch changes the semantics of close_tempfile()
without changing its signature, all callers need to be
manually checked and converted to the new scheme. This patch
covers all in-tree callers, but there may be others for
not-yet-merged topics. To catch these, we rename the
function to close_tempfile_gently(), which will attract
compile-time attention to new callers. (Technically the
original could be considered "gentle" already in that it
didn't die() on errors, but this one is even more so).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-05 15:14:30 +03:00
|
|
|
* stream is closed automatically when `close_tempfile_gently()` is called or
|
2015-08-10 12:47:41 +03:00
|
|
|
* when the file is deleted or renamed.
|
|
|
|
*/
|
2019-04-29 11:28:14 +03:00
|
|
|
FILE *fdopen_tempfile(struct tempfile *tempfile, const char *mode);
|
2015-08-10 12:47:41 +03:00
|
|
|
|
|
|
|
static inline int is_tempfile_active(struct tempfile *tempfile)
|
|
|
|
{
|
tempfile: drop active flag
Our tempfile struct contains an "active" flag. Long ago, this flag was
important: tempfile structs were always allocated for the lifetime of
the program and added to a global linked list, and the active flag was
what told us whether a struct's tempfile needed to be cleaned up on
exit.
But since 422a21c6a0 (tempfile: remove deactivated list entries,
2017-09-05) and 076aa2cbda (tempfile: auto-allocate tempfiles on heap,
2017-09-05), we actually remove items from the list, and the active flag
is generally always set to true for any allocated struct. We set it to
true in all of the creation functions, and in the normal code flow it
becomes false only in deactivate_tempfile(), which then immediately
frees the struct.
So the flag isn't performing that role anymore, and in fact makes things
more confusing. Dscho noted that delete_tempfile() is a noop for an
inactive struct. Since 076aa2cbda taught it to free the struct when
deactivating, we'd leak any struct whose active flag is unset. But in
practice it's not a leak, because again, we'll free when we unset the
flag, and never see the allocated-but-inactive state.
Can we just get rid of the flag? The answer is yes, but it requires
looking at a few other spots:
1. I said above that the flag only becomes false before we deallocate,
but there's one exception: when we call remove_tempfiles() from a
signal or atexit handler, we unset the active flag as we remove
each file. This isn't important for delete_tempfile(), as nobody
would call it anymore, since we're exiting.
It does in theory provide us some protection against racily
double-removing a tempfile. If we receive a second signal while we
are already in the cleanup routines, we'll start the cleanup loop
again, and may visit the same tempfile. But this race already
exists, because calling unlink() and unsetting the active flag
aren't atomic! And it's OK in practice, because unlink() is
idempotent (barring the unlikely event that some other process
chooses our exact temp filename in that instant).
So dropping the active flag widens the race a bit, but it was
already there, and is fairly harmless in practice. If we really
care about addressing it, the right thing is probably to block
further signals while we're doing our cleanup (which we could
actually do atomically).
2. The active flag is declared as "volatile sig_atomic_t". The idea is
that it's the final bit that gets set to tell the cleanup routines
that the tempfile is ready to be used (or not used), and it's safe
to receive a signal racing with regular code which adds or removes
a tempfile from the list.
In practice, I don't think this is buying us anything. The presence
on the linked list is really what tells the cleanup routines to
look at the struct. That is already marked as "volatile". It's not
a sig_atomic_t, so it's possible that we could see a sheared write
there as an entry is added or removed. But that is true of the
current code, too! Before we can even look at the "active" flag,
we'd have to follow a link to the struct itself. If we see a
sheared write in the pointer to the struct, then we'll look at
garbage memory anyway, and there's not much we can do.
This patch removes the active flag entirely, using presence on the
global linked list as an indicator that a tempfile ought to be cleaned
up. We are already careful to add to the list as the final step in
activating. On deactivation, we'll make sure to remove from the list as
the first step, before freeing any fields. The use of the volatile
keyword should mean that those things happen in the expected order.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-30 22:45:06 +03:00
|
|
|
return !!tempfile;
|
2015-08-10 12:47:41 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Return the path of the lockfile. The return value is a pointer to a
|
|
|
|
* field within the lock_file object and should not be freed.
|
|
|
|
*/
|
2019-04-29 11:28:14 +03:00
|
|
|
const char *get_tempfile_path(struct tempfile *tempfile);
|
2015-08-10 12:47:41 +03:00
|
|
|
|
2019-04-29 11:28:14 +03:00
|
|
|
int get_tempfile_fd(struct tempfile *tempfile);
|
|
|
|
FILE *get_tempfile_fp(struct tempfile *tempfile);
|
2015-08-10 12:47:41 +03:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If the temporary file is still open, close it (and the file pointer
|
|
|
|
* too, if it has been opened using `fdopen_tempfile()`) without
|
|
|
|
* deleting the file. Return 0 upon success. On failure to `close(2)`,
|
tempfile: do not delete tempfile on failed close
When close_tempfile() fails, we delete the tempfile and
reset the fields of the tempfile struct. This makes it
easier for callers to return without cleaning up, but it
also makes this common pattern:
if (close_tempfile(tempfile))
return error_errno("error closing %s", tempfile->filename.buf);
wrong, because the "filename" field has been reset after the
failed close. And it's not easy to fix, as in many cases we
don't have another copy of the filename (e.g., if it was
created via one of the mks_tempfile functions, and we just
have the original template string).
Let's drop the feature that a failed close automatically
deletes the file. This puts the burden on the caller to do
the deletion themselves, but this isn't that big a deal.
Callers which do:
if (write(...) || close_tempfile(...)) {
delete_tempfile(...);
return -1;
}
already had to call delete when the write() failed, and so
aren't affected. Likewise, any caller which just calls die()
in the error path is OK; we'll delete the tempfile during
the atexit handler.
Because this patch changes the semantics of close_tempfile()
without changing its signature, all callers need to be
manually checked and converted to the new scheme. This patch
covers all in-tree callers, but there may be others for
not-yet-merged topics. To catch these, we rename the
function to close_tempfile_gently(), which will attract
compile-time attention to new callers. (Technically the
original could be considered "gentle" already in that it
didn't die() on errors, but this one is even more so).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-05 15:14:30 +03:00
|
|
|
* return a negative value. Usually `delete_tempfile()` or `rename_tempfile()`
|
|
|
|
* should eventually be called regardless of whether `close_tempfile_gently()`
|
|
|
|
* succeeds.
|
2015-08-10 12:47:41 +03:00
|
|
|
*/
|
2019-04-29 11:28:14 +03:00
|
|
|
int close_tempfile_gently(struct tempfile *tempfile);
|
2015-08-10 12:47:41 +03:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Re-open a temporary file that has been closed using
|
tempfile: do not delete tempfile on failed close
When close_tempfile() fails, we delete the tempfile and
reset the fields of the tempfile struct. This makes it
easier for callers to return without cleaning up, but it
also makes this common pattern:
if (close_tempfile(tempfile))
return error_errno("error closing %s", tempfile->filename.buf);
wrong, because the "filename" field has been reset after the
failed close. And it's not easy to fix, as in many cases we
don't have another copy of the filename (e.g., if it was
created via one of the mks_tempfile functions, and we just
have the original template string).
Let's drop the feature that a failed close automatically
deletes the file. This puts the burden on the caller to do
the deletion themselves, but this isn't that big a deal.
Callers which do:
if (write(...) || close_tempfile(...)) {
delete_tempfile(...);
return -1;
}
already had to call delete when the write() failed, and so
aren't affected. Likewise, any caller which just calls die()
in the error path is OK; we'll delete the tempfile during
the atexit handler.
Because this patch changes the semantics of close_tempfile()
without changing its signature, all callers need to be
manually checked and converted to the new scheme. This patch
covers all in-tree callers, but there may be others for
not-yet-merged topics. To catch these, we rename the
function to close_tempfile_gently(), which will attract
compile-time attention to new callers. (Technically the
original could be considered "gentle" already in that it
didn't die() on errors, but this one is even more so).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-05 15:14:30 +03:00
|
|
|
* `close_tempfile_gently()` but not yet deleted or renamed. This can be used
|
2015-08-10 12:47:41 +03:00
|
|
|
* to implement a sequence of operations like the following:
|
|
|
|
*
|
|
|
|
* * Create temporary file.
|
|
|
|
*
|
tempfile: do not delete tempfile on failed close
When close_tempfile() fails, we delete the tempfile and
reset the fields of the tempfile struct. This makes it
easier for callers to return without cleaning up, but it
also makes this common pattern:
if (close_tempfile(tempfile))
return error_errno("error closing %s", tempfile->filename.buf);
wrong, because the "filename" field has been reset after the
failed close. And it's not easy to fix, as in many cases we
don't have another copy of the filename (e.g., if it was
created via one of the mks_tempfile functions, and we just
have the original template string).
Let's drop the feature that a failed close automatically
deletes the file. This puts the burden on the caller to do
the deletion themselves, but this isn't that big a deal.
Callers which do:
if (write(...) || close_tempfile(...)) {
delete_tempfile(...);
return -1;
}
already had to call delete when the write() failed, and so
aren't affected. Likewise, any caller which just calls die()
in the error path is OK; we'll delete the tempfile during
the atexit handler.
Because this patch changes the semantics of close_tempfile()
without changing its signature, all callers need to be
manually checked and converted to the new scheme. This patch
covers all in-tree callers, but there may be others for
not-yet-merged topics. To catch these, we rename the
function to close_tempfile_gently(), which will attract
compile-time attention to new callers. (Technically the
original could be considered "gentle" already in that it
didn't die() on errors, but this one is even more so).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-05 15:14:30 +03:00
|
|
|
* * Write new contents to file, then `close_tempfile_gently()` to cause the
|
2015-08-10 12:47:41 +03:00
|
|
|
* contents to be written to disk.
|
|
|
|
*
|
|
|
|
* * Pass the name of the temporary file to another program to allow
|
|
|
|
* it (and nobody else) to inspect or even modify the file's
|
|
|
|
* contents.
|
|
|
|
*
|
reopen_tempfile(): truncate opened file
We provide a reopen_tempfile() function, which is in turn
used by reopen_lockfile(). The idea is that a caller may
want to rewrite the tempfile without letting go of the lock.
And that's what our one caller does: after running
add--interactive, "commit -p" will update the cache-tree
extension of the index and write out the result, all while
holding the lock.
However, because we open the file with only the O_WRONLY
flag, the existing index content is left in place, and we
overwrite it starting at position 0. If the new index after
updating the cache-tree is smaller than the original, those
final bytes are not overwritten and remain in the file. This
results in a corrupt index, since those cruft bytes are
interpreted as part of the trailing hash (or even as an
extension, if there are enough bytes).
This bug actually pre-dates reopen_tempfile(); the original
code from 9c4d6c0297 (cache-tree: Write updated cache-tree
after commit, 2014-07-13) has the same bug, and those lines
were eventually refactored into the tempfile module. Nobody
noticed until now for two reasons:
- the bug can only be triggered in interactive mode
("commit -p" or "commit -i")
- the size of the index must shrink after updating the
cache-tree, which implies a non-trivial deletion. Notice
that the included test actually has to create a 2-deep
hierarchy. A single level is not enough to actually cause
shrinkage.
The fix is to truncate the file before writing out the
second index. We can do that at the caller by using
ftruncate(). But we shouldn't have to do that. There is no
other place in Git where we want to open a file and
overwrite bytes, making reopen_tempfile() a confusing and
error-prone interface. Let's pass O_TRUNC there, which gives
callers the same state they had after initially opening the
file or lock.
It's possible that we could later add a caller that wants
something else (e.g., to open with O_APPEND). But this is
the only caller we've had in the history of the codebase.
Let's punt on doing anything more clever until another one
comes along.
Reported-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-05 02:36:43 +03:00
|
|
|
* * `reopen_tempfile()` to reopen the temporary file, truncating the existing
|
|
|
|
* contents. Write out the new contents.
|
2015-08-10 12:47:41 +03:00
|
|
|
*
|
|
|
|
* * `rename_tempfile()` to move the file to its permanent location.
|
|
|
|
*/
|
2019-04-29 11:28:14 +03:00
|
|
|
int reopen_tempfile(struct tempfile *tempfile);
|
2015-08-10 12:47:41 +03:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Close the file descriptor and/or file pointer and remove the
|
|
|
|
* temporary file associated with `tempfile`. It is a NOOP to call
|
|
|
|
* `delete_tempfile()` for a `tempfile` object that has already been
|
|
|
|
* deleted or renamed.
|
|
|
|
*/
|
2019-04-29 11:28:14 +03:00
|
|
|
void delete_tempfile(struct tempfile **tempfile_p);
|
2015-08-10 12:47:41 +03:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Close the file descriptor and/or file pointer if they are still
|
|
|
|
* open, and atomically rename the temporary file to `path`. `path`
|
|
|
|
* must be on the same filesystem as the lock file. Return 0 on
|
|
|
|
* success. On failure, delete the temporary file and return -1, with
|
|
|
|
* `errno` set to the value from the failing call to `close(2)` or
|
|
|
|
* `rename(2)`. It is a bug to call `rename_tempfile()` for a
|
|
|
|
* `tempfile` object that is not currently active.
|
|
|
|
*/
|
2019-04-29 11:28:14 +03:00
|
|
|
int rename_tempfile(struct tempfile **tempfile_p, const char *path);
|
2015-08-10 12:47:41 +03:00
|
|
|
|
|
|
|
#endif /* TEMPFILE_H */
|