Remove the call to memcpy() and strchr() for each path component
tested, and instead add each path component as we go forward inside
the while-loop.
Impact: small optimisation
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Swap function argument pair (length, string) into (string, length) to
conform with the commonly used order inside the GIT source code.
Also, add a note about this fact into the coding guidelines.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* kb/lstat-cache:
lstat_cache(): introduce clear_lstat_cache() function
lstat_cache(): introduce invalidate_lstat_cache() function
lstat_cache(): introduce has_dirs_only_path() function
lstat_cache(): introduce has_symlink_or_noent_leading_path() function
lstat_cache(): more cache effective symlink/directory detection
The create_directories() function in entry.c currently calls stat()
or lstat() for each path component of the pathname 'path' each and every
time. For the 'git checkout' command, this function is called on each
file for which we must do an update (ce->ce_flags & CE_UPDATE), so we get
lots and lots of calls.
To fix this, we make a new wrapper to the lstat_cache() function, and
call the wrapper function instead of the calls to the stat() or the
lstat() functions. Since the paths given to the create_directories()
function, is sorted alphabetically, the new wrapper would be very
cache effective in this situation.
To support it we must update the lstat_cache() function to be able to
say that "please test the complete length of 'name'", and also to give
it the length of a prefix, where the cache should use the stat()
function instead of the lstat() function to test each path component.
Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable
comments to this patch!
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A new inline function is_dot_or_dotdot is used to check if the
directory name is either "." or "..". It returns a non-zero value if
the given string is "." or "..". It's applicable to a lot of Git
source code.
Signed-off-by: Alexander Potashev <aspotashev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a mechanical conversion of all '*.c' files with:
s/((?:die|error|warning)\("git)-(\S+:)/$1 $2/;
The result was manually inspected and no false positive was found.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently when checking out an entry "path", we try to unlink(2) it first
(because there could be stale file), and if there is a directory there,
try to deal with it (typically we run recursive rmdir). We ignore the
error return from this unlink because there may not even be any file
there.
However if you are root on Solaris, you can unlink(2) a directory
successfully and corrupt your filesystem.
This moves the code around and check the directory first, and then
unlink(2). Also we check the error code from it.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This converts the index explicitly on read and write to its on-disk
format, allowing the in-core format to contain more flags, and be
simpler.
In particular, the in-core format is now host-endian (as opposed to the
on-disk one that is network endian in order to be able to be shared
across machines) and as a result we can dispense with all the
htonl/ntohl on accesses to the cache_entry fields.
This will make it easier to make use of various temporary flags that do
not exist in the on-disk format.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* jc/maint-add-sync-stat:
t2200: test more cases of "add -u"
git-add: make the entry stat-clean after re-adding the same contents
ce_match_stat, run_diff_files: use symbolic constants for readability
Conflicts:
builtin-add.c
ce_match_stat() can be told:
(1) to ignore CE_VALID bit (used under "assume unchanged" mode)
and perform the stat comparison anyway;
(2) not to perform the contents comparison for racily clean
entries and report mismatch of cached stat information;
using its "option" parameter. Give them symbolic constants.
Similarly, run_diff_files() can be told not to report anything
on removed paths. Also give it a symbolic constant for that.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix size_t vs. unsigned long pointer mismatch warnings introduced
with the addition of strbuf_detach().
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
For that purpose, the ->buf is always initialized with a char * buf living
in the strbuf module. It is made a char * so that we can sloppily accept
things that perform: sb->buf[0] = '\0', and because you can't pass "" as an
initializer for ->buf without making gcc unhappy for very good reasons.
strbuf_init/_detach/_grow have been fixed to trust ->alloc and not ->buf
anymore.
as a consequence strbuf_detach is _mandatory_ to detach a buffer, copying
->buf isn't an option anymore, if ->buf is going to escape from the scope,
and eventually be free'd.
API changes:
* strbuf_setlen now always works, so just make strbuf_reset a convenience
macro.
* strbuf_detatch takes a size_t* optional argument (meaning it can be
NULL) to copy the buffer's len, as it was needed for this refactor to
make the code more readable, and working like the callers.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Now, those functions take an "out" strbuf argument, where they store their
result if any. In that case, it also returns 1, else it returns 0.
* those functions support "in place" editing, in the sense that it's OK to
call them this way:
convert_to_git(path, sb->buf, sb->len, sb);
When doable, conversions are done in place for real, else the strbuf
content is just replaced with the new one, transparentely for the caller.
If you want to create a new filter working this way, being the accumulation
of filter1, filter2, ... filtern, then your meta_filter would be:
int meta_filter(..., const char *src, size_t len, struct strbuf *sb)
{
int ret = 0;
ret |= filter1(...., src, len, sb);
if (ret) {
src = sb->buf;
len = sb->len;
}
ret |= filter2(...., src, len, sb);
if (ret) {
src = sb->buf;
len = sb->len;
}
....
return ret | filtern(..., src, len, sb);
}
That's why subfilters the convert_to_* functions called were also rewritten
to work this way.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes .gitattributes files to be read from the index when
they are not checked out to the work tree. This is in line with
the way we always allowed low-level tools to operate in sparsely
checked out work tree in a reasonable way.
It swaps the order of new file creation and converting the blob
to work tree representation; otherwise when we are in the middle
of checking out .gitattributes we would notice an empty but
unwritten .gitattributes file in the work tree and will ignore
the copy in the index.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "git checkout-index" checks out path A/B/C, it makes sure A
and A/B are truly directories; if there is a regular file or
symlink at A, we prefer to remove it.
We used to do this by catching an error return from mkdir(2),
and on EEXIST did unlink(2), and when it succeeded, tried
another mkdir(2).
Thomas Glanzmann found out the above does not work on Solaris
for a root user, as unlink(2) was so old fashioned there that it
allowed to unlink a directory.
As pointed out, this still doesn't guarantee that git won't call
"unlink()" on a directory (race conditions etc), but that's
fundamentally true (there is no "funlink()" like there is
"fstat()"), and besides, that is in no way git-specific (ie it's
true of any application that gets run as root).
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This uses "git-apply --whitespace=strip" to fix whitespace errors that have
crept in to our source files over time. There are a few files that need
to have trailing whitespaces (most notably, test vectors). The results
still passes the test, and build result in Documentation/ area is unchanged.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* 'jc/attr': (28 commits)
lockfile: record the primary process.
convert.c: restructure the attribute checking part.
Fix bogus linked-list management for user defined merge drivers.
Simplify calling of CR/LF conversion routines
Document gitattributes(5)
Update 'crlf' attribute semantics.
Documentation: support manual section (5) - file formats.
Simplify code to find recursive merge driver.
Counto-fix in merge-recursive
Fix funny types used in attribute value representation
Allow low-level driver to specify different behaviour during internal merge.
Custom low-level merge driver: change the configuration scheme.
Allow the default low-level merge driver to be configured.
Custom low-level merge driver support.
Add a demonstration/test of customized merge.
Allow specifying specialized merge-backend per path.
merge-recursive: separate out xdl_merge() interface.
Allow more than true/false to attributes.
Document git-check-attr
Change attribute negation marker from '!' to '-'.
...
This actually allows us to check out a supermodule after cloning, although
the submodules themselves will obviously not be checked out, and will just
be empty directories.
Checking out the submodules will be up to higher levels - we may not even
want to!
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Some file systems that can host git repositories and their working copies
do not support symbolic links. But then if the repository contains a symbolic
link, it is impossible to check out the working copy.
This patch enables partial support of symbolic links so that it is possible
to check out a working copy on such a file system. A new flag
core.symlinks (which is true by default) can be set to false to indicate
that the filesystem does not support symbolic links. In this case, symbolic
links that exist in the trees are checked out as small plain files, and
checking in modifications of these files preserve the symlink property in
the database (as long as an entry exists in the index).
Of course, this does not magically make symbolic links work on such defective
file systems; hence, this solution does not help if the working copy relies
on that an entry is a real symbolic link.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We currently have two parallel notation for dealing with object types
in the code: a string and a numerical value. One of them is obviously
redundent, and the most used one requires more stack space and a bunch
of strcmp() all over the place.
This is an initial step for the removal of the version using a char array
found in object reading code paths. The patch is unfortunately large but
there is no sane way to split it in smaller parts without breaking the
system.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
It currently does NOT know about file attributes, so it does its
conversion purely based on content. Maybe that is more in the "git
philosophy" anyway, since content is king, but I think we should try to do
the file attributes to turn it off on demand.
Anyway, BY DEFAULT it is off regardless, because it requires a
[core]
AutoCRLF = true
in your config file to be enabled. We could make that the default for
Windows, of course, the same way we do some other things (filemode etc).
But you can actually enable it on UNIX, and it will cause:
- "git update-index" will write blobs without CRLF
- "git diff" will diff working tree files without CRLF
- "git checkout" will write files to the working tree _with_ CRLF
and things work fine.
Funnily, it actually shows an odd file in git itself:
git clone -n git test-crlf
cd test-crlf
git config core.autocrlf true
git checkout
git diff
shows a diff for "Documentation/docbook-xsl.css". Why? Because we have
actually checked in that file *with* CRLF! So when "core.autocrlf" is
true, we'll always generate a *different* hash for it in the index,
because the index hash will be for the content _without_ CRLF.
Is this complete? I dunno. It seems to work for me. It doesn't use the
filename at all right now, and that's probably a deficiency (we could
certainly make the "is_binary()" heuristics also take standard filename
heuristics into account).
I don't pass in the filename at all for the "index_fd()" case
(git-update-index), so that would need to be passed around, but this
actually works fine.
NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours
truly. I will not guarantee that they work at all reasonable. Caveat
emptor. But it _is_ simple, and it _is_ safe, since it's all off by
default.
The patch is pretty simple - the biggest part is the new "convert.c" file,
but even that is really just basic stuff that anybody can write in
"Teaching C 101" as a final project for their first class in programming.
Not to say that it's bug-free, of course - but at least we're not talking
about rocket surgery here.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Here's a patch that I think we can merge right now. There may be
other places that need this, but this at least points out the
three places that read/write working tree files for git
update-index, checkout and diff respectively. That should cover
a lot of it [jc: git-apply uses an entirely different codepath
both for reading and writing].
Some day we can actually implement it. In the meantime, this
points out a place for people to start. We *can* even start with
a really simple "we do CRLF conversion automatically, regardless
of filename" kind of approach, that just look at the data (all
three cases have the _full_ file data already in memory) and
says "ok, this is text, so let's convert to/from DOS format
directly".
THAT somebody can write in ten minutes, and it would already
make git much nicer on a DOS/Windows platform, I suspect.
And it would be totally zero-cost if you just make it a config
option (but please make it dynamic with the _default_ just being
0/1 depending on whether it's UNIX/Windows, just so that UNIX
people can _test_ it easily).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We have a number of badly checked write() calls. Often we are
expecting write() to write exactly the size we requested or fail,
this fails to handle interrupts or short writes. Switch to using
the new write_in_full(). Otherwise we at a minimum need to check
for EINTR and EAGAIN, where this is appropriate use xwrite().
Note, the changes to config handling are much larger and handled
in the next patch in the sequence.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This is a mechanical clean-up of the way *.c files include
system header files.
(1) sources under compat/, platform sha-1 implementations, and
xdelta code are exempt from the following rules;
(2) the first #include must be "git-compat-util.h" or one of
our own header file that includes it first (e.g. config.h,
builtin.h, pkt-line.h);
(3) system headers that are included in "git-compat-util.h"
need not be included in individual C source files.
(4) "git-compat-util.h" does not have to include subsystem
specific header files (e.g. expat.h).
Signed-off-by: Junio C Hamano <junkio@cox.net>
According to sys/paramh.h it's a "BSD name" for values defined in
<limits.h>. Besides PATH_MAX seems to be more commonly used.
Signed-off-by: Jonas Fonseca <fonseca@diku.dk>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This replaces occurences of "blob", "commit", "tag", and "tree",
where they're really used as type specifiers, which we already
have defined global constants for.
Signed-off-by: Peter Eriksen <s022018@student.dtu.dk>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Sometimes it is convient for a Porcelain to be able to checkout all
unmerged files in all stages so that an external merge tool can be
executed by the Porcelain or the end-user. Using git-unpack-file
on each stage individually incurs a rather high penalty due to the
need to fork for each file version obtained. git-checkout-index -a
--stage=all will now do the same thing, but faster.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds "assume unchanged" logic, started by this message in the list
discussion recently:
<Pine.LNX.4.64.0601311807470.7301@g5.osdl.org>
This is a workaround for filesystems that do not have lstat()
that is quick enough for the index mechanism to take advantage
of. On the paths marked as "assumed to be unchanged", the user
needs to explicitly use update-index to register the object name
to be in the next commit.
You can use two new options to update-index to set and reset the
CE_VALID bit:
git-update-index --assume-unchanged path...
git-update-index --no-assume-unchanged path...
These forms manipulate only the CE_VALID bit; it does not change
the object name recorded in the index file. Nor they add a new
entry to the index.
When the configuration variable "core.ignorestat = true" is set,
the index entries are marked with CE_VALID bit automatically
after:
- update-index to explicitly register the current object name to the
index file.
- when update-index --refresh finds the path to be up-to-date.
- when tools like read-tree -u and apply --index update the working
tree file and register the current object name to the index file.
The flag is dropped upon read-tree that does not check out the index
entry. This happens regardless of the core.ignorestat settings.
Index entries marked with CE_VALID bit are assumed to be
unchanged most of the time. However, there are cases that
CE_VALID bit is ignored for the sake of safety and usability:
- while "git-read-tree -m" or git-apply need to make sure
that the paths involved in the merge do not have local
modifications. This sacrifices performance for safety.
- when git-checkout-index -f -q -u -a tries to see if it needs
to checkout the paths. Otherwise you can never check
anything out ;-).
- when git-update-index --really-refresh (a new flag) tries to
see if the index entry is up to date. You can start with
everything marked as CE_VALID and run this once to drop
CE_VALID bit for paths that are modified.
Most notably, "update-index --refresh" honours CE_VALID and does
not actively stat, so after you modified a file in the working
tree, update-index --refresh would not notice until you tell the
index about it with "git-update-index path" or "git-update-index
--no-assume-unchanged path".
This version is not expected to be perfect. I think diff
between index and/or tree and working files may need some
adjustment, and there probably needs other cases we should
automatically unmark paths that are marked to be CE_VALID.
But the basics seem to work, and ready to be tested by people
who asked for this feature.
Signed-off-by: Junio C Hamano <junkio@cox.net>
If the index records an insanely long symbolic link, copying
into the temporary would overflow the buffer (noticed by Mark
Wooding).
Because read_sha1_file() terminates the returned buffer with NUL
since late May 2005, there is no reason to copy it anymore.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Without -f flag, 'git-checkout-index foo.c' issued an error message
when foo.c already existed in the working tree and did not match index.
However it did not return an error from the underlying checkout_entry()
function and resulted in a successful exit(0).
Signed-off-by: Junio C Hamano <junkio@cox.net>
As promised, this is the "big tool rename" patch. The primary differences
since 0.99.6 are:
(1) git-*-script are no more. The commands installed do not
have any such suffix so users do not have to remember if
something is implemented as a shell script or not.
(2) Many command names with 'cache' in them are renamed with
'index' if that is what they mean.
There are backward compatibility symblic links so that you and
Porcelains can keep using the old names, but the backward
compatibility support is expected to be removed in the near
future.
Signed-off-by: Junio C Hamano <junkio@cox.net>
The symlink case had never worked, and the file case was broken by the
O_EXCL change because the error return changed from EISDIR to EEXIST.
Fix both problems by just moving the test for an existing directory to a
more logical place.
We should always have unlinked any old ones before, but this just makes
sure that we never over-write any old file.
A quick "grep" now shows that all the core tools that open files for
writing use O_EXCL, ie we never overwrite an existing file in place.
IIRC our strategy was to let the users' umask take care of the
final mode bits. This patch fixes places that deviate from it.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>