LIBCURL-STRUCTS: new document
This is the first version of this new document, detailing the seven perhaps
most important internal structs in libcurl source code:

1.1 SessionHandle
1.2 connectdata
1.3 Curl_multi
1.4 Curl_handler
1.5 conncache
1.6 Curl_share
1.7 CookieInfo
Parent
785749405f
Commit
96749554fd
113
docs/INTERNALS
|
@ -111,6 +111,9 @@ Windows vs Unix
|
|||
Library
|
||||
=======
|
||||
|
||||
(See LIBCURL-STRUCTS for a separate document describing all major internal
|
||||
structs and their purposes.)
|
||||
|
||||
There are plenty of entry points to the library, namely each publicly defined
|
||||
function that libcurl offers to applications. All of those functions are
|
||||
rather small and easy-to-follow. All the ones prefixed with 'curl_easy' are
|
||||
|
@ -135,16 +138,18 @@ Library
|
|||
options is documented in the man page. This function mainly sets things in
|
||||
the 'SessionHandle' struct.
|
||||
|
||||
curl_easy_perform() does a whole lot of things:
|
||||
curl_easy_perform() is just a wrapper function that makes use of the multi
|
||||
API. It basically calls curl_multi_init(), curl_multi_add_handle(),
|
||||
curl_multi_wait(), and curl_multi_perform() until the transfer is done and
|
||||
then returns.
|
||||
|
||||
It starts off in the lib/easy.c file by calling Curl_perform() and the main
|
||||
work then continues in lib/url.c. The flow continues with a call to
|
||||
Curl_connect() to connect to the remote site.
|
||||
Some of the most important key functions in url.c are called from multi.c
|
||||
when certain key steps are to be made in the transfer operation.
|
||||
|
||||
o Curl_connect()
|
||||
|
||||
... analyzes the URL, it separates the different components and connects to
|
||||
the remote host. This may involve using a proxy and/or using SSL. The
|
||||
Analyzes the URL, it separates the different components and connects to the
|
||||
remote host. This may involve using a proxy and/or using SSL. The
|
||||
Curl_resolv() function in lib/hostip.c is used for looking up host names
|
||||
(it does then use the proper underlying method, which may vary between
|
||||
platforms and builds).
|
||||
|
@ -160,10 +165,7 @@ Library
|
|||
o Curl_do()
|
||||
|
||||
Curl_do() makes sure the proper protocol-specific function is called. The
|
||||
functions are named after the protocols they handle. Curl_ftp(),
|
||||
Curl_http(), Curl_dict(), etc. They all reside in their respective files
|
||||
(ftp.c, http.c and dict.c). HTTPS is handled by Curl_http() and FTPS by
|
||||
Curl_ftp().
|
||||
functions are named after the protocols they handle.
|
||||
|
||||
The protocol-specific functions of course deal with protocol-specific
|
||||
negotiations and setup. They have access to the Curl_sendf() (from
|
||||
|
@ -182,10 +184,9 @@ Library
|
|||
be called with some basic info about the upcoming transfer: what socket(s)
|
||||
to read/write and the expected file transfer sizes (if known).
|
||||
|
||||
o Transfer()
|
||||
o Curl_readwrite()
|
||||
|
||||
Curl_perform() then calls Transfer() in lib/transfer.c that performs the
|
||||
entire file transfer.
|
||||
Called during the transfer of the actual protocol payload.
|
||||
|
||||
During transfer, the progress functions in lib/progress.c are called at a
|
||||
frequent interval (or at the user's choice, a specified callback might get
|
||||
|
@ -207,33 +208,11 @@ Library
|
|||
used. This function is only used when we are certain that no more transfers
|
||||
are going to be made on the connection. It can also be closed by force, or
|
||||
it can be called to make sure that libcurl doesn't keep too many
|
||||
connections alive at the same time (there's a default amount of 5 but that
|
||||
can be changed with the CURLOPT_MAXCONNECTS option).
|
||||
connections alive at the same time.
|
||||
|
||||
This function cleans up all resources that are associated with a single
|
||||
connection.
|
||||
|
||||
Curl_perform() is the function that does the main "connect - do - transfer -
|
||||
done" loop. It loops if there's a Location: to follow.
|
||||
|
||||
When completed, the curl_easy_cleanup() should be called to free up used
|
||||
resources. It runs Curl_disconnect() on all open connections.
|
||||
|
||||
A quick roundup on internal function sequences (many of these call
|
||||
protocol-specific function-pointers):
|
||||
|
||||
Curl_connect - connects to a remote site and does initial connect fluff
|
||||
This also checks for an existing connection to the requested site and uses
|
||||
that one if it is possible.
|
||||
|
||||
Curl_do - starts a transfer
|
||||
Curl_handler::do_it() - transfers data
|
||||
Curl_done - ends a transfer
|
||||
|
||||
Curl_disconnect - disconnects from a remote site. This is called when the
|
||||
disconnect is really requested, which doesn't necessarily have to be
|
||||
exactly after curl_done in case we want to keep the connection open for
|
||||
a while.
|
||||
|
||||
HTTP(S)
|
||||
|
||||
|
@ -316,48 +295,38 @@ Persistent Connections
|
|||
hold connection-oriented data. It is meant to hold the root data as well as
|
||||
all the options etc that the library-user may choose.
|
||||
o The 'SessionHandle' struct holds the "connection cache" (an array of
|
||||
pointers to 'connectdata' structs). There's one connectdata struct
|
||||
allocated for each connection that libcurl knows about. Note that when you
|
||||
use the multi interface, the multi handle will hold the connection cache
|
||||
and not the particular easy handle. This of course to allow all easy handles
|
||||
in a multi stack to be able to share and re-use connections.
|
||||
pointers to 'connectdata' structs).
|
||||
o This enables the 'curl handle' to be reused on subsequent transfers.
|
||||
o When we are about to perform a transfer with curl_easy_perform(), we first
|
||||
check for an already existing connection in the cache that we can use,
|
||||
otherwise we create a new one and add to the cache. If the cache is full
|
||||
already when we add a new connection, we close one of the present ones. We
|
||||
select which one to close dependent on the close policy that may have been
|
||||
previously set.
|
||||
o When the transfer operation is complete, we try to leave the connection
|
||||
open. Particular options may tell us not to, and protocols may signal
|
||||
closure on connections and then we don't keep it open of course.
|
||||
o When libcurl is told to perform a transfer, it first checks for an already
|
||||
existing connection in the cache that we can use. Otherwise it creates a
|
||||
new one and adds that to the cache. If the cache is full already when a new
|
||||
connection is added, it will first close the oldest unused one.
|
||||
o When the transfer operation is complete, the connection is left
|
||||
open. Particular options may tell libcurl not to, and protocols may signal
|
||||
closure on connections and then they won't be kept open of course.
|
||||
o When curl_easy_cleanup() is called, we close all still opened connections,
|
||||
unless of course the multi interface "owns" the connections.
|
||||
|
||||
You do realize that the curl handle must be re-used in order for the
|
||||
persistent connections to work.
|
||||
The curl handle must be re-used in order for the persistent connections to
|
||||
work.
|
||||
|
||||
multi interface/non-blocking
|
||||
============================
|
||||
|
||||
We make an effort to provide a non-blocking interface to the library, the
|
||||
multi interface. To make that interface work as good as possible, no
|
||||
low-level functions within libcurl must be written to work in a blocking
|
||||
manner.
|
||||
The multi interface is a non-blocking interface to the library. To make that
|
||||
interface work as well as possible, no low-level functions within libcurl
|
||||
must be written to work in a blocking manner. (There are still a few spots
|
||||
violating this rule.)
|
||||
|
||||
One of the primary reasons we introduced c-ares support was to allow the name
|
||||
resolve phase to be perfectly non-blocking as well.
|
||||
|
||||
The ultimate goal is to provide the easy interface simply by wrapping the
|
||||
multi interface functions and thus treat everything internally as the multi
|
||||
interface is the single interface we have.
|
||||
|
||||
The FTP and the SFTP/SCP protocols are thus perfect examples of how we adapt
|
||||
and adjust the code to allow non-blocking operations even on multi-stage
|
||||
protocols. They are built around state machines that return when they could
|
||||
block waiting for data. The DICT, LDAP and TELNET protocols are crappy
|
||||
examples and they are subject for rewrite in the future to better fit the
|
||||
libcurl protocol family.
|
||||
The FTP and the SFTP/SCP protocols are examples of how we adapt and adjust
|
||||
the code to allow non-blocking operations even on multi-stage command-
|
||||
response protocols. They are built around state machines that return when
|
||||
they would otherwise block waiting for data. The DICT, LDAP and TELNET
|
||||
protocols are crappy examples and they are subject for rewrite in the future
|
||||
to better fit the libcurl protocol family.
|
||||
|
||||
SSL libraries
|
||||
=============
|
||||
|
@ -408,12 +377,12 @@ API/ABI
|
|||
Client
|
||||
======
|
||||
|
||||
main() resides in src/main.c together with most of the client code.
|
||||
main() resides in src/tool_main.c.
|
||||
|
||||
src/tool_hugehelp.c is automatically generated by the mkhelp.pl perl script
|
||||
to display the complete "manual" and the src/urlglob.c file holds the
|
||||
functions used for the URL-"globbing" support. Globbing in the sense that
|
||||
the {} and [] expansion stuff is there.
|
||||
to display the complete "manual" and the src/tool_urlglob.c file holds the
|
||||
functions used for the URL-"globbing" support. Globbing in the sense that the
|
||||
{} and [] expansion stuff is there.
|
||||
|
||||
The client mostly messes around to setup its 'config' struct properly, then
|
||||
it calls the curl_easy_*() functions of the library and when it gets back
|
||||
|
@ -425,8 +394,8 @@ Client
|
|||
curl_easy_getinfo() function to extract useful information from the curl
|
||||
session.
|
||||
|
||||
Recent versions may loop and do all this several times if many URLs were
|
||||
specified on the command line or config file.
|
||||
It may loop and do all this several times if many URLs were specified on the
|
||||
command line or config file.
|
||||
|
||||
Memory Debugging
|
||||
================
|
||||
|
|
|
@ -0,0 +1,245 @@
|
|||
_ _ ____ _
|
||||
___| | | | _ \| |
|
||||
/ __| | | | |_) | |
|
||||
| (__| |_| | _ <| |___
|
||||
\___|\___/|_| \_\_____|
|
||||
|
||||
Structs in libcurl
|
||||
|
||||
This document should cover 7.32.0 pretty accurately, but will make sense even
|
||||
for older and later versions as things don't change drastically that often.
|
||||
|
||||
1. The main structs in libcurl
|
||||
1.1 SessionHandle
|
||||
1.2 connectdata
|
||||
1.3 Curl_multi
|
||||
1.4 Curl_handler
|
||||
1.5 conncache
|
||||
1.6 Curl_share
|
||||
1.7 CookieInfo
|
||||
|
||||
==============================================================================
|
||||
|
||||
1. The main structs in libcurl
|
||||
|
||||
1.1 SessionHandle
|
||||
|
||||
The SessionHandle struct is the one returned to the outside in the
|
||||
external API as a "CURL *". This is usually known as an easy handle in API
|
||||
documentation and examples.
|
||||
|
||||
Information and state that is related to the actual connection is in the
|
||||
'connectdata' struct. When a transfer is about to be made, libcurl will
|
||||
either create a new connection or re-use an existing one. The particular
|
||||
connectdata that is used by this handle is pointed out by
|
||||
SessionHandle->easy_conn.
|
||||
|
||||
Data and information regarding this particular single transfer are put in
|
||||
the SingleRequest sub-struct.
|
||||
|
||||
When the SessionHandle struct is added to a multi handle, as it must be in
|
||||
order to do any transfer, the ->multi member will point to the Curl_multi
|
||||
struct it belongs to. The ->prev and ->next members will then be used by the
|
||||
multi code to keep a linked list of SessionHandle structs that are added to
|
||||
that same multi handle. libcurl always uses multi so ->multi *will* point to
|
||||
a Curl_multi when a transfer is in progress.
|
||||
|
||||
->mstate is the multi state of this particular SessionHandle. When
|
||||
multi_runsingle() is called, it will act on this handle according to which
|
||||
state it is in. The mstate is also what tells which sockets to return for a
|
||||
specific SessionHandle when curl_multi_fdset() is called etc.
|
||||
|
||||
The libcurl source code generally uses the name 'data' for the variable that
|
||||
points to the SessionHandle.
|
||||
|
||||
|
||||
1.2 connectdata
|
||||
|
||||
A general idea in libcurl is to keep connections around in a connection
|
||||
"cache" after they have been used in case they will be used again and then
|
||||
re-use an existing one instead of creating a new one, as that gives a significant
|
||||
performance boost.
|
||||
|
||||
Each 'connectdata' identifies a single physical connection to a server. If
|
||||
the connection can't be kept alive, the connection will be closed after use
|
||||
and then this struct can be removed from the cache and freed.
|
||||
|
||||
Thus, the same SessionHandle can be used multiple times and each time select
|
||||
another connectdata struct to use for the connection. Keep this in mind, as
|
||||
it is then important to consider if options or choices are based on the
|
||||
connection or the SessionHandle.
|
||||
|
||||
Functions in libcurl will assume that connectdata->data points to the
|
||||
SessionHandle that uses this connection.
|
||||
|
||||
As a special complexity, some protocols supported by libcurl require a
|
||||
special disconnect procedure that is more than just shutting down the
|
||||
socket. It can involve sending one or more commands to the server before
|
||||
doing so. Since connections are kept in the connection cache after use, the
|
||||
original SessionHandle may no longer be around when the time comes to shut
|
||||
down a particular connection. For this purpose, libcurl holds a special
|
||||
dummy 'closure_handle' SessionHandle in the Curl_multi struct to
|
||||
|
||||
FTP uses two TCP connections for a typical transfer but it keeps both in
|
||||
this single struct and thus can be considered a single connection for most
|
||||
internal concerns.
|
||||
|
||||
The libcurl source code generally uses the name 'conn' for the variable that
|
||||
points to the connectdata.
|
||||
|
||||
|
||||
1.3 Curl_multi
|
||||
|
||||
Internally, the easy interface is implemented as a wrapper around multi
|
||||
interface functions. This makes everything multi interface.
|
||||
|
||||
Curl_multi is the multi handle struct exposed as "CURLM *" in external APIs.
|
||||
|
||||
This struct holds a list of SessionHandle structs that have been added to
|
||||
this handle with curl_multi_add_handle(). The start of the list is ->easyp
|
||||
and ->num_easy is a counter of added SessionHandles.
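
The list handling described above can be sketched roughly as follows. This
is a minimal model, not the actual libcurl code: the structs are reduced to
the members named in this document, and the sketch_add_handle() and
sketch_remove_handle() helpers are hypothetical names made up for
illustration.

```c
#include <stddef.h>

/* Simplified stand-ins; the real structs have many more members. */
struct Curl_multi;

struct SessionHandle {
  struct SessionHandle *prev;  /* previous handle in the multi's list */
  struct SessionHandle *next;  /* next handle in the multi's list */
  struct Curl_multi *multi;    /* set while added to a multi handle */
};

struct Curl_multi {
  struct SessionHandle *easyp; /* start of the list of added handles */
  int num_easy;                /* counter of added handles */
};

/* Hypothetical helper: link 'data' in at the head of the multi's list. */
static void sketch_add_handle(struct Curl_multi *multi,
                              struct SessionHandle *data)
{
  data->prev = NULL;
  data->next = multi->easyp;
  if(multi->easyp)
    multi->easyp->prev = data;
  multi->easyp = data;
  data->multi = multi;
  multi->num_easy++;
}

/* Hypothetical helper: unlink 'data' from the multi it belongs to. */
static void sketch_remove_handle(struct SessionHandle *data)
{
  struct Curl_multi *multi = data->multi;
  if(!multi)
    return; /* not added to any multi handle */
  if(data->prev)
    data->prev->next = data->next;
  else
    multi->easyp = data->next;
  if(data->next)
    data->next->prev = data->prev;
  data->prev = data->next = NULL;
  data->multi = NULL;
  multi->num_easy--;
}
```

Where in the list a new handle is linked is an assumption of this sketch;
the point is only that ->prev, ->next and ->multi are maintained together
with ->num_easy.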
|
||||
|
||||
->msglist is a linked list of messages to send back when
|
||||
curl_multi_info_read() is called. Basically a node is added to that list
|
||||
when an individual SessionHandle's transfer has completed.
|
||||
|
||||
->hostcache points to the name cache. It is a hash table for looking up name
|
||||
to IP. The nodes have a limited life time in there and this cache is meant
|
||||
to reduce the time for when the same name is wanted within a short period of
|
||||
time.
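
To illustrate the idea of cache nodes with a limited lifetime, here is a
small sketch. It is an assumption-laden model: it uses a fixed array and a
linear search instead of a hash table, and all sketch_* names, sizes and the
ttl parameter are invented for this example.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

#define SKETCH_MAX_NAMES 8

struct sketch_dns_entry {
  char name[64];   /* host name used as the lookup key */
  char addr[46];   /* textual IP address */
  time_t expires;  /* the entry is ignored from this time on */
  int used;        /* non-zero when the slot holds an entry */
};

struct sketch_hostcache {
  struct sketch_dns_entry e[SKETCH_MAX_NAMES];
};

/* Store or refresh a name, valid for 'ttl' seconds counted from 'now'. */
static void sketch_cache_store(struct sketch_hostcache *c, const char *name,
                               const char *addr, time_t now, long ttl)
{
  int i, slot = -1;
  for(i = 0; i < SKETCH_MAX_NAMES && slot < 0; i++)
    if(c->e[i].used && !strcmp(c->e[i].name, name))
      slot = i; /* refresh the existing entry */
  for(i = 0; i < SKETCH_MAX_NAMES && slot < 0; i++)
    if(!c->e[i].used || c->e[i].expires <= now)
      slot = i; /* take a free or expired slot */
  if(slot < 0)
    slot = 0;   /* cache full: overwrite the first entry */
  snprintf(c->e[slot].name, sizeof(c->e[slot].name), "%s", name);
  snprintf(c->e[slot].addr, sizeof(c->e[slot].addr), "%s", addr);
  c->e[slot].expires = now + ttl;
  c->e[slot].used = 1;
}

/* Return the cached address, or NULL if unknown or too old. */
static const char *sketch_cache_lookup(struct sketch_hostcache *c,
                                       const char *name, time_t now)
{
  int i;
  for(i = 0; i < SKETCH_MAX_NAMES; i++) {
    struct sketch_dns_entry *e = &c->e[i];
    if(e->used && !strcmp(e->name, name)) {
      if(e->expires <= now) {
        e->used = 0; /* too old, drop it */
        return NULL;
      }
      return e->addr;
    }
  }
  return NULL;
}
```

A lookup shortly after a store returns the cached address, while a lookup
at or after the expiry time returns NULL and forces a new resolve.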
|
||||
|
||||
->timetree points to a tree of SessionHandles, sorted by the remaining time
|
||||
until it should be checked - normally some sort of timeout. Each
|
||||
SessionHandle has one node in the tree.
|
||||
|
||||
->sockhash is a hash table to allow fast lookups from a socket descriptor to
|
||||
the SessionHandle that uses that descriptor. This is necessary for the
|
||||
multi_socket API.
|
||||
|
||||
->conn_cache points to the connection cache. It keeps track of all
|
||||
connections that are kept after use. The cache has a maximum size.
|
||||
|
||||
->closure_handle is described in the 'connectdata' section.
|
||||
|
||||
The libcurl source code generally uses the name 'multi' for the variable that
|
||||
points to the Curl_multi struct.
|
||||
|
||||
|
||||
1.4 Curl_handler
|
||||
|
||||
Each unique protocol that is supported by libcurl needs to provide at least
|
||||
one Curl_handler struct. It defines what the protocol is called and what
|
||||
functions the main code should call to deal with protocol specific issues.
|
||||
In general, there's a source file named [protocol].c in which there's a
|
||||
"struct Curl_handler Curl_handler_[protocol]" declared. In url.c there's
|
||||
then the main array with all individual Curl_handler structs pointed to from
|
||||
a single array which is scanned through when a URL is given to libcurl to
|
||||
work with.
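
The scan over the handler array could look something like the sketch below.
It is a simplified illustration, not the url.c code: the struct carries only
two of the members described in this section, and the sketch_* names are
hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Reduced stand-in for struct Curl_handler. */
struct sketch_handler {
  const char *scheme; /* URL scheme name, uppercase */
  long defport;       /* default port for the protocol */
};

static const struct sketch_handler sketch_handler_http = { "HTTP", 80 };
static const struct sketch_handler sketch_handler_https = { "HTTPS", 443 };
static const struct sketch_handler sketch_handler_ftp = { "FTP", 21 };

/* A single NULL-terminated array pointing to all handlers. */
static const struct sketch_handler * const sketch_protocols[] = {
  &sketch_handler_http,
  &sketch_handler_https,
  &sketch_handler_ftp,
  NULL
};

/* Case-insensitive comparison of two zero-terminated strings. */
static int sketch_equal_nocase(const char *a, const char *b)
{
  while(*a && *b) {
    if(toupper((unsigned char)*a) != toupper((unsigned char)*b))
      return 0;
    a++;
    b++;
  }
  return *a == *b; /* equal only if both strings ended here */
}

/* Scan the array for the handler matching the URL scheme, or NULL. */
static const struct sketch_handler *sketch_find_handler(const char *scheme)
{
  const struct sketch_handler * const *pp;
  for(pp = sketch_protocols; *pp; pp++) {
    if(sketch_equal_nocase((*pp)->scheme, scheme))
      return *pp;
  }
  return NULL;
}
```

With this model, looking up "https" gives the HTTPS handler and an
unsupported scheme gives NULL, which the caller would treat as an
unsupported protocol.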
|
||||
|
||||
->scheme is the URL scheme name, usually spelled out in uppercase. That's
|
||||
"HTTP" or "FTP" etc. SSL versions of the protocol need their own Curl_handler
setup, so HTTPS is separate from HTTP.
|
||||
|
||||
->setup_connection is called to allow the protocol code to allocate protocol
|
||||
specific data that then gets associated with that SessionHandle for the rest
|
||||
of this transfer. It gets freed again at the end of the transfer. It will be
|
||||
called before the 'connectdata' for the transfer has been selected/created.
|
||||
Most protocols will allocate their private 'struct [PROTOCOL]' here and assign
|
||||
SessionHandle->req.protop to point to it.
|
||||
|
||||
->connect_it allows a protocol to do some specific actions after the TCP
|
||||
connect is done, that can still be considered part of the connection phase.
|
||||
|
||||
Some protocols will alter the connectdata->recv[] and connectdata->send[]
|
||||
function pointers in this function.
|
||||
|
||||
->connecting is similarly a function that keeps getting called as long as the
|
||||
protocol considers itself still in the connecting phase.
|
||||
|
||||
->do_it is the function called to issue the transfer request. What we call
|
||||
the DO action internally. If the DO is not enough and things need to be kept
|
||||
getting done for the entire DO sequence to complete, ->doing is then usually
|
||||
also provided. Each protocol that needs to do multiple commands or similar
|
||||
for do/doing needs to implement its own state machine (see SCP, SFTP,
|
||||
FTP). Some protocols (only FTP and only due to historical reasons) have a
|
||||
separate piece of the DO state called DO_MORE.
|
||||
|
||||
->doing keeps getting called while issuing the transfer request command(s).
|
||||
|
||||
->done gets called when the transfer is complete and DONE. That's after the
|
||||
main data has been transferred.
|
||||
|
||||
->do_more gets called during the DO_MORE state. The FTP protocol uses this
|
||||
state when setting up the second connection.
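
How the generic engine might drive ->do_it, ->doing and ->done can be
sketched as a small state machine. This is a rough model under stated
assumptions: the state names are simplified, DO_MORE is left out, the
callbacks take a made-up sketch_xfer struct instead of a connectdata, and
none of the sketch_* names exist in libcurl.

```c
#include <stddef.h>

/* Hypothetical per-transfer state, for this sketch only. */
struct sketch_xfer {
  int commands_left; /* protocol commands still to issue */
  int finished;      /* set by the done callback */
};

/* Reduced stand-in for the do_it/doing/done members of Curl_handler. */
struct sketch_proto_ops {
  void (*do_it)(struct sketch_xfer *x, int *done);
  void (*doing)(struct sketch_xfer *x, int *done);
  void (*done)(struct sketch_xfer *x);
};

enum sketch_state { SK_DO, SK_DOING, SK_PERFORM, SK_DONE, SK_COMPLETED };

/* Run one step for a transfer and return the state to enter next. */
static enum sketch_state sketch_step(const struct sketch_proto_ops *ops,
                                     struct sketch_xfer *x,
                                     enum sketch_state st)
{
  int done = 1;
  switch(st) {
  case SK_DO:
    ops->do_it(x, &done);
    /* stay in the DO phase only if more work remains and ->doing exists */
    return (done || !ops->doing) ? SK_PERFORM : SK_DOING;
  case SK_DOING:
    ops->doing(x, &done);
    return done ? SK_PERFORM : SK_DOING;
  case SK_PERFORM:
    return SK_DONE; /* the payload transfer would happen here */
  case SK_DONE:
    ops->done(x);
    return SK_COMPLETED;
  default:
    return SK_COMPLETED;
  }
}

/* Example protocol: each call issues one command of a sequence. */
static void sketch_issue(struct sketch_xfer *x, int *done)
{
  x->commands_left--;
  *done = (x->commands_left <= 0);
}

static void sketch_finish(struct sketch_xfer *x)
{
  x->finished = 1;
}

static const struct sketch_proto_ops sketch_ops = {
  sketch_issue, sketch_issue, sketch_finish
};
```

Each call to sketch_step() returns instead of blocking, which is the
property that lets a protocol with several commands fit the non-blocking
multi interface.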
|
||||
|
||||
->proto_getsock
|
||||
->doing_getsock
|
||||
->domore_getsock
|
||||
->perform_getsock
|
||||
Functions that return socket information. Which socket(s) to wait for which
|
||||
action(s) during the particular multi state.
|
||||
|
||||
->disconnect is called immediately before the TCP connection is shut down.
|
||||
|
||||
->readwrite gets called during transfer to allow the protocol to do extra
|
||||
reads/writes
|
||||
|
||||
->defport is the default TCP or UDP port this protocol uses
|
||||
|
||||
->protocol is one or more bits in the CURLPROTO_* set. The SSL versions have
|
||||
their "base" protocol set and then the SSL variation. Like "HTTP|HTTPS".
|
||||
|
||||
->flags is a bitmask with additional information about the protocol that will
|
||||
make it get treated differently by the generic engine:
|
||||
|
||||
PROTOPT_SSL - will make it connect and negotiate SSL
|
||||
|
||||
PROTOPT_DUAL - this protocol uses two connections
|
||||
|
||||
PROTOPT_CLOSEACTION - this protocol has actions to do before closing the
|
||||
connection. This flag is no longer used by code, yet still set for a bunch of
|
||||
protocol handlers.
|
||||
|
||||
PROTOPT_DIRLOCK - "direction lock". The SSH protocols set this bit to
|
||||
limit which "direction" of socket actions that the main engine will
|
||||
concern itself about.
|
||||
|
||||
PROTOPT_NONETWORK - a protocol that doesn't use network (read file:)
|
||||
|
||||
PROTOPT_NEEDSPWD - this protocol needs a password and will use a default
|
||||
one unless one is provided
|
||||
|
||||
PROTOPT_NOURLQUERY - this protocol can't handle a query part on the URL
|
||||
(?foo=bar)
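
As an illustration of how such a bitmask is typically tested, consider the
sketch below. The bit values, the SKETCH_ prefixed names and the flag
combinations given to the two example handlers are assumptions for this
example; the real PROTOPT_* defines and handler setups are in the source
code.

```c
/* Illustrative bit values only. */
#define SKETCH_PROTOPT_SSL       (1u<<0)
#define SKETCH_PROTOPT_DUAL      (1u<<1)
#define SKETCH_PROTOPT_NONETWORK (1u<<4)
#define SKETCH_PROTOPT_NEEDSPWD  (1u<<5)

/* Reduced stand-in for struct Curl_handler. */
struct sketch_flagged {
  const char *scheme;
  unsigned int flags; /* bitmask of SKETCH_PROTOPT_* bits */
};

/* Hypothetical handlers with made-up flag combinations. */
static const struct sketch_flagged sketch_ftps = {
  "FTPS",
  SKETCH_PROTOPT_SSL | SKETCH_PROTOPT_DUAL | SKETCH_PROTOPT_NEEDSPWD
};
static const struct sketch_flagged sketch_file = {
  "FILE",
  SKETCH_PROTOPT_NONETWORK
};

/* Non-zero if the engine should negotiate SSL for this protocol. */
static int sketch_wants_ssl(const struct sketch_flagged *h)
{
  return (h->flags & SKETCH_PROTOPT_SSL) != 0;
}

/* Non-zero if the protocol uses the network at all. */
static int sketch_uses_network(const struct sketch_flagged *h)
{
  return (h->flags & SKETCH_PROTOPT_NONETWORK) == 0;
}
```

The generic engine can then branch on these checks without knowing which
protocol it is dealing with.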
|
||||
|
||||
|
||||
1.5 conncache
|
||||
|
||||
This is a hash table with connections for later re-use. Each SessionHandle has
|
||||
a pointer to its connection cache. Each multi handle sets up a connection
|
||||
cache that all added SessionHandles share by default.
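
The re-use behavior of a connection cache with a maximum size can be
sketched like this. It is a deliberately simplified model: a tiny fixed
array stands in for the hash table, connections are matched on host name and
port only, closing the oldest unused connection when the cache is full is
the policy assumed here, and all sketch_* names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define SKETCH_CACHE_SIZE 2

struct sketch_conn {
  char host[64];
  int port;
  int open;       /* non-zero when the slot holds a connection */
  int inuse;      /* non-zero while a transfer uses the connection */
  long last_used; /* logical timestamp of the last use */
};

struct sketch_conncache {
  struct sketch_conn conns[SKETCH_CACHE_SIZE];
  long clock;     /* increases on every use */
};

/* Get a connection to host:port, re-using an idle one when possible.
   Returns NULL if the cache is full and every connection is busy. */
static struct sketch_conn *sketch_cache_get(struct sketch_conncache *cc,
                                            const char *host, int port)
{
  int i;
  struct sketch_conn *victim = NULL;
  /* first: look for an idle connection to the same host and port */
  for(i = 0; i < SKETCH_CACHE_SIZE; i++) {
    struct sketch_conn *c = &cc->conns[i];
    if(c->open && !c->inuse && c->port == port && !strcmp(c->host, host)) {
      c->inuse = 1;
      c->last_used = ++cc->clock;
      return c;
    }
  }
  /* second: take a free slot, or else the oldest unused connection */
  for(i = 0; i < SKETCH_CACHE_SIZE; i++) {
    struct sketch_conn *c = &cc->conns[i];
    if(!c->open) {
      victim = c;
      break;
    }
    if(!c->inuse && (!victim || c->last_used < victim->last_used))
      victim = c;
  }
  if(!victim)
    return NULL;
  /* an old connection in this slot would be closed here */
  snprintf(victim->host, sizeof(victim->host), "%s", host);
  victim->port = port;
  victim->open = 1;
  victim->inuse = 1;
  victim->last_used = ++cc->clock;
  return victim;
}

/* Mark a connection as idle but keep it in the cache for re-use. */
static void sketch_cache_release(struct sketch_conncache *cc,
                                 struct sketch_conn *c)
{
  c->inuse = 0;
  c->last_used = ++cc->clock;
}
```

Since all SessionHandles added to the same multi handle point to the same
cache, a connection released by one transfer can be picked up by another.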
|
||||
|
||||
|
||||
1.6 Curl_share
|
||||
|
||||
The libcurl share API allocates a Curl_share struct, exposed to the external
|
||||
API as "CURLSH *".
|
||||
|
||||
The idea is that the struct can have its own set of caches and
|
||||
pools and then by providing this struct in the CURLOPT_SHARE option, those
|
||||
specific SessionHandles will use the caches/pools that this share handle
|
||||
holds.
|
||||
|
||||
Then individual SessionHandle structs can be made to share specific things
|
||||
that they otherwise wouldn't, such as cookies.
|
||||
|
||||
The Curl_share struct can currently hold cookies, DNS cache and the SSL
|
||||
session cache.
|
||||
|
||||
|
||||
1.7 CookieInfo
|
||||
|
||||
This is the main cookie struct. It holds all known cookies and related
|
||||
information. Each SessionHandle has its own private CookieInfo even when
|
||||
they are added to a multi handle. They can be made to share cookies by using
|
||||
the share API.
|
|
@ -5,7 +5,7 @@
|
|||
# | (__| |_| | _ <| |___
|
||||
# \___|\___/|_| \_\_____|
|
||||
#
|
||||
# Copyright (C) 1998 - 2012, Daniel Stenberg, <daniel@haxx.se>, et al.
|
||||
# Copyright (C) 1998 - 2013, Daniel Stenberg, <daniel@haxx.se>, et al.
|
||||
#
|
||||
# This software is licensed as described in the file COPYING, which
|
||||
# you should have received as part of this distribution. The terms
|
||||
|
@ -36,7 +36,7 @@ EXTRA_DIST = MANUAL BUGS CONTRIBUTE FAQ FEATURES INTERNALS SSLCERTS \
|
|||
README.win32 RESOURCES TODO TheArtOfHttpScripting THANKS VERSIONS \
|
||||
KNOWN_BUGS BINDINGS $(man_MANS) $(HTMLPAGES) HISTORY INSTALL \
|
||||
$(PDFPAGES) LICENSE-MIXING README.netware DISTRO-DILEMMA INSTALL.devcpp \
|
||||
MAIL-ETIQUETTE HTTP-COOKIES
|
||||
MAIL-ETIQUETTE HTTP-COOKIES LIBCURL-STRUCTS
|
||||
|
||||
MAN2HTML= roffit < $< >$@
|
||||
|
||||
|
|