kestes%walrus.com 2001-12-31 20:02:05 +00:00
Parent 0749982b94
Commit e9e256356f
2 changed files: 430 additions and 0 deletions


@ -0,0 +1,153 @@
Overview of the Tinderbox System
--------------------------------
Tinderbox is an information display system. It runs on a machine with
a webserver and will periodically write static HTML files to the disk
so that the webserver can serve these documents. Tinderbox is run out
of cron every five minutes. It gathers up information from various
databases including: CVS Logs, Bonsai, and Perforce. It will also
process mail which is sent to it. Mail is sent from Bug Ticketing
software and Build/Test Machines. All this information is combined to
produce the HTML pages.

Since no two companies will structure their development processes the
same way, the tinderbox code has to be highly configurable to account
for most possible uses. There is a main configuration file which
allows most of the major user configurable variables to be set.
Novice users can expect to edit only this file and get a working
tinderbox system. Additionally each library has been broken into two
parts. One part is the library specific configurations. This file is
expected to need modifications in some installations. I have put all
the library configurations into one directory to make it easy to find
the parts of tinderbox which are easy to modify. Each configuration
library can be thought of as a table which might need to be edited or
extended for use at your company. I have provided a working system
but the defaults may not suit your needs. These tables can be easily
changed in small ways by simply looking at the file and making obvious
changes. I have also allowed for the possibility of making complex
changes that only a competent perl programmer could define. Changes
are not made to the files which I have provided. Rather the changes
are made to copies of the files which are stored in a local
configuration directory. This ensures that you can easily version the
Tinderbox code as it is provided to you from the official distribution
and you can separately version the local configurations which you
make. It is also easy to see the local configurations since you have
both the original and the modified code on the same server and can
difference the two. As an example you might need to change the
BuildStatus. I assume that you have the following possible build
outcomes (build in progress, build failed, build succeeded but tests
failed, build and all tests were successful). You may have additional
outcomes to specify which kind of tests failed (unit test failed, not
enough unit test coverage, performance tests failed). Similarly you may
have unusual requirements for how the filesystem should be laid out. I provide a
I suggest that you read through the files to see
how they are laid out and what types of changes are possible.

The build machines are not considered part of the tinderbox server.
They are clients just like Bug Ticketing systems and Version control
systems are clients. Build machines mail their build logs to the
server in a special format. This format specifies that name/value
pairs must appear at the top of the mail message followed by the
complete build log. Scripts for setting up a tinderbox build client
can be found in the clientbin directory but you may have other build
needs and may use any build methods you choose.
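
As a rough illustration of the mail layout described above, here is a
minimal Python sketch that splits a message body into its name/value
pairs and the build log. The ": " separator, the blank line before the
log, and the field names (tree, buildname, status) are assumptions made
for this example; they are not taken from the clientbin scripts.

```python
def parse_build_mail(body):
    """Split a mail body into (fields, log).

    Assumes header lines look like "name: value" and that the first
    line which does not match ends the header block (hypothetical layout).
    """
    fields = {}
    lines = body.splitlines()
    i = 0
    while i < len(lines):
        name, sep, value = lines[i].partition(": ")
        if not sep or not name.strip():
            break
        fields[name.strip()] = value.strip()
        i += 1
    # Skip a single blank separator line, if present.
    if i < len(lines) and lines[i].strip() == "":
        i += 1
    return fields, "\n".join(lines[i:])


# Hypothetical message body: three name/value pairs, then the build log.
sample = (
    "tree: Project_A\n"
    "buildname: linux-nightly\n"
    "status: build_failed\n"
    "\n"
    "cc -c main.c\n"
    "main.c:3: parse error\n"
)
fields, log = parse_build_mail(sample)
```

Everything after the header block is kept untouched, so the complete
build log can be stored and displayed as it was sent.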

The central concept of the Tinderbox system is the notion of a 'Tree'.
When several different groups are working out of the same version
control system often the files are partitioned into separate modules
with each group working on one or more disjoint modules. Over time
the developers need to branch their code because several different
versions of the files are under development at the same time. A tree
is a module/branch pair. This corresponds to a set of files which can
be checked out and built. Tinderbox makes one page for each tree and
displays what work is being done on that tree. CVS has a notion of
branches and of modules but not of trees. It is not possible to give
a branch/module pair a name. The tinderbox TreeData provides the
mappings between treenames and branch/module pairs. Tinderbox
displays the updates to bug tickets on the appropriate tree page.
This requires an easy mapping between bug tickets and trees. One
example of a complex function to determine tree name would be if each
of the product types listed in the bug tracking database
refers to one development project, except for a particular
feature/platform of one particular project which is being developed by
a separate group of developers. So the version control notion of
trees (a set of modules on a branch) may not have a direct map into
the bug tracking database at all times. In large projects it is
sometimes convenient to have a tree called 'ALL' which is used to
display all checkins performed on any trees and all bug tickets worked
on by any programmers. It is not possible to build or test the 'ALL'
tree and neither the version control nor bug ticketing system knows of
its existence.
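
The idea of a tree as a named module/branch pair can be sketched as a
small lookup table. This is only an illustration in Python: the tree
names, module names and branch names are invented, and the real
TreeData configuration may be laid out quite differently.

```python
# Hypothetical tree table: tree name -> (module, branch).
TREES = {
    "Project_A": ("module_a", "HEAD"),
    "Project_A_v2": ("module_a", "release_2_branch"),
    "Project_B": ("module_b", "HEAD"),
}


def tree_for(module, branch):
    """Return the tree name for a module/branch pair, or None."""
    for name, pair in TREES.items():
        if pair == (module, branch):
            return name
    return None


def trees_to_display(module, branch):
    """Tree pages which should show a checkin to this module/branch.

    'ALL' is a display-only tree: it lists every checkin made to a
    known tree but is never itself a module/branch pair.
    """
    name = tree_for(module, branch)
    return [name, "ALL"] if name else []
```

For example, trees_to_display("module_b", "HEAD") names both the
Project_B page and the 'ALL' page.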

The Bug Tracking code was intended to be as general as possible.
Most bug ticketing systems send mail when tickets change state. The
mail is often of the same form: a list of name/value pairs with the
separator being the string ": ". Tinderbox will parse mail of this
form and display the interesting fields on the appropriate tree page.
The configuration of this module involves specifying which bug ticket
names are interesting and should be displayed. Also you will need to
specify how to map a bug ticket into a tree. This could be very
simple if each bug ticket has a field which represents the tree it is
applicable to (in this case tree could equal project) or can be very
complex if the tree must be computed by the values of a set of fields.
Also tinderbox keeps track of which bugs are "reopened" and displays
them in a different column. The idea is that some bugs are moving
backwards and creating duplicate work. These tickets are particularly
troublesome and should be watched specially. So all possible ticket
status values are partitioned into "progress" or "slippage" categories. You
will need to specify what status values are possible for your ticket
system and you will also need to specify the set of columns which you
would like to see on the status page.
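
A minimal Python sketch of the three configuration decisions just
described: parsing ": " separated pairs, mapping a ticket to a tree,
and sorting status values into "progress" and "slippage". The field
names (Product, Status) and the status values are assumptions for the
example; your ticket system will have its own.

```python
# Hypothetical status partition; real values depend on the ticket system.
PROGRESS = {"ASSIGNED", "RESOLVED", "VERIFIED", "CLOSED"}
SLIPPAGE = {"REOPENED"}


def parse_ticket(body):
    """Collect "name: value" lines from a ticket mail into a dict."""
    ticket = {}
    for line in body.splitlines():
        name, sep, value = line.partition(": ")
        if sep:
            ticket[name.strip()] = value.strip()
    return ticket


def ticket_tree(ticket):
    """Simple case: one field names the tree directly (assumed 'Product')."""
    return ticket.get("Product")


def ticket_column(ticket):
    """Pick the status-page column for a ticket, or None if unknown."""
    status = ticket.get("Status", "")
    if status in SLIPPAGE:
        return "slippage"
    if status in PROGRESS:
        return "progress"
    return None


mail = "Bug: 1234\nProduct: Project_A\nStatus: REOPENED\nSummary: crash on start\n"
ticket = parse_ticket(mail)
```

The complex case mentioned above would replace ticket_tree with a
function that inspects several fields before choosing a tree.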

The heart of the tinderbox system is the 'status table'. This is an
HTML table which graphically shows the changes made to the
development databases. It will show what is going on in the version
control system, the bug tracking system, the build system, automatic
regression tests and provide a notice board for developers to inform
each other of current news. By placing all this information in the
same table it is possible to correlate and cross check how different
types of changes affected each other and what was going on with the
whole project at different times in the day. The rows of the table
represent time with the most current events at the top of the page.
There are different sets of columns for each database which needs to
be displayed. The sets of columns are managed by independent modules.
There is one module for each version control system and each bug
tracking system which tinderbox knows how to interface with. It is
easy to port the system to new databases by just adding a new module
using the same style as the existing modules. Modules never share or
peek at each other's data; all combining of data is done by the humans
who stare at the table and interpret what is going on. The main
tinderbox system does not know how many columns the final table will
have. It only knows about a list of table modules. Each module in the
list is called in turn to generate the complete row, then the entire
row is displayed. The user must configure tinderbox with the list of
modules which are important to their own environment. There is no
restriction on the number of modules which may be configured, though
due to implementation details each module can only appear once in the
table. There are many pop up windows embedded in the status table;
these provide an extra level of detail when the mouse is placed over
a link. By moving your mouse around the page you may effectively
drill down into an item of interest and learn more about it without
leaving the page. Most of the links will click through to the
appropriate database. Thus if you need more data about an item you
can click on the link and query the database directly.
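
The way a row is assembled from independent modules can be sketched as
follows. This is an illustration in Python, not the tinderbox code:
the class names, the headers/cells method names and the sample data
are all invented for the example.

```python
class TimeColumn:
    """Hypothetical module contributing one column: the row's time."""

    def headers(self):
        return ["Time"]

    def cells(self, row_time):
        return [row_time]


class NoticeColumn:
    """Hypothetical module contributing one column of notices."""

    def __init__(self, notices):
        self.notices = notices  # maps row time -> notice text

    def headers(self):
        return ["Notices"]

    def cells(self, row_time):
        return [self.notices.get(row_time, "")]


def render_rows(modules, row_times):
    """Build the table one row at a time.

    The caller knows only the list of modules, not how many columns
    each contributes. row_times must be ordered most recent first.
    """
    rows = [[h for m in modules for h in m.headers()]]
    for t in row_times:
        row = []
        for m in modules:  # each module is called in turn
            row.extend(m.cells(t))
        rows.append(row)
    return rows


modules = [TimeColumn(), NoticeColumn({"12:05": "tree closed"})]
table = render_rows(modules, ["12:05", "12:00"])
```

Neither module looks at the other's data; the correlation between the
columns is left to the person reading the table.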

Besides the status table there is one other feature of the status
page. The page displays some information which is not correlated
through time and with other data. This information is called status
table headers. The main headers are the message of the day (MOTD),
and the Tree State, though there are a few other headers of mainly
historical interest. The important issue with the headers is that
they are not optional. Tinderbox can render a table with as few or
as many columns in the status table as you wish, but each of the
headers has a particular place on the status page and needs to be
rendered in a particular way (font size, font type, etc). Thus the
tinderbox server must know where each header must go and how to
specify the appropriate HTML context for this header. Users may set
null defaults for headers that they do not need but it is much harder
for a user to add new headers to the code in a modular fashion.


@ -0,0 +1,277 @@
Preparations you will need to make and
policies you will need to set:
-----------------------------------
To install tinderbox you will need some information about your
existing computer systems and some idea about what your goals are.
Here is a list of questions to help get you started. Some of these
ideas may not be appropriate for your environment.
The webserver will serve the tinderbox pages.
Webserver configuration is a bit of an art and you will need to
understand the policies which are used to administer your webserver.
*) You will need to decide the directory where tinderbox should write
the static HTML pages. This will depend on how your webserver is
configured. The default location is based on the RedHat 7.1
(apache-1.3.19-5) installation and is: /var/www/html/tinderbox2. You
will also need to know what the URL browsers will need to use to find
this directory. Since tinderbox generates static web pages, it is
possible to run tinderbox and not run a web server. One way this
could be done is if you have a network file system and all users have
browsers which can read from the HTML directories. In this case all
URLs should begin with "file:/" instead of the usual "http://".
*) Project level administration is done via cgi scripts. These
scripts allow administrators to set the message of the day, and the
state of the tree (open, closed, restricted). Also all users can post
notices to the web pages via a cgi script. CGI programs are often
restricted to a portion of the file system which is disjoint from the
HTML files. You will need to figure out where the CGI programs will
go. Tinderbox takes its defaults from RedHat 7.1 and uses:
/var/www/cgi-bin/tinderbox2. You will also need to know what the URL
browsers will need to use to find this directory.
*) CGI scripts will run as an unauthenticated user on your system.
You will need to decide which user will run the tinderbox CGI scripts.
The same user id must be used for running the scripts as for tinderbox
mail delivery. The Tinderbox configuration files define this
user id and, as a security precaution, tinderbox checks that it is
running as the required id. It is suggested that this id not be a
privileged id (higher ids are better; please make this number greater
than 10, and bigger than 100 is recommended). Smaller ids are often
assumed to have more privileges on a Unix box than larger ids. It is
not a good idea for an unauthenticated user to have any privileges so
a large id is recommended. It is also recommended that you not use the
id 'nobody' as this id is overused and it would be better to partition
the unauthenticated user into separate ids in case of security problems.
RedHat runs all its CGI scripts as the user 'apache'; this is an
acceptable user. I would prefer to have a separate user to run the
tinderbox CGI scripts but this would require recompiling apache to
enable suEXEC, and it is more effort than most groups can afford.
*) Tinderbox Files. There are other tinderbox files which need to be
placed on the webserver. These include libraries and non-cgi
programs. You will need to decide where to place these files. Most
users put them in /home/tinderbox2.
*) Tinderbox Data. Tinderbox stores its data in the file system. For
security it is often a good idea to keep this data out of the HTML and
CGI directories so that malicious users can not directly access this
data. The compressed build logs can grow quite large, so it is
recommended to put the data on a file system with room. The default
is to put them in the directory /home/tinderbox2/data.
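
The user id precaution described in the CGI item above could look
something like the following Python sketch. It is an illustration
only: the function name, the messages and the sample ids are
assumptions, and the real check lives in the Tinderbox configuration
files. os.geteuid() is available on Unix systems only.

```python
import os


def uid_problem(required_uid, actual_uid=None, lowest_allowed=10):
    """Return a message if the process user is unsuitable, else None.

    required_uid is the id named in the configuration. actual_uid
    defaults to the effective uid of the running process.
    """
    if actual_uid is None:
        actual_uid = os.geteuid()
    if required_uid <= lowest_allowed:
        return "configured uid %d is not greater than %d" % (
            required_uid, lowest_allowed)
    if actual_uid != required_uid:
        return "running as uid %d, expected uid %d" % (
            actual_uid, required_uid)
    return None
```

A CGI script or mail handler would call this once at startup and
refuse to continue if a message comes back.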

Mail
----
*) Many of the tinderbox modules (Bug Ticket, Build, CVS) receive
their data via mail. The mail system on your web server machine must
be configured to deliver the mail into the tinderbox mail processing
programs. You should spend some time understanding how your mail
delivery system can be configured to allow user mail to be delivered
into a program and how to set the user id under which this delivery
occurs. If you do not wish to configure your mail delivery program
then you can use fetchmail to pull the mail out of a mail box and push
it into the programs on a periodic basis. See the install page for
details on what I have learned about mailing systems.

Production Version Control
--------------------------
One of the biggest responsibilities which a "buildmaster" has is the
requirement that all code should be reproducible. That is that at
any point in the future, even more than one year later, the current
binaries should be able to be rebuilt byte for byte from sources.
This requirement can be broken down as follows:
1) The build machine must be reproducible.
We must be able to get back the same build machine we had at any point
in the past. This means that all OS libraries, all header files, all
compilers, all build tools (make, grep, sed) must have some mechanism
to roll back. It is common to use a backup of the build machine to
reconstruct it. Most operating systems will give you a list of the software packages
which are installed on the machine and their version numbers. I like
to keep the list of software packages which are installed on the
machine checked into version control. This allows me to compare the
state of the build machine at any two points in time. I have tools to
recreate the build-machine from just a list of packages with version
numbers. It is considered a best practice to limit the amount of
software which is available on the build machine. A build machine
with too much installed will only make it difficult to reproduce older
builds should the need arise. I recommend not installing any
web servers or graphical window managers on your build machine. It
should be clear that the build machine should not be the same machine
where the tinderbox server runs.
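
Comparing the build machine at two points in time then reduces to
comparing two package lists. A minimal Python sketch, assuming each
list is stored as one "name version" pair per line (the file layout
and the package names below are made up for the example):

```python
def parse_packages(text):
    """Parse lines of "name version" into a dict (assumed layout)."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            packages[parts[0]] = parts[1]
    return packages


def compare_packages(old, new):
    """Return (added, removed, changed) package names, each sorted."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(n for n in set(old) & set(new) if old[n] != new[n])
    return added, removed, changed


# Two hypothetical snapshots of the same build machine.
before = parse_packages("gcc 2.96-85\nmake 3.79.1-5\nsed 3.02-9\n")
after = parse_packages("gcc 2.96-98\nmake 3.79.1-5\nperl 5.6.0-17\n")
added, removed, changed = compare_packages(before, after)
```

Because the lists are plain text they can be checked into version
control next to the sources and compared with any ordinary diff tool
as well.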
2) The build process must be reproducible. That is all the steps
which are used to create the application must be reproducible.
*) Build Interface: We must be able to run exactly the same build
process in the future including: all commands with command line
arguments, all environmental variables. I recommend that the entire
build process be viewed as something outside of the build master
control. Developers are responsible for ensuring that there is a
simple build master interface to construct all the software products
which go into a build. Typically there is a makefile in a standard
place where the buildmaster can run something like "make all; make
install;" and be guaranteed that this will build the product. The
build interface should be viewed as something which almost never
changes. It is part of the build machine, like the OS, and is changed only
rarely. It is hard enough to track all the parts of the build process
which we expect to change, we should not need to track complex build
procedures. The build procedures should have a standard interface.
By keeping the build instructions in one makefile which is checked
into the same version control system as the sources it is easy to
recreate any previous build even if the commands used to build the
software fluctuate rapidly between releases. There must be a simple
interface to construct the software which will hide all the complexity
of the actual construction.
*) Build Environment: The makefile will code all the build commands
and all the environmental variables (PATH, UMASK, LD_LIBRARY_PATH,
CLASSPATH) needed to build the software though it may rely on some
well defined command line arguments (PREFIX, CCFLAGS, JAVA_LIBS) to
parameterize these. These command line arguments should not
change between versions of the software but should be a fixed set of
build parameters. The parameters may be needed to specify where some
files are found on the build machine (Ideally the build machine is set
up the same as developers machines so these directories can be
hard-coded into the makefiles but often there is a need for some
directories to be specified at build time) or where files are to be
created/installed on the build machine (typically a subdirectory of
/var/tmp but there may be several builds running at once and each will
need a different directory) or what kind of build is being created.
Each part of the build which needs a particular environmental variable
set or a special header file in some path should have tests which
ensure that the build environment is valid. I keep my build scripts
installed on the build machine and they are always started by running
"/etc/rc.d/init.d/build start". This ensures that I am not relying on any
build environmental variables which are set by logging into the build
account and are thus not tracked and versioned.
*) Environmental safety issues:
If the build environment can not be used to build the software then a
human readable error message should be generated. My makefiles often
run various checks on the environmental variables before they
construct the code. They check that all required environmental
variables are set, that the required libraries are found, that
directories which must be disjoint (build and install directories) do
not overlap. This test suite becomes a build regression test and as I
discover additional possible build problems I add new tests to the
makefile. I make it a habit to explicitly set all environmental
variables so that there is no doubt as to their expected values. It
is important for the QA group to only use Builds which were created by
an automated process so that we are sure that there are no
undocumented steps in either the test builds or the released build.
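
The kind of environment checks described above (required variables
set, build and install directories disjoint) can be sketched in Python
as below. The text describes doing this inside makefiles; this is only
an illustration of the logic, and the function name, the variable
names and the directories are invented.

```python
import os.path


def check_build_env(env, required, build_dir, install_dir):
    """Return a list of human readable problems; empty means valid."""
    problems = []
    for name in required:
        if not env.get(name):
            problems.append("environment variable %s is not set" % name)
    build = os.path.abspath(build_dir)
    install = os.path.abspath(install_dir)
    # The directories overlap if one is equal to or inside the other.
    if os.path.commonpath([build, install]) in (build, install):
        problems.append(
            "build dir %s and install dir %s overlap" % (build, install))
    return problems


problems = check_build_env(
    {"PATH": "/usr/bin"},          # hypothetical environment
    ["PATH", "CLASSPATH"],         # variables this build requires
    "/var/tmp/build/src",
    "/var/tmp/build/root",
)
```

Each new build problem discovered in practice becomes one more check
appended to the list, which is how the suite grows into a build
regression test.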
3) Track the Build numbers. Given a clean install of your product you
should have all the information necessary to reproduce the executable
from sources. If a customer shows you the application binaries you
must be able to get the source code which built the application,
reconstruct the build machine which created the application and
possibly rerun the build exactly the same way as the application was
created before. This may include making some minor source code changes
before the build is run. I like to keep a file which contains:
The product release name
The sources 'as of date'. (I always checkout my sources using
cvs -D 'date time' so that exactly the same sources
can be recovered knowing only the 'date time' which
was used to check them out. I am sure a similar trick
could be used with a perforce 'change set number'.)
The branch name.
The module name.
This can be stored as a file in the product (encrypted if necessary)
or may be stored in some secure build master database where the data
can be looked up by release name. My preference is to keep all data
necessary to reproduce a build in the build output and delivered as
part of the product. This means that I can generate as many builds as
I want automatically and not need to keep track of any of them. When
the QA team deems that a certain build is 'important', by making a
particular build the official released copy then I can take a look at
its contents and tag/branch the code at the sources which I used to
build it.
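
A minimal Python sketch of such a build information file, assuming a
"name: value" layout with invented field names. The checkout command
is only assembled, not run; -D and -r are standard cvs checkout
options, but whether your repository needs both is an assumption.

```python
def format_build_info(release, as_of, branch, module):
    """Render the four fields as "name: value" lines (assumed layout)."""
    fields = [
        ("release", release),
        ("as_of", as_of),
        ("branch", branch),
        ("module", module),
    ]
    return "".join("%s: %s\n" % f for f in fields)


def parse_build_info(text):
    """Read the file back into a dict."""
    info = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep:
            info[name] = value
    return info


def checkout_command(info):
    """Command line which should recover the same sources (not run here)."""
    return ["cvs", "checkout", "-D", info["as_of"],
            "-r", info["branch"], info["module"]]


# Hypothetical values for one build.
text = format_build_info("Product 1.2", "2001-12-31 20:02:05", "release_1_branch", "module_a")
info = parse_build_info(text)
command = checkout_command(info)
```

Since the file travels inside the product, any installed copy carries
enough information to find its own sources again.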
4) Build Prefix: It is a good idea to familiarize yourself with the
makefile conventions regarding the make variable PREFIX. It is
easiest to understand if you think about what RedHat does when they
build their distribution of RPM's but this will apply in many
different systems including the Andrew File System (AFS) and most
packaging systems. This variable is used during the build process
"make all PREFIX=/home/apache" to tell the package where it will be
installed (examples include /usr, /usr/local, /home/apache). I
suggest reading a few RedHat Spec files to see how this works in
practice. The application may need to hard-code this value into its
object code. When the application is installed it must not be
installed into its proper place on the build machine. The package we
are constructing could cause the build machine to stop working
correctly if it is a buggy version of a system library or major OS
application. Instead the makefile will install the package ("make install
PREFIX=/var/tmp/build-root/home/apache") into some other
directory with a similar tree structure to its final destination. The
packaging system will then move the files into the correct place
during an installation step on the target machine. The installation
step only moves files and sets permissions. The makefile is not
supposed to use the installation directories to hard code values into
the application since the application will never be run from this
installation directory. The hard part of the build including any
PREFIX magic is in the build section. Notice the clear separation
between the build machine and the target machine, between installation
on the build machine and installation on the target machine, and between
construction of the application binaries and their installation.
This is one of the reasons why building an application on a build
machine is different from the way in which developers build their code
on their personal development machines. This PREFIX issue will arise
when you try and build the Tinderbox system and also when you
construct the makefiles for your own application. Since the build
machine is not the target machine it can not be assumed that files
will always be in the same places on both (for example perl).
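
The relationship between the two PREFIX values can be made concrete
with a small Python sketch. The paths are the ones used in the text
above; the function names and the example file bin/httpd are invented
for the illustration.

```python
import posixpath


def staged_path(build_root, prefix, relative):
    """Where "make install" puts a file on the build machine."""
    return posixpath.join(build_root, prefix.lstrip("/"), relative)


def target_path(prefix, relative):
    """Where the same file ends up on the target machine."""
    return posixpath.join(prefix, relative)


def make_commands(prefix, build_root):
    """The two make invocations, with the PREFIX each one receives."""
    build = ["make", "all", "PREFIX=" + prefix]
    install = ["make", "install",
               "PREFIX=" + posixpath.join(build_root, prefix.lstrip("/"))]
    return build, install


build, install = make_commands("/home/apache", "/var/tmp/build-root")
staged = staged_path("/var/tmp/build-root", "/home/apache", "bin/httpd")
final = target_path("/home/apache", "bin/httpd")
```

Only the first PREFIX may be compiled into the application; the second
is just a staging location which mirrors the final directory tree.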
5) Application Architecture:
*) The build process should mimic the architecture of the code. It should
be a final test that the code was coded to the same specifications
to which it was designed. It is a common problem for code to turn into
spaghetti with each piece of code using functions and creating
dependencies on every other piece of code. For example it is probably
a mistake for code in the database abstraction layer to be implemented
in terms of code in the HTML generation layer. These two libraries
should probably be independent of each other, though they both might
depend on a common string library. The code architecture should limit
the dependency graph between code modules. The BuildMaster must
enforce the restrictions on information flow between components. Thus
no libraries should be in the path unless the architecture allows this
module to depend on those libraries.
*) The architecture must not have circular dependencies. Circular
dependencies not only make upgrading individual libraries difficult
but also make testing components nearly impossible. That is it should
be possible to build some set of libraries L0 which depend on no
libraries and then build some other set of libraries L1 which depend
only on L0 libraries then build L2 which depend only on the L0 and L1
libraries. This "build chain" will prevent circular dependencies and
help keep your code testable and the dependencies understandable.
More information about why this is a good practice is available in
"Large-Scale C++ Software Design" (Addison-Wesley Professional
Computing Series) by John Lakos.
*) I enforce the convention that developers are not allowed to overload
standard system libraries. I always put standard libraries in the
path before any library our company develops. I build the application
in stages to ensure that parts of the application which are not
intended to depend on other code will not have other header files on
the build machine at the time that they are constructed. Build
dependencies between modules which are expected are explicitly
controlled with build scripts and version numbers.
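
The "build chain" of layers L0, L1, L2 described above can be computed
from a dependency map, and the same computation detects circular
dependencies. A minimal Python sketch; the library names are invented
and the map would in practice come from your own build scripts.

```python
def build_layers(deps):
    """Group libraries into layers L0, L1, ... from a dependency map.

    deps maps each library to the set of libraries it depends on.
    Every dependency must itself appear as a key. Raises ValueError
    when no further layer can be built, which means a circular (or
    missing) dependency.
    """
    remaining = {lib: set(d) for lib, d in deps.items()}
    built = set()
    layers = []
    while remaining:
        # A library is ready once everything it needs is already built.
        ready = sorted(lib for lib, d in remaining.items() if d <= built)
        if not ready:
            raise ValueError(
                "circular or missing dependency among: "
                + ", ".join(sorted(remaining)))
        layers.append(ready)
        built.update(ready)
        for lib in ready:
            del remaining[lib]
    return layers


# Hypothetical architecture matching the example in the text: the
# database and HTML layers are independent but share a string library.
deps = {
    "strings": set(),
    "db": {"strings"},
    "html": {"strings"},
    "app": {"db", "html"},
}
layers = build_layers(deps)
```

Here layers[0] is L0, layers[1] is L1 and so on; building in that
order means no library can see headers from a layer above it.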