Buildbot

This is a draft document describing the proposed deployment of an automatic build system at OpenCSW. The issue is being discussed on the maintainers mailing list.

buildbot on the buildfarm

Background

The project's current policy is that all the code for package builds should be stored in a common code repository. At the time of writing, there is no automatic way of validating the builds. In the case of an error (a missing file, a wrong file checksum, or a build which fails on a specific Solaris version on one of the architectures) the incorrect code may stay undetected in the repository until another person tries to build the package and the build fails for them. It is best if such issues are detected quickly after the code is committed to the repository.

Overview

Buildbot is an automated system which monitors the source code repository and automatically builds all updated packages. When a maintainer commits new code to a package, buildbot will check out the updated code and attempt to build the package. If the build fails, buildbot will send an e-mail to the maintainer or a mailing list.

Buildbot also features an IRC interface. A proof of concept is currently present on #opencsw-build on Freenode. The bot name is 'cswbb'. An example output from the IRC bot:

09:13 <@automaciej> cswbb: notify on successToFailure
09:13 < cswbb> The following events are being notified: ['started', 'failure',
               'successToFailure', 'success']
09:16 < cswbb> build #28 of Solaris-sparc started including ['5762']
09:18 < cswbb> build #28 of Solaris-sparc is complete: Success
               Build details are at
               http://netra.chopin.edu.pl/buildbot/builders/Solaris-10-sparc/builds/28

The complete buildbot documentation can be found at http://djmitche.github.com/buildbot/docs/0.7.11/

Implementation details

Buildbot architecture

The Buildbot architecture consists of a build master and build slaves. The master and the slaves will run as headless daemons, as non-root users. Slaves will run as a 'buildbot' user on the build{8,10}{s,x} hosts. The build master, which serves status pages over HTTP, will run on login.opencsw.org or bender.opencsw.org. It's important that the build slaves can connect to the build master via TCP.

The build master runs an HTTP server on a high port. The domain http://buildbot.opencsw.org/ will be created and pointed at bender, which will act as a reverse proxy and forward requests to the build master, potentially running on a different machine.

Common interface to different build systems

Buildbot expects any package to build after a single command is issued in the package-level directory. What that command should be is currently open for discussion. The following procedure is expected to work, regardless of the build system:

rm -rf foo
svn co https://gar.svn.sourceforge.net/svnroot/gar/csw/mgar/pkg/foo foo
cd foo
./build

It's important that the 'build' executable in the top package directory returns a non-zero exit code if the build fails. Otherwise buildbot will have no way of knowing that the build procedure has failed.

./build
(...)
if [[ $? -ne 0 ]]; then echo "The build has failed."; fi
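
The same check can be sketched in Python, the language Buildbot is written in. This is only an illustration of relying on the exit status, not a description of how Buildbot runs commands internally; the helper name run_build and its parameters are hypothetical.

```python
import subprocess
import sys


def run_build(package_dir, command=("./build",)):
    """Run the build command in package_dir; return True if it exited with 0."""
    # A non-zero exit status is the only failure signal the caller relies on.
    completed = subprocess.run(list(command), cwd=package_dir)
    return completed.returncode == 0


if __name__ == "__main__":
    # Stand-in commands that exit with a chosen status, used in place of a real ./build.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    passing = [sys.executable, "-c", "import sys; sys.exit(0)"]
    print(run_build(".", failing))
    print(run_build(".", passing))
```

A 'build' script that always exits with 0, even after a failed compilation, would make run_build report success, which is exactly the situation described above.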

It does not matter where the resulting package is put. Buildbot only cares about the 'build' command succeeding.

There might be situations in which a more elaborate procedure will be necessary, for instance when dealing with 64-bit and 32-bit ISAs in a single package. This is open for discussion.

Potential issues

Hardware capacity

Maintainers are expressing concerns over the capacity of the buildfarm: buildbot might make life more difficult for package maintainers, by causing high I/O traffic and system load. Possible solutions or workarounds might be:

  • Providing separate hardware for automated builds
    • Who is going to pay for it?
  • Not compiling large packages such as OpenOffice or gcc
    • Isn't that against the whole idea of automated builds?
  • Compilation on new tags only (as opposed to all code updates)
    • Probably not going to work, as it would require an additional manual step for maintainers solely for this purpose.
  • Allow maintainers to stop a buildbot slave when they're working. A buildbot-stop script would be provided. The script would allow a maintainer to disable the slave on a given host for a given period of time, say from an hour to a few hours. After this time, the buildbot slave would be started again automatically.
  • Schedule buildbot slaves to run only at specific times of day.
  • Monitor system load and automatically disable buildbot slaves when the load rises above X.
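
The last three workarounds (a temporary stop, a daily schedule and a load threshold) can be combined into a single decision. The following is a minimal sketch under stated assumptions: the function slave_may_run, its parameters and the default threshold of 4.0 are hypothetical, and nothing here is part of Buildbot.

```python
import os
import time


def slave_may_run(now, load, paused_until=0.0, allowed_hours=range(24), max_load=4.0):
    """Decide whether a build slave may run right now.

    now           -- current time, in seconds since the epoch
    load          -- current 1-minute load average
    paused_until  -- epoch time until which a maintainer has paused the slave
    allowed_hours -- local hours of the day in which builds are permitted
    max_load      -- load threshold (the "X" from the list above)
    """
    if now < paused_until:
        return False  # a maintainer asked for quiet time
    if time.localtime(now).tm_hour not in allowed_hours:
        return False  # outside the agreed schedule
    if load > max_load:
        return False  # the host is already busy
    return True


if __name__ == "__main__":
    # os.getloadavg() is available on Unix-like systems only.
    print(slave_may_run(time.time(), os.getloadavg()[0]))
```

A buildbot-stop script would then only need to record a paused_until timestamp; a periodic job could call the function and start or stop the slave accordingly.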

To make an informed decision, buildfarm capacity estimation is necessary:

  • What is the current load of the buildfarm?
  • Are there any times of day when the buildfarm usually has spare capacity?
  • How much spare capacity is there, in packages or in hours per day?
  • How large is the impact of packages being built on the buildfarm?
    • What is the load increase?
    • How much longer does it take to build another package in parallel?

Such data could be obtained by running a monitoring system such as munin.
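
Once load samples are available, the question about times of day with spare capacity could be answered along these lines. This is a sketch only: the function quiet_hours is hypothetical, the sample values are made up for illustration, and it does not use any munin interface.

```python
from collections import defaultdict


def quiet_hours(samples, max_load):
    """Return the hours of day whose average load is below max_load.

    samples -- iterable of (hour_of_day, load_average) pairs
    """
    by_hour = defaultdict(list)
    for hour, load in samples:
        by_hour[hour].append(load)
    return sorted(
        hour for hour, loads in by_hour.items()
        if sum(loads) / len(loads) < max_load
    )


if __name__ == "__main__":
    # Made-up samples: (hour of day, load average).
    samples = [(2, 0.5), (2, 1.5), (14, 6.0), (14, 4.0), (22, 2.5)]
    print(quiet_hours(samples, 2.0))  # [2]
```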

Picking up new packages automatically

There's no clear way to make the buildbot pick up new packages automatically. Currently, the list of branches to build is hard-coded in the build master configuration. A procedure is needed that assembles the package list automatically, based on the content of the source code repository.
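
One possible shape of such a procedure is sketched below. It assumes, purely for illustration, that a checkout of the repository is available locally and that a package is any immediate subdirectory containing a file named 'build'; the function name discover_packages and the marker file are assumptions, not an agreed convention.

```python
import os
import tempfile


def discover_packages(checkout_root, marker="build"):
    """Return sorted names of subdirectories of checkout_root that contain marker."""
    packages = []
    for entry in sorted(os.listdir(checkout_root)):
        candidate = os.path.join(checkout_root, entry)
        if os.path.isdir(candidate) and os.path.isfile(os.path.join(candidate, marker)):
            packages.append(entry)
    return packages


if __name__ == "__main__":
    # Build a throwaway tree with two packages and one unrelated directory.
    with tempfile.TemporaryDirectory() as root:
        for name, has_marker in (("foo", True), ("bar", True), ("docs", False)):
            os.mkdir(os.path.join(root, name))
            if has_marker:
                open(os.path.join(root, name, "build"), "w").close()
        print(discover_packages(root))  # ['bar', 'foo']
```

The resulting list could be used to generate the builder definitions instead of hard-coding them in the build master configuration.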

Packages with multiple ISAs

Building a package with multiple ISAs can be tricky, because the package needs to be built on more than one host, sharing the same filesystem, executing specific commands in a specific order.

It could be implemented with Buildbot's Dependent scheduler.

Package dependencies

When building packages with dependencies, it's necessary to install the required packages before building the dependent package. This could be implemented by building packages in a chroot environment. pbuilder could be used.
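
Whatever isolation mechanism is chosen, the required packages have to be installed in an order that respects their own dependencies. A minimal sketch of computing that order follows; the function install_order and the package names are hypothetical, and the dependency data is assumed to be available as a plain dictionary.

```python
def install_order(package, depends_on):
    """Return the dependencies of package, each listed after its own dependencies.

    depends_on -- dict mapping a package name to a list of its direct dependencies
    """
    order = []
    visiting = set()

    def visit(name):
        if name in order:
            return
        if name in visiting:
            raise ValueError("dependency cycle involving " + name)
        visiting.add(name)
        for dep in depends_on.get(name, ()):
            visit(dep)
        visiting.discard(name)
        order.append(name)

    for dep in depends_on.get(package, ()):
        visit(dep)
    return order


if __name__ == "__main__":
    # Hypothetical packages: foo needs libbar and libbaz, libbar needs libbaz.
    deps = {"foo": ["libbar", "libbaz"], "libbar": ["libbaz"]}
    print(install_order("foo", deps))  # ['libbaz', 'libbar']
```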

Deployment plan

  • Install munin-master on bender and munin-node on all the build servers, monitor system and I/O load for a week or two
  • Estimate spare capacity
  • Agree on the scheduling plan: when to start/stop the build slaves

When consensus is achieved:

  • Install zope, twisted and buildbot on the buildfarm and the host which will run the build master (login or bender)
  • Create 'buildbot' account on the buildfarm and audit its permissions
  • Allow buildbot maintainers to sudo to the 'buildbot' account
  • Create the domain buildbot.opencsw.org
  • Configure Apache on bender to reverse-proxy requests to buildbot.opencsw.org to the build master (running on a high port)
  • Configure build master and build slaves
  • Ensure that daemons (master and slaves) will be started after reboots
  • Document the installation on one of the wikis
  • Optional: Set up monitoring for the build master

Security

Since packages are built automatically, write access to the repository effectively allows the execution of arbitrary code on the buildfarm. That's why it's important to run buildbot as a separate 'buildbot' user and keep an eye on who has access to the Subversion repository.

Deployment

pkgutil -y -i buildbot
# Make sure to have SQLAlchemy 0.7.10 as buildbot won't work with newer versions
pkgrm CSWpy-sqlalchemy
pkgadd -d /home/experimental/buildbot/py_sqlalchemy-0.7.10,REV=2013.10.16-SunOS5.10-sparc-CSW.pkg
pkgadd -d /home/experimental/buildbot/py_sqlalchemy_migrate-0.7.2,REV=2013.10.16-SunOS5.10-all-CSW.pkg
pkgutil -y -i CSWpython27
pkgutil -y -i CSWpy-jinja2 # Added to CSWbuildbot as dependency
pkgutil -y -i CSWpy-tempita # Added to CSWbuildbot as dependency
pkgutil -y -i CSWpy-decorator # Added to CSWbuildbot as dependency
pkgutil -y -i CSWpy-setuptools # Does buildbot really need this?
pkgutil -y -i CSWgit # Good idea
# Slave
pkgutil -y -i buildbot_slave