checkpkg

To see how to set up a checkpkg database, refer to the buildfarm setup documentation in the OpenCSW manual.

checkpkg overview

About

checkpkg is a tool which checks Solaris packages for problems.

It analyzes the files contained within package files. It is related to GAR, but it can also be used with any package.

checkpkg started off as a monolithic Korn shell script. Once it reached a certain size, people became reluctant to update or maintain it. In January 2010, checkpkg gained a new modular architecture, making it possible to develop isolated checks. In December 2010, a shared database was introduced, which allows each package to be examined in the context of the whole OpenCSW catalog.

The general approach is to find as many issues as possible, running the risk that some might be false positives. When this happens, overrides can be used to silence the errors and make checkpkg pass. With time, checkpkg modules get smarter, and there is less and less of a need to use overrides.

The package database

When examining packages, checkpkg refers to a database which caches information about system files (such as libc.so.1) and OpenCSW catalogs.

Checkpkg stores complete package information in JSON, in a single table. This effectively uses a relational database as a key-value store. Some information is also represented in a relational way.

Removing a catalog release from the database

These instructions are current as of November 2012. Be extremely careful when doing this, as you can seriously damage the package database. Back up the database first:

mysqldump --max_allowed_packet=64M [ ... ] checkpkg | pv | gzip > checkpkg-2012-11-29.sql.gz

A couple of tables are linked to the catalog release table. We need to delete the rows from these first, and then delete the main entry.

mysql> select name, id from catalog_release where name = 'stable';
+--------+----+
| name   | id |
+--------+----+
| stable |  5 | 
+--------+----+
1 row in set (0.00 sec)

The tables from which to remove rows are, in this order:

  1. checkpkg_error_tag
  2. srv4_file_in_catalog
  3. catalog_release

mysql> delete from checkpkg_error_tag where catrel_id = (select id from catalog_release where name = 'stable');
Query OK, 0 rows affected (0.00 sec)

mysql> delete from srv4_file_in_catalog where catrel_id = (select id from catalog_release where name = 'stable');
Query OK, 19938 rows affected (5.94 sec)

mysql> delete from catalog_release where name = 'stable';
Query OK, 1 row affected (0.01 sec)

Updating package metadata schema

If you need to add a new kind of package information to the database for use by a new checkpkg check, take some precautions so that you don't break checkpkg on the buildfarm and so that disruption is kept to a minimum.

As soon as your new checkpkg tests are committed, new uploads will try to execute them and access the new information, so the database must be updated first. Fortunately, you can do this online and progressively using the importpkg command.

Two precautions before you start:

  • Create a separate GAR subversion branch for your modifications so you can work on it without disrupting the main branch. The local path of this branch is referred to as ~/opencsw/.buildsys/v2-yourbranch in this procedure.
  • Ask for, or create, a copy of the checkpkg production database so you can test the whole import procedure before applying it in production. You can create a ~/.checkpkg/checkpkg.ini file to tell pkgdb to use another database (see "Bootstrapping a shared database").

Now let's define a shell function to make the next steps more comfortable:

update_checkpkg_database () {
  # Usage: update_checkpkg_database GAR_BRANCH CATALOG_RELEASE...
  GAR_BRANCH=$1; shift
  CATS="$*"
  PKGDB_OPTIONS="importpkg --force-unpack --replace"
  # One build host per architecture.
  i386_host=unstable10x
  sparc_host=unstable10s
  # Marker files for packages that have already been imported.
  mkdir -p ~/importpkg/
  for ARCH in i386 sparc; do
    eval HOST=\$${ARCH}_host
    for CAT in $CATS; do
      for OS in 5.8 5.9 5.10 5.11; do
        # The second column of "pkgdb show cat" is used as the package file name.
        for P in $("$GAR_BRANCH/bin/pkgdb" show cat -a $ARCH -c $CAT -r SunOS$OS | awk '{ print $2 }'); do
          # Skip packages imported by an earlier run.
          [[ ! -f ~/importpkg/$P ]] || continue
          echo "$P"
          ssh $HOST "'$GAR_BRANCH/bin/pkgdb' $PKGDB_OPTIONS '/home/mirror/opencsw-official/allpkgs/$P'" || return 1
          touch ~/importpkg/$P
        done
      done
    done
  done
}

The function remembers that a package was already analyzed by creating an empty file named after the package file in the ~/importpkg/ directory. If you need to re-analyze every package, just wipe out the directory:

rm ~/importpkg/*
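
To re-analyze only some packages, it is enough to remove their markers. A minimal sketch, assuming the marker directory used by the function above; forget_imported is a hypothetical helper name and is not part of GAR.

```shell
# Remove the marker files for packages whose file name starts with a prefix,
# so that update_checkpkg_database analyzes them again on its next run.
# forget_imported is a hypothetical helper; the marker directory defaults
# to ~/importpkg, the one used by update_checkpkg_database.
forget_imported () {
  prefix=$1
  marker_dir=${2:-$HOME/importpkg}
  [ -n "$prefix" ] || { echo "usage: forget_imported PREFIX [DIR]" >&2; return 1; }
  for marker in "$marker_dir/$prefix"*; do
    # An unmatched glob stays literal, so skip anything that is not a file.
    [ -f "$marker" ] || continue
    rm -f "$marker"
    echo "forgot ${marker##*/}"
  done
}
```

For example, forget_imported texlive would drop the markers of every package file whose name starts with texlive and leave the rest in place.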

Next, make sure your branch has the latest modifications from the main branch. Do not use a plain merge like this one:

# DO NOT RUN THIS:
# cd ~/opencsw/.buildsys/v2-yourbranch
# svn merge https://gar.svn.sourceforge.net/svnroot/gar/csw/mgar/gar/v2

A plain merge like this will mess up your branch and you will have a hard time merging back. To get the latest changes, follow this procedure instead: http://automatthias.wordpress.com/2013/03/20/merging-from-trunk-to-a-branch/

Now let's run the new stats collection code on all packages referenced in the catalogs. This operation can take a long time.

update_checkpkg_database ~/opencsw/.buildsys/v2-yourbranch dublin unstable kiel bratislava

The database is now updated with the new data. 

You must now increase the schema version in your code and in the database, to make sure no one accidentally uses the old code, and then merge your branch into the trunk. You can find the current schema version in the DB_SCHEMA_VERSION variable in the database.py file:

awk '$1 == "DB_SCHEMA_VERSION" { print $3 }' "$GAR_BRANCH/lib/python/database.py"

Increase the version number by one in this file in your GAR subversion branch. Let's call the new value NEW_VALUE. Then immediately update the database so that maintainers get an error if they try to upload a new package using the old code:

mysql checkpkg -e "update csw_config set int_value = NEW_VALUE where option_key = 'db_schema_version';"
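
The new value can also be computed rather than typed by hand. A minimal sketch built on the awk command shown earlier; next_schema_version is a hypothetical helper name, and it assumes database.py contains a line of the form DB_SCHEMA_VERSION = <integer>.

```shell
# Print the schema version that follows the one recorded in database.py.
# next_schema_version is a hypothetical helper; it assumes the file has a
# line of the form "DB_SCHEMA_VERSION = <integer>".
# Exits non-zero if no such line is found.
next_schema_version () {
  awk '$1 == "DB_SCHEMA_VERSION" { print $3 + 1; found = 1 }
       END { exit !found }' "$1"
}

# Possible usage:
#   NEW_VALUE=$(next_schema_version "$GAR_BRANCH/lib/python/database.py")
```

The helper only reads the file; editing database.py and running the mysql update remain separate, deliberate steps.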

Then commit your code and apply your modifications to the main branch so maintainers will be able to check out the new code:

cd ~/opencsw/.buildsys/v2-yourbranch
svn commit -m "schema version update" "$GAR_BRANCH/lib/python/database.py"
cd /path/to/gar/trunk
svn update
svn merge --reintegrate ^/project/branches/your_branch 
svn commit -m "merged new checkpkg tests..."

That's almost it. You just need to run the import one last time, in case some packages were uploaded during the package import step, before the schema version was updated.

update_checkpkg_database ~/opencsw/.buildsys/v2-yourbranch dublin unstable kiel bratislava

Updating cswutils on login

A copy of the whole checkpkg and csw-upload-pkg code base lives on the login host. You need to rebuild cswutils from the updated sources, wait until the updated package is released to the (unstable) catalog and update the package on login.

That's all! Don't forget to send a message to the maintainers so they know they have to update their GAR tree with the new code.

Overrides

If one of the modules detects an error, you will see a message similar to this one:

# Tags reported by license presence module
CSWloosefilesexa: license-missing

The CSWloosefilesexa: license-missing bit is an error tag which denotes the kind of error that was detected, together with parameters, such as the offending file name. Most of the time, it'll be a genuine problem with the package, but sometimes it'll be a false positive. When this happens, you need to suppress the error by creating an override.

In GAR

To do that, you can add the following line to your GAR Makefile:

CHECKPKG_OVERRIDES_CSWfoo = license-missing

If you want your override to be more specific, you can add a parameter:

CHECKPKG_OVERRIDES_CSWfoo = symbol-not-found|bar.so

This override will only suppress the symbol-not-found error if the parameter (bar.so) matches as well.

After you update override declarations in the Makefile, you need to issue gmake remerge repackage or gmake platforms-remerge platforms-repackage for your changes to take effect.
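
When checkpkg reports several tags, the Makefile lines can be derived from the error tag lines mechanically. A minimal sketch; tags_to_overrides is a hypothetical helper name. It rests on two assumptions that go beyond the examples above: it uses the GNU make += operator so that several overrides for one package accumulate, and it joins every word of the parameter with "|".

```shell
# Turn checkpkg error tag lines ("<pkgname>: <tag> [<info>]") read from
# standard input into GAR override assignments. Comment lines are skipped.
# tags_to_overrides is a hypothetical helper. Two assumptions: "+=" is used
# so that several overrides for one package accumulate, and each
# space-separated word of the info part is joined with "|".
tags_to_overrides () {
  awk '
    /^#/ || NF < 2 { next }
    {
      pkgname = $1
      sub(/:$/, "", pkgname)
      override = $2
      for (i = 3; i <= NF; i++) override = override "|" $i
      print "CHECKPKG_OVERRIDES_" pkgname " += " override
    }'
}
```

Review the generated lines before pasting them into the Makefile: an override should only go in once you have confirmed that the tag is a false positive.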

Overrides under the hood

This part describes how overrides are handled on the lower level. The override file is of the "i" type, so the prototype needs to contain a line like the following:

i checkpkg_override=checkpkg_override.<pkgname>

Inside this file, you can place a list of overrides, which can suppress the problematic tag. The format of the overrides file is the following:

# a comment
[<pkgname>: ]<checkpkg-tag>[ <checkpkg-info>]

In case of the above error, the line would be:

CSWloosefilesexa: license-missing

It could also be simply:

license-missing

Make sure the file is present in the package (you might need to add it to PKGFILES) and build your package again. The override will kick in and silence the error. If you're building multiple packages from a single GAR Makefile, make sure that you put the right files into the right packages.
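
Outside of GAR, the override file and the matching prototype line can be produced by hand. A minimal sketch, assuming the file name and prototype line formats shown above; add_override is a hypothetical helper name.

```shell
# Append one override line to checkpkg_override.<pkgname> in the current
# directory and print the prototype line that ships the file as an "i" entry.
# add_override is a hypothetical helper; the file name and the prototype line
# follow the formats shown above.
add_override () {
  [ $# -ge 2 ] || { echo "usage: add_override PKGNAME TAG [INFO...]" >&2; return 1; }
  pkgname=$1; shift
  override_file="checkpkg_override.$pkgname"
  # "$*" joins the tag and any info words with single spaces.
  echo "$pkgname: $*" >> "$override_file"
  echo "i checkpkg_override=$override_file"
}
```

Running add_override CSWloosefilesexa license-missing writes the override line from the example above and prints the line to add to the prototype.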

Errors about package dependencies

  • Missing dependencies are no longer suggestions, they are errors.
  • Any SUNW or *SUNW packages are never reported as missing dependencies.

See the following change for an example of how to use overrides (without GAR integration):

Gotchas

  • The overrides file has to be inside the actual package. Putting it inside pkgroot is not enough. Verify that the file is present in the prototype, for instance work/solaris8-sparc/build-global/CSWkrb5lib.prototype-sparc

Development

Design Overview

Many checkpkg features are based on lintian (kudos to the lintian guys for sharing the design description on the web).

Code location

The same as http://gar.opencsw.org code.

Overrides

Error tag file format

# a comment
<pkgname>: <lintian-tag> [<lintian-info>]

Overrides file format

Based on lintian overrides.

# a comment
[<package>[ <type>]: ]<lintian-tag>[ [*]<lintian-info>[*]]
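
To illustrate how such a line breaks down, here is a minimal sketch that splits override lines into their parts. parse_overrides is a hypothetical helper name and is not how checkpkg itself reads the file. It handles only the [<package>: ]<tag>[ <info>] form: the optional <type> is not supported, and any * wildcards are passed through without being interpreted.

```shell
# Split override lines of the form "[<package>: ]<tag>[ <info>]" read from
# standard input into tab-separated package, tag and info columns.
# parse_overrides is a hypothetical helper. It skips comments and blank
# lines, prints "*" when no package is given, does not handle the optional
# <type> field, and leaves any "*" wildcards in the info untouched.
parse_overrides () {
  awk '
    /^[ \t]*#/ || /^[ \t]*$/ { next }
    {
      package = "*"
      first = 1
      if ($1 ~ /:$/) {
        package = substr($1, 1, length($1) - 1)
        first = 2
      }
      tag = $first
      info = ""
      for (i = first + 1; i <= NF; i++) info = info (info == "" ? "" : " ") $i
      printf "%s\t%s\t%s\n", package, tag, info
    }'
}
```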

Overrides file location

The overrides file needs to be listed in pkgmap as an "i" entry and named checkpkg_override:

1 i checkpkg_override 105 9880 1285368604
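
A quick way to confirm that the entry made it into the package is to look for it in the pkgmap. A minimal sketch, assuming the field layout of the example line above; has_override_entry is a hypothetical helper name.

```shell
# Succeed if the given pkgmap lists a checkpkg_override info file.
# has_override_entry is a hypothetical helper; it assumes the field layout
# of the example line above: part number, type "i", then the file name.
has_override_entry () {
  awk '$2 == "i" && $3 == "checkpkg_override" { found = 1 }
       END { exit !found }' "$1"
}

# Possible usage:
#   has_override_entry pkgmap || echo "checkpkg_override is missing" >&2
```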

Stats collection

Statistics collection is separated from the analysis. After the package is unpacked, information from the package is extracted and saved. The separation of stats collection and analysis has significant advantages:

  • It's possible to re-run checkpkg without unpacking the srv4 file, which speeds up the execution
  • It's easier to write unit tests for checkpkg modules, since it's easy to provide testing data
  • The stats data structures can be easily inspected manually, so it's possible to find out whether the problem is in stats collection or the later analysis
  • It's easier to separate the original source data from any derived or optimized data

The stats data are saved in JSON.

Known issues in the code

  • Package statistics now have 3 kinds of representation: a simple Python data structure, a PackageStats object with functionality, and m.Srv4FileStats used for persistence. This should be reduced to at most 2 kinds of representation.

Setting up on your own build host

Prerequisites:

  • A local mirror of OpenCSW package catalog in e.g. /export/mirror/opencsw.
  • Installed packages:
    • CSWmysql5
    • CSWgar-dev
    • CSWmgar
    • CSWpy-cjson
    • CSWpy-dateutil
    • CSWpy-pyelftools
    • CSWpy-webpy
    • CSWpy-paste
    • CSWpy-lockfile

MySQL has to be configured with max_allowed_packet set to 64M.

The code is in ~/opencsw/.buildsys/v2.

Create a checkpkg configuration file in /etc/opt/csw/checkpkg.ini (or ~/.checkpkg/checkpkg_auto.ini).

[database]
type = mysql
name = checkpkg
host =
user = checkpkg
password = <dbpassword>

[rest]
releases = http://localhost:8080
pkgdb = http://localhost:8081/rest

[buildfarm]
opencsw_root = /export/mirror/opencsw

[releases_app]
log_file = /var/tmp/releases.log

[pkgdb_app]
log_file = /var/tmp/pkgdb_web.log
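
Before going further, it can help to confirm that the file contains the values you expect. A minimal sketch of a reader for this simple key = value layout; ini_get is a hypothetical helper name, it is not a full ini parser, and it does not reflect how pkgdb reads its configuration.

```shell
# Print the value of KEY from [SECTION] in an ini file such as checkpkg.ini.
# ini_get is a hypothetical helper for sanity checks; it assumes the simple
# "key = value" layout shown above and is not a full ini parser.
# Usage: ini_get FILE SECTION KEY
ini_get () {
  awk -v section="$2" -v key="$3" '
    /^\[.*\]$/ { current = substr($0, 2, length($0) - 2); next }
    current == section {
      eq = index($0, "=")
      if (eq == 0) next
      name = substr($0, 1, eq - 1)
      value = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      gsub(/^[ \t]+|[ \t]+$/, "", value)
      if (name == key) { print value; found = 1; exit }
    }
    END { exit !found }' "$1"
}
```

For example, ini_get /etc/opt/csw/checkpkg.ini rest pkgdb should print the URL that the second RESTful app is expected to serve.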

In MySQL, create the database and grant permissions:

CREATE DATABASE checkpkg;
GRANT ALL PRIVILEGES ON checkpkg.* TO checkpkg@localhost IDENTIFIED BY '<dbpassword>';

Start the two RESTful apps. They are part of the application and need to be running. Here's the first one:

cd ~/opencsw/.buildsys/v2
export PYTHONPATH=$(pwd)
cd lib/web
./releases_web.py

Leave the app running. Start the second app, which will be necessary during package checking.

cd ~/opencsw/.buildsys/v2
export PYTHONPATH=$(pwd)
cd lib/web
./pkgdb_web.py 8081

Leave the second app running. Initialize the database and import packages.

cd ~/opencsw/.buildsys/v2
export PYTHONPATH=$(pwd)
bin/pkgdb initdb
bin/pkgdb system-metadata-to-disk
bin/pkgdb import-system-metadata SunOS5.10 i386
bin/pkgdb sync-catalogs-from-tree unstable /export/mirror/opencsw/unstable

If you got an error:

lib.python.rest.RestCommunicationError: URL HEAD 'http://localhost:8080/blob/pkgstats/8e540ee30195ca6f55dc86d3ac1631d1/' HTTP code: 502

then check whether the http_proxy environment variable is set and, if so, run unset http_proxy.
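
This check can be run before starting an import. A minimal sketch; check_no_proxy is a hypothetical helper name, and checking the upper-case HTTP_PROXY spelling as well is an assumption, since only http_proxy is named above.

```shell
# Warn when a proxy variable is set, since requests to the local REST apps
# may then go through the proxy and fail as in the error above.
# check_no_proxy is a hypothetical helper; checking the upper-case
# HTTP_PROXY spelling as well is an assumption.
check_no_proxy () {
  status=0
  for var in http_proxy HTTP_PROXY; do
    eval "value=\${$var:-}"
    if [ -n "$value" ]; then
      echo "$var is set to '$value'; run: unset $var" >&2
      status=1
    fi
  done
  return $status
}
```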

If you have more platforms (sparc/intel, Solaris 9/10/11), you need to run system-metadata-to-disk on each of them, and import the metadata from each host. The "system-metadata-to-disk" command must be run on each indexed host, but "import-system-metadata" can be run on any host. The first command indexes the local filesystem. The second one only loads data from .marshal files into the database.
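
With several platforms, the list of import commands grows quickly. A minimal sketch that prints one command per combination without running anything; print_metadata_imports is a hypothetical helper name, and the default release and architecture lists are only examples that should match the hosts you actually indexed.

```shell
# Print one import-system-metadata command per OS release and architecture.
# Nothing is executed. print_metadata_imports is a hypothetical helper; the
# default lists of releases and architectures are examples and should match
# the hosts on which system-metadata-to-disk was actually run.
print_metadata_imports () {
  os_releases=${1:-"SunOS5.9 SunOS5.10 SunOS5.11"}
  architectures=${2:-"sparc i386"}
  for osrel in $os_releases; do
    for arch in $architectures; do
      echo "bin/pkgdb import-system-metadata $osrel $arch"
    done
  done
}

print_metadata_imports
```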

Make sure you have plenty of space on the hard disk. The texlive_common package is over 1GB in size (compressed) and requires more than 2GB of additional disk space to be extracted and analyzed.

If you have a cron job updating your mirror, you might want to stop it while running sync-catalogs-from-tree. If you don't, and there are catalog changes on the mirror, catalog indexing might fail and you'll have to restart it. One scenario is that a package is removed from the catalog while pkgdb is running.
