checkpkg

Usage

About

checkpkg is a tool which checks Solaris packages for problems.

It analyzes the files contained within package files. It's related to GAR, but it can also be used with any package.

checkpkg started off as a monolithic Korn shell script. Once it reached a certain size, people became reluctant to update or maintain it. In January 2010, checkpkg gained a new modular architecture, making it possible to develop isolated checks. In December 2010, a shared database was introduced, making it possible to examine each package in the context of the whole OpenCSW catalog.

The general approach is to find as many issues as possible, running the risk that some might be false positives. When this happens, overrides can be used to silence the errors and make checkpkg pass. With time, checkpkg modules get smarter, and there is less and less of a need to use overrides.

The cache database

When examining packages, checkpkg refers to a database which caches information about system files (such as libc.so.1) and OpenCSW catalogs. On the buildfarm, it's a shared MySQL database, while outside the buildfarm a sqlite database is automatically created in ~/.checkpkg.

Checkpkg stores complete package information in pickled form, in a single table. (This effectively uses a relational database as a key-value store.) Some information is also represented in other tables, but they don't contain much detail. The idea is that some operations require checking the contents of the whole catalog, and you don't want to unpickle 2500 objects on every checkpkg run. Instead, we rely on database indexes.
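The idea of keeping a pickled blob per package next to a small indexed table can be sketched as follows. This is only an illustration using the standard sqlite3 and pickle modules; the table names (pkg_blob, pkg_file), the md5 key and the "paths" field are assumptions made for the example, not checkpkg's actual schema.

```python
import pickle
import sqlite3


def open_cache(path=":memory:"):
    """Create the two hypothetical tables: full blobs plus an indexed summary."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pkg_blob (md5 TEXT PRIMARY KEY, data BLOB NOT NULL)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pkg_file (md5 TEXT NOT NULL, path TEXT NOT NULL)")
    conn.execute("CREATE INDEX IF NOT EXISTS pkg_file_path ON pkg_file (path)")
    return conn


def store_package(conn, md5, stats):
    """Store the complete stats as a pickle and the file paths as indexed rows."""
    with conn:
        conn.execute("INSERT OR REPLACE INTO pkg_blob VALUES (?, ?)",
                     (md5, pickle.dumps(stats)))
        conn.execute("DELETE FROM pkg_file WHERE md5 = ?", (md5,))
        conn.executemany("INSERT INTO pkg_file VALUES (?, ?)",
                         [(md5, p) for p in stats["paths"]])


def packages_owning(conn, path):
    """Catalog-wide question answered from the index, without unpickling anything."""
    rows = conn.execute("SELECT md5 FROM pkg_file WHERE path = ?", (path,))
    return sorted(row[0] for row in rows)


def load_package(conn, md5):
    """Unpickle a single package only when its full details are needed."""
    row = conn.execute("SELECT data FROM pkg_blob WHERE md5 = ?", (md5,)).fetchone()
    return pickle.loads(row[0]) if row else None
```

A query such as packages_owning(conn, "/opt/csw/bin/foo") touches only the indexed table, which is the point of the design: the expensive unpickling happens for one package at a time.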

Buildfarm

There's a shared database on the mysql host. It is periodically refreshed from the mirror with the following command:

bin/pkgdb sync-catalogs-from-tree current /home/mirror/opencsw/current

Note: Please only refresh the common catalog, and don't refresh 'unstable', 'testing' or 'stable'.

Outside the buildfarm

The first time you run checkpkg after the /var/sadm/install/contents file has been modified, the cache database will be purged and repopulated. This can take a long time and requires an exclusive lock on the database. If another instance of checkpkg is running on the same host at the same time, it might fail with:

sqlite3.OperationalError: database is locked

Wait until the database has been updated and run checkpkg again.
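If you script around the database yourself, the "wait and run again" advice can be automated. The helper below is a minimal sketch, not something checkpkg provides: run_with_retry and its parameters are hypothetical names, and it simply retries a callable while sqlite reports a locked database.

```python
import sqlite3
import time


def run_with_retry(operation, attempts=5, delay=1.0):
    """Call operation(); retry while sqlite reports that the database is locked."""
    for attempt in range(attempts):
        try:
            return operation()
        except sqlite3.OperationalError as exc:
            # Re-raise anything that is not a lock error, and give up
            # after the last attempt.
            if "locked" not in str(exc) or attempt == attempts - 1:
                raise
            time.sleep(delay)
```

Since repopulating the cache can take a long time, a generous delay between attempts is probably more useful than a large number of quick retries.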

You can inspect the database by running:

sqlite3 ~/.checkpkg/<file-name>

When running checkpkg outside the buildfarm, you need to either run it on a host named e.g. "current10s" or "current10x", or set CATALOG_RELEASE to 'current' in ~/.garrc.

Bootstrapping a shared database

The shared database can be any database supported by sqlobject. MySQL and sqlite have been tested. If MySQL is used, you must create the database and a user with access to it.

The database access configuration is held in ~/.checkpkg/checkpkg.ini or, in the shared config scenario, in /etc/opt/csw/checkpkg.ini. If there is no configuration, checkpkg automatically creates a configuration file in the ~/.checkpkg directory. The format is as follows:

[database]

type = mysql
name = checkpkg
host = mysql
user = checkpkg
password = yourpassword
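To check that a configuration file is readable and contains the expected section, it can be parsed with Python's standard configparser module. This is a sketch only: read_database_config is a hypothetical helper, and the search order (user file first, then the shared file) is an assumption, not a statement about how checkpkg resolves the two locations.

```python
import configparser
import os

# Locations named above; the order in which they are tried is an assumption.
CONFIG_PATHS = [
    os.path.expanduser("~/.checkpkg/checkpkg.ini"),
    "/etc/opt/csw/checkpkg.ini",
]


def read_database_config(paths=CONFIG_PATHS):
    """Return the [database] section of the first usable file, or None."""
    for path in paths:
        parser = configparser.ConfigParser()
        # read() returns the list of files it parsed; missing files are skipped.
        if parser.read(path) and parser.has_section("database"):
            return dict(parser["database"])
    return None
```

With the example file above, read_database_config() would return a dictionary with the keys type, name, host, user and password.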

Initialize your database.

bin/pkgdb initdb

You'll also need a mirror of the OpenCSW catalog (e.g. in /home/mirror/opencsw). Indexing it takes time, so be prepared and patient. Please note that you might need to change 'SunOS5.10' and 'sparc' in the file name below to match your file.

bin/pkgdb system-files-to-file
bin/pkgdb import-system-file install-contents-SunOS5.10-sparc.pickle

Import your OpenCSW catalog.

bin/pkgdb sync-catalogs-from-tree current /home/mirror/opencsw/current

Importing the whole catalog takes time, and depending on the speed of your machine, it can take anything from half an hour to many hours. The good news is that you only need to import each package once, and once catalog updates come in, pkgdb only imports the new packages and the process is much faster.

You will need to perform this operation each time the OpenCSW catalog is updated. Otherwise your packages will be checked against an old state of the catalog.

Known issue: There seems to be a problem with indexing i386 files on a sparc machine. If you see the “hachoir_core.field.field.MissingField: Can't get field "header" from /” error, restart the process on an i386 box.

As a workaround, you can also import an individual catalog on an i386 machine:

bin/pkgdb sync-cat-from-file SunOS5.10 i386 current /home/mirror/opencsw/current/i386/5.10/catalog

Known issue: Libmagic stops functioning after indexing about half of the sparc catalog. You have to restart the process; the import procedure will pick up where it left off. The root cause is unknown; it could be a bug in libmagic or in the libmagic Python bindings.

Your database is now ready!

Overrides

If one of the modules detects an error, you will see a message similar to this one:

# Tags reported by license presence module
CSWloosefilesexa: license-missing

The CSWloosefilesexa: license-missing bit is an error tag which denotes the kind of error that was detected, together with parameters, such as the offending file name. Most of the time, it'll be a genuine problem with the package, but sometimes it'll be a false positive. When this happens, you need to suppress the error by creating an override.

In GAR

To do that, you can add the following line to your GAR Makefile:

CHECKPKG_OVERRIDES_CSWfoo = license-missing

If you want your override to be more specific, you can add a parameter:

CHECKPKG_OVERRIDES_CSWfoo = symbol-not-found|bar.so

This override only suppresses the symbol-not-found error if the parameter (bar.so) matches as well.

After you update override declarations in the Makefile, you need to issue gmake remerge repackage or gmake platforms-remerge platforms-repackage for your changes to take effect.

Overrides under the hood

This part describes how overrides are handled on the lower level. The override file is of the "i" type, so the prototype needs to contain a line like the following:

i checkpkg_override=checkpkg_override.<pkgname>

Inside this file, you can place a list of overrides, which can suppress the problematic tag. The format of the overrides file is the following:

# a comment
[<pkgname>: ]<checkpkg-tag>[ <checkpkg-info>]

In case of the above error, the line would be:

CSWloosefilesexa: license-missing

It could also be simply:

license-missing
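The file format above can be read with a few lines of Python. The sketch below is not checkpkg's own parser: parse_overrides and is_overridden are hypothetical helpers, and exact string comparison of the info part is an assumption made to keep the example short.

```python
import re

# [<pkgname>: ]<checkpkg-tag>[ <checkpkg-info>]
OVERRIDE_RE = re.compile(
    r"^(?:(?P<pkgname>[^\s:]+):\s+)?(?P<tag>\S+)(?:\s+(?P<info>.+))?$")


def parse_overrides(text):
    """Return a list of (pkgname, tag, info) tuples; absent parts are None."""
    overrides = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = OVERRIDE_RE.match(line)
        if match:
            overrides.append(
                (match.group("pkgname"), match.group("tag"), match.group("info")))
    return overrides


def is_overridden(overrides, pkgname, tag, info=None):
    """True if any override covers the given error tag."""
    for o_pkgname, o_tag, o_info in overrides:
        if o_pkgname is not None and o_pkgname != pkgname:
            continue
        if o_tag != tag:
            continue
        # An override without info silences the tag regardless of its parameter.
        if o_info is None or o_info == info:
            return True
    return False
```

For example, parsing a file containing the two lines "CSWloosefilesexa: license-missing" and "symbol-not-found bar.so" gives one override bound to a package and one that applies to any package but only for the bar.so parameter.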

Make sure the file is present in the package (you might need to add it to PKGFILES) and build your package again. The override will kick in and silence the error. If you're building multiple packages from a single GAR Makefile, make sure that you put the right files into the right packages.

Errors about package dependencies

  • Missing dependencies are no longer suggestions, they are errors.
  • Any SUNW or *SUNW packages are never reported as missing dependencies.

See the following change for an example of how to use overrides (without GAR integration):

Gotchas

  • The overrides file has to be inside the actual package. Putting it inside pkgroot is not enough. Verify that the file is present in the prototype, for instance work/solaris8-sparc/build-global/CSWkrb5lib.prototype-sparc

Development

Design Overview

Many checkpkg features are based on lintian [1] (kudos to the lintian guys for sharing the design description on the web).

Code location

http://gar.svn.sourceforge.net/svnroot/gar/csw/mgar/gar/v2/bin/

Overrides

Error tag file format

# a comment
<pkgname>: <lintian-tag> [<lintian-info>]

Overrides file format

Based on lintian overrides [2].

# a comment
[<package>[ <type>]: ]<lintian-tag>[ [*]<lintian-info>[*]]
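The optional asterisks around the info part suggest prefix, suffix and substring matching. The function below shows one possible reading of that notation; info_matches is a hypothetical helper, and whether checkpkg honours the wildcards at all is not stated here.

```python
def info_matches(pattern, info):
    """Match info against a pattern with an optional leading and/or trailing '*'."""
    starts = pattern.startswith("*")
    ends = pattern.endswith("*") and len(pattern) > 1
    # Strip the wildcard characters to get the literal part of the pattern.
    core = pattern[(1 if starts else 0):(len(pattern) - 1 if ends else len(pattern))]
    if starts and ends:
        return core in info
    if starts:
        return info.endswith(core)
    if ends:
        return info.startswith(core)
    return info == core
```

Under this reading, "*bar.so" would match "libbar.so", while a plain "bar.so" would require the info to be exactly "bar.so".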

Overrides file location

The overrides file needs to be listed in pkgmap as an "i" entry and named checkpkg_override:

1 i checkpkg_override 105 9880 1285368604
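A quick way to verify that a package carries an overrides file is to scan its pkgmap for such an entry. This is a small sketch with a hypothetical helper name; it assumes that information-file lines consist of the part number, the type "i", the file name, and then size, checksum and modification time, as in the example line above.

```python
def find_override_entry(pkgmap_text):
    """Return the split fields of the checkpkg_override "i" entry, or None."""
    for line in pkgmap_text.splitlines():
        fields = line.split()
        # Assumed layout of an info-file line: part, "i", name, size, cksum, modtime
        if len(fields) >= 3 and fields[1] == "i" and fields[2] == "checkpkg_override":
            return fields
    return None
```

If the function returns None for the pkgmap of a freshly built package, the overrides file most likely did not make it into the package.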

Stats collection

Statistics collection is separated from the analysis. After the package is unpacked, information from the package is extracted and saved. The separation of stats collection and analysis has significant advantages:

  • It's possible to re-run checkpkg without unpacking the srv4 file, which speeds up the execution
  • It's easier to write unit tests for checkpkg modules, since it's easy to provide testing data
  • The stats data structures can be easily inspected manually, so it's possible to find out whether the problem is in stats collection or the later analysis
  • It's easier to separate the original source data from any derived or optimized data

The stats data are saved in Python pickle format [3].
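The separation can be illustrated with a toy example: one function gathers plain data, and a check operates only on that data. Everything here is hypothetical (the function names, the "paths" key and the file-outside-opt-csw tag); only the "basic_stats" and "pkgname" keys appear elsewhere on this page.

```python
import pickle


def collect_stats(pkgname, paths):
    """Stand-in for the collection step: gather plain data, no analysis."""
    return {"basic_stats": {"pkgname": pkgname}, "paths": list(paths)}


def check_files_outside_opt_csw(stats):
    """Hypothetical check: works on the stats only, never on the package file."""
    pkgname = stats["basic_stats"]["pkgname"]
    return [(pkgname, "file-outside-opt-csw", path)
            for path in stats["paths"]
            if not path.startswith("/opt/csw/")]


# The stats survive a pickle round trip, so the check can be re-run later
# without unpacking the package again.
saved = pickle.dumps(collect_stats("CSWfoo", ["/opt/csw/bin/foo", "/usr/bin/foo"]))
errors = check_files_outside_opt_csw(pickle.loads(saved))
```

Because the check takes an ordinary dictionary, a unit test can hand it a small hand-written structure instead of a real package.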

Configuration issues

max_allowed_packet in MySQL

Statistics are inserted into a database, either MySQL or sqlite inside ~/.checkpkg.

Since checkpkg stores pickled objects, it sometimes stores values bigger than 1MB. For this to work with MySQL, the following needs to be present in /opt/csw/mysql5/my.cnf:

[mysqld]
   max_allowed_packet=32M

There are packages which, when pickled, are larger than 16MB.
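To get a rough idea of whether a given object would fit, you can compare the size of its pickle with the configured limit. This is only an approximation, since the SQL statement sent to the server is somewhat larger than the blob itself; fits_in_packet is a hypothetical helper.

```python
import pickle

# Mirrors max_allowed_packet=32M from the my.cnf fragment above.
MAX_ALLOWED_PACKET = 32 * 1024 * 1024


def fits_in_packet(obj, limit=MAX_ALLOWED_PACKET):
    """Rough check: is the pickled object smaller than the packet limit?"""
    return len(pickle.dumps(obj)) < limit
```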

Case-insensitive string comparison

The MySQL documentation says [4]:

For nonbinary strings (CHAR, VARCHAR, TEXT), string searches use the collation of the comparison operands. For binary strings (BINARY, VARBINARY, BLOB), comparisons use the numeric values of the bytes in the operands; this means that for alphabetic characters, comparisons will be case sensitive.

In SQLObject, the UnicodeCol column type is translated into VARCHAR, which results in case-insensitive comparisons. This makes checkpkg throw file collision errors between files such as "Zcat.1" and "zcat.1". To work around this, a case-sensitive collation needs to be used, for example latin1_bin. The collation setting can be altered for specific columns, as follows:

ALTER TABLE csw_file MODIFY COLUMN path VARCHAR(900) NOT NULL COLLATE latin1_bin;
ALTER TABLE csw_file MODIFY COLUMN basename VARCHAR(255) NOT NULL COLLATE latin1_bin;

Before applying these changes, make sure that you're using the same column settings as the ones in the database.
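To see which paths would be affected by a case-insensitive collation, you can group them by their lowercased form. The function below is a minimal sketch with a hypothetical name; the groups it returns are the false collisions, not real ones.

```python
from collections import defaultdict


def case_insensitive_collisions(paths):
    """Group distinct paths that compare equal when case is ignored."""
    groups = defaultdict(set)
    for path in paths:
        groups[path.lower()].add(path)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)
```

Paths such as .../man1/Zcat.1 and .../man1/zcat.1 end up in the same group, which is the situation the latin1_bin collation avoids.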

Known issues

  • Package statistics now have 3 kinds of representation: a simple Python data structure, a PackageStats object with functionality, and m.Srv4FileStats used for persistence. This should be reduced to at most 2 kinds of representation.
  • Importing of hachoir_parser takes a long time on the buildfarm. Startup time of checkpkg and pkgdb could be reduced if module dependencies were refactored. Some work towards this was already done, but it's not complete.
  • Sparc package imports have to be done on a sparc host, and i386 package imports on an i386 host.
  • The file collision detection branch breaks compatibility with builds external to the buildfarm.
  • Relocatable packages: checkpkg does not analyze them correctly; it thinks there are no binaries in them.

Package statistics inspection tool

Writing tests requires knowledge of the data structure you traverse during checks.

Here's an example of how to inspect a package's data using a helper tool. It shows you how checkpkg sees your package. The example below just prints the data to the screen, but you can also start the tool in interactive mode to inspect your data.

The usage:

rlwrap gar/bin/checkpkg_inspect_stats.py <pkg1> [<pkg2> [ ... ] ]

An example interactive session:

$ rlwrap bin/checkpkg_inspect_stats.py /home/mirror/opencsw/current/sparc/5.9/*cups*
(...)
>>> import pprint
>>> pprint.pprint(pkgstats)
(...)
# Looking for bad paths
>>> pprint.pprint([x["bad_paths"] for x in pkgstats])
# Displaying the pkgname and the bad paths
>>> pprint.pprint([(x["basic_stats"]["pkgname"], x["bad_paths"]) for x in pkgstats])

If you want to see the statistics, run:

bin/checkpkg_inspect_stats.py -p tree-1.5.3,REV\=2010.07.05-SunOS5.9-sparc-CSW.pkg.gz | less

NOTE: you can find packages in /home/mirror/opencsw/current/sparc/5.9
