2. Running benchmark computations

SYMBOLICDATA's Compute environment sets out to realize the following three goals:

  1. To facilitate automated and trusted benchmark computations, that is, benchmark computations whose results w.r.t. time and correctness are repeatable, comparable, and trusted by the community.
  2. To serve as a test-bed for developers, that is, as a tool with which developers of Computer Algebra software can conveniently and reliably evaluate new algorithms and implementation techniques.
  3. To provide a repository of computational results which can be used for further development, like computing invariants of the original example, correctness verifications and timing comparisons of other computations, etc.

In this section, we present the main principles of the realization of these ambitious goals. See [9] for details, further explanations, examples and complete on-line documentation.

Analyzing the general nature of benchmark computations reveals dependencies on the following parameters:

The example which is to be computed, i.e., an sd-record which provides the object of the computation.
The actual computation to be performed, i.e., an sd-record of type COMP which describes the computation and serves as an interface to (Perl) routines which examine an example for its suitability for this computation and, where applicable, check the syntactic and semantic correctness of the result of the computation.
A configuration of a Computer Algebra software which realizes the computation, i.e., an sd-record of type CASCONFIG. Such a record, on the one hand, identifies the software, its version, and its implemented benchmark capabilities; on the other hand, it serves as an interface to (Perl) routines which generate the input file and shell command to run the computation, which check the output of the computation for run-time errors (like out of memory, segmentation violations, or syntax errors), and which, if necessary, perform (syntactic) transformations on the result so that it is suitable for further processing independently of the examined Computer Algebra software.
A description of the computer used for the computation. Such an sd-record of type MACHINE can be generated automatically by means of the action symbolicdata ThisMachine and used further to specify the executables of particular CASCONFIGs.
Dynamic parameters, i.e., specifications of the intervals for the run-time of a computation, of which error and verification checks should be performed on the result, and of what to do with the output of the computation.
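The dependencies above can be pictured as one specification record per benchmark run. The following Python sketch is purely illustrative: the field names and example identifiers are invented here and do not reproduce the actual sd-record syntax.

```python
# Hypothetical sketch of the parameters a single benchmark run depends on;
# field names and values are illustrative, not the actual sd-record syntax.
benchmark_run = {
    "example":   "INTPS/SomeExample",       # sd-record providing the object to compute
    "comp":      "COMP/SomeComputation",    # COMP record: which computation to perform
    "casconfig": "CASCONFIG/SomeCAS",       # CASCONFIG record: which software realizes it
    "machine":   "MACHINE/this_host",       # MACHINE record describing the computer
    "dynamic": {                            # dynamic, per-run parameters
        "mintime": 1.0,       # repeat runs until the total time exceeds this (seconds)
        "maxtime": 3600.0,    # kill the computation after this many seconds
        "verify":  True,      # perform verification checks on the result
        "keep_output": False, # what to do with the output of the computation
    },
}

# A run may only start once all parameters are present.
required = {"example", "comp", "casconfig", "machine", "dynamic"}
assert required <= benchmark_run.keys()
```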

The benchmark computations of SYMBOLICDATA are facilitated by the Perl module Compute and realized using

symbolicdata Compute [options] sd-file(s)
Parameter specifications are given either by command-line options or, often more suitably, by init-files. A benchmark run consists of the following stages:
  1. Check of correctness and completeness of input parameters.
  2. Set-up of the computation.
  3. Run of the computation.
  4. Evaluation of the computation.
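The four stages can be sketched as a simple driver loop. This is a hedged illustration only: the function names are hypothetical, and the real Compute module is a considerably more elaborate Perl package.

```python
# Illustrative driver for the four stages of a benchmark run.
# All function names are hypothetical; the real Compute module is written in Perl.

def check_parameters(params):
    """Stage 1: check correctness and completeness of input parameters."""
    for key in ("example", "comp", "casconfig", "machine"):
        if key not in params:
            raise ValueError(f"missing parameter: {key}")

def set_up(params):
    """Stage 2: let the CASCONFIG routines build the input file and shell command."""
    return {"input_file": f"{params['example']}.in",
            "command": f"run-cas < {params['example']}.in"}

def run(setup):
    """Stage 3: execute the computation (stubbed out here)."""
    return {"output": "...", "time": 0.0}

def evaluate(result):
    """Stage 4: check the output for run-time errors and verify the result."""
    return {"status": "ok" if "error" not in result["output"] else "error"}

def benchmark(params):
    check_parameters(params)
    setup = set_up(params)
    result = run(setup)
    return evaluate(result)

report = benchmark({"example": "Ex1", "comp": "C", "casconfig": "S", "machine": "M"})
```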

The set-up and evaluation stages require communication between the Compute module and the Perl routines specified by the input COMP and CASCONFIG records. The given input and expected output of these external routines are well-defined and documented. To ease the addition of new computations and systems to the available benchmark computations, as much functionality as possible is provided, first, by the Compute module; second, by the routines of the COMP record; and, third, by the routines of the CASCONFIG record. For example, the run-time error check specification of a CASCONFIG can be as simple as specifying a regular expression.
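As a concrete illustration of such a regular-expression check, the following Python snippet scans a system's output for typical run-time errors. The pattern is made up for this sketch; an actual CASCONFIG would declare a pattern tailored to its system's error messages.

```python
import re

# Hypothetical error pattern a CASCONFIG might declare for its system's output.
ERROR_PATTERN = re.compile(
    r"(out of memory|segmentation (fault|violation)|syntax error)", re.IGNORECASE)

def check_runtime_errors(output):
    """Return the matched error message, or None if the output looks clean."""
    m = ERROR_PATTERN.search(output)
    return m.group(0) if m else None

print(check_runtime_errors("result: [x^2+1]; done."))        # None
print(check_runtime_errors("error: Out of memory in line 3"))  # Out of memory
```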

Based on the input file and shell command returned by the CASCONFIG routines, the actual run of the computation is fully controlled by the routines of the Compute module. For reliability reasons, timings are measured externally, based on the GNU time program. While the actual computation is running, the symbolicdata program ``sleeps'' until either the computation finishes or the maximal (user plus system) time allowed for a computation expires. In the latter case, the running computation is unconditionally interrupted (killed), so that a subsequent evaluation of the computation recognizes a ``maxtime violation''. Furthermore, if a run of the computation takes less than a required minimal (user plus system) time, the computation is repeated until the sum of the times of all runs exceeds this bound, and the reported time is then averaged over all runs. Notice that the measured computation times include the time a system needs for start-up, input parsing, and output of the result. While one could argue that these operations do not really contribute to the time of the actual computation, we have, at least for the time being, chosen not to separate out these timings.
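The maxtime/mintime rules can be modeled as follows. This is a simplified sketch: in the real environment the times are measured externally by GNU time and a maxtime violation kills a running process, whereas here a single run is abstracted to a function returning its (user plus system) time.

```python
def timed_benchmark(run_once, mintime, maxtime):
    """Repeat a computation until the accumulated time exceeds `mintime`,
    then report the averaged time; flag a maxtime violation if a single
    run exceeds `maxtime`.  `run_once` returns the (user+system) time of
    one run, standing in for an external GNU time measurement."""
    total, runs = 0.0, 0
    while total < mintime:
        t = run_once()
        if t > maxtime:
            return {"status": "maxtime violation", "time": None}
        total += t
        runs += 1
    return {"status": "ok", "time": total / runs}

# A fast computation (0.3 s per run) is repeated until 1 s is exceeded,
# and its time is averaged over the runs:
fast = timed_benchmark(lambda: 0.3, mintime=1.0, maxtime=3600.0)
# A computation exceeding the maximal time is interrupted:
slow = timed_benchmark(lambda: 7200.0, mintime=1.0, maxtime=3600.0)
```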

The information about a particular benchmark computation is collected in a record of type COMPREPORT, which stores all input parameters and results of the computation, i.e., the error and verification status, timings, output, etc. Where applicable and requested, records of the COMPRESULT table are used to collect system-independent, verified, and ``trusted'' results of computations. These COMPRESULT records may be extracted from one or more COMPREPORTs and may be used for further verifications and computations of invariants.
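The relation between COMPREPORT and COMPRESULT records might be pictured as follows. The field names and data are invented for this schematic; the actual record formats are documented in the on-line documentation.

```python
# Schematic: extract a system-independent, trusted COMPRESULT from several
# COMPREPORT records of the same example.  Field names are illustrative.

reports = [
    {"example": "Ex1", "system": "SystemA", "status": "verified",
     "result": "[x^2+1]", "time": 0.42},
    {"example": "Ex1", "system": "SystemB", "status": "verified",
     "result": "[x^2+1]", "time": 0.57},
    {"example": "Ex1", "system": "SystemC", "status": "error",
     "result": None, "time": None},
]

def extract_compresult(reports):
    """Keep a result only if all verified reports agree on it."""
    verified = [r["result"] for r in reports if r["status"] == "verified"]
    if verified and all(v == verified[0] for v in verified):
        return {"example": reports[0]["example"],
                "result": verified[0], "trusted": True}
    return None

compresult = extract_compresult(reports)
```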

Running automated benchmark computations may quickly produce voluminous amounts of output data. Hence, we need mechanisms which effectively maintain and evaluate these data:

First, note that this is a classical data base application. We are in the process of developing tools to translate benchmark data into SQL and to store them in a classical data base. However, even as a data base application, the management of benchmark data remains rather challenging, since benchmark data combine records, software, machines, algorithms, implementations, etc. into a high-dimensional ``state space'' which needs to be analyzed.
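As a toy illustration of the data base idea, benchmark reports map naturally onto a relational table. The schema below is invented for this sketch and does not reproduce the project's actual SQL translation; it merely shows one slice through the ``state space'', namely the fastest system per example.

```python
import sqlite3

# Invented schema for storing benchmark reports relationally.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE compreport (
        example   TEXT,
        system    TEXT,
        machine   TEXT,
        status    TEXT,
        time_sec  REAL
    )""")
rows = [
    ("Ex1", "SystemA", "hostA", "ok", 0.42),
    ("Ex1", "SystemB", "hostA", "ok", 0.57),
    ("Ex2", "SystemA", "hostA", "maxtime violation", None),
]
conn.executemany("INSERT INTO compreport VALUES (?, ?, ?, ?, ?)", rows)

# One slice through the state space: the fastest successful system per example.
fastest = conn.execute("""
    SELECT example, system, MIN(time_sec) FROM compreport
    WHERE status = 'ok' GROUP BY example""").fetchall()
```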

Second, note that tools to analyze benchmark data are not enough by themselves. To compare benchmark runs effectively, we need standardized and widely accepted concepts and methods to statistically evaluate these data under various aspects. The EvalComputation module provides a first attempt at a solution. Since a detailed discussion of the aspects involved would go beyond the scope (and frame) of this paper, we refer to the on-line documentation [9] as a starting point for further thoughts and discussions.
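One standard ingredient of such statistical evaluations is comparing the timings of two systems across a suite of examples, for instance via the geometric mean of per-example time ratios. The sketch below illustrates that statistic with made-up numbers; it does not reproduce the actual methods of the EvalComputation module.

```python
import math

# Times (in seconds) of two systems on the same examples; numbers are made up.
times_a = [0.4, 2.0, 10.0]
times_b = [0.5, 1.5, 12.0]

def geometric_mean_ratio(a, b):
    """Geometric mean of per-example time ratios a_i / b_i; values below 1
    mean system A is faster on average in this multiplicative sense."""
    logs = [math.log(x / y) for x, y in zip(a, b)]
    return math.exp(sum(logs) / len(logs))

ratio = geometric_mean_ratio(times_a, times_b)
```

The geometric mean is preferred over the arithmetic mean here because it treats a 2x speed-up and a 2x slow-down symmetrically, so no single large example dominates the comparison.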
