SYMBOLICDATA's `Compute` environment sets out to realize the following
three goals:

- To facilitate automated and trusted benchmark computations, that is, benchmark computations whose results w.r.t. time and correctness are repeatable, comparable, and trusted by the community.
- To serve as a test-bed for developers, that is, as a tool with which developers of Computer Algebra software can conveniently and reliably evaluate new algorithms and implementation techniques.
- To provide a repository of computational results which can be used for further work, such as computing invariants of the original example, verifying the correctness of other computations, or comparing their timings.

In this section, we present the main principles behind the realization of these ambitious goals. See [9] for details, further explanations, examples, and complete on-line documentation.

Analyzing the general nature of benchmark computations reveals
dependencies on the following parameters:

- **Example:** The example to be computed, i.e., an sd-record which provides the object of the computation.
- **COMP:** The actual computation to be performed, i.e., an sd-record of type `COMP` which describes the computation and serves as an interface to (Perl) routines which examine an example for its suitability for this computation and, where applicable, check the syntactical and semantical correctness of the result of the computation (see the sketch after this list).
- **CASCONFIG:** A configuration of a Computer Algebra software which realizes the computation, i.e., an sd-record of type `CASCONFIG` which, on the one hand, identifies the software, its version, and its implemented benchmark capabilities and, on the other hand, serves as an interface to (Perl) routines which generate the input file and shell command to run the computation, which check the output of the computation for run-time errors such as out-of-memory conditions, segmentation violations, or syntax errors, and which, if necessary, perform (syntactic) transformations on the result such that it is suitable for further processing independently of the examined Computer Algebra software.
- **MACHINE:** A description of the computer used for the computation. Such an sd-record of type `MACHINE` can be generated automatically by means of the action `symbolicdata ThisMachine` and can further be used to specify the executables of particular `CASCONFIG`s.
- **Dynamic parameters:** These include specifications of: intervals for the run-time of a computation; which error, resp. verification, checks should be performed on the result; and what to do with the output of the computation.
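
To give a flavor of these interfaces, the following Perl fragment sketches the two kinds of routines a `COMP` record refers to. All names and calling conventions here are hypothetical; the actual interface is documented in [9].

```perl
# A minimal, hypothetical sketch of the routines behind a COMP record:
# a suitability test and a result check.
package Comp::Hypothetical;
use strict;
use warnings;

# Examine an example (an sd-record, here passed as a hash reference)
# for suitability for this computation.
sub IsSuitable {
    my ($example) = @_;
    return defined $example->{vars} && defined $example->{basis};
}

# Check the syntactical and semantical correctness of a result;
# return an error message, or '' on success.
sub CheckResult {
    my ($example, $result) = @_;
    return 'empty result' unless defined $result && length($result);
    return '';
}

1;
```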

The benchmark computations of SYMBOLICDATA are facilitated by the Perl module
`Compute` and realized using

```
symbolicdata Compute [options] sd-file(s)
```

Parameter specifications are given either by command-line options or, often more suitably, by init-files (an example invocation is shown after the following list). A benchmark run consists of the following stages:

- Check of correctness and completeness of input parameters.
- Set-up of the computation.
- Run of the computation.
- Evaluation of the computation.
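
For illustration, a run on two sd-files might be launched as follows. The file names are purely hypothetical, and we deliberately leave the `[options]` placeholder as-is rather than guess at concrete option names; see [9] for the actual option syntax.

```
symbolicdata Compute [options] INTPS/Katsura_5.sd INTPS/Cyclic_6.sd
```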

The set-up and evaluation stages require communication between the
`Compute` module and the Perl routines specified by the input `COMP`
and `CASCONFIG` records. The given input and expected output of these
external routines are well-defined and documented. To ease the
addition of new computations and systems to the available benchmark
computations, as much functionality as possible is provided first by
the `Compute` module itself, second by the routines of the `COMP`
record, and third by the routines of the `CASCONFIG` record. For
example, the run-time error check specification of a `CASCONFIG` can
be as simple as specifying a regular expression, as the sketch below
illustrates.
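
The following Perl fragment is a minimal sketch of this idea; all names and conventions are hypothetical rather than the actual `CASCONFIG` interface.

```perl
use strict;
use warnings;

# Hypothetical sketch of a regex-based run-time error check: scan the
# raw output of the examined system for typical failure patterns and
# report the first match, or '' for a clean run.
sub CheckError {
    my ($output) = @_;
    if ($output =~ /(out of memory|segmentation (?:fault|violation)|syntax error)/i) {
        return $1;
    }
    return '';
}

# e.g., CheckError("halted: out of memory") returns 'out of memory'
```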

Based on the input file and shell command returned by the
`CASCONFIG` routines, the actual run of the computation itself is
fully controlled by the routines of the `Compute` module. For
reliability reasons, timings are measured externally by means of the
GNU `time` program. While the actual computation is running, the
`symbolicdata` program "sleeps" until either the computation has
finished or the maximal (user plus system) time allowed for a
computation has expired. In the latter case, the running computation
is unconditionally interrupted (killed) such that a subsequent
evaluation of the computation recognizes a "maxtime violation".
Furthermore, if a run of the computation takes less than a required
minimal (user plus system) time, the computation is repeated until the
sum of the times of all runs exceeds this bound, and the reported time
is then averaged; a sketch of this control loop is given after the
following list. Note that the measured computation times include
the times a system needs for start-up, input parsing, and output of
the result. While one could argue that these operations do not really
contribute to the time of the actual computation, we did not separate
out these timings (at least for the time being) for the following
reasons:

- Mechanisms which isolate the pure computation time and do not rely on a system's internal facilities to measure timings are cumbersome to implement and would very much complicate the control and set-up of benchmark computations.
- Time measurements for computations which are not dominated by the pure computation time are mostly meaningless anyway, since start-up takes constant time and I/O is usually linear w.r.t. the size of the input and output data.
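
The following Perl fragment is a much simplified sketch of the control loop described above. All names are illustrative, wall-clock time stands in for the (user plus system) time when enforcing the maximal time, and killing the `time` process abbreviates the actual interruption logic.

```perl
use strict;
use warnings;
use POSIX ':sys_wait_h';

# Run @$cmd under the external GNU time program, enforce $maxtime,
# and repeat runs shorter than $mintime; return the averaged time,
# or undef on a maxtime violation.
sub TimedRun {
    my ($cmd, $maxtime, $mintime) = @_;
    my ($total, $runs) = (0, 0);
    do {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # child: run the computation under GNU time, recording
            # user and system time in a file
            exec('time', '-o', 'times.out', '-f', '%U %S', @$cmd);
            die "exec failed: $!";
        }
        # parent "sleeps" until the run finishes or maxtime expires
        my $elapsed = 0;
        while (waitpid($pid, WNOHANG) == 0) {
            sleep 1;
            if (++$elapsed > $maxtime) {
                kill 'KILL', $pid;      # maxtime violation
                waitpid($pid, 0);
                return undef;
            }
        }
        # pick up the (user, system) times written by GNU time
        open my $fh, '<', 'times.out' or die "no timings: $!";
        my ($user, $sys) = split ' ', scalar <$fh>;
        $total += $user + $sys;
        $runs++;
    } until $total >= $mintime;         # repeat runs that are too short
    return $total / $runs;              # report the averaged time
}
```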

The information about a particular benchmark computation is collected
into a record of type `COMPREPORT` which stores all input
parameters and results, i.e., error and verification status, timings,
output, etc., of the computation. Where applicable and requested,
records of the `COMPRESULT` table are used to collect
system-independent, verified, and "trusted" results of computations.
These `COMPRESULT` records may be extracted from one or more
`COMPREPORT`s and may be used for further verifications and
computations of invariants.
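
To make the discussion more concrete, the following sketch flattens the contents of a hypothetical `COMPREPORT` into a Perl hash; the field names and values are illustrative only and do not reproduce the actual sd-record syntax.

```perl
# Hypothetical, illustrative field names only; not the actual
# sd-record syntax of a COMPREPORT.
my %compreport = (
    Example      => 'INTPS/Katsura_5',  # the computed example
    COMP         => 'GroebnerBasis',    # the performed computation
    CASCONFIG    => 'Singular',         # the system configuration used
    MACHINE      => 'my_host',          # where the run took place
    ErrorStatus  => 'ok',               # result of the run-time error check
    Verification => 'verified',         # result of the correctness check
    UserTime     => 1.42,               # averaged user time (seconds)
    SystemTime   => 0.03,               # averaged system time (seconds)
    Output       => 'result.txt',       # where the raw output was stored
);
```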

Running automated benchmark computations may quickly produce
voluminous amounts of output data. Hence, we need
mechanisms which effectively maintain and evaluate this data:

First, note that this is a classical data base application. We are in the process of developing tools to translate benchmark data to SQL and to store it in a classical data base. However, even as a data base application, the management of benchmark data remains rather challenging, since benchmark data combines records, software, machines, algorithms, implementations, etc. into a high-dimensional "state space" which needs to be analyzed.
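
As a very rough sketch of such a translation, one could store selected fields of a flattened `COMPREPORT` via the standard Perl DBI module; the SQLite backend and all table and column names below are hypothetical choices for illustration, not the tools under development.

```perl
use strict;
use warnings;
use DBI;

# %compreport as flattened in the previous sketch (repeated here in
# abbreviated form so that this fragment runs standalone)
my %compreport = (Example => 'INTPS/Katsura_5', CASCONFIG => 'Singular',
                  MACHINE => 'my_host', ErrorStatus => 'ok', UserTime => 1.42);

my $dbh = DBI->connect('dbi:SQLite:dbname=benchmarks.db', '', '',
                       { RaiseError => 1 });
$dbh->do(q{CREATE TABLE IF NOT EXISTS compreport
             (example TEXT, casconfig TEXT, machine TEXT,
              status TEXT, usertime REAL)});
$dbh->do('INSERT INTO compreport VALUES (?, ?, ?, ?, ?)', undef,
         @compreport{qw(Example CASCONFIG MACHINE ErrorStatus UserTime)});
$dbh->disconnect;
```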

Second, note that tools to analyze benchmark data are, by themselves,
not enough. To effectively compare benchmark runs we need standardized
and widely accepted concepts and methods for the statistical
evaluation of this data under various aspects. The `EvalComputation`
module provides a first attempt at a solution. Since a detailed
discussion of the aspects involved is beyond the scope of this paper,
we refer to `www.SymbolicData.org/doc/EvalComputations/` as a starting
point for further thoughts and discussions.