mirror of
https://github.com/c64scene-ar/llvm-6502.git
synced 2025-01-16 14:31:59 +00:00
0df5f9fb7e
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12875 91177308-0d34-0410-b5e6-96231b3b80d8
820 lines
25 KiB
Plaintext
820 lines
25 KiB
Plaintext
.ll 6i
|
|
.nr PO 1.15i
|
|
.nr HM 0i
|
|
.nr FM 1.05i
|
|
.TL
|
|
SPIFF -- A Program for Making Controlled Approximate Comparisons of Files
|
|
.AU
|
|
Daniel Nachbar
|
|
.AI
|
|
Software Engineering Research Group
|
|
Bell Communications Research
|
|
Morristown, New Jersey
|
|
.AB
|
|
The well known program
|
|
.B
|
|
diff
|
|
.R
|
|
[1]
|
|
is inappropriate for some
|
|
common tasks such as comparing the output of floating
|
|
point calculations where roundoff errors
|
|
lead
|
|
.B
|
|
diff
|
|
.R
|
|
astray and comparing program source code
|
|
where some differences in the text (such as white space and comments)
|
|
have no effect on the operation of the compiled code. A new program,
|
|
named
|
|
.B
|
|
spiff,
|
|
.R
|
|
addresses these and other similar cases
|
|
by lexical parsing of the input files and then applying
|
|
a differencing algorithm to the token sequences.
|
|
.B
|
|
Spiff
|
|
.R
|
|
ignores differences
|
|
between floating point numbers that are below a user settable tolerance.
|
|
Other features include user settable commenting and literal string
|
|
conventions and a choice of differencing algorithm.
|
|
There is also an interactive mode wherein the input texts are displayed
|
|
with differences highlighted. The user can change numeric tolerances
|
|
"on the fly" and
|
|
.B
|
|
spiff
|
|
.R
|
|
will adjust the highlighting accordingly.
|
|
.AE
|
|
.SH
|
|
Some Troubles With Diff
|
|
.PP
|
|
Over the past several years, it has been fairly easy to tell when
|
|
a new type of computer arrived at a nearby computer center.
|
|
The best clue was the discordant chorus of
|
|
groaning, sighing, gnashing of teeth, pounding of foreheads on desks,
|
|
and other sounds of distress. Tracing these noises to their source, one
|
|
would find some poor soul in the process of installing
|
|
a numerical analysis package on the new machine.
|
|
.PP
|
|
One might expect that "moving up" to a new machine
|
|
would be a cause for celebration.
|
|
After all, new machines are typically bigger, faster,
|
|
and better than old machines.
|
|
However, the floating point arithmetic on any new machine is frequently
|
|
slightly different from any old machine.
|
|
As a consequence,
|
|
software package test routines produce output that is slightly different,
|
|
but still correct, on the new machines.
|
|
Serious troubles appear when the person installing the software package
|
|
attempts to compare the test output files from two different machines
|
|
by using a difference finding program such as
|
|
.B
|
|
diff.
|
|
.R
|
|
Programs such as
|
|
.B
|
|
diff
|
|
.R
|
|
do a character by character comparison.
|
|
.B
|
|
Diff
|
|
.R
|
|
finds a great many differences, most of which
|
|
are due to roundoff errors in the least significant digits of floating point
|
|
numbers. Others are the result of differences in the way
|
|
in which the two test runs
|
|
had printed a number (3.4e-1 vs. 0.34).
|
|
In one case, the test suite for the S statistical analysis package[2],
|
|
over 1700 floating point numbers are produced
|
|
(per machine). In the eyes of
|
|
.B
|
|
diff,
|
|
.R
|
|
roughly 1200 of these numbers are different.
|
|
However, none of the "differences" are important ones.
|
|
Nonetheless, software installers wind up inspecting the output by eye.
|
|
.PP
|
|
A similar problem arises when one attempts to
|
|
look for differences between two versions
|
|
of the same C program.
|
|
.B
|
|
Diff
|
|
.R
|
|
reports many differences that are not of interest. In
|
|
particular, white space (except inside quotation marks) and
|
|
anything inside a comment have no effect on the operation of the compiled
|
|
program and are usually not of interest.
|
|
.B
|
|
Diff
|
|
.R
|
|
does have a mode of operation where white space
|
|
within a line (spaces and tabs) can be ignored.
|
|
However, differences in the placement of newlines cannot be ignored.
|
|
This is particularly annoying since C programming
|
|
styles differ on whether to place a newline character before or after the '{'
|
|
characters that start blocks.
|
|
.SH
|
|
The Problem in General Terms
|
|
.PP
|
|
As already mentioned, programs such as
|
|
.B
|
|
diff
|
|
.R
|
|
do
|
|
a character-by-character comparison of the input files.
|
|
However, when it comes to interpreting
|
|
the contents of a file (either by a human or by a program)
|
|
it is almost never the case that characters
|
|
are treated individually. Rather, characters make up tokens such
|
|
as words and numbers, or act as separators between these tokens.
|
|
When comparing files, one is usually looking for
|
|
differences between these tokens, not the characters that make them up
|
|
or the characters that separate them.
|
|
.PP
|
|
What is needed is a program that first parses the input files
|
|
into tokens, and then applies a differencing algorithm to the token
|
|
sequences.
|
|
In addition to finding differences in terms of tokens,
|
|
it is possible to interpret the tokens and
|
|
compare different types of tokens in different ways. Numbers, for example,
|
|
can differ by a lot or a little.\**
|
|
.FS
|
|
Current differencing programs do not have such a notion because
|
|
the difference between two characters is a binary function.
|
|
Two characters are the same or they are not.
|
|
.FE
|
|
It is possible to use a tolerance when comparing two number tokens and
|
|
report only those differences that exceed the tolerance.
|
|
.SH
|
|
Design Issues
|
|
.PP
|
|
A serious design issue for such a program is how
|
|
complex to make the parse. The
|
|
.I
|
|
deeper
|
|
.R
|
|
one goes in the parsing the larger
|
|
the unit of text that can be manipulated. For instance, if one is looking
|
|
for differences in C code, a complete parse tree can be produced and
|
|
the differencing algorithm could examine insertion and deletion of entire
|
|
branches of the tree. However, deep parsing requires much more
|
|
complex parsing and slower differencing algorithms.
|
|
.PP
|
|
Another design issue is deciding how to interpret the tokens.
|
|
Closer interpretation may lead to greater flexibility in comparing tokens, but
|
|
also results in a more cumbersome and error-prone implementation.
|
|
.PP
|
|
In the program described here, we attempt to keep both the depth
|
|
of the parse and the semantics of the tokens to a minimum.
|
|
The parse is a simple
|
|
lexical parse with the input files broken up into one dimensional
|
|
sequences of numbers, literal strings and white space.
|
|
Literal strings and white space are not interpreted. Numbers
|
|
are treated as representing points on the real number line.
|
|
.SH
|
|
Default Operation
|
|
.PP
|
|
.B
|
|
Spiff\**
|
|
.R
|
|
.FS
|
|
We picked the name as a way to pay a small tribute to that famous intergalactic
|
|
adventurer Spaceman Spiff[3].
|
|
.B
|
|
Spiff
|
|
.R
|
|
is also a contraction of "spiffy diff".
|
|
.FE
|
|
works very much like
|
|
.B
|
|
diff.
|
|
.R
|
|
It reads two files, looks
|
|
for differences, and prints a listing of the
|
|
differences in the form of
|
|
an edit script.\**
|
|
.FS
|
|
An edit script is a sequence of insertions and deletions
|
|
that will transform the first file into the second.
|
|
.FE
|
|
As already suggested,
|
|
.B
|
|
spiff
|
|
.R
|
|
parses the files into
|
|
literal strings and real numbers.
|
|
The definition of these tokens can be altered somewhat by the user
|
|
(more on this later). For now, suffice it
|
|
to say that literals are strings like "cow", "sit",
|
|
"into", etc. Real numbers look like "1.3", "1.6e-4" and so on.
|
|
All of the common formats for real numbers are recognized.
|
|
The only requirements for a string to be
|
|
treated as a real number is the presence
|
|
of a period and at least one digit.
|
|
By default, a string of digits without a decimal point
|
|
(such as "1988") is not considered to be a real number,
|
|
but rather a literal string.\**
|
|
Each non-alphanumeric character (such as #$@^&*)
|
|
is parsed into a separate literal token.
|
|
.FS
|
|
Integer numbers are often used as indices, labels, and so on.
|
|
Under these circumstances, it is more appropriate to treat them as literals.
|
|
Our choice of default was driven by a design goal
|
|
of having
|
|
.B
|
|
spiff
|
|
.R
|
|
be very conservative
|
|
when choosing to ignore differences.
|
|
.FE
|
|
.PP
|
|
Once
|
|
.B
|
|
spiff
|
|
.R
|
|
determines the two sequences of tokens,
|
|
it compares members of the first sequence with
|
|
members of the second sequence.
|
|
If two tokens are of different types,
|
|
.B
|
|
spiff
|
|
.R
|
|
deems them to be different, regardless of their content.
|
|
If both tokens are literal tokens,
|
|
.B
|
|
spiff
|
|
.R
|
|
will deem them
|
|
to be different if any of their characters differ.
|
|
When comparing two real numbers,
|
|
.B
|
|
spiff
|
|
.R
|
|
will deem them to be different only if
|
|
the difference in their values exceeds a user settable tolerance.
|
|
.SH
|
|
Altering Spiff's Operation
|
|
.PP
|
|
To make
|
|
.B
|
|
spiff
|
|
.R
|
|
more generally useful, the user can control:
|
|
.IP \(bu
|
|
how text strings are parsed into tokens
|
|
.IP \(bu
|
|
how tokens of the same type are compared
|
|
.IP \(bu
|
|
the choice of differencing algorithm used
|
|
.IP \(bu
|
|
and the granularity of edit considered by the differencing algorithm.
|
|
.LP
|
|
.PP
|
|
These features are described next.
|
|
.SH
|
|
Altering the Parse
|
|
.PP
|
|
The operation of the parser can be altered in several ways.
|
|
The user can specify that delimited sections of text are to be ignored
|
|
completely. This is useful for selectively ignoring the contents of
|
|
comments in programs. Similarly, the user can specify that
|
|
delimited sections of text (including white space)
|
|
be treated as a single literal token. So, literal strings in program
|
|
text can be treated appropriately.
|
|
Multiple sets of
|
|
delimiters may be specified at once (to handle cases such as the
|
|
Modula-2 programming language
|
|
where there are two ways to specify quoted strings). At present,
|
|
the delimiters must be fixed string (possibly restricted to the
|
|
beginning of the line) or end of line.
|
|
As a consequence of the mechanism for specifying literal strings,
|
|
multicharacter operators (such as the += operator in C)
|
|
can be parsed into a single token.
|
|
.PP
|
|
As yet, no provision is made for allowing delimiter
|
|
specification in terms of regular expressions. This omission
|
|
was made for the sake of simplifying the parser.
|
|
Nothing prevents the addition of regular expressions in the
|
|
future. However, the simple mechanism
|
|
already in place handles the literal string and commenting conventions
|
|
for most well known programming languages.\**
|
|
.FS
|
|
See the manual page in the appendix for examples of handling
|
|
C, Bourne Shell, Fortran, Lisp, Pascal, and Modula-2. The only
|
|
cases that are known not to work are comments in BASIC and
|
|
Hollerith strings in Fortran.
|
|
.FE
|
|
.PP
|
|
In addition to controlling literal string and comments, the user
|
|
may also specify whether to treat white space characters as any other
|
|
non-alphanumeric character (in other words, parse each white space
|
|
character into its own literal token),
|
|
whether to parse sign markers as part
|
|
of the number that they precede or as separate tokens, whether
|
|
to treat numbers without printed decimal markers (e.g. "1988")
|
|
as real numbers rather than as literal strings, and whether
|
|
to parse real numbers into literal tokens.
|
|
.SH
|
|
Altering the Comparison of Individual Tokens
|
|
.PP
|
|
As mentioned earlier, the user can set a tolerance below which differences
|
|
between real numbers are ignored.
|
|
.B
|
|
Spiff
|
|
.R
|
|
allows two kinds of tolerances:
|
|
absolute and relative.
|
|
Specifying an absolute tolerance will cause
|
|
.B
|
|
spiff
|
|
.R
|
|
to ignore differences
|
|
that are less than the specified value.
|
|
For instance, specifying an absolute tolerance of 0.01 will
|
|
cause only those differences greater than or equal to 0.01 to be reported.
|
|
Specifying a relative tolerance will cause
|
|
.B
|
|
spiff
|
|
.R
|
|
to ignore differences that are
|
|
smaller than some fraction of the number of larger magnitude.
|
|
Specifically, the value of the tolerance is interpreted
|
|
as a fraction of the larger (in absolute terms)
|
|
of the two floating point numbers being compared.
|
|
For example,
|
|
specifying a relative tolerance of 0.1
|
|
will cause the two floating point numbers 1.0 and 0.91 to be deemed within
|
|
tolerance. The numbers 1.0 and 0.9 will be outside the tolerance.
|
|
Absolute and relative tolerances can be OR'ed together. In fact,
|
|
the most effective way to ignore differences that are due to roundoff errors
|
|
in floating point calculations is to use both
|
|
a relative tolerance (to handle limits in precision) as well as an absolute
|
|
tolerance (to handle cases when one number is zero and the other number is
|
|
almost zero).\**
|
|
.FS
|
|
All numbers differ from zero by 100% of their magnitude. Thus, to handle
|
|
numbers that are near zero, one would have to specify a relative tolerance
|
|
of 100% which would be unreasonably large when both numbers are non-zero.
|
|
.FE
|
|
In addition, the user can specify an infinite tolerance. This is useful
|
|
for checking the format of output while ignoring the actual numbers
|
|
produced.
|
|
.SH
|
|
Altering the Differencing Algorithm
|
|
.PP
|
|
By default,
|
|
.B
|
|
spiff
|
|
.R
|
|
produces a minimal edit sequence (using the Miller/Myers differencing algorithm[4])
|
|
that will convert the first file into the second.
|
|
However, a minimal edit sequences is not always desirable.
|
|
For example, for the following two tables of numbers:
|
|
.DS
|
|
0.1 0.2 0.3 0.2 0.3 0.4
|
|
0.4 0.5 0.6 0.5 0.6 0.7
|
|
.DE
|
|
a minimal edit sequence to convert the table on
|
|
the left into the table on the right be to
|
|
would delete the first number (0.1) and insert 0.7 at the end.\**
|
|
.FS
|
|
The problem of having the elements of tables become misaligned when
|
|
the differencing algorithm is trying
|
|
to find a minimal number of edits can be reduced somewhat
|
|
by retaining newlines and not using tolerances.
|
|
Unfortunately, it does not go away.
|
|
.FE
|
|
Such a result, while logically correct, does not provide a good picture
|
|
of the differences between the two files.
|
|
In general, for text with a very definite structure (such as tables),
|
|
we may not want to consider insertions and deletions at all, but
|
|
only one-to-one changes.\**
|
|
.FS
|
|
A "change" can be expressed as one deletion and one insertion at the same
|
|
point in the text.
|
|
.FE
|
|
So, rather than look for a minimal edit script, we
|
|
merely want to compare each token in the first file with
|
|
the corresponding token in the second file.
|
|
.PP
|
|
The user can choose which differencing algorithm to use
|
|
(the default Miller/Myers or
|
|
the alternative one-to-one comparison)
|
|
based upon what is known about the input files. In general,
|
|
files produced mechanically
|
|
(such the output from test suites) have a very regular structure
|
|
and the one-to-one comparison works surprisingly well.
|
|
For files created by humans, the Miller/Myers
|
|
algorithm is more appropriate.
|
|
There is nothing in
|
|
.B
|
|
spiff's
|
|
.R
|
|
internal design that limits
|
|
the number of differencing algorithms that it can run.
|
|
Other differencing algorithms,
|
|
in particular the one used in
|
|
.B
|
|
diff,
|
|
.R
|
|
will probably be added later.
|
|
.SH
|
|
Altering the Granularity of the Edit Sequence
|
|
.PP
|
|
By default,
|
|
.B
|
|
spiff
|
|
.R
|
|
produces an edit sequence
|
|
in terms of insertions and deletions of individual tokens.
|
|
At times it may be more useful to
|
|
treat the contents of the files as tokens when looking for differences
|
|
but
|
|
express the edit script in terms of entire lines of the files rather
|
|
than individual tokens.\**
|
|
.FS
|
|
For instance, if one wants to have
|
|
.B
|
|
spiff
|
|
.R
|
|
produce output that can be fed into
|
|
the
|
|
.B
|
|
ed
|
|
.R
|
|
editor.
|
|
.FE
|
|
.B
|
|
Spiff
|
|
.R
|
|
provides a facility for restricting the edits to entire lines.
|
|
.SH
|
|
Treating Parts of the Files Differently
|
|
.PP
|
|
For complex input files, it is important that different parts of the
|
|
file be treated in different ways. In other words, it may be impossible
|
|
to find one set of parsing/differencing rules that work well for the
|
|
entire file.
|
|
.B
|
|
Spiff
|
|
.R
|
|
can differentiate between parts of the input files on two bases:
|
|
within a line and between lines.
|
|
Within a line, a different tolerance can be applied to each real number.
|
|
The tolerances are specified in terms of the ordinal position of the
|
|
numbers on the line (i.e. one tolerance is applied to the first real number
|
|
on each line, a different tolerance is applied to the second number on
|
|
each line, a third tolerance is applied to the third, and so on). If more
|
|
numbers appear on a line than there are tolerances specified, the last
|
|
tolerance is applied to all subsequent numbers on the line (i.e., if the user
|
|
specifies three tolerances, the third is applied to the third, fourth
|
|
fifth, . . . number on each line). This feature is useful for applying
|
|
different tolerances to the different columns of a table of numbers.
|
|
.PP
|
|
Between lines, the user can place "embedded commands" in the input files.
|
|
These commands
|
|
are instructions to parser that can change what tolerances are attached
|
|
to real numbers and the commenting and literal string conventions used by the
|
|
parser. Embedded commands are flagged to the parser
|
|
by starting the line with a user-specified
|
|
escape string. By combining within line and between line differentiation,
|
|
it is possible for the user to specify a different tolerance
|
|
for every single real number in the input files.
|
|
.SH
|
|
Visual Mode
|
|
.PP
|
|
So far,
|
|
.B
|
|
spiff's
|
|
.R
|
|
operation as an intelligent filter has been described.
|
|
.B
|
|
Spiff
|
|
.R
|
|
also has an interactive mode.
|
|
When operating in interactive mode,
|
|
.B
|
|
spiff
|
|
.R
|
|
places corresponding sections of the input files
|
|
side by side on user's screen.\**
|
|
.FS
|
|
Although the current implementation of
|
|
.B
|
|
spiff
|
|
.R
|
|
runs in many environments,
|
|
interactive mode works only under the MGR window manager.[5]
|
|
Other graphics interfaces will probably be added over time.
|
|
.FE
|
|
Tokens are compared using a one-to-one ordinal comparison, and any tokens that
|
|
are found to be different are highlighted in reverse video.
|
|
The user can interactively change the tolerances and
|
|
.B
|
|
spiff
|
|
.R
|
|
will alter the display
|
|
to reflect which real numbers exceed the new tolerances.
|
|
Other commands allow the user to page through the file and exit.
|
|
.SH
|
|
Performance
|
|
.PP
|
|
Two components of
|
|
.B
|
|
spiff,
|
|
.R
|
|
the parser and the differencing algorithm,
|
|
account for most of the execution time. Miller and Myers compare their
|
|
algorithm to the one used in the diff program. To restate their results,
|
|
the Miller/Myers algorithm is faster for files
|
|
that have relatively few differences but much
|
|
slower (quadratic time) for files with a great many differences.
|
|
.PP
|
|
For cases where the files do not differ greatly,
|
|
parsing the input files takes most of the time (around 80% of the total).\**
|
|
.FS
|
|
No effort has yet been made to make the parser run more quickly.
|
|
A faster parser could no doubt be written by generating a special state machine.
|
|
.FE
|
|
The performance of the parser is roughly similar to programs that do a similar
|
|
level of parsing (i.e. programs that must examine each character in the file).
|
|
For files where roughly half of the tokens are real numbers,
|
|
.B
|
|
spiff
|
|
.R
|
|
takes about twice as long to parse the input files
|
|
as an
|
|
.B
|
|
awk
|
|
.R
|
|
program that counts the number of words in a file:\**
|
|
.FS
|
|
For
|
|
.B
|
|
awk,
|
|
.R
|
|
a word is any string separated by white space.
|
|
.FE
|
|
.B
|
|
.DS
|
|
awk '{total += NF}' firstfile secondfile
|
|
.DE
|
|
.R
|
|
.PP
|
|
The time that it takes
|
|
.B
|
|
spiff
|
|
.R
|
|
to parse a file is substantially
|
|
increased if scanning is done for comments
|
|
and delimited literal strings. The precise effect depends upon the length of
|
|
the delimiters, whether they are restricted to appear at beginning of line, and
|
|
the frequency with which literals and comments appear in the input files.
|
|
As an example, adding the 12 literal conventions\**
|
|
.FS
|
|
One literal convention is for C literal strings. The rest enumerate multicharacter
|
|
operators.
|
|
.FE
|
|
and 1 commenting convention
|
|
required for C code roughly doubles the time required to parse input files.\**
|
|
.FS
|
|
So in total, it takes
|
|
.B
|
|
spiff
|
|
.R
|
|
about 4 times longer to parse a C program than it takes
|
|
.B
|
|
awk
|
|
.R
|
|
to count the number of words in the same file.
|
|
.FE
|
|
.PP
|
|
A more complete approach to evaluating
|
|
.B
|
|
spiff's
|
|
.R
|
|
performance must measure the total time that it takes for the user to complete a
|
|
differencing task. For example, consider one of the
|
|
test suites for the S statistical
|
|
analysis package mentioned at the beginning of this paper.
|
|
The output file for each machine is 427 lines long and contains
|
|
1090 floating point numbers. It takes
|
|
.B
|
|
diff
|
|
.R
|
|
approximately 2 seconds on one of our "6 MIPS"\** computers
|
|
.FS
|
|
We will not comment on the usefulness of "MIPS" as a measure
|
|
of computing speed. The numbers provided are only intended to
|
|
give the reader some vague idea of how fast these programs run.
|
|
.FE
|
|
to compare the two files and produce
|
|
an edit script that is 548 lines long containing 1003 "differences"
|
|
in the floating point numbers. It takes the average tester
|
|
5 minutes to print out the edit script and roughly 2 hours to examine
|
|
the output by hand to determine that the machines are, in fact,
|
|
both giving nearly identical answers. The total time needed is
|
|
2 hours 5 minutes and 2 seconds.
|
|
.PP
|
|
In contrast, it takes
|
|
.B
|
|
spiff
|
|
.R
|
|
approximately 6 seconds on one of our "6 MIPS" computers to
|
|
produce an output file that is 4 lines long.\**
|
|
.FS
|
|
The output would be zero length except that the output of the
|
|
.B
|
|
time
|
|
.R
|
|
command is built into the S tests.
|
|
The timing information could easily be ignored using
|
|
.B
|
|
spiff's
|
|
.R
|
|
embedded commands. But, as we shall see, it hardly seems worth the trouble.
|
|
.FE
|
|
It takes the average tester 30 seconds to examine
|
|
.B
|
|
spiff's
|
|
.R
|
|
output. The total for
|
|
.B
|
|
spiff
|
|
.R
|
|
is 36 seconds. Therefore for this case,
|
|
.B
|
|
spiff
|
|
.R
|
|
will get the job done roughly 208.88 times faster than
|
|
.B
|
|
diff.
|
|
.R
|
|
.PP
|
|
In general, it is misleading to compare
|
|
.B
|
|
spiff's
|
|
.R
|
|
speed with that of
|
|
.B
|
|
diff.
|
|
.R
|
|
While both programs are looking for differences between files,
|
|
they operate on very different types of data (tokens vs. bytes).
|
|
An analogous comparison could be made between the speed of an assembler
|
|
and the speed of a C compiler. They are both language translators.
|
|
One runs much faster than the other.
|
|
None the less, most programmers use the slower program
|
|
whenever possible.
|
|
.SH
|
|
Using Spiff For Making Regression Tests Of Software
|
|
.PP
|
|
We envision
|
|
.B
|
|
spiff
|
|
.R
|
|
to be the first of several tools for aiding in the now
|
|
arduous task of making regression tests.\**
|
|
.FS
|
|
In software engineering parlance, a "regression test" is the process by
|
|
which a tester checks to make sure that the new version of a piece of
|
|
software still performs the same way as the older versions
|
|
on overlapping tasks.
|
|
.FE
|
|
Given
|
|
.B
|
|
spiff's
|
|
.R
|
|
current capabilities, the regression test designer can
|
|
take the output of an older version of software and through
|
|
the use of literal string and commenting conventions,
|
|
specify what parts of the output must remain identical and
|
|
what sections can change completely. By specifying tolerances, the test
|
|
designer can take into account how much of a difference in floating
|
|
point calculations is acceptable.
|
|
.PP
|
|
The test designer is also free to
|
|
edit the output from the older version of the software and add embedded
|
|
commands that can instruct
|
|
.B
|
|
spiff
|
|
.R
|
|
to treat various parts of the output
|
|
differently. The newly edited output can then serve as a template for
|
|
the output of later versions of the software.
|
|
.PP
|
|
Obviously, editing output by hand is a very low level mechanism for adding
|
|
specification information. It is our intention that
|
|
.B
|
|
spiff
|
|
.R
|
|
will become
|
|
the last element in a pipeline of programs. Programs (as yet unwritten) located
|
|
earlier in the pipeline
|
|
can implement a higher level representation of the specification information.
|
|
They read in the old and new input files, add the appropriate embedded commands,
|
|
and then pass the results to
|
|
.B
|
|
spiff
|
|
.R
|
|
which will do the actual differencing.
|
|
.SH
|
|
Future Work
|
|
.PP
|
|
There are many features that could be added to
|
|
.B
|
|
spiff
|
|
.R
|
|
(if there are not
|
|
too many already). Some of these include:
|
|
.IP \(bu
|
|
Using separate differencing algorithms on separate sections of the file
|
|
and/or limiting the scope of an edit sequence (fencing)
|
|
.IP \(bu
|
|
Providing a more general mechanism for specifying comments and literals
|
|
(perhaps allowing specification in terms of regular expressions).
|
|
As yet, we have not encountered any important cases where regular expressions
|
|
have been needed. Until such a case is encountered, we will leave regular
|
|
expressions out in the name of simplicity.
|
|
.IP \(bu
|
|
Allowing for a more general specification of what lines should look like.
|
|
At present, the user can only specify tolerances for numbers as a function
|
|
of their ordinal position on a line. The difficulty in expanding the
|
|
specification abilities of
|
|
.B
|
|
spiff
|
|
.R
|
|
is knowing when to stop. In the extreme,
|
|
we might add all of the functionality of a program such as
|
|
.B
|
|
awk.\**
|
|
.R
|
|
.FS
|
|
Imagine handling the case such as
|
|
"apply this tolerance to all numbers that appear
|
|
on a line starting with the word `foo' but only if the number is between 1.9
|
|
and 3.6 and the word `bar' does not appear on the line".
|
|
.FE
|
|
We hope to keep
|
|
.B
|
|
spiff
|
|
.R
|
|
as simple as possible. Our first efforts in
|
|
this direction will try to implement higher level specification functions
|
|
outside of
|
|
.B
|
|
spiff.
|
|
.R
|
|
.SH
|
|
Acknowledgements
|
|
.PP
|
|
First and foremost, we thank Stu Feldman for his endless patience, constant encouragement
|
|
and numerous good ideas. We also extend thanks to Doug McIlroy for bringing the Miller/Myers
|
|
algorithm to our attention, Nat Howard for a key insight
|
|
and for his editorial comments
|
|
and Steve Uhler and Mike Bianchi for their editorial comments.
|
|
.SH
|
|
References
|
|
.IP [1]
|
|
Hunt,J.W. and M.D. McIlroy.
|
|
.I
|
|
An Algorithm For Differential File Comparisons,
|
|
.R
|
|
.B
|
|
Bell Labs Computer Science Technical Report,
|
|
.R
|
|
Number 41, 1975.
|
|
.IP [2]
|
|
Becker,R.A. and J.M. Chambers (1984).
|
|
.B
|
|
S \- An Interactive Environment For Data Analysis And
|
|
Graphics.
|
|
.R
|
|
Belmont, CA: Wadsworth Inc.
|
|
.IP [3]
|
|
Watterson, B. (1987).
|
|
.B
|
|
Calvin and Hobbes.
|
|
.R
|
|
New York: Andrews, McMeel & Parker.
|
|
.IP [4]
|
|
Miller, W. and E.W. Myers.
|
|
.I
|
|
A File Comparison Program,
|
|
.R
|
|
.B
|
|
Software \-
|
|
Practice and Experience
|
|
.R
|
|
15, 11, 1025-1040, 1985.
|
|
.IP [5]
|
|
Uhler, S.A.
|
|
.I
|
|
MGR -- A Window Manager For UNIX,
|
|
.R
|
|
Sun User's Group Meeting. September 1986.
|
|
.LP
|