gno/lib/libc/regex/GNO.notes
gdr 8590ffb646 These are Dave Tribby's changes that were necessary to make
sed (which uses these routines) work correctly.  See the file
GNO.notes for more details.
1997-11-17 04:13:08 +00:00


Notes on the porting of regex to GNO
Dave Tribby * November 7, 1997
Devin Reade did the initial conversion of the BSD sources to compile
under GNO 2.0.6 headers with ORCA/C on the Apple IIGS.
I completed the porting to the extent that the program sed (which
uses regex) works for many different test cases.
The most time-consuming aspect of the port was finding all the places
where long integers should be used instead of int or unsigned int.
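As an illustration of why this matters: ORCA/C uses a 16-bit int and a
32-bit long, so a value that BSD code keeps in an int can fail to fit.
The sketch below is not taken from the regex sources. The function
names are hypothetical, and int16_t stands in for a 16-bit int so the
effect can be seen with a modern compiler.

```c
#include <stdint.h>

/* Hypothetical illustration; not code from the regex sources. */

/* Distance between two offsets, stored the way a 16-bit int would hold it.
 * A result outside -32768..32767 cannot be represented, so the value that
 * comes back is wrong (the conversion is implementation-defined). */
int16_t span_as_int16(long start, long end)
{
    return (int16_t)(end - start);
}

/* The same distance kept in a long, which is at least 32 bits wide. */
long span_as_long(long start, long end)
{
    return end - start;
}
```

With a span of 40000, span_as_long() returns 40000 while
span_as_int16() cannot, which is the kind of silent breakage that has
to be tracked down one declaration at a time.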
DEBUGGING
The file engine.c is used in an interesting way by regexec.c:
it includes engine.c *twice*, after muchos fiddling with the
macros that code uses. This lets the same code operate on two
different representations for state sets.
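The idea of compiling one body of code against two representations can
be sketched in a single file. Note the swapped-in technique: regexec.c
re-includes engine.c, whereas this sketch uses a function-generating
macro so that it fits in one block. All names here are hypothetical,
and the two representations (bits in an unsigned long, one char per
state) are assumptions chosen for illustration.

```c
/* Hypothetical names throughout; the real code re-includes engine.c
 * instead of using a generator macro like this one. */
#define DEFINE_COUNT_SET(NAME, STATES_T, ISSET)   \
    int NAME(STATES_T set, int nstates)           \
    {                                             \
        int i, n = 0;                             \
        for (i = 0; i < nstates; i++)             \
            if (ISSET(set, i))                    \
                n++;                              \
        return n;                                 \
    }

/* Representation 1: the whole state set fits in the bits of one long.
 * nstates must be smaller than the number of bits in unsigned long. */
#define SMALL_ISSET(set, i) ((((set) >> (i)) & 1UL) != 0)
DEFINE_COUNT_SET(small_count, unsigned long, SMALL_ISSET)

/* Representation 2: one char per state, for sets too big for a long. */
#define LARGE_ISSET(set, i) ((set)[i] != 0)
DEFINE_COUNT_SET(large_count, const char *, LARGE_ISSET)
```

The loop is written once, yet small_count() and large_count() are
separate functions, each compiled for its own state-set type.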
When it was necessary to do source-level debugging, Splat! got
very confused about line numbers until I added the following at
the beginning of the file:
    #ifdef __ORCAC__
    #line 2 "engine.c"
    #endif
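The effect of a #line directive can be checked with the standard
__LINE__ and __FILE__ macros. This is a minimal sketch with
hypothetical function names. The line number 100 is arbitrary, and the
block is not part of the regex sources.

```c
/* Hypothetical illustration of the standard #line directive. */

/* Returns the line number the compiler believes the return statement is on. */
int reported_line(void)
{
#line 100 "engine.c"
    return __LINE__;    /* the directive numbers this source line 100 */
}

/* Returns the file name the compiler now reports for this code. */
const char *reported_file(void)
{
    return __FILE__;    /* "engine.c", as set by the directive above */
}
```

A debugger reads the same line and file information, which is why
resetting it at the top of an included file can help it stay in step
with the source.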
The regex code had a compilation macro, REDEBUG, that would
turn on diagnostic messages when defined. I added additional
output statements that are turned on by REDEBUG.
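Diagnostics of this kind are commonly written so that they vanish when
the macro is not defined. In the sketch below only the name REDEBUG
comes from the regex code. The TRACE macro and count_char() are
hypothetical and do not reproduce the statements that were added.

```c
#include <stdio.h>

/* TRACE is a hypothetical helper, not a macro from the regex sources. */
#ifdef REDEBUG
#define TRACE(label, value) \
    fprintf(stderr, "regex debug: %s = %ld\n", (label), (long)(value))
#else
#define TRACE(label, value) ((void)0)   /* compiles to nothing */
#endif

/* Count occurrences of c in s, reporting the result when REDEBUG is set. */
long count_char(const char *s, char c)
{
    long n = 0;

    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    TRACE("count", n);
    return n;
}
```

Compiling with REDEBUG defined enables the message on stderr. Without
it, the function behaves identically and carries no output code.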
PERFORMANCE ENHANCEMENT
The program sed took a long time to execute a complicated sed
program (provided with the BSD source code) that evaluates
expressions. I attempted to speed up the regex routines by
recoding two routines in assembly language (using the asm {}
construct available in ORCA/C). The two routines are
isinsets() and samesets() in regcomp.c. The C code was left
in place and can be turned on by compiling with the macro
__NOASM__ set.
I also noticed an instance in regcomp.c where many fields of
a structure were individually set to 0 at initialization. I
recoded to set all the fields to 0 by a call to memset() and
then set only the non-zero fields individually.
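The pattern looks roughly like the sketch below. The structure and its
field names are hypothetical and are not the structure in regcomp.c.

```c
#include <string.h>

/* Hypothetical structure; the real one in regcomp.c has different fields. */
struct parse_state {
    const char *next;       /* next character to examine */
    const char *end;        /* one past the last character */
    int         error;      /* 0 means no error so far */
    long        ssize;      /* sizes and counters that all start at 0 */
    long        slen;
    long        ncsalloc;
    long        nsub;
};

void init_parse_state(struct parse_state *p, const char *pattern)
{
    /* One call clears every field. Integer fields become 0. Pointer
     * fields become all-bits-zero, which is a null pointer on common
     * platforms, although the C standard does not guarantee that. */
    memset(p, 0, sizeof *p);

    /* Only the fields with non-zero initial values are set one by one. */
    p->next = pattern;
    p->end  = pattern + strlen(pattern);
}
```

The saving comes from replacing a run of separate stores with one
library call, which matters more the longer the list of zeroed fields.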
The following script was used to time the results:
    cd /src/gno/usr.bin/sed
    echo '(4+4)*3' | time ./sed -f tests/math.sed
The unmodified code took 51 seconds to run on my 8MHz Apple IIGS.
It took 49 seconds after I recoded isinsets() in asm, and 48 seconds
after I recoded samesets(). Initializing fields via memset() brought
it down to 47 seconds.
I saw no other obvious candidates for recoding. The modest results
from these changes (4 seconds out of 51, about 8%, even though they
were made in routines that ranked high in the profile) did not
warrant further efforts.
BUILDING
Because I did not rebuild all of libc, I created a library called
regex that only includes the regex routines. The commands I used
to build regex are included in the file make.cmds. I will leave it
to Devin to incorporate regex into the full libc build structure.