Notes on the porting of regex to GNO

Dave Tribby * November 7, 1997

Devin Reade did the initial conversion of the BSD sources to compile
under GNO 2.0.6 headers with ORCA/C on the Apple IIGS.

I completed the porting to the extent that the program sed (which
uses regex) works for many different test cases.

The most time-consuming aspect of the port was finding all the places
where long integers should be used instead of int or unsigned int.
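
To illustrate why this matters: ORCA/C's int is 16 bits wide, while the
BSD sources were written for machines where an int holds 32 bits. The
sketch below imitates the problem on a modern compiler, using int16_t and
int32_t as stand-ins for the IIGS int and long. The type and function
names are hypothetical and do not come from the regex sources.

```c
#include <stdint.h>

/* Stand-ins (assumption): iigs_int plays ORCA/C's 16-bit int,
 * iigs_long plays its 32-bit long. */
typedef int16_t iigs_int;
typedef int32_t iigs_long;

/* An offset squeezed into a 16-bit int. Once the true value passes 32767
 * the conversion cannot preserve it (the exact result is
 * implementation-defined; common compilers wrap it around). */
iigs_int offset_as_int(iigs_long start, iigs_long pos)
{
    return (iigs_int)(pos - start);
}

/* The same offset kept in a 32-bit long survives intact. */
iigs_long offset_as_long(iigs_long start, iigs_long pos)
{
    return pos - start;
}
```

Small offsets come out the same from both functions, which is why this
kind of bug only shows up on larger inputs.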

DEBUGGING

The file engine.c is used in an interesting way by regexec.c:
it includes engine.c *twice*, after muchos fiddling with the
macros that code uses. This lets the same code operate on two
different representations for state sets.
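
The idea of compiling one body of code against two representations can
be sketched in a single file. Note the substitution: regexec.c redefines
macros and then uses #include a second time, whereas this sketch uses a
function-generating macro so that it stays self-contained. All names
here are hypothetical, and the two representations (a bit mask and one
char per state) are chosen for illustration.

```c
/* One body of "engine" code, parameterised by a name prefix, a state-set
 * type, and a macro that tests whether state i is in the set. */
#define DEFINE_ENGINE(PREFIX, STATES, ISSET)            \
    int PREFIX##_count(STATES s, int nstates)           \
    {                                                   \
        int i, n = 0;                                   \
        for (i = 0; i < nstates; i++)                   \
            if (ISSET(s, i))                            \
                n++;                                    \
        return n;                                       \
    }

/* Representation 1: every state is one bit of an unsigned long. */
#define SMALL_ISSET(s, i)  ((((s) >> (i)) & 1UL) != 0)
DEFINE_ENGINE(small, unsigned long, SMALL_ISSET)

/* Representation 2: one char per state, for larger state counts. */
#define LARGE_ISSET(s, i)  ((s)[(i)] != 0)
DEFINE_ENGINE(large, const char *, LARGE_ISSET)
```

The loop is written once, yet the compiler produces small_count() and
large_count() as two separate functions, each specialised for its own
state-set type.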

When it was necessary to do source-level debugging, Splat! got
very confused about line numbers until I added the following at
the beginning of the file:
    #ifdef __ORCAC__
    #line 2 "engine.c"
    #endif
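
The effect of a #line directive can be observed on any C compiler
through the standard __FILE__ and __LINE__ macros. In this sketch the
line number 100 and the function names are made up for illustration;
only the file name matches the directive quoted above.

```c
/* After the directive, the compiler numbers the following source lines
 * 100, 101, ... and attributes them to "engine.c", whatever the real
 * file name and position are. */
#line 100 "engine.c"
const char *sketch_reported_file(void) { return __FILE__; }  /* counted as line 100 */
int sketch_reported_line(void) { return __LINE__; }          /* counted as line 101 */
```

A source-level debugger works from the same file and line information,
which is presumably why the directive helps when one source file is
compiled twice.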

The regex code had a compilation macro, REDEBUG, that would
turn on diagnostic messages when defined. I added additional
output statements that are turned on by REDEBUG.
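
A minimal sketch of diagnostics controlled by REDEBUG follows. The
DPRINT macro and the sketch_span() routine are hypothetical and only
show the pattern; they are not the statements added to the regex code.

```c
#include <stdio.h>

/* With REDEBUG defined, DPRINT writes a labelled value to stderr;
 * without it, DPRINT compiles to nothing. */
#ifdef REDEBUG
#define DPRINT(label, val) \
    fprintf(stderr, "regex: %s = %ld\n", (label), (long)(val))
#else
#define DPRINT(label, val)  ((void)0)
#endif

/* Hypothetical routine that reports an intermediate value when debugging. */
long sketch_span(long start, long end)
{
    long len = end - start;

    DPRINT("span length", len);
    return len;
}
```

Casting to long and printing with %ld keeps the output correct whether
int is 16 or 32 bits wide.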


PERFORMANCE ENHANCEMENT

The program sed took a long time to execute a complicated sed
program (provided with the BSD source code) that evaluates
expressions. I attempted to speed up the regex routines by
recoding two routines in assembly language (using the asm {}
construct available in ORCA/C). The two routines are
isinsets() and samesets() in regcomp.c. The C code was left
in place and can be turned on by compiling with the macro
__NOASM__ defined.
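
The arrangement of an assembly version with a C fallback might be
guarded as sketched below. This is an assumption about the layout, not
the actual regcomp.c code: the function is a simplified stand-in, the
assembly body is deliberately omitted, and the extra test for __ORCAC__
is added here only so the sketch compiles elsewhere.

```c
#include <stddef.h>

/* Simplified stand-in for a set lookup: returns 1 if any of the ncols
 * columns (each csetsize bytes long) has a non-zero entry for
 * character c, otherwise 0. */
int sketch_isinsets(const unsigned char *setbits, size_t ncols,
                    size_t csetsize, int c)
{
#if defined(__ORCAC__) && !defined(__NOASM__)
    /* An asm { ... } version for the IIGS would go here. */
#error "assembly version omitted from this sketch; define __NOASM__"
#else
    /* Portable C version, selected by defining __NOASM__. */
    const unsigned char *col = setbits;
    unsigned uc = (unsigned char)c;
    size_t i;

    for (i = 0; i < ncols; i++, col += csetsize)
        if (col[uc] != 0)
            return 1;
    return 0;
#endif
}
```

Keeping the C version alongside the assembly gives a reference to test
against and a way to build with a compiler that lacks asm {}.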

I also noticed an instance in regcomp.c where many fields of
a structure were individually set to 0 at initialization. I
recoded to set all the fields to 0 by a call to memset() and
then set only the non-zero fields individually.
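
The memset() approach looks roughly like this. The structure and its
field names are hypothetical, chosen only to show the pattern.

```c
#include <string.h>

/* Hypothetical structure with many fields that start out as zero. */
struct sketch_guts {
    int   magic;
    long  nstates;
    long  firststate;
    long  laststate;
    int   iflags;
    int   nbol;
    int   neol;
    char *must;
};

void sketch_init(struct sketch_guts *g, int magic)
{
    /* Clear every field with a single call... */
    memset(g, 0, sizeof *g);

    /* ...then assign only the fields whose initial value is not zero. */
    g->magic = magic;
}
```

One caveat: the C standard does not promise that all-bits-zero is a null
pointer or a floating-point zero, although it is on most platforms, so
such fields may need explicit assignment in strictly portable code.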

The following script was used to time the results:

    cd /src/gno/usr.bin/sed
    echo '(4+4)*3' | time ./sed -f tests/math.sed

The unmodified code took 51 seconds to run on my 8 MHz Apple IIGS.
It took 49 seconds after I recoded isinsets() in asm, and 48 seconds
after I recoded samesets(). Initializing fields via memset() brought
it down to 47 seconds.

I saw no other obvious candidates for recoding. The modest results
from these changes (even though they were made in routines that
ranked high in the profile) did not warrant further efforts.


BUILDING

Because I did not rebuild all of libc, I created a library called
regex that only includes the regex routines. The commands I used
to build regex are included in the file make.cmds. I will leave it
to Devin to incorporate regex into the full libc build structure.