Notes on the porting of regex to GNO

Dave Tribby * November 7, 1997


Devin Reade did the initial conversion of the BSD sources to compile
under GNO 2.0.6 headers with ORCA/C on the Apple IIGS.

I completed the porting to the extent that the program sed (which
uses regex) works for many different test cases.

The most time-consuming aspect of the port was finding all the places
where long integers should be used instead of int or unsigned int
(under ORCA/C, int is only 16 bits wide, while long is 32 bits).
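The kind of bug involved can be sketched in C. In this minimal example, int16_t stands in for ORCA/C's 16-bit int (an assumption made so the effect shows up on a modern compiler whose int is 32 bits), and span_as_int() and span_as_long() are hypothetical helpers, not functions from the regex sources. An offset larger than 32767 survives only in the long version.

```c
#include <stdint.h>

/* ORCA/C's int is 16 bits; int16_t stands in for it here (an assumption
   for illustration) so the truncation shows up on a compiler whose own
   int is 32 bits. */
typedef int16_t orca_int;

/* Hypothetical helper: distance from the start of a subject string to the
   end of a match, squeezed into a 16-bit int. Converting a value above
   32767 gives an implementation-defined result, so large offsets are lost. */
static orca_int span_as_int(const char *start, const char *end)
{
    return (orca_int)(end - start);
}

/* Hypothetical helper: the same distance kept in a long, which is at least
   32 bits wide and so holds any offset up to 2147483647. */
static long span_as_long(const char *start, const char *end)
{
    return (long)(end - start);
}
```

For a 40000-character subject, span_as_long() returns 40000, while span_as_int() cannot represent that value at all; what it returns instead depends on the compiler.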


DEBUGGING

The file engine.c is used in an interesting way by regexec.c:
it includes engine.c *twice*, after muchos fiddling with the
macros that code uses. This lets the same code operate on two
different representations for state sets.
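The double-inclusion trick can be approximated in a single self-contained file. Where regexec.c redefines its state-set macros and then includes engine.c again, this sketch passes the macro names into a function-generating macro instead. The toy engine, DEFINE_ENGINE, the S_ and L_ macros, and run_matcher() are all hypothetical; only the names smatcher and lmatcher follow the small and large matchers of the BSD regexec.c.

```c
#include <limits.h>
#include <string.h>

/* Representation 1: a state set packed into the bits of an unsigned long. */
#define S_STATES       unsigned long
#define S_CLEAR(v)     ((v) = 0)
#define S_SET(v, n)    ((v) |= 1UL << (n))
#define S_ISSET(v, n)  (((v) >> (n)) & 1UL)

/* Representation 2: one char per state, for sets too large for a long. */
#define L_MAXSTATES    256
typedef struct { char s[L_MAXSTATES]; } lstates;
#define L_STATES       lstates
#define L_CLEAR(v)     memset((v).s, 0, sizeof((v).s))
#define L_SET(v, n)    ((v).s[(n)] = 1)
#define L_ISSET(v, n)  ((v).s[(n)] != 0)

/* Stand-in for engine.c: one body of code written only in terms of the
   STATES/CLEAR/SET/ISSET macros. This toy "engine" marks every even
   state below n and then counts the marked states. */
#define DEFINE_ENGINE(name, STATES, CLEAR, SET, ISSET) \
    static int name(int n)                             \
    {                                                  \
        STATES v;                                      \
        int i, count = 0;                              \
        CLEAR(v);                                      \
        for (i = 0; i < n; i += 2)                     \
            SET(v, i);                                 \
        for (i = 0; i < n; i++)                        \
            if (ISSET(v, i))                           \
                count++;                               \
        return count;                                  \
    }

/* Instantiate the same engine code twice, once per representation. */
DEFINE_ENGINE(smatcher, S_STATES, S_CLEAR, S_SET, S_ISSET)
DEFINE_ENGINE(lmatcher, L_STATES, L_CLEAR, L_SET, L_ISSET)

/* Use the compact version whenever the states fit in one unsigned long;
   lmatcher() handles up to L_MAXSTATES states. */
static int run_matcher(int n)
{
    if (n <= (int)(CHAR_BIT * sizeof(unsigned long)))
        return smatcher(n);
    return lmatcher(n);
}
```

Both instances compute the same answer (run_matcher(10) and run_matcher(200) give 5 and 100); only the storage for the state set differs, which is the point of compiling one engine against two sets of macros.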

When it was necessary to do source-level debugging, Splat! got
very confused about line numbers until I added the following at
the beginning of engine.c:
    #ifdef __ORCAC__
    #line 2 "engine.c"
    #endif
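The #line directive resets the line number and file name that the compiler records for the lines that follow it. This small sketch shows the effect through the standard __LINE__ and __FILE__ macros; that ORCA/C passes the same mapping on to the debug information Splat! reads is an assumption consistent with the fix described above. The variable names are illustrative only.

```c
#include <string.h>

/* The next source line is numbered 100 and attributed to "engine.c",
   regardless of its physical position in this file. */
#line 100 "engine.c"
static const int reported_line = __LINE__;
static const char *const reported_file = __FILE__;
```

Here reported_line is 100 and reported_file is "engine.c"; every later line in the file is numbered relative to that point.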

The regex code had a compile-time macro, REDEBUG, that turns on
diagnostic messages when defined. I added further output
statements that are also enabled by REDEBUG.
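A macro-controlled diagnostic of this kind might look like the following sketch. RE_TRACE and count_char() are hypothetical, not the statements actually added to the regex sources; the double parentheses are a common way to pass a whole printf() argument list through one macro parameter on compilers without variadic macros (C89).

```c
#include <stdio.h>

/* Hypothetical trace macro: expands to a printf() call only when REDEBUG
   is defined, and to a no-op expression otherwise. */
#ifdef REDEBUG
#define RE_TRACE(args) ((void)printf args)
#else
#define RE_TRACE(args) ((void)0)
#endif

/* Toy routine: count occurrences of c in s, tracing each hit when
   REDEBUG is defined. */
static int count_char(const char *s, char c)
{
    const char *p;
    int n = 0;

    for (p = s; *p != '\0'; p++) {
        if (*p == c) {
            n++;
            RE_TRACE(("hit %d at offset %ld\n", n, (long)(p - s)));
        }
    }
    return n;
}
```

Built normally, count_char("abcabc", 'a') silently returns 2. Built with REDEBUG defined (for example -DREDEBUG on a Unix-style cc), it also prints "hit 1 at offset 0" and "hit 2 at offset 3".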


PERFORMANCE ENHANCEMENT

The program sed took a long time to execute a complicated sed
program (provided with the BSD source code) that evaluates
arithmetic expressions. I attempted to speed up the regex routines by
recoding two routines in assembly language (using the asm {}
construct available in ORCA/C). The two routines are
isinsets() and samesets() in regcomp.c. The C code was left
in place and can be turned on by compiling with the macro
__NOASM__ defined.
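For reference, here is a sketch of what the two predicates compute, written in portable C only (the asm {} versions are not reproduced). The layout is a simplified one loosely modeled on regcomp.c's set bitmaps: each column holds one byte per character value, and bit k of that byte records membership in one set. struct setinfo, setinfo_init(), addchar(), NCHARS, and MAXCOLS are hypothetical; both predicates reduce to short loops over those columns.

```c
#include <limits.h>
#include <string.h>

/* Hypothetical layout: bit k of setbits[col][ch] means character ch is in
   set number col * CHAR_BIT + k. NCHARS assumes 8-bit characters. */
#define NCHARS  256
#define MAXCOLS 4

struct setinfo {
    int ncsets;                             /* sets in use, <= MAXCOLS*CHAR_BIT */
    unsigned char setbits[MAXCOLS][NCHARS]; /* one column per CHAR_BIT sets */
};

static void setinfo_init(struct setinfo *g, int ncsets)
{
    memset(g, 0, sizeof *g);
    g->ncsets = ncsets;
}

/* Record that character c belongs to set number 'set'. */
static void addchar(struct setinfo *g, int set, int c)
{
    g->setbits[set / CHAR_BIT][(unsigned char)c] |=
        (unsigned char)(1u << (set % CHAR_BIT));
}

/* Is c a member of any set? */
static int isinsets(const struct setinfo *g, int c)
{
    int ncols = (g->ncsets + (CHAR_BIT - 1)) / CHAR_BIT;
    unsigned uc = (unsigned char)c;
    int i;

    for (i = 0; i < ncols; i++)
        if (g->setbits[i][uc] != 0)
            return 1;
    return 0;
}

/* Are c1 and c2 members of exactly the same sets? */
static int samesets(const struct setinfo *g, int c1, int c2)
{
    int ncols = (g->ncsets + (CHAR_BIT - 1)) / CHAR_BIT;
    unsigned uc1 = (unsigned char)c1, uc2 = (unsigned char)c2;
    int i;

    for (i = 0; i < ncols; i++)
        if (g->setbits[i][uc1] != g->setbits[i][uc2])
            return 0;
    return 1;
}
```

With 'a' in set 0 and 'b' in sets 0 and 1, isinsets() is true for 'a' and false for 'z', and samesets() is false for 'a' and 'b' but true for two characters that are in no set.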

I also noticed a place in regcomp.c where many fields of
a structure were individually set to 0 at initialization. I
recoded it to set all the fields to 0 with a single call to
memset() and then set only the non-zero fields individually.
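The two styles of initialization can be compared in a small sketch. struct parse_info and its fields are hypothetical stand-ins, not the actual structure in regcomp.c; the point is only that one memset() followed by the non-zero assignments leaves the fields with the same values as the field-by-field version.

```c
#include <string.h>

/* Hypothetical structure standing in for the one in regcomp.c. */
struct parse_info {
    long pos;
    long end;
    long nparens;
    long ssize;     /* the one field that starts out non-zero */
    int  error;
    int  flags;
    int  ncsalloc;
};

/* Original style: every field assigned individually. */
static void init_fieldwise(struct parse_info *p, long ssize)
{
    p->pos = 0;
    p->end = 0;
    p->nparens = 0;
    p->ssize = ssize;
    p->error = 0;
    p->flags = 0;
    p->ncsalloc = 0;
}

/* Recoded style: clear the whole structure, then set the non-zero field.
   All-bits-zero is exactly 0 for integer types; for pointer or floating
   fields it relies on the platform, since ISO C does not guarantee it. */
static void init_memset(struct parse_info *p, long ssize)
{
    memset(p, 0, sizeof *p);
    p->ssize = ssize;
}
```

Comparing the results field by field (rather than with memcmp(), since padding bytes may differ) shows the two functions agree.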

The following script was used to time the results:

    cd /src/gno/usr.bin/sed
    echo '(4+4)*3' | time ./sed -f tests/math.sed

The unmodified code took 51 seconds to run on my 8MHz Apple IIGS.
It took 49 seconds after I recoded isinsets() in asm, and 48 seconds
after I recoded samesets(). Initializing fields via memset() brought
it down to 47 seconds.
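To put these timings in relative terms, the reduction at each step can be computed directly. The table below only restates the measured times given above; struct stage and pct_reduction() are illustrative names.

```c
/* Measured times for the math.sed test, in seconds, after each change. */
struct stage {
    const char *change;
    double seconds;
};

static const struct stage stages[] = {
    { "unmodified code",           51.0 },
    { "isinsets() recoded in asm", 49.0 },
    { "samesets() recoded in asm", 48.0 },
    { "memset() initialization",   47.0 },
};

/* Percentage reduction in run time going from 'before' to 'after'. */
static double pct_reduction(double before, double after)
{
    return 100.0 * (before - after) / before;
}
```

pct_reduction(stages[0].seconds, stages[3].seconds) comes to about 7.8, so the three changes together cut the run time by under 8 percent; the individual steps are roughly 3.9, 2.0, and 2.1 percent.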

I saw no other obvious candidates for recoding. The modest gains
from these changes (even though they were made in routines that
ranked high in the profile) did not warrant further effort.


BUILDING

Because I did not rebuild all of libc, I created a library called
regex that only includes the regex routines. The commands I used
to build regex are included in the file make.cmds. I will leave it
to Devin to incorporate regex into the full libc build structure.
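A minimal program of the sort that could be used to check the separately built library is sketched below. It uses only the standard POSIX regcomp(), regexec(), and regfree() interface that the BSD regex routines implement; first_match() is a hypothetical helper, and how the program is linked against the regex library under GNO (the commands in make.cmds) is not shown.

```c
#include <regex.h>

/* Return the offset of the first match of an extended regular expression
   in s, or -1 if the pattern does not compile or does not match. */
static long first_match(const char *pattern, const char *s)
{
    regex_t re;
    regmatch_t m[1];
    long off = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, s, 1, m, 0) == 0)
        off = (long)m[0].rm_so;   /* regoff_t's width varies between systems */
    regfree(&re);
    return off;
}
```

For example, first_match("[0-9]+", "abc123") returns 3, and first_match("x", "abc") returns -1. The cast to long keeps the offset intact even where int is only 16 bits.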