Notes on the porting of regex to GNO Dave Tribby * November 7, 1997 Devin Reade did the initial conversion of the BSD sources to compile under GNO 2.0.6 headers with ORCA/C on the Apple IIGS. I completed the porting to the extent that the program sed (which uses regex) works for many different test cases. The most time-comsuming aspect of the port was finding all the places where long integers should be used instead of int or unsigned int. DEBUGGING The file engine.c is used in an interesting way by regexec.c: it includes engine.c *twice*, after muchos fiddling with the macros that code uses. This lets the same code operate on two different representations for state sets. When it was necessary to do source-level debugging, Splat! got very confused about line numbers until I added the following at the beginning of the file: #ifdef __ORCAC__ #line 2 "engine.c" #endif The regex code had an compilation macro, REDEBUG, that would turn on diagnostic messages when defined. I added additional output statements that are turned on by REDEBUG. PERFORMANCE ENHANCEMENT The program sed took a long time to execute a compilcated sed program (provided with the BSD source code) that evaluates expressions. I attempted to speed up the regex routines by recoding two routines in assembly language (using the asm {} construct available in ORCA/C). The two routines are isinsets() and samesets() in regcomp.c. The C code was left in place and can be turned on by compiling with the macro __NOASM__ set. I also noticed an instance in regcomp.c where many fields of a structure were individually set to 0 at initialization. I recoded to set all the fields to 0 by a call to memset() and then set only the non-zero fields individually. The following script was used to time the results: cd /src/gno/usr.bin/sed echo '(4+4)*3' | time ./sed -f tests/math.sed The unmodified code took 51 seconds to run on my 8MHz Apple IIGS. It took 49 seconds after I recoded isinsets() in asm, and 48 seconds after I recoded samesets(). Initializing fields via memset() brought it down to 47 seconds. I saw no other obvious candidates for recoding. The modest results from these changes (even though they were made in routines that ranked high in the profile) did not warrent further efforts. BUILDING Because I did not rebuild all of libc, I created a library called regex that only includes the regex routines. The commands I used to build regex are included in the file make.cmds. I will leave it to Devin to incorporate regex into the full libc build structure.