This currently disables the support for skipping invisible pixels. It is generally not very important, since raw drawing speed usually is not the bottleneck.
This sets the tcpTUNEIPUSERPOLLCT and tcpTUNEIPRUNQCT tuning parameters to 10 (the maximum) instead of the default of 2. This makes Marinetti process more incoming data at once and significantly increases throughput. (Actually, current versions of Marinetti only seem to use tcpTUNEIPUSERPOLLCT, but we set both for compatibility with any future versions that actually use tcpTUNEIPRUNQCT.)
Specifically, use a union rather than pointer-based type punning to convert the Long value returned by GetContentOrigin() to a Point. This makes for more readable code and should also generate somewhat better assembly code.
This should be faster, as in DoReadTCP, and it simplifies the code a little.
Also, check for returned data size of 0 in DoReadTCP. I don't think this should happen, but in case it did, the returned handle could be invalid.
DoReadTCP automatically disposes the old read buffer handle before doing the read, and then locks the new read buffer handle before returning. This removes the need for most explicit locking/unlocking of the handle. A macro is provided to dispose of the handle earlier (useful in a few cases where it may be big, and another read isn't immediately performed).
Also, set displayInProgress to FALSE in NextRect, rather than in code for each encoding.
This should be faster than the other options (reusing an existing handle or using a pointer) because it avoids an extra copy of the data inside Marinetti. For the other options, Marinetti will create a new handle and copy the data to it anyway, but then copy it again into the requested destination.
The logic for making sure the color table was complete before connecting was slightly broken due to wraparound, so it would cause all entries to be generated, re-generating any that already had been. This fixes it, which should make the initial connection process slightly faster on slow systems.
They are now called only every 16 lines, instead of every line. This is enough to maintain reasonable responsiveness, while providing a measurable speedup.
The buffer from the last read is still left around until the next one, but this should usually be fairly small, and it's the same behavior as the main DoReadTCP routine.
This is only a win if we can use the optimized case a reasonable proportion of the time (~40% or more), but that should be the case for most real screen images.
The equality comparisons are written with XORs because that produces better assembly code.