This is easier to debug (printing out a lambda doesn't show what values
it checks against) and makes it easier to check that the filter values
are valid.
With the new implementation, each filter is converted to a function,
and each resource is then checked for whether it matches any of the
filter functions. This is simpler than the old implementation, where
the resource lookup code was slightly different for some filter forms.
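As a rough illustration of the filter approach described above, here is
a minimal sketch. The names TypeFilter, IDRangeFilter and
filter_resources are hypothetical and not taken from the actual code;
resources are simplified to (type, ID) tuples. Each filter is a callable
object with a readable repr, and a resource is kept if any filter
matches it.

```python
import typing


class TypeFilter:
    """Hypothetical filter that matches resources by their type code."""

    def __init__(self, resource_type: bytes) -> None:
        # The value is stored explicitly, so it can be validated up front.
        if len(resource_type) != 4:
            raise ValueError(f"Type code must be 4 bytes: {resource_type!r}")
        self.resource_type = resource_type

    def __repr__(self) -> str:
        # Unlike a lambda, the repr shows what the filter checks against.
        return f"TypeFilter({self.resource_type!r})"

    def __call__(self, resource_type: bytes, resource_id: int) -> bool:
        return resource_type == self.resource_type


class IDRangeFilter:
    """Hypothetical filter that matches an inclusive range of resource IDs."""

    def __init__(self, start: int, end: int) -> None:
        if start > end:
            raise ValueError(f"Empty ID range: {start}..{end}")
        self.start = start
        self.end = end

    def __repr__(self) -> str:
        return f"IDRangeFilter({self.start}, {self.end})"

    def __call__(self, resource_type: bytes, resource_id: int) -> bool:
        return self.start <= resource_id <= self.end


ResourceKey = typing.Tuple[bytes, int]
FilterFunc = typing.Callable[[bytes, int], bool]


def filter_resources(
    resources: typing.Iterable[ResourceKey],
    filters: typing.Sequence[FilterFunc],
) -> typing.List[ResourceKey]:
    """Return every resource that matches at least one filter."""
    return [
        (resource_type, resource_id)
        for resource_type, resource_id in resources
        if any(f(resource_type, resource_id) for f in filters)
    ]
```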
The _references map now stores Resource objects directly, instead of
constructing them only when they are looked up. Resource objects are
now lazy themselves, so the previous lazy resource creation mechanism
is redundant.
_LazyResourceMap is now a simple read-only wrapper around an existing
map. The custom class is now only used to provide a specialized repr.
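A read-only wrapper of this kind could look roughly like the sketch
below. The class name ReadOnlyMapView and the repr format are
assumptions made for illustration and do not reflect the real
_LazyResourceMap.

```python
import typing

K = typing.TypeVar("K")
V = typing.TypeVar("V")


class ReadOnlyMapView(typing.Mapping[K, V]):
    """Hypothetical read-only view of an existing map with a short repr."""

    def __init__(self, submap: typing.Mapping[K, V]) -> None:
        # No copy is made: the view always reflects the wrapped map.
        self._submap = submap

    def __getitem__(self, key: K) -> V:
        return self._submap[key]

    def __iter__(self) -> typing.Iterator[K]:
        return iter(self._submap)

    def __len__(self) -> int:
        return len(self._submap)

    def __repr__(self) -> str:
        # Summarize instead of dumping every value of the wrapped map.
        return f"<ReadOnlyMapView with {len(self)} entries>"
```

Because only the read methods are defined, item assignment on the view
raises TypeError, while lookups, iteration and len() are passed through
to the wrapped map.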
The reading of resource name and data is now performed in the Resource
class (lazily, when the respective attributes are accessed) instead of
in ResourceFile._LazyResourceMap.
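The lazy attribute access described above can be sketched as follows.
LazyResource and its constructor parameters (stream, offset, length) are
hypothetical simplifications; the real Resource class reads its
location from the resource map instead.

```python
import io
import typing


class LazyResource:
    """Hypothetical resource whose data is read only on first access."""

    _data: typing.Optional[bytes]

    def __init__(self, stream: typing.BinaryIO, offset: int, length: int) -> None:
        self._stream = stream
        self._offset = offset
        self._length = length
        # Nothing is read from the stream yet.
        self._data = None

    @property
    def data(self) -> bytes:
        if self._data is None:
            # First access: seek to the data and read it, then cache it.
            self._stream.seek(self._offset)
            self._data = self._stream.read(self._length)
        return self._data
```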
The new syntax supports the same operations as the old syntax, but is
easier to understand and easier to extend in the future. The old syntax
is still supported for now.
The old names ("system" and "application" compression) were not really
accurate in all cases, so the compression types are now referred to by
their number.
The non-stream-based APIs still exist as before and are not deprecated;
they just act as thin wrappers around the stream-based API.
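The wrapper relationship can be sketched like this. The function names
and the toy run-length format (a count byte followed by a value byte)
are assumptions for illustration only and are unrelated to the real
compression formats.

```python
import io
import typing


def decompress_stream(stream: typing.BinaryIO) -> typing.Iterator[bytes]:
    """Hypothetical stream-based API: yield decompressed chunks."""
    while True:
        pair = stream.read(2)
        if not pair:
            return
        if len(pair) != 2:
            raise ValueError("Truncated run at end of stream")
        # Toy format: each pair is (repeat count, byte value).
        count, value = pair
        yield bytes([value]) * count


def decompress(data: bytes) -> bytes:
    """Hypothetical non-stream API: a thin wrapper around the stream one."""
    return b"".join(decompress_stream(io.BytesIO(data)))
```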
The main rsrcfork module doesn't use the stream-based APIs yet, because
it reads each resource's data all at once and not incrementally.
This is a little more complex than for the other decompressors,
because .dcmp2 has to behave differently when it is at the byte before EOF.
Checking whether this is the case requires lookahead, which is not easy
to do with a plain IO stream.
Some buffered IO streams provide a peek method for lookahead, but
others don't (such as io.BytesIO). There is no standard way to wrap an
already buffered IO stream to add a peek method, so we need a custom
wrapper class and helper function for this purpose.
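A minimal sketch of such a wrapper and helper is shown below. The names
PeekableIO and make_peekable are used here for illustration and the
details are assumptions; only read and peek are implemented, which is
all that sequential decompression needs.

```python
import io
import typing


class PeekableIO:
    """Hypothetical wrapper that adds peek() to a binary stream."""

    def __init__(self, wrapped: typing.BinaryIO) -> None:
        self._wrapped = wrapped
        # Bytes already read from the wrapped stream but not yet consumed.
        self._readahead = b""

    def peek(self, size: int) -> bytes:
        """Return up to size bytes without consuming them."""
        missing = size - len(self._readahead)
        if missing > 0:
            self._readahead += self._wrapped.read(missing)
        return self._readahead[:size]

    def read(self, size: int) -> bytes:
        """Read up to size bytes, serving buffered bytes first."""
        data = self._readahead[:size]
        self._readahead = self._readahead[size:]
        if len(data) < size:
            data += self._wrapped.read(size - len(data))
        return data


def make_peekable(stream: typing.Any) -> typing.Any:
    """Return stream itself if it can already peek, else wrap it."""
    if hasattr(stream, "peek"):
        return stream
    return PeekableIO(stream)
```

Note that io.BufferedReader.peek may return more or fewer bytes than
requested, so callers that accept both kinds of stream should only rely
on whether the result is empty or on its first bytes.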
The decompression code is more readable this way, because the
compressed data needs to be processed sequentially. It also allows
moving the length check and some debug logging into an outer generator.
This also allows incremental decompression, although that has no
practical advantage at the moment, because the compressed resource data
is all read at once (there is no API for opening resources as streams),
and resources are not very large anyway.
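The split into an inner and an outer generator might look like the
following sketch. The function names and the toy format (a length byte
followed by that many literal bytes) are assumptions for illustration,
not one of the real decompressors.

```python
import typing


def decompress_inner(data: bytes) -> typing.Iterator[bytes]:
    """Hypothetical format-specific generator: yields chunks in order."""
    pos = 0
    while pos < len(data):
        # Toy format: one length byte, then that many literal bytes.
        run_length = data[pos]
        pos += 1
        yield data[pos:pos + run_length]
        pos += run_length


def decompress_checked(data: bytes, expected_length: int) -> typing.Iterator[bytes]:
    """Outer generator: passes chunks through and checks the total length."""
    total = 0
    for chunk in decompress_inner(data):
        total += len(chunk)
        yield chunk
    # The check runs only after the inner generator is exhausted.
    if total != expected_length:
        raise ValueError(f"Expected {expected_length} bytes, got {total}")
```

A caller that wants all of the data at once can simply join the chunks,
for example with b"".join(decompress_checked(data, expected_length)).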
The leading underscore is meant to distinguish private (for internal
use only) APIs from public (for external use) APIs. One can argue about
where the line between public and private should be, but if something
is used from other modules (as with read_variable_length_integer) it's
not really private IMHO.
In scripts (like __main__) it also doesn't make much sense to use
leading underscores, because the entire file is never meant to be used
by external code.
Previously, the types of instance attributes were annotated at the
first assignment of each attribute. The standard way to annotate
instance attributes is to do so at class level without assigning any
value.
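For illustration, class-level annotations look like this. The attribute
names are hypothetical and only loosely modeled on a resource object.

```python
import typing


class Resource:
    # Instance attributes are declared at class level, without values.
    # This only records the annotations; no class attributes are created.
    resource_type: bytes
    resource_id: int
    name: typing.Optional[bytes]

    def __init__(
        self,
        resource_type: bytes,
        resource_id: int,
        name: typing.Optional[bytes] = None,
    ) -> None:
        # The assignments no longer need their own annotations.
        self.resource_type = resource_type
        self.resource_id = resource_id
        self.name = name
```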
According to https://bugs.python.org/issue35089, typing.io should not
be used anymore, and the types that it contains should be accessed
through the main typing module instead.
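In practice this means writing typing.BinaryIO rather than
typing.io.BinaryIO. The helper below is a hypothetical example that
exists only to show the annotation in use.

```python
import io
import typing


def read_exact(stream: typing.BinaryIO, byte_count: int) -> bytes:
    """Read exactly byte_count bytes or raise EOFError (sketch only)."""
    data = stream.read(byte_count)
    if len(data) != byte_count:
        raise EOFError(f"Wanted {byte_count} bytes, got {len(data)}")
    return data
```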
For compressed resources, the value of the length attribute can be
accessed much more quickly than the data itself (because it only
requires parsing the header, rather than decompressing the entire
data). This is used to speed up listing of compressed resources on the
command line.
The length_raw attribute is added for symmetry, although it is not
specifically optimized in any case yet.
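The idea behind the fast length attribute can be sketched as below.
This is a toy model, not the real resource compression format: the
header is assumed to be a 4-byte big-endian decompressed length, and
zlib stands in for the actual decompressors. The class name
CompressedResource is hypothetical as well.

```python
import struct
import zlib


class CompressedResource:
    """Hypothetical compressed resource with a cheap length attribute."""

    def __init__(self, data_raw: bytes) -> None:
        self.data_raw = data_raw

    @property
    def length_raw(self) -> int:
        # Size of the stored (still compressed) data.
        return len(self.data_raw)

    @property
    def length(self) -> int:
        # Only the assumed 4-byte header is parsed; nothing is decompressed.
        (decompressed_length,) = struct.unpack_from(">L", self.data_raw, 0)
        return decompressed_length

    @property
    def data(self) -> bytes:
        # The expensive path: decompress everything after the header.
        return zlib.decompress(self.data_raw[4:])
```

Listing code that only needs sizes can then use length and never touch
data, which avoids decompressing each resource.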