65 Commits

Author SHA1 Message Date
dgelessus
f3b3de496e Change naming of compression types
The old names ("system" and "application" compression) were not really
accurate in all cases, so the compression types are now referred to by
their number.
2019-10-07 10:08:32 +02:00
dgelessus
6d69d0097d Update rsrcfork.compress.__all__ 2019-10-02 16:29:32 +02:00
dgelessus
8db1b22bdc Make the generic decompression API stream-based
The non-stream-based APIs still exist as before and are not deprecated;
they now just act as thin wrappers around the stream-based API.

The main rsrcfork module doesn't use the stream-based APIs yet, because
it reads each resource's data all at once and not incrementally.
2019-10-02 16:28:40 +02:00
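
The "thin wrapper" relationship described in the commit above can be sketched roughly as follows. This is only an illustration: `decompress_stream` and `decompress` are hypothetical names, and the stand-in "decompressor" just passes its input through unchanged.

```python
import io
import typing


def decompress_stream(stream: typing.BinaryIO, chunk_size: int = 4096) -> typing.Iterator[bytes]:
    """Stand-in streaming 'decompressor' (hypothetical): yields the input unchanged, chunk by chunk."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def decompress(data: bytes) -> bytes:
    """Non-stream convenience wrapper: wrap the bytes in a stream and join the streamed output."""
    return b"".join(decompress_stream(io.BytesIO(data)))
```

With this shape, callers that already hold all the data in memory keep a simple bytes-in, bytes-out function, while the real work lives in one place.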
dgelessus
6559cbc337 Refactor .dcmp2 to be stream-based
This is a little more complex than with the other decompressors,
because .dcmp2 has to behave differently when at the byte before EOF.
Checking whether this is the case requires lookahead, which is not easy
to do with a plain IO stream.

Some buffered IO streams provide a peek method for lookahead, but
others don't (such as io.BytesIO). There is no standard way to wrap an
already buffered IO stream to add a peek method, so we need a custom
wrapper class and helper function for this purpose.
2019-10-02 10:26:03 +02:00
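
A minimal sketch of the kind of lookahead wrapper the commit above describes, assuming only that the wrapped stream has a `read` method. The names `PeekableIO` and `make_peekable` are hypothetical and not necessarily the ones used in the project.

```python
import io
import typing


class PeekableIO(object):
    """Hypothetical minimal wrapper that adds peek() to any binary stream."""

    def __init__(self, stream: typing.BinaryIO) -> None:
        self._stream = stream
        self._buffer = b""  # Bytes read ahead by peek() but not yet consumed.

    def peek(self, size: int) -> bytes:
        # Read ahead only as far as needed; fewer bytes come back near EOF.
        if len(self._buffer) < size:
            self._buffer += self._stream.read(size - len(self._buffer))
        return self._buffer[:size]

    def read(self, size: int) -> bytes:
        # Serve buffered bytes first, then fall through to the real stream.
        data = self._buffer[:size]
        self._buffer = self._buffer[size:]
        if len(data) < size:
            data += self._stream.read(size - len(data))
        return data


def make_peekable(stream):
    """Return the stream itself if it already has peek(), else wrap it."""
    if hasattr(stream, "peek"):
        return stream
    return PeekableIO(stream)
```

A decompressor could then detect "the byte before EOF" by checking whether `peek(2)` returns only a single byte. Note that the built-in `io.BufferedReader.peek` may return more or fewer bytes than requested, so real code has to tolerate both behaviors.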
dgelessus
1e79dc3c50 Refactor .dcmp0 and .dcmp1 to be stream-based
The decompression code is more readable this way, because the
compressed data needs to be processed sequentially. It also allows
moving the length check and some debug logging into an outer generator.

This also allows incremental decompression, but this doesn't have any
practical advantage, because the compressed resource data is all read
at once (there is no API for opening resources as streams), and
resources are not very large anyway.
2019-10-01 21:26:41 +02:00
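
The "outer generator" idea from the commit above might look roughly like this. It is a sketch under assumptions: `checked_length` is a hypothetical name, and the inner decompressor is represented by any iterable of byte chunks.

```python
import typing


def checked_length(chunks: typing.Iterable[bytes], expected_length: int) -> typing.Iterator[bytes]:
    """Pass chunks through unchanged, then verify the total length once the input is exhausted."""
    total = 0
    for chunk in chunks:
        total += len(chunk)
        yield chunk
    if total != expected_length:
        raise ValueError(f"Expected {expected_length} bytes of decompressed data, got {total}")
```

Because the check lives in the wrapper, the individual decompressors only need to yield data sequentially and do not have to track the total themselves.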
dgelessus
db48212ade Fix a typo in a .compress.dcmp0 debug message 2019-10-01 21:26:41 +02:00
dgelessus
3a72bd3406 Remove leading underscores where they don't make much sense
The leading underscore is meant to distinguish private (for internal
use only) APIs from public (for external use) APIs. One can argue about
where the line between public and private should be, but if something
is used from other modules (as with read_variable_length_integer) it's
not really private IMHO.

In scripts (like __main__) it also doesn't make much sense to use
leading underscores, because the entire file is never meant to be used
by external code.
2019-10-01 21:26:41 +02:00
dgelessus
cb868b8005 Bump version to 1.4.1.dev 2019-09-29 19:27:43 +02:00
dgelessus
2f2472cfe9 Release version 1.4.0 2019-09-29 19:20:37 +02:00
dgelessus
e0f73d3220 Fix more issues reported by mypy 2019-09-29 16:28:07 +02:00
dgelessus
e5875ffe67 Fix various issues reported by mypy 2019-09-29 16:14:55 +02:00
dgelessus
449bf4dd71 Use parameterized typing.Mapping in ResourceFile definition
Previously the un-parameterized collections.abc.Mapping was used, which
makes type checking less accurate, as the exact key/value types are not
known.
2019-09-29 15:42:19 +02:00
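
As an illustration of the difference, a class can subclass a parameterized `typing.Mapping` so that a type checker knows the key and value types. The class name and the exact type parameters below are assumptions for the example, not the project's actual definition.

```python
import typing


class ExampleFile(typing.Mapping[bytes, typing.Mapping[int, bytes]]):
    """Hypothetical mapping: resource type -> (resource ID -> data)."""

    def __init__(self, resources: typing.Dict[bytes, typing.Dict[int, bytes]]) -> None:
        self._resources = resources

    def __getitem__(self, key: bytes) -> typing.Mapping[int, bytes]:
        return self._resources[key]

    def __iter__(self) -> typing.Iterator[bytes]:
        return iter(self._resources)

    def __len__(self) -> int:
        return len(self._resources)
```

With the un-parameterized base class, a checker such as mypy would treat the keys and values as `Any`, so mistakes like indexing with a `str` instead of `bytes` would go unnoticed.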
dgelessus
0ac6e8a3c4 Fix misplaced parens in dcmp modules 2019-09-29 15:33:14 +02:00
dgelessus
29ddd21740 Add missing type annotations on some methods 2019-09-29 15:32:18 +02:00
dgelessus
add22b704a Fix ResourceFile.__enter__ not returning anything 2019-09-29 15:09:41 +02:00
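
For context, the bug class fixed above is easy to reproduce: if `__enter__` has no `return` statement, the name bound by `with ... as f` is `None`. A minimal sketch with a hypothetical `ExampleFile` class:

```python
class ExampleFile:
    """Hypothetical file-like class used only to illustrate the context manager protocol."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True

    def __enter__(self) -> "ExampleFile":
        # Returning self is what makes "with ExampleFile() as f" bind f to the object.
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        self.close()
```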
dgelessus
fdd04c944b Remove __slots__ declaration from Resource class
It doesn't seem to have any noticeable performance benefit.
2019-09-29 15:00:45 +02:00
dgelessus
97c459bca7 Change attribute type annotations to standard format
Previously, the types of instance attributes were annotated with the
first assignment of each attribute. The standard way to annotate
instance attributes is to do so at class level without assigning any
value.
2019-09-29 14:58:18 +02:00
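
A short sketch of the class-level annotation style described above. `ExampleResource` and its attributes are a simplified, hypothetical stand-in for the real class.

```python
import typing


class ExampleResource:
    # Class-level annotations without values: they declare the instance
    # attribute types but do not create any class attributes.
    type: bytes
    id: int
    name: typing.Optional[bytes]

    def __init__(self, type: bytes, id: int, name: typing.Optional[bytes] = None) -> None:
        # No annotations needed here anymore; the types are declared above.
        self.type = type
        self.id = id
        self.name = name
```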
dgelessus
9ef084de58 Remove uses of the typing.io pseudo-module
According to https://bugs.python.org/issue35089, typing.io should not
be used anymore, and the types that it contains should be accessed
through the main typing module instead.
2019-09-28 01:40:34 +02:00
dgelessus
84f09d0b83 Display 'dcmp' IDs in command line listings of compressed resources 2019-09-24 00:27:54 +02:00
dgelessus
c108af60ca Add length and length_raw attributes to Resource (closes #3)
For compressed resources, the value of the length attribute can be
accessed much more quickly than the data itself (because it only
requires parsing the header, rather than decompressing the entire
data). This is used to speed up listing of compressed resources on the
command line.

The length_raw attribute is added for symmetry, although it is not
specifically optimized in any case yet.
2019-09-24 00:13:23 +02:00
dgelessus
0c942e26ec Fix hex number formatting in compressed header info reprs 2019-09-23 23:52:06 +02:00
dgelessus
868a322b8e Add Resource.compressed_info attribute
This allows accessing a compressed resource's header data, without
having to decompress it or parse the compressed data manually.
2019-09-23 23:50:29 +02:00
dgelessus
a23cd0fcb2 Simplify decompressor lookup
All decompressors now have exactly the same signature (as a result,
each decompressor now has to check for itself that the header type is
correct). This allows the decompressors to be stored in a simple
dictionary, which makes the lookup process much simpler.
2019-09-23 23:32:38 +02:00
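
The dictionary-based lookup described above could be sketched like this. Everything here is illustrative: `HeaderInfo`, the function names, the ID and header type values, and the pass-through bodies are assumptions, not the project's actual code.

```python
import typing


class HeaderInfo(typing.NamedTuple):
    """Hypothetical, heavily simplified parsed header of a compressed resource."""
    dcmp_id: int
    header_type: int
    decompressed_length: int


def decompress_example_a(header: HeaderInfo, data: bytes) -> bytes:
    # Each decompressor validates the header type itself (illustrative value).
    if header.header_type != 8:
        raise ValueError(f"Unsupported header type: {header.header_type}")
    return data  # Placeholder: a real decompressor would transform the data.


def decompress_example_b(header: HeaderInfo, data: bytes) -> bytes:
    if header.header_type != 9:
        raise ValueError(f"Unsupported header type: {header.header_type}")
    return data  # Placeholder, as above.


# Identical signatures make a plain dict sufficient for dispatch.
DECOMPRESSORS: typing.Dict[int, typing.Callable[[HeaderInfo, bytes], bytes]] = {
    0: decompress_example_a,
    2: decompress_example_b,
}


def decompress(header: HeaderInfo, data: bytes) -> bytes:
    try:
        decompressor = DECOMPRESSORS[header.dcmp_id]
    except KeyError:
        raise ValueError(f"Unsupported 'dcmp' ID: {header.dcmp_id}")
    return decompressor(header, data)
```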
dgelessus
53e73be980 Pass complete header info to individual decompressors 2019-09-23 23:19:20 +02:00
dgelessus
9dbdf5b827 Move compressed header info constants/classes to .compress.common
This allows the constants/classes to be accessed from the individual
decompressor submodules.
2019-09-23 23:14:06 +02:00
dgelessus
87d4ae43d4 Refactor parsing of compressed resource headers
In preparation for #3, the compressed resource data headers are parsed
and stored as proper objects. For now these objects are only used
internally by the decompression code, but in the future they can be
exposed.
2019-09-23 23:10:55 +02:00
dgelessus
716ac30a53 Add release instructions in a comment in __init__.py 2019-09-16 17:09:47 +02:00
dgelessus
20991154d3 Bump version to 1.3.1.dev 2019-09-16 16:46:17 +02:00
dgelessus
7207b1d32b Release version 1.3.0 2019-09-16 16:34:40 +02:00
dgelessus
1de940d597 Enable --sort by default and add --no-sort to disable sorting
In most cases the file order is not important and the unsorted output
hurts readability. The performance impact of sorting is relatively
small and barely noticeable even with large resource files.
2019-09-16 15:25:41 +02:00
dgelessus
d7255bc977 Adjust --group=id output format slightly 2019-09-16 14:58:21 +02:00
dgelessus
c6337bdfbd Rename resource_type and resource_id attributes to type and id
The old names were chosen to avoid conflicts with Python's type and id
builtins, but for attribute names this is not necessary.
2019-09-15 15:56:03 +02:00
dgelessus
f4c2717720 Add command-line --group option 2019-09-15 15:38:01 +02:00
dgelessus
8ad0234633 Add command-line --sort option 2019-09-13 15:00:56 +02:00
dgelessus
7612322c43 Add dump-text output format on command line 2019-09-13 14:51:16 +02:00
dgelessus
51ae7c6a09 Refactor __main__.main into smaller functions 2019-09-13 14:17:21 +02:00
dgelessus
194c886472 Change hex dump output format to match hexdump -C 2019-09-13 10:51:27 +02:00
dgelessus
b2fa5f8b0f Collapse multiple subsequent identical lines in hex dumps 2019-09-13 10:40:03 +02:00
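
One way to collapse runs of identical lines in a hex dump is sketched below. The function name and the exact line layout are assumptions; the layout only loosely resembles common hex dump tools and is not claimed to match the project's output.

```python
import typing


def hexdump_lines(data: bytes, width: int = 16) -> typing.Iterator[str]:
    """Yield hex dump lines, replacing runs of repeated chunks with a single '*'."""
    hex_width = width * 3 - 1  # Two hex digits per byte plus separating spaces.
    last_chunk = None
    collapsed = False
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        if chunk == last_chunk:
            # Same bytes as the previous line: emit one marker for the whole run.
            if not collapsed:
                yield "*"
                collapsed = True
            continue
        last_chunk = chunk
        collapsed = False
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        ascii_part = "".join(chr(byte) if 0x20 <= byte < 0x7f else "." for byte in chunk)
        yield f"{offset:08x}  {hex_part:<{hex_width}}  |{ascii_part}|"
    # A final line with the total length shows where collapsed data ends.
    yield f"{len(data):08x}"
```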
dgelessus
752ec9e828 Bump version to 1.2.1.dev 2019-09-13 10:22:43 +02:00
dgelessus
fb5708e6b4 Release version 1.2.0 2019-09-13 10:05:16 +02:00
dgelessus
d082f29238 Use MacRoman as the encoding for four-char codes and strings
Previously all non-ASCII characters were hex-escaped on output.
However, many resource files use MacRoman characters in resource names
and sometimes in resource types, so it makes sense to use MacRoman in
the interest of readability.
2019-09-03 02:10:04 +02:00
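
A rough sketch of this kind of output formatting, using Python's built-in `mac_roman` codec. The helper name `display_bytes` and the exact escaping rules are assumptions for illustration.

```python
def display_bytes(data: bytes) -> str:
    """Decode bytes as MacRoman for display, hex-escaping only non-printable characters."""
    parts = []
    for byte in data:
        char = bytes([byte]).decode("mac_roman")
        # Escape using the original byte value, not the decoded code point.
        parts.append(char if char.isprintable() else f"\\x{byte:02x}")
    return "".join(parts)
```

For example, the byte `0x8a` is displayed as `ä` instead of the escape `\x8a`, while control characters such as `0x00` remain escaped.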
dgelessus
fb827e4073 Remove unused loop counter from _bytes_unescape 2019-09-03 01:35:47 +02:00
dgelessus
c373b9fe28 Clean up resource descriptions in listings and dumps
Previously, when some aspect of a resource's metadata was not present
(e.g. a resource with no name), the description would explicitly point
this out (e.g. by saying "unnamed"). Now missing parts of the metadata
are simply omitted from the description, resulting in cleaner output in
many cases.

The resource description formats used by the listings and dumps have
also been unified. Previously the descriptions were structured slightly
differently in each of the two.
2019-09-03 01:32:26 +02:00
dgelessus
e6779b021a Replace rsrcfork.open's rsrcfork parameter with a more usable version
The new fork parameter accepts strings, which are more understandable
than the old None/True/False values, and can be extended in the future.
2019-08-31 20:07:26 +02:00
dgelessus
c4fe09dbf0 Improve errors when filter doesn't match required number of resources 2019-08-31 14:48:01 +02:00
dgelessus
acdbbc89b2 Improve automatic fork selection when resource fork is invalid 2019-08-30 23:17:59 +02:00
dgelessus
d7fb67fac1 Add better error checking for invalid resource files 2019-08-30 23:17:59 +02:00
dgelessus
5ede8a351a Rework how non-seekable streams are handled by ResourceFile
The broken non-seeking read implementation of ResourceFile is removed,
and non-seekable streams are now handled by reading the entire stream
data first and wrapping it in a BytesIO to make it seekable.

The manual selection of seeking/non-seeking reading has been removed as
well, since it is no longer needed and was already nearly useless.
2019-08-30 23:17:18 +02:00
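
The approach described above (buffer a non-seekable stream fully, then wrap it in a `BytesIO`) can be sketched as follows. `ensure_seekable` and `NonSeekableBytes` are hypothetical names; the latter exists only to simulate a non-seekable stream for the demonstration.

```python
import io
import typing


def ensure_seekable(stream: typing.BinaryIO) -> typing.BinaryIO:
    """Return the stream unchanged if it is seekable, else an in-memory seekable copy."""
    if stream.seekable():
        return stream
    # Read everything up front; acceptable here because resource files are small.
    return io.BytesIO(stream.read())


class NonSeekableBytes(io.BytesIO):
    """Demo-only stream that claims not to support seeking (like a pipe)."""

    def seekable(self) -> bool:
        return False
```

The trade-off is memory use proportional to the stream size, in exchange for a single code path that can always seek.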
dgelessus
f798928270 Rewrite, update and expand project descriptions 2019-08-25 21:37:50 +02:00
dgelessus
3e28fa7fe0 Fix typos 2019-08-24 23:38:07 +02:00