Compare commits

134 Commits

Author SHA1 Message Date
dgelessus d2bbab1f5d Use Ubuntu 20.04 on GitHub Actions for Python 3.6 support (hopefully) 2023-02-14 22:58:28 +01:00
dgelessus a2663ae85d Allow compressed resource header length field to be 0 (see #10) 2023-02-14 21:31:45 +01:00
dgelessus 8f60fcdfc4 Add stacklevel to warnings.warn calls as recommended by flake8 2023-02-14 21:30:44 +01:00
dgelessus ff9377dc8d Reformat setup.cfg flake8 ignore option to make current flake8 happy 2023-02-14 21:30:10 +01:00
dgelessus 0624d4eae9 Mark Python 3.11 as supported 2023-02-14 21:09:17 +01:00
dgelessus 4ad1d1d6b9 Update GitHub Actions to current versions 2022-09-08 11:04:47 +02:00
dgelessus b95c4917cc Fix new mypy error about enum.Flag.name possibly being None 2022-09-08 11:03:52 +02:00
dgelessus ee767a106c Remove flake8-tabs plugin that is incompatible with flake8 5
And instead manually ignore the relevant indentation errors/warnings.
2022-09-08 10:46:45 +02:00
dgelessus 82951f5d8e Update usage examples in README.md 2022-09-08 10:42:42 +02:00
dgelessus b9fdac1c0b Wrap some long lines in README.md for better diffing/merging 2022-09-08 10:38:39 +02:00
dgelessus 70d51c2907 Convert README from reStructuredText to Markdown
Because it doesn't use any reStructuredText-specific features and
Markdown syntax is less annoying.
2022-09-08 10:30:21 +02:00
dgelessus f891a6ee00 Move main source code into src subdirectory
This avoids certain problems related to the entire repo root appearing
on the import path, usually when running Python/pip from the repo root
or when using an editable install.
2021-11-21 19:15:51 +01:00
dgelessus b1e5e7c96e Update actions/setup-python to v2 2021-11-21 18:50:34 +01:00
dgelessus 60709e386a Add Python 3.10 to test matrix and classifiers 2021-11-21 18:48:12 +01:00
dgelessus f437ee5f43 Reduce test matrix to just the oldest and newest Python versions
Testing the versions in between doesn't really bring much benefit, and
it becomes impractical when the range of supported versions grows.
2021-11-21 18:24:17 +01:00
dgelessus 5c3bc5d7e5 Remove custom stream types and read all resource data upfront again
The custom stream types were almost always slower than just reading the
entire data into memory, and there's no reason not to do that -
resources are small enough that memory usage and disk IO speed aren't a
concern (at least not for any machine that's modern enough to run
Python 3...).

Perhaps the only performance advantage was when reading a small amount
of data from the start of a compressed resource. In that case the
custom stream could incrementally decompress only the part of the data
that's actually needed, which was a bit faster than decompressing the
entire resource and then throwing away most of the data. But this
situation is rare enough that it's not worth handling in the rsrcfork
library. If this is a real performance issue for someone, they can
manually call the incremental decompression functions from
rsrcfork.compress where needed.
2020-11-01 19:28:25 +01:00
dgelessus d74dbc41ba Remove no longer needed type: ignore comment from .__main__ 2020-11-01 19:13:11 +01:00
dgelessus 0642b1e8bf Add Python 3.9 to test matrix and classifiers 2020-11-01 19:09:43 +01:00
dgelessus 54ccdb0a47 Add Typing :: Typed classifier 2020-08-08 19:36:23 +02:00
dgelessus f76817c389 Reorder classifiers alphabetically 2020-08-08 19:30:26 +02:00
dgelessus 9e6dfacff6 Add custom stream type for compressed resources 2020-08-01 14:11:06 +02:00
dgelessus 8d39469e6e Wrap _io_utils.SubStream in an io.BufferedReader for performance
Although the underlying stream is already buffered, the extra
BufferedReader wrapper around the SubStream results in a noticeable
performance improvement.
2020-07-23 15:49:28 +02:00
dgelessus 028be98e8d Suppress incorrect mypy shutil.copyfileobj error (python/mypy#8962) 2020-07-23 13:31:57 +02:00
dgelessus 98551263b3 Work around false mypy error about SubStream.__enter__ 2020-07-23 13:21:34 +02:00
dgelessus 0054d0f7b5 Fix variable naming conflict in show_filtered_resources
mypy does not like it when the same variable name is used for different
types, even if it is in unrelated if branches and could never actually
cause any problems.
2020-07-23 13:17:02 +02:00
dgelessus 126795239c Reimplement Resource.data_raw using a custom stream type (SubStream)
This way all reads performed on a resource data stream are forwarded
to the underlying resource file stream, with the read offsets and
lengths adjusted appropriately.
2020-07-23 02:42:32 +02:00
dgelessus 2907d9f9e8 Rewrite __main__ code to use stream-based resource reading 2020-07-21 14:45:48 +02:00
dgelessus 5c96baea29 Change (raw_)hexdump to yield lines instead of printing directly 2020-07-21 14:27:43 +02:00
dgelessus 664e992fa3 Rewrite Resource methods using stream API where appropriate 2020-07-21 14:20:50 +02:00
dgelessus 61247ec783 Add initial API and tests for stream-based resource reading
For now the stream-based API is a simple BytesIO wrapper around
data/data_raw, but it will be optimized in the future.
2020-07-21 14:12:09 +02:00
dgelessus 0f6018e4bf Move .compress.common.make_peekable and related code into ._io_utils 2020-07-19 23:16:36 +02:00
dgelessus 476a68916b Merge implementations of read_exact functions/methods
The old functions/methods still exist, so that they continue to raise
the same exceptions as before (which are different depending on
context), but they now use the same implementation internally.
2020-07-18 21:07:12 +02:00
dgelessus 4bbf2f7c14 Bump version to 1.8.1.dev 2020-07-18 17:47:48 +02:00
dgelessus 82b5926b4f Release version 1.8.0 2020-07-18 17:31:25 +02:00
dgelessus 5456013bf4 Use flake8 extend-exclude setting instead of exclude
This setting was added in flake8 3.8.0 and allows adding entries to the
exclude list without also removing the default entries.
2020-07-18 13:40:15 +02:00
dgelessus b595456a05 Switch back to using attr directive in setup.cfg version
As of setuptools 46.4.0, this extracts the attribute value statically
using the ast module, if possible. This allows it to work properly even
if the attribute is stored in a file that cannot be imported at setup
time (e. g. because of dependencies that might not be installed yet).
2020-07-18 13:13:19 +02:00
dgelessus d367a9238a Add bytes_quote helper function
bytes_quote does the same as bytes_escape, but automatically adds the
quote character around the escaped string.
2020-07-07 01:03:38 +02:00
dgelessus 33c4016124 Fix flake8 problems 2020-07-07 00:04:54 +02:00
dgelessus b01cfc77cf Don't pass required=True to add_subparsers
The required kwarg of add_subparsers was only added in Python 3.7, and
we currently still support Python 3.6.
2020-07-07 00:01:57 +02:00
dgelessus a9f54b678c Add py.typed marker file for PEP 561
This allows type checkers like mypy to use the type hints in our code
when type-checking another project that imports this library.
2020-07-06 23:57:25 +02:00
dgelessus b46018e666 Use is_printable in definition of _TRANSLATE_NONPRINTABLES 2020-07-06 18:01:34 +02:00
dgelessus b0eefe3889 Replace custom CLI subcommand system with standard argparse subparsers
This is a purely internal change and should have no visible effect on
the command-line interface.
2020-07-05 19:43:01 +02:00
dgelessus 3e0bbcee04 Add Python 3.8 classifier to metadata 2020-04-19 16:21:25 +02:00
dgelessus 13654c2560 Add pyproject.toml with PEP 517/PEP 518 metadata 2020-04-19 16:21:04 +02:00
dgelessus d5199bd503 Replace setup.cfg metadata license_file with license_files
license_file has been deprecated in wheel 0.32.0 in favor of
license_files.
2020-04-19 16:20:07 +02:00
dgelessus c5c3f24a10 Add tox environment for building and checking distributions 2020-04-19 16:16:21 +02:00
dgelessus 7c77c4ef20 Prepare setup.py/.cfg for additional import-time dependencies
Reading the version number using attr: rsrcfork.__version__ will no
longer work properly if rsrcfork has non-stdlib dependencies at import
time, because setuptools needs to be able to import rsrcfork and read
the version number before the dependencies are installed.

As a workaround, our setup.py now manually parses the version number
from rsrcfork/__init__.py using the ast module.
2020-04-03 22:32:10 +02:00
dgelessus f7b6080c0e Remove random execute bits from test data files 2020-04-01 00:01:50 +02:00
dgelessus 007d15eb3d Fix tox configuration breaking on spaces in the project path
The {envpython} substitution is not quoted, so spaces in the path are
treated as argument separators and cause the test runs to fail.
To work around this, we now always use an unqualified python command
instead of the {envpython} substitution. This is safe because the tox
commands are always run in a virtual environment, so the python command
is guaranteed to point to the environment's Python and not the system
default.
2020-03-30 01:46:10 +02:00
dgelessus 246b69e375 Remove accidental empty comment from test_rsrcfork.py 2020-01-21 22:32:44 +01:00
dgelessus d67ff64851 Add tests for reading from resource forks and fork auto-selection
These tests are only run on Mac, because they require native support
for resource forks.
2020-01-21 22:29:18 +01:00
dgelessus 5391d66a78 Add tests for reading resource files from streams instead of path 2020-01-21 15:20:46 +01:00
dgelessus 5b2700bf17 Add some missing asserts to test_compress_compare 2020-01-19 23:24:52 +01:00
dgelessus c41b25fea1 Add test case for compressed resource handling and decompression 2020-01-19 23:19:19 +01:00
dgelessus a45dbd8eca Remove upgrade of pip from CI workflow
The GitHub Actions environment clearly has a working pip pre-installed,
and it's unlikely that this project relies on any extremely new
features.
2020-01-19 19:59:42 +01:00
dgelessus 3401ce65dd Update actions/checkout to v2 2020-01-19 19:38:29 +01:00
dgelessus 890dd24f76 Also run CI workflow on pull requests 2020-01-19 19:36:53 +01:00
dgelessus 67c2b4acf0 Add test case for additional resource file and resource metadata 2020-01-19 19:22:59 +01:00
dgelessus 238c78a73e Simplify attribute asserts in tests 2020-01-19 19:05:05 +01:00
dgelessus fbd861edf4 Fix test_textclipping not checking resource ID lists properly
Because Python's zip terminates once *any* of the input iterables
terminates, the previous code would not detect if the file was missing
resources or contained extra ones.
2020-01-19 02:30:10 +01:00
dgelessus a7a407a1dd Add extra assertion to test_textclipping 2019-12-30 03:04:48 +01:00
dgelessus ecee2616cf Add flake8-bugbear plugin 2019-12-30 03:04:27 +01:00
dgelessus ba284d1800 Fix a bunch of flake8 violations 2019-12-30 03:00:12 +01:00
dgelessus f690caac24 Add flake8 configuration 2019-12-30 02:57:31 +01:00
dgelessus 3a805c3e56 Add GitHub Actions workflow for CI 2019-12-30 01:59:05 +01:00
dgelessus 6adf8eb88d Fix mypy errors about byte strings as format string parameters 2019-12-30 01:48:33 +01:00
dgelessus e132a91dea Fix missing sys.exit calls in CLI subcommand functions 2019-12-30 01:48:33 +01:00
dgelessus 4e1cd05412 Fix miscellaneous mypy errors 2019-12-30 01:48:33 +01:00
dgelessus 1a416defed Add tox configuration 2019-12-30 01:48:33 +01:00
dgelessus 1089a19c01 Add basic unit tests 2019-12-29 00:39:40 +01:00
dgelessus 8fc24040ea Add resource-info subcommand 2019-12-26 01:58:23 +01:00
dgelessus d492d9a6a8 Remove an incorrect assertion from describe_resource
red.compressed_info can be None here if decompress is False.
2019-12-26 01:50:34 +01:00
dgelessus d0e1eaf262 Add raw-compress-info subcommand (#6) 2019-12-26 00:34:27 +01:00
dgelessus 1e55569442 Add support for passing filters to the list subcommand 2019-12-25 01:47:03 +01:00
dgelessus 2abf6e2a06 Add class for resource filters in place of lambdas
This is easier to debug (printing out a lambda doesn't show what values
it checks against) and makes it easier to check that the filter values
are valid.
2019-12-25 00:15:35 +01:00
dgelessus 2b0bbb19ed Refactor filter_resources in __main__
With the new implementation, each filter is converted to a function,
then all resources are checked if they match any of the filter
functions. This is simpler than the old implementation, where the
resource lookup code was slightly different for some filter forms.
2019-12-25 00:15:35 +01:00
dgelessus c009e8f80f Support passing an empty filter list to filter_resources 2019-12-25 00:15:35 +01:00
dgelessus d67641d537 Remove compatibility code for old CLI syntax 2019-12-25 00:15:30 +01:00
dgelessus d6dbfdb149 Fix version number in changelog 2019-12-17 12:17:31 +01:00
dgelessus b2502c48a2 Bump version to 1.7.1.dev 2019-12-17 12:16:39 +01:00
dgelessus 158ca4884b Release version 1.7.0 2019-12-17 11:28:26 +01:00
dgelessus 8568f355c4 Remove incorrect outdated paragraph from list subcommand help 2019-12-10 16:15:18 +01:00
dgelessus 97d2dbe1b3 Change formatting of command help strings in source code
The automatic textwrap.dedent makes it impossible to cleanly extract
parts of the help strings into separate constants.
2019-12-10 15:58:20 +01:00
dgelessus a4b6328782 Fix 'dcmp' (0) jump table decompression for large segment numbers 2019-12-04 23:36:57 +01:00
dgelessus 393160b5da Add raw-decompress subcommand (#6) 2019-12-04 23:36:56 +01:00
dgelessus 476eaecd17 Fix typo in the help text for rsrcfork read 2019-12-04 21:16:29 +01:00
dgelessus 546edbc31a Update and improve resource and resource map reprs 2019-12-04 02:01:40 +01:00
dgelessus cf6ce3c2a6 Move _LazyResourceMap out of ResourceFile 2019-12-04 02:01:40 +01:00
dgelessus af2ac70676 Simplify ResourceFile._references and ._LazyResourceMap
The _references map now stores Resource objects directly, instead of
constructing them only when they are looked up. Resource objects are
now lazy themselves, so the previous lazy resource creation mechanism
is redundant.

_LazyResourceMap is now a simple read-only wrapper around an existing
map. The custom class is now only used to provide a specialized repr.
2019-12-04 02:01:40 +01:00
dgelessus 5af455992b Refactor resource reading internals
The reading of resource name and data is now performed in the Resource
class (lazily, when the respective attributes are accessed) instead of
in ResourceFile._LazyResourceMap.
2019-12-04 02:01:40 +01:00
dgelessus 2193c81518 Bump version to 1.6.1.dev 2019-12-04 01:45:15 +01:00
dgelessus 7dc0d980a3 Release version 1.6.0 2019-12-04 01:35:57 +01:00
dgelessus 2ce1d6b63a Move resource file format reference links to mac_file_format_docs repo
eebce6e7cc
2019-10-30 23:51:12 +01:00
dgelessus ec5eb3bcc1 Don't display header data and attributes in list output
This is redundant now that there is a dedicated info subcommand.
2019-10-22 13:26:03 +02:00
dgelessus 25bec2f93a Add info subcommand to display technical info/stats about resource file 2019-10-22 13:18:25 +02:00
dgelessus 6fbb919285 Display warnings when the old CLI syntax is used 2019-10-22 10:46:25 +02:00
dgelessus 3be4d9c969 Add a new subcommand-based CLI syntax
The new syntax supports the same operations as the old syntax, but is
clearer to understand and more extensible in the future. The old syntax
is still supported for now.
2019-10-22 10:25:22 +02:00
dgelessus f537fb3d37 Bump version to 1.5.1.dev 2019-10-22 10:23:33 +02:00
dgelessus d342614f55 Release version 1.5.0 2019-10-22 10:17:07 +02:00
dgelessus a5fb30e194 Fix broken handling of - (stdin) file name on command line 2019-10-16 23:29:20 +02:00
dgelessus f3b3de496e Change naming of compression types
The old names ("system" and "application" compression) were not really
accurate in all cases, so the compression types are now referred to by
their number.
2019-10-07 10:08:32 +02:00
dgelessus a71274d554 Document stream-based decompression in changelog 2019-10-02 16:36:54 +02:00
dgelessus 6d69d0097d Update rsrcfork.compress.__all__ 2019-10-02 16:29:32 +02:00
dgelessus 8db1b22bdc Make the generic decompression API stream-based
The non-stream-based APIs still exist as before and are not deprecated,
they just act as thin wrappers around the stream-based API.

The main rsrcfork module doesn't use the stream-based APIs yet, because
it reads each resource's data all at once and not incrementally.
2019-10-02 16:28:40 +02:00
dgelessus 6559cbc337 Refactor .dcmp2 to be stream-based
This is a little more complex than with the other decompressors,
because .dcmp2 has to behave differently when at the byte before EOF.
Checking whether this is the case requires lookahead, which is not easy
to do with a plain IO stream.

Some buffered IO streams provide a peek method for lookahead, but
others don't (such as io.BytesIO). There is no standard way to wrap an
already buffered IO stream to add a peek method, so we need a custom
wrapper class and helper function for this purpose.
2019-10-02 10:26:03 +02:00
dgelessus 1e79dc3c50 Refactor .dcmp0 and .dcmp1 to be stream-based
The decompression code is more readable this way, because the
compressed data needs to be processed sequentially. It also allows
moving the length check and some debug logging into an outer generator.

This also allows incremental decompression, but this doesn't have any
practical advantage, because the compressed resource data is all read
at once (there is no API for opening resources as streams), and
resources are not very large anyway.
2019-10-01 21:26:41 +02:00
dgelessus db48212ade Fix a typo in a .compress.dcmp0 debug message 2019-10-01 21:26:41 +02:00
dgelessus 3a72bd3406 Remove leading underscores where they don't make much sense
The leading underscore is meant to distinguish private (for internal
use only) APIs from public (for external use) APIs. One can argue about
where the line between public and private should be, but if something
is used from other modules (as with read_variable_length_integer) it's
not really private IMHO.

In scripts (like __main__) it also doesn't make much sense to use
leading underscores, because the entire file is never meant to be used
by external code.
2019-10-01 21:26:41 +02:00
dgelessus cb868b8005 Bump version to 1.4.1.dev 2019-09-29 19:27:43 +02:00
dgelessus 2f2472cfe9 Release version 1.4.0 2019-09-29 19:20:37 +02:00
dgelessus e0f73d3220 Fix more issues reported by mypy 2019-09-29 16:28:07 +02:00
dgelessus b77c85c295 Add mypy configuration section to setup.cfg 2019-09-29 16:27:37 +02:00
dgelessus e5875ffe67 Fix various issues reported by mypy 2019-09-29 16:14:55 +02:00
dgelessus 449bf4dd71 Use parameterized typing.Mapping in ResourceFile definition
Previously the un-parameterized collections.abc.Mapping was used, which
makes type checking less accurate, as the exact key/value types are not
known.
2019-09-29 15:42:19 +02:00
dgelessus 0ac6e8a3c4 Fix misplaced parens in dcmp modules 2019-09-29 15:33:14 +02:00
dgelessus 29ddd21740 Add missing type annotations on some methods 2019-09-29 15:32:18 +02:00
dgelessus add22b704a Fix ResourceFile.__enter__ not returning anything 2019-09-29 15:09:41 +02:00
dgelessus fdd04c944b Remove __slots__ declaration from Resource class
It doesn't seem to have any noticeable performance benefit.
2019-09-29 15:00:45 +02:00
dgelessus 97c459bca7 Change attribute type annotations to standard format
Previously, the types of instance attributes were annotated with the
first assignment of each attribute. The standard way to annotate
instance attributes is to do so at class level without assigning any
value.
2019-09-29 14:58:18 +02:00
dgelessus 9ef084de58 Remove uses of the typing.io pseudo-module
According to https://bugs.python.org/issue35089, typing.io should not
be used anymore, and the types that it contains should be accessed
through the main typing module instead.
2019-09-28 01:40:34 +02:00
dgelessus 6d03954784 Document setup.cfg options.packages fixes in changelog 2019-09-25 02:32:32 +02:00
dgelessus 343259049c Fix setup.cfg options.packages not including subpackages
This caused normal installs (i. e. without --editable) of this library
to not include the rsrcfork.compress subpackage, and made everything
unusable as a result. Oops.
2019-09-25 01:51:23 +02:00
dgelessus e75e88018e Add lots of additional Inside Macintosh-related links/info to README 2019-09-25 00:32:18 +02:00
dgelessus 0f72e8eb1f Document decompression improvements in changelog 2019-09-24 00:46:35 +02:00
dgelessus 84f09d0b83 Display 'dcmp' IDs in command line listings of compressed resources 2019-09-24 00:27:54 +02:00
dgelessus c108af60ca Add length and length_raw attributes to Resource (closes #3)
For compressed resources, the value of the length attribute can be
accessed much more quickly than the data itself (because it only
requires parsing the header, rather than decompressing the entire
data). This is used to speed up listing of compressed resources on the
command line.

The length_raw attribute is added for symmetry, although it is not
specifically optimized in any case yet.
2019-09-24 00:13:23 +02:00
dgelessus 0c942e26ec Fix hex number formatting in compressed header info reprs 2019-09-23 23:52:06 +02:00
dgelessus 868a322b8e Add Resource.compressed_info attribute
This allows accessing a compressed resource's header data, without
having to decompress it or parse the compressed data manually.
2019-09-23 23:50:29 +02:00
dgelessus a23cd0fcb2 Simplify decompressor lookup
All decompressors now have exactly the same signature (as a result,
each decompressor now has to check itself that the header type is
correct). This allows the decompressors to be stored in a simple
dictionary, which makes the lookup process much simpler.
2019-09-23 23:32:38 +02:00
dgelessus 53e73be980 Pass complete header info to individual decompressors 2019-09-23 23:19:20 +02:00
dgelessus 9dbdf5b827 Move compressed header info constants/classes to .compress.common
This allows the constants/classes to be accessed from the individual
decompressor submodules.
2019-09-23 23:14:06 +02:00
dgelessus 87d4ae43d4 Refactor parsing of compressed resource headers
In preparation for #3, the compressed resource data headers are parsed
and stored as proper objects. For now these objects are only used
internally by the decompression code, but in the future they can be
exposed.
2019-09-23 23:10:55 +02:00
dgelessus 716ac30a53 Add release instructions in a comment in __init__.py 2019-09-16 17:09:47 +02:00
dgelessus 20991154d3 Bump version to 1.3.1.dev 2019-09-16 16:46:17 +02:00
35 changed files with 2409 additions and 1117 deletions


@ -8,3 +8,7 @@ insert_final_newline = true
[*.rst]
indent_style = space
indent_size = 4
[*.yml]
indent_style = space
indent_size = 2

17
.github/workflows/ci.yml vendored Normal file

@ -0,0 +1,17 @@
on: [pull_request, push]
jobs:
  test:
    strategy:
      matrix:
        platform: [macos-latest, ubuntu-20.04, windows-latest]
    runs-on: ${{ matrix.platform }}
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.6"
      - uses: actions/setup-python@v4
        with:
          python-version: "3.11"
      - run: python -m pip install --upgrade tox
      - run: tox

6
.gitignore vendored

@ -2,7 +2,13 @@
*.py[co]
__pycache__/
# tox
.tox/
# setuptools
*.egg-info/
build/
dist/
# mypy
.mypy_cache/

6
MANIFEST.in Normal file

@ -0,0 +1,6 @@
# Note: See the PyPA documentation for a list of file names that are included/excluded by default:
# https://packaging.python.org/guides/using-manifest-in/#how-files-are-included-in-an-sdist
# Please only add entries here for files that are *not* already handled by default.
recursive-include tests *.py
recursive-include tests/data *.rsrc

326
README.md Normal file

@ -0,0 +1,326 @@
# `rsrcfork`
A pure Python, cross-platform library/tool for reading Macintosh resource data,
as stored in resource forks and `.rsrc` files.
Resource forks were an important part of the Classic Mac OS,
where they provided a standard way to store structured file data, metadata and application resources.
This usage continued into Mac OS X (now called macOS) for backward compatibility,
but over time resource forks became less commonly used in favor of simple data fork-only formats, application bundles, and extended attributes.
As of OS X 10.8 and the deprecation of the Carbon API,
macOS no longer provides any officially supported APIs for using and manipulating resource data.
Despite this, parts of macOS still support and use resource forks,
for example to store custom file and folder icons set by the user.
## Features
* Pure Python, cross-platform - no native Mac APIs are used.
* Provides both a Python API and a command-line tool.
* Resource data can be read from either the resource fork or the data fork.
    * On Mac systems, the correct fork is selected automatically when reading a file.
      This allows reading both regular resource forks and resource data stored in data forks (as with `.rsrc` and similar files).
    * On non-Mac systems, resource forks are not available, so the data fork is always used.
* Compressed resources (supported by System 7 through Mac OS 9) are automatically decompressed.
    * Only the standard System 7.0 resource compression methods are supported.
      Resources that use non-standard decompressors cannot be decompressed.
* Object `repr`s are REPL-friendly:
  all relevant information is displayed,
  and long data is truncated to avoid filling up the screen by accident.
## Requirements
Python 3.6 or later.
No other libraries are required.
## Installation
`rsrcfork` is available [on PyPI](https://pypi.org/project/rsrcfork/) and can be installed using `pip`:
```sh
$ python3 -m pip install rsrcfork
```
Alternatively you can download the source code manually,
and run this command in the source code directory to install it:
```sh
$ python3 -m pip install .
```
## Examples
### Simple example
```python-repl
>>> import rsrcfork
>>> rf = rsrcfork.open("/Users/Shared/Test.textClipping")
>>> rf
<rsrcfork.ResourceFile at 0x1046e6048, attributes ResourceFileAttrs.0, containing 4 resource types: [b'utxt', b'utf8', b'TEXT', b'drag']>
>>> rf[b"TEXT"]
<Resource map for type b'TEXT', containing one resource: <Resource type b'TEXT', id 256, name None, attributes ResourceAttrs.0, data b'Here is some text'>>
```
### Automatic selection of data/resource fork
```python-repl
>>> import rsrcfork
>>> datarf = rsrcfork.open("/System/Library/Fonts/Monaco.dfont") # Resources in data fork
>>> datarf._stream
<_io.BufferedReader name='/System/Library/Fonts/Monaco.dfont'>
>>> resourcerf = rsrcfork.open("/Users/Shared/Test.textClipping") # Resources in resource fork
>>> resourcerf._stream
<_io.BufferedReader name='/Users/Shared/Test.textClipping/..namedfork/rsrc'>
```
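
The `..namedfork/rsrc` suffix in the second stream name is the special path through which macOS exposes a file's resource fork. The sketch below shows one way such a choice between forks could be made. `resource_fork_path` and `pick_fork_path` are hypothetical helpers written for this illustration, not part of the `rsrcfork` API, and the rule "use the resource fork if it exists and is non-empty" is an assumption rather than a description of the library's actual implementation.

```python
import os


def resource_fork_path(path):
    """Return the macOS special path that refers to a file's resource fork."""
    return os.path.join(path, "..namedfork", "rsrc")


def pick_fork_path(path):
    """Pick the resource fork if it exists and is non-empty, else the data fork.

    Hypothetical helper: the selection rule is an assumption for illustration.
    """
    rsrc_path = resource_fork_path(path)
    try:
        if os.path.getsize(rsrc_path) > 0:
            return rsrc_path
    except OSError:
        # Systems that cannot resolve the special path end up here.
        pass
    return path
```

On a non-Mac system the special path cannot be resolved, so this sketch always falls back to the plain file path, which matches the behavior described in the feature list.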
### Command-line interface
```sh
$ rsrcfork list /Users/Shared/Test.textClipping
4 resource types:
'TEXT': 1 resources:
(256): 17 bytes
'drag': 1 resources:
(128): 64 bytes
'utf8': 1 resources:
(256): 17 bytes
'utxt': 1 resources:
(256): 34 bytes
$ rsrcfork read /Users/Shared/Test.textClipping "'TEXT' (256)"
Resource 'TEXT' (256): 17 bytes:
00000000 48 65 72 65 20 69 73 20 73 6f 6d 65 20 74 65 78 |Here is some tex|
00000010 74 |t|
00000011
```
## Limitations
This library only understands the resource file's general structure,
i. e. the type codes, IDs, attributes, and data of the resources stored in the file.
The data of individual resources is provided in raw bytes form and is not processed further -
the format of this data is specific to each resource type.
Definitions of common resource types can be found inside Carbon and related frameworks in Apple's macOS SDKs as `.r` files,
a format roughly similar to C struct definitions,
which is used by the `Rez` and `DeRez` command-line tools to de/compile resource data.
There doesn't seem to be an exact specification of this format,
and most documentation on it is only available inside old manuals for MPW (Macintosh Programmer's Workshop) or similar development tools for old Mac systems.
Some macOS text editors, such as BBEdit/TextWrangler and TextMate, support syntax highlighting for `.r` files.
Writing resource data is not supported at all.
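
To give a rough idea of the "general structure" mentioned above: as documented in Inside Macintosh, a resource file begins with a 16-byte header of four big-endian 32-bit integers (offset of the resource data, offset of the resource map, length of the data, length of the map). The following is a minimal sketch of unpacking that header with the standard `struct` module. The function and key names are hypothetical and chosen for this example; they are not taken from `rsrcfork`.

```python
import struct

# Four big-endian unsigned 32-bit integers, 16 bytes in total.
HEADER_STRUCT = struct.Struct(">IIII")


def parse_resource_header(header_bytes):
    """Unpack the four fields at the start of a resource file.

    Hypothetical helper for illustration, not part of the rsrcfork API.
    """
    data_offset, map_offset, data_length, map_length = HEADER_STRUCT.unpack(
        header_bytes[:HEADER_STRUCT.size]
    )
    return {
        "data_offset": data_offset,
        "map_offset": map_offset,
        "data_length": data_length,
        "map_length": map_length,
    }
```

Everything beyond this level (the meaning of the bytes inside each resource) depends on the resource type and is outside the scope of the sketch, just as it is outside the scope of the library.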
## Further info on resource files
For technical info and documentation about resource files and resources,
see the ["resource forks" section of the mac_file_format_docs repo](https://github.com/dgelessus/mac_file_format_docs/blob/master/README.md#resource-forks).
## Changelog
### Version 1.8.1 (next version)
* Added `open` and `open_raw` methods to `Resource` objects,
for stream-based access to resource data.
* Fixed reading of compressed resource headers with the header length field incorrectly set to 0
(because real Mac OS apparently accepts this).
### Version 1.8.0
* Removed the old (non-subcommand-based) CLI syntax.
* Added filtering support to the `list` subcommand.
* Added a `resource-info` subcommand to display technical information about resources
(more detailed than what is displayed by `list` and `read`).
* Added a `raw-compress-info` subcommand to display technical header information about standalone compressed resource data.
* Made the library PEP 561-compliant by adding a py.typed file.
* Fixed an incorrect `AssertionError` when using the `--no-decompress` command-line option.
### Version 1.7.0
* Added a `raw-decompress` subcommand to decompress compressed resource data stored in a standalone file rather than as a resource.
* Optimized lazy loading of `Resource` objects.
Previously, resource data would be read from disk whenever a `Resource` object was looked up,
even if the data itself is never used.
Now the resource data is only loaded once the `data` (or `data_raw`) attribute is accessed.
* The same optimization applies to the `name` attribute,
although this is unlikely to make a difference in practice.
* As a result, it is no longer possible to construct `Resource` objects without a resource file.
This was previously possible, but had no practical use.
* Fixed a small error in the `'dcmp' (0)` decompression implementation.
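
The lazy loading described for version 1.7.0 can be pictured with a small sketch. `LazyResource` below is a hypothetical stand-in, not the real `Resource` class: it only remembers where its data lives in the file stream and reads it on first access to `data`, which is also why such an object cannot exist without an underlying file.

```python
import io


class LazyResource:
    """Hypothetical stand-in for a resource whose data is read on first access."""

    def __init__(self, stream, offset, length):
        self._stream = stream
        self._offset = offset
        self._length = length
        self._data = None  # Nothing has been read from the stream yet.

    @property
    def data(self):
        # Read from the underlying stream only once, then reuse the cached bytes.
        if self._data is None:
            self._stream.seek(self._offset)
            self._data = self._stream.read(self._length)
        return self._data


stream = io.BytesIO(b"....Here is some text....")
res = LazyResource(stream, offset=4, length=17)
```

Constructing `res` performs no read at all; the seek and read happen the first time `res.data` is evaluated.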
### Version 1.6.0
* Added a new subcommand-based command-line syntax to the `rsrcfork` tool,
similar to other CLI tools such as `git` or `diskutil`.
* This subcommand-based syntax is meant to replace the old CLI options,
as the subcommand structure is easier to understand and more extensible in the future.
* Currently there are three subcommands:
`list` to list resources in a file,
`read` to read/display resource data,
and `read-header` to read a resource file's header data.
These subcommands can be used to perform all operations that were also available with the old CLI syntax.
* The old CLI syntax is still supported for now,
but it will be removed soon.
* The new syntax no longer supports reading CLI arguments from a file (using `@args_file.txt`),
abbreviating long options (e. g. `--no-d` instead of `--no-decompress`),
or the short option `-f` instead of `--fork`.
If you have a need for any of these features,
please open an issue.
### Version 1.5.0
* Added stream-based decompression methods to the `rsrcfork.compress` module.
* The internal decompressor implementations have been refactored to use streams.
* This allows for incremental decompression of compressed resource data.
In practice this has no noticeable effect yet,
because the main `rsrcfork` API doesn't support incremental reading of resource data.
* Fixed the command line tool always displaying an incorrect error "Cannot specify an explicit fork when reading from stdin" when using `-` (stdin) as the input file.
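
As a rough illustration of what "stream-based" means here, the sketch below decodes a toy run-length format incrementally with a generator and offers a bytes-in, bytes-out wrapper on top of it. The toy format is an assumption made purely for this example and is unrelated to the real `'dcmp'` formats; `decompress_stream` and `decompress` are hypothetical names, not the functions in `rsrcfork.compress`.

```python
import io


def decompress_stream(stream):
    """Yield decompressed chunks from a toy run-length format.

    Each record is two bytes: a repeat count followed by the byte to repeat.
    This toy format is NOT one of the real 'dcmp' compression formats.
    """
    while True:
        record = stream.read(2)
        if not record:
            return  # Clean end of input.
        if len(record) != 2:
            raise ValueError("Truncated record at end of compressed data")
        count, value = record
        yield bytes([value]) * count


def decompress(data):
    """Non-stream convenience wrapper around the stream-based decoder."""
    return b"".join(decompress_stream(io.BytesIO(data)))
```

A caller that only needs the beginning of the output can stop iterating `decompress_stream` early, whereas `decompress` always processes the whole input.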
### Version 1.4.0
* Added `length` and `length_raw` attributes to `Resource`.
These attributes are equivalent to the `len` of `data` and `data_raw` respectively,
but may be faster to access.
* Currently, the only optimized case is `length` for compressed resources,
but more optimizations may be added in the future.
* Added a `compressed_info` attribute to `Resource` that provides access to the header information of compressed resources.
* Improved handling of compressed resources when listing resource files with the command line tool.
* Metadata of compressed resources is now displayed even if no decompressor implementation is available
(as long as the compressed data header can be parsed).
* Performance has been improved:
the data no longer needs to be fully decompressed to get its length,
because this information is now read from the header.
* The `'dcmp'` ID used to decompress each resource is displayed.
* Fixed an incorrect `options.packages` in `setup.cfg`,
which made the library unusable except when installing from source using `--editable`.
* Fixed `ResourceFile.__enter__` returning `None`,
which made it impossible to use `ResourceFile` properly in a `with` statement.
* Fixed various minor errors reported by type checking with `mypy`.
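
The `__enter__` fix matters because a `with` statement binds whatever `__enter__` returns to the `as` target. The sketch below uses two hypothetical stand-in classes (`BrokenFile` and `WorkingFile`, not part of the library) to show the difference.

```python
class BrokenFile:
    """Hypothetical stand-in whose __enter__ forgets to return self."""

    def __enter__(self):
        pass  # Implicitly returns None.

    def __exit__(self, exc_type, exc_value, traceback):
        return None


class WorkingFile:
    """Hypothetical stand-in with a correct __enter__."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        return None


with BrokenFile() as broken:
    broken_target = broken  # None, so the object is unusable inside the block.

with WorkingFile() as working:
    working_target = working  # The object itself.

print(broken_target is None, isinstance(working_target, WorkingFile))  # True True
```
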
### Version 1.3.0.post1
* Fixed an incorrect `options.packages` in `setup.cfg`,
which made the library unusable except when installing from source using `--editable`.
### Version 1.2.0.post1
* Fixed an incorrect `options.packages` in `setup.cfg`,
which made the library unusable except when installing from source using `--editable`.
### Version 1.3.0
* Added a `--group` command line option to group resources in list format by type (the default), ID, or with no grouping.
* Added a `dump-text` output format to the command line tool.
This format is identical to `dump`,
but instead of a hex dump,
it outputs the resource data as text.
The data is decoded as MacRoman and classic Mac newlines (`\r`) are translated.
This is useful for examining resources that contain mostly plain text.
* Changed the command line tool to sort resources by type and ID,
and added a `--no-sort` option to disable sorting and output resources in file order
(which was the previous behavior).
* Renamed the `rsrcfork.Resource` attributes `resource_type` and `resource_id` to `type` and `id`, respectively.
The old names have been deprecated and will be removed in the future,
but are still supported for now.
* Changed `--format=dump` output to match `hexdump -C`'s format -
spacing has been adjusted,
and multiple subsequent identical lines are collapsed into a single `*`.
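
The line-collapsing behavior of the `hexdump -C` style output can be sketched as follows. This is a simplified illustration, not the tool's implementation: the function name `hexdump_lines` is hypothetical, it returns lines instead of printing them, and it renders only printable ASCII in the text column rather than decoding MacRoman.

```python
import typing


def hexdump_lines(data: bytes) -> typing.List[str]:
    """Return hexdump -C style lines, collapsing repeated 16-byte rows into "*"."""
    lines = []
    last_chunk = None
    asterisk_shown = False
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        if chunk == last_chunk:
            # Emit a single "*" for any run of identical rows.
            if not asterisk_shown:
                lines.append("*")
                asterisk_shown = True
        else:
            left = " ".join(f"{b:02x}" for b in chunk[:8])
            right = " ".join(f"{b:02x}" for b in chunk[8:])
            text = "".join(chr(b) if 0x20 <= b < 0x7f else "." for b in chunk)
            lines.append(f"{offset:08x}  {left:<23}  {right:<23}  |{text}|")
            asterisk_shown = False
        last_chunk = chunk
    if data:
        # The final line holds the total length, so collapsed rows can be counted.
        lines.append(f"{len(data):08x}")
    return lines


for line in hexdump_lines(bytes(48)):
    print(line)
```

For 48 zero bytes this yields three lines: the first row, a single `*` standing for the two repeated rows, and the end offset `00000030`.
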
### Version 1.2.0
* Added support for compressed resources.
* Compressed resource data is automatically decompressed,
both in the Python API and on the command line.
* This is technically a breaking change,
since in previous versions the compressed resource data was returned directly.
However, this change will not affect end users negatively,
unless one has already implemented custom handling for compressed resources.
* Currently, only the three standard System 7.0 compression formats (`'dcmp'` IDs 0, 1, 2) are supported.
Attempting to access a resource compressed in an unsupported format results in a `DecompressError`.
* To access the raw resource data as stored in the file,
without automatic decompression,
use the `res.data_raw` attribute (for the Python API),
or the `--no-decompress` option (for the command-line interface).
This can be used to read the resource data in its compressed form,
even if the compression format is not supported.
* Improved automatic data/resource fork selection for files whose resource fork contains invalid data.
* This fixes reading certain system files with resource data in their data fork
(such as HIToolbox.rsrc in HIToolbox.framework, or .dfont fonts)
on recent macOS versions (at least macOS 10.14, possibly earlier).
Although these files have no resource fork,
recent macOS versions will successfully open the resource fork and return garbage data for it.
This behavior is now detected and handled by using the data fork instead.
* Replaced the `rsrcfork` parameter of `rsrcfork.open`/`ResourceFork.open` with a new `fork` parameter.
`fork` accepts string values (like the command line `--fork` option) rather than `rsrcfork`'s hard to understand `None`/`True`/`False`.
* The old `rsrcfork` parameter has been deprecated and will be removed in the future,
but for now it still works as before.
* Added an explanatory message when a resource filter on the command line doesn't match any resources in the resource file.
Previously there would either be no output or a confusing error,
depending on the selected `--format`.
* Changed resource type codes and names to be displayed in MacRoman instead of escaping all non-ASCII characters.
* Cleaned up the resource descriptions in listings and dumps to improve readability.
Previously they included some redundant or unnecessary information -
for example, each resource with no attributes set would be explicitly marked as "no attributes".
* Unified the formats of resource descriptions in listings and dumps,
which were previously slightly different from each other.
* Improved error messages when attempting to read multiple resources using `--format=hex` or `--format=raw`.
* Fixed reading from non-seekable streams not working for some resource files.
* Removed the `allow_seek` parameter of `ResourceFork.__init__` and the `--read-mode` command line option.
They are no longer necessary,
and were already practically useless before due to non-seekable stream reading being broken.
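
The change to MacRoman display of type codes and names can be illustrated with a small escaping helper. This is a sketch and not the library's API: `escape_bytes` is a hypothetical name, and the rule used here (keep printable characters and the Apple logo codepoint U+F8FF, hex-escape everything else) is an assumption about the intended behavior.

```python
def escape_bytes(bs: bytes, *, quote: str = "") -> str:
    """Render bytes as MacRoman text, hex-escaping non-printable characters."""
    out = []
    # MacRoman is a single-byte encoding, so bytes and characters pair up 1:1.
    for byte, char in zip(bs, bs.decode("mac_roman")):
        if char == "\\" or (quote and char == quote):
            # Escape backslashes and the surrounding quote character.
            out.append("\\" + char)
        elif char.isprintable() or char == "\uf8ff":
            out.append(char)
        else:
            out.append(f"\\x{byte:02x}")
    return "".join(out)


print(escape_bytes(b"TEXT"))
print(escape_bytes(b"\x00ab"))
print(escape_bytes(b'say "hi"', quote='"'))
```

Non-ASCII bytes such as `0x8a` come out as the corresponding MacRoman character instead of a `\x8a` escape, while control bytes such as `0x00` are still escaped.
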
### Version 1.1.3.post1
* Fixed a formatting error in the README.rst to allow upload to PyPI.
### Version 1.1.3
**Note: This version is not available on PyPI, see version 1.1.3.post1 changelog for details.**
* Added a setuptools entry point for the command-line interface.
This allows calling it using just `rsrcfork` instead of `python3 -m rsrcfork`.
* Changed the default value of `ResourceFork.__init__`'s `close` keyword argument from `True` to `False`.
This matches the behavior of classes like `zipfile.ZipFile` and `tarfile.TarFile`.
* Fixed `ResourceFork.open` and `ResourceFork.__init__` not closing their streams in some cases.
* Refactored the single `rsrcfork.py` file into a package.
This is an internal change and should have no effect on how the `rsrcfork` module is used.
### Version 1.1.2
* Added support for the resource file attributes "Resources Locked" and "Printer Driver MultiFinder Compatible" from ResEdit.
* Added more dummy constants for resource attributes with unknown meaning,
so that resource files containing such attributes can be loaded without errors.
### Version 1.1.1
* Fixed overflow issue with empty resource files or empty resource type entries
* Changed `_hexdump` to behave more like `hexdump -C`
### Version 1.1.0
* Added a command-line interface - run `python3 -m rsrcfork --help` for more info
### Version 1.0.0
* Initial version


@ -1,200 +0,0 @@
``rsrcfork``
============
A pure Python, cross-platform library/tool for reading Macintosh resource data, as stored in resource forks and ``.rsrc`` files.
Resource forks were an important part of the Classic Mac OS, where they provided a standard way to store structured file data, metadata and application resources. This usage continued into Mac OS X (now called macOS) for backward compatibility, but over time resource forks became less commonly used in favor of simple data fork-only formats, application bundles, and extended attributes.
As of OS X 10.8 and the deprecation of the Carbon API, macOS no longer provides any officially supported APIs for using and manipulating resource data. Despite this, parts of macOS still support and use resource forks, for example to store custom file and folder icons set by the user.
Features
--------
* Pure Python, cross-platform - no native Mac APIs are used.
* Provides both a Python API and a command-line tool.
* Resource data can be read from either the resource fork or the data fork.
* On Mac systems, the correct fork is selected automatically when reading a file. This allows reading both regular resource forks and resource data stored in data forks (as with ``.rsrc`` and similar files).
* On non-Mac systems, resource forks are not available, so the data fork is always used.
* Compressed resources (supported by System 7 through Mac OS 9) are automatically decompressed.
* Only the standard System 7.0 resource compression methods are supported. Resources that use non-standard decompressors cannot be decompressed.
* Object ``repr``\s are REPL-friendly: all relevant information is displayed, and long data is truncated to avoid filling up the screen by accident.
Requirements
------------
Python 3.6 or later. No other libraries are required.
Installation
------------
``rsrcfork`` is available `on PyPI <https://pypi.org/project/rsrcfork/>`_ and can be installed using ``pip``:
.. code-block:: sh

    python3 -m pip install rsrcfork
Alternatively you can download the source code manually, and run this command in the source code directory to install it:
.. code-block:: sh

    python3 -m pip install .
Examples
--------
Simple example
^^^^^^^^^^^^^^
.. code-block:: python

    >>> import rsrcfork
    >>> rf = rsrcfork.open("/Users/Shared/Test.textClipping")
    >>> rf
    <rsrcfork.ResourceFile at 0x1046e6048, attributes ResourceFileAttrs.0, containing 4 resource types: [b'utxt', b'utf8', b'TEXT', b'drag']>
    >>> rf[b"TEXT"]
    <rsrcfork.ResourceFile._LazyResourceMap at 0x10470ed30 containing one resource: rsrcfork.Resource(type=b'TEXT', id=256, name=None, attributes=ResourceAttrs.0, data=b'Here is some text')>
Automatic selection of data/resource fork
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: python

    >>> import rsrcfork
    >>> datarf = rsrcfork.open("/System/Library/Fonts/Monaco.dfont") # Resources in data fork
    >>> datarf._stream
    <_io.BufferedReader name='/System/Library/Fonts/Monaco.dfont'>
    >>> resourcerf = rsrcfork.open("/Users/Shared/Test.textClipping") # Resources in resource fork
    >>> resourcerf._stream
    <_io.BufferedReader name='/Users/Shared/Test.textClipping/..namedfork/rsrc'>
Command-line interface
^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: sh
$ python3 -m rsrcfork /Users/Shared/Test.textClipping
4 resource types:
'utxt': 1 resources:
(256): 34 bytes
'utf8': 1 resources:
(256): 17 bytes
'TEXT': 1 resources:
(256): 17 bytes
'drag': 1 resources:
(128): 64 bytes
$ python3 -m rsrcfork /Users/Shared/Test.textClipping "'TEXT' (256)"
Resource 'TEXT' (256): 17 bytes:
00000000 48 65 72 65 20 69 73 20 73 6f 6d 65 20 74 65 78 |Here is some tex|
00000010 74 |t|
00000011
Limitations
-----------
This library only understands the resource file's general structure, i. e. the type codes, IDs, attributes, and data of the resources stored in the file. The data of individual resources is provided in raw bytes form and is not processed further - the format of this data is specific to each resource type.
Definitions of common resource types can be found inside Carbon and related frameworks in Apple's macOS SDKs as ``.r`` files, a format roughly similar to C struct definitions, which is used by the ``Rez`` and ``DeRez`` command-line tools to de/compile resource data. There doesn't seem to be an exact specification of this format, and most documentation on it is only available inside old manuals for MPW (Macintosh Programmer's Workshop) or similar development tools for old Mac systems. Some macOS text editors, such as BBEdit/TextWrangler and TextMate support syntax highlighting for ``.r`` files.
Writing resource data is not supported at all.
Further info on resource files
------------------------------
Sources of information about the resource fork data format, and the structure of common resource types:
* Inside Macintosh, Volume I, Chapter 5 "The Resource Manager". This book can probably be obtained in physical form somewhere, but the relevant chapter/book is also available in a few places online:
* `Apple's legacy documentation <https://developer.apple.com/legacy/library/documentation/mac/pdf/MoreMacintoshToolbox.pdf>`_
* pagetable.com, a site that happened to have a copy of the book: `info blog post <http://www.pagetable.com/?p=50>`_, `direct download <http://www.weihenstephan.org/~michaste/pagetable/mac/Inside_Macintosh.pdf>`_
* `Wikipedia <https://en.wikipedia.org/wiki/Resource_fork>`_, of course
* The `Resource Fork <http://fileformats.archiveteam.org/wiki/Resource_Fork>`_ article on "Just Solve the File Format Problem" (despite the title, this is a decent site and not clickbait)
* The `KSFL <https://github.com/kreativekorp/ksfl>`_ library (and `its wiki <https://github.com/kreativekorp/ksfl/wiki/Macintosh-Resource-File-Format>`_), written in Java, which supports reading and writing resource files
* Alysis Software Corporation's article on resource compression (found on `the company's website <http://www.alysis.us/arctechnology.htm>`_ and in `MacTech Magazine's online archive <http://preserve.mactech.com/articles/mactech/Vol.09/09.01/ResCompression/index.html>`_) has some information on the structure of certain kinds of compressed resources.
* Apple's macOS SDK, which is distributed with Xcode. The latest version of Xcode is available for free from the Mac App Store. Current and previous versions can be downloaded from `the Apple Developer download page <https://developer.apple.com/download/more/>`_. Accessing these downloads requires an Apple ID with (at least) a free developer program membership.
* Apple's MPW (Macintosh Programmer's Workshop) and related developer tools. These were previously available from Apple's FTP server at ftp://ftp.apple.com/, which is no longer functional. Because of this, these downloads are only available on mirror sites, such as http://staticky.com/mirrors/ftp.apple.com/.
If these links are no longer functional, some are archived in the `Internet Archive Wayback Machine <https://archive.org/web/>`_ or `archive.is <http://archive.is/>`_ aka `archive.fo <https://archive.fo/>`_.
Changelog
---------
Version 1.3.0
^^^^^^^^^^^^^
* Added a ``--group`` command line option to group resources in list format by type (the default), ID, or with no grouping.
* Added a ``dump-text`` output format to the command line tool. This format is identical to ``dump``, but instead of a hex dump, it outputs the resource data as text. The data is decoded as MacRoman and classic Mac newlines (``\r``) are translated. This is useful for examining resources that contain mostly plain text.
* Changed the command line tool to sort resources by type and ID, and added a ``--no-sort`` option to disable sorting and output resources in file order (which was the previous behavior).
* Renamed the ``rsrcfork.Resource`` attributes ``resource_type`` and ``resource_id`` to ``type`` and ``id``, respectively. The old names have been deprecated and will be removed in the future, but are still supported for now.
* Changed ``--format=dump`` output to match ``hexdump -C``'s format - spacing has been adjusted, and multiple subsequent identical lines are collapsed into a single ``*``.
Version 1.2.0
^^^^^^^^^^^^^
* Added support for compressed resources.
* Compressed resource data is automatically decompressed, both in the Python API and on the command line.
* This is technically a breaking change, since in previous versions the compressed resource data was returned directly. However, this change will not affect end users negatively, unless one has already implemented custom handling for compressed resources.
* Currently, only the three standard System 7.0 compression formats (``'dcmp'`` IDs 0, 1, 2) are supported. Attempting to access a resource compressed in an unsupported format results in a ``DecompressError``.
* To access the raw resource data as stored in the file, without automatic decompression, use the ``res.data_raw`` attribute (for the Python API), or the ``--no-decompress`` option (for the command-line interface). This can be used to read the resource data in its compressed form, even if the compression format is not supported.
* Improved automatic data/resource fork selection for files whose resource fork contains invalid data.
* This fixes reading certain system files with resource data in their data fork (such as HIToolbox.rsrc in HIToolbox.framework, or .dfont fonts) on recent macOS versions (at least macOS 10.14, possibly earlier). Although these files have no resource fork, recent macOS versions will successfully open the resource fork and return garbage data for it. This behavior is now detected and handled by using the data fork instead.
* Replaced the ``rsrcfork`` parameter of ``rsrcfork.open``/``ResourceFork.open`` with a new ``fork`` parameter. ``fork`` accepts string values (like the command line ``--fork`` option) rather than ``rsrcfork``'s hard to understand ``None``/``True``/``False``.
* The old ``rsrcfork`` parameter has been deprecated and will be removed in the future, but for now it still works as before.
* Added an explanatory message when a resource filter on the command line doesn't match any resources in the resource file. Previously there would either be no output or a confusing error, depending on the selected ``--format``.
* Changed resource type codes and names to be displayed in MacRoman instead of escaping all non-ASCII characters.
* Cleaned up the resource descriptions in listings and dumps to improve readability. Previously they included some redundant or unnecessary information - for example, each resource with no attributes set would be explicitly marked as "no attributes".
* Unified the formats of resource descriptions in listings and dumps, which were previously slightly different from each other.
* Improved error messages when attempting to read multiple resources using ``--format=hex`` or ``--format=raw``.
* Fixed reading from non-seekable streams not working for some resource files.
* Removed the ``allow_seek`` parameter of ``ResourceFork.__init__`` and the ``--read-mode`` command line option. They are no longer necessary, and were already practically useless before due to non-seekable stream reading being broken.
Version 1.1.3.post1
^^^^^^^^^^^^^^^^^^^
* Fixed a formatting error in the README.rst to allow upload to PyPI.
Version 1.1.3
^^^^^^^^^^^^^
**Note: This version is not available on PyPI, see version 1.1.3.post1 changelog for details.**
* Added a setuptools entry point for the command-line interface. This allows calling it using just ``rsrcfork`` instead of ``python3 -m rsrcfork``.
* Changed the default value of ``ResourceFork.__init__``'s ``close`` keyword argument from ``True`` to ``False``. This matches the behavior of classes like ``zipfile.ZipFile`` and ``tarfile.TarFile``.
* Fixed ``ResourceFork.open`` and ``ResourceFork.__init__`` not closing their streams in some cases.
* Refactored the single ``rsrcfork.py`` file into a package. This is an internal change and should have no effect on how the ``rsrcfork`` module is used.
Version 1.1.2
^^^^^^^^^^^^^
* Added support for the resource file attributes "Resources Locked" and "Printer Driver MultiFinder Compatible" from ResEdit.
* Added more dummy constants for resource attributes with unknown meaning, so that resource files containing such attributes can be loaded without errors.
Version 1.1.1
^^^^^^^^^^^^^
* Fixed overflow issue with empty resource files or empty resource type entries
* Changed ``_hexdump`` to behave more like ``hexdump -C``
Version 1.1.0
^^^^^^^^^^^^^
* Added a command-line interface - run ``python3 -m rsrcfork --help`` for more info
Version 1.0.0
^^^^^^^^^^^^^
* Initial version

pyproject.toml Normal file

@ -0,0 +1,6 @@
[build-system]
requires = [
"setuptools >= 46.4.0",
"wheel >= 0.32.0",
]
build-backend = "setuptools.build_meta"


@ -1,18 +0,0 @@
"""A pure Python, cross-platform library/tool for reading Macintosh resource data, as stored in resource forks and ``.rsrc`` files."""
__version__ = "1.3.0"
__all__ = [
"Resource",
"ResourceAttrs",
"ResourceFile",
"ResourceFileAttrs",
"compress",
"open",
]
from . import api, compress
from .api import Resource, ResourceAttrs, ResourceFile, ResourceFileAttrs
# noinspection PyShadowingBuiltins
open = ResourceFile.open


@ -1,456 +0,0 @@
import argparse
import collections
import enum
import itertools
import sys
import textwrap
import typing
from . import __version__, api, compress
# The encoding to use when rendering bytes as text (in four-char codes, strings, hex dumps, etc.) or reading a quoted byte string (from the command line).
_TEXT_ENCODING = "MacRoman"
# Translation table to replace ASCII non-printable characters with periods.
_TRANSLATE_NONPRINTABLES = {k: "." for k in [*range(0x20), 0x7f]}
_REZ_ATTR_NAMES = {
api.ResourceAttrs.resSysRef: None, # "Illegal or reserved attribute"
api.ResourceAttrs.resSysHeap: "sysheap",
api.ResourceAttrs.resPurgeable: "purgeable",
api.ResourceAttrs.resLocked: "locked",
api.ResourceAttrs.resProtected: "protected",
api.ResourceAttrs.resPreload: "preload",
api.ResourceAttrs.resChanged: None, # "Illegal or reserved attribute"
api.ResourceAttrs.resCompressed: None, # "Extended Header resource attribute"
}
F = typing.TypeVar("F", bound=enum.Flag, covariant=True)
def _decompose_flags(value: F) -> typing.Sequence[F]:
"""Decompose an enum.Flag instance into separate enum constants."""
return [bit for bit in type(value) if bit in value]
def _is_printable(char: str) -> bool:
"""Determine whether a character is printable for our purposes.
We mainly use Python's definition of printable (i. e. everything that Unicode does not consider a separator or "other" character). However, we also treat U+F8FF as printable, which is the private use codepoint used for the Apple logo character.
"""
return char.isprintable() or char == "\uf8ff"
def _bytes_unescape(string: str) -> bytes:
"""Convert a string containing text (in _TEXT_ENCODING) and hex escapes to a bytestring.
(We implement our own unescaping mechanism here to not depend on any of Python's string/bytes escape syntax.)
"""
out = []
it = iter(string)
for char in it:
if char == "\\":
try:
esc = next(it)
if esc in "\\\'\"":
out.append(esc)
elif esc == "x":
x1, x2 = next(it), next(it)
out.append(int(x1+x2, 16))
else:
raise ValueError(f"Unknown escape character: {esc}")
except StopIteration:
raise ValueError("End of string in escape sequence")
else:
out.extend(char.encode(_TEXT_ENCODING))
return bytes(out)
def _bytes_escape(bs: bytes, *, quote: typing.Optional[str]=None) -> str:
"""Convert a bytestring to a string (using _TEXT_ENCODING), with non-printable characters hex-escaped.
(We implement our own escaping mechanism here to not depend on Python's str or bytes repr.)
"""
out = []
for byte, char in zip(bs, bs.decode(_TEXT_ENCODING)):
if char in {quote, "\\"}:
out.append(f"\\{char}")
elif _is_printable(char):
out.append(char)
else:
out.append(f"\\x{byte:02x}")
return "".join(out)
def _filter_resources(rf: api.ResourceFile, filters: typing.Sequence[str]) -> typing.Sequence[api.Resource]:
matching = collections.OrderedDict()
for filter in filters:
if len(filter) == 4:
try:
resources = rf[filter.encode("ascii")]
except KeyError:
continue
for res in resources.values():
matching[res.type, res.id] = res
elif filter[0] == filter[-1] == "'":
try:
resources = rf[_bytes_unescape(filter[1:-1])]
except KeyError:
continue
for res in resources.values():
matching[res.type, res.id] = res
else:
pos = filter.find("'", 1)
if pos == -1:
raise ValueError(f"Invalid filter {filter!r}: Resource type must be single-quoted")
elif filter[pos + 1] != " ":
raise ValueError(f"Invalid filter {filter!r}: Resource type and ID must be separated by a space")
restype, resid = filter[:pos + 1], filter[pos + 2:]
if not restype[0] == restype[-1] == "'":
raise ValueError(
f"Invalid filter {filter!r}: Resource type is not a single-quoted type identifier: {restype!r}")
restype = _bytes_unescape(restype[1:-1])
if len(restype) != 4:
raise ValueError(
f"Invalid filter {filter!r}: Type identifier must be 4 bytes after replacing escapes, got {len(restype)} bytes: {restype!r}")
if resid[0] != "(" or resid[-1] != ")":
raise ValueError(f"Invalid filter {filter!r}: Resource ID must be parenthesized")
resid = resid[1:-1]
try:
resources = rf[restype]
except KeyError:
continue
if resid[0] == resid[-1] == '"':
name = _bytes_unescape(resid[1:-1])
for res in resources.values():
if res.name == name:
matching[res.type, res.id] = res
break
elif ":" in resid:
if resid.count(":") > 1:
raise ValueError(f"Invalid filter {filter!r}: Too many colons in ID range expression: {resid!r}")
start, end = resid.split(":")
start, end = int(start), int(end)
for res in resources.values():
if start <= res.id <= end:
matching[res.type, res.id] = res
else:
resid = int(resid)
try:
res = resources[resid]
except KeyError:
continue
matching[res.type, res.id] = res
return list(matching.values())
def _hexdump(data: bytes):
last_line = None
asterisk_shown = False
for i in range(0, len(data), 16):
line = data[i:i + 16]
# If the same 16-byte lines appear multiple times, print only the first one, and replace all further lines with a single line with an asterisk.
# This is unambiguous - to find out how many lines were collapsed this way, the user can compare the addresses of the lines before and after the asterisk.
if line == last_line:
if not asterisk_shown:
print("*")
asterisk_shown = True
else:
line_hex_left = " ".join(f"{byte:02x}" for byte in line[:8])
line_hex_right = " ".join(f"{byte:02x}" for byte in line[8:])
line_char = line.decode(_TEXT_ENCODING).translate(_TRANSLATE_NONPRINTABLES)
print(f"{i:08x} {line_hex_left:<{8*2+7}} {line_hex_right:<{8*2+7}} |{line_char}|")
asterisk_shown = False
last_line = line
if data:
print(f"{len(data):08x}")
def _raw_hexdump(data: bytes):
for i in range(0, len(data), 16):
print(" ".join(f"{byte:02x}" for byte in data[i:i + 16]))
def _translate_text(data: bytes) -> str:
return data.decode(_TEXT_ENCODING).replace("\r", "\n")
def _describe_resource(res: api.Resource, *, include_type: bool, decompress: bool) -> str:
id_desc_parts = [f"{res.id}"]
if res.name is not None:
name = _bytes_escape(res.name, quote='"')
id_desc_parts.append(f'"{name}"')
id_desc = ", ".join(id_desc_parts)
content_desc_parts = []
if decompress and api.ResourceAttrs.resCompressed in res.attributes:
try:
res.data
except compress.DecompressError:
length_desc = f"decompression failed ({len(res.data_raw)} bytes compressed)"
else:
length_desc = f"{len(res.data)} bytes ({len(res.data_raw)} bytes compressed)"
else:
length_desc = f"{len(res.data_raw)} bytes"
content_desc_parts.append(length_desc)
attrs = _decompose_flags(res.attributes)
if attrs:
content_desc_parts.append(" | ".join(attr.name for attr in attrs))
content_desc = ", ".join(content_desc_parts)
desc = f"({id_desc}): {content_desc}"
if include_type:
restype = _bytes_escape(res.type, quote="'")
desc = f"'{restype}' {desc}"
return desc
def _parse_args() -> argparse.Namespace:
ap = argparse.ArgumentParser(
add_help=False,
fromfile_prefix_chars="@",
formatter_class=argparse.RawDescriptionHelpFormatter,
description=textwrap.dedent("""
Read and display resources from a file's resource or data fork.
When specifying resource filters, each one may be of one of the
following forms:
An unquoted type name (without escapes): TYPE
A quoted type name: 'TYPE'
A quoted type name and an ID: 'TYPE' (42)
A quoted type name and an ID range: 'TYPE' (24:42)
A quoted type name and a resource name: 'TYPE' ("foobar")
When multiple filters are specified, all resources matching any of them
are displayed.
"""),
)
ap.add_argument("--help", action="help", help="Display this help message and exit")
ap.add_argument("--version", action="version", version=__version__, help="Display version information and exit")
ap.add_argument("-a", "--all", action="store_true", help="When no filters are given, show all resources in full, instead of an overview")
ap.add_argument("-f", "--fork", choices=["auto", "data", "rsrc"], default="auto", help="The fork from which to read the resource data, or auto to guess (default: %(default)s)")
ap.add_argument("--no-decompress", action="store_false", dest="decompress", help="Do not decompress compressed resources, output compressed resource data as-is")
ap.add_argument("--format", choices=["dump", "dump-text", "hex", "raw", "derez"], default="dump", help="How to output the resources - human-readable info with hex dump (dump) (default), human-readable info with newline-translated data (dump-text), data only as hex (hex), data only as raw bytes (raw), or like DeRez with no resource definitions (derez)")
ap.add_argument("--group", action="store", choices=["none", "type", "id"], default="type", help="Group resources in list view by type or ID, or disable grouping (default: type)")
ap.add_argument("--no-sort", action="store_false", dest="sort", help="Output resources in the order in which they are stored in the file, instead of sorting them by type and ID")
ap.add_argument("--header-system", action="store_true", help="Output system-reserved header data and nothing else")
ap.add_argument("--header-application", action="store_true", help="Output application-specific header data and nothing else")
ap.add_argument("file", help="The file to read, or - for stdin")
ap.add_argument("filter", nargs="*", help="One or more filters to select which resources to display, or omit to show an overview of all resources")
ns = ap.parse_args()
return ns
def _show_header_data(data: bytes, *, format: str) -> None:
if format == "dump":
_hexdump(data)
elif format == "dump-text":
print(_translate_text(data))
elif format == "hex":
_raw_hexdump(data)
elif format == "raw":
sys.stdout.buffer.write(data)
elif format == "derez":
print("Cannot output file header data in derez format", file=sys.stderr)
sys.exit(1)
else:
raise ValueError(f"Unhandled output format: {format}")
def _show_filtered_resources(resources: typing.Sequence[api.Resource], format: str, decompress: bool) -> None:
if not resources:
if format in ("dump", "dump-text"):
print("No resources matched the filter")
elif format in ("hex", "raw"):
print("No resources matched the filter", file=sys.stderr)
sys.exit(1)
elif format == "derez":
print("/* No resources matched the filter */")
else:
raise AssertionError(f"Unhandled output format: {format}")
elif format in ("hex", "raw") and len(resources) != 1:
print(f"Format {format} can only output a single resource, but the filter matched {len(resources)} resources", file=sys.stderr)
sys.exit(1)
for res in resources:
if decompress:
data = res.data
else:
data = res.data_raw
if format in ("dump", "dump-text"):
# Human-readable info and hex or text dump
desc = _describe_resource(res, include_type=True, decompress=decompress)
print(f"Resource {desc}:")
if format == "dump":
_hexdump(data)
elif format == "dump-text":
print(_translate_text(data))
else:
raise AssertionError(f"Unhandled format: {format!r}")
print()
elif format == "hex":
# Data only as hex
_raw_hexdump(data)
elif format == "raw":
# Data only as raw bytes
sys.stdout.buffer.write(data)
elif format == "derez":
# Like DeRez with no resource definitions
attrs = list(_decompose_flags(res.attributes))
if decompress and api.ResourceAttrs.resCompressed in attrs:
attrs.remove(api.ResourceAttrs.resCompressed)
attrs_comment = " /* was compressed */"
else:
attrs_comment = ""
attr_descs = [_REZ_ATTR_NAMES[attr] for attr in attrs]
if None in attr_descs:
attr_descs[:] = [f"${res.attributes.value:02X}"]
parts = [str(res.id)]
if res.name is not None:
name = _bytes_escape(res.name, quote='"')
parts.append(f'"{name}"')
parts += attr_descs
restype = _bytes_escape(res.type, quote="'")
print(f"data '{restype}' ({', '.join(parts)}{attrs_comment}) {{")
for i in range(0, len(data), 16):
# Two-byte grouping is really annoying to implement.
groups = []
for j in range(0, 16, 2):
if i+j >= len(data):
break
elif i+j+1 >= len(data):
groups.append(f"{data[i+j]:02X}")
else:
groups.append(f"{data[i+j]:02X}{data[i+j+1]:02X}")
s = f'$"{" ".join(groups)}"'
comment = "/* " + data[i:i + 16].decode(_TEXT_ENCODING).translate(_TRANSLATE_NONPRINTABLES) + " */"
print(f"\t{s:<54s}{comment}")
print("};")
print()
else:
raise ValueError(f"Unhandled output format: {format}")
def _list_resource_file(rf: api.ResourceFile, *, sort: bool, group: str, decompress: bool) -> None:
if rf.header_system_data != bytes(len(rf.header_system_data)):
print("Header system data:")
_hexdump(rf.header_system_data)
if rf.header_application_data != bytes(len(rf.header_application_data)):
print("Header application data:")
_hexdump(rf.header_application_data)
attrs = _decompose_flags(rf.file_attributes)
if attrs:
print("File attributes: " + " | ".join(attr.name for attr in attrs))
if len(rf) == 0:
print("No resources (empty resource file)")
return
if group == "none":
all_resources = []
for reses in rf.values():
all_resources.extend(reses.values())
if sort:
all_resources.sort(key=lambda res: (res.type, res.id))
print(f"{len(all_resources)} resources:")
for res in all_resources:
print(_describe_resource(res, include_type=True, decompress=decompress))
elif group == "type":
print(f"{len(rf)} resource types:")
restype_items = rf.items()
if sort:
restype_items = sorted(restype_items, key=lambda item: item[0])
for typecode, resources in restype_items:
restype = _bytes_escape(typecode, quote="'")
print(f"'{restype}': {len(resources)} resources:")
resources_items = resources.items()
if sort:
resources_items = sorted(resources_items, key=lambda item: item[0])
for resid, res in resources_items:
print(_describe_resource(res, include_type=False, decompress=decompress))
print()
elif group == "id":
all_resources = []
for reses in rf.values():
all_resources.extend(reses.values())
all_resources.sort(key=lambda res: res.id)
resources_by_id = {resid: list(reses) for resid, reses in itertools.groupby(all_resources, key=lambda res: res.id)}
print(f"{len(resources_by_id)} resource IDs:")
for resid, resources in resources_by_id.items():
print(f"({resid}): {len(resources)} resources:")
if sort:
resources.sort(key=lambda res: res.type)
for res in resources:
print(_describe_resource(res, include_type=True, decompress=decompress))
print()
else:
raise AssertionError(f"Unhandled group mode: {group!r}")
def main():
ns = _parse_args()
if ns.file == "-":
if ns.fork is not None:
print("Cannot specify an explicit fork when reading from stdin", file=sys.stderr)
sys.exit(1)
rf = api.ResourceFile(sys.stdin.buffer)
else:
rf = api.ResourceFile.open(ns.file, fork=ns.fork)
with rf:
if ns.header_system or ns.header_application:
if ns.header_system:
data = rf.header_system_data
else:
data = rf.header_application_data
_show_header_data(data, format=ns.format)
elif ns.filter or ns.all:
if ns.filter:
resources = _filter_resources(rf, ns.filter)
else:
resources = []
for reses in rf.values():
resources.extend(reses.values())
if ns.sort:
resources.sort(key=lambda res: (res.type, res.id))
_show_filtered_resources(resources, format=ns.format, decompress=ns.decompress)
else:
_list_resource_file(rf, sort=ns.sort, group=ns.group, decompress=ns.decompress)
sys.exit(0)
if __name__ == "__main__":
sys.exit(main())
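
Note on the `derez` output above: the two-byte grouping that the inline comment calls annoying can also be sketched with slicing. This is an alternative formulation for illustration, not the code in this diff; the helper name `derez_hex_groups` is an assumption. It relies on the fact that slicing past the end of a bytes object yields a shorter slice, so a trailing odd byte becomes a two-digit group on its own.

```python
def derez_hex_groups(chunk: bytes) -> str:
    """Format up to 16 bytes as a DeRez-style $"..." string with two-byte groups."""
    # Slicing never raises past the end, so the last group may hold a single byte.
    groups = [chunk[j:j + 2].hex().upper() for j in range(0, len(chunk), 2)]
    return f'$"{" ".join(groups)}"'


print(derez_hex_groups(b"\x01\x02\x03"))  # $"0102 03"
```

The result matches what the loop above builds for the same input, because both emit four hex digits per complete pair and two for a leftover byte.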


@ -1,97 +0,0 @@
import struct
from . import dcmp0
from . import dcmp1
from . import dcmp2
from .common import DecompressError
__all__ = [
"DecompressError",
"decompress",
]
# The signature of all compressed resource data, 0xa89f6572 in hex, or "®üer" in MacRoman.
COMPRESSED_SIGNATURE = b"\xa8\x9fer"
# The compression type commonly used for application resources.
COMPRESSED_TYPE_APPLICATION = 0x0801
# The compression type commonly used for System file resources.
COMPRESSED_TYPE_SYSTEM = 0x0901
# Common header for compressed resources of all types.
# 4 bytes: Signature (see above).
# 2 bytes: Length of the complete header (this common part and the type-specific part that follows it). (This meaning is just a guess - the field's value is always 0x0012, so there's no way to know for certain what it means.)
# 2 bytes: Compression type. Known so far: 0x0901 is used in the System file's resources. 0x0801 is used in other files' resources.
# 4 bytes: Length of the data after decompression.
STRUCT_COMPRESSED_HEADER = struct.Struct(">4sHHI")
# Header continuation part for an "application" compressed resource.
# 1 byte: "Working buffer fractional size" - the ratio of the compressed data size to the uncompressed data size, times 256.
# 1 byte: "Expansion buffer size" - the maximum number of bytes that the data might grow during decompression.
# 2 bytes: The ID of the 'dcmp' resource that can decompress this resource. Currently only IDs 0 and 1 are supported.
# 2 bytes: Reserved (always zero).
STRUCT_COMPRESSED_APPLICATION_HEADER = struct.Struct(">BBhH")
# Header continuation part for a "system" compressed resource.
# 2 bytes: The ID of the 'dcmp' resource that can decompress this resource. Currently only ID 2 is supported.
# 4 bytes: Decompressor-specific parameters.
STRUCT_COMPRESSED_SYSTEM_HEADER = struct.Struct(">h4s")
def _decompress_application(data: bytes, decompressed_length: int, *, debug: bool=False) -> bytes:
working_buffer_fractional_size, expansion_buffer_size, dcmp_id, reserved = STRUCT_COMPRESSED_APPLICATION_HEADER.unpack_from(data)
if debug:
print(f"Working buffer fractional size: {working_buffer_fractional_size} (=> {len(data) * 256 / working_buffer_fractional_size})")
print(f"Expansion buffer size: {expansion_buffer_size}")
if dcmp_id == 0:
decompress_func = dcmp0.decompress
elif dcmp_id == 1:
decompress_func = dcmp1.decompress
else:
raise DecompressError(f"Unsupported 'dcmp' ID: {dcmp_id}, expected 0 or 1")
if reserved != 0:
raise DecompressError(f"Reserved field should be 0, not 0x{reserved:>04x}")
return decompress_func(data[STRUCT_COMPRESSED_APPLICATION_HEADER.size:], decompressed_length, debug=debug)
def _decompress_system(data: bytes, decompressed_length: int, *, debug: bool=False) -> bytes:
dcmp_id, params = STRUCT_COMPRESSED_SYSTEM_HEADER.unpack_from(data)
if dcmp_id == 2:
decompress_func = dcmp2.decompress
else:
raise DecompressError(f"Unsupported 'dcmp' ID: {dcmp_id}, expected 2")
return decompress_func(data[STRUCT_COMPRESSED_SYSTEM_HEADER.size:], decompressed_length, params, debug=debug)
def decompress(data: bytes, *, debug: bool=False) -> bytes:
"""Decompress the given compressed resource data."""
try:
signature, header_length, compression_type, decompressed_length = STRUCT_COMPRESSED_HEADER.unpack_from(data)
except struct.error:
raise DecompressError("Invalid header")
if signature != COMPRESSED_SIGNATURE:
raise DecompressError(f"Invalid signature: {signature!r}, expected {COMPRESSED_SIGNATURE}")
if header_length != 0x12:
raise DecompressError(f"Unsupported header length: 0x{header_length:>04x}, expected 0x12")
if compression_type == COMPRESSED_TYPE_APPLICATION:
decompress_func = _decompress_application
elif compression_type == COMPRESSED_TYPE_SYSTEM:
decompress_func = _decompress_system
else:
raise DecompressError(f"Unsupported compression type: 0x{compression_type:>04x}")
if debug:
print(f"Decompressed length: {decompressed_length}")
decompressed = decompress_func(data[STRUCT_COMPRESSED_HEADER.size:], decompressed_length, debug=debug)
if len(decompressed) != decompressed_length:
raise DecompressError(f"Actual length of decompressed data ({len(decompressed)}) does not match length stored in resource ({decompressed_length})")
return decompressed
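
Note on the header layout described in the comments above: a minimal sketch of how the common header and the "application" continuation unpack with the same struct formats. The sample bytes are hypothetical and built only for illustration, and `parse_application_header` is an assumed helper name, not part of this module. The two structs are 12 and 6 bytes long, which adds up to 0x12 and fits the comment's guess that the length field covers the complete header.

```python
import struct

COMPRESSED_SIGNATURE = b"\xa8\x9fer"
COMPRESSED_TYPE_APPLICATION = 0x0801
STRUCT_COMPRESSED_HEADER = struct.Struct(">4sHHI")
STRUCT_COMPRESSED_APPLICATION_HEADER = struct.Struct(">BBhH")

# Hypothetical header bytes (not taken from a real resource).
sample_header = (
    STRUCT_COMPRESSED_HEADER.pack(COMPRESSED_SIGNATURE, 0x12, COMPRESSED_TYPE_APPLICATION, 1000)
    + STRUCT_COMPRESSED_APPLICATION_HEADER.pack(64, 16, 0, 0)
)


def parse_application_header(data: bytes) -> dict:
    """Unpack the common header followed by the application-specific part."""
    signature, header_length, compression_type, decompressed_length = (
        STRUCT_COMPRESSED_HEADER.unpack_from(data)
    )
    if signature != COMPRESSED_SIGNATURE:
        raise ValueError(f"Invalid signature: {signature!r}")
    if compression_type != COMPRESSED_TYPE_APPLICATION:
        raise ValueError(f"Not an application header: 0x{compression_type:04x}")
    # The type-specific part starts right after the 12-byte common part.
    fractional_size, expansion_size, dcmp_id, reserved = (
        STRUCT_COMPRESSED_APPLICATION_HEADER.unpack_from(data, STRUCT_COMPRESSED_HEADER.size)
    )
    return {
        "header_length": header_length,
        "decompressed_length": decompressed_length,
        "working_buffer_fractional_size": fractional_size,
        "expansion_buffer_size": expansion_size,
        "dcmp_id": dcmp_id,
        "reserved": reserved,
    }


info = parse_application_header(sample_header)
print(info["header_length"], info["dcmp_id"], info["decompressed_length"])
```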


@ -1,23 +0,0 @@
import typing
class DecompressError(Exception):
"""Raised when resource data decompression fails, because the data is invalid or the compression type is not supported."""
def _read_variable_length_integer(data: bytes, position: int) -> typing.Tuple[int, int]:
"""Read a variable-length integer starting at the given position in the data, and return the integer as well as the number of bytes consumed.
This variable-length integer format is used by the 0xfe codes in the compression formats used by 'dcmp' (0) and 'dcmp' (1).
"""
assert len(data) > position
if data[position] == 0xff:
assert len(data) > position + 4
return int.from_bytes(data[position+1:position+5], "big", signed=True), 5
elif data[position] >= 0x80:
assert len(data) > position + 1
data_modified = bytes([(data[position] - 0xc0) & 0xff, data[position+1]])
return int.from_bytes(data_modified, "big", signed=True), 2
else:
return int.from_bytes(data[position:position+1], "big", signed=True), 1
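
Note on the variable-length integer format above: a minimal standalone sketch of the three encodings, so they can be tried on sample bytes. The name `read_varint` and the sample inputs are assumptions for illustration. The branch logic follows `_read_variable_length_integer`, except that this sketch raises `ValueError` on truncated input instead of using assertions.

```python
import typing


def read_varint(data: bytes, position: int) -> typing.Tuple[int, int]:
    """Return (value, bytes_consumed) for the integer starting at position."""
    if position >= len(data):
        raise ValueError("No data at the given position")
    first = data[position]
    if first == 0xff:
        size = 5  # 0xff marker, then a signed 32-bit big-endian value
    elif first >= 0x80:
        size = 2  # two bytes, with the first byte biased by 0xc0
    else:
        size = 1  # a single byte in the range 0..127
    if len(data) < position + size:
        raise ValueError("Truncated variable-length integer")
    if size == 5:
        return int.from_bytes(data[position + 1:position + 5], "big", signed=True), 5
    elif size == 2:
        # Subtract the 0xc0 bias (mod 256), then read as signed 16-bit big-endian.
        modified = bytes([(first - 0xc0) & 0xff, data[position + 1]])
        return int.from_bytes(modified, "big", signed=True), 2
    else:
        return first, 1


print(read_varint(b"\x05", 0))                  # (5, 1)
print(read_varint(b"\xc0\x00", 0))              # (0, 2)
print(read_varint(b"\x80\x00", 0))              # (-16384, 2)
print(read_varint(b"\xff\x00\x01\x00\x00", 0))  # (65536, 5)
```

With this encoding the two-byte form covers -16384 through 16127, since its first byte can only be 0x80 through 0xfe.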


@ -6,9 +6,6 @@ author = dgelessus
classifiers =
Development Status :: 4 - Beta
Intended Audience :: Developers
Topic :: Software Development :: Disassemblers
Topic :: System
Topic :: Utilities
License :: OSI Approved :: MIT License
Operating System :: MacOS :: MacOS 9
Operating System :: MacOS :: MacOS X
@ -18,11 +15,20 @@ classifiers =
Programming Language :: Python :: 3 :: Only
Programming Language :: Python :: 3.6
Programming Language :: Python :: 3.7
Programming Language :: Python :: 3.8
Programming Language :: Python :: 3.9
Programming Language :: Python :: 3.10
Programming Language :: Python :: 3.11
Topic :: Software Development :: Disassemblers
Topic :: System
Topic :: Utilities
Typing :: Typed
license = MIT
license_file = LICENSE
description = A pure Python, cross-platform library/tool for reading Macintosh resource data, as stored in resource forks and ``.rsrc`` files
long_description = file: README.rst
long_description_content_type = text/x-rst
license_files =
LICENSE
description = A pure Python, cross-platform library/tool for reading Macintosh resource data, as stored in resource forks and .rsrc files
long_description = file: README.md
long_description_content_type = text/markdown
keywords =
rsrc
fork
@ -33,12 +39,64 @@ keywords =
macos
[options]
setup_requires =
setuptools>=39.2.0
# mypy can only find type hints in the package if zip_safe is set to False,
# see https://mypy.readthedocs.io/en/latest/installed_packages.html#making-pep-561-compatible-packages
zip_safe = False
python_requires = >=3.6
packages =
rsrcfork
packages = find:
package_dir =
= src
[options.package_data]
rsrcfork =
py.typed
[options.packages.find]
where = src
[options.entry_points]
console_scripts =
rsrcfork = rsrcfork.__main__:main
[flake8]
extend-exclude =
.mypy_cache/,
build/,
dist/,
# The following issues are ignored because they do not match our code style:
ignore =
# These E1 checks report many false positives for code that is (consistently) indented with tabs alone.
# indentation contains mixed spaces and tabs
E101,
# over-indented
E117,
# continuation line over-indented for hanging indent
E126,
# missing whitespace around arithmetic operator
E226,
# at least two spaces before inline comment
E261,
# line too long
E501,
# indentation contains tabs
W191,
# blank line contains whitespace
W293,
# line break before binary operator
W503,
[mypy]
files=src/**/*.py
python_version = 3.6
disallow_untyped_calls = True
disallow_untyped_defs = True
disallow_untyped_decorators = True
no_implicit_optional = True
warn_unused_ignores = True
warn_unreachable = True
warn_redundant_casts = True

38
src/rsrcfork/__init__.py Normal file

@ -0,0 +1,38 @@
"""A pure Python, cross-platform library/tool for reading Macintosh resource data, as stored in resource forks and ``.rsrc`` files."""
# To release a new version:
# * Remove the .dev suffix from the version number in this file.
# * Update the changelog in the README.md (rename the "next version" section to the correct version number).
# * Remove the ``dist`` directory (if it exists) to clean up any old release files.
# * Run ``python3 setup.py sdist bdist_wheel`` to build the release files.
# * Run ``python3 -m twine check dist/*`` to check the release files.
# * Fix any errors reported by the build and/or check steps.
# * Commit the changes to master.
# * Tag the release commit with the version number, prefixed with a "v" (e.g. version 1.2.3 is tagged as v1.2.3).
# * Fast-forward the release branch to the new release commit.
# * Push the master and release branches.
# * Upload the release files to PyPI using ``python3 -m twine upload dist/*``.
# * On the GitHub repo's Releases page, edit the new release tag and add the relevant changelog section from the README.md.
# After releasing:
# * (optional) Remove the build and dist directories from the previous release as they are no longer needed.
# * Bump the version number in this file to the next version and add a .dev suffix.
# * Add a new empty section for the next version to the README.md changelog.
# * Commit and push the changes to master.
__version__ = "1.8.1.dev"
__all__ = [
"Resource",
"ResourceAttrs",
"ResourceFile",
"ResourceFileAttrs",
"compress",
"open",
]
from .api import Resource, ResourceAttrs, ResourceFile, ResourceFileAttrs
from . import compress
# noinspection PyShadowingBuiltins
open = ResourceFile.open

876
src/rsrcfork/__main__.py Normal file

@ -0,0 +1,876 @@
import argparse
import enum
import io
import itertools
import shutil
import sys
import typing
from . import __version__, api, compress
# The encoding to use when rendering bytes as text (in four-char codes, strings, hex dumps, etc.) or reading a quoted byte string (from the command line).
_TEXT_ENCODING = "MacRoman"
_REZ_ATTR_NAMES = {
api.ResourceAttrs.resSysRef: None, # "Illegal or reserved attribute"
api.ResourceAttrs.resSysHeap: "sysheap",
api.ResourceAttrs.resPurgeable: "purgeable",
api.ResourceAttrs.resLocked: "locked",
api.ResourceAttrs.resProtected: "protected",
api.ResourceAttrs.resPreload: "preload",
api.ResourceAttrs.resChanged: None, # "Illegal or reserved attribute"
api.ResourceAttrs.resCompressed: None, # "Extended Header resource attribute"
}
F = typing.TypeVar("F", bound=enum.Flag)
def decompose_flags(value: F) -> typing.Sequence[F]:
"""Decompose an enum.Flag instance into separate enum constants."""
return [bit for bit in type(value) if bit in value]
def join_flag_names(flags: typing.Iterable[F], sep: str = " | ") -> str:
"""Join an iterable of enum.Flag instances into a string representation based on their names.
All values in ``flags`` should be named constants.
"""
names: typing.List[str] = []
for flag in flags:
name = flag.name
if name is None:
names.append(str(flag))
else:
names.append(name)
return sep.join(names)
def is_printable(char: str) -> bool:
"""Determine whether a character is printable for our purposes.
We mainly use Python's definition of printable (i.e. everything that Unicode does not consider a separator or "other" character). However, we also treat U+F8FF as printable, which is the private use codepoint used for the Apple logo character.
"""
return char.isprintable() or char == "\uf8ff"
# Translation table to replace non-printable characters with periods.
_TRANSLATE_NONPRINTABLES = {ord(c): "." for c in bytes(range(256)).decode(_TEXT_ENCODING) if not is_printable(c)}
def bytes_unescape(string: str) -> bytes:
"""Convert a string containing text (in _TEXT_ENCODING) and hex escapes to a bytestring.
(We implement our own unescaping mechanism here to not depend on any of Python's string/bytes escape syntax.)
"""
out: typing.List[int] = []
it = iter(string)
for char in it:
if char == "\\":
try:
esc = next(it)
if esc in "\\\'\"":
out.extend(esc.encode(_TEXT_ENCODING))
elif esc == "x":
x1, x2 = next(it), next(it)
out.append(int(x1+x2, 16))
else:
raise ValueError(f"Unknown escape character: {esc}")
except StopIteration:
raise ValueError("End of string in escape sequence")
else:
out.extend(char.encode(_TEXT_ENCODING))
return bytes(out)
def bytes_escape(bs: bytes, *, quote: typing.Optional[str] = None) -> str:
"""Convert a bytestring to a string (using _TEXT_ENCODING), with non-printable characters hex-escaped.
(We implement our own escaping mechanism here to not depend on Python's str or bytes repr.)
"""
out = []
for byte, char in zip(bs, bs.decode(_TEXT_ENCODING)):
if char in {quote, "\\"}:
out.append(f"\\{char}")
elif is_printable(char):
out.append(char)
else:
out.append(f"\\x{byte:02x}")
return "".join(out)
def bytes_quote(bs: bytes, quote: str) -> str:
"""Convert a bytestring to a quoted string (using _TEXT_ENCODING), with non-printable characters hex-escaped.
(We implement our own escaping mechanism here to not depend on Python's str or bytes repr.)
"""
return quote + bytes_escape(bs, quote=quote) + quote
MIN_RESOURCE_ID = -0x8000
MAX_RESOURCE_ID = 0x7fff
class ResourceFilter(object):
type: bytes
min_id: int
max_id: int
name: typing.Optional[bytes]
@classmethod
def from_string(cls, filter: str) -> "ResourceFilter":
if len(filter) == 4:
restype = filter.encode("ascii")
return cls(restype, MIN_RESOURCE_ID, MAX_RESOURCE_ID, None)
elif filter[0] == filter[-1] == "'":
restype = bytes_unescape(filter[1:-1])
return cls(restype, MIN_RESOURCE_ID, MAX_RESOURCE_ID, None)
else:
pos = filter.find("'", 1)
if pos == -1:
raise ValueError(f"Invalid filter {filter!r}: Resource type must be single-quoted")
elif filter[pos + 1] != " ":
raise ValueError(f"Invalid filter {filter!r}: Resource type and ID must be separated by a space")
restype_str, resid_str = filter[:pos + 1], filter[pos + 2:]
if not restype_str[0] == restype_str[-1] == "'":
raise ValueError(
f"Invalid filter {filter!r}: Resource type is not a single-quoted type identifier: {restype_str!r}")
restype = bytes_unescape(restype_str[1:-1])
if resid_str[0] != "(" or resid_str[-1] != ")":
raise ValueError(f"Invalid filter {filter!r}: Resource ID must be parenthesized")
resid_str = resid_str[1:-1]
if resid_str[0] == resid_str[-1] == '"':
name = bytes_unescape(resid_str[1:-1])
return cls(restype, MIN_RESOURCE_ID, MAX_RESOURCE_ID, name)
elif ":" in resid_str:
if resid_str.count(":") > 1:
raise ValueError(f"Invalid filter {filter!r}: Too many colons in ID range expression: {resid_str!r}")
start_str, end_str = resid_str.split(":")
start, end = int(start_str), int(end_str)
return cls(restype, start, end, None)
else:
resid = int(resid_str)
return cls(restype, resid, resid, None)
def __init__(self, restype: bytes, min_id: int, max_id: int, name: typing.Optional[bytes]) -> None:
super().__init__()
if len(restype) != 4:
raise ValueError(f"Invalid filter: Type code must be exactly 4 bytes long, not {len(restype)} bytes: {restype!r}")
elif min_id < MIN_RESOURCE_ID:
raise ValueError(f"Invalid filter: Resource ID lower bound ({min_id}) cannot be lower than {MIN_RESOURCE_ID}")
elif max_id > MAX_RESOURCE_ID:
raise ValueError(f"Invalid filter: Resource ID upper bound ({max_id}) cannot be greater than {MAX_RESOURCE_ID}")
elif min_id > max_id:
raise ValueError(f"Invalid filter: Resource ID lower bound ({min_id}) cannot be greater than upper bound ({max_id})")
self.type = restype
self.min_id = min_id
self.max_id = max_id
self.name = name
def __repr__(self) -> str:
return f"{type(self).__name__}({self.type!r}, {self.min_id!r}, {self.max_id!r}, {self.name!r})"
def matches(self, res: api.Resource) -> bool:
return res.type == self.type and self.min_id <= res.id <= self.max_id and (self.name is None or res.name == self.name)
def filter_resources(rf: api.ResourceFile, filters: typing.Sequence[str]) -> typing.Iterable[api.Resource]:
if not filters:
# Special case: an empty list of filters matches all resources rather than none
for reses in rf.values():
yield from reses.values()
else:
filter_objs = [ResourceFilter.from_string(filter) for filter in filters]
for reses in rf.values():
for res in reses.values():
if any(filter_obj.matches(res) for filter_obj in filter_objs):
yield res
def hexdump_stream(stream: typing.BinaryIO) -> typing.Iterable[str]:
last_line = None
asterisk_shown = False
line = stream.read(16)
i = 0
while line:
# If the same 16-byte line appears multiple times in a row, print only the first one, and replace all further repeats with a single line containing an asterisk.
# This is unambiguous - to find out how many lines were collapsed this way, the user can compare the addresses of the lines before and after the asterisk.
if line == last_line:
if not asterisk_shown:
yield "*"
asterisk_shown = True
else:
line_hex_left = " ".join(f"{byte:02x}" for byte in line[:8])
line_hex_right = " ".join(f"{byte:02x}" for byte in line[8:])
line_char = line.decode(_TEXT_ENCODING).translate(_TRANSLATE_NONPRINTABLES)
yield f"{i:08x} {line_hex_left:<{8*2+7}} {line_hex_right:<{8*2+7}} |{line_char}|"
asterisk_shown = False
last_line = line
i += len(line)
line = stream.read(16)
if i:
yield f"{i:08x}"
def hexdump(data: bytes) -> typing.Iterable[str]:
yield from hexdump_stream(io.BytesIO(data))
def raw_hexdump_stream(stream: typing.BinaryIO) -> typing.Iterable[str]:
line = stream.read(16)
while line:
yield " ".join(f"{byte:02x}" for byte in line)
line = stream.read(16)
def raw_hexdump(data: bytes) -> typing.Iterable[str]:
yield from raw_hexdump_stream(io.BytesIO(data))
def translate_text(data: bytes) -> str:
return data.decode(_TEXT_ENCODING).replace("\r", "\n")
def describe_resource(res: api.Resource, *, include_type: bool, decompress: bool) -> str:
id_desc_parts = [f"{res.id}"]
if res.name is not None:
id_desc_parts.append(bytes_quote(res.name, '"'))
id_desc = ", ".join(id_desc_parts)
content_desc_parts = []
if decompress and api.ResourceAttrs.resCompressed in res.attributes:
try:
res.compressed_info
except compress.DecompressError:
length_desc = f"unparseable compressed data header ({res.length_raw} bytes compressed)"
else:
assert res.compressed_info is not None
length_desc = f"{res.length} bytes ({res.length_raw} bytes compressed)"
else:
length_desc = f"{res.length_raw} bytes"
content_desc_parts.append(length_desc)
attrs = decompose_flags(res.attributes)
if attrs:
content_desc_parts.append(join_flag_names(attrs))
content_desc = ", ".join(content_desc_parts)
desc = f"({id_desc}): {content_desc}"
if include_type:
quoted_restype = bytes_quote(res.type, "'")
desc = f"{quoted_restype} {desc}"
return desc
def show_filtered_resources(resources: typing.Sequence[api.Resource], format: str, decompress: bool) -> None:
if not resources:
if format in ("dump", "dump-text"):
print("No resources matched the filter")
elif format in ("hex", "raw"):
print("No resources matched the filter", file=sys.stderr)
sys.exit(1)
elif format == "derez":
print("/* No resources matched the filter */")
else:
raise AssertionError(f"Unhandled output format: {format}")
elif format in ("hex", "raw") and len(resources) != 1:
print(f"Format {format} can only output a single resource, but the filter matched {len(resources)} resources", file=sys.stderr)
sys.exit(1)
for res in resources:
if decompress:
open_func = res.open
else:
open_func = res.open_raw
with open_func() as f:
if format in ("dump", "dump-text"):
# Human-readable info and hex or text dump
desc = describe_resource(res, include_type=True, decompress=decompress)
print(f"Resource {desc}:")
if format == "dump":
for line in hexdump_stream(f):
print(line)
elif format == "dump-text":
print(translate_text(f.read()))
else:
raise AssertionError(f"Unhandled format: {format!r}")
print()
elif format == "hex":
# Data only as hex
for line in raw_hexdump_stream(f):
print(line)
elif format == "raw":
# Data only as raw bytes
shutil.copyfileobj(f, sys.stdout.buffer)
elif format == "derez":
# Like DeRez with no resource definitions
attrs = list(decompose_flags(res.attributes))
if decompress and api.ResourceAttrs.resCompressed in attrs:
attrs.remove(api.ResourceAttrs.resCompressed)
attrs_comment = " /* was compressed */"
else:
attrs_comment = ""
attr_descs_with_none = [_REZ_ATTR_NAMES[attr] for attr in attrs]
if None in attr_descs_with_none:
attr_descs = [f"${res.attributes.value:02X}"]
else:
attr_descs = typing.cast(typing.List[str], attr_descs_with_none)
parts = [str(res.id)]
if res.name is not None:
parts.append(bytes_quote(res.name, '"'))
parts += attr_descs
quoted_restype = bytes_quote(res.type, "'")
print(f"data {quoted_restype} ({', '.join(parts)}{attrs_comment}) {{")
bytes_line = f.read(16)
while bytes_line:
# Two-byte grouping is really annoying to implement.
groups = []
for j in range(0, 16, 2):
if j >= len(bytes_line):
break
elif j+1 >= len(bytes_line):
groups.append(f"{bytes_line[j]:02X}")
else:
groups.append(f"{bytes_line[j]:02X}{bytes_line[j+1]:02X}")
s = f'$"{" ".join(groups)}"'
comment = "/* " + bytes_line.decode(_TEXT_ENCODING).translate(_TRANSLATE_NONPRINTABLES) + " */"
print(f"\t{s:<54s}{comment}")
bytes_line = f.read(16)
print("};")
print()
else:
raise ValueError(f"Unhandled output format: {format}")
def list_resources(resources: typing.List[api.Resource], *, sort: bool, group: str, decompress: bool) -> None:
if len(resources) == 0:
print("No resources matched the filter")
return
if group == "none":
if sort:
resources.sort(key=lambda res: (res.type, res.id))
print(f"{len(resources)} resources:")
for res in resources:
print(describe_resource(res, include_type=True, decompress=decompress))
elif group == "type":
if sort:
resources.sort(key=lambda res: res.type)
resources_by_type = {restype: list(reses) for restype, reses in itertools.groupby(resources, key=lambda res: res.type)}
print(f"{len(resources_by_type)} resource types:")
for restype, restype_resources in resources_by_type.items():
quoted_restype = bytes_quote(restype, "'")
print(f"{quoted_restype}: {len(restype_resources)} resources:")
if sort:
restype_resources.sort(key=lambda res: res.id)
for res in restype_resources:
print(describe_resource(res, include_type=False, decompress=decompress))
print()
elif group == "id":
resources.sort(key=lambda res: res.id)
resources_by_id = {resid: list(reses) for resid, reses in itertools.groupby(resources, key=lambda res: res.id)}
print(f"{len(resources_by_id)} resource IDs:")
for resid, resid_resources in resources_by_id.items():
print(f"({resid}): {len(resid_resources)} resources:")
if sort:
resid_resources.sort(key=lambda res: res.type)
for res in resid_resources:
print(describe_resource(res, include_type=True, decompress=decompress))
print()
else:
raise AssertionError(f"Unhandled group mode: {group!r}")
def format_compressed_header_info(header_info: compress.CompressedHeaderInfo) -> typing.Iterable[str]:
yield f"Header length: {header_info.header_length} bytes"
yield f"Compression type: 0x{header_info.compression_type:>04x}"
yield f"Decompressed data length: {header_info.decompressed_length} bytes"
yield f"'dcmp' resource ID: {header_info.dcmp_id}"
if isinstance(header_info, compress.CompressedType8HeaderInfo):
yield f"Working buffer fractional size: {header_info.working_buffer_fractional_size} 256ths of compressed data length"
yield f"Expansion buffer size: {header_info.expansion_buffer_size} bytes"
elif isinstance(header_info, compress.CompressedType9HeaderInfo):
yield f"Decompressor-specific parameters: {header_info.parameters!r}"
else:
raise AssertionError(f"Unhandled compressed header info type: {type(header_info)}")
def make_subcommand_parser(subs: typing.Any, name: str, *, help: str, description: str, **kwargs: typing.Any) -> argparse.ArgumentParser:
"""Add a subcommand parser with some slightly modified defaults to a subcommand set.
This function is used to ensure that all subcommands use the same base configuration for their ArgumentParser.
"""
ap = subs.add_parser(
name,
formatter_class=argparse.RawDescriptionHelpFormatter,
help=help,
description=description,
allow_abbrev=False,
add_help=False,
**kwargs,
)
ap.add_argument("--help", action="help", help="Display this help message and exit.")
return ap
def add_resource_file_args(ap: argparse.ArgumentParser) -> None:
"""Define common options/arguments for specifying an input resource file.
This includes a positional argument for the resource file's path, and the ``--fork`` option to select which fork of the file to use.
"""
ap.add_argument("--fork", choices=["auto", "data", "rsrc"], default="auto", help="The fork from which to read the resource file data, or auto to guess. Default: %(default)s")
ap.add_argument("file", help="The file from which to read resources, or - for stdin.")
RESOURCE_FILTER_HELP = """
The resource filters use syntax similar to Rez (resource definition) files.
Each filter can have one of the following forms:
An unquoted type name (without escapes): TYPE
A quoted type name: 'TYPE'
A quoted type name and an ID: 'TYPE' (42)
A quoted type name and an ID range: 'TYPE' (24:42)
A quoted type name and a resource name: 'TYPE' ("foobar")
Note that the resource filter syntax uses quotes, parentheses and spaces,
which have special meanings in most shells. It is recommended to quote each
resource filter (using double quotes) to ensure that it is not interpreted
or rewritten by the shell.
"""
def add_resource_filter_args(ap: argparse.ArgumentParser) -> None:
"""Define common options/arguments for specifying resource filters."""
ap.add_argument("filter", nargs="*", help="One or more filters to select resources. If no filters are specified, all resources are selected.")
def open_resource_file(file: str, *, fork: str) -> api.ResourceFile:
"""Open a resource file at the given path, using the specified fork."""
if file == "-":
if fork != "auto":
print("Cannot specify an explicit fork when reading from stdin", file=sys.stderr)
sys.exit(1)
return api.ResourceFile(sys.stdin.buffer)
else:
return api.ResourceFile.open(file, fork=fork)
def do_read_header(ns: argparse.Namespace) -> typing.NoReturn:
with open_resource_file(ns.file, fork=ns.fork) as rf:
if ns.format in {"dump", "dump-text"}:
if ns.format == "dump":
dump_func = hexdump
elif ns.format == "dump-text":
def dump_func(data: bytes) -> typing.Iterable[str]:
yield translate_text(data)
else:
raise AssertionError(f"Unhandled --format: {ns.format!r}")
if ns.part in {"system", "all"}:
print("System-reserved header data:")
for line in dump_func(rf.header_system_data):
print(line)
if ns.part in {"application", "all"}:
print("Application-specific header data:")
for line in dump_func(rf.header_application_data):
print(line)
elif ns.format in {"hex", "raw"}:
if ns.part == "system":
data = rf.header_system_data
elif ns.part == "application":
data = rf.header_application_data
elif ns.part == "all":
data = rf.header_system_data + rf.header_application_data
else:
raise AssertionError(f"Unhandled --part: {ns.part!r}")
if ns.format == "hex":
for line in raw_hexdump(data):
print(line)
elif ns.format == "raw":
sys.stdout.buffer.write(data)
else:
raise AssertionError(f"Unhandled --format: {ns.format!r}")
else:
raise AssertionError(f"Unhandled --format: {ns.format!r}")
sys.exit(0)
def do_info(ns: argparse.Namespace) -> typing.NoReturn:
with open_resource_file(ns.file, fork=ns.fork) as rf:
print("System-reserved header data:")
for line in hexdump(rf.header_system_data):
print(line)
print()
print("Application-specific header data:")
for line in hexdump(rf.header_application_data):
print(line)
print()
print(f"Resource data starts at {rf.data_offset:#x} and is {rf.data_length:#x} bytes long")
print(f"Resource map starts at {rf.map_offset:#x} and is {rf.map_length:#x} bytes long")
attrs = decompose_flags(rf.file_attributes)
if attrs:
attrs_desc = join_flag_names(attrs)
else:
attrs_desc = "(none)"
print(f"Resource map attributes: {attrs_desc}")
print(f"Resource map type list starts at {rf.map_type_list_offset:#x} (relative to map start) and contains {len(rf)} types")
print(f"Resource map name list starts at {rf.map_name_list_offset:#x} (relative to map start)")
sys.exit(0)
def do_list(ns: argparse.Namespace) -> typing.NoReturn:
	with open_resource_file(ns.file, fork=ns.fork) as rf:
		if not rf:
			print("No resources (empty resource file)")
		else:
			resources = list(filter_resources(rf, ns.filter))
			list_resources(resources, sort=ns.sort, group=ns.group, decompress=ns.decompress)
	sys.exit(0)


def do_resource_info(ns: argparse.Namespace) -> typing.NoReturn:
	with open_resource_file(ns.file, fork=ns.fork) as rf:
		resources = list(filter_resources(rf, ns.filter))
		if ns.sort:
			resources.sort(key=lambda res: (res.type, res.id))
		if not resources:
			print("No resources matched the filter")
			sys.exit(0)
		for res in resources:
			quoted_restype = bytes_quote(res.type, "'")
			print(f"Resource {quoted_restype} ({res.id}):")
			if res.name is None:
				print("\tName: none (unnamed)")
			else:
				assert res.name_offset is not None
				quoted_name = bytes_quote(res.name, '"')
				print(f'\tName: {quoted_name} (at offset {res.name_offset} in name list)')
			attrs = decompose_flags(res.attributes)
			if attrs:
				attrs_desc = join_flag_names(attrs)
			else:
				attrs_desc = "(none)"
			print(f"\tAttributes: {attrs_desc}")
			print(f"\tData: {res.length_raw} bytes stored at offset {res.data_raw_offset} in resource file data")
			if api.ResourceAttrs.resCompressed in res.attributes and ns.decompress:
				print()
				print("\tCompressed resource header info:")
				try:
					res.compressed_info
				except compress.DecompressError:
					print("\t\t(failed to parse compressed resource header)")
				else:
					assert res.compressed_info is not None
					for line in format_compressed_header_info(res.compressed_info):
						print(f"\t\t{line}")
			print()
	sys.exit(0)


def do_read(ns: argparse.Namespace) -> typing.NoReturn:
	with open_resource_file(ns.file, fork=ns.fork) as rf:
		resources = list(filter_resources(rf, ns.filter))
		if ns.sort:
			resources.sort(key=lambda res: (res.type, res.id))
		show_filtered_resources(resources, format=ns.format, decompress=ns.decompress)
	sys.exit(0)


def do_raw_compress_info(ns: argparse.Namespace) -> typing.NoReturn:
	if ns.input_file == "-":
		in_stream = sys.stdin.buffer
		close_in_stream = False
	else:
		in_stream = open(ns.input_file, "rb")
		close_in_stream = True
	try:
		for line in format_compressed_header_info(compress.CompressedHeaderInfo.parse_stream(in_stream)):
			print(line)
	finally:
		if close_in_stream:
			in_stream.close()
	sys.exit(0)


def do_raw_decompress(ns: argparse.Namespace) -> typing.NoReturn:
	if ns.input_file == "-":
		in_stream = sys.stdin.buffer
		close_in_stream = False
	else:
		in_stream = open(ns.input_file, "rb")
		close_in_stream = True
	try:
		header_info = compress.CompressedHeaderInfo.parse_stream(in_stream)
		# Open the output file only after parsing the header, so that the file is only created (or its existing contents deleted) if the input file is valid.
		if ns.output_file == "-":
			if ns.debug:
				print("Cannot use --debug if the decompression output file is - (stdout).", file=sys.stderr)
				print("The debug output goes to stdout and would conflict with the decompressed data.", file=sys.stderr)
				sys.exit(2)
			out_stream = sys.stdout.buffer
			close_out_stream = False
		else:
			out_stream = open(ns.output_file, "wb")
			close_out_stream = True
		try:
			for chunk in compress.decompress_stream_parsed(header_info, in_stream, debug=ns.debug):
				out_stream.write(chunk)
		finally:
			if close_out_stream:
				out_stream.close()
	finally:
		if close_in_stream:
			in_stream.close()
	sys.exit(0)


def main() -> typing.NoReturn:
	"""Main function of the CLI.
	
	This function is a valid setuptools entry point. Arguments are passed in sys.argv, and every execution path ends with a sys.exit call. (setuptools entry points are also permitted to return an integer, which will be treated as an exit code. We do not use this feature and instead always call sys.exit ourselves.)
	"""
	ap = argparse.ArgumentParser(
		description="""
%(prog)s is a tool for working with Classic Mac OS resource files.
Currently this tool can only read resource files; modifying/writing resource
files is not supported yet.
Note: This tool is intended for human users. The output format is not
machine-readable and may change at any time. The command-line syntax usually
does not change much across versions, but this should not be relied on.
Automated scripts and programs should use the Python API provided by the
rsrcfork library, which this tool is a part of.
""",
		formatter_class=argparse.RawDescriptionHelpFormatter,
		allow_abbrev=False,
		add_help=False,
	)
	ap.add_argument("--help", action="help", help="Display this help message and exit.")
	ap.add_argument("--version", action="version", version=__version__, help="Display version information and exit.")
	subs = ap.add_subparsers(
		dest="subcommand",
		# TODO Add required=True (added in Python 3.7) once we drop Python 3.6 compatibility.
		metavar="SUBCOMMAND",
	)
	ap_read_header = make_subcommand_parser(
		subs,
		"read-header",
		help="Read the header data from a resource file.",
		description="""
Read and output a resource file's header data.
The header data consists of two parts:
The system-reserved data is 112 bytes long and used by the Classic Mac OS
Finder as temporary storage space. It usually contains parts of the
file metadata (name, type/creator code, etc.).
The application-specific data is 128 bytes long and is available for use by
applications. In practice it usually contains junk data that happened to be in
memory when the resource file was written.
Mac OS X does not use the header data fields anymore. Resource files written
on Mac OS X normally have both parts of the header data set to all zero bytes.
""",
	)
	ap_read_header.add_argument("--format", choices=["dump", "dump-text", "hex", "raw"], default="dump", help="How to output the header data: human-readable info with hex dump (dump) (default), human-readable info with newline-translated data (dump-text), data only as hex (hex), or data only as raw bytes (raw). Default: %(default)s")
	ap_read_header.add_argument("--part", choices=["system", "application", "all"], default="all", help="Which part of the header to read. Default: %(default)s")
	add_resource_file_args(ap_read_header)
	ap_info = make_subcommand_parser(
		subs,
		"info",
		help="Display technical information about the resource file.",
		description="""
Display technical information and stats about the resource file.
""",
	)
	add_resource_file_args(ap_info)
	ap_list = make_subcommand_parser(
		subs,
		"list",
		help="List the resources in a file.",
		description=f"""
List the resources stored in a resource file.
Each resource's type, ID, name (if any), attributes (if any), and data length
are displayed. For compressed resources, the compressed and decompressed data
length are displayed, as well as the ID of the 'dcmp' resource used to
decompress the resource data.
{RESOURCE_FILTER_HELP}
""",
	)
	ap_list.add_argument("--no-decompress", action="store_false", dest="decompress", help="Do not parse the data header of compressed resources and only output their compressed length.")
	ap_list.add_argument("--group", action="store", choices=["none", "type", "id"], default="type", help="Group resources by type or ID, or disable grouping. Default: %(default)s")
	ap_list.add_argument("--no-sort", action="store_false", dest="sort", help="Output resources in the order in which they are stored in the file, instead of sorting them by type and ID.")
	add_resource_file_args(ap_list)
	add_resource_filter_args(ap_list)
	ap_resource_info = make_subcommand_parser(
		subs,
		"resource-info",
		help="Display technical information about resources.",
		description=f"""
Display technical information about one or more resources.
{RESOURCE_FILTER_HELP}
""",
	)
	ap_resource_info.add_argument("--no-decompress", action="store_false", dest="decompress", help="Do not parse the contents of compressed resources, only output regular resource information.")
	ap_resource_info.add_argument("--no-sort", action="store_false", dest="sort", help="Output resources in the order in which they are stored in the file, instead of sorting them by type and ID.")
	add_resource_file_args(ap_resource_info)
	add_resource_filter_args(ap_resource_info)
	ap_read = make_subcommand_parser(
		subs,
		"read",
		help="Read data from resources.",
		description=f"""
Read the data of one or more resources.
{RESOURCE_FILTER_HELP}
""",
	)
	ap_read.add_argument("--no-decompress", action="store_false", dest="decompress", help="Do not decompress compressed resources, output the raw compressed resource data.")
	ap_read.add_argument("--format", choices=["dump", "dump-text", "hex", "raw", "derez"], default="dump", help="How to output the resources: human-readable info with hex dump (dump), human-readable info with newline-translated data (dump-text), data only as hex (hex), data only as raw bytes (raw), or like DeRez with no resource definitions (derez). Default: %(default)s")
	ap_read.add_argument("--no-sort", action="store_false", dest="sort", help="Output resources in the order in which they are stored in the file, instead of sorting them by type and ID.")
	add_resource_file_args(ap_read)
	add_resource_filter_args(ap_read)
	ap_raw_compress_info = make_subcommand_parser(
		subs,
		"raw-compress-info",
		help="Display technical information about raw compressed resource data.",
		description="""
Display technical information about raw compressed resource data that is stored
in a standalone file and not as a resource in a resource file.
""",
	)
	ap_raw_compress_info.add_argument("input_file", help="The file from which to read the compressed resource data, or - for stdin.")
	ap_raw_decompress = make_subcommand_parser(
		subs,
		"raw-decompress",
		help="Decompress raw compressed resource data.",
		description="""
Decompress raw compressed resource data that is stored in a standalone file
and not as a resource in a resource file.
This subcommand can be used in a shell pipeline by passing - as the input and
output file name, i. e. "%(prog)s - -".
Note: All other rsrcfork subcommands natively support compressed resources and
will automatically decompress them as needed. This subcommand is only needed
to decompress resource data that has been read from a resource file in
compressed form (e. g. using --no-decompress or another tool that does not
handle resource compression).
""",
	)
	ap_raw_decompress.add_argument("--debug", action="store_true", help="Display debugging output from the decompressor on stdout. Cannot be used if the output file is - (stdout).")
	ap_raw_decompress.add_argument("input_file", help="The file from which to read the compressed resource data, or - for stdin.")
	ap_raw_decompress.add_argument("output_file", help="The file to which to write the decompressed resource data, or - for stdout.")
	ns = ap.parse_args()
	if ns.subcommand is None:
		# TODO Remove this branch once we drop Python 3.6 compatibility, because this case will be handled by passing required=True to add_subparsers (see above).
		print("Missing subcommand", file=sys.stderr)
		sys.exit(2)
	elif ns.subcommand == "read-header":
		do_read_header(ns)
	elif ns.subcommand == "info":
		do_info(ns)
	elif ns.subcommand == "list":
		do_list(ns)
	elif ns.subcommand == "resource-info":
		do_resource_info(ns)
	elif ns.subcommand == "read":
		do_read(ns)
	elif ns.subcommand == "raw-compress-info":
		do_raw_compress_info(ns)
	elif ns.subcommand == "raw-decompress":
		do_raw_decompress(ns)
	else:
		raise AssertionError(f"Subcommand not handled: {ns.subcommand!r}")


if __name__ == "__main__":
	sys.exit(main())
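
The subcommand handling in main() above relies on dest="subcommand" plus a manual check for None, because required=True on add_subparsers needs Python 3.7. The following is a minimal sketch of that pattern in isolation, not rsrcfork's actual CLI: the names build_parser and dispatch, the "demo" program name, and the two toy handlers are hypothetical, and the function returns an exit code instead of calling sys.exit so that it is easy to try out.

```python
import argparse
import typing


def build_parser() -> argparse.ArgumentParser:
	# Hypothetical miniature parser; only the structure mirrors the CLI above.
	ap = argparse.ArgumentParser(prog="demo", allow_abbrev=False)
	subs = ap.add_subparsers(dest="subcommand", metavar="SUBCOMMAND")
	ap_list = subs.add_parser("list")
	ap_list.add_argument("file")
	ap_info = subs.add_parser("info")
	ap_info.add_argument("file")
	return ap


def dispatch(argv: typing.Sequence[str]) -> int:
	"""Parse argv and return an exit code for the selected subcommand."""
	ns = build_parser().parse_args(argv)
	if ns.subcommand is None:
		# Without required=True, a missing subcommand parses successfully and leaves the dest attribute as None.
		return 2
	# Toy handlers that just report success.
	handlers = {
		"list": lambda ns: 0,
		"info": lambda ns: 0,
	}
	return handlers[ns.subcommand](ns)
```

Calling dispatch([]) takes the "missing subcommand" branch, while dispatch(["list", "some.rsrc"]) reaches the handler for list.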

93
src/rsrcfork/_io_utils.py Normal file

@ -0,0 +1,93 @@
"""A collection of utility functions and classes related to IO streams. For internal use only."""

import io
import typing


def read_exact(stream: typing.BinaryIO, byte_count: int) -> bytes:
	"""Read byte_count bytes from the stream and raise an exception if too few bytes are read (i. e. if EOF was hit prematurely).
	
	:param stream: The stream to read from.
	:param byte_count: The number of bytes to read.
	:return: The read data, which is exactly ``byte_count`` bytes long.
	:raise EOFError: If not enough data could be read from the stream.
	"""
	
	data = stream.read(byte_count)
	if len(data) != byte_count:
		raise EOFError(f"Attempted to read {byte_count} bytes of data, but only got {len(data)} bytes")
	return data


if typing.TYPE_CHECKING:
	class PeekableIO(typing.Protocol):
		"""Minimal protocol for binary IO streams that support the peek method.
		
		The peek method is supported by various standard Python binary IO streams, such as io.BufferedReader. If a stream does not natively support the peek method, it may be wrapped using the custom helper function make_peekable.
		"""
		
		def readable(self) -> bool:
			...
		
		def read(self, size: typing.Optional[int] = ...) -> bytes:
			...
		
		def peek(self, size: int = ...) -> bytes:
			...


class _PeekableIOWrapper(object):
	"""Wrapper class to add peek support to an existing stream. Do not instantiate this class directly, use the make_peekable function instead.
	
	Python provides a standard io.BufferedReader class, which supports the peek method. However, according to its documentation, it only supports wrapping io.RawIOBase subclasses, and not streams which are already otherwise buffered.
	
	Warning: this class does not perform any buffering of its own, outside of what is required to make peek work. It is strongly recommended to only wrap streams that are already buffered or otherwise fast to read from. In particular, raw streams (io.RawIOBase subclasses) should be wrapped using io.BufferedReader instead.
	"""
	
	_wrapped: typing.BinaryIO
	_readahead: bytes
	
	def __init__(self, wrapped: typing.BinaryIO) -> None:
		super().__init__()
		
		self._wrapped = wrapped
		self._readahead = b""
	
	def readable(self) -> bool:
		return self._wrapped.readable()
	
	def read(self, size: typing.Optional[int] = None) -> bytes:
		if size is None or size < 0:
			ret = self._readahead + self._wrapped.read()
			self._readahead = b""
		elif size <= len(self._readahead):
			ret = self._readahead[:size]
			self._readahead = self._readahead[size:]
		else:
			ret = self._readahead + self._wrapped.read(size - len(self._readahead))
			self._readahead = b""
		
		return ret
	
	def peek(self, size: int = -1) -> bytes:
		if not self._readahead:
			self._readahead = self._wrapped.read(io.DEFAULT_BUFFER_SIZE if size < 0 else size)
		return self._readahead


def make_peekable(stream: typing.BinaryIO) -> "PeekableIO":
	"""Wrap an arbitrary binary IO stream so that it supports the peek method.
	
	The stream is wrapped as efficiently as possible (or not at all if it already supports the peek method). However, in the worst case a custom wrapper class needs to be used, which may not be particularly efficient and only supports a very minimal interface. The only methods that are guaranteed to exist on the returned stream are readable, read, and peek.
	"""
	
	if hasattr(stream, "peek"):
		# Stream is already peekable, nothing to be done.
		return typing.cast("PeekableIO", stream)
	elif not typing.TYPE_CHECKING and isinstance(stream, io.RawIOBase):
		# This branch is skipped when type checking - mypy incorrectly warns about this code being unreachable, because it thinks that a typing.BinaryIO cannot be an instance of io.RawIOBase.
		# Raw IO streams can be wrapped efficiently using BufferedReader.
		return io.BufferedReader(stream)
	else:
		# Other streams need to be wrapped using our custom wrapper class.
		return _PeekableIOWrapper(stream)
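
To illustrate why a helper like read_exact is useful: a plain read() call silently returns fewer bytes when it hits EOF, whereas read_exact turns a short read into an error. The sketch below re-declares a stand-alone copy of the helper so that it runs without rsrcfork installed, and then shows the peek behavior on a standard io.BufferedReader. The sample byte strings and variable names are made up for illustration. Wrapping an io.BytesIO in io.BufferedReader works in CPython, although the io documentation only describes wrapping raw streams.

```python
import io
import typing


def read_exact(stream: typing.BinaryIO, byte_count: int) -> bytes:
	# Stand-alone copy of the helper from the diff above.
	data = stream.read(byte_count)
	if len(data) != byte_count:
		raise EOFError(f"Attempted to read {byte_count} bytes of data, but only got {len(data)} bytes")
	return data


stream = io.BytesIO(b"\x00\x01\x02\x03")
header = read_exact(stream, 2)  # Two bytes are available, so this succeeds.

short_read_message = None
try:
	read_exact(stream, 8)  # Only two bytes are left in the stream.
except EOFError as e:
	short_read_message = str(e)

# io.BytesIO has no peek method, whereas io.BufferedReader does.
bytesio_has_peek = hasattr(io.BytesIO(b""), "peek")
buffered = io.BufferedReader(io.BytesIO(b"abcdef"))
peeked = buffered.peek(2)  # May return more than 2 bytes; does not advance the stream.
first = buffered.read(2)  # Still returns the first two bytes after the peek.
```

Since io.BytesIO lacks peek and is not an io.RawIOBase, it is the kind of stream for which make_peekable would fall back to the custom wrapper class.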


@ -4,9 +4,11 @@ import enum
import io
import os
import struct
import types
import typing
import warnings
from . import _io_utils
from . import compress
# The format of all following structures is as described in the Inside Macintosh book (see module docstring).
@ -58,9 +60,11 @@ STRUCT_RESOURCE_REFERENCE = struct.Struct(">hHI4x")
# 1 byte: Length of following resource name.
STRUCT_RESOURCE_NAME_HEADER = struct.Struct(">B")
class InvalidResourceFileError(Exception):
pass
class ResourceFileAttrs(enum.Flag):
"""Resource file attribute flags. The descriptions for these flags are taken from comments on the map*Bit and map* enum constants in <CarbonCore/Resources.h>."""
@ -81,6 +85,7 @@ class ResourceFileAttrs(enum.Flag):
_BIT_1 = 1 << 1
_BIT_0 = 1 << 0
class ResourceAttrs(enum.Flag):
"""Resource attribute flags. The descriptions for these flags are taken from comments on the res*Bit and res* enum constants in <CarbonCore/Resources.h>."""
@ -93,51 +98,149 @@ class ResourceAttrs(enum.Flag):
resChanged = 1 << 1 # "Existing resource changed since last update", "Resource changed?"
resCompressed = 1 << 0 # "indicates that the resource data is compressed" (only documented in https://github.com/kreativekorp/ksfl/wiki/Macintosh-Resource-File-Format)
class Resource(object):
"""A single resource from a resource file."""
__slots__ = ("type", "id", "name", "attributes", "data_raw", "_data_decompressed")
_resfile: "ResourceFile"
type: bytes
id: int
name_offset: int
_name: typing.Optional[bytes]
attributes: ResourceAttrs
data_raw_offset: int
_length_raw: int
_data_raw: bytes
_compressed_info: compress.common.CompressedHeaderInfo
_data_decompressed: bytes
def __init__(self, resource_type: bytes, resource_id: int, name: typing.Optional[bytes], attributes: ResourceAttrs, data_raw: bytes):
"""Create a new resource with the given type code, ID, name, attributes, and data."""
def __init__(self, resfile: "ResourceFile", resource_type: bytes, resource_id: int, name_offset: int, attributes: ResourceAttrs, data_raw_offset: int) -> None:
"""Create a resource object representing a resource stored in a resource file.
External code should not call this constructor manually. Resources should be looked up through a ResourceFile object instead.
"""
super().__init__()
self.type: bytes = resource_type
self.id: int = resource_id
self.name: typing.Optional[bytes] = name
self.attributes: ResourceAttrs = attributes
self.data_raw: bytes = data_raw
self._resfile = resfile
self.type = resource_type
self.id = resource_id
self.name_offset = name_offset
self.attributes = attributes
self.data_raw_offset = data_raw_offset
def __repr__(self):
def __repr__(self) -> str:
try:
data = self.data
with self.open() as f:
data = f.read(33)
except compress.DecompressError:
decompress_ok = False
data = self.data_raw
with self.open_raw() as f:
data = f.read(33)
else:
decompress_ok = True
if len(data) > 32:
data_repr = f"<{len(data)} bytes: {data[:32]}...>"
data_repr = f"<{len(data)} bytes: {data[:32]!r}...>"
else:
data_repr = repr(data)
if not decompress_ok:
data_repr = f"<decompression failed - compressed data: {data_repr}>"
return f"{type(self).__module__}.{type(self).__qualname__}(type={self.type}, id={self.id}, name={self.name}, attributes={self.attributes}, data={data_repr})"
return f"<{type(self).__qualname__} type {self.type!r}, id {self.id}, name {self.name!r}, attributes {self.attributes}, data {data_repr}>"
@property
def resource_type(self) -> bytes:
warnings.warn(DeprecationWarning("The resource_type attribute has been deprecated and will be removed in a future version. Please use the type attribute instead."))
warnings.warn(DeprecationWarning("The resource_type attribute has been deprecated and will be removed in a future version. Please use the type attribute instead."), stacklevel=2)
return self.type
@property
def resource_id(self) -> int:
warnings.warn(DeprecationWarning("The resource_id attribute has been deprecated and will be removed in a future version. Please use the id attribute instead."))
warnings.warn(DeprecationWarning("The resource_id attribute has been deprecated and will be removed in a future version. Please use the id attribute instead."), stacklevel=2)
return self.id
@property
def name(self) -> typing.Optional[bytes]:
try:
return self._name
except AttributeError:
if self.name_offset == 0xffff:
self._name = None
else:
self._resfile._stream.seek(self._resfile.map_offset + self._resfile.map_name_list_offset + self.name_offset)
(name_length,) = self._resfile._stream_unpack(STRUCT_RESOURCE_NAME_HEADER)
self._name = self._resfile._read_exact(name_length)
return self._name
@property
def data_raw(self) -> bytes:
try:
return self._data_raw
except AttributeError:
self._resfile._stream.seek(self._resfile.data_offset + self.data_raw_offset + STRUCT_RESOURCE_DATA_HEADER.size)
self._data_raw = _io_utils.read_exact(self._resfile._stream, self.length_raw)
return self._data_raw
def open_raw(self) -> typing.BinaryIO:
"""Create a binary file-like object that provides access to this resource's raw data, which may be compressed.
The returned stream is read-only and seekable.
Multiple resource data streams can be opened at the same time for the same resource or for different resources in the same file,
without interfering with each other.
If a :class:`ResourceFile` is closed,
all resource data streams for that file may become unusable.
This method is recommended over :attr:`data_raw` if the data is accessed incrementally or only partially,
because the stream API does not require the entire resource data to be read in advance.
"""
return io.BytesIO(self.data_raw)
@property
def compressed_info(self) -> typing.Optional[compress.common.CompressedHeaderInfo]:
"""The compressed resource header information, or None if this resource is not compressed.
Accessing this attribute may raise a DecompressError if the resource data is compressed and the header could not be parsed. To access the unparsed header data, use the data_raw attribute.
"""
if ResourceAttrs.resCompressed in self.attributes:
try:
return self._compressed_info
except AttributeError:
with self.open_raw() as f:
self._compressed_info = compress.common.CompressedHeaderInfo.parse_stream(f)
return self._compressed_info
else:
return None
@property
def length_raw(self) -> int:
"""The length of the raw resource data, which may be compressed.
Accessing this attribute may be faster than computing len(self.data_raw) manually.
"""
try:
return self._length_raw
except AttributeError:
self._resfile._stream.seek(self._resfile.data_offset + self.data_raw_offset)
(self._length_raw,) = self._resfile._stream_unpack(STRUCT_RESOURCE_DATA_HEADER)
return self._length_raw
@property
def length(self) -> int:
"""The length of the resource data. If the resource data is compressed, this is the length of the data after decompression.
Accessing this attribute may be faster than computing len(self.data) manually.
"""
if self.compressed_info is not None:
return self.compressed_info.decompressed_length
else:
return self.length_raw
@property
def data(self) -> bytes:
"""The resource data, decompressed if necessary.
@ -145,72 +248,101 @@ class Resource(object):
Accessing this attribute may raise a DecompressError if the resource data is compressed and could not be decompressed. To access the compressed resource data, use the data_raw attribute.
"""
if ResourceAttrs.resCompressed in self.attributes:
if self.compressed_info is not None:
try:
return self._data_decompressed
except AttributeError:
self._data_decompressed = compress.decompress(self.data_raw)
with self.open_raw() as compressed_f:
compressed_f.seek(self.compressed_info.header_length)
self._data_decompressed = b"".join(compress.decompress_stream_parsed(self.compressed_info, compressed_f))
return self._data_decompressed
else:
return self.data_raw
def open(self) -> typing.BinaryIO:
"""Create a binary file-like object that provides access to this resource's data, decompressed if necessary.
The returned stream is read-only and seekable.
Multiple resource data streams can be opened at the same time for the same resource or for different resources in the same file,
without interfering with each other.
If a :class:`ResourceFile` is closed,
all resource data streams for that file may become unusable.
This method is recommended over :attr:`data` if the data is accessed incrementally or only partially,
because the stream API does not require the entire resource data to be read (and possibly decompressed) in advance.
"""
return io.BytesIO(self.data)
class ResourceFile(collections.abc.Mapping):
class _LazyResourceMap(typing.Mapping[int, Resource]):
"""Internal class: Read-only wrapper for a mapping of resource IDs to resource objects.
This class behaves like a normal read-only mapping. The main difference from a plain dict (or similar mapping) is that this mapping has a specialized repr to avoid excessive output when working in the REPL.
"""
type: bytes
_submap: typing.Mapping[int, Resource]
def __init__(self, resource_type: bytes, submap: typing.Mapping[int, Resource]) -> None:
"""Create a new _LazyResourceMap that wraps the given mapping."""
super().__init__()
self.type = resource_type
self._submap = submap
def __len__(self) -> int:
"""Get the number of resources with this type code."""
return len(self._submap)
def __iter__(self) -> typing.Iterator[int]:
"""Iterate over the IDs of all resources with this type code."""
return iter(self._submap)
def __contains__(self, key: object) -> bool:
"""Check if a resource with the given ID exists for this type code."""
return key in self._submap
def __getitem__(self, key: int) -> Resource:
"""Get a resource with the given ID for this type code."""
return self._submap[key]
def __repr__(self) -> str:
if len(self) == 1:
contents = f"one resource: {next(iter(self.values()))}"
else:
contents = f"{len(self)} resources with IDs {list(self)}"
return f"<Resource map for type {self.type!r}, containing {contents}>"
class ResourceFile(typing.Mapping[bytes, typing.Mapping[int, Resource]], typing.ContextManager["ResourceFile"]):
"""A resource file reader operating on a byte stream."""
# noinspection PyProtectedMember
class _LazyResourceMap(collections.abc.Mapping):
"""Internal class: Lazy mapping of resource IDs to resource objects, returned when subscripting a ResourceFile."""
def __init__(self, resfile: "ResourceFile", restype: bytes):
"""Create a new _LazyResourceMap "containing" all resources in resfile that have the type code restype."""
super().__init__()
self._resfile: "ResourceFile" = resfile
self._restype: bytes = restype
self._submap: typing.Mapping[int, typing.Tuple[int, ResourceAttrs, int]] = self._resfile._references[self._restype]
def __len__(self):
"""Get the number of resources with this type code."""
return len(self._submap)
def __iter__(self):
"""Iterate over the IDs of all resources with this type code."""
return iter(self._submap)
def __contains__(self, key: int):
"""Check if a resource with the given ID exists for this type code."""
return key in self._submap
def __getitem__(self, key: int) -> Resource:
"""Get a resource with the given ID for this type code."""
name_offset, attributes, data_offset = self._submap[key]
if name_offset == 0xffff:
name = None
else:
self._resfile._stream.seek(self._resfile.map_offset + self._resfile.map_name_list_offset + name_offset)
(name_length,) = self._resfile._stream_unpack(STRUCT_RESOURCE_NAME_HEADER)
name = self._resfile._read_exact(name_length)
self._resfile._stream.seek(self._resfile.data_offset + data_offset)
(data_length,) = self._resfile._stream_unpack(STRUCT_RESOURCE_DATA_HEADER)
data = self._resfile._read_exact(data_length)
return Resource(self._restype, key, name, attributes, data)
def __repr__(self):
if len(self) == 1:
return f"<{type(self).__module__}.{type(self).__qualname__} at {id(self):#x} containing one resource: {next(iter(self.values()))}>"
else:
return f"<{type(self).__module__}.{type(self).__qualname__} at {id(self):#x} containing {len(self)} resources with IDs: {list(self)}>"
_close_stream: bool
_stream: typing.BinaryIO
data_offset: int
map_offset: int
data_length: int
map_length: int
header_system_data: bytes
header_application_data: bytes
map_type_list_offset: int
map_name_list_offset: int
file_attributes: ResourceFileAttrs
_reference_counts: typing.MutableMapping[bytes, int]
_references: typing.MutableMapping[bytes, typing.MutableMapping[int, Resource]]
@classmethod
def open(cls, filename: typing.Union[str, bytes, os.PathLike], *, fork: str="auto", **kwargs) -> "ResourceFile":
def open(cls, filename: typing.Union[str, os.PathLike], *, fork: str = "auto", **kwargs: typing.Any) -> "ResourceFile":
"""Open the file at the given path as a ResourceFile.
The fork parameter controls which fork of the file the resource data will be read from. It accepts the following values:
@ -237,7 +369,7 @@ class ResourceFile(collections.abc.Mapping):
fork = "rsrc"
else:
fork = "data"
warnings.warn(DeprecationWarning(f"The rsrcfork parameter has been deprecated and will be removed in a future version. Please use fork={fork!r} instead of rsrcfork={kwargs['rsrcfork']!r}."))
warnings.warn(DeprecationWarning(f"The rsrcfork parameter has been deprecated and will be removed in a future version. Please use fork={fork!r} instead of rsrcfork={kwargs['rsrcfork']!r}."), stacklevel=2)
del kwargs["rsrcfork"]
if fork == "auto":
@ -269,7 +401,7 @@ class ResourceFile(collections.abc.Mapping):
else:
raise ValueError(f"Unsupported value for the fork parameter: {fork!r}")
def __init__(self, stream: typing.io.BinaryIO, *, close: bool=False):
def __init__(self, stream: typing.BinaryIO, *, close: bool = False) -> None:
"""Create a ResourceFile wrapping the given byte stream.
To read resource file data from a bytes object, wrap it in an io.BytesIO.
@ -283,8 +415,7 @@ class ResourceFile(collections.abc.Mapping):
super().__init__()
self._close_stream: bool = close
self._stream: typing.io.BinaryIO
self._close_stream = close
if stream.seekable():
self._stream = stream
else:
@ -303,12 +434,12 @@ class ResourceFile(collections.abc.Mapping):
def _read_exact(self, byte_count: int) -> bytes:
"""Read byte_count bytes from the stream and raise an exception if too few bytes are read (i. e. if EOF was hit prematurely)."""
data = self._stream.read(byte_count)
if len(data) != byte_count:
raise InvalidResourceFileError(f"Attempted to read {byte_count} bytes of data, but only got {len(data)} bytes")
return data
try:
return _io_utils.read_exact(self._stream, byte_count)
except EOFError as e:
raise InvalidResourceFileError(str(e))
def _stream_unpack(self, st: struct.Struct) -> typing.Tuple:
def _stream_unpack(self, st: struct.Struct) -> tuple:
"""Unpack data from the stream according to the struct st. The number of bytes to read is determined using st.size, so variable-sized structs cannot be used with this method."""
try:
@ -316,17 +447,11 @@ class ResourceFile(collections.abc.Mapping):
except struct.error as e:
raise InvalidResourceFileError(str(e))
def _read_header(self):
def _read_header(self) -> None:
"""Read the resource file header, starting at the current stream position."""
assert self._stream.tell() == 0
self.data_offset: int
self.map_offset: int
self.data_length: int
self.map_length: int
self.header_system_data: bytes
self.header_application_data: bytes
(
self.data_offset,
self.map_offset,
@ -339,25 +464,23 @@ class ResourceFile(collections.abc.Mapping):
if self._stream.tell() != self.data_offset:
raise InvalidResourceFileError(f"The data offset ({self.data_offset}) should point exactly to the end of the file header ({self._stream.tell()})")
def _read_map_header(self):
def _read_map_header(self) -> None:
"""Read the map header, starting at the current stream position."""
assert self._stream.tell() == self.map_offset
self.map_type_list_offset: int
self.map_name_list_offset: int
(
_file_attributes,
self.map_type_list_offset,
self.map_name_list_offset,
) = self._stream_unpack(STRUCT_RESOURCE_MAP_HEADER)
self.file_attributes: ResourceFileAttrs = ResourceFileAttrs(_file_attributes)
self.file_attributes = ResourceFileAttrs(_file_attributes)
def _read_all_resource_types(self):
def _read_all_resource_types(self) -> None:
"""Read all resource types, starting at the current stream position."""
self._reference_counts: typing.MutableMapping[bytes, int] = collections.OrderedDict()
self._reference_counts = collections.OrderedDict()
(type_list_length_m1,) = self._stream_unpack(STRUCT_RESOURCE_TYPE_LIST_HEADER)
type_list_length = (type_list_length_m1 + 1) % 0x10000
@ -371,13 +494,13 @@ class ResourceFile(collections.abc.Mapping):
count = (count_m1 + 1) % 0x10000
self._reference_counts[resource_type] = count
def _read_all_references(self):
def _read_all_references(self) -> None:
"""Read all resource references, starting at the current stream position."""
self._references: typing.MutableMapping[bytes, typing.MutableMapping[int, typing.Tuple[int, ResourceAttrs, int]]] = collections.OrderedDict()
self._references = collections.OrderedDict()
for resource_type, count in self._reference_counts.items():
resmap: typing.MutableMapping[int, typing.Tuple[int, ResourceAttrs, int]] = collections.OrderedDict()
resmap: typing.MutableMapping[int, Resource] = collections.OrderedDict()
self._references[resource_type] = resmap
for _ in range(count):
(
@ -389,9 +512,9 @@ class ResourceFile(collections.abc.Mapping):
attributes = attributes_and_data_offset >> 24
data_offset = attributes_and_data_offset & ((1 << 24) - 1)
resmap[resource_id] = (name_offset, ResourceAttrs(attributes), data_offset)
resmap[resource_id] = Resource(self, resource_type, resource_id, name_offset, ResourceAttrs(attributes), data_offset)
def close(self):
def close(self) -> None:
"""Close this ResourceFile.
If close=True was passed when this ResourceFile was created, the underlying stream's close method is called as well.
@ -400,31 +523,37 @@ class ResourceFile(collections.abc.Mapping):
if self._close_stream:
self._stream.close()
def __enter__(self):
pass
def __enter__(self) -> "ResourceFile":
return self
def __exit__(self, exc_type, exc_val, exc_tb):
def __exit__(
self,
exc_type: typing.Optional[typing.Type[BaseException]],
exc_val: typing.Optional[BaseException],
exc_tb: typing.Optional[types.TracebackType]
) -> typing.Optional[bool]:
self.close()
return None
def __len__(self):
def __len__(self) -> int:
"""Get the number of resource types in this ResourceFile."""
return len(self._references)
def __iter__(self):
def __iter__(self) -> typing.Iterator[bytes]:
"""Iterate over all resource types in this ResourceFile."""
return iter(self._references)
def __contains__(self, key: bytes):
def __contains__(self, key: object) -> bool:
"""Check whether this ResourceFile contains any resources of the given type."""
return key in self._references
def __getitem__(self, key: bytes) -> "ResourceFile._LazyResourceMap":
def __getitem__(self, key: bytes) -> "_LazyResourceMap":
"""Get a lazy mapping of all resources with the given type in this ResourceFile."""
return ResourceFile._LazyResourceMap(self, key)
return _LazyResourceMap(key, self._references[key])
def __repr__(self):
def __repr__(self) -> str:
return f"<{type(self).__module__}.{type(self).__qualname__} at {id(self):#x}, attributes {self.file_attributes}, containing {len(self)} resource types: {list(self)}>"
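The hunk above changes `__enter__` from `pass` to `return self`. The sketch below shows why that matters, using a hypothetical `StreamOwner` class as a stand-in (it is not the real `ResourceFile`, and its constructor is an assumption made for illustration): without `return self`, the name bound by `with ... as` would be `None`.

```python
import io
import typing


class StreamOwner:
    """Hypothetical stand-in for a class that wraps a stream (not the real ResourceFile)."""

    def __init__(self, stream: typing.BinaryIO, *, close: bool = False) -> None:
        self._stream = stream
        self._close_stream = close

    def close(self) -> None:
        # Only close the underlying stream if the caller asked for it.
        if self._close_stream:
            self._stream.close()

    def __enter__(self) -> "StreamOwner":
        # Returning self is what makes `with StreamOwner(...) as owner:` bind the object.
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> typing.Optional[bool]:
        self.close()
        # Returning None (falsy) lets any exception from the with body propagate.
        return None


stream = io.BytesIO(b"data")
with StreamOwner(stream, close=True) as owner:
    first = owner._stream.read(2)
```

After the `with` block ends, `stream` has been closed because `close=True` was passed.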


@ -0,0 +1,68 @@
import io
import typing
from . import dcmp0
from . import dcmp1
from . import dcmp2
from .common import DecompressError, CompressedHeaderInfo, CompressedType8HeaderInfo, CompressedType9HeaderInfo
__all__ = [
"CompressedHeaderInfo",
"CompressedType8HeaderInfo",
"CompressedType9HeaderInfo",
"DecompressError",
"decompress",
"decompress_parsed",
"decompress_stream",
"decompress_stream_parsed",
]
# Maps 'dcmp' IDs to their corresponding Python implementations.
# Each decompressor has the signature (header_info: CompressedHeaderInfo, stream: typing.BinaryIO, *, debug: bool=False) -> typing.Iterator[bytes].
DECOMPRESSORS = {
0: dcmp0.decompress_stream,
1: dcmp1.decompress_stream,
2: dcmp2.decompress_stream,
}
def decompress_stream_parsed(header_info: CompressedHeaderInfo, stream: typing.BinaryIO, *, debug: bool = False) -> typing.Iterator[bytes]:
"""Decompress compressed resource data from a stream, whose header has already been read and parsed into a CompressedHeaderInfo object."""
try:
decompress_func = DECOMPRESSORS[header_info.dcmp_id]
except KeyError:
raise DecompressError(f"Unsupported 'dcmp' ID: {header_info.dcmp_id}")
decompressed_length = 0
for chunk in decompress_func(header_info, stream, debug=debug):
decompressed_length += len(chunk)
yield chunk
if decompressed_length != header_info.decompressed_length:
raise DecompressError(f"Actual length of decompressed data ({decompressed_length}) does not match length stored in resource ({header_info.decompressed_length})")
def decompress_parsed(header_info: CompressedHeaderInfo, data: bytes, *, debug: bool = False) -> bytes:
"""Decompress the given compressed resource data, whose header has already been removed and parsed into a CompressedHeaderInfo object."""
return b"".join(decompress_stream_parsed(header_info, io.BytesIO(data), debug=debug))
def decompress_stream(stream: typing.BinaryIO, *, debug: bool = False) -> typing.Iterator[bytes]:
"""Decompress compressed resource data from a stream."""
header_info = CompressedHeaderInfo.parse_stream(stream)
if debug:
print(f"Compressed resource data header: {header_info}")
yield from decompress_stream_parsed(header_info, stream, debug=debug)
def decompress(data: bytes, *, debug: bool = False) -> bytes:
"""Decompress the given compressed resource data."""
return b"".join(decompress_stream(io.BytesIO(data), debug=debug))
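`decompress_stream_parsed` counts the bytes it yields and only compares the total against the header once the underlying generator is exhausted. A minimal sketch of that pattern follows; `checked_chunks` and `LengthMismatchError` are hypothetical names used here in place of the real function and `DecompressError`.

```python
import typing


class LengthMismatchError(Exception):
    """Hypothetical error type, standing in for DecompressError."""


def checked_chunks(chunks: typing.Iterable[bytes], expected_length: int) -> typing.Iterator[bytes]:
    """Pass chunks through unchanged, then verify the total length at the end."""
    total = 0
    for chunk in chunks:
        total += len(chunk)
        yield chunk
    # This check only runs once the consumer has drained the generator.
    if total != expected_length:
        raise LengthMismatchError(f"Actual length ({total}) does not match expected length ({expected_length})")


ok = b"".join(checked_chunks([b"ab", b"cd"], 4))

try:
    b"".join(checked_chunks([b"ab", b"cd"], 5))
except LengthMismatchError:
    mismatch_detected = True
else:
    mismatch_detected = False
```

One consequence of this design is that a caller who stops iterating early never triggers the length check, so the check is only meaningful when the stream is consumed completely.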


@ -0,0 +1,133 @@
import io
import struct
import typing
from .. import _io_utils
class DecompressError(Exception):
"""Raised when resource data decompression fails, because the data is invalid or the compression type is not supported."""
# The signature of all compressed resource data, 0xa89f6572 in hex, or "®üer" in MacRoman.
COMPRESSED_SIGNATURE = b"\xa8\x9fer"
# The number of the "type 8" compression type. This type is used in the Finder, ResEdit, and some other system files.
COMPRESSED_TYPE_8 = 0x0801
# The number of the "type 9" compression type. This type is used in the System file and System 7.5's Installer.
COMPRESSED_TYPE_9 = 0x0901
# Common header for compressed resources of all types.
# 4 bytes: Signature (see above).
# 2 bytes: Length of the complete header (this common part and the type-specific part that follows it). (This meaning is just a guess - the field's value is always 0x0012, so there's no way to know for certain what it means.)
# 2 bytes: Compression type. Known so far: 0x0801 ("type 8") and 0x0901 ("type 9").
# 4 bytes: Length of the data after decompression.
# 6 bytes: Remainder of the header. The exact format varies depending on the compression type.
STRUCT_COMPRESSED_HEADER = struct.Struct(">4sHHI6s")
# Remainder of header for a "type 8" compressed resource.
# 1 byte: "Working buffer fractional size" - the ratio of the compressed data size to the uncompressed data size, times 256.
# 1 byte: "Expansion buffer size" - the maximum number of bytes that the data might grow during decompression.
# 2 bytes: The ID of the 'dcmp' resource that can decompress this resource. Currently only IDs 0 and 1 are supported.
# 2 bytes: Reserved (always zero).
STRUCT_COMPRESSED_TYPE_8_HEADER = struct.Struct(">BBhH")
# Remainder of header for a "type 9" compressed resource.
# 2 bytes: The ID of the 'dcmp' resource that can decompress this resource. Currently only ID 2 is supported.
# 4 bytes: Decompressor-specific parameters.
STRUCT_COMPRESSED_TYPE_9_HEADER = struct.Struct(">h4s")
class CompressedHeaderInfo(object):
@classmethod
def parse_stream(cls, stream: typing.BinaryIO) -> "CompressedHeaderInfo":
try:
signature, header_length, compression_type, decompressed_length, remainder = STRUCT_COMPRESSED_HEADER.unpack(stream.read(STRUCT_COMPRESSED_HEADER.size))
except struct.error:
raise DecompressError("Invalid header")
if signature != COMPRESSED_SIGNATURE:
raise DecompressError(f"Invalid signature: {signature!r}, expected {COMPRESSED_SIGNATURE!r}")
if header_length not in {0, 0x12}:
raise DecompressError(f"Unsupported header length value: 0x{header_length:>04x}, expected 0x12 or 0")
if compression_type == COMPRESSED_TYPE_8:
working_buffer_fractional_size, expansion_buffer_size, dcmp_id, reserved = STRUCT_COMPRESSED_TYPE_8_HEADER.unpack(remainder)
if reserved != 0:
raise DecompressError(f"Reserved field should be 0, not 0x{reserved:>04x}")
return CompressedType8HeaderInfo(header_length, compression_type, decompressed_length, dcmp_id, working_buffer_fractional_size, expansion_buffer_size)
elif compression_type == COMPRESSED_TYPE_9:
dcmp_id, parameters = STRUCT_COMPRESSED_TYPE_9_HEADER.unpack(remainder)
return CompressedType9HeaderInfo(header_length, compression_type, decompressed_length, dcmp_id, parameters)
else:
raise DecompressError(f"Unsupported compression type: 0x{compression_type:>04x}")
@classmethod
def parse(cls, data: bytes) -> "CompressedHeaderInfo":
return cls.parse_stream(io.BytesIO(data))
header_length: int
compression_type: int
decompressed_length: int
dcmp_id: int
def __init__(self, header_length: int, compression_type: int, decompressed_length: int, dcmp_id: int) -> None:
super().__init__()
self.header_length = header_length
self.compression_type = compression_type
self.decompressed_length = decompressed_length
self.dcmp_id = dcmp_id
class CompressedType8HeaderInfo(CompressedHeaderInfo):
working_buffer_fractional_size: int
expansion_buffer_size: int
def __init__(self, header_length: int, compression_type: int, decompressed_length: int, dcmp_id: int, working_buffer_fractional_size: int, expansion_buffer_size: int) -> None:
super().__init__(header_length, compression_type, decompressed_length, dcmp_id)
self.working_buffer_fractional_size = working_buffer_fractional_size
self.expansion_buffer_size = expansion_buffer_size
def __repr__(self) -> str:
return f"{type(self).__qualname__}(header_length={self.header_length}, compression_type=0x{self.compression_type:>04x}, decompressed_length={self.decompressed_length}, dcmp_id={self.dcmp_id}, working_buffer_fractional_size={self.working_buffer_fractional_size}, expansion_buffer_size={self.expansion_buffer_size})"
class CompressedType9HeaderInfo(CompressedHeaderInfo):
parameters: bytes
def __init__(self, header_length: int, compression_type: int, decompressed_length: int, dcmp_id: int, parameters: bytes) -> None:
super().__init__(header_length, compression_type, decompressed_length, dcmp_id)
self.parameters = parameters
def __repr__(self) -> str:
return f"{type(self).__qualname__}(header_length={self.header_length}, compression_type=0x{self.compression_type:>04x}, decompressed_length={self.decompressed_length}, dcmp_id={self.dcmp_id}, parameters={self.parameters!r})"
def read_exact(stream: typing.BinaryIO, byte_count: int) -> bytes:
"""Read byte_count bytes from the stream and raise an exception if too few bytes are read (i. e. if EOF was hit prematurely)."""
try:
return _io_utils.read_exact(stream, byte_count)
except EOFError as e:
raise DecompressError(str(e))
def read_variable_length_integer(stream: typing.BinaryIO) -> int:
"""Read a variable-length integer from the stream.
This variable-length integer format is used by the 0xfe codes in the compression formats used by 'dcmp' (0) and 'dcmp' (1).
"""
head = read_exact(stream, 1)
if head[0] == 0xff:
return int.from_bytes(read_exact(stream, 4), "big", signed=True)
elif head[0] >= 0x80:
data_modified = bytes([(head[0] - 0xc0) & 0xff]) + read_exact(stream, 1)
return int.from_bytes(data_modified, "big", signed=True)
else:
return int.from_bytes(head, "big", signed=True)
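The variable-length integer format read by `read_variable_length_integer` has three forms, selected by the first byte. The sketch below re-implements the same branches in a self-contained way so the encoding can be tried on sample bytes. The local `read_exact` helper is an assumption about how `_io_utils.read_exact` behaves (raise `EOFError` on a short read), and `read_varint` is a hypothetical name.

```python
import io
import typing


def read_exact(stream: typing.BinaryIO, byte_count: int) -> bytes:
    """Assumed behavior: read exactly byte_count bytes or raise EOFError."""
    data = stream.read(byte_count)
    if len(data) != byte_count:
        raise EOFError(f"Expected {byte_count} bytes, got {len(data)}")
    return data


def read_varint(stream: typing.BinaryIO) -> int:
    """Decode one variable-length integer using the same three branches as the diff."""
    head = read_exact(stream, 1)
    if head[0] == 0xff:
        # 5-byte form: 0xff followed by a signed 32-bit big-endian integer.
        return int.from_bytes(read_exact(stream, 4), "big", signed=True)
    elif head[0] >= 0x80:
        # 2-byte form: subtract 0xc0 from the first byte, then read as signed 16-bit.
        data_modified = bytes([(head[0] - 0xc0) & 0xff]) + read_exact(stream, 1)
        return int.from_bytes(data_modified, "big", signed=True)
    else:
        # 1-byte form: values 0x00 through 0x7f stand for themselves.
        return int.from_bytes(head, "big", signed=True)


one_byte = read_varint(io.BytesIO(b"\x05"))
two_byte_positive = read_varint(io.BytesIO(b"\xc0\x80"))
two_byte_negative = read_varint(io.BytesIO(b"\xbf\xff"))
five_byte = read_varint(io.BytesIO(b"\xff\xff\xff\xff\xfe"))
```

Here `one_byte` is 5, `two_byte_positive` is 128, `two_byte_negative` is -1, and `five_byte` is -2.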


@ -1,3 +1,5 @@
import typing
from . import common
# Lookup table for codes in range(0x4b, 0xfe).
@ -36,133 +38,103 @@ TABLE = [TABLE_DATA[i:i + 2] for i in range(0, len(TABLE_DATA), 2)]
assert len(TABLE) == len(range(0x4b, 0xfe))
def decompress(data: bytes, decompressed_length: int, *, debug: bool=False) -> bytes:
"""Decompress compressed data in the format used by 'dcmp' (0)."""
def decompress_stream_inner(header_info: common.CompressedHeaderInfo, stream: typing.BinaryIO, *, debug: bool = False) -> typing.Iterator[bytes]:
"""Internal helper function, implements the main decompression algorithm. Only called from decompress_stream, which performs some extra checks and debug logging."""
prev_literals = []
decompressed = b""
if not isinstance(header_info, common.CompressedType8HeaderInfo):
raise common.DecompressError(f"Incorrect header type: {type(header_info).__qualname__}")
i = 0
prev_literals: typing.List[bytes] = []
while i < len(data):
byte = data[i]
while True: # Loop is terminated when the EOF marker (0xff) is encountered
(byte,) = common.read_exact(stream, 1)
if debug:
print(f"Tag byte 0x{byte:>02x}, at 0x{i:x}, decompressing to 0x{len(decompressed):x}")
print(f"Tag byte 0x{byte:>02x}")
if byte in range(0x00, 0x20):
# Literal byte sequence.
if byte in (0x00, 0x10):
# The length of the literal data is stored in the next byte.
count_div2 = data[i+1]
begin = i + 2
(count_div2,) = common.read_exact(stream, 1)
else:
# The length of the literal data is stored in the low nibble of the tag byte.
count_div2 = byte >> 0 & 0xf
begin = i + 1
end = begin + 2*count_div2
count = 2 * count_div2
# Controls whether or not the literal is stored so that it can be referenced again later.
do_store = byte >= 0x10
literal = data[begin:end]
literal = common.read_exact(stream, count)
if debug:
print(f"Literal (storing: {do_store})")
print(f"\t-> {literal}")
decompressed += literal
if do_store:
if debug:
print(f"\t-> stored as literal number 0x{len(prev_literals):x}")
print(f"\t-> storing as literal number 0x{len(prev_literals):x}")
prev_literals.append(literal)
i = end
yield literal
elif byte in (0x20, 0x21):
# Backreference to a previous literal, 2-byte form.
# This can reference literals with index in range(0x28, 0x228).
table_index = 0x28 + ((byte - 0x20) << 8 | data[i+1])
i += 2
(next_byte,) = common.read_exact(stream, 1)
table_index = 0x28 + ((byte - 0x20) << 8 | next_byte)
if debug:
print(f"Backreference (2-byte form) to 0x{table_index:>02x}")
literal = prev_literals[table_index]
if debug:
print(f"\t-> {literal}")
decompressed += literal
yield prev_literals[table_index]
elif byte == 0x22:
# Backreference to a previous literal, 3-byte form.
# This can reference any literal with index 0x28 and higher, but is only necessary for literals with index 0x228 and higher.
table_index = 0x28 + int.from_bytes(data[i+1:i+3], "big", signed=False)
i += 3
table_index = 0x28 + int.from_bytes(common.read_exact(stream, 2), "big", signed=False)
if debug:
print(f"Backreference (3-byte form) to 0x{table_index:>02x}")
literal = prev_literals[table_index]
if debug:
print(f"\t-> {literal}")
decompressed += literal
yield prev_literals[table_index]
elif byte in range(0x23, 0x4b):
# Backreference to a previous literal, 1-byte form.
# This can reference literals with indices in range(0x28).
table_index = byte - 0x23
i += 1
if debug:
print(f"Backreference (1-byte form) to 0x{table_index:>02x}")
literal = prev_literals[table_index]
if debug:
print(f"\t-> {literal}")
decompressed += literal
yield prev_literals[table_index]
elif byte in range(0x4b, 0xfe):
# Reference into a fixed table of two-byte literals.
# All compressed resources use the same table.
table_index = byte - 0x4b
i += 1
if debug:
print(f"Fixed table reference to 0x{table_index:>02x}")
entry = TABLE[table_index]
if debug:
print(f"\t-> {entry}")
decompressed += entry
yield TABLE[table_index]
elif byte == 0xfe:
# Extended code, whose meaning is controlled by the following byte.
i += 1
kind = data[i]
(kind,) = common.read_exact(stream, 1)
if debug:
print(f"Extended code: 0x{kind:>02x}")
i += 1
if kind == 0x00:
# Compact representation of (part of) a segment loader jump table, as used in 'CODE' (0) resources.
if debug:
print(f"Segment loader jump table entries")
print("Segment loader jump table entries")
# All generated jump table entries have the same segment number.
segment_number_int, length = common._read_variable_length_integer(data, i)
i += length
segment_number_int = common.read_variable_length_integer(stream)
if debug:
print(f"\t-> segment number: {segment_number_int:#x}")
# The tail part of all jump table entries (i. e. everything except for the address).
entry_tail = b"?<" + segment_number_int.to_bytes(2, "big", signed=True) + b"\xa9\xf0"
if debug:
print(f"\t-> tail of first entry: {entry_tail}")
entry_tail = b"?<" + segment_number_int.to_bytes(2, "big", signed=False) + b"\xa9\xf0"
# The tail is output once *without* an address in front, i. e. the first entry's address must be generated manually by a previous code.
decompressed += entry_tail
yield entry_tail
count, length = common._read_variable_length_integer(data, i)
i += length
count = common.read_variable_length_integer(stream)
if count <= 0:
raise common.DecompressError(f"Jump table entry count must be greater than 0, not {count}")
# The second entry's address is stored explicitly.
current_int, length = common._read_variable_length_integer(data, i)
i += length
current_int = common.read_variable_length_integer(stream)
if debug:
print(f"-> address of second entry: {current_int:#x}")
entry = current_int.to_bytes(2, "big", signed=False) + entry_tail
if debug:
print(f"-> second entry: {entry}")
decompressed += entry
print(f"\t-> address of second entry: {current_int:#x}")
yield current_int.to_bytes(2, "big", signed=False) + entry_tail
for _ in range(1, count):
# All further entries' addresses are stored as differences relative to the previous entry's address.
diff, length = common._read_variable_length_integer(data, i)
i += length
diff = common.read_variable_length_integer(stream)
# For some reason, each difference is 6 higher than it should be.
diff -= 6
@ -170,10 +142,7 @@ def decompress(data: bytes, decompressed_length: int, *, debug: bool=False) -> b
current_int = (current_int + diff) & 0xffff
if debug:
print(f"\t-> difference {diff:#x}: {current_int:#x}")
entry = current_int.to_bytes(2, "big", signed=False) + entry_tail
if debug:
print(f"\t-> {entry}")
decompressed += entry
yield current_int.to_bytes(2, "big", signed=False) + entry_tail
elif kind in (0x02, 0x03):
# Repeat 1 or 2 bytes a certain number of times.
@ -188,42 +157,36 @@ def decompress(data: bytes, decompressed_length: int, *, debug: bool=False) -> b
print(f"Repeat {byte_count}-byte value")
# The byte(s) to repeat, stored as a variable-length integer. The value is treated as unsigned, i. e. the integer is never negative.
to_repeat_int, length = common._read_variable_length_integer(data, i)
i += length
to_repeat_int = common.read_variable_length_integer(stream)
try:
to_repeat = to_repeat_int.to_bytes(byte_count, "big", signed=False)
except OverflowError:
raise common.DecompressError(f"Value to repeat out of range for {byte_count}-byte repeat: {to_repeat_int:#x}")
count_m1, length = common._read_variable_length_integer(data, i)
i += length
count = count_m1 + 1
count = common.read_variable_length_integer(stream) + 1
if count <= 0:
raise common.DecompressError(f"Repeat count must be positive: {count}")
repeated = to_repeat * count
if debug:
print(f"\t-> {to_repeat} * {count}: {repeated}")
decompressed += repeated
print(f"\t-> {to_repeat!r} * {count}")
yield to_repeat * count
elif kind == 0x04:
# A sequence of 16-bit signed integers, with each integer encoded as a difference relative to the previous integer. The first integer is stored explicitly.
if debug:
print(f"Difference-encoded 16-bit integers")
print("Difference-encoded 16-bit integers")
# The first integer is stored explicitly, as a signed value.
initial_int, length = common._read_variable_length_integer(data, i)
i += length
initial_int = common.read_variable_length_integer(stream)
try:
initial = initial_int.to_bytes(2, "big", signed=True)
except OverflowError:
raise common.DecompressError(f"Initial value out of range for 16-bit integer difference encoding: {initial_int:#x}")
if debug:
print(f"\t-> initial: {initial}")
decompressed += initial
print(f"\t-> initial: 0x{initial_int:>04x}")
yield initial
count, length = common._read_variable_length_integer(data, i)
i += length
count = common.read_variable_length_integer(stream)
if count < 0:
raise common.DecompressError(f"Count cannot be negative: {count}")
@ -232,64 +195,75 @@ def decompress(data: bytes, decompressed_length: int, *, debug: bool=False) -> b
for _ in range(count):
# The difference to the previous integer is stored as an 8-bit signed integer.
# The usual variable-length integer format is *not* used here.
diff = int.from_bytes(data[i:i+1], "big", signed=True)
i += 1
diff = int.from_bytes(common.read_exact(stream, 1), "big", signed=True)
# Simulate 16-bit integer wraparound.
current_int = (current_int + diff) & 0xffff
current = current_int.to_bytes(2, "big", signed=False)
if debug:
print(f"\t-> difference {diff:#x}: {current}")
decompressed += current
print(f"\t-> difference {diff:#x}: 0x{current_int:>04x}")
yield current_int.to_bytes(2, "big", signed=False)
elif kind == 0x06:
# A sequence of 32-bit signed integers, with each integer encoded as a difference relative to the previous integer. The first integer is stored explicitly.
if debug:
print(f"Difference-encoded 16-bit integers")
print("Difference-encoded 32-bit integers")
# The first integer is stored explicitly, as a signed value.
initial_int, length = common._read_variable_length_integer(data, i)
i += length
initial_int = common.read_variable_length_integer(stream)
try:
initial = initial_int.to_bytes(4, "big", signed=True)
except OverflowError:
raise common.DecompressError(f"Initial value out of range for 32-bit integer difference encoding: {initial_int:#x}")
if debug:
print(f"\t-> initial: {initial}")
decompressed += initial
print(f"\t-> initial: 0x{initial_int:>08x}")
yield initial
count, length = common._read_variable_length_integer(data, i)
i += length
count = common.read_variable_length_integer(stream)
assert count >= 0
# To make the following calculations simpler, the signed initial_int value is converted to unsigned.
current_int = initial_int & 0xffffffff
for _ in range(count):
# The difference to the previous integer is stored as a variable-length integer, whose value may be negative.
diff, length = common._read_variable_length_integer(data, i)
i += length
diff = common.read_variable_length_integer(stream)
# Simulate 32-bit integer wraparound.
current_int = (current_int + diff) & 0xffffffff
current = current_int.to_bytes(4, "big", signed=False)
if debug:
print(f"\t-> difference {diff:#x}: {current}")
decompressed += current
print(f"\t-> difference {diff:#x}: 0x{current_int:>08x}")
yield current_int.to_bytes(4, "big", signed=False)
else:
raise common.DecompressError(f"Unknown extended code: 0x{kind:>02x}")
elif byte == 0xff:
# End of data marker, always occurs exactly once as the last byte of the compressed data.
if debug:
print("End marker")
if i != len(data) - 1:
raise common.DecompressError(f"End marker reached at {i}, before the expected end of data at {len(data) - 1}")
i += 1
# Check that there really is no more data left.
extra = stream.read(1)
if extra:
raise common.DecompressError(f"Extra data encountered after end of data marker (first extra byte: {extra!r})")
break
else:
raise common.DecompressError(f"Unknown tag byte: 0x{data[i]:>02x}")
raise common.DecompressError(f"Unknown tag byte: 0x{byte:>02x}")
def decompress_stream(header_info: common.CompressedHeaderInfo, stream: typing.BinaryIO, *, debug: bool = False) -> typing.Iterator[bytes]:
"""Decompress compressed data in the format used by 'dcmp' (0)."""
if decompressed_length % 2 != 0 and len(decompressed) == decompressed_length + 1:
# Special case: if the decompressed data length stored in the header is odd and one less than the length of the actual decompressed data, drop the last byte.
# This is necessary because nearly all codes generate data in groups of 2 or 4 bytes, so it is basically impossible to represent data with an odd length using this compression format.
decompressed = decompressed[:-1]
return decompressed
decompressed_length = 0
for chunk in decompress_stream_inner(header_info, stream, debug=debug):
if debug:
print(f"\t-> {chunk!r}")
if header_info.decompressed_length % 2 != 0 and decompressed_length + len(chunk) == header_info.decompressed_length + 1:
# Special case: if the decompressed data length stored in the header is odd and one less than the length of the actual decompressed data, drop the last byte.
# This is necessary because nearly all codes generate data in groups of 2 or 4 bytes, so it is basically impossible to represent data with an odd length using this compression format.
decompressed_length += len(chunk) - 1
yield chunk[:-1]
else:
decompressed_length += len(chunk)
yield chunk
if debug:
print(f"Decompressed {decompressed_length:#x} bytes so far")
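The new `decompress_stream` for 'dcmp' (0) moves the odd-length special case from a single slice on the finished buffer into the chunk loop. The following is a minimal sketch of that streaming trim in isolation; `trim_to_odd_length` is a hypothetical name, and the chunks are plain byte strings rather than real decompressor output.

```python
import typing


def trim_to_odd_length(chunks: typing.Iterable[bytes], expected_length: int) -> typing.Iterator[bytes]:
    """Drop one trailing byte when the expected length is odd and a chunk overshoots it by one."""
    produced = 0
    for chunk in chunks:
        if expected_length % 2 != 0 and produced + len(chunk) == expected_length + 1:
            # The codes emit data in groups of 2 or 4 bytes, so an odd-length
            # result is represented with one surplus byte that is dropped here.
            produced += len(chunk) - 1
            yield chunk[:-1]
        else:
            produced += len(chunk)
            yield chunk


odd_result = b"".join(trim_to_odd_length([b"ab", b"cd"], 3))
even_result = b"".join(trim_to_odd_length([b"ab", b"cd"], 4))
```

With an expected length of 3 the last byte is dropped, and with an expected length of 4 the chunks pass through unchanged.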


@ -1,3 +1,5 @@
import typing
from . import common
# Lookup table for codes in range(0xd5, 0xfe).
@ -19,96 +21,75 @@ TABLE = [TABLE_DATA[i:i + 2] for i in range(0, len(TABLE_DATA), 2)]
assert len(TABLE) == len(range(0xd5, 0xfe))
def decompress(data: bytes, decompressed_length: int, *, debug: bool=False) -> bytes:
"""Decompress compressed data in the format used by 'dcmp' (1)."""
def decompress_stream_inner(header_info: common.CompressedHeaderInfo, stream: typing.BinaryIO, *, debug: bool = False) -> typing.Iterator[bytes]:
"""Internal helper function, implements the main decompression algorithm. Only called from decompress_stream, which performs some extra checks and debug logging."""
prev_literals = []
decompressed = b""
if not isinstance(header_info, common.CompressedType8HeaderInfo):
raise common.DecompressError(f"Incorrect header type: {type(header_info).__qualname__}")
i = 0
prev_literals: typing.List[bytes] = []
while i < len(data):
byte = data[i]
while True: # Loop is terminated when the EOF marker (0xff) is encountered
(byte,) = common.read_exact(stream, 1)
if debug:
print(f"Tag byte 0x{byte:>02x}, at 0x{i:x}, decompressing to 0x{len(decompressed):x}")
print(f"Tag byte 0x{byte:>02x}")
if byte in range(0x00, 0x20):
# Literal byte sequence, 1-byte header.
# The length of the literal data is stored in the low nibble of the tag byte.
count = (byte >> 0 & 0xf) + 1
begin = i + 1
end = begin + count
# Controls whether or not the literal is stored so that it can be referenced again later.
do_store = byte >= 0x10
literal = data[begin:end]
literal = common.read_exact(stream, count)
if debug:
print(f"Literal (1-byte header, storing: {do_store})")
print(f"\t-> {literal}")
decompressed += literal
if do_store:
if debug:
print(f"\t-> stored as literal number 0x{len(prev_literals):x}")
print(f"\t-> storing as literal number 0x{len(prev_literals):x}")
prev_literals.append(literal)
i = end
yield literal
elif byte in range(0x20, 0xd0):
# Backreference to a previous literal, 1-byte form.
# This can reference literals with indices in range(0xb0).
table_index = byte - 0x20
i += 1
if debug:
print(f"Backreference (1-byte form) to 0x{table_index:>02x}")
literal = prev_literals[table_index]
if debug:
print(f"\t-> {literal}")
decompressed += literal
yield prev_literals[table_index]
elif byte in (0xd0, 0xd1):
# Literal byte sequence, 2-byte header.
# The length of the literal data is stored in the following byte.
count = data[i+1]
begin = i + 2
end = begin + count
(count,) = common.read_exact(stream, 1)
# Controls whether or not the literal is stored so that it can be referenced again later.
do_store = byte == 0xd1
literal = data[begin:end]
literal = common.read_exact(stream, count)
if debug:
print(f"Literal (2-byte header, storing: {do_store})")
print(f"\t-> {literal}")
decompressed += literal
if do_store:
if debug:
print(f"\t-> stored as literal number 0x{len(prev_literals):x}")
print(f"\t-> storing as literal number 0x{len(prev_literals):x}")
prev_literals.append(literal)
i = end
yield literal
elif byte == 0xd2:
# Backreference to a previous literal, 2-byte form.
# This can reference literals with indices in range(0xb0, 0x1b0).
table_index = data[i+1] + 0xb0
i += 2
(next_byte,) = common.read_exact(stream, 1)
table_index = next_byte + 0xb0
if debug:
print(f"Backreference (2-byte form) to 0x{table_index:>02x}")
literal = prev_literals[table_index]
if debug:
print(f"\t-> {literal}")
decompressed += literal
yield prev_literals[table_index]
elif byte in range(0xd5, 0xfe):
# Reference into a fixed table of two-byte literals.
# All compressed resources use the same table.
table_index = byte - 0xd5
i += 1
if debug:
print(f"Fixed table reference to 0x{table_index:>02x}")
entry = TABLE[table_index]
if debug:
print(f"\t-> {entry}")
decompressed += entry
yield TABLE[table_index]
elif byte == 0xfe:
# Extended code, whose meaning is controlled by the following byte.
i += 1
kind = data[i]
(kind,) = common.read_exact(stream, 1)
if debug:
print(f"Extended code: 0x{kind:>02x}")
i += 1
if kind == 0x02:
# Repeat 1 byte a certain number of times.
@ -119,33 +100,45 @@ def decompress(data: bytes, decompressed_length: int, *, debug: bool=False) -> b
print(f"Repeat {byte_count}-byte value")
# The byte(s) to repeat, stored as a variable-length integer. The value is treated as unsigned, i. e. the integer is never negative.
to_repeat_int, length = common._read_variable_length_integer(data, i)
i += length
to_repeat_int = common.read_variable_length_integer(stream)
try:
to_repeat = to_repeat_int.to_bytes(byte_count, "big", signed=False)
except OverflowError:
raise common.DecompressError(f"Value to repeat out of range for {byte_count}-byte repeat: {to_repeat_int:#x}")
count_m1, length = common._read_variable_length_integer(data, i)
i += length
count = count_m1 + 1
count = common.read_variable_length_integer(stream) + 1
if count <= 0:
raise common.DecompressError(f"Repeat count must be positive: {count}")
repeated = to_repeat * count
if debug:
print(f"\t-> {to_repeat} * {count}: {repeated}")
decompressed += repeated
print(f"\t-> {to_repeat!r} * {count}")
yield to_repeat * count
else:
raise common.DecompressError(f"Unknown extended code: 0x{kind:>02x}")
elif byte == 0xff:
# End of data marker, always occurs exactly once as the last byte of the compressed data.
if debug:
print("End marker")
if i != len(data) - 1:
raise common.DecompressError(f"End marker reached at {i}, before the expected end of data at {len(data) - 1}")
i += 1
# Check that there really is no more data left.
extra = stream.read(1)
if extra:
raise common.DecompressError(f"Extra data encountered after end of data marker (first extra byte: {extra!r})")
break
else:
raise common.DecompressError(f"Unknown tag byte: 0x{data[i]:>02x}")
raise common.DecompressError(f"Unknown tag byte: 0x{byte:>02x}")
def decompress_stream(header_info: common.CompressedHeaderInfo, stream: typing.BinaryIO, *, debug: bool = False) -> typing.Iterator[bytes]:
"""Decompress compressed data in the format used by 'dcmp' (1)."""
return decompressed
decompressed_length = 0
for chunk in decompress_stream_inner(header_info, stream, debug=debug):
if debug:
print(f"\t-> {chunk!r}")
decompressed_length += len(chunk)
yield chunk
if debug:
print(f"Decompressed {decompressed_length:#x} bytes so far")
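To make the 'dcmp' (1) tag bytes easier to follow, here is a reduced sketch that handles only three of the cases shown in the diff: 1-byte-header literals (0x00 to 0x1f), 1-byte backreferences (0x20 to 0xcf), and the end marker (0xff). It is not the full algorithm: the fixed table, the 2-byte forms, and the extended codes are left out. The function name and the use of `ValueError` instead of `DecompressError` are assumptions made for illustration.

```python
import io
import typing


def decode_literals_and_backrefs(data: bytes) -> bytes:
    """Decode a reduced subset of the 'dcmp' (1) tag bytes (hypothetical helper)."""
    stream = io.BytesIO(data)
    prev_literals: typing.List[bytes] = []
    out: typing.List[bytes] = []
    while True:  # Terminated by the end marker (0xff).
        tag_data = stream.read(1)
        if not tag_data:
            raise ValueError("Missing end marker")
        (tag,) = tag_data
        if tag < 0x20:
            # Literal: the low nibble stores the length minus one.
            count = (tag & 0xf) + 1
            literal = stream.read(count)
            if len(literal) != count:
                raise ValueError("Truncated literal")
            # Tags 0x10 through 0x1f also store the literal for later backreferences.
            if tag >= 0x10:
                prev_literals.append(literal)
            out.append(literal)
        elif tag < 0xd0:
            # Backreference to a stored literal with index in range(0xb0).
            out.append(prev_literals[tag - 0x20])
        elif tag == 0xff:
            break
        else:
            raise ValueError(f"Tag byte not covered by this sketch: 0x{tag:>02x}")
    return b"".join(out)


# Stored literal "AB", unstored literal "C", backreference to literal 0, end marker.
decoded = decode_literals_and_backrefs(b"\x11AB" + b"\x00C" + b"\x20" + b"\xff")
```

The sample input decodes to `b"ABCAB"`: the backreference repeats the stored literal, while the unstored literal `b"C"` never enters the table.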


@ -2,6 +2,7 @@ import enum
import struct
import typing
from .. import _io_utils
from . import common
@ -73,68 +74,73 @@ def _split_bits(i: int) -> typing.Tuple[bool, bool, bool, bool, bool, bool, bool
)
def _decompress_system_untagged(data: bytes, decompressed_length: int, table: typing.Sequence[bytes], *, debug: bool=False) -> bytes:
parts = []
i = 0
while i < len(data):
if i == len(data) - 1 and decompressed_length % 2 != 0:
def _decompress_untagged(stream: "_io_utils.PeekableIO", decompressed_length: int, table: typing.Sequence[bytes], *, debug: bool = False) -> typing.Iterator[bytes]:
while True: # Loop is terminated when EOF is reached.
table_index_data = stream.read(1)
if not table_index_data:
# End of compressed data.
break
elif not stream.peek(1) and decompressed_length % 2 != 0:
# Special case: if we are at the last byte of the compressed data, and the decompressed data has an odd length, the last byte is a single literal byte, and not a table reference.
if debug:
print(f"Last byte: {data[-1:]}")
parts.append(data[-1:])
print(f"Last byte: {table_index_data!r}")
yield table_index_data
break
# Compressed data is untagged, every byte is a table reference.
(table_index,) = table_index_data
if debug:
print(f"Reference: {data[i]} -> {table[data[i]]}")
parts.append(table[data[i]])
i += 1
return b"".join(parts)
print(f"Reference: {table_index} -> {table[table_index]!r}")
yield table[table_index]
def _decompress_system_tagged(data: bytes, decompressed_length: int, table: typing.Sequence[bytes], *, debug: bool=False) -> bytes:
parts = []
i = 0
while i < len(data):
if i == len(data) - 1 and decompressed_length % 2 != 0:
def _decompress_tagged(stream: "_io_utils.PeekableIO", decompressed_length: int, table: typing.Sequence[bytes], *, debug: bool = False) -> typing.Iterator[bytes]:
while True: # Loop is terminated when EOF is reached.
tag_data = stream.read(1)
if not tag_data:
# End of compressed data.
break
elif not stream.peek(1) and decompressed_length % 2 != 0:
# Special case: if we are at the last byte of the compressed data, and the decompressed data has an odd length, the last byte is a single literal byte, and not a tag or a table reference.
if debug:
print(f"Last byte: {data[-1:]}")
parts.append(data[-1:])
print(f"Last byte: {tag_data!r}")
yield tag_data
break
# Compressed data is tagged, each tag byte is followed by 8 table references and/or literals.
tag = data[i]
(tag,) = tag_data
if debug:
print(f"Tag: 0b{tag:>08b}")
i += 1
for is_ref in _split_bits(tag):
if is_ref:
# This is a table reference (a single byte that is an index into the table).
table_index_data = stream.read(1)
if not table_index_data:
# End of compressed data.
break
(table_index,) = table_index_data
if debug:
print(f"Reference: {data[i]} -> {table[data[i]]}")
parts.append(table[data[i]])
i += 1
print(f"Reference: {table_index} -> {table[table_index]!r}")
yield table[table_index]
else:
# This is a literal (two uncompressed bytes that are literally copied into the output).
# Note: if i == len(data)-1, the literal is actually only a single byte long.
# This case is handled automatically - the slice extends one byte past the end of the data, and only one byte is returned.
literal = stream.read(2)
if not literal:
# End of compressed data.
break
# Note: the literal may be only a single byte long if it is located exactly at EOF. This is intended and expected - the 1-byte literal is yielded normally, and on the next iteration, decompression is terminated as EOF is detected.
if debug:
print(f"Literal: {data[i:i+2]}")
parts.append(data[i:i + 2])
i += 2
# If the end of the compressed data is reached in the middle of a chunk, all further tag bits are ignored (they should be zero) and decompression ends.
if i >= len(data):
break
return b"".join(parts)
print(f"Literal: {literal!r}")
yield literal
def decompress(data: bytes, decompressed_length: int, parameters: bytes, *, debug: bool=False) -> bytes:
def decompress_stream(header_info: common.CompressedHeaderInfo, stream: typing.BinaryIO, *, debug: bool = False) -> typing.Iterator[bytes]:
"""Decompress compressed data in the format used by 'dcmp' (2)."""
unknown, table_count_m1, flags_raw = STRUCT_PARAMETERS.unpack(parameters)
if not isinstance(header_info, common.CompressedType9HeaderInfo):
raise common.DecompressError(f"Incorrect header type: {type(header_info).__qualname__}")
unknown, table_count_m1, flags_raw = STRUCT_PARAMETERS.unpack(header_info.parameters)
if debug:
print(f"Value of unknown parameter field: 0x{unknown:>04x}")
@ -152,24 +158,21 @@ def decompress(data: bytes, decompressed_length: int, parameters: bytes, *, debu
print(f"Flags: {flags}")
if ParameterFlags.CUSTOM_TABLE in flags:
table_start = 0
data_start = table_start + table_count * 2
table = []
for i in range(table_start, data_start, 2):
table.append(data[i:i + 2])
for _ in range(table_count):
table.append(common.read_exact(stream, 2))
if debug:
print(f"Using custom table: {table}")
else:
if table_count_m1 != 0:
raise common.DecompressError(f"table_count_m1 field is {table_count_m1}, but must be zero when the default table is used")
table = DEFAULT_TABLE
data_start = 0
if debug:
print("Using default table")
if ParameterFlags.TAGGED in flags:
decompress_func = _decompress_system_tagged
decompress_func = _decompress_tagged
else:
decompress_func = _decompress_system_untagged
decompress_func = _decompress_untagged
return decompress_func(data[data_start:], decompressed_length, table, debug=debug)
yield from decompress_func(_io_utils.make_peekable(stream), header_info.decompressed_length, table, debug=debug)
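The tagged scheme described by the comments in `_decompress_tagged` can be sketched in a self-contained form. This is only an illustration under stated assumptions, not the library's implementation: `split_bits` and `decompress_tagged_sketch` are hypothetical names, the most-significant-bit-first order of the tag bits is assumed, and an `io.BytesIO` position check stands in for the peekable stream wrapper.

```python
import io
import typing


def split_bits(byte: int) -> typing.List[bool]:
    """Split a byte into 8 flags, most significant bit first (assumed order)."""
    return [bool(byte & (1 << shift)) for shift in range(7, -1, -1)]


def decompress_tagged_sketch(
    data: bytes,
    decompressed_length: int,
    table: typing.Sequence[bytes],
) -> typing.Iterator[bytes]:
    """Yield decompressed chunks from tagged data (hypothetical helper)."""
    stream = io.BytesIO(data)
    while True:
        tag_data = stream.read(1)
        if not tag_data:
            # End of compressed data.
            break
        # Stand-in for "not stream.peek(1)": nothing is left after this byte.
        at_last_byte = stream.tell() == len(data)
        if at_last_byte and decompressed_length % 2 != 0:
            # Odd output length: the final byte is a lone literal, not a tag.
            yield tag_data
            break
        (tag,) = tag_data
        for is_ref in split_bits(tag):
            if is_ref:
                # Table reference: one byte that indexes into the table.
                index_data = stream.read(1)
                if not index_data:
                    break
                yield table[index_data[0]]
            else:
                # Literal: two bytes copied as-is (one byte if exactly at EOF).
                literal = stream.read(2)
                if not literal:
                    break
                yield literal


# Made-up sample input: tag 0b10100000 means reference, literal, reference.
table = [b"AB", b"CD"]
compressed = bytes([0b10100000, 0]) + b"xy" + bytes([1])
print(b"".join(decompress_tagged_sketch(compressed, 6, table)))  # b'ABxyCD'
```

Yielding chunks instead of appending to a list mirrors the generator style of the new code, so a caller can join or stream the output as needed.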

0
src/rsrcfork/py.typed Normal file

Binary file not shown. Size: 35 KiB
Binary file not shown. Size: 355 KiB
Binary file not shown. Size: 127 KiB
Binary file not shown. Size: 884 KiB
Binary file not shown. Size: 51 KiB
Binary file not shown. Size: 478 KiB
Binary file not shown. Size: 159 KiB
Binary file not shown. Size: 1.1 MiB

BIN
tests/data/empty.rsrc Normal file

Binary file not shown. Size: 286 B

BIN
tests/data/testfile.rsrc Normal file

Binary file not shown. Size: 558 B

Binary file not shown. Size: 602 B

329
tests/test_rsrcfork.py Normal file

@ -0,0 +1,329 @@
import collections
import io
import pathlib
import shutil
import sys
import tempfile
import typing
import unittest
import rsrcfork
RESOURCE_FORKS_SUPPORTED = sys.platform.startswith("darwin")
RESOURCE_FORKS_NOT_SUPPORTED_MESSAGE = "Resource forks are only supported on Mac"
DATA_DIR = pathlib.Path(__file__).parent / "data"
EMPTY_RSRC_FILE = DATA_DIR / "empty.rsrc"
TEXTCLIPPING_RSRC_FILE = DATA_DIR / "unicode.textClipping.rsrc"
TESTFILE_RSRC_FILE = DATA_DIR / "testfile.rsrc"
COMPRESS_DATA_DIR = DATA_DIR / "compress"
COMPRESSED_DIR = COMPRESS_DATA_DIR / "compressed"
UNCOMPRESSED_DIR = COMPRESS_DATA_DIR / "uncompressed"
COMPRESS_RSRC_FILE_NAMES = [
"Finder.rsrc",
"Finder Help.rsrc",
# "Install.rsrc", # Commented out for performance - this file contains a lot of small resources.
"System.rsrc",
]
def make_pascal_string(s):
return bytes([len(s)]) + s
UNICODE_TEXT = "Here is some text, including Üñïçø∂é!"
DRAG_DATA = (
b"\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03"
b"utxt\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00"
b"utf8\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00"
b"TEXT\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00"
)
TEXTCLIPPING_RESOURCES = collections.OrderedDict([
(b"utxt", collections.OrderedDict([
(256, UNICODE_TEXT.encode("utf-16-be")),
])),
(b"utf8", collections.OrderedDict([
(256, UNICODE_TEXT.encode("utf-8")),
])),
(b"TEXT", collections.OrderedDict([
(256, UNICODE_TEXT.encode("macroman")),
])),
(b"drag", collections.OrderedDict([
(128, DRAG_DATA),
]))
])
TESTFILE_HEADER_SYSTEM_DATA = (
b"\xa7F$\x08 <\x00\x00\xab\x03\xa7F <\x00\x00"
b"\x01\x00\xb4\x88f\x06`\np\x00`\x06 <\x00\x00"
b"\x08testfile\x00\x02\x00\x02\x00rs"
b"rcRSED\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
b"\x02\x00rsrcRSED\x00\x00\x00\x00\x00\x00"
b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
b"\x00\x00\xdaIp~\x00\x00\x00\x00\x00\x00\x02.\xfe\x84"
)
TESTFILE_HEADER_APPLICATION_DATA = b"This is the application-specific header data section. Apparently I can write whatever nonsense I want here. A few more bytes...."
TESTFILE_RESOURCES = collections.OrderedDict([
(b"STR ", collections.OrderedDict([
(128, (
None, rsrcfork.ResourceAttrs(0),
make_pascal_string(b"The String, without name or attributes"),
)),
(129, (
b"The Name", rsrcfork.ResourceAttrs(0),
make_pascal_string(b"The String, with name and no attributes"),
)),
(130, (
None, rsrcfork.ResourceAttrs.resProtected | rsrcfork.ResourceAttrs.resPreload,
make_pascal_string(b"The String, without name but with attributes"),
)),
(131, (
b"The Name with Attributes", rsrcfork.ResourceAttrs.resSysHeap,
make_pascal_string(b"The String, with both name and attributes"),
)),
])),
])
class UnseekableStreamWrapper(io.BufferedIOBase):
_wrapped: typing.BinaryIO
def __init__(self, wrapped: typing.BinaryIO) -> None:
super().__init__()
self._wrapped = wrapped
def read(self, size: typing.Optional[int] = -1) -> bytes:
return self._wrapped.read(size)
def open_resource_fork(path: pathlib.Path, mode: str) -> typing.BinaryIO:
return (path / "..namedfork" / "rsrc").open(mode)
class ResourceFileReadTests(unittest.TestCase):
def test_empty(self) -> None:
with rsrcfork.open(EMPTY_RSRC_FILE, fork="data") as rf:
self.assertEqual(rf.header_system_data, bytes(112))
self.assertEqual(rf.header_application_data, bytes(128))
self.assertEqual(rf.file_attributes, rsrcfork.ResourceFileAttrs(0))
self.assertEqual(list(rf), [])
def internal_test_textclipping(self, rf: rsrcfork.ResourceFile) -> None:
self.assertEqual(rf.header_system_data, bytes(112))
self.assertEqual(rf.header_application_data, bytes(128))
self.assertEqual(rf.file_attributes, rsrcfork.ResourceFileAttrs(0))
self.assertEqual(list(rf), list(TEXTCLIPPING_RESOURCES))
for (actual_type, actual_reses), (expected_type, expected_reses) in zip(rf.items(), TEXTCLIPPING_RESOURCES.items()):
with self.subTest(type=expected_type):
self.assertEqual(actual_type, expected_type)
self.assertEqual(list(actual_reses), list(expected_reses))
for (actual_id, actual_res), (expected_id, expected_data) in zip(actual_reses.items(), expected_reses.items()):
with self.subTest(id=expected_id):
self.assertEqual(actual_res.type, expected_type)
self.assertEqual(actual_id, expected_id)
self.assertEqual(actual_res.id, expected_id)
self.assertEqual(actual_res.name, None)
self.assertEqual(actual_res.attributes, rsrcfork.ResourceAttrs(0))
self.assertEqual(actual_res.data, expected_data)
with actual_res.open() as f:
self.assertEqual(f.read(10), expected_data[:10])
self.assertEqual(f.read(5), expected_data[10:15])
self.assertEqual(f.read(), expected_data[15:])
f.seek(0)
self.assertEqual(f.read(), expected_data)
self.assertEqual(actual_res.compressed_info, None)
actual_res_1 = rf[b"TEXT"][256]
expected_data_1 = TEXTCLIPPING_RESOURCES[b"TEXT"][256]
actual_res_2 = rf[b"utxt"][256]
expected_data_2 = TEXTCLIPPING_RESOURCES[b"utxt"][256]
with self.subTest(stream_test="multiple streams for the same resource"):
with actual_res_1.open() as f1, actual_res_1.open() as f2:
f1.seek(5)
f2.seek(10)
self.assertEqual(f1.read(10), expected_data_1[5:15])
self.assertEqual(f2.read(10), expected_data_1[10:20])
self.assertEqual(f1.read(), expected_data_1[15:])
self.assertEqual(f2.read(), expected_data_1[20:])
with self.subTest(stream_test="multiple streams for different resources"):
with actual_res_1.open() as f1, actual_res_2.open() as f2:
f1.seek(5)
f2.seek(10)
self.assertEqual(f1.read(10), expected_data_1[5:15])
self.assertEqual(f2.read(10), expected_data_2[10:20])
self.assertEqual(f1.read(), expected_data_1[15:])
self.assertEqual(f2.read(), expected_data_2[20:])
def test_textclipping_seekable_stream(self) -> None:
with TEXTCLIPPING_RSRC_FILE.open("rb") as f:
with rsrcfork.ResourceFile(f) as rf:
self.internal_test_textclipping(rf)
def test_textclipping_unseekable_stream(self) -> None:
with TEXTCLIPPING_RSRC_FILE.open("rb") as f:
with UnseekableStreamWrapper(f) as usf:
with rsrcfork.ResourceFile(usf) as rf:
self.internal_test_textclipping(rf)
def test_textclipping_path_data_fork(self) -> None:
with rsrcfork.open(TEXTCLIPPING_RSRC_FILE, fork="data") as rf:
self.internal_test_textclipping(rf)
@unittest.skipUnless(RESOURCE_FORKS_SUPPORTED, RESOURCE_FORKS_NOT_SUPPORTED_MESSAGE)
def test_textclipping_path_resource_fork(self) -> None:
with tempfile.NamedTemporaryFile() as tempf:
with TEXTCLIPPING_RSRC_FILE.open("rb") as dataf:
with open_resource_fork(pathlib.Path(tempf.name), "wb") as rsrcf:
shutil.copyfileobj(dataf, rsrcf)
with rsrcfork.open(tempf.name, fork="rsrc") as rf:
self.internal_test_textclipping(rf)
@unittest.skipUnless(RESOURCE_FORKS_SUPPORTED, RESOURCE_FORKS_NOT_SUPPORTED_MESSAGE)
def test_textclipping_path_auto_resource_fork(self) -> None:
with tempfile.NamedTemporaryFile() as temp_data_fork:
with TEXTCLIPPING_RSRC_FILE.open("rb") as source_file:
with open_resource_fork(pathlib.Path(temp_data_fork.name), "wb") as temp_rsrc_fork:
shutil.copyfileobj(source_file, temp_rsrc_fork)
with self.subTest(data_fork="empty"):
# Resource fork is selected when data fork is empty.
with rsrcfork.open(temp_data_fork.name) as rf:
self.internal_test_textclipping(rf)
with self.subTest(data_fork="non-resource data"):
# Resource fork is selected when data fork contains non-resource data.
temp_data_fork.write(b"This is the file's data fork. It should not be read, as the file has a resource fork.")
with rsrcfork.open(temp_data_fork.name) as rf:
self.internal_test_textclipping(rf)
with self.subTest(data_fork="valid resource data"):
# Resource fork is selected even when data fork contains valid resource data.
with EMPTY_RSRC_FILE.open("rb") as source_file:
shutil.copyfileobj(source_file, temp_data_fork)
with rsrcfork.open(temp_data_fork.name) as rf:
self.internal_test_textclipping(rf)
@unittest.skipUnless(RESOURCE_FORKS_SUPPORTED, RESOURCE_FORKS_NOT_SUPPORTED_MESSAGE)
def test_textclipping_path_auto_data_fork(self) -> None:
with tempfile.NamedTemporaryFile() as temp_data_fork:
with TEXTCLIPPING_RSRC_FILE.open("rb") as source_file:
shutil.copyfileobj(source_file, temp_data_fork)
# Have to flush the temporary file manually so that the data is visible to the other reads below.
# Normally this happens automatically as part of the close method, but that would also delete the temporary file, which we don't want.
temp_data_fork.flush()
with self.subTest(rsrc_fork="nonexistent"):
# Data fork is selected when resource fork does not exist.
with rsrcfork.open(temp_data_fork.name) as rf:
self.internal_test_textclipping(rf)
with self.subTest(rsrc_fork="empty"):
# Data fork is selected when resource fork exists, but is empty.
with open_resource_fork(pathlib.Path(temp_data_fork.name), "wb") as temp_rsrc_fork:
temp_rsrc_fork.write(b"")
with rsrcfork.open(temp_data_fork.name) as rf:
self.internal_test_textclipping(rf)
with self.subTest(rsrc_fork="non-resource data"):
# Data fork is selected when resource fork contains non-resource data.
with open_resource_fork(pathlib.Path(temp_data_fork.name), "wb") as temp_rsrc_fork:
temp_rsrc_fork.write(b"This is the file's resource fork. It contains junk, so it should be ignored in favor of the data fork.")
with rsrcfork.open(temp_data_fork.name) as rf:
self.internal_test_textclipping(rf)
def test_testfile(self) -> None:
with rsrcfork.open(TESTFILE_RSRC_FILE, fork="data") as rf:
self.assertEqual(rf.header_system_data, TESTFILE_HEADER_SYSTEM_DATA)
self.assertEqual(rf.header_application_data, TESTFILE_HEADER_APPLICATION_DATA)
self.assertEqual(rf.file_attributes, rsrcfork.ResourceFileAttrs.mapPrinterDriverMultiFinderCompatible | rsrcfork.ResourceFileAttrs.mapReadOnly)
self.assertEqual(list(rf), list(TESTFILE_RESOURCES))
for (actual_type, actual_reses), (expected_type, expected_reses) in zip(rf.items(), TESTFILE_RESOURCES.items()):
with self.subTest(type=expected_type):
self.assertEqual(actual_type, expected_type)
self.assertEqual(list(actual_reses), list(expected_reses))
for (actual_id, actual_res), (expected_id, (expected_name, expected_attrs, expected_data)) in zip(actual_reses.items(), expected_reses.items()):
with self.subTest(id=expected_id):
self.assertEqual(actual_res.type, expected_type)
self.assertEqual(actual_id, expected_id)
self.assertEqual(actual_res.id, expected_id)
self.assertEqual(actual_res.name, expected_name)
self.assertEqual(actual_res.attributes, expected_attrs)
self.assertEqual(actual_res.data, expected_data)
with actual_res.open() as f:
self.assertEqual(f.read(), expected_data)
self.assertEqual(actual_res.compressed_info, None)
def test_compress_compare(self) -> None:
# This test goes through pairs of resource files: one original file with both compressed and uncompressed resources, and one modified file where all compressed resources have been decompressed (using ResEdit on System 7.5.5).
# It checks that the rsrcfork library performs automatic decompression on the compressed resources, so that the compressed resource file appears to the user like the uncompressed resource file (ignoring resource order, which was lost during decompression using ResEdit).
for name in COMPRESS_RSRC_FILE_NAMES:
with self.subTest(name=name):
with rsrcfork.open(COMPRESSED_DIR / name, fork="data") as compressed_rf, rsrcfork.open(UNCOMPRESSED_DIR / name, fork="data") as uncompressed_rf:
self.assertEqual(sorted(compressed_rf), sorted(uncompressed_rf))
for (compressed_type, compressed_reses), (uncompressed_type, uncompressed_reses) in zip(sorted(compressed_rf.items()), sorted(uncompressed_rf.items())):
with self.subTest(type=compressed_type):
self.assertEqual(compressed_type, uncompressed_type)
self.assertEqual(sorted(compressed_reses), sorted(uncompressed_reses))
for (compressed_id, compressed_res), (uncompressed_id, uncompressed_res) in zip(sorted(compressed_reses.items()), sorted(uncompressed_reses.items())):
with self.subTest(id=compressed_id):
# The metadata of the compressed and uncompressed resources must match.
self.assertEqual(compressed_res.type, uncompressed_res.type)
self.assertEqual(compressed_id, uncompressed_id)
self.assertEqual(compressed_res.id, compressed_id)
self.assertEqual(compressed_res.id, uncompressed_res.id)
self.assertEqual(compressed_res.name, uncompressed_res.name)
self.assertEqual(compressed_res.attributes & ~rsrcfork.ResourceAttrs.resCompressed, uncompressed_res.attributes)
# The uncompressed resource really has to be not compressed.
self.assertNotIn(rsrcfork.ResourceAttrs.resCompressed, uncompressed_res.attributes)
self.assertEqual(uncompressed_res.compressed_info, None)
self.assertEqual(uncompressed_res.data, uncompressed_res.data_raw)
self.assertEqual(uncompressed_res.length, uncompressed_res.length_raw)
# The compressed resource's (automatically decompressed) data must match the uncompressed data.
self.assertEqual(compressed_res.data, uncompressed_res.data)
self.assertEqual(compressed_res.length, uncompressed_res.length)
with compressed_res.open() as compressed_f, uncompressed_res.open() as uncompressed_f:
compressed_f.seek(15)
uncompressed_f.seek(15)
self.assertEqual(compressed_f.read(10), uncompressed_f.read(10))
self.assertEqual(compressed_f.read(), uncompressed_f.read())
compressed_f.seek(0)
uncompressed_f.seek(0)
self.assertEqual(compressed_f.read(), uncompressed_f.read())
if rsrcfork.ResourceAttrs.resCompressed in compressed_res.attributes:
# Resources with the compressed attribute must expose correct compression metadata.
self.assertNotEqual(compressed_res.compressed_info, None)
self.assertEqual(compressed_res.compressed_info.decompressed_length, compressed_res.length)
else:
# Some resources in the "compressed" files are not actually compressed, in which case there is no compression metadata.
self.assertEqual(compressed_res.compressed_info, None)
self.assertEqual(compressed_res.data, compressed_res.data_raw)
self.assertEqual(compressed_res.length, compressed_res.length_raw)
if __name__ == "__main__":
unittest.main()
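How the `UnseekableStreamWrapper` helper in this test module hides seeking can be shown in isolation. A minimal sketch, assuming only the standard `io` module: the class body follows the one in the test file, while the sample bytes and the demonstration code below it are made up for illustration.

```python
import io
import typing


class UnseekableStreamWrapper(io.BufferedIOBase):
    """Expose only read() of the wrapped stream, as in the test module."""

    _wrapped: typing.BinaryIO

    def __init__(self, wrapped: typing.BinaryIO) -> None:
        super().__init__()
        self._wrapped = wrapped

    def read(self, size: typing.Optional[int] = -1) -> bytes:
        return self._wrapped.read(size)


# Made-up sample data; any binary stream could be wrapped the same way.
stream = UnseekableStreamWrapper(io.BytesIO(b"sample resource bytes"))
print(stream.seekable())  # False
print(stream.read(6))     # b'sample'
try:
    stream.seek(0)
except io.UnsupportedOperation:
    print("seek is not supported")
```

Because the base class reports the stream as not seekable and rejects `seek()`, a reader that works with this wrapper cannot be relying on random access to the underlying file.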

27
tox.ini Normal file

@ -0,0 +1,27 @@
[tox]
# When updating the Python versions here,
# please also update the corresponding Python versions in the GitHub Actions workflow (.github/workflows/ci.yml).
envlist = py{36,311},flake8,mypy,package
[testenv]
commands = python -m unittest discover --start-directory ./tests
[testenv:flake8]
deps =
flake8 >= 3.8.0
flake8-bugbear
commands = flake8
[testenv:mypy]
deps =
mypy
commands = mypy
[testenv:package]
deps =
twine
wheel >= 0.32.0
commands =
python setup.py sdist bdist_wheel
twine check dist/*