Compare commits

...

33 Commits

Author SHA1 Message Date
dgelessus d2bbab1f5d Use Ubuntu 20.04 on GitHub Actions for Python 3.6 support (hopefully) 2023-02-14 22:58:28 +01:00
dgelessus a2663ae85d Allow compressed resource header length field to be 0 (see #10) 2023-02-14 21:31:45 +01:00
dgelessus 8f60fcdfc4 Add stacklevel to warnings.warn calls as recommended by flake8 2023-02-14 21:30:44 +01:00
dgelessus ff9377dc8d Reformat setup.cfg flake8 ignore option to make current flake8 happy 2023-02-14 21:30:10 +01:00
dgelessus 0624d4eae9 Mark Python 3.11 as supported 2023-02-14 21:09:17 +01:00
dgelessus 4ad1d1d6b9 Update GitHub Actions to current versions 2022-09-08 11:04:47 +02:00
dgelessus b95c4917cc Fix new mypy error about enum.Flag.name possibly being None 2022-09-08 11:03:52 +02:00
dgelessus ee767a106c Remove flake8-tabs plugin that is incompatible with flake8 5
And instead manually ignore the relevant indentation errors/warnings.
2022-09-08 10:46:45 +02:00
dgelessus 82951f5d8e Update usage examples in README.md 2022-09-08 10:42:42 +02:00
dgelessus b9fdac1c0b Wrap some long lines in README.md for better diffing/merging 2022-09-08 10:38:39 +02:00
dgelessus 70d51c2907 Convert README from reStructuredText to Markdown
Because it doesn't use any reStructuredText-specific features and
Markdown syntax is less annoying.
2022-09-08 10:30:21 +02:00
dgelessus f891a6ee00 Move main source code into src subdirectory
This avoids certain problems related to the entire repo root appearing
on the import path, usually when running Python/pip from the repo root
or when using an editable install.
2021-11-21 19:15:51 +01:00
dgelessus b1e5e7c96e Update actions/setup-python to v2 2021-11-21 18:50:34 +01:00
dgelessus 60709e386a Add Python 3.10 to test matrix and classifiers 2021-11-21 18:48:12 +01:00
dgelessus f437ee5f43 Reduce test matrix to just the oldest and newest Python versions
Testing the versions in between doesn't really bring much benefit, and
it becomes impractical when the range of supported versions grows.
2021-11-21 18:24:17 +01:00
dgelessus 5c3bc5d7e5 Remove custom stream types and read all resource data upfront again
The custom stream types were almost always slower than just reading the
entire data into memory, and there's no reason not to do that -
resources are small enough that memory usage and disk IO speed aren't a
concern (at least not for any machine that's modern enough to run
Python 3...).

Perhaps the only performance advantage was when reading a small amount
of data from the start of a compressed resource. In that case the
custom stream could incrementally decompress only the part of the data
that's actually needed, which was a bit faster than decompressing the
entire resource and then throwing away most of the data. But this
situation is rare enough that it's not worth handling in the rsrcfork
library. If this is a real performance issue for someone, they can
manually call the incremental decompression functions from
rsrcfork.compress where needed.
2020-11-01 19:28:25 +01:00
dgelessus d74dbc41ba Remove no longer needed type: ignore comment from .__main__ 2020-11-01 19:13:11 +01:00
dgelessus 0642b1e8bf Add Python 3.9 to test matrix and classifiers 2020-11-01 19:09:43 +01:00
dgelessus 54ccdb0a47 Add Typing :: Typed classifier 2020-08-08 19:36:23 +02:00
dgelessus f76817c389 Reorder classifiers alphabetically 2020-08-08 19:30:26 +02:00
dgelessus 9e6dfacff6 Add custom stream type for compressed resources 2020-08-01 14:11:06 +02:00
dgelessus 8d39469e6e Wrap _io_utils.SubStream in an io.BufferedReader for performance
Although the underlying stream is already buffered, the extra
BufferedReader wrapper around the SubStream results in a noticeable
performance improvement.
2020-07-23 15:49:28 +02:00
dgelessus 028be98e8d Suppress incorrect mypy shutil.copyfileobj error (python/mypy#8962) 2020-07-23 13:31:57 +02:00
dgelessus 98551263b3 Work around false mypy error about SubStream.__enter__ 2020-07-23 13:21:34 +02:00
dgelessus 0054d0f7b5 Fix variable naming conflict in show_filtered_resources
mypy does not like it when the same variable name is used for different
types, even if it is in unrelated if branches and could never actually
cause any problems.
2020-07-23 13:17:02 +02:00
dgelessus 126795239c Reimplement Resource.data_raw using a custom stream type (SubStream)
This way all reads performed on a resource data stream are forwarded
to the underlying resource file stream, with the read offsets and
lengths adjusted appropriately.
2020-07-23 02:42:32 +02:00
dgelessus 2907d9f9e8 Rewrite __main__ code to use stream-based resource reading 2020-07-21 14:45:48 +02:00
dgelessus 5c96baea29 Change (raw_)hexdump to yield lines instead of printing directly 2020-07-21 14:27:43 +02:00
dgelessus 664e992fa3 Rewrite Resource methods using stream API where appropriate 2020-07-21 14:20:50 +02:00
dgelessus 61247ec783 Add initial API and tests for stream-based resource reading
For now the stream-based API is a simple BytesIO wrapper around
data/data_raw, but it will be optimized in the future.
2020-07-21 14:12:09 +02:00
dgelessus 0f6018e4bf Move .compress.common.make_peekable and related code into ._io_utils 2020-07-19 23:16:36 +02:00
dgelessus 476a68916b Merge implementations of read_exact functions/methods
The old functions/methods still exist, so that they continue to raise
the same exceptions as before (which are different depending on
context), but they now use the same implementation internally.
2020-07-18 21:07:12 +02:00
dgelessus 4bbf2f7c14 Bump version to 1.8.1.dev 2020-07-18 17:47:48 +02:00
16 changed files with 698 additions and 473 deletions


@@ -3,18 +3,15 @@ jobs:
   test:
     strategy:
       matrix:
-        platform: [macos-latest, ubuntu-latest, windows-latest]
+        platform: [macos-latest, ubuntu-20.04, windows-latest]
     runs-on: ${{ matrix.platform }}
     steps:
-      - uses: actions/checkout@v2
-      - uses: actions/setup-python@v1
+      - uses: actions/checkout@v3
+      - uses: actions/setup-python@v4
         with:
           python-version: "3.6"
-      - uses: actions/setup-python@v1
+      - uses: actions/setup-python@v4
         with:
-          python-version: "3.7"
-      - uses: actions/setup-python@v1
-        with:
-          python-version: "3.8"
+          python-version: "3.11"
       - run: python -m pip install --upgrade tox
       - run: tox

README.md Normal file

@@ -0,0 +1,326 @@
# `rsrcfork`
A pure Python, cross-platform library/tool for reading Macintosh resource data,
as stored in resource forks and `.rsrc` files.
Resource forks were an important part of the Classic Mac OS,
where they provided a standard way to store structured file data, metadata and application resources.
This usage continued into Mac OS X (now called macOS) for backward compatibility,
but over time resource forks became less commonly used in favor of simple data fork-only formats, application bundles, and extended attributes.
As of OS X 10.8 and the deprecation of the Carbon API,
macOS no longer provides any officially supported APIs for using and manipulating resource data.
Despite this, parts of macOS still support and use resource forks,
for example to store custom file and folder icons set by the user.
## Features
* Pure Python, cross-platform - no native Mac APIs are used.
* Provides both a Python API and a command-line tool.
* Resource data can be read from either the resource fork or the data fork.
* On Mac systems, the correct fork is selected automatically when reading a file.
This allows reading both regular resource forks and resource data stored in data forks (as with `.rsrc` and similar files).
* On non-Mac systems, resource forks are not available, so the data fork is always used.
* Compressed resources (supported by System 7 through Mac OS 9) are automatically decompressed.
* Only the standard System 7.0 resource compression methods are supported.
Resources that use non-standard decompressors cannot be decompressed.
* Object `repr`s are REPL-friendly:
all relevant information is displayed,
and long data is truncated to avoid filling up the screen by accident.
## Requirements
Python 3.6 or later.
No other libraries are required.
## Installation
`rsrcfork` is available [on PyPI](https://pypi.org/project/rsrcfork/) and can be installed using `pip`:
```sh
$ python3 -m pip install rsrcfork
```
Alternatively you can download the source code manually,
and run this command in the source code directory to install it:
```sh
$ python3 -m pip install .
```
## Examples
### Simple example
```python-repl
>>> import rsrcfork
>>> rf = rsrcfork.open("/Users/Shared/Test.textClipping")
>>> rf
<rsrcfork.ResourceFile at 0x1046e6048, attributes ResourceFileAttrs.0, containing 4 resource types: [b'utxt', b'utf8', b'TEXT', b'drag']>
>>> rf[b"TEXT"]
<Resource map for type b'TEXT', containing one resource: <Resource type b'TEXT', id 256, name None, attributes ResourceAttrs.0, data b'Here is some text'>>
```
### Automatic selection of data/resource fork
```python-repl
>>> import rsrcfork
>>> datarf = rsrcfork.open("/System/Library/Fonts/Monaco.dfont") # Resources in data fork
>>> datarf._stream
<_io.BufferedReader name='/System/Library/Fonts/Monaco.dfont'>
>>> resourcerf = rsrcfork.open("/Users/Shared/Test.textClipping") # Resources in resource fork
>>> resourcerf._stream
<_io.BufferedReader name='/Users/Shared/Test.textClipping/..namedfork/rsrc'>
```
### Command-line interface
```sh
$ rsrcfork list /Users/Shared/Test.textClipping
4 resource types:
'TEXT': 1 resources:
(256): 17 bytes
'drag': 1 resources:
(128): 64 bytes
'utf8': 1 resources:
(256): 17 bytes
'utxt': 1 resources:
(256): 34 bytes
$ rsrcfork read /Users/Shared/Test.textClipping "'TEXT' (256)"
Resource 'TEXT' (256): 17 bytes:
00000000 48 65 72 65 20 69 73 20 73 6f 6d 65 20 74 65 78 |Here is some tex|
00000010 74 |t|
00000011
```
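The `read` output above uses a `hexdump -C`-style layout (offset column, hex bytes, printable characters between pipes, final offset line). As an illustration only — this is a hand-rolled sketch, not the library's internal `_hexdump` helper — similar lines can be produced in pure Python like this:

```python
def hexdump_lines(data: bytes, width: int = 16):
    """Yield hexdump -C style lines for the given bytes."""
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Hex column, padded so the ASCII column lines up on short final rows.
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        ascii_part = "".join(chr(b) if 0x20 <= b < 0x7f else "." for b in chunk)
        yield f"{offset:08x}  {hex_part:<{width * 3 - 1}}  |{ascii_part}|"
    # Trailing line showing the total length, as hexdump -C does.
    yield f"{len(data):08x}"

for line in hexdump_lines(b"Here is some text"):
    print(line)
```

The exact spacing of the real tool's output may differ slightly; the sketch only demonstrates the overall layout shown above.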
## Limitations
This library only understands the resource file's general structure,
i. e. the type codes, IDs, attributes, and data of the resources stored in the file.
The data of individual resources is provided in raw bytes form and is not processed further -
the format of this data is specific to each resource type.
Definitions of common resource types can be found inside Carbon and related frameworks in Apple's macOS SDKs as `.r` files,
a format roughly similar to C struct definitions,
which is used by the `Rez` and `DeRez` command-line tools to de/compile resource data.
There doesn't seem to be an exact specification of this format,
and most documentation on it is only available inside old manuals for MPW (Macintosh Programmer's Workshop) or similar development tools for old Mac systems.
Some macOS text editors, such as BBEdit/TextWrangler and TextMate, support syntax highlighting for `.r` files.
Writing resource data is not supported at all.
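To make the "general structure" concrete: a resource file begins with a 16-byte header of four big-endian 32-bit fields giving the offset and length of the resource data area and the resource map. A minimal sketch of parsing just this header with the standard library (a hand-rolled illustration, not how the `rsrcfork` package itself is implemented):

```python
import struct

def read_resource_header(raw: bytes) -> dict:
    """Parse the 16-byte resource file header: four big-endian u32 fields."""
    data_offset, map_offset, data_length, map_length = struct.unpack_from(">IIII", raw, 0)
    return {
        "data_offset": data_offset,
        "map_offset": map_offset,
        "data_length": data_length,
        "map_length": map_length,
    }

# A synthetic header: data area at 0x100 (0x80 bytes), map at 0x200 (0x40 bytes).
header = struct.pack(">IIII", 0x100, 0x200, 0x80, 0x40)
print(read_resource_header(header))
```

Everything past the header (the type list, reference lists, names, and the resource data blocks themselves) is what the library parses for you.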
## Further info on resource files
For technical info and documentation about resource files and resources,
see the ["resource forks" section of the mac_file_format_docs repo](https://github.com/dgelessus/mac_file_format_docs/blob/master/README.md#resource-forks).
## Changelog
### Version 1.8.1 (next version)
* Added `open` and `open_raw` methods to `Resource` objects,
for stream-based access to resource data.
* Fixed reading of compressed resource headers with the header length field incorrectly set to 0
(because real Mac OS apparently accepts this).
### Version 1.8.0
* Removed the old (non-subcommand-based) CLI syntax.
* Added filtering support to the `list` subcommand.
* Added a `resource-info` subcommand to display technical information about resources
(more detailed than what is displayed by `list` and `read`).
* Added a `raw-compress-info` subcommand to display technical header information about standalone compressed resource data.
* Made the library PEP 561-compliant by adding a py.typed file.
* Fixed an incorrect `AssertionError` when using the `--no-decompress` command-line option.
### Version 1.7.0
* Added a `raw-decompress` subcommand to decompress compressed resource data stored in a standalone file rather than as a resource.
* Optimized lazy loading of `Resource` objects.
Previously, resource data would be read from disk whenever a `Resource` object was looked up,
even if the data itself is never used.
Now the resource data is only loaded once the `data` (or `data_raw`) attribute is accessed.
* The same optimization applies to the `name` attribute,
although this is unlikely to make a difference in practice.
* As a result, it is no longer possible to construct `Resource` objects without a resource file.
This was previously possible, but had no practical use.
* Fixed a small error in the `'dcmp' (0)` decompression implementation.
### Version 1.6.0
* Added a new subcommand-based command-line syntax to the `rsrcfork` tool,
similar to other CLI tools such as `git` or `diskutil`.
* This subcommand-based syntax is meant to replace the old CLI options,
as the subcommand structure is easier to understand and more extensible in the future.
* Currently there are three subcommands:
`list` to list resources in a file,
`read` to read/display resource data,
and `read-header` to read a resource file's header data.
These subcommands can be used to perform all operations that were also available with the old CLI syntax.
* The old CLI syntax is still supported for now,
but it will be removed soon.
* The new syntax no longer supports reading CLI arguments from a file (using `@args_file.txt`),
abbreviating long options (e. g. `--no-d` instead of `--no-decompress`),
or the short option `-f` instead of `--fork`.
If you have a need for any of these features,
please open an issue.
### Version 1.5.0
* Added stream-based decompression methods to the `rsrcfork.compress` module.
* The internal decompressor implementations have been refactored to use streams.
* This allows for incremental decompression of compressed resource data.
In practice this has no noticeable effect yet,
because the main `rsrcfork` API doesn't support incremental reading of resource data.
* Fixed the command line tool always displaying an incorrect error "Cannot specify an explicit fork when reading from stdin" when using `-` (stdin) as the input file.
### Version 1.4.0
* Added `length` and `length_raw` attributes to `Resource`.
These attributes are equivalent to the `len` of `data` and `data_raw` respectively,
but may be faster to access.
* Currently, the only optimized case is `length` for compressed resources,
but more optimizations may be added in the future.
* Added a `compressed_info` attribute to `Resource` that provides access to the header information of compressed resources.
* Improved handling of compressed resources when listing resource files with the command line tool.
* Metadata of compressed resources is now displayed even if no decompressor implementation is available
(as long as the compressed data header can be parsed).
* Performance has been improved -
the data no longer needs to be fully decompressed to get its length,
this information is now read from the header.
* The `'dcmp'` ID used to decompress each resource is displayed.
* Fixed an incorrect `options.packages` in `setup.cfg`,
which made the library unusable except when installing from source using `--editable`.
* Fixed `ResourceFile.__enter__` returning `None`,
which made it impossible to use `ResourceFile` properly in a `with` statement.
* Fixed various minor errors reported by type checking with `mypy`.
### Version 1.3.0.post1
* Fixed an incorrect `options.packages` in `setup.cfg`,
which made the library unusable except when installing from source using `--editable`.
### Version 1.2.0.post1
* Fixed an incorrect `options.packages` in `setup.cfg`,
which made the library unusable except when installing from source using `--editable`.
### Version 1.3.0
* Added a `--group` command line option to group resources in list format by type (the default), ID, or with no grouping.
* Added a `dump-text` output format to the command line tool.
This format is identical to `dump`,
but instead of a hex dump,
it outputs the resource data as text.
The data is decoded as MacRoman and classic Mac newlines (`\r`) are translated.
This is useful for examining resources that contain mostly plain text.
* Changed the command line tool to sort resources by type and ID,
and added a `--no-sort` option to disable sorting and output resources in file order
(which was the previous behavior).
* Renamed the `rsrcfork.Resource` attributes `resource_type` and `resource_id` to `type` and `id`, respectively.
The old names have been deprecated and will be removed in the future,
but are still supported for now.
* Changed `--format=dump` output to match `hexdump -C`'s format -
spacing has been adjusted,
and multiple subsequent identical lines are collapsed into a single `*`.
### Version 1.2.0
* Added support for compressed resources.
* Compressed resource data is automatically decompressed,
both in the Python API and on the command line.
* This is technically a breaking change,
since in previous versions the compressed resource data was returned directly.
However, this change will not affect end users negatively,
unless one has already implemented custom handling for compressed resources.
* Currently, only the three standard System 7.0 compression formats (`'dcmp'` IDs 0, 1, 2) are supported.
Attempting to access a resource compressed in an unsupported format results in a `DecompressError`.
* To access the raw resource data as stored in the file,
without automatic decompression,
use the `res.data_raw` attribute (for the Python API),
or the `--no-decompress` option (for the command-line interface).
This can be used to read the resource data in its compressed form,
even if the compression format is not supported.
* Improved automatic data/resource fork selection for files whose resource fork contains invalid data.
* This fixes reading certain system files with resource data in their data fork
(such as HIToolbox.rsrc in HIToolbox.framework, or .dfont fonts)
on recent macOS versions (at least macOS 10.14, possibly earlier).
Although these files have no resource fork,
recent macOS versions will successfully open the resource fork and return garbage data for it.
This behavior is now detected and handled by using the data fork instead.
* Replaced the `rsrcfork` parameter of `rsrcfork.open`/`ResourceFork.open` with a new `fork` parameter.
`fork` accepts string values (like the command line `--fork` option) rather than `rsrcfork`'s hard to understand `None`/`True`/`False`.
* The old `rsrcfork` parameter has been deprecated and will be removed in the future,
but for now it still works as before.
* Added an explanatory message when a resource filter on the command line doesn't match any resources in the resource file.
Previously there would either be no output or a confusing error,
depending on the selected `--format`.
* Changed resource type codes and names to be displayed in MacRoman instead of escaping all non-ASCII characters.
* Cleaned up the resource descriptions in listings and dumps to improve readability.
Previously they included some redundant or unnecessary information -
for example, each resource with no attributes set would be explicitly marked as "no attributes".
* Unified the formats of resource descriptions in listings and dumps,
which were previously slightly different from each other.
* Improved error messages when attempting to read multiple resources using `--format=hex` or `--format=raw`.
* Fixed reading from non-seekable streams not working for some resource files.
* Removed the `allow_seek` parameter of `ResourceFork.__init__` and the `--read-mode` command line option.
They are no longer necessary,
and were already practically useless before due to non-seekable stream reading being broken.
### Version 1.1.3.post1
* Fixed a formatting error in the README.rst to allow upload to PyPI.
### Version 1.1.3
**Note: This version is not available on PyPI, see version 1.1.3.post1 changelog for details.**
* Added a setuptools entry point for the command-line interface.
This allows calling it using just `rsrcfork` instead of `python3 -m rsrcfork`.
* Changed the default value of `ResourceFork.__init__`'s `close` keyword argument from `True` to `False`.
This matches the behavior of classes like `zipfile.ZipFile` and `tarfile.TarFile`.
* Fixed `ResourceFork.open` and `ResourceFork.__init__` not closing their streams in some cases.
* Refactored the single `rsrcfork.py` file into a package.
This is an internal change and should have no effect on how the `rsrcfork` module is used.
### Version 1.1.2
* Added support for the resource file attributes "Resources Locked" and "Printer Driver MultiFinder Compatible" from ResEdit.
* Added more dummy constants for resource attributes with unknown meaning,
so that resource files containing such attributes can be loaded without errors.
### Version 1.1.1
* Fixed overflow issue with empty resource files or empty resource type entries
* Changed `_hexdump` to behave more like `hexdump -C`
### Version 1.1.0
* Added a command-line interface - run `python3 -m rsrcfork --help` for more info
### Version 1.0.0
* Initial version


@@ -1,254 +0,0 @@
``rsrcfork``
============
A pure Python, cross-platform library/tool for reading Macintosh resource data, as stored in resource forks and ``.rsrc`` files.
Resource forks were an important part of the Classic Mac OS, where they provided a standard way to store structured file data, metadata and application resources. This usage continued into Mac OS X (now called macOS) for backward compatibility, but over time resource forks became less commonly used in favor of simple data fork-only formats, application bundles, and extended attributes.
As of OS X 10.8 and the deprecation of the Carbon API, macOS no longer provides any officially supported APIs for using and manipulating resource data. Despite this, parts of macOS still support and use resource forks, for example to store custom file and folder icons set by the user.
Features
--------
* Pure Python, cross-platform - no native Mac APIs are used.
* Provides both a Python API and a command-line tool.
* Resource data can be read from either the resource fork or the data fork.
* On Mac systems, the correct fork is selected automatically when reading a file. This allows reading both regular resource forks and resource data stored in data forks (as with ``.rsrc`` and similar files).
* On non-Mac systems, resource forks are not available, so the data fork is always used.
* Compressed resources (supported by System 7 through Mac OS 9) are automatically decompressed.
* Only the standard System 7.0 resource compression methods are supported. Resources that use non-standard decompressors cannot be decompressed.
* Object ``repr``\s are REPL-friendly: all relevant information is displayed, and long data is truncated to avoid filling up the screen by accident.
Requirements
------------
Python 3.6 or later. No other libraries are required.
Installation
------------
``rsrcfork`` is available `on PyPI <https://pypi.org/project/rsrcfork/>`_ and can be installed using ``pip``:
.. code-block:: sh
python3 -m pip install rsrcfork
Alternatively you can download the source code manually, and run this command in the source code directory to install it:
.. code-block:: sh
python3 -m pip install .
Examples
--------
Simple example
^^^^^^^^^^^^^^
.. code-block:: python
>>> import rsrcfork
>>> rf = rsrcfork.open("/Users/Shared/Test.textClipping")
>>> rf
<rsrcfork.ResourceFile at 0x1046e6048, attributes ResourceFileAttrs.0, containing 4 resource types: [b'utxt', b'utf8', b'TEXT', b'drag']>
>>> rf[b"TEXT"]
<rsrcfork.ResourceFile._LazyResourceMap at 0x10470ed30 containing one resource: rsrcfork.Resource(type=b'TEXT', id=256, name=None, attributes=ResourceAttrs.0, data=b'Here is some text')>
Automatic selection of data/resource fork
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: python
>>> import rsrcfork
>>> datarf = rsrcfork.open("/System/Library/Fonts/Monaco.dfont") # Resources in data fork
>>> datarf._stream
<_io.BufferedReader name='/System/Library/Fonts/Monaco.dfont'>
>>> resourcerf = rsrcfork.open("/Users/Shared/Test.textClipping") # Resources in resource fork
>>> resourcerf._stream
<_io.BufferedReader name='/Users/Shared/Test.textClipping/..namedfork/rsrc'>
Command-line interface
^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: sh
$ python3 -m rsrcfork /Users/Shared/Test.textClipping
4 resource types:
'utxt': 1 resources:
(256): 34 bytes
'utf8': 1 resources:
(256): 17 bytes
'TEXT': 1 resources:
(256): 17 bytes
'drag': 1 resources:
(128): 64 bytes
$ python3 -m rsrcfork /Users/Shared/Test.textClipping "'TEXT' (256)"
Resource 'TEXT' (256): 17 bytes:
00000000 48 65 72 65 20 69 73 20 73 6f 6d 65 20 74 65 78 |Here is some tex|
00000010 74 |t|
00000011
Limitations
-----------
This library only understands the resource file's general structure, i. e. the type codes, IDs, attributes, and data of the resources stored in the file. The data of individual resources is provided in raw bytes form and is not processed further - the format of this data is specific to each resource type.
Definitions of common resource types can be found inside Carbon and related frameworks in Apple's macOS SDKs as ``.r`` files, a format roughly similar to C struct definitions, which is used by the ``Rez`` and ``DeRez`` command-line tools to de/compile resource data. There doesn't seem to be an exact specification of this format, and most documentation on it is only available inside old manuals for MPW (Macintosh Programmer's Workshop) or similar development tools for old Mac systems. Some macOS text editors, such as BBEdit/TextWrangler and TextMate, support syntax highlighting for ``.r`` files.
Writing resource data is not supported at all.
Further info on resource files
------------------------------
For technical info and documentation about resource files and resources, see the `"resource forks" section of the mac_file_format_docs repo <https://github.com/dgelessus/mac_file_format_docs/blob/master/README.md#resource-forks>`_.
Changelog
---------
Version 1.8.0
^^^^^^^^^^^^^
* Removed the old (non-subcommand-based) CLI syntax.
* Added filtering support to the ``list`` subcommand.
* Added a ``resource-info`` subcommand to display technical information about resources (more detailed than what is displayed by ``list`` and ``read``).
* Added a ``raw-compress-info`` subcommand to display technical header information about standalone compressed resource data.
* Made the library PEP 561-compliant by adding a py.typed file.
* Fixed an incorrect ``AssertionError`` when using the ``--no-decompress`` command-line option.
Version 1.7.0
^^^^^^^^^^^^^
* Added a ``raw-decompress`` subcommand to decompress compressed resource data stored in a standalone file rather than as a resource.
* Optimized lazy loading of ``Resource`` objects. Previously, resource data would be read from disk whenever a ``Resource`` object was looked up, even if the data itself is never used. Now the resource data is only loaded once the ``data`` (or ``data_raw``) attribute is accessed.
* The same optimization applies to the ``name`` attribute, although this is unlikely to make a difference in practice.
* As a result, it is no longer possible to construct ``Resource`` objects without a resource file. This was previously possible, but had no practical use.
* Fixed a small error in the ``'dcmp' (0)`` decompression implementation.
Version 1.6.0
^^^^^^^^^^^^^
* Added a new subcommand-based command-line syntax to the ``rsrcfork`` tool, similar to other CLI tools such as ``git`` or ``diskutil``.
* This subcommand-based syntax is meant to replace the old CLI options, as the subcommand structure is easier to understand and more extensible in the future.
* Currently there are three subcommands: ``list`` to list resources in a file, ``read`` to read/display resource data, and ``read-header`` to read a resource file's header data. These subcommands can be used to perform all operations that were also available with the old CLI syntax.
* The old CLI syntax is still supported for now, but it will be removed soon.
* The new syntax no longer supports reading CLI arguments from a file (using ``@args_file.txt``), abbreviating long options (e. g. ``--no-d`` instead of ``--no-decompress``), or the short option ``-f`` instead of ``--fork``. If you have a need for any of these features, please open an issue.
Version 1.5.0
^^^^^^^^^^^^^
* Added stream-based decompression methods to the ``rsrcfork.compress`` module.
* The internal decompressor implementations have been refactored to use streams.
* This allows for incremental decompression of compressed resource data. In practice this has no noticeable effect yet, because the main ``rsrcfork`` API doesn't support incremental reading of resource data.
* Fixed the command line tool always displaying an incorrect error "Cannot specify an explicit fork when reading from stdin" when using ``-`` (stdin) as the input file.
Version 1.4.0
^^^^^^^^^^^^^
* Added ``length`` and ``length_raw`` attributes to ``Resource``. These attributes are equivalent to the ``len`` of ``data`` and ``data_raw`` respectively, but may be faster to access.
* Currently, the only optimized case is ``length`` for compressed resources, but more optimizations may be added in the future.
* Added a ``compressed_info`` attribute to ``Resource`` that provides access to the header information of compressed resources.
* Improved handling of compressed resources when listing resource files with the command line tool.
* Metadata of compressed resources is now displayed even if no decompressor implementation is available (as long as the compressed data header can be parsed).
* Performance has been improved - the data no longer needs to be fully decompressed to get its length, this information is now read from the header.
* The ``'dcmp'`` ID used to decompress each resource is displayed.
* Fixed an incorrect ``options.packages`` in ``setup.cfg``, which made the library unusable except when installing from source using ``--editable``.
* Fixed ``ResourceFile.__enter__`` returning ``None``, which made it impossible to use ``ResourceFile`` properly in a ``with`` statement.
* Fixed various minor errors reported by type checking with ``mypy``.
Version 1.3.0.post1
^^^^^^^^^^^^^^^^^^^
* Fixed an incorrect ``options.packages`` in ``setup.cfg``, which made the library unusable except when installing from source using ``--editable``.
Version 1.2.0.post1
^^^^^^^^^^^^^^^^^^^
* Fixed an incorrect ``options.packages`` in ``setup.cfg``, which made the library unusable except when installing from source using ``--editable``.
Version 1.3.0
^^^^^^^^^^^^^
* Added a ``--group`` command line option to group resources in list format by type (the default), ID, or with no grouping.
* Added a ``dump-text`` output format to the command line tool. This format is identical to ``dump``, but instead of a hex dump, it outputs the resource data as text. The data is decoded as MacRoman and classic Mac newlines (``\r``) are translated. This is useful for examining resources that contain mostly plain text.
* Changed the command line tool to sort resources by type and ID, and added a ``--no-sort`` option to disable sorting and output resources in file order (which was the previous behavior).
* Renamed the ``rsrcfork.Resource`` attributes ``resource_type`` and ``resource_id`` to ``type`` and ``id``, respectively. The old names have been deprecated and will be removed in the future, but are still supported for now.
* Changed ``--format=dump`` output to match ``hexdump -C``'s format - spacing has been adjusted, and multiple subsequent identical lines are collapsed into a single ``*``.
Version 1.2.0
^^^^^^^^^^^^^
* Added support for compressed resources.
* Compressed resource data is automatically decompressed, both in the Python API and on the command line.
* This is technically a breaking change, since in previous versions the compressed resource data was returned directly. However, this change will not affect end users negatively, unless one has already implemented custom handling for compressed resources.
* Currently, only the three standard System 7.0 compression formats (``'dcmp'`` IDs 0, 1, 2) are supported. Attempting to access a resource compressed in an unsupported format results in a ``DecompressError``.
* To access the raw resource data as stored in the file, without automatic decompression, use the ``res.data_raw`` attribute (for the Python API), or the ``--no-decompress`` option (for the command-line interface). This can be used to read the resource data in its compressed form, even if the compression format is not supported.
* Improved automatic data/resource fork selection for files whose resource fork contains invalid data.
* This fixes reading certain system files with resource data in their data fork (such as HIToolbox.rsrc in HIToolbox.framework, or .dfont fonts) on recent macOS versions (at least macOS 10.14, possibly earlier). Although these files have no resource fork, recent macOS versions will successfully open the resource fork and return garbage data for it. This behavior is now detected and handled by using the data fork instead.
* Replaced the ``rsrcfork`` parameter of ``rsrcfork.open``/``ResourceFork.open`` with a new ``fork`` parameter. ``fork`` accepts string values (like the command line ``--fork`` option) rather than ``rsrcfork``'s hard to understand ``None``/``True``/``False``.
* The old ``rsrcfork`` parameter has been deprecated and will be removed in the future, but for now it still works as before.
* Added an explanatory message when a resource filter on the command line doesn't match any resources in the resource file. Previously there would either be no output or a confusing error, depending on the selected ``--format``.
* Changed resource type codes and names to be displayed in MacRoman instead of escaping all non-ASCII characters.
* Cleaned up the resource descriptions in listings and dumps to improve readability. Previously they included some redundant or unnecessary information - for example, each resource with no attributes set would be explicitly marked as "no attributes".
* Unified the formats of resource descriptions in listings and dumps, which were previously slightly different from each other.
* Improved error messages when attempting to read multiple resources using ``--format=hex`` or ``--format=raw``.
* Fixed reading from non-seekable streams not working for some resource files.
* Removed the ``allow_seek`` parameter of ``ResourceFork.__init__`` and the ``--read-mode`` command line option. They are no longer necessary, and were already practically useless before due to non-seekable stream reading being broken.
Version 1.1.3.post1
^^^^^^^^^^^^^^^^^^^
* Fixed a formatting error in the README.rst to allow upload to PyPI.
Version 1.1.3
^^^^^^^^^^^^^
**Note: This version is not available on PyPI, see version 1.1.3.post1 changelog for details.**
* Added a setuptools entry point for the command-line interface. This allows calling it using just ``rsrcfork`` instead of ``python3 -m rsrcfork``.
* Changed the default value of ``ResourceFork.__init__``'s ``close`` keyword argument from ``True`` to ``False``. This matches the behavior of classes like ``zipfile.ZipFile`` and ``tarfile.TarFile``.
* Fixed ``ResourceFork.open`` and ``ResourceFork.__init__`` not closing their streams in some cases.
* Refactored the single ``rsrcfork.py`` file into a package. This is an internal change and should have no effect on how the ``rsrcfork`` module is used.
Version 1.1.2
^^^^^^^^^^^^^
* Added support for the resource file attributes "Resources Locked" and "Printer Driver MultiFinder Compatible" from ResEdit.
* Added more dummy constants for resource attributes with unknown meaning, so that resource files containing such attributes can be loaded without errors.
Version 1.1.1
^^^^^^^^^^^^^
* Fixed an overflow issue with empty resource files or empty resource type entries.
* Changed ``_hexdump`` to behave more like ``hexdump -C``.
Version 1.1.0
^^^^^^^^^^^^^
* Added a command-line interface - run ``python3 -m rsrcfork --help`` for more info
Version 1.0.0
^^^^^^^^^^^^^
* Initial version

@ -6,9 +6,6 @@ author = dgelessus
classifiers =
Development Status :: 4 - Beta
Intended Audience :: Developers
Topic :: Software Development :: Disassemblers
Topic :: System
Topic :: Utilities
License :: OSI Approved :: MIT License
Operating System :: MacOS :: MacOS 9
Operating System :: MacOS :: MacOS X
@ -19,12 +16,19 @@ classifiers =
Programming Language :: Python :: 3.6
Programming Language :: Python :: 3.7
Programming Language :: Python :: 3.8
Programming Language :: Python :: 3.9
Programming Language :: Python :: 3.10
Programming Language :: Python :: 3.11
Topic :: Software Development :: Disassemblers
Topic :: System
Topic :: Utilities
Typing :: Typed
license = MIT
license_files =
LICENSE
description = A pure Python, cross-platform library/tool for reading Macintosh resource data, as stored in resource forks and ``.rsrc`` files
long_description = file: README.rst
long_description_content_type = text/x-rst
description = A pure Python, cross-platform library/tool for reading Macintosh resource data, as stored in resource forks and .rsrc files
long_description = file: README.md
long_description_content_type = text/markdown
keywords =
rsrc
fork
@ -40,15 +44,15 @@ keywords =
zip_safe = False
python_requires = >=3.6
packages = find:
package_dir =
= src
[options.package_data]
rsrcfork =
py.typed
[options.packages.find]
include =
rsrcfork
rsrcfork.*
where = src
[options.entry_points]
console_scripts =
@ -62,19 +66,28 @@ extend-exclude =
# The following issues are ignored because they do not match our code style:
ignore =
E226, # missing whitespace around arithmetic operator
E261, # at least two spaces before inline comment
E501, # line too long
W293, # blank line contains whitespace
W503, # line break before binary operator
# flake8-tabs configuration
use-flake8-tabs = true
blank-lines-indent = always
indent-tabs-def = 1
# These E1 checks report many false positives for code that is (consistently) indented with tabs alone.
# indentation contains mixed spaces and tabs
E101,
# over-indented
E117,
# continuation line over-indented for hanging indent
E126,
# missing whitespace around arithmetic operator
E226,
# at least two spaces before inline comment
E261,
# line too long
E501,
# indentation contains tabs
W191,
# blank line contains whitespace
W293,
# line break before binary operator
W503,
[mypy]
files=rsrcfork/**/*.py
files=src/**/*.py
python_version = 3.6
disallow_untyped_calls = True

@ -2,7 +2,7 @@
# To release a new version:
# * Remove the .dev suffix from the version number in this file.
# * Update the changelog in the README.rst (rename the "next version" section to the correct version number).
# * Update the changelog in the README.md (rename the "next version" section to the correct version number).
# * Remove the ``dist`` directory (if it exists) to clean up any old release files.
# * Run ``python3 setup.py sdist bdist_wheel`` to build the release files.
# * Run ``python3 -m twine check dist/*`` to check the release files.
@ -12,15 +12,15 @@
# * Fast-forward the release branch to the new release commit.
# * Push the master and release branches.
# * Upload the release files to PyPI using ``python3 -m twine upload dist/*``.
# * On the GitHub repo's Releases page, edit the new release tag and add the relevant changelog section from the README.rst. (Note: The README is in reStructuredText format, but GitHub's release notes use Markdown, so it may be necessary to adjust the markup syntax.)
# * On the GitHub repo's Releases page, edit the new release tag and add the relevant changelog section from the README.md.
# After releasing:
# * (optional) Remove the build and dist directories from the previous release as they are no longer needed.
# * Bump the version number in this file to the next version and add a .dev suffix.
# * Add a new empty section for the next version to the README.rst changelog.
# * Add a new empty section for the next version to the README.md changelog.
# * Commit and push the changes to master.
__version__ = "1.8.0"
__version__ = "1.8.1.dev"
__all__ = [
"Resource",

@ -1,6 +1,8 @@
import argparse
import enum
import io
import itertools
import shutil
import sys
import typing
@ -29,6 +31,22 @@ def decompose_flags(value: F) -> typing.Sequence[F]:
return [bit for bit in type(value) if bit in value]
def join_flag_names(flags: typing.Iterable[F], sep: str = " | ") -> str:
"""Join an iterable of enum.Flag instances into a string representation based on their names.
All values in ``flags`` should be named constants.
"""
names: typing.List[str] = []
for flag in flags:
name = flag.name
if name is None:
names.append(str(flag))
else:
names.append(name)
return sep.join(names)
def is_printable(char: str) -> bool:
"""Determine whether a character is printable for our purposes.
@ -183,32 +201,45 @@ def filter_resources(rf: api.ResourceFile, filters: typing.Sequence[str]) -> typ
yield res
def hexdump(data: bytes) -> None:
def hexdump_stream(stream: typing.BinaryIO) -> typing.Iterable[str]:
last_line = None
asterisk_shown = False
for i in range(0, len(data), 16):
line = data[i:i + 16]
line = stream.read(16)
i = 0
while line:
# If the same 16-byte lines appear multiple times, print only the first one, and replace all further lines with a single line with an asterisk.
# This is unambiguous - to find out how many lines were collapsed this way, the user can compare the addresses of the lines before and after the asterisk.
if line == last_line:
if not asterisk_shown:
print("*")
yield "*"
asterisk_shown = True
else:
line_hex_left = " ".join(f"{byte:02x}" for byte in line[:8])
line_hex_right = " ".join(f"{byte:02x}" for byte in line[8:])
line_char = line.decode(_TEXT_ENCODING).translate(_TRANSLATE_NONPRINTABLES)
print(f"{i:08x} {line_hex_left:<{8*2+7}} {line_hex_right:<{8*2+7}} |{line_char}|")
yield f"{i:08x} {line_hex_left:<{8*2+7}} {line_hex_right:<{8*2+7}} |{line_char}|"
asterisk_shown = False
last_line = line
i += len(line)
line = stream.read(16)
if data:
print(f"{len(data):08x}")
if i:
yield f"{i:08x}"
def raw_hexdump(data: bytes) -> None:
for i in range(0, len(data), 16):
print(" ".join(f"{byte:02x}" for byte in data[i:i + 16]))
def hexdump(data: bytes) -> typing.Iterable[str]:
yield from hexdump_stream(io.BytesIO(data))
def raw_hexdump_stream(stream: typing.BinaryIO) -> typing.Iterable[str]:
line = stream.read(16)
while line:
yield " ".join(f"{byte:02x}" for byte in line)
line = stream.read(16)
def raw_hexdump(data: bytes) -> typing.Iterable[str]:
yield from raw_hexdump_stream(io.BytesIO(data))
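The line-collapsing rule used by ``hexdump_stream`` above can be illustrated with a minimal standalone generator. This is a simplified sketch (offsets only, not the tool's full two-column output format):

```python
import io

def collapse_repeats(stream):
    # Repeated 16-byte lines are emitted once; further repeats become a single "*",
    # matching the hexdump -C behavior described in the changelog.
    last_line = None
    asterisk_shown = False
    i = 0
    line = stream.read(16)
    while line:
        if line == last_line:
            if not asterisk_shown:
                yield "*"
                asterisk_shown = True
        else:
            yield f"{i:08x}  {line.hex(' ')}"
            asterisk_shown = False
        last_line = line
        i += len(line)
        line = stream.read(16)
    if i:
        yield f"{i:08x}"  # final offset line, as hexdump -C prints

lines = list(collapse_repeats(io.BytesIO(b"\x00" * 48)))
# Three identical 16-byte lines collapse to: first line, "*", final offset.
```

The ambiguity note in the source comment applies here too: the number of collapsed lines can be recovered by comparing the offsets before and after the ``*``.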
def translate_text(data: bytes) -> str:
@ -239,7 +270,7 @@ def describe_resource(res: api.Resource, *, include_type: bool, decompress: bool
attrs = decompose_flags(res.attributes)
if attrs:
content_desc_parts.append(" | ".join(attr.name for attr in attrs))
content_desc_parts.append(join_flag_names(attrs))
content_desc = ", ".join(content_desc_parts)
@ -267,75 +298,80 @@ def show_filtered_resources(resources: typing.Sequence[api.Resource], format: st
for res in resources:
if decompress:
data = res.data
open_func = res.open
else:
data = res.data_raw
open_func = res.open_raw
if format in ("dump", "dump-text"):
# Human-readable info and hex or text dump
desc = describe_resource(res, include_type=True, decompress=decompress)
print(f"Resource {desc}:")
if format == "dump":
hexdump(data)
elif format == "dump-text":
print(translate_text(data))
else:
raise AssertionError(f"Unhandled format: {format!r}")
print()
elif format == "hex":
# Data only as hex
raw_hexdump(data)
elif format == "raw":
# Data only as raw bytes
sys.stdout.buffer.write(data)
elif format == "derez":
# Like DeRez with no resource definitions
attrs = list(decompose_flags(res.attributes))
if decompress and api.ResourceAttrs.resCompressed in attrs:
attrs.remove(api.ResourceAttrs.resCompressed)
attrs_comment = " /* was compressed */"
else:
attrs_comment = ""
attr_descs_with_none = [_REZ_ATTR_NAMES[attr] for attr in attrs]
if None in attr_descs_with_none:
attr_descs = [f"${res.attributes.value:02X}"]
else:
attr_descs = typing.cast(typing.List[str], attr_descs_with_none)
parts = [str(res.id)]
if res.name is not None:
parts.append(bytes_quote(res.name, '"'))
parts += attr_descs
quoted_restype = bytes_quote(res.type, "'")
print(f"data {quoted_restype} ({', '.join(parts)}{attrs_comment}) {{")
for i in range(0, len(data), 16):
# Two-byte grouping is really annoying to implement.
groups = []
for j in range(0, 16, 2):
if i+j >= len(data):
break
elif i+j+1 >= len(data):
groups.append(f"{data[i+j]:02X}")
else:
groups.append(f"{data[i+j]:02X}{data[i+j+1]:02X}")
with open_func() as f:
if format in ("dump", "dump-text"):
# Human-readable info and hex or text dump
desc = describe_resource(res, include_type=True, decompress=decompress)
print(f"Resource {desc}:")
if format == "dump":
for line in hexdump_stream(f):
print(line)
elif format == "dump-text":
print(translate_text(f.read()))
else:
raise AssertionError(f"Unhandled format: {format!r}")
print()
elif format == "hex":
# Data only as hex
s = f'$"{" ".join(groups)}"'
comment = "/* " + data[i:i + 16].decode(_TEXT_ENCODING).translate(_TRANSLATE_NONPRINTABLES) + " */"
print(f"\t{s:<54s}{comment}")
print("};")
print()
else:
raise ValueError(f"Unhandled output format: {format}")
for line in raw_hexdump_stream(f):
print(line)
elif format == "raw":
# Data only as raw bytes
shutil.copyfileobj(f, sys.stdout.buffer)
elif format == "derez":
# Like DeRez with no resource definitions
attrs = list(decompose_flags(res.attributes))
if decompress and api.ResourceAttrs.resCompressed in attrs:
attrs.remove(api.ResourceAttrs.resCompressed)
attrs_comment = " /* was compressed */"
else:
attrs_comment = ""
attr_descs_with_none = [_REZ_ATTR_NAMES[attr] for attr in attrs]
if None in attr_descs_with_none:
attr_descs = [f"${res.attributes.value:02X}"]
else:
attr_descs = typing.cast(typing.List[str], attr_descs_with_none)
parts = [str(res.id)]
if res.name is not None:
parts.append(bytes_quote(res.name, '"'))
parts += attr_descs
quoted_restype = bytes_quote(res.type, "'")
print(f"data {quoted_restype} ({', '.join(parts)}{attrs_comment}) {{")
bytes_line = f.read(16)
while bytes_line:
# Two-byte grouping is really annoying to implement.
groups = []
for j in range(0, 16, 2):
if j >= len(bytes_line):
break
elif j+1 >= len(bytes_line):
groups.append(f"{bytes_line[j]:02X}")
else:
groups.append(f"{bytes_line[j]:02X}{bytes_line[j+1]:02X}")
s = f'$"{" ".join(groups)}"'
comment = "/* " + bytes_line.decode(_TEXT_ENCODING).translate(_TRANSLATE_NONPRINTABLES) + " */"
print(f"\t{s:<54s}{comment}")
bytes_line = f.read(16)
print("};")
print()
else:
raise ValueError(f"Unhandled output format: {format}")
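The two-byte grouping in the DeRez-style output above can be isolated into a small helper. This is an illustrative sketch of the grouping only, without the trailing comment column:

```python
def derez_group(line: bytes) -> str:
    # Hex digits grouped two bytes at a time, as in DeRez $"..." data lines.
    groups = []
    for j in range(0, len(line), 2):
        chunk = line[j:j + 2]  # one or two bytes at the end of the line
        groups.append("".join(f"{b:02X}" for b in chunk))
    return f'$"{" ".join(groups)}"'

print(derez_group(b"ABC"))  # odd-length input leaves a lone trailing byte
```

Slicing with ``line[j:j + 2]`` handles the odd-length case that the original code treats with explicit bounds checks.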
def list_resources(resources: typing.List[api.Resource], *, sort: bool, group: str, decompress: bool) -> None:
@ -465,18 +501,20 @@ def do_read_header(ns: argparse.Namespace) -> typing.NoReturn:
if ns.format == "dump":
dump_func = hexdump
elif ns.format == "dump-text":
def dump_func(data: bytes) -> None:
print(translate_text(data))
def dump_func(data: bytes) -> typing.Iterable[str]:
yield translate_text(data)
else:
raise AssertionError(f"Unhandled --format: {ns.format!r}")
if ns.part in {"system", "all"}:
print("System-reserved header data:")
dump_func(rf.header_system_data)
for line in dump_func(rf.header_system_data):
print(line)
if ns.part in {"application", "all"}:
print("Application-specific header data:")
dump_func(rf.header_application_data)
for line in dump_func(rf.header_application_data):
print(line)
elif ns.format in {"hex", "raw"}:
if ns.part == "system":
data = rf.header_system_data
@ -488,7 +526,8 @@ def do_read_header(ns: argparse.Namespace) -> typing.NoReturn:
raise AssertionError(f"Unhandled --part: {ns.part!r}")
if ns.format == "hex":
raw_hexdump(data)
for line in raw_hexdump(data):
print(line)
elif ns.format == "raw":
sys.stdout.buffer.write(data)
else:
@ -502,17 +541,19 @@ def do_read_header(ns: argparse.Namespace) -> typing.NoReturn:
def do_info(ns: argparse.Namespace) -> typing.NoReturn:
with open_resource_file(ns.file, fork=ns.fork) as rf:
print("System-reserved header data:")
hexdump(rf.header_system_data)
for line in hexdump(rf.header_system_data):
print(line)
print()
print("Application-specific header data:")
hexdump(rf.header_application_data)
for line in hexdump(rf.header_application_data):
print(line)
print()
print(f"Resource data starts at {rf.data_offset:#x} and is {rf.data_length:#x} bytes long")
print(f"Resource map starts at {rf.map_offset:#x} and is {rf.map_length:#x} bytes long")
attrs = decompose_flags(rf.file_attributes)
if attrs:
attrs_desc = " | ".join(attr.name for attr in attrs)
attrs_desc = join_flag_names(attrs)
else:
attrs_desc = "(none)"
print(f"Resource map attributes: {attrs_desc}")
@ -557,7 +598,7 @@ def do_resource_info(ns: argparse.Namespace) -> typing.NoReturn:
attrs = decompose_flags(res.attributes)
if attrs:
attrs_desc = " | ".join(attr.name for attr in attrs)
attrs_desc = join_flag_names(attrs)
else:
attrs_desc = "(none)"
print(f"\tAttributes: {attrs_desc}")

src/rsrcfork/_io_utils.py Normal file

@ -0,0 +1,93 @@
"""A collection of utility functions and classes related to IO streams. For internal use only."""
import io
import typing
def read_exact(stream: typing.BinaryIO, byte_count: int) -> bytes:
"""Read byte_count bytes from the stream and raise an exception if too few bytes are read (i. e. if EOF was hit prematurely).
:param stream: The stream to read from.
:param byte_count: The number of bytes to read.
:return: The read data, which is exactly ``byte_count`` bytes long.
:raise EOFError: If not enough data could be read from the stream.
"""
data = stream.read(byte_count)
if len(data) != byte_count:
raise EOFError(f"Attempted to read {byte_count} bytes of data, but only got {len(data)} bytes")
return data
if typing.TYPE_CHECKING:
class PeekableIO(typing.Protocol):
"""Minimal protocol for binary IO streams that support the peek method.
The peek method is supported by various standard Python binary IO streams, such as io.BufferedReader. If a stream does not natively support the peek method, it may be wrapped using the custom helper function make_peekable.
"""
def readable(self) -> bool:
...
def read(self, size: typing.Optional[int] = ...) -> bytes:
...
def peek(self, size: int = ...) -> bytes:
...
class _PeekableIOWrapper(object):
"""Wrapper class to add peek support to an existing stream. Do not instantiate this class directly, use the make_peekable function instead.
Python provides a standard io.BufferedReader class, which supports the peek method. However, according to its documentation, it only supports wrapping io.RawIOBase subclasses, and not streams which are already otherwise buffered.
Warning: this class does not perform any buffering of its own, outside of what is required to make peek work. It is strongly recommended to only wrap streams that are already buffered or otherwise fast to read from. In particular, raw streams (io.RawIOBase subclasses) should be wrapped using io.BufferedReader instead.
"""
_wrapped: typing.BinaryIO
_readahead: bytes
def __init__(self, wrapped: typing.BinaryIO) -> None:
super().__init__()
self._wrapped = wrapped
self._readahead = b""
def readable(self) -> bool:
return self._wrapped.readable()
def read(self, size: typing.Optional[int] = None) -> bytes:
if size is None or size < 0:
ret = self._readahead + self._wrapped.read()
self._readahead = b""
elif size <= len(self._readahead):
ret = self._readahead[:size]
self._readahead = self._readahead[size:]
else:
ret = self._readahead + self._wrapped.read(size - len(self._readahead))
self._readahead = b""
return ret
def peek(self, size: int = -1) -> bytes:
if not self._readahead:
self._readahead = self._wrapped.read(io.DEFAULT_BUFFER_SIZE if size < 0 else size)
return self._readahead
def make_peekable(stream: typing.BinaryIO) -> "PeekableIO":
"""Wrap an arbitrary binary IO stream so that it supports the peek method.
The stream is wrapped as efficiently as possible (or not at all if it already supports the peek method). However, in the worst case a custom wrapper class needs to be used, which may not be particularly efficient and only supports a very minimal interface. The only methods that are guaranteed to exist on the returned stream are readable, read, and peek.
"""
if hasattr(stream, "peek"):
# Stream is already peekable, nothing to be done.
return typing.cast("PeekableIO", stream)
elif not typing.TYPE_CHECKING and isinstance(stream, io.RawIOBase):
# This branch is skipped when type checking - mypy incorrectly warns about this code being unreachable, because it thinks that a typing.BinaryIO cannot be an instance of io.RawIOBase.
# Raw IO streams can be wrapped efficiently using BufferedReader.
return io.BufferedReader(stream)
else:
# Other streams need to be wrapped using our custom wrapper class.
return _PeekableIOWrapper(stream)
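The read/peek semantics of ``_PeekableIOWrapper`` can be exercised with a condensed copy of the same read-ahead logic (inlined here because ``_io_utils`` is an internal module):

```python
import io

class PeekWrapper:
    """Condensed version of _PeekableIOWrapper's read-ahead logic."""

    def __init__(self, wrapped):
        self._wrapped = wrapped
        self._readahead = b""

    def read(self, size=-1):
        if size is None or size < 0:
            ret = self._readahead + self._wrapped.read()
            self._readahead = b""
        elif size <= len(self._readahead):
            # The request is fully satisfied from the read-ahead buffer.
            ret, self._readahead = self._readahead[:size], self._readahead[size:]
        else:
            ret = self._readahead + self._wrapped.read(size - len(self._readahead))
            self._readahead = b""
        return ret

    def peek(self, size=-1):
        # May return fewer or more bytes than requested, like BufferedReader.peek.
        if not self._readahead:
            self._readahead = self._wrapped.read(io.DEFAULT_BUFFER_SIZE if size < 0 else size)
        return self._readahead

f = PeekWrapper(io.BytesIO(b"hello world"))
assert f.peek(5) == b"hello"  # peeking does not consume the bytes
assert f.read(5) == b"hello"  # the same bytes are returned by read
assert f.read() == b" world"
```

As the warning in the docstring notes, this does no buffering beyond what ``peek`` requires, so it should only wrap streams that are already cheap to read from.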

@ -8,6 +8,7 @@ import types
import typing
import warnings
from . import _io_utils
from . import compress
# The formats of all following structures is as described in the Inside Macintosh book (see module docstring).
@ -108,6 +109,7 @@ class Resource(object):
_name: typing.Optional[bytes]
attributes: ResourceAttrs
data_raw_offset: int
_length_raw: int
_data_raw: bytes
_compressed_info: compress.common.CompressedHeaderInfo
_data_decompressed: bytes
@ -129,10 +131,12 @@ class Resource(object):
def __repr__(self) -> str:
try:
data = self.data
with self.open() as f:
data = f.read(33)
except compress.DecompressError:
decompress_ok = False
data = self.data_raw
with self.open_raw() as f:
data = f.read(33)
else:
decompress_ok = True
@ -148,12 +152,12 @@ class Resource(object):
@property
def resource_type(self) -> bytes:
warnings.warn(DeprecationWarning("The resource_type attribute has been deprecated and will be removed in a future version. Please use the type attribute instead."))
warnings.warn(DeprecationWarning("The resource_type attribute has been deprecated and will be removed in a future version. Please use the type attribute instead."), stacklevel=2)
return self.type
@property
def resource_id(self) -> int:
warnings.warn(DeprecationWarning("The resource_id attribute has been deprecated and will be removed in a future version. Please use the id attribute instead."))
warnings.warn(DeprecationWarning("The resource_id attribute has been deprecated and will be removed in a future version. Please use the id attribute instead."), stacklevel=2)
return self.id
@property
@ -175,11 +179,25 @@ class Resource(object):
try:
return self._data_raw
except AttributeError:
self._resfile._stream.seek(self._resfile.data_offset + self.data_raw_offset)
(data_raw_length,) = self._resfile._stream_unpack(STRUCT_RESOURCE_DATA_HEADER)
self._data_raw = self._resfile._read_exact(data_raw_length)
self._resfile._stream.seek(self._resfile.data_offset + self.data_raw_offset + STRUCT_RESOURCE_DATA_HEADER.size)
self._data_raw = _io_utils.read_exact(self._resfile._stream, self.length_raw)
return self._data_raw
def open_raw(self) -> typing.BinaryIO:
"""Create a binary file-like object that provides access to this resource's raw data, which may be compressed.
The returned stream is read-only and seekable.
Multiple resource data streams can be opened at the same time for the same resource or for different resources in the same file,
without interfering with each other.
If a :class:`ResourceFile` is closed,
all resource data streams for that file may become unusable.
This method is recommended over :attr:`data_raw` if the data is accessed incrementally or only partially,
because the stream API does not require the entire resource data to be read in advance.
"""
return io.BytesIO(self.data_raw)
@property
def compressed_info(self) -> typing.Optional[compress.common.CompressedHeaderInfo]:
"""The compressed resource header information, or None if this resource is not compressed.
@ -191,7 +209,8 @@ class Resource(object):
try:
return self._compressed_info
except AttributeError:
self._compressed_info = compress.common.CompressedHeaderInfo.parse(self.data_raw)
with self.open_raw() as f:
self._compressed_info = compress.common.CompressedHeaderInfo.parse_stream(f)
return self._compressed_info
else:
return None
@ -203,7 +222,12 @@ class Resource(object):
Accessing this attribute may be faster than computing len(self.data_raw) manually.
"""
return len(self.data_raw)
try:
return self._length_raw
except AttributeError:
self._resfile._stream.seek(self._resfile.data_offset + self.data_raw_offset)
(self._length_raw,) = self._resfile._stream_unpack(STRUCT_RESOURCE_DATA_HEADER)
return self._length_raw
@property
def length(self) -> int:
@ -228,10 +252,27 @@ class Resource(object):
try:
return self._data_decompressed
except AttributeError:
self._data_decompressed = compress.decompress_parsed(self.compressed_info, self.data_raw[self.compressed_info.header_length:])
with self.open_raw() as compressed_f:
compressed_f.seek(self.compressed_info.header_length)
self._data_decompressed = b"".join(compress.decompress_stream_parsed(self.compressed_info, compressed_f))
return self._data_decompressed
else:
return self.data_raw
def open(self) -> typing.BinaryIO:
"""Create a binary file-like object that provides access to this resource's data, decompressed if necessary.
The returned stream is read-only and seekable.
Multiple resource data streams can be opened at the same time for the same resource or for different resources in the same file,
without interfering with each other.
If a :class:`ResourceFile` is closed,
all resource data streams for that file may become unusable.
This method is recommended over :attr:`data` if the data is accessed incrementally or only partially,
because the stream API does not require the entire resource data to be read (and possibly decompressed) in advance.
"""
return io.BytesIO(self.data)
class _LazyResourceMap(typing.Mapping[int, Resource]):
@ -328,7 +369,7 @@ class ResourceFile(typing.Mapping[bytes, typing.Mapping[int, Resource]], typing.
fork = "rsrc"
else:
fork = "data"
warnings.warn(DeprecationWarning(f"The rsrcfork parameter has been deprecated and will be removed in a future version. Please use fork={fork!r} instead of rsrcfork={kwargs['rsrcfork']!r}."))
warnings.warn(DeprecationWarning(f"The rsrcfork parameter has been deprecated and will be removed in a future version. Please use fork={fork!r} instead of rsrcfork={kwargs['rsrcfork']!r}."), stacklevel=2)
del kwargs["rsrcfork"]
if fork == "auto":
@ -393,10 +434,10 @@ class ResourceFile(typing.Mapping[bytes, typing.Mapping[int, Resource]], typing.
def _read_exact(self, byte_count: int) -> bytes:
"""Read byte_count bytes from the stream and raise an exception if too few bytes are read (i. e. if EOF was hit prematurely)."""
data = self._stream.read(byte_count)
if len(data) != byte_count:
raise InvalidResourceFileError(f"Attempted to read {byte_count} bytes of data, but only got {len(data)} bytes")
return data
try:
return _io_utils.read_exact(self._stream, byte_count)
except EOFError as e:
raise InvalidResourceFileError(str(e))
def _stream_unpack(self, st: struct.Struct) -> tuple:
"""Unpack data from the stream according to the struct st. The number of bytes to read is determined using st.size, so variable-sized structs cannot be used with this method."""

@ -2,6 +2,8 @@ import io
import struct
import typing
from .. import _io_utils
class DecompressError(Exception):
"""Raised when resource data decompression fails, because the data is invalid or the compression type is not supported."""
@ -44,8 +46,8 @@ class CompressedHeaderInfo(object):
raise DecompressError("Invalid header")
if signature != COMPRESSED_SIGNATURE:
raise DecompressError(f"Invalid signature: {signature!r}, expected {COMPRESSED_SIGNATURE!r}")
if header_length != 0x12:
raise DecompressError(f"Unsupported header length: 0x{header_length:>04x}, expected 0x12")
if header_length not in {0, 0x12}:
raise DecompressError(f"Unsupported header length value: 0x{header_length:>04x}, expected 0x12 or 0")
if compression_type == COMPRESSED_TYPE_8:
working_buffer_fractional_size, expansion_buffer_size, dcmp_id, reserved = STRUCT_COMPRESSED_TYPE_8_HEADER.unpack(remainder)
@ -105,87 +107,13 @@ class CompressedType9HeaderInfo(CompressedHeaderInfo):
return f"{type(self).__qualname__}(header_length={self.header_length}, compression_type=0x{self.compression_type:>04x}, decompressed_length={self.decompressed_length}, dcmp_id={self.dcmp_id}, parameters={self.parameters!r})"
if typing.TYPE_CHECKING:
class PeekableIO(typing.Protocol):
"""Minimal protocol for binary IO streams that support the peek method.
The peek method is supported by various standard Python binary IO streams, such as io.BufferedReader. If a stream does not natively support the peek method, it may be wrapped using the custom helper function make_peekable.
"""
def readable(self) -> bool:
...
def read(self, size: typing.Optional[int] = ...) -> bytes:
...
def peek(self, size: int = ...) -> bytes:
...
class _PeekableIOWrapper(object):
"""Wrapper class to add peek support to an existing stream. Do not instantiate this class directly, use the make_peekable function instead.
Python provides a standard io.BufferedReader class, which supports the peek method. However, according to its documentation, it only supports wrapping io.RawIOBase subclasses, and not streams which are already otherwise buffered.
Warning: this class does not perform any buffering of its own, outside of what is required to make peek work. It is strongly recommended to only wrap streams that are already buffered or otherwise fast to read from. In particular, raw streams (io.RawIOBase subclasses) should be wrapped using io.BufferedReader instead.
"""
_wrapped: typing.BinaryIO
_readahead: bytes
def __init__(self, wrapped: typing.BinaryIO) -> None:
super().__init__()
self._wrapped = wrapped
self._readahead = b""
def readable(self) -> bool:
return self._wrapped.readable()
def read(self, size: typing.Optional[int] = None) -> bytes:
if size is None or size < 0:
ret = self._readahead + self._wrapped.read()
self._readahead = b""
elif size <= len(self._readahead):
ret = self._readahead[:size]
self._readahead = self._readahead[size:]
else:
ret = self._readahead + self._wrapped.read(size - len(self._readahead))
self._readahead = b""
return ret
def peek(self, size: int = -1) -> bytes:
if not self._readahead:
self._readahead = self._wrapped.read(io.DEFAULT_BUFFER_SIZE if size < 0 else size)
return self._readahead
def make_peekable(stream: typing.BinaryIO) -> "PeekableIO":
"""Wrap an arbitrary binary IO stream so that it supports the peek method.
The stream is wrapped as efficiently as possible (or not at all if it already supports the peek method). However, in the worst case a custom wrapper class needs to be used, which may not be particularly efficient and only supports a very minimal interface. The only methods that are guaranteed to exist on the returned stream are readable, read, and peek.
"""
if hasattr(stream, "peek"):
# Stream is already peekable, nothing to be done.
return typing.cast("PeekableIO", stream)
elif not typing.TYPE_CHECKING and isinstance(stream, io.RawIOBase):
# This branch is skipped when type checking - mypy incorrectly warns about this code being unreachable, because it thinks that a typing.BinaryIO cannot be an instance of io.RawIOBase.
# Raw IO streams can be wrapped efficiently using BufferedReader.
return io.BufferedReader(stream)
else:
# Other streams need to be wrapped using our custom wrapper class.
return _PeekableIOWrapper(stream)
def read_exact(stream: typing.BinaryIO, byte_count: int) -> bytes:
	"""Read byte_count bytes from the stream and raise an exception if too few bytes are read (i.e. if EOF was hit prematurely)."""
	
-	data = stream.read(byte_count)
-	if len(data) != byte_count:
-		raise DecompressError(f"Attempted to read {byte_count} bytes of data, but only got {len(data)} bytes")
-	return data
+	try:
+		return _io_utils.read_exact(stream, byte_count)
+	except EOFError as e:
+		raise DecompressError(str(e))
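The same wrap-and-translate pattern can be shown as a runnable sketch. `DecompressError`, `read_exact_sketch`, and `read_exact_decompress` here are illustrative stand-ins for the library's helpers:

```python
import io
import typing


class DecompressError(Exception):
	"""Stand-in for the library's DecompressError, for illustration."""


def read_exact_sketch(stream: typing.BinaryIO, byte_count: int) -> bytes:
	# Generic low-level helper: report a short read as EOFError.
	data = stream.read(byte_count)
	if len(data) != byte_count:
		raise EOFError(f"Attempted to read {byte_count} bytes of data, but only got {len(data)} bytes")
	return data


def read_exact_decompress(stream: typing.BinaryIO, byte_count: int) -> bytes:
	# Decompression-level wrapper: translate EOFError into the
	# domain-specific DecompressError, preserving the message.
	try:
		return read_exact_sketch(stream, byte_count)
	except EOFError as e:
		raise DecompressError(str(e))


assert read_exact_decompress(io.BytesIO(b"abcd"), 4) == b"abcd"
try:
	read_exact_decompress(io.BytesIO(b"ab"), 4)
except DecompressError:
	pass
else:
	raise AssertionError("expected DecompressError on short read")
```

Keeping the generic helper EOFError-based lets non-decompression callers reuse it, while each domain wraps failures in its own exception type.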
def read_variable_length_integer(stream: typing.BinaryIO) -> int:


@@ -2,6 +2,7 @@ import enum
import struct
import typing
+from .. import _io_utils
from . import common
@@ -73,7 +74,7 @@ def _split_bits(i: int) -> typing.Tuple[bool, bool, bool, bool, bool, bool, bool
)
-def _decompress_untagged(stream: "common.PeekableIO", decompressed_length: int, table: typing.Sequence[bytes], *, debug: bool = False) -> typing.Iterator[bytes]:
+def _decompress_untagged(stream: "_io_utils.PeekableIO", decompressed_length: int, table: typing.Sequence[bytes], *, debug: bool = False) -> typing.Iterator[bytes]:
	while True: # Loop is terminated when EOF is reached.
		table_index_data = stream.read(1)
		if not table_index_data:
@@ -93,7 +94,7 @@ def _decompress_untagged(stream: "common.PeekableIO", decompressed_length: int,
		yield table[table_index]
-def _decompress_tagged(stream: "common.PeekableIO", decompressed_length: int, table: typing.Sequence[bytes], *, debug: bool = False) -> typing.Iterator[bytes]:
+def _decompress_tagged(stream: "_io_utils.PeekableIO", decompressed_length: int, table: typing.Sequence[bytes], *, debug: bool = False) -> typing.Iterator[bytes]:
	while True: # Loop is terminated when EOF is reached.
		tag_data = stream.read(1)
		if not tag_data:
@@ -174,4 +175,4 @@ def decompress_stream(header_info: common.CompressedHeaderInfo, stream: typing.B
	else:
		decompress_func = _decompress_untagged
	
-	yield from decompress_func(common.make_peekable(stream), header_info.decompressed_length, table, debug=debug)
+	yield from decompress_func(_io_utils.make_peekable(stream), header_info.decompressed_length, table, debug=debug)
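The untagged loop boils down to: read one byte, treat it as an index into a prebuilt table of byte sequences, yield the table entry, and stop at EOF. A minimal standalone sketch of that structure (the two-entry table is made up for illustration; the real tables come from the compressed resource header):

```python
import io
import typing


def decompress_untagged_sketch(stream: typing.BinaryIO, table: typing.Sequence[bytes]) -> typing.Iterator[bytes]:
	while True:  # Loop is terminated when EOF is reached.
		table_index_data = stream.read(1)
		if not table_index_data:
			# EOF reached - no more table indices to process.
			return
		# Iterating/unpacking a bytes object yields ints, so this is the index directly.
		(table_index,) = table_index_data
		yield table[table_index]


table = [b"foo", b"bar"]
data = b"".join(decompress_untagged_sketch(io.BytesIO(bytes([0, 1, 0])), table))
assert data == b"foobarfoo"
```

The tagged variant differs in that each byte read is a tag whose bits select between table lookups and literal data, but the read-until-EOF generator shape is the same.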


@@ -129,7 +129,36 @@ class ResourceFileReadTests(unittest.TestCase):
		self.assertEqual(actual_res.name, None)
		self.assertEqual(actual_res.attributes, rsrcfork.ResourceAttrs(0))
		self.assertEqual(actual_res.data, expected_data)
		
		with actual_res.open() as f:
			self.assertEqual(f.read(10), expected_data[:10])
			self.assertEqual(f.read(5), expected_data[10:15])
			self.assertEqual(f.read(), expected_data[15:])
			f.seek(0)
			self.assertEqual(f.read(), expected_data)
		
		self.assertEqual(actual_res.compressed_info, None)
		
		actual_res_1 = rf[b"TEXT"][256]
		expected_data_1 = TEXTCLIPPING_RESOURCES[b"TEXT"][256]
		actual_res_2 = rf[b"utxt"][256]
		expected_data_2 = TEXTCLIPPING_RESOURCES[b"utxt"][256]
		
		with self.subTest(stream_test="multiple streams for the same resource"):
			with actual_res_1.open() as f1, actual_res_1.open() as f2:
				f1.seek(5)
				f2.seek(10)
				self.assertEqual(f1.read(10), expected_data_1[5:15])
				self.assertEqual(f2.read(10), expected_data_1[10:20])
				self.assertEqual(f1.read(), expected_data_1[15:])
				self.assertEqual(f2.read(), expected_data_1[20:])
		
		with self.subTest(stream_test="multiple streams for different resources"):
			with actual_res_1.open() as f1, actual_res_2.open() as f2:
				f1.seek(5)
				f2.seek(10)
				self.assertEqual(f1.read(10), expected_data_1[5:15])
				self.assertEqual(f2.read(10), expected_data_2[10:20])
				self.assertEqual(f1.read(), expected_data_1[15:])
				self.assertEqual(f2.read(), expected_data_2[20:])
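The invariant these tests check - every call to a resource's `open()` yields an independent stream with its own position - can be seen with plain `io.BytesIO` objects over the same buffer:

```python
import io

data = bytes(range(30))

# Two independent readers over the same underlying bytes,
# analogous to calling open() twice on one resource.
f1 = io.BytesIO(data)
f2 = io.BytesIO(data)

f1.seek(5)
f2.seek(10)
# Each stream tracks its own position; reads on one do not affect the other.
assert f1.read(10) == data[5:15]
assert f2.read(10) == data[10:20]
assert f1.read() == data[15:]
assert f2.read() == data[20:]
```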
	def test_textclipping_seekable_stream(self) -> None:
		with TEXTCLIPPING_RSRC_FILE.open("rb") as f:
@@ -239,6 +268,8 @@ class ResourceFileReadTests(unittest.TestCase):
		self.assertEqual(actual_res.name, expected_name)
		self.assertEqual(actual_res.attributes, expected_attrs)
		self.assertEqual(actual_res.data, expected_data)
		with actual_res.open() as f:
			self.assertEqual(f.read(), expected_data)
		self.assertEqual(actual_res.compressed_info, None)
	
	def test_compress_compare(self) -> None:
@@ -274,6 +305,14 @@ class ResourceFileReadTests(unittest.TestCase):
		# The compressed resource's (automatically decompressed) data must match the uncompressed data.
		self.assertEqual(compressed_res.data, uncompressed_res.data)
		self.assertEqual(compressed_res.length, uncompressed_res.length)
		
		with compressed_res.open() as compressed_f, uncompressed_res.open() as uncompressed_f:
			compressed_f.seek(15)
			uncompressed_f.seek(15)
			self.assertEqual(compressed_f.read(10), uncompressed_f.read(10))
			self.assertEqual(compressed_f.read(), uncompressed_f.read())
			
			compressed_f.seek(0)
			uncompressed_f.seek(0)
			self.assertEqual(compressed_f.read(), uncompressed_f.read())
		
		if rsrcfork.ResourceAttrs.resCompressed in compressed_res.attributes:
			# Resources with the compressed attribute must expose correct compression metadata.


@@ -1,6 +1,7 @@
[tox]
-# When adding a new Python version here, please also update the list of Python versions called by the GitHub Actions workflow (.github/workflows/ci.yml).
-envlist = py{36,37,38},flake8,mypy,package
+# When updating the Python versions here,
+# please also update the corresponding Python versions in the GitHub Actions workflow (.github/workflows/ci.yml).
+envlist = py{36,311},flake8,mypy,package
[testenv]
commands = python -m unittest discover --start-directory ./tests
@@ -9,7 +10,6 @@ commands = python -m unittest discover --start-directory ./tests
deps =
flake8 >= 3.8.0
flake8-bugbear
-flake8-tabs
commands = flake8
[testenv:mypy]