added small doc about tar 'pax header' format

Denis Vlasenko 2006-11-26 17:07:38 +00:00
parent 664733f1a3
commit ec0c920a78

docs/tar_pax.txt Normal file
@@ -0,0 +1,239 @@
'pax headers' are a POSIX 2003 (iirc) addition designed to fix
tar format limitations: the older tar format has fixed-width fields
for everything (filename, uid, file size, etc.), which can overflow.
pax Header Block
The pax header block shall be identical to the ustar header block
described in ustar Interchange Format, except that two additional
typeflag values are defined:
x
Represents extended header records for the following file in
the archive (which shall have its own ustar header block).
g
Represents global extended header records for the following
files in the archive. Each value shall affect all subsequent files
that do not override that value in their own extended header
record and until another global extended header record is reached
that provides another value for the same field. The typeflag g
global headers should not be used with interchange media that
could suffer partial data loss in transporting the archive.
For both of these types, the size field shall be the size of the
extended header records in octets. The other fields in the header
block are not meaningful to this version of the pax utility.
However, if this archive is read by a pax utility conforming to
the ISO POSIX-2:1993 standard, the header block fields are used to
create a regular file that contains the extended header records as
data. Therefore, header block field values should be selected to
provide reasonable file access to this regular file.
A further difference from the ustar header block is that data
blocks for files of typeflag 1 (the digit one) (hard link) may be
included, which means that the size field may be greater than
zero.
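
A minimal sketch of how a reader might recognize the two new typeflag
values and pick up the size of the extended header records. The field
offsets (size at 124, typeflag at 156) come from the ustar header
layout, not from the text above; the function names are made up for
illustration.

```python
# ustar header field offsets (from the ustar interchange format).
SIZE_OFF, SIZE_LEN = 124, 12
TYPEFLAG_OFF = 156
BLOCK = 512

def parse_octal(field: bytes) -> int:
    """Decode a NUL- or space-terminated octal ustar field."""
    text = field.split(b"\0", 1)[0].strip(b" ").decode("ascii")
    return int(text, 8) if text else 0

def classify_header(block: bytes):
    """Return (kind, size) for one 512-byte header block."""
    if len(block) != BLOCK:
        raise ValueError("header block must be 512 bytes")
    typeflag = block[TYPEFLAG_OFF:TYPEFLAG_OFF + 1]
    size = parse_octal(block[SIZE_OFF:SIZE_OFF + SIZE_LEN])
    kind = {b"x": "extended", b"g": "global"}.get(typeflag, "other")
    return kind, size

def padded_size(size: int) -> int:
    """Archive data is stored in whole 512-byte blocks."""
    return (size + BLOCK - 1) // BLOCK * BLOCK
```

For an 'x' or 'g' header, the next padded_size(size) bytes hold the
extended header records; only the first size bytes are meaningful.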
pax Extended Header
An extended header shall consist of one or more records, each
constructed as follows:
"%d %s=%s\n", <length>, <keyword>, <value>
The <length> field shall be the decimal length of the extended
header record in octets, including the length string itself and
the trailing <newline>.
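
Because <length> counts its own digits, a writer cannot simply prepend
len(rest). The sketch below shows one way to compute it, plus a
matching parser. Function names are hypothetical, and the parser
assumes keywords and values are valid UTF-8 and that the caller passes
exactly the number of bytes given by the header's size field (no NUL
padding).

```python
def encode_record(keyword: str, value: str) -> bytes:
    """Build one '<length> <keyword>=<value>\\n' record."""
    body = b" " + keyword.encode("utf-8") + b"=" + value.encode("utf-8") + b"\n"
    length = len(body)
    # The length includes its own decimal digits, so repeat until stable.
    while True:
        total = len(str(length)) + len(body)
        if total == length:
            break
        length = total
    return str(length).encode("ascii") + body

def parse_records(data: bytes) -> dict:
    """Split a run of records into a {keyword: value} dict."""
    records = {}
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)
        length = int(data[pos:space])
        record = data[pos:pos + length]
        if (length <= space - pos + 1 or len(record) != length
                or not record.endswith(b"\n")):
            raise ValueError("malformed pax record")
        # Split on the first '=' only: the value may contain '=' itself.
        keyword, _, value = record[space - pos + 1:-1].partition(b"=")
        records[keyword.decode("utf-8")] = value.decode("utf-8")
        pos += length
    return records
```

Later records for the same keyword simply overwrite earlier ones in
this sketch.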
[skip]
atime
The file access time for the following file(s), equivalent to
the value of the st_atime member of the stat structure for a file,
as described by the stat() function. The access time shall be
restored if the process has the appropriate privilege required to
do so. The format of the <value> shall be as described in pax
Extended Header File Times.
charset
The name of the character set used to encode the data in the
following file(s).
The encoding is included in an extended header for information
only; when pax is used as described in IEEE Std 1003.1-2001, it
shall not translate the file data into any other encoding. The
BINARY entry indicates unencoded binary data.
When used in write or copy mode, it is implementation-defined
whether pax includes a charset extended header record for a file.
comment
A series of characters used as a comment. All characters in
the <value> field shall be ignored by pax.
gid
The group ID of the group that owns the file, expressed as a
decimal number using digits from the ISO/IEC 646:1991 standard.
This record shall override the gid field in the following header
block(s). When used in write or copy mode, pax shall include a gid
extended header record for each file whose group ID is greater
than 2097151 (octal 7777777).
gname
The group of the file(s), formatted as a group name in the
group database. This record shall override the gid and gname
fields in the following header block(s), and any gid extended
header record. When used in read, copy, or list mode, pax shall
translate the name from the UTF-8 encoding in the header record to
the character set appropriate for the group database on the
receiving system. If any of the UTF-8 characters cannot be
translated, and if the -o invalid=UTF-8 option is not specified,
the results are implementation-defined. When used in write or copy
mode, pax shall include a gname extended header record for each
file whose group name cannot be represented entirely with the
letters and digits of the portable character set.
linkpath
The pathname of a link being created to another file, of any
type, previously archived. This record shall override the linkname
field in the following ustar header block(s). The following ustar
header block shall determine the type of link created. If typeflag
of the following header block is 1, it shall be a hard link. If
typeflag is 2, it shall be a symbolic link and the linkpath value
shall be the contents of the symbolic link. The pax utility shall
translate the name of the link (contents of the symbolic link)
from the UTF-8 encoding to the character set appropriate for the
local file system. When used in write or copy mode, pax shall
include a linkpath extended header record for each link whose
pathname cannot be represented entirely with the members of the
portable character set other than NUL.
mtime
The file modification time of the following file(s),
equivalent to the value of the st_mtime member of the stat
structure for a file, as described in the stat() function. This
record shall override the mtime field in the following header
block(s). The modification time shall be restored if the process
has the appropriate privilege required to do so. The format of the
<value> shall be as described in pax Extended Header File Times.
path
The pathname of the following file(s). This record shall
override the name and prefix fields in the following header
block(s). The pax utility shall translate the pathname of the file
from the UTF-8 encoding to the character set appropriate for the
local file system.
When used in write or copy mode, pax shall include a path
extended header record for each file whose pathname cannot be
represented entirely with the members of the portable character
set other than NUL.
realtime.any
The keywords prefixed by "realtime." are reserved for future
standardization.
security.any
The keywords prefixed by "security." are reserved for future
standardization.
size
The size of the file in octets, expressed as a decimal number
using digits from the ISO/IEC 646:1991 standard. This record shall
override the size field in the following header block(s). When
used in write or copy mode, pax shall include a size extended
header record for each file with a size value greater than
8589934591 (octal 77777777777).
uid
The user ID of the file owner, expressed as a decimal number
using digits from the ISO/IEC 646:1991 standard. This record shall
override the uid field in the following header block(s). When used
in write or copy mode, pax shall include a uid extended header
record for each file whose owner ID is greater than 2097151 (octal
7777777).
uname
The owner of the following file(s), formatted as a user name
in the user database. This record shall override the uid and uname
fields in the following header block(s), and any uid extended
header record. When used in read, copy, or list mode, pax shall
translate the name from the UTF-8 encoding in the header record to
the character set appropriate for the user database on the
receiving system. If any of the UTF-8 characters cannot be
translated, and if the -o invalid=UTF-8 option is not specified,
the results are implementation-defined. When used in write or copy
mode, pax shall include a uname extended header record for each
file whose user name cannot be represented entirely with the
letters and digits of the portable character set.
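
The write-mode rules in the list above boil down to a few threshold
and character-set checks. A rough sketch follows; the function name is
invented, and "portable character set" is approximated as plain ASCII
here, which is slightly more permissive than the real definition.

```python
MAX_ID = 0o7777777        # 2097151: largest uid/gid the header field holds
MAX_SIZE = 0o77777777777  # 8589934591: largest size the header field holds

def plain_name(name: str) -> bool:
    """True if name uses only ASCII letters and digits (approximation)."""
    return name.isascii() and name.isalnum()

def needed_records(uid, gid, size, uname, gname, path):
    """Return the extended header records a writer must emit for a file."""
    records = {}
    if uid > MAX_ID:
        records["uid"] = str(uid)
    if gid > MAX_ID:
        records["gid"] = str(gid)
    if size > MAX_SIZE:
        records["size"] = str(size)
    if uname and not plain_name(uname):
        records["uname"] = uname
    if gname and not plain_name(gname):
        records["gname"] = gname
    if not path.isascii():  # approximation of the portable character set
        records["path"] = path
    return records
```

A real writer would typically also emit path or linkpath when the name
does not fit the fixed-width header fields; that case is not covered
by this sketch.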
If the <value> field is zero length, it shall delete any header
block field, previously entered extended header value, or global
extended header value of the same name.
If a keyword in an extended header record (or in a -o
option-argument) overrides or deletes a corresponding field in the
ustar header block, pax shall ignore the contents of that header
block field.
Unlike the ustar header block fields, NULs shall not delimit
<value>s; all characters within the <value> field shall be
considered data for the field. None of the length limitations of
the ustar header block fields in ustar Header Block shall apply to
the extended header records.
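
The override and deletion rules can be sketched as a layered merge:
header block fields first, then global records, then the file's own
extended records, with a zero-length value removing the entry. This
assumes the caller has already mapped header block fields onto pax
keyword names (for example name/prefix to path); the function name is
hypothetical, and the extra rule that uname/gname also override
uid/gid records is left out.

```python
def effective_fields(header: dict, global_recs: dict, extended_recs: dict) -> dict:
    """Combine header fields with global and per-file pax records."""
    result = dict(header)
    # Later layers win: global records first, then per-file records.
    for records in (global_recs, extended_recs):
        for keyword, value in records.items():
            if value == "":
                # A zero-length value deletes whatever was set before.
                result.pop(keyword, None)
            else:
                result[keyword] = value
    return result
```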
pax Extended Header File Times
Time records shall be formatted as a decimal representation of the
time in seconds since the Epoch. If a period ( '.' ) decimal point
character is present, the digits to the right of the point shall
represent the units of a subsecond timing granularity. In read or
copy mode, the pax utility shall truncate the time of a file to
the greatest value that is not greater than the input header
file time. In write or copy mode, the pax utility shall output a
time exactly if it can be represented exactly as a decimal number,
and otherwise shall generate only enough digits so that the same
time shall be recovered if the file is extracted on a system whose
underlying implementation supports the same time granularity.
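
A small sketch of both directions, under some assumptions: times are
non-negative, the writer has the time as whole seconds plus
nanoseconds, and the reader's granularity is a power of ten given as a
decimal string. Function names are made up.

```python
from decimal import Decimal, ROUND_FLOOR

def parse_time(value: str, granularity: str = "0.000000001") -> Decimal:
    """Read a time record, truncating to the local granularity.

    ROUND_FLOOR gives the greatest representable value that is not
    greater than the input.
    """
    return Decimal(value).quantize(Decimal(granularity), rounding=ROUND_FLOOR)

def format_time(seconds: int, nanoseconds: int = 0) -> str:
    """Write a time record with only as many digits as needed."""
    if nanoseconds == 0:
        return str(seconds)
    fraction = f"{nanoseconds:09d}".rstrip("0")
    return f"{seconds}.{fraction}"
```

Decimal is used instead of float so that long fractional parts are not
rounded on the way in.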
Example from a Linux kernel archive tarball (a 'g' global header block
followed by one data block holding a single comment record):
00000000 70 61 78 5f 67 6c 6f 62 61 6c 5f 68 65 61 64 65 |pax_global_heade|
00000010 72 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |r...............|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000060 00 00 00 00 30 30 30 30 36 36 36 00 30 30 30 30 |....0000666.0000|
00000070 30 30 30 00 30 30 30 30 30 30 30 00 30 30 30 30 |000.0000000.0000|
00000080 30 30 30 30 30 36 34 00 30 30 30 30 30 30 30 30 |0000064.00000000|
00000090 30 30 30 00 30 30 31 34 30 35 33 00 67 00 00 00 |000.0014053.g...|
000000a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000100 00 75 73 74 61 72 00 30 30 67 69 74 00 00 00 00 |.ustar.00git....|
00000110 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000120 00 00 00 00 00 00 00 00 00 67 69 74 00 00 00 00 |.........git....|
00000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000140 00 00 00 00 00 00 00 00 00 30 30 30 30 30 30 30 |.........0000000|
00000150 00 30 30 30 30 30 30 30 00 00 00 00 00 00 00 00 |.0000000........|
00000160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000180 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000200 35 32 20 63 6f 6d 6d 65 6e 74 3d 62 31 30 35 30 |52 comment=b1050|
00000210 32 62 32 32 61 31 32 30 39 64 36 62 34 37 36 33 |2b22a1209d6b4763|
00000220 39 64 38 38 62 38 31 32 62 32 31 66 62 35 39 34 |9d88b812b21fb594|
00000230 39 65 34 0a 00 00 00 00 00 00 00 00 00 00 00 00 |9e4.............|
00000240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
...
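
The dump above can be cross-checked by rebuilding the header block from
its visible fields. The sketch below assumes the standard ustar field
widths and the usual checksum rule (sum of all header bytes, with the
chksum field itself counted as eight spaces); neither is spelled out
in this document.

```python
def field(text: str, width: int) -> bytes:
    """ASCII text padded with NULs to a fixed width."""
    return text.encode("ascii").ljust(width, b"\0")

header = b"".join([
    field("pax_global_header", 100),  # name
    field("0000666", 8),              # mode
    field("0000000", 8),              # uid
    field("0000000", 8),              # gid
    field("00000000064", 12),         # size (octal)
    field("00000000000", 12),         # mtime
    field("0014053", 8),              # chksum (octal)
    b"g",                             # typeflag: global extended header
    field("", 100),                   # linkname
    field("ustar", 6),                # magic
    b"00",                            # version
    field("git", 32),                 # uname
    field("git", 32),                 # gname
    field("0000000", 8),              # devmajor
    field("0000000", 8),              # devminor
    field("", 155),                   # prefix
    field("", 12),                    # padding up to 512 bytes
])

record = b"52 comment=b10502b22a1209d6b47639d88b812b21fb5949e4\n"

size = int(header[124:135], 8)    # size field, octal 64
stored = int(header[148:155], 8)  # checksum as stored in the header
computed = sum(header[:148]) + 8 * ord(" ") + sum(header[156:])
```

The size field (octal 64, decimal 52) matches both the <length> prefix
and the actual length of the comment record, and the stored checksum
agrees with the recomputed one.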