diff --git a/bagit.xml b/bagit.xml
index 63f8c985..e3e58551 100644
--- a/bagit.xml
+++ b/bagit.xml
@@ -30,7 +30,7 @@
ipr="trust200902">
- The BagIt File Packaging Format (V1.0)
+ The BagIt File Packaging Format (V2.0)
@@ -78,9 +78,12 @@
+
+ Department of Defense
+ john.scancella@gmail.com
-
+
@@ -231,18 +234,17 @@ Ghent University, New York University, and the University of California.
- A bag MUST consist of a base directory containing the following:
+ A bag MUST consist of a .bagit directory containing the following:
a set of required and optional tag files (see );
- a subdirectory named "data", called the payload directory (see
- ); and
+ and a set of optional tag directories.
-The tag files in the base directory consist of one or more files named
+ The tag files in the .bagit directory consist of one or more files named
"manifest-algorithm.txt"
(see Sections and
),
@@ -260,54 +262,59 @@ The base directory can have any name, as illustrated by the figure below.
<base directory>/
|
- +-- bagit.txt
+ +-- [payload files]
|
+ +-- .bagit/
+ |
+ +-- bagit.txt
+ |
+-- manifest-<algorithm>.txt
- |
- +-- [additional tag files]
- |
- +-- data/
- | |
- | +-- [payload files]
- |
- +-- [tag directories]/
|
- +-- [tag files]
+ +-- [additional tag files]
+ |
+ +-- [tag directories]/
+ |
+ +-- [tag files]
- The "bagit.txt" tag file MUST consist of exactly two lines in this order:
+ The "bagit.txt" tag file MUST consist of exactly three lines in this order:
BagIt-Version: M.N
-Tag-File-Character-Encoding: ENCODING
+Tag-File-Character-Encoding: ENCODING
+Payload-Oxum: OCTETSTREAM_SUM
M.N identifies the BagIt major (M) and minor (N) version numbers.
ENCODING identifies the character set encoding used by the remaining tag files.
+ OCTETSTREAM_SUM is the "octetstream sum" of the payload, which is intended for quickly detecting
+ incomplete bags before performing checksum validation. This is strictly an optimization, and implementations
+ MUST perform the standard checksum validation process before proclaiming a bag to be valid.
+ This element MUST be in the form
+ "OctetCount.StreamCount",
+ where OctetCount is the total number of
+ octets (8-bit bytes) across all payload file content and
+ StreamCount is the total number of
+ payload files.
- ENCODING SHOULD
- be UTF-8, but
- for backwards compatibility it MAY be any
- other encoding registered in .
+ ENCODING MUST
+ be UTF-8.
The bag declaration itself MUST be encoded in UTF-8 and MUST NOT contain a
Byte Order Mark (BOM) .
- The number for this version of BagIt is "1.0".
+ The number for this version of BagIt is "2.0".
-
-
- The base directory MUST contain a subdirectory named "data".
-
+
The payload directory contains the arbitrary digital content within the bag.
- The files under the payload directory are called payload files, or the payload.
+ The files under the base directory, other than those in the ".bagit" directory, are called payload files, or the payload.
Each payload file is treated as an opaque octet stream when verifying file
correctness.
Payload files MAY be organized in arbitrary subdirectory structures
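Review note on this hunk: the new three-line "bagit.txt" and the "OctetCount.StreamCount" form can be sketched as below. This is a minimal illustration, not a reference implementation. The function names (`payload_oxum`, `read_declaration`, `quick_check`) are hypothetical, and skipping the top-level ".bagit" directory when counting payload files is an assumption drawn from the layout figure, not something the text states outright.

```python
import os
from pathlib import Path

TAG_DIR = ".bagit"  # tag directory name from the proposed layout
LABELS = ["BagIt-Version", "Tag-File-Character-Encoding", "Payload-Oxum"]


def payload_oxum(base_dir):
    """Return "OctetCount.StreamCount" for the payload files under base_dir."""
    octets = 0
    streams = 0
    for root, dirs, files in os.walk(base_dir):
        if Path(root) == Path(base_dir):
            # Assumption: only the top-level tag directory is excluded.
            dirs[:] = [d for d in dirs if d != TAG_DIR]
        for name in files:
            octets += os.path.getsize(os.path.join(root, name))
            streams += 1
    return f"{octets}.{streams}"


def read_declaration(base_dir):
    """Parse .bagit/bagit.txt, requiring exactly three lines in order."""
    path = Path(base_dir) / TAG_DIR / "bagit.txt"
    lines = path.read_text(encoding="utf-8").splitlines()
    if len(lines) != len(LABELS):
        raise ValueError("bagit.txt must consist of exactly three lines")
    values = {}
    for expected, line in zip(LABELS, lines):
        label, sep, value = line.partition(": ")
        # A leading BOM would also fail this label comparison.
        if not sep or label != expected:
            raise ValueError(f"expected a '{expected}' line, got {line!r}")
        values[label] = value
    return values


def quick_check(base_dir):
    """Cheap completeness test; it does not replace checksum validation."""
    declared = read_declaration(base_dir)["Payload-Oxum"]
    return declared == payload_oxum(base_dir)
```

A `True` result from `quick_check` only says the octet and file counts match; the manifests still have to be verified before a bag is called valid.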
@@ -443,7 +450,7 @@ As a result, no filepath listed in a tag manifest be
A metadata element MUST consist of a label, a colon ":", a single
linear whitespace character (space or tab), and a value that is
- terminated with an LF, a CR, or a CRLF.
+ terminated with an LF, a CR, or a CRLF.
The label MUST NOT contain a colon (:), LF, or CR.
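Review note on this hunk: the metadata element rule in the context lines (label, colon, one space or tab, value) can be sketched as a single-line parser. The name `parse_element` is hypothetical, and the sketch assumes one element per line; it does not handle anything the surrounding specification may say about multi-line values.

```python
import re

# Label: no colon, LF, or CR. Then ":", one space or tab, then the value.
ELEMENT = re.compile(r"([^:\r\n]+):[ \t](.*)")


def parse_element(line):
    """Split one metadata line into (label, value), or raise ValueError."""
    body = line.rstrip("\r\n")  # drop the LF, CR, or CRLF terminator
    match = ELEMENT.fullmatch(body)
    if match is None:
        raise ValueError(f"not a metadata element: {line!r}")
    return match.group(1), match.group(2)
```

Because only the label is barred from containing a colon, a value such as a URL passes through unchanged.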
@@ -500,29 +507,19 @@ As a result, no filepath listed in a tag manifest be
The size or approximate size of the bag being transferred, followed
by an abbreviation such as MB (megabytes), GB (gigabytes), or
- TB (terabytes): for example,
+ TB (terabytes): for example,
42600 MB, 42.6 GB, or .043 TB. Compared to Payload-Oxum (described
next), Bag-Size is intended for human consumption.
This metadata element SHOULD NOT be repeated.
-
- The "octetstream sum" of the payload, which is intended for the
- purpose of quickly detecting incomplete bags before performing checksum
- validation. This is strictly an optimization, and implementations MUST perform
- the standard checksum validation process before proclaiming a bag to be valid.
- This element MUST NOT be present more than once and, if present, MUST
- be in the form "OctetCount.StreamCount",
- where OctetCount is the total number of
- octets (8-bit bytes) across all payload file content and
- StreamCount is the total number of
- payload files.
- This metadata element MUST NOT be repeated.
+
+ Deprecated. Payload-Oxum is now declared in the "bagit.txt" tag file; any occurrence in "bag-info.txt" should be ignored.
A sender-supplied identifier for the set, if any, of bags
to which it logically belongs.
This identifier SHOULD be unique across the sender's content,
- and if it is recognizable as belonging to a globally unique scheme, the receiver
+ and if it is recognizable as belonging to a globally unique scheme, the receiver
SHOULD make an effort to honor the reference to it.
This metadata element SHOULD NOT be repeated.
@@ -560,7 +557,6 @@ External-Description: Uncompressed greyscale TIFF images from the
FOO papers colle...
Bagging-Date: 2008-01-15
External-Identifier: university_foo_001
-Payload-Oxum: 279164409832.1198
Bag-Group-Identifier: university_foo
Bag-Count: 1 of 15
Internal-Sender-Identifier: /storage/images/foo
@@ -570,7 +566,10 @@ Internal-Sender-Description: Uncompressed greyscale TIFFs created
-
+
+
+ This tag file is now deprecated and may be removed in a future version of BagIt.
+
For reasons of efficiency, a bag MAY be sent with a list of files to be
fetched and added to the payload before it can meaningfully be checked
@@ -614,7 +613,7 @@ Internal-Sender-Description: Uncompressed greyscale TIFFs created
three values, and any such characters in the url
MUST be percent-encoded .
If filename includes an LF, a CR,
- a CRLF, or a percent sign (%), those characters (and only those) MUST be
+ a CRLF, or a percent sign (%), those characters (and only those) MUST be
percent-encoded as described in .
There is no
limitation on the length of any of the fields in the fetch file.
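Review note on this hunk: the filename rule in the context lines (percent-encode LF, CR, and the percent sign, and nothing else) can be sketched as follows. The helper names are hypothetical, and the sketch assumes the usual percent-encoded forms %0A, %0D, and %25 for those three characters.

```python
import re

_ENCODE = {"%": "%25", "\n": "%0A", "\r": "%0D"}
_DECODE = {"25": "%", "0A": "\n", "0D": "\r"}


def encode_filename(name):
    """Percent-encode only '%', LF, and CR, in one pass."""
    return re.sub(r"[%\n\r]", lambda m: _ENCODE[m.group(0)], name)


def decode_filename(encoded):
    """Reverse encode_filename; other %XX sequences are left untouched."""
    return re.sub(
        r"%(25|0[AaDd])",
        lambda m: _DECODE[m.group(1).upper()],
        encoded,
    )
```

Both directions use a single substitution pass, so a literal "%0A" in a filename becomes "%250A" and decodes back to "%0A" rather than to a line feed.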
@@ -746,56 +745,11 @@ A valid bag MUST meet the following requirements:
-myfirstbag/
-|
-| manifest-md5.txt
-| (49afbd86a1ca9f34b677a3f09655eae9 data/27613-h/images/q172.png)
-| (408ad21d50cef31da4df6d9ed81b01a7 data/27613-h/images/q172.txt)
-|
-| bagit.txt
-| (BagIt-version: 1.0 )
-| (Tag-File-Character-Encoding: UTF-8 )
-|
-\--- data/
- |
- | 27613-h/images/q172.png
- | (... image bytes ... )
- |
- | 27613-h/images/q172.txt
- | (... OCR text ... )
- ....
+ TODO
+
-
-
- This is the layout of a bag that expects the receiver to download the
- files listed in the payload manifests prior to validation. Lines of
- file content are shown with added parentheses to indicate each
- complete line.
- For brevity, this example uses MD5 rather than the recommended SHA-512.
-
-
-
-
-highsmith-tahoe/
-|
-| manifest-md5.txt
-| (102b0e6effe208ef9b29864946de9e22 data/23364a.tif )
-|
-| fetch.txt
-| (https://cdn.loc.gov/master/pnp/highsm/23300/23364a.tif
-| 216951362 data/23364a.tif )
-|
-| bagit.txt
-| (BagIt-version: 1.0 )
-| (Tag-File-Character-Encoding: UTF-8 )
-|
-| bag-info.txt
-| (Internal-Sender-Description: Download link found at )
-| ( https://www.loc.gov/resource/highsm.23364/ )
-
-
@@ -901,7 +855,7 @@ highsmith-tahoe/
There are three challenges for interoperability related to filename case:
Filesystems such as File Allocation Table (FAT) or Extended File
- Allocation Table (EXFAT) always convert filenames to uppercase:
+ Allocation Table (EXFAT) always convert filenames to uppercase:
"example.txt" will be stored as "EXAMPLE.TXT".
Many Unix filesystems save filenames exactly as provided, which allows
@@ -909,7 +863,7 @@ highsmith-tahoe/
"Example.txt" are separate files.
New Technology File System (NTFS) and Apple's Hierarchical File System
- (HFS) Plus usually preserve case when storing files but are
+ (HFS) Plus usually preserve case when storing files but are
case insensitive when retrieving them. A file saved as "Example.txt"
will be retrieved by that name but will also be retrieved as
"EXAMPLE.TXT", "example.txt", etc.
@@ -1085,7 +1039,9 @@ definitions use the core rules (e.g., DIGIT, HEXDIG, etc) as defined in
@@ -1177,14 +1133,14 @@ This document has no IANA actions.
+ target="https://web.archive.org/web/20060508015635/http://www.iwaw.net/05/papers/iwaw05-tabata.pdf">
A Collaboration Model between Archival Systems to Enhance the Reliability of Preservation by an Enclose-and-Deposit Method
-
-
-
-
+
+
+
+
@@ -1207,7 +1163,7 @@ This document has no IANA actions.
Unicode Consortium
-
+