Python module for reading passive acoustic monitoring binary data files created by the program PAMGuard (https://www.pamguard.org/).
The DataFile class can be used to read a PAMGuard data binary file (.pgdf)
from PyPAMGuard.datafile import DataFile
df = DataFile("path/to/mydatafile.pgdf")DataFile creates an object that represents the PAMGuard binary data file. To see the list of fields in the object, use the get_fields() method:
df.get_fields()
['data', 'fileHeader', 'module', 'fileFooter']Each field can be accessed directly, e.g. df.fileHeader. DataFile is an object that is derived from the Chunk class which provides a consistent interface. Subfields of the data file themselves are typically Chunk objects, or things that are derived from Chunks (specialized chunks or list of chunks). A Chunk represents a portion of a PAMGuard file and has an interface similar to the DataFile object itself.
Let us examine the fields that are likely to be in a PAMGuard data file:
- fileHeader - Chunk object with information about the format of the data file such as the version of PAMGuard that was used to create it, the file format version number, data start date, when the data were analyzed, etc.
- fileFooter - Chunk object with information about when the data ended, the first and last event identifier (UIDs), etc.
- data - List of Chunk objects with data about the results of the PAMGuard operation. These can be accessed with df.data[index] or abbreviated to df[index]
- module - Contains a module object. This might be a GenericModule or a specialized module object. Module objects behave like Chunk objects but have specialized code for reading PAMGuard processing plugins. They typically have header, footer, and data fields. Unlike the DataFile data field, module data need not be a list. The type of module used can be determined by examining the DataFile's fileHeader.moduleName/fileHeader.fileType or the class of the module instance: module.__class__
Data in a chunk can be accessed by using the attribute name, e.g.
df.fileHeader.file_format
7
df[3].freqLimits # shorthand for df.data[3].freqLimits
[859.375, 960.9375]If you wish to see a summary of a chunk, use print:
print(df.fileHeader)
chunk(length=115, identifier=FILE_HEADER (-1), file_format=7, pamguard='PAMGUARDDATA', version='2.02.03', branch='CORE', dataDate=2009-03-27 17:00:00, analysisDate=2022-03-16 03:05:04.220000, startSample=0, moduleType='RW Edge Detector', moduleName='Right Whale Edge Detector', streamName='Edges', extraInfoLen=0)Stop reading here unless you wish to know low-level details about PyPAMGuard. Extending PyPAMGuard for new modules requires creating new subclasses of the GenericModule class. Before discussing the class, we present a high-level overview of the classes that are likely to be used while developing new module readers.
The class binary_reader.Reader provides an interface for reading from low-level binary files. In most cases, users need not ever call the reader directly, there are convenience methods in the Chunk class that allow the definition of field names and the type of data that is to be read. Reader is covered here to provide an introduction to Type instances.
Reader instances are created using a filename: Reader("path/to/file").
Optional arguments to the constructor let the user control the endianness of the data
(do not change from the default if you are not familiar with
word order, as the default is appropriate for PAMGuard binary files)
and the input/output buffer size. Larger buffer sizes can result in
faster reads.
Users do not typically need to create an instance of the reader class, as a single reader is typically used for reading PAMGuard data files and this is created in the DataFile class.
Reader supports a number of low-level interfaces, but the most critical methods to know are read_type and read_arrays. Both of these read data from a file based on Type instances.
The class binary_reader.field_type.Type defines how many bytes will be read from the file and how they will be interpreted. Type is constructed with the following arguments:
- data_type - What type of field
- str - String specified by length encoded in file
- strN - Fixed length string, length specified in param
- int - Integer, number of bits in param
- float - Floating point, number of bits in param
- fn - Custom function, function handle in param. Function must take a single parameter, the file handle. If additional arguments are needed, they can be wrapped in a closure. If you need to learn about enclosures, read this: https://dev.to/bshadmehr/understanding-closures-in-python-a-comprehensive-tutorial-11ld The section on stateful functions should be helpful, but for the impatient there are examples below.
- param - Value or handle that further specifies the field_type. Generally indicates number of bits associated with a number, number of characters to read, or specialized file to perform reading. See data_type for interpretation.
- post: Handle to a post-processing function that is called on the value after it is read.
Argument param can be omitted for any Type does not require a parameter (e.g., "str"). Argument post may be omitted if the value does not need postprocessing. Examples of post-processing include interpreting a field as a bitmap; see the description of the Bitmap class for an example.
To read a 32 bit integer from instance reader, one could do the following:
# Assumes Type has been imported from binary_reader.field_type
val = reader.read_type(Type("int", 32))A structure can be read using read_arrays where the structure is defined as a list of Type instances:
# Read a 32 byte integer and a 10 character string
# values[0] contains the int, values[1] the str
values = reader.read_arrays([Type("int", 32), Type("strN", 10)])
# Alternatively:
valint, valstr = reader.read_arrays([Type("int", 32), Type("strN", 10)])If multiple instances are to be read, the optional argument n can be specified:
# Read 10 int32 and 10 strings
# values[0] is an list of int, values[1] is a list of str
# If the number of bytes is known instead of the number of structs
# the bytes parameter may be used insteadd of n, but this requires
# fixed-size fields which Type("str") is not.
values = reader.read_arrays([Type("int", 32), Type("strN", 10)], n=10)CAVEAT - read_arrays only functions on fixed-length structures, e.g., Type("str") may not be used.
Post-processing allows for reinterpretation of a value. Examples include treating values as bitmaps, timestamps, or enumerated types. The simplest case is when the constructor just requires the value, such as the case with modules.datatypes.PGChunk where we would specify a Type of:
# PGChunk(value) creates an instance of an enumerated type
# Example: If -1 is in the file, PGChunk(-1) will return
# PGChunk.FILE_HEADER which will be stored in id.
id = reader.reat_type(Type("int", 32, post=PGChunk))In some cases, other arguments are required to the constructor and we can construct an anonymous function that takes a single argument and calls the function/class that we wish to use with some arguments already instantiated (a closure).
Example reading a timestamp:
# Assumes from binary_reader.field_type import PamGuardTime
# Convert a 64 bit value in nanoseconds to a datetime
dt = reader.read_type(Type("int", 64, lambda v: PamGuardTime(v, units="ns")))
# Note: If the units had been ms which is the default for PamGuardTime,
# we could have used: dt = reader.read_type(Type("int", 64, PamGuardTime)Example using a Bitmap. The Bitmap constructor requires a list of bit field names. Usually these carry some type of semantic information, such as bit position 8 (the ninth bit as we start from zero) might indicate the presence of a time delay measurement. In such cases a list of names would be provided. In other cases such as a bit map indicating which channels were used, bit positions work well and we can generate a list of bit positions by wrapping a range statement in a list constructor.
# Read a 32 bit bitmap
# Assumes from bitmap import Bitmap
size = 32
bm = reader.read_type(Type("int", param=size, post=lambda v Bitmap(v, list(range(size)))))Bitmap methods such as is_set(bitfield_name) will work on object bm. See the Bitmap class for more details.
Chunks are defined in modules.chunk.Chunk. Chunks are constructed with a Reader instance argument and have methods for managing fields. General information about Chunk instances was discussed in the Usage section of this document.
The Chunk class provides several methods that can be used to read data and builds on top of the Reader class. Like Reader, the Type constructors are used, but the Type instances are wrapped in instances of binary_reader.field_type.Field. The Field class takes two arguments:
- A field name
- A type
Example: Field("version", Type("str"))
While there are many methods in the Chunk class for reading, the two that are most likely to be needed are:
- read_fields(self, fields: list[Field]) - Read each of the fields in the list and set the chunk attributes.
- read_arrays(self, fields: list[Field], n: int = 1, size_bytes: int = None) - Read one or more fixed length structures. Results are stored as arrays associated with each Field name.
The DataFile class determines the module type while reading the file header. Specifically, in the block where we process chunk identifier PGChunk.FILE_HEADER, there is a dictionary that maps the moduleName in the file header to an instance of a module processor that must be derived from class modules.genericmodule.GenericModule.
Module classes must have a constructor that takes an instance of a DataFile as their sole argument. There are four methods used for reading data in module classes:
-
read_header() - Reads all header data past the initial chunk descriptor. All chunks start with the following fields:
- Field("length", Type("int", 32))
- Field("identifier", Type("int", 32, post=PGChunk))
As these fields will have already been read, to determine that this is a module header chunk, do not attempt to read them a second time. This is the case for the other module read methods as well.
-
read_data(pgchunk) - Given the top chunk for the PAMGuard file, read module specific data.
-
read_background(chunk) - Read background noise data
-
read_footer() - Read footer information past the initial chunk descriptor.
To create a new module:
- Subclass an instance fo GenericModule, see rightwhale.RightWhaleEdgeDetector for an example.
- Add the subclass to the processors dictionary in dataclass.py.
The entry must be of the following format:
processors = {
"Right Whale Edge Detector": RightWhaleEdgeDetector,
"new entry must match fileHeader.moduleName": YourNewModuleProcessor,
}