This project works against the ChatStorage.sqlite database extracted from an iOS backup of WhatsApp. The behaviour described here is derived from the current implementation under Sources/SwiftWABackupAPI and is continuously verified by the local private regression suite maintained alongside the project.
The observations below come from two places:
- Source files under `Sources/SwiftWABackupAPI`, particularly `SwiftWABackupAPI.swift`, `Message.swift`, `MediaItem.swift`, and supporting helpers.
- A local private fixture database plus accompanying regression tests that exercise the API end-to-end.
When upgrading WhatsApp versions or altering the fixture, re-run the private regression suite and update this document with any schema or mapping changes you observe.
For an audit of which claims in this README are externally corroborated versus fixture-local, see ExternalValidationMatrix.md.
@lid is a WhatsApp identifier form seen in modern multi-device contexts. Public sources consistently describe it as a non-phone identifier, but they do not agree on a single authoritative expansion of the acronym. This project therefore treats LID as an opaque WhatsApp term and treats @lid identities as distinct from ordinary phone-number JIDs. When local client caches such as LID.sqlite are available, the runtime may sometimes resolve a @lid identity back to a phone number.
So, in the API model:
- `@s.whatsapp.net` means the participant is already identified by a phone-based JID.
- `@lid` means the participant is identified by a non-phone/private WhatsApp identity that may or may not be resolvable to a phone number from local client data.
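The distinction above can be sketched as a small classifier over the JID suffix. `JIDKind` and `classifyJID` are hypothetical names used only for illustration; they are not part of the package's API.

```swift
// Hypothetical type for illustration; not the package's own model.
enum JIDKind: Equatable {
    case phone(number: String)    // "<digits>@s.whatsapp.net"
    case lid(identifier: String)  // "<opaque id>@lid"
    case other                    // anything else, e.g. group JIDs
}

func classifyJID(_ jid: String) -> JIDKind {
    let phoneSuffix = "@s.whatsapp.net"
    let lidSuffix = "@lid"
    if jid.hasSuffix(phoneSuffix) {
        return .phone(number: String(jid.dropLast(phoneSuffix.count)))
    }
    if jid.hasSuffix(lidSuffix) {
        // The identifier stays opaque: no phone number is derived from it here.
        return .lid(identifier: String(jid.dropLast(lidSuffix.count)))
    }
    return .other
}
```

Resolving a `.lid` value back to a phone number would need a separate lookup (for example against `LID.sqlite`), which this sketch deliberately leaves out.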
The package now exposes two discovery layers:
- `getBackups()` is the legacy compatibility API. It classifies candidates only as `validBackups` or `invalidBackups`.
- `inspectBackups()` is the diagnostic API. It inspects `Status.plist`, `Manifest.db`, and, when available, `Manifest.plist` to report a structured `BackupDiscoveryInfo` with a `status`, optional `isEncrypted`, and `isReady`.
Current encryption detection is based on Manifest.plist["IsEncrypted"]:
- `status = ready` means WhatsApp data was found and `IsEncrypted == false`
- `status = encrypted` means WhatsApp data was found and `IsEncrypted == true`
- `status = encryptionStatusUnavailable` means WhatsApp data was found but `Manifest.plist` was missing, malformed, unreadable, or lacked `IsEncrypted`
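The three outcomes above reduce to a single lookup once WhatsApp data has already been found. A minimal sketch, assuming the plist has been parsed into a dictionary; `SketchStatus` and `sketchStatus(manifest:)` are hypothetical names, not the package's `BackupDiscoveryInfo` API.

```swift
// Hypothetical mirror of the three statuses described above.
enum SketchStatus: Equatable {
    case ready
    case encrypted
    case encryptionStatusUnavailable
}

/// `manifest` is the parsed Manifest.plist, or nil when the file was
/// missing, malformed, or unreadable. Assumes WhatsApp data was found.
func sketchStatus(manifest: [String: Any]?) -> SketchStatus {
    // A missing key or a non-boolean value both count as "unavailable".
    guard let isEncrypted = manifest?["IsEncrypted"] as? Bool else {
        return .encryptionStatusUnavailable
    }
    return isEncrypted ? .encrypted : .ready
}
```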
The chat and export APIs still assume the caller passes a backup that is ready
to use, typically by checking inspectBackups() first.
| Table | Purpose | Key Columns Used |
|---|---|---|
| `ZWAMESSAGE` | Stores WhatsApp message rows. | `Z_PK`, `ZCHATSESSION`, `ZMESSAGETYPE`, `ZTEXT`, `ZMEDIAITEM`, `ZPARENTMESSAGE`, `ZISFROMME`, `ZGROUPMEMBER`, `ZMESSAGEDATE`, `ZFROMJID`, `ZTOJID` |
| `ZWACHATSESSION` | Metadata for each chat thread. | `Z_PK`, `ZCONTACTJID`, `ZPARTNERNAME`, `ZLASTMESSAGEDATE`, `ZMESSAGECOUNTER`, `ZSESSIONTYPE`, `ZARCHIVED` |
| `ZWAMEDIAITEM` | Metadata for media attached to messages. | `Z_PK`, `ZMEDIALOCALPATH`, `ZTITLE`, `ZMOVIEDURATION`, `ZLATITUDE`, `ZLONGITUDE`, `ZMETADATA` |
| `ZWAGROUPMEMBER` | Group participant roster used to resolve sender info. | `Z_PK`, `ZMEMBERJID`, `ZCONTACTNAME` |
| `ZWAMESSAGEINFO` | Reaction payloads (`ZRECEIPTINFO`). | `ZMESSAGE`, `ZRECEIPTINFO` |
All schema checks live in DatabaseHelpers.swift and DatabaseProtocols.swift; each model declares the minimal column set that the package expects to find.
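A minimal-column check of this kind can be sketched as a set difference. `missingColumns(expected:present:)` is a hypothetical helper, not the code in `DatabaseHelpers.swift`; it assumes the caller has already collected the table's actual column names (for example from SQLite's `PRAGMA table_info`).

```swift
// Hypothetical helper: report which expected columns are absent,
// preserving the order in which they were declared.
func missingColumns(expected: [String], present: Set<String>) -> [String] {
    expected.filter { !present.contains($0) }
}

// Example with the ZWAGROUPMEMBER column set from the table above.
let groupMemberColumns = ["Z_PK", "ZMEMBERJID", "ZCONTACTNAME"]
let missing = missingColumns(expected: groupMemberColumns,
                             present: ["Z_PK", "ZMEMBERJID", "ZISADMIN"])
// `missing` now holds the columns a schema check would need to report.
```

Extra columns in the database (such as the made-up `ZISADMIN` above) are ignored, which matches the idea of declaring only the minimal set the package depends on.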
The public API maps ZMESSAGETYPE into the following supported message families:
| Code | Description | Notes |
|---|---|---|
| 0 | Text | Plain messages; text preserved in ZTEXT. |
| 1 | Image | Copies media to disk when requested and exposes filename. |
| 2 | Video | Adds filename and duration (ZMOVIEDURATION). |
| 3 | Audio | Adds filename and duration. |
| 4 | Contact | Classified as Contact, but the current runtime does not expose a validated structured vCard payload. |
| 5 | Location | Emits latitude/longitude (ZLATITUDE, ZLONGITUDE). |
| 7 | Link | Keeps URL text and optional caption. |
| 8 | Document | Exposes original file name and caption. |
| 11 | GIF | Validated against WhatsApp Web examples as GIF-style media; stored as MP4 in the backup. |
| 15 | Sticker | Validated against WhatsApp Web examples as sticker-style media; typically returns a .webp filename. |
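The code-to-family mapping in this table can be expressed as an integer-backed enum. This is only a sketch: `SketchMessageType` is a hypothetical name, and the package's own type may be shaped differently.

```swift
// Hypothetical enum mirroring the supported ZMESSAGETYPE codes listed above.
enum SketchMessageType: Int {
    case text = 0
    case image = 1
    case video = 2
    case audio = 3
    case contact = 4
    case location = 5
    case link = 7
    case document = 8
    case gif = 11
    case sticker = 15
}

/// Returns nil for any code outside the supported families.
func family(forCode code: Int) -> SketchMessageType? {
    SketchMessageType(rawValue: code)
}
```

Returning `nil` for unlisted codes keeps unsupported rows distinguishable from supported ones instead of forcing them into a default family.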
This table focuses on what the current implementation actually validates and exports, not on hypotheses from older reverse-engineering notes.
| Type | Primary discriminator | Extra fields consulted | Current API output | Notes / open questions |
|---|---|---|---|---|
| Text | `ZMESSAGETYPE = 0` | `ZTEXT` | `message` text plus cross-cutting fields such as `author`, `replyTo`, and `reactions` | Straightforward case. |
| Image | `ZMESSAGETYPE = 1` | `ZMEDIAITEM`, `ZWAMEDIAITEM.ZMEDIALOCALPATH`, `ZWAMEDIAITEM.ZTITLE` | `mediaFilename`, optional `caption` | Media copy depends on the backup manifest lookup succeeding. |
| Video | `ZMESSAGETYPE = 2` | Image fields plus `ZWAMEDIAITEM.ZMOVIEDURATION` | `mediaFilename`, optional `caption`, optional `seconds` | Duration is only surfaced for Video and Audio. |
| Audio | `ZMESSAGETYPE = 3` | `ZMEDIAITEM`, `ZWAMEDIAITEM.ZMEDIALOCALPATH`, `ZWAMEDIAITEM.ZMOVIEDURATION` | `mediaFilename`, optional `seconds` | Audio captions are rare, but `caption` may still be populated from `ZTITLE` if present. |
| Contact | `ZMESSAGETYPE = 4` | `ZTEXT`, optional `ZMEDIAITEM` if present | Generic `MessageInfo` with `messageType = "Contact"` | No validated structured contact payload is currently exposed. |
| Location | `ZMESSAGETYPE = 5` | `ZMEDIAITEM`, `ZWAMEDIAITEM.ZLATITUDE`, `ZWAMEDIAITEM.ZLONGITUDE` | `latitude`, `longitude`, optional media/caption fields | Missing coordinates remain `nil`, so the API does not silently turn absent data into `0.0, 0.0`. |
| Link | `ZMESSAGETYPE = 7` | Primarily `ZTEXT`; optional `ZMEDIAITEM` / `ZTITLE` | Link text in `message`, optional `caption` | URL, preview metadata, and preview image are not modeled separately. |
| Document | `ZMESSAGETYPE = 8` | `ZMEDIAITEM`, `ZWAMEDIAITEM.ZMEDIALOCALPATH`, `ZWAMEDIAITEM.ZTITLE` | `mediaFilename`, optional `caption` | MIME type and document metadata are not currently surfaced. |
| GIF | `ZMESSAGETYPE = 11` | Same media fields as Video | `mediaFilename`, optional `caption` | Validated against WhatsApp Web examples as GIF-style media. Stored like media, but no duration is currently exposed for GIFs. |
| Sticker | `ZMESSAGETYPE = 15` | `ZMEDIAITEM`, `ZWAMEDIAITEM.ZMEDIALOCALPATH` | `mediaFilename` | Validated against WhatsApp Web examples as sticker-style media. Sticker-specific metadata is not modeled; output is essentially filename + common fields. |
Cross-cutting enrichments that may apply to many rows regardless of their type:
- `author` combines `ZISFROMME`, `ZGROUPMEMBER`, `ZFROMJID`, `ZWAGROUPMEMBER`, `ZWACHATSESSION`, `ZWAPROFILEPUSHNAME`, and, when available, the WhatsApp `LID.sqlite` account cache.
- `replyTo` is resolved from `ZWAMESSAGE.ZPARENTMESSAGE` when present, otherwise from `ZMEDIAITEM` plus the binary blob stored in `ZWAMEDIAITEM.ZMETADATA`.
- `reactions` come from `ZWAMESSAGEINFO.ZRECEIPTINFO`.
- `mediaFilename` always requires a second lookup into the iTunes backup manifest to resolve the hashed file path.
When building MessageInfo, the API resolves a single structured participant identity into author.
author is used only for rows treated as real user-authored messages:
- Outgoing messages (`ZISFROMME = 1`) – Exposed as `MessageAuthor(kind: .me, displayName: "Me", source: .owner)`.
- Group chats – `ZWAMESSAGE.ZGROUPMEMBER` is used first:
  - `ZWAGROUPMEMBER.ZMEMBERJID` provides the strongest participant identifier.
  - The current runtime resolves the participant label using this exact order:
    1. a non-phone-like direct-chat/session label from `ZWACHATSESSION.ZPARTNERNAME`
    2. an address-book contact from `ContactsV2.sqlite`
    3. a `LID.sqlite` account match
    4. a linked phone JID plus WhatsApp push-name label
    5. a WhatsApp push name from `ZWAPROFILEPUSHNAME`
    6. a phone-like `ZWACHATSESSION.ZPARTNERNAME`
    7. `ZWAGROUPMEMBER.ZCONTACTNAME` as the last fallback
  - This is intentionally quality-aware rather than a blind table order:
    - a human-friendly saved/direct-chat label is preferred when it is a real name
    - a WhatsApp-only push name is preferred over phone-only fallback labels and is surfaced with the familiar `~` prefix
    - phone-only labels from `ZWACHATSESSION` or `ZWAGROUPMEMBER` are treated as fallback, not as better labels than a human-readable push name
    - `phone` is exposed when the runtime can resolve a real phone confidently from the address book, a linked phone JID, or WhatsApp's `LID.sqlite`; ambiguous `@lid` identities still keep the visible name but leave `phone` unset
  - The runtime strips bidi control characters from display labels so values such as `Tú` are exposed cleanly.
  - If `ZGROUPMEMBER` is missing, the runtime falls back to `ZWAMESSAGE.ZFROMJID`.
- Individual chats – `ZCHATSESSION` points to `ZWACHATSESSION`, which supplies the participant JID and display name for incoming messages.
This behaviour lives in WABackup+Messages.swift (resolveParticipantIdentity, makeParticipantAuthor) and is covered by the invariant and regression suites.
The current display-name strategy has been validated with WhatsApp Web:
- Human-friendly saved/direct-chat labels are preferred over weaker alternatives when they exist.
- Human-readable push names can outrank phone-only fallback labels for group-message authors.
- Unsaved group participants can appear as `~Name` with secondary phone text, which matches the current `pushName`, `pushNamePhoneJid`, and `lidAccount` strategy.
- Saved-contact cases can appear as bare human names with no visible phone on the label, which matches the current `addressBook` and human-friendly `chatSession` branches.
- Direct/self-chat UI values such as `ZWACHATSESSION.ZPARTNERNAME = '\u200eTú'` are rendered without exposing the bidi control character.
- Some later internal branches remain UI-indistinguishable in practice, so WhatsApp Web validates the visible precedence decisions above without proving every internal branch of the total runtime order.
Replies are resolved from a direct parent reference when one exists, and otherwise from media metadata:
- If `ZWAMESSAGE.ZPARENTMESSAGE` is populated, the runtime uses it directly as the replied-to message ID.
- Otherwise, `ZWAMESSAGE.ZMEDIAITEM` may reference a `ZWAMEDIAITEM` row whose `ZMETADATA` holds a protobuf-style blob. Messages without either source keep `replyTo = nil`.
- `MediaItem.extractReplyStanzaId()` parses the modern top-level protobuf field that carries the quoted message stanza ID.
- `WABackup.fetchReplyMessageId` uses `Message.fetchMessageId(byStanzaId:)` to locate the original `ZWAMESSAGE.Z_PK`.
- If found, `MessageInfo.replyTo` contains the target message ID; otherwise it remains `nil`.
SwiftWABackupAPITests.testMessageContentExtraction exercises this behaviour, and the current implementation has also been checked against WhatsApp Web. It resolves modern quoted replies that are visibly rendered there through ZPARENTMESSAGE and modern protobuf-style metadata.
- Reactions live in `ZWAMESSAGEINFO.ZRECEIPTINFO` as binary blobs. Entries only exist for messages that received reactions.
- `ReactionParser` now walks the nested protobuf-style receipt entries inside `ZRECEIPTINFO`, extracting the reacting JID and emoji from structured length-delimited fields instead of scanning the blob byte-by-byte.
- The runtime now emits reactions only when that structured metadata identifies both an emoji and a reacting participant JID. Ambiguous legacy blobs without a resolvable participant are ignored rather than guessed.
- `WABackup.fetchReactions` resolves the reacting participant using the same identity sources already used elsewhere in the API, including direct-chat data, address-book data, WhatsApp push names, and `LID.sqlite` for `@lid` identities.
- Parsed reactions become `[Reaction]` values with an `emoji` and a structured `author`, attached to `MessageInfo.reactions`.
- The validated WhatsApp Web examples now line up with the current visible reaction behavior on the checked messages, including emoji plus the reacting participant's label and phone when the web shows one.
- `SwiftWABackupAPITests.testMessageContentExtraction` exercises messages with and without reactions to confirm the parser output stays stable.
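The "emit only when both parts are present" rule can be sketched as a filter over parsed entries. `ParsedReceiptEntry` and `SketchReaction` are hypothetical stand-ins for whatever `ReactionParser` produces internally; the package's `Reaction` type carries a structured `author` rather than a raw JID.

```swift
// Hypothetical intermediate value produced by a receipt-entry parser.
struct ParsedReceiptEntry {
    var emoji: String?
    var senderJID: String?
}

// Hypothetical simplified reaction; not the package's Reaction type.
struct SketchReaction: Equatable {
    let emoji: String
    let senderJID: String
}

/// Keeps only entries that identify both an emoji and a reacting JID.
func reactions(from entries: [ParsedReceiptEntry]) -> [SketchReaction] {
    entries.compactMap { entry -> SketchReaction? in
        guard let emoji = entry.emoji, !emoji.isEmpty,
              let jid = entry.senderJID, !jid.isEmpty else {
            return nil  // ambiguous entry: dropped rather than guessed
        }
        return SketchReaction(emoji: emoji, senderJID: jid)
    }
}
```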
- Media files referenced in `ZWAMESSAGE` are stored in the iTunes backup under hashed paths. `IPhoneBackup.fetchWAFileHash` queries `Manifest.db` (`domain = 'AppDomainGroup-group.net.whatsapp.WhatsApp.shared'`) to translate a relative path such as `Media/345.../file.jpg` into the hash used on disk.
- `MediaCopier` then copies the hashed file from `<backup>/<hash-prefix>/<hash>` to a caller-specified directory, renaming it to the original filename. Missing files raise `BackupError.fileCopy` so callers can handle partial exports gracefully.
- Location messages reuse the same mechanism while also surfacing `ZLATITUDE`/`ZLONGITUDE`; video/audio messages add `ZMOVIEDURATION` as `seconds`.
- Chat/contact avatars live in `Media/Profile/<identifier>-<timestamp>.{jpg,thumb}`. `fetchChatPhotoFilename` looks up the newest file via `FileUtils.latestFile` and copies it to the destination directory as `chat_<chatId>.ext`.
- Contact exports (`copyContactMedia`) follow the same pattern, naming files after the contact phone number. If no entry is found, the photo filename remains `nil` and the API logs a debug message.
The library surfaces granular error enums so consumers can react appropriately:
- `BackupError` – issues while scanning or copying from the iTunes backup (e.g. missing `Manifest.db`, copy failure).
- `DatabaseErrorWA` – database connection problems, unexpected schemas, or missing rows.
- `DomainError` – higher-level logic errors (media not found, unsupported message types).
These errors are thrown from API entry points (getBackups, inspectBackups,
connectChatStorageDb, getChat, etc.) and are covered by the happy-path
tests; you can trigger them manually by corrupting the fixture or requesting
unsupported resources.
Key tests that exercise the database assumptions:
- `testGetChats` – Validates counts of active/archived sessions read from `ZWACHATSESSION`.
- `testChatMessages` – Iterates every chat, asserting message totals per type and confirming that `MessageInfo` mirrors `ZWAMESSAGE` counters.
- `testMessageContentExtraction` – Spot-checks individual messages (text, link, document, and replies/reactions) to confirm sender resolution, reply chains, filenames, and reaction handling.
- `testChatContacts` – Validates aggregate contact counts and profile media lookups against the fixture.