This repository was archived by the owner on Dec 2, 2025. It is now read-only.

Storage file formats

Ryan Barrett edited this page Sep 13, 2013 · 7 revisions

When we sync data from source social networks into user storage (e.g. Dropbox), we need to store it so that it's usable by both computers and humans for the use cases we care about. Here are some proposals for those format(s), organized by end user use case.

Sync (aka online backup)

Human-readable, nicely formatted. Interlinked and easy to navigate between related items. Ideally also render media attachments (images, video).

Proposal: HTML with source-specific styling, i.e. render Facebook posts like Facebook does, tweets like Twitter does, etc.

Alternatives:

  • HTML with standardized, source-independent styling
  • Plain text
  • oEmbed or something similar? Didn't find much.
  • Spreadsheet, e.g. Zoho, Excel, Google Docs
  • Styled spreadsheet, e.g. Ning, Salesforce
  • Screenshot images
  • Other proprietary

HTML is by far the best bang for our buck. Source-specific formatting is familiar, builds trust since it looks like we "got it right", and might be easier to do. Generic formatting is consistent across source types, and isn't as susceptible to skew over time as the sources change their own designs. We should see if SocialSafe, ThinkUp, etc. have existing generic templates.

Source-specific styles can follow the source's full desktop/web experience, its embedded view, or its archive export. See examples at the bottom of this page.

Implementation detail: we can't store and serve a multi-file site (HTML, CSS, images, etc.) on Dropbox, since it doesn't support long-lived URLs to raw files. Maybe other cloud storage providers do? One workaround is to embed images inline in the HTML (e.g. as base64 data: URIs). Definitely not ideal. This isn't a problem on a local hard drive, though. Maybe just require the Dropbox client to download to the hard drive for the full experience?
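The embedded-image workaround could look roughly like the sketch below. It assumes images are inlined as base64 data: URIs so that each post is a single self-contained HTML file. The function names and the page layout are hypothetical, not part of any existing code.

```python
import base64
import html
import mimetypes


def to_data_uri(data: bytes, filename: str) -> str:
    """Encode raw image bytes as a base64 data: URI."""
    # Guess the MIME type from the file extension; fall back to a generic type.
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def render_post(text: str, image: bytes, image_name: str) -> str:
    """Render one post as a single HTML file (hypothetical layout)."""
    src = to_data_uri(image, image_name)
    return (
        "<!DOCTYPE html>\n"
        "<html><body>\n"
        f"<p>{html.escape(text)}</p>\n"
        f'<img src="{src}" alt="{html.escape(image_name, quote=True)}">\n'
        "</body></html>\n"
    )
```

Base64 inflates each image by about a third, which is part of why this is not ideal for media-heavy accounts.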

Republish elsewhere

Machine-readable full data. Needs conversion but no data loss.

Proposal: ActivityStreams.
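As a rough illustration of what the conversion might involve, here is a minimal sketch that maps a simplified tweet to an ActivityStreams 1.0 JSON activity. The input dict shape, its field names, and the example.com identifiers are assumptions for illustration only; a real source API returns far more fields, and preserving all of them without loss would need extension properties.

```python
import json


def tweet_to_activity(tweet: dict) -> dict:
    """Map a simplified, hypothetical tweet dict to an ActivityStreams 1.0 activity."""
    return {
        "published": tweet["created_at"],  # expected to be an RFC 3339 timestamp
        "verb": "post",
        "actor": {
            "objectType": "person",
            "id": tweet["user_id"],
            "displayName": tweet["user_name"],
        },
        "object": {
            "objectType": "note",
            "id": tweet["id"],
            "url": tweet["url"],
            "content": tweet["text"],
        },
    }


# Hypothetical input; all values are placeholders.
example = {
    "id": "tag:example.com,2013:status/123",
    "url": "https://example.com/status/123",
    "text": "Hello world",
    "created_at": "2013-09-13T12:00:00Z",
    "user_id": "tag:example.com,2013:user/42",
    "user_name": "Example User",
}

activity = tweet_to_activity(example)
print(json.dumps(activity, indent=2))
```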

Alternatives:

UI: Need a sync status dashboard similar to the one for syncing from sources to storage. Otherwise rendering happens live in the destination, so there's no need for our own format.

Personal analytics

Source data is the same machine-readable format used for republishing elsewhere. Final destination is a database and/or data warehouse. The data is trivially shardable by user, with no aggregation or joins across users, so simple (eventually sharded) Postgres or MySQL is probably fine.
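Sharding by user could be as simple as the sketch below, which picks a shard from a stable hash of the user ID. The connection strings and function names are hypothetical, and hash-modulo is just one possible scheme; it moves most users to a different shard whenever the shard count changes, so a lookup table or consistent hashing may be preferable later.

```python
import hashlib

# Hypothetical connection strings, one per shard.
SHARDS = [
    "postgresql://db0.example.com/analytics",
    "postgresql://db1.example.com/analytics",
]


def shard_index(user_id: str, num_shards: int) -> int:
    """Return a stable shard number in [0, num_shards) for a user ID."""
    # Use SHA-256 rather than the built-in hash(), which is randomized
    # per process for strings and so would not be stable across runs.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_for_user(user_id: str) -> str:
    """Return the connection string of the shard holding this user's data."""
    return SHARDS[shard_index(user_id, len(SHARDS))]
```

Since every query is scoped to a single user, the application only ever needs to talk to the one shard returned by `shard_for_user`.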

UI: lots of personal analytics examples. Investigate what SocialSafe, ThinkUp, Klout (ugh), other social dashboard products do.

Appendix: current source styles

Facebook web:

facebook_web

Facebook archive:

facebook_archive

Twitter archive:

twitter_archive

Twitter web:

twitter_web

Google archive:

google_archive

Google web:

google_web
