Storage file formats
When we sync data from source social networks into user storage (e.g. Dropbox), we need to store it so that it's usable by both computers and humans for the use cases we care about. Here are some proposals for those format(s), organized by end user use case.
Human-readable, nicely formatted. Interlinked and easy to navigate between related items. Ideally also render media attachments (images, video).
Proposal: HTML with source-specific styling, i.e. render Facebook posts like Facebook does, tweets like Twitter does, etc.
Alternatives:
- HTML with standardized, source-independent styling
- Plain text
- oEmbed or something similar? Didn't find much.
- Spreadsheet, e.g. Zoho, Excel, Google Docs
- Styled spreadsheet, e.g. Ning, Salesforce
- Screenshot images
- Other proprietary
HTML is by far the best bang for our buck. Source-specific formatting is familiar, builds trust since it looks like we "got it right", and might be easier to do. Generic formatting is consistent across source types, and isn't as susceptible to skew as the sources change their own designs over time. We should see if SocialSafe, ThinkUp, etc. have existing generic templates.
Source-specific styles can be full desktop/web experience, embedded, or archive. See examples at the bottom of this page.
Implementation detail: we can't store and serve a multi-file site (HTML, CSS, images, etc.) on Dropbox since it doesn't support long-lived URLs to raw files. Maybe other cloud storage providers do? One workaround is embedding images in the HTML file itself (e.g. w/CDATA or data: URIs). Definitely not ideal. Obviously not a problem on a local HDD though. Maybe just require the Dropbox client to download to the hard drive for the full experience?
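To make the embedded-image workaround concrete, here is a minimal sketch that inlines local images as base64 `data:` URIs so a page survives as a single HTML file. It uses `data:` URIs rather than CDATA, and the function name `inline_images` and the regex-based matching are assumptions for illustration; a real implementation would probably want a proper HTML parser.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Matches <img ... src="..."> and captures the prefix, the src value, and the closing quote.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")')


def inline_images(html: str, base_dir: Path) -> str:
    """Replace local <img src="..."> references with base64 data: URIs.

    Hypothetical helper: remote images, already-inlined images, and
    files that can't be found are left untouched.
    """
    def to_data_uri(match: re.Match) -> str:
        src = match.group(2)
        if src.startswith(("http://", "https://", "data:")):
            return match.group(0)
        path = base_dir / src
        if not path.is_file():
            return match.group(0)
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        encoded = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"{match.group(1)}data:{mime};base64,{encoded}{match.group(3)}"

    return IMG_SRC.sub(to_data_uri, html)
```

The tradeoff is file size: base64 adds roughly a third to each image, and video attachments are likely too large to inline this way.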
Machine-readable full data. Needs conversion but no data loss.
Proposal: ActivityStreams.
Alternatives:
- Atom
- Source-specific API data, all JSON. Facebook is Graph API, Twitter is JSON tweet objects and entities. G+ is extended ActivityStreams.
- Our own proprietary structured format, maybe in our own database schema.
- ...?
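As a rough illustration of the ActivityStreams proposal, the sketch below maps a Twitter API tweet object onto an ActivityStreams 1.0 style activity. The function name `tweet_to_activity` and the choice of which fields to map are assumptions, not a settled schema; it only covers the handful of fields shown.

```python
from datetime import datetime


def tweet_to_activity(tweet: dict) -> dict:
    """Convert a Twitter API tweet dict into an ActivityStreams 1.0 style activity.

    Hypothetical mapping for illustration; only a few tweet fields are handled.
    """
    user = tweet["user"]
    profile_url = f'https://twitter.com/{user["screen_name"]}'
    tweet_url = f'{profile_url}/status/{tweet["id_str"]}'
    # Twitter timestamps look like "Tue Feb 21 02:25:54 +0000 2012".
    published = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    return {
        "published": published.isoformat(),
        "verb": "post",
        "actor": {
            "objectType": "person",
            "id": profile_url,
            "displayName": user["name"],
            "url": profile_url,
        },
        "object": {
            "objectType": "note",
            "id": tweet_url,
            "url": tweet_url,
            "content": tweet["text"],
        },
        "provider": {"displayName": "Twitter", "url": "https://twitter.com/"},
    }
```

Any source field that isn't mapped here is dropped, so meeting the "no data loss" goal would mean either mapping every field or carrying the original source object alongside the activity in an extension property.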
UI: Need a sync status dashboard similar to the one for syncing from sources to storage. Otherwise rendering is live in the destination, so no need for our own format.
Source data is the same machine-readable format as for republishing elsewhere. Final destination is a database and/or data warehouse. Trivially shardable by user, with no aggregation or joins across users, so simple (eventually sharded) Postgres or MySQL is probably fine.
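Since there are no joins across users, routing a user to a shard can be a pure function of the user ID. Here is a minimal sketch, assuming string user IDs and a fixed shard count; `shard_for_user` is a hypothetical name, and the hash-modulo scheme is one simple option rather than a decided design.

```python
import hashlib


def shard_for_user(user_id: str, num_shards: int) -> int:
    """Map a user ID to a shard index in [0, num_shards).

    Uses SHA-256 instead of Python's built-in hash(), which is randomized
    per process for strings and so would not give stable routing.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

One caveat with plain modulo: changing the shard count moves most users to a different shard, so growing the cluster would need a migration step or a lookup table from user to shard.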
UI: lots of personal analytics examples. Investigate what SocialSafe, ThinkUp, Klout (ugh), other social dashboard products do.
Examples of source-specific styles (screenshots): Facebook web, Facebook archive, Twitter archive, Twitter web, Google archive, Google web.





