Amazon Cloud Drive (ACD) is a personal cloud storage service from Amazon. The scripts in this project wrap several command-line tools to make uploading large folders easy.
- Automatically compress into volumes, encrypt, test, add recovery files, and upload to a given folder, saving the directory structure, password and other metadata to a PostgreSQL database.
- Send email notifications for errors and completion.
- Optional auto or manual step selections for interruption recovery.
Name | Language | Functions
----------- | ---------- | ------------
packer.sh | GNU Bash | Main program
savetree.py | Python 3 | Convert XML directory tree to adjacency list
db-init.sql | PostgreSQL | Create database
Apart from GNU Core Utilities, we need:
| Software | Purpose |
|---|---|
| tree | Extract directory tree |
| p7zip | Compression and encryption |
| par2 | Generate recovery files |
| heirloom-mailx | Send emails |
| xmlstarlet | Parse xml files |
| tidy | Check and correct invalid xml files |
| psql | Access database |
| acd_cli | Upload to ACD |
acd_cli is a separate GitHub project; the others are free software.
I'm on Debian Jessie. Use your distribution's package manager where appropriate. Configuration files may reside elsewhere.
To replace password authentication for the database, set up certificate authentication. First generate a root CA:

```shell
openssl genrsa -out rootCA.key 2048
openssl req -x509 -new -nodes -key rootCA.key -days 1024 -out rootCA.pem
```

Then generate a key and a signed certificate for each device, changing "device" to a corresponding name such as "client" or "server". Fill the "Common Name (FQDN)" field with the username for a client certificate, and with the domain name or IP address for a server certificate.

```shell
openssl genrsa -out device.key 2048
openssl req -new -key device.key -out device.csr
openssl x509 -req -in device.csr -CA rootCA.pem -CAkey rootCA.key -CAcreateserial -out device.crt -days 500
```
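The whole sequence can also be run non-interactively for a quick sanity check; the `-subj` values below are placeholders (substitute your own CN as described above), and `openssl verify` confirms that the device certificate chains back to the CA:

```shell
# Non-interactive version of the commands above; the -subj values
# are placeholders -- substitute your own CN for client/server certs.
openssl genrsa -out rootCA.key 2048
openssl req -x509 -new -nodes -key rootCA.key -days 1024 -out rootCA.pem -subj "/CN=Packer Root CA"
openssl genrsa -out device.key 2048
openssl req -new -key device.key -out device.csr -subj "/CN=packer"
openssl x509 -req -in device.csr -CA rootCA.pem -CAkey rootCA.key -CAcreateserial -out device.crt -days 500
# Check that the signed certificate chains back to the root CA:
openssl verify -CAfile rootCA.pem device.crt
```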
```shell
sudo aptitude install postgresql
sudo su - postgres
psql -f db-init.sql
```
Specify the server cert and root cert in /etc/postgresql/9.4/main/postgresql.conf:

```
ssl_cert_file = '/etc/postgresql-common/server.crt'
ssl_key_file = '/etc/postgresql-common/server.key'
ssl_ca_file = '/etc/postgresql-common/root.pem'
```
Force SSL on TCP connections in /etc/postgresql/9.4/main/pg_hba.conf:

```
# IPv4 local connections:
hostssl all all 127.0.0.1/32 cert clientcert=1
# IPv6 local connections:
hostssl all all ::1/128 cert clientcert=1
```
This feature keeps a log of certain SQL transactions. See here for details. To use it:
1. Download audit.sql.

2. This feature requires the `hstore` data type. Install the additional PostgreSQL modules:

   ```shell
   sudo aptitude install postgresql-contrib
   ```

3. Execute the script as user postgres on the target database acd:

   ```shell
   psql -f audit.sql acd
   ```

4. Enable auditing for a table:

   ```sql
   SELECT audit.audit_table('table name');
   ```

5. Read the logs:

   ```sql
   SELECT * FROM audit.logged_actions;
   ```
```shell
sudo aptitude install tree p7zip-full par2 mailx xmlstarlet tidy postgresql-client python3 python3-lxml python3-pip
sudo pip3 install --upgrade git+https://github.com/yadayada/acd_cli.git
```
Place the client certificate, key and root certificate at:

```
/etc/postgresql-common/postgresql.crt
/etc/postgresql-common/postgresql.key
/etc/postgresql-common/root.pem
```
Define a connection service so we can simply pass a service name to psql. In /etc/postgresql-common/pg_service.conf:

```
[dsn1]
dbname=acd
user=packer
host=127.0.0.1
port=5432
connect_timeout=10
client_encoding=utf8
sslmode=verify-full
sslcert=/etc/postgresql-common/postgresql.crt
sslkey=/etc/postgresql-common/postgresql.key
sslrootcert=/etc/postgresql-common/root.pem
```
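The service file doesn't have to live under /etc; libpq also honours the `PGSERVICEFILE` environment variable, so a per-user copy works too. A sketch (the `~/.pg` location is arbitrary, and the entries mirror the file above):

```shell
# Keep a per-user service file and point libpq at it via PGSERVICEFILE;
# the ~/.pg location is an arbitrary choice.
mkdir -p "$HOME/.pg"
cat > "$HOME/.pg/pg_service.conf" <<'EOF'
[dsn1]
dbname=acd
user=packer
host=127.0.0.1
port=5432
sslmode=verify-full
sslcert=/etc/postgresql-common/postgresql.crt
sslkey=/etc/postgresql-common/postgresql.key
sslrootcert=/etc/postgresql-common/root.pem
EOF
export PGSERVICEFILE="$HOME/.pg/pg_service.conf"
# With the file in place, psql only needs the service name:
# psql "service=dsn1"
```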
If the scripts run on a different machine (the client) from the database server (the server), we have the options of a direct connection, a VPN, SSH tunnels and more. Here's how to set up SSH tunnels.
Direct tunnel, from the client's end:

```shell
ssh -N -L 5432:127.0.0.1:5432 user@server
```

Or from the server's end:

```shell
ssh -N -R 5432:127.0.0.1:5432 user@client
```

Through a third host, from the client's end:

```shell
ssh -N -o "ProxyCommand ssh -W %h:%p user@thirdhost" -L 5432:127.0.0.1:5432 user@server
```

Or from the server's end:

```shell
ssh -N -o "ProxyCommand ssh -W %h:%p user@thirdhost" -R 5432:127.0.0.1:5432 user@client
```
- Use autossh to keep the tunnel alive: `-f` falls into the background; `-M` sets the autossh monitor port; `-N` does not execute remote commands (tunnel only); the environment variable `AUTOSSH_POLL` sets the monitoring interval in seconds.

  ```shell
  AUTOSSH_POLL=30 autossh -M 12340 -f -N -o "ProxyCommand ssh -W %h:%p user@thirdhost" -L 5432:127.0.0.1:5432 user@server
  ```

- Set the sshd options ClientAliveInterval and ClientAliveCountMax on the server.

- Set the ssh options ServerAliveInterval and ServerAliveCountMax on the client.
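To keep the tunnel up across reboots, one option (a sketch, not part of the original setup; the unit name, user and hosts are assumptions) is a systemd unit wrapping autossh:

```ini
# /etc/systemd/system/acd-tunnel.service -- path and names are assumptions
[Unit]
Description=autossh tunnel to the PostgreSQL server
After=network-online.target

[Service]
Environment=AUTOSSH_POLL=30
# -M 0 disables the monitor port in favour of ssh's own keepalives
ExecStart=/usr/bin/autossh -M 0 -N -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" -L 5432:127.0.0.1:5432 user@server
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```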
Modify send_email.sh to set the subject, SMTP server and recipient.
The default setting is hours=8, i.e. UTC+8. Change it to match your time zone.
Change `work_path="$HOME"` at line 103 of packer.sh. No trailing slash is needed.
Save all .sh files and savetree.py in a convenient place outside the target directory, such as your home directory. Make sure the working directory's partition has enough free space for the archive and recovery files.
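A quick pre-flight check of that free space might look like this (a sketch; `work_path` mirrors the variable in packer.sh):

```shell
# Report available space on the partition holding the working
# directory; work_path mirrors the variable in packer.sh.
work_path="$HOME"
avail_kib=$(df --output=avail -k "$work_path" | tail -n 1 | tr -d ' ')
echo "Available on ${work_path}: ${avail_kib} KiB"
```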
Run:

```shell
packer.sh directory_name
```

There are 4 steps: pack, test, par2 and upload. To resume manually from a given step, e.g. from testing:

```shell
packer.sh directory_name continue-from-test
```

This function reads the "status" field from the database, so it's rather weak. See known issues.

To recover from interruptions automatically:

```shell
packer.sh directory_name auto-recover
```

This option is aimed at batch processing (by invoking this script), so temporary network issues don't cause unnecessary interruptions.

To skip the database check:

```shell
packer.sh directory_name skip-db-check
```

The options can be combined:

```shell
packer.sh directory_name auto-recover skip-db-check
```
batch.sh packs all immediate subdirectories in the current directory individually.
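The core of such a driver is presumably a loop like the following (a sketch, not the actual batch.sh; `echo` stands in for the real invocation so it can be dry-run safely):

```shell
#!/bin/bash
# Sketch of a batch driver: run packer.sh on every immediate
# subdirectory of the current directory. echo stands in for the
# real call so this can be dry-run safely.
shopt -s nullglob
for dir in */ ; do
    dir="${dir%/}"
    echo packer.sh "$dir" auto-recover skip-db-check
done
```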
This should be a separate feature, but we'll use psql for now.
```shell
psql acd
```

Then, inside psql:

```sql
SELECT * FROM summary;
SELECT descr, status FROM summary;
```
Using Common Table Expressions (CTE):

```sql
WITH RECURSIVE tree AS (
    SELECT node_id, name, parent_id, ARRAY[name] AS item_array
    FROM dir_tree WHERE name = 'Archive Number'
    UNION ALL
    SELECT dir_tree.node_id, dir_tree.name, dir_tree.parent_id, tree.item_array || dir_tree.name AS item_array
    FROM dir_tree JOIN tree ON (dir_tree.parent_id = tree.node_id)
)
SELECT node_id, array_to_string(item_array, '->') FROM tree ORDER BY item_array;
```
Or we can use this:

```sql
WITH RECURSIVE tree AS (
    SELECT node_id, ARRAY[]::BIGINT[] AS ancestors FROM dir_tree WHERE name = 'Archive Number'
    UNION ALL
    SELECT dir_tree.node_id, tree.ancestors || dir_tree.parent_id
    FROM dir_tree, tree
    WHERE dir_tree.parent_id = tree.node_id
)
SELECT * FROM tree INNER JOIN dir_tree USING (node_id) ORDER BY node_id ASC;
```
Reading only the "status" field from the database makes it impossible to detect database access failures, user interruptions, programming errors, insufficient disk space, hardware failures and so on, so don't rely on it.
Auto-recovery currently just retries endlessly; it should account for errors other than network failures.
- Separate configuration from the code.
- Implement a user interface for database look-ups.