Modules
Plowshare is designed with modularity in mind, so it should be easy for other programmers to add new modules. Study the code of any of the existing modules (e.g. 2shared) and create your own.
Some hosters export a public API (a formalized way of downloading or uploading). If one is available, calling it can save you lots of time compared to simulating a web browser. For example: HotFile.
Table of contents:
- Script template
- Downloading function
- Uploading function
- Deleting function
- Listing function
- Probing function
- Output debug messages (stderr)
- curl API
- Auxiliary APIs
- Module command-line switches
- Coding rules
- Coding style
- Testing
- External documentation
Each module implements services for one sharing site:
- anonymous download
- free/premium account download
- anonymous upload (if allowed from host)
- free/premium account upload
- free/premium account remote upload (if available from host)
- delete or kill url (anonymous or not)
- shared folder (and sub-folders) list (if available from host)
The module must declare the following global variables:
MODULE_XXX_REGEXP_URL
Depending on module features, some additional variables should also be declared:
MODULE_XXX_DOWNLOAD_OPTIONS
MODULE_XXX_DOWNLOAD_RESUME
MODULE_XXX_DOWNLOAD_FINAL_LINK_NEEDS_COOKIE
MODULE_XXX_DOWNLOAD_SUCCESSIVE_INTERVAL
# Rarely used: gives additional curl options
MODULE_XXX_DOWNLOAD_FINAL_LINK_NEEDS_EXTRA=()
MODULE_XXX_UPLOAD_OPTIONS
MODULE_XXX_UPLOAD_REMOTE_SUPPORT
MODULE_XXX_DELETE_OPTIONS
MODULE_XXX_LIST_OPTIONS
MODULE_XXX_LIST_HAS_SUBFOLDERS
MODULE_XXX_PROBE_OPTIONS
Where XXX is the name of the module (uppercase). No other global variable declaration is allowed.
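For illustration, the top of a hypothetical module could look like this (the module name, URL regexp and values below are made up; existing modules in src/modules/ are the authoritative reference):

```shell
# Hypothetical module "myhost" -- every name and value here is a placeholder.
MODULE_MYHOST_REGEXP_URL='https\?://\(www\.\)\?myhost\.example/'

MODULE_MYHOST_DOWNLOAD_OPTIONS=""
MODULE_MYHOST_DOWNLOAD_RESUME=yes
MODULE_MYHOST_DOWNLOAD_FINAL_LINK_NEEDS_COOKIE=no
MODULE_MYHOST_DOWNLOAD_SUCCESSIVE_INTERVAL=

MODULE_MYHOST_UPLOAD_OPTIONS=""
MODULE_MYHOST_UPLOAD_REMOTE_SUPPORT=no

MODULE_MYHOST_PROBE_OPTIONS=""
```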
A module must export one to five entry points:
xxx_download(), xxx_upload(), xxx_delete(), xxx_list(), xxx_probe()
Prototype is:
xxx_download() {
local -r COOKIE_FILE=$1
local -r URL=$2
...
}
Notes:
- xxx is the name of the plugin: src/modules/xxx.sh. xxx must not contain dots; use underscores instead.
- Never call the curl_with_log function here; use curl.
Arguments:
- $1: cookie file (empty at start, use it with curl)
- $2: URL string (for example http://x7.to/fwupja)
Warning: Even if the function does not need a cookie file, do not delete the cookie file provided as argument; plowdown will take care of this.
When a link is correct, the function should return 0 and echo one or two lines, corresponding to the file URL and filename:
echo "$FILE_URL"
echo "$FILENAME"
$FILENAME can be empty, or even not echoed at all. If so, plowdown will guess the filename from the provided $FILE_URL.
If a cookie file is required for the final download, MODULE_XXX_DOWNLOAD_FINAL_LINK_NEEDS_COOKIE must be set to yes.
The file URL must be the final link (that is, a link that returns a 200 HTTP code, without redirection). Use curl -I and grep_http_header_location when necessary.
Note: $FILE_URL will be encoded right after, so don't worry about weird characters. For example: space characters will be translated to %20 for you.
Module can return the following codes:
- 0: Everything is OK (arguments have to be echoed, see above).
- $ERR_FATAL: Unexpected result (upstream site updated, etc.).
- $ERR_LOGIN_FAILED: A correct login/password argument is required.
- $ERR_LINK_TEMP_UNAVAILABLE: Link alive but temporarily unavailable.
- $ERR_LINK_PASSWORD_REQUIRED: Link alive but requires a password (password-protected link).
- $ERR_LINK_NEED_PERMISSIONS: Link alive but requires some authentication (private or premium link).
- $ERR_LINK_DEAD: Link is dead (we must be sure of that). Each download function should return this value in at least one code path.
- $ERR_SIZE_LIMIT_EXCEEDED: Can't download the link because the file is too big (needs permissions, probably premium).
- $ERR_EXPIRED_SESSION: When cache is used. See storage_get, storage_set and storage_reset.
Additional error codes (returned by plowdown only, module download function should not return these):
- $ERR_NOMODULE: No module available for the provided link. Hoster is not supported yet!
- $ERR_NETWORK: Specific network error (socket reset, curl, etc.).
- $ERR_SYSTEM: System failure (missing executable, local filesystem, wrong behavior, etc.).
- $ERR_CAPTCHA: Captcha solving failure.
- $ERR_MAX_WAIT_REACHED: Countdown timeout (see -t/--timeout command-line option).
- $ERR_MAX_TRIES_REACHED: Max tries reached (see -r/--max-retries command-line option).
- $ERR_BAD_COMMAND_LINE: Unknown command-line parameter or incompatible options.
- If the hoster asks to try again later (and you don't know how long to wait): the download function must return $ERR_LINK_TEMP_UNAVAILABLE.
- If the hoster asks to try again later (and you do know how long to wait): the download function must echo the wait time (in seconds) and return $ERR_LINK_TEMP_UNAVAILABLE.
- Respect wait times even if the download seems to work without them. Don't hammer the website!
- Try to force English language on the website (usually using a cookie) if you are going to parse human-readable messages (it's better to parse HTML nodes, though).
- If you provide premium download, a bad login must lead to an error ($ERR_LOGIN_FAILED). No fallback to anonymous download must be made (even if the remote website accepts it).
- The MODULE_XXX_DOWNLOAD_SUCCESSIVE_INTERVAL global variable contains the delay value (in seconds) used between two successive downloads (links of the same hoster). Some hosters may behave nastily (force the user to wait, declare a link as dead, or sometimes worse) when a bunch of links is downloaded in succession.
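The wait-time convention above can be sketched as follows (hypothetical module; the page content and the numeric error code are stand-ins, real values come from plowshare's core.sh):

```shell
# Hedged sketch of the "echo the delay, return temp-unavailable" convention.
# The error code value (10) is a stand-in, as is the hardcoded page content.
ERR_LINK_TEMP_UNAVAILABLE=10

myhost_download() {
    # Pretend this was fetched with curl and forced to English.
    local PAGE='You must wait 120 seconds between downloads'
    local RE='wait ([0-9]+) seconds'

    if [[ $PAGE =~ $RE ]]; then
        # Echo the delay (in seconds), then return the dedicated error code;
        # plowdown takes care of waiting and retrying.
        echo "${BASH_REMATCH[1]}"
        return "$ERR_LINK_TEMP_UNAVAILABLE"
    fi
}
```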
Prototype is:
xxx_upload() {
local -r COOKIE_FILE=$1
local -r FILE=$2
local -r DESTFILE=$3
...
PAGE=$(curl_with_log ...) || return
...
}
Notes:
- xxx is the name of the plugin: src/modules/xxx.sh. xxx must not contain dots; use underscores instead.
- Use the curl_with_log function only once, for the file upload itself (it's quite convenient to see progress); otherwise simply use curl.
Arguments:
- $1: cookie file (empty at start, use it with curl)
- $2: local filename (with full path) to upload, or (remote) URL
- $3: remote filename (no path)
Warning: Even if the function does not need a cookie file, do not delete the cookie file provided as argument; plowup will take care of this.
When the requested file has been successfully uploaded, the function should return 0 and echo one to three lines.
echo "$DL_URL"
echo "$DEL_URL"
echo "$ADMIN_URL_OR_CODE"
$DEL_URL and $ADMIN_URL_OR_CODE are optional (they can be empty or not echoed at all).
Example 1 (seen in the depositfiles module):
echo "$DL_LINK"
echo "$DEL_LINK"
Example 2 (seen in the 2shared module):
echo "$FILE_URL"
echo
echo "$FILE_ADMIN"
Module can return the following codes:
- 0: Success. File successfully uploaded.
- $ERR_FATAL: Unexpected result (upstream site updated, etc.).
- $ERR_LINK_NEED_PERMISSIONS: Authentication required (for example: anonymous users can't do remote upload).
- $ERR_LINK_TEMP_UNAVAILABLE: Upload service seems temporarily unavailable upstream. Note: this status does not affect the retry count (see -r/--max-retries command-line option) but does affect the timeout if specified (see -t/--timeout command-line option).
- $ERR_SIZE_LIMIT_EXCEEDED: Can't upload the file because it is too big (needs permissions, probably premium).
- $ERR_LOGIN_FAILED: A correct login/password argument is required.
- $ERR_ASYNC_REQUEST: Asynchronous remote upload started.
- $ERR_EXPIRED_SESSION: When cache is used. See storage_get, storage_set and storage_reset.
Additional error codes (returned by plowup only, module upload function should not return these):
- $ERR_NOMODULE: Specified module does not exist or is not supported.
- $ERR_NETWORK: Specific network error (socket reset, curl, etc.).
- $ERR_SYSTEM: System failure (missing executable, local filesystem, wrong behavior, etc.).
- $ERR_MAX_WAIT_REACHED: Countdown timeout (see -t/--timeout command-line option).
- $ERR_MAX_TRIES_REACHED: Max tries reached (see -r/--max-retries command-line option).
- $ERR_BAD_COMMAND_LINE: Unknown command-line parameter or incompatible options.
- Remember that $2 can also be a remote file. It should be checked with match_remote_url. Most of the time, the remote upload feature is only available to premium users. If the module does not support it, put at the top of the file: MODULE_XXX_UPLOAD_REMOTE_SUPPORT=no.
- Upload file size is usually limited (and can be quite low for anonymous uploads). Dealing with it is nice for the user! For example:
MAX_SIZE=... # hardcoded value or parse it in html page (if possible)
SIZE=$(get_filesize "$FILE")
if [ $SIZE -gt $MAX_SIZE ]; then
log_debug "file is bigger than $MAX_SIZE"
return $ERR_SIZE_LIMIT_EXCEEDED
fi
Prototype is:
xxx_delete() {
local -r COOKIE_FILE=$1
local -r URL=$2
...
}
Notes:
- xxx is the name of the plugin: src/modules/xxx.sh. xxx must not contain dots; use underscores instead.
- Never call the curl_with_log function here; use curl.
Arguments:
- $1: cookie file (empty at start, use it with curl)
- $2: kill/admin URL string
Warning: Even if the function does not need a cookie file, do not delete the cookie file provided as argument; plowdel will take care of this.
There is no output for this function. When the file has been successfully deleted, the function should return 0.
Module can return the following codes:
- 0: Success. File successfully deleted.
- $ERR_FATAL: Unexpected result (upstream site updated, etc.).
- $ERR_LOGIN_FAILED: Authentication failed (bad login/password).
- $ERR_LINK_NEED_PERMISSIONS: Authentication required (anonymous users can't delete files).
- $ERR_LINK_PASSWORD_REQUIRED: Link requires an admin or removal code.
- $ERR_LINK_DEAD: Link is dead. File has been previously deleted.
Additional error codes (returned by plowdel only, module delete function should not return these):
- $ERR_NOMODULE: No module available for the provided link.
- $ERR_NETWORK: Specific network error (socket reset, curl, etc.).
- $ERR_BAD_COMMAND_LINE: Unknown command-line parameter or incompatible options.
- On successful operation (return 0), don't print a message; plowdel will log_notice for you.
Prototype is:
xxx_list() {
local -r URL=$1
local -r RECURSE=${2:-0}
...
}
Notes:
- xxx is the name of the plugin: src/modules/xxx.sh. xxx must not contain dots; use underscores instead.
- Never call the curl_with_log function here; use curl.
Arguments:
- $1: list URL (aka root folder URL)
- $2: list links and recurse subfolders (if any). If $2 is an empty string, the option has not been selected.
As a result, the function should return 0 and echo pairs of lines.
echo "$FILE_URL"
echo "$FILENAME"
$FILENAME can be empty, but the echo must still be done. You usually have more than one link in the folder, so echoing pairs of lines in a while loop can get complex. To simplify the process, you should use the list_submit() API.
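The pairing contract of list_submit() can be sketched with a simplified stand-in (this is an assumption of its behavior based on the description here, not the real core.sh implementation):

```shell
# Simplified stand-in for list_submit: pair up two newline-separated lists
# and echo URL/name couples, with optional prefix and suffix on each link.
list_submit_demo() {
    local LINKS=$1 NAMES=$2 PREFIX=${3:-} SUFFIX=${4:-}
    local LINK NAME
    # Read links from stdin (fd 0) and names from fd 3, in lockstep.
    while IFS= read -r LINK && IFS= read -r NAME <&3; do
        echo "${PREFIX}${LINK}${SUFFIX}"
        echo "$NAME"
    done <<< "$LINKS" 3<<< "$NAMES"
}
```

Usage sketch: `list_submit_demo $'k1\nk2' $'a.bin\nb.bin' 'http://h/?'` echoes four lines, alternating link and name.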
Example (seen in depositfiles module):
PAGE=$(curl "$URL") || return
LINKS=$(echo "$PAGE" | parse_all_attr_quiet 'class="dl" align="center' href)
NAMES=$(echo "$PAGE" | parse_all_attr_quiet 'class="dl" align="center' title)
list_submit "$LINKS" "$NAMES" || return
list_submit() can also accept an optional third argument: a link prefix (string) to prepend to each file link. This is useful when the parsed links are relative.
Example (seen in mediafire module):
...
NAMES=$(echo "$DATA" | parse_all_tag filename)
LINKS=$(echo "$DATA" | parse_all_tag quickkey)
list_submit "$LINKS" "$NAMES" 'http://www.mediafire.com/?' || return
list_submit() can even accept an optional fourth argument: a link suffix (string) to append to each file link.
Example (seen in turbobit module):
...
NAMES=$(parse_all ...
LINKS=$(parse_json 'id' 'split' <<< "$JSON")
list_submit "$LINKS" "$NAMES" 'http://turbobit.net/' '.html' || return
Module can return the following codes:
- 0: Success. The folder contains one or several files.
- $ERR_FATAL: Unexpected content (not a folder, parsing error, etc.).
- $ERR_LINK_TEMP_UNAVAILABLE: Links are temporarily unavailable (they can't be listed right now). This is used by mirroring/multi-upload services (uploads are still being processed).
- $ERR_LINK_PASSWORD_REQUIRED: Folder is password protected.
- $ERR_LINK_DEAD: Folder has been deleted, does not exist, or is empty.
Additional error codes (returned by plowlist only, module list function should not return these):
- $ERR_NOMODULE: No module available for the provided link.
- $ERR_NETWORK: Specific network error (socket reset, curl, etc.).
- $ERR_BAD_COMMAND_LINE: Unknown command-line parameter or incompatible options.
- If the hoster supports subfolders, declare at the top of the module source: MODULE_XXX_LIST_HAS_SUBFOLDERS=yes.
- If the hoster doesn't have subfolder capability (this includes mirroring/multi-upload services), declare at the top of the module source: MODULE_XXX_LIST_HAS_SUBFOLDERS=no.
- You should notify with a log_error message if the module (so, on plowshare's side) doesn't support the recursive subfolders option. For example, in the zalaa module:
test "$2" && log_error 'Recursive flag not implemented, ignoring'
- When recursing subfolders, don't echo folder URLs (but you can log_debug them).
- When the recurse subfolders option is enabled: $ERR_LINK_DEAD means that there is no file in any folder.
- When the recurse subfolders option is disabled: $ERR_LINK_DEAD means that there is no file in the root folder. There might be files in subfolders.
Prototype is:
xxx_probe() {
local -r COOKIE_FILE=$1
local -r URL=$2
local -r REQ_IN=$3
local REQ_OUT
...
}
Notes:
- xxx is the name of the plugin: src/modules/xxx.sh. xxx must not contain dots; use underscores instead.
- Never call the curl_with_log function here; use curl.
Arguments:
- $1: cookie file (empty at start, use it with curl)
- $2: download URL to check
- $3: capability list. One character is one feature.
Warning: Even if the function does not need a cookie file, do not delete the cookie file provided as argument; plowprobe will take care of this.
- c: link is alive (usually: 0 for OK or $ERR_LINK_DEAD for KO, see below for details)
- f: file name
- i: file id (usually included in the URL)
- s: file size (in bytes, no prefix/suffix). Use the translate_size helper function for converting if necessary.
- h: file hash (md5, sha1, ... hex-string format). If several algorithms are available, always use the longest digest (for example: sha1 is preferred to md5).
- t: file timestamp (unspecified time format)
- v: refactored file URL (can be different from the input URL, for example a short hostname or https redirections)
Of course, depending on the hoster, it is not always possible to get access to all of this information.
When a link is correct, the function should return 0 and echo the check-link character:
echo 'c'
return 0
If you can parse the filename, you can return this way:
echo "$FILE_NAME"
echo 'cf'
return 0
Even better, if you can parse both filename and filesize, you can return this way:
echo "$FILE_NAME"
echo "$FILE_SIZE"
echo 'cfs'
return 0
or:
echo "$FILE_SIZE"
echo "$FILE_NAME"
echo "csf"
return 0
Order is given by the last argument (a variable usually called REQ_OUT).
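Building REQ_OUT from the requested capability list can be sketched like this (hypothetical module; FILE_NAME and FILE_SIZE stand in for values parsed earlier from the hoster's page):

```shell
# Hedged sketch: pretend these were parsed from the hoster's page.
FILE_NAME='foo.rar'
FILE_SIZE=123456

myhost_probe_caps() {
    local -r REQ_IN=$1    # capability list requested by plowprobe, e.g. 'cfs'
    local REQ_OUT=c       # 'c' (link alive) is always reported

    # Echo each value this module can provide, in the same order
    # as the characters appended to REQ_OUT.
    [[ $REQ_IN = *f* ]] && { echo "$FILE_NAME"; REQ_OUT+=f; }
    [[ $REQ_IN = *s* ]] && { echo "$FILE_SIZE"; REQ_OUT+=s; }

    # REQ_OUT is echoed last, telling the caller what was provided.
    echo "$REQ_OUT"
    return 0
}
```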
Module can return the following codes:
- 0: Success. Link is alive (arguments have to be echoed, see above).
- $ERR_FATAL: Unexpected content (upstream updated, parsing error, etc.).
- $ERR_LINK_DEAD: Link is dead; no more information can be returned.
Additional error codes (returned by plowprobe only, module probe function should not return these):
- $ERR_NOMODULE: No module available for the provided link.
- $ERR_NETWORK: Specific network error (socket reset, curl, etc.).
- $ERR_BAD_COMMAND_LINE: Unknown command-line parameter or incompatible options.
Some hosters are able to return more than one hash (for example: md5 and sha1). In that case %h must return the strongest algorithm.
A module option can be added to change %h behaviour (like --md5).
- The probe function should be fast and efficient. A single curl request is advised.
- Using javascript is strongly discouraged.
Do not use echo, which is reserved for function return value(s); use log_debug() or log_error() instead.
You can use the -vN command-line switch to change debug verbosity.
Note: An intermediate verbosity level exists: log_notice(). It is reserved for core functions; do not use it inside modules.
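The separation of channels can be sketched with minimal stand-ins (assumption: the real core.sh log functions are more elaborate and honor verbosity levels):

```shell
# Minimal stand-ins: diagnostics go to stderr so stdout stays free
# for function return values.
log_debug() { echo "dbg: $*" >&2; }
log_error() { echo "err: $*" >&2; }

get_final_url() {
    log_debug 'probing link'      # diagnostic, goes to stderr
    echo 'http://example.com/f'   # the actual return value, on stdout
}
```

Because the two streams are separate, `URL=$(get_final_url)` captures only the return value while the debug message still reaches the terminal.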
This is probably the most important command in the plowshare API set. This wrapper function calls the real curl binary (let's call it true-curl).
Arguments:
- $1...$n: true-curl command-line arguments
- $?: 0 for success, or $ERR_NETWORK, $ERR_SYSTEM
Note: curl_with_log calls curl but forces verbosity level 3.
It is meant specifically for module upload functions (and should be called only once).
It's a good habit to always append || return for error handling.
Examples:
PAGE1=$(curl "http://www.google.com") || return
# Get remote content and take cookies (if any)
PAGE2=$(curl -c "$COOKIE_FILE" "$URL") || return
# Get remote content, provides and append cookie entries
PAGE3=$(curl -c "$COOKIE_FILE" -b 'lang=en' "$URL") || return
PAGE4=$(curl -c "$COOKIE_FILE" -b "$COOKIE_FILE" "$URL") || return
PAGE5=$(curl "${URL}?param=1") || return
# or
PAGE5=$(curl --get --data 'param=1' "$URL") || return
Notes:
- curl will add a valid User-Agent for you.
- curl exit codes are mapped to plowshare error codes. Human-readable debug messages have been added too.
- curl implicitly maps plowdown (or plowup) command-line switches (--interface, --max-rate, ...).
true-curl can handle one --cookie-jar/-c option and one --cookie/-b option:
PAGE=$(curl -c "$COOKIE_FILE_1" -b "$COOKIE_FILE_2" http://...) || return
$COOKIE_FILE_2: entries will be read from the file and set in the HTTP request header:
Cookie: key=value...
$COOKIE_FILE_1: entries will be returned by the HTTP server and written to the file:
Set-Cookie: key=value...
$COOKIE_FILE_1 and $COOKIE_FILE_2 can be the same filename.
true-curl does not handle multiple --cookie/-b switches, but you can have at most one string (key=value) argument and one file argument. These are source entries (read only) given to the HTTP protocol (the Cookie: header).
Example 1 (last -b switch will be used only):
curl -b "$COOKIE_FILE_1" -b "$COOKIE_FILE_2" http://...
// $COOKIE_FILE_1 will be ignored
Example 2 (only the last -b switch will be used):
curl -b 'lang=english' -b 'user=foo' http://...
// 'lang' cookie entry will be ignoredExample 3:
curl -b 'lang=english' -b "$COOKIE_FILE" http://...
// correct example
curl -b "$COOKIE_FILE" -b 'lang=english' -b 'user=foo' http://...
// 'lang' cookie entry will be ignored
First example, using -H/--dump-headers:
HEADERS=$(create_tempfile) || return
HTML=$(curl -H "$HEADERS" http://...) || return
rm -f "$HEADERS"
If something goes wrong in curl (a network issue or anything else), $HEADERS will be deleted for you.
Remember, that only happens if an error occurs. On curl's success nothing is deleted (as expected).
Another classic example uses -o/--output:
CAPTCHA_URL='http://...'
CAPTCHA_IMG=$(create_tempfile '.png') || return
curl -o "$CAPTCHA_IMG" "$CAPTCHA_URL" || return
...
rm -f "$CAPTCHA_IMG"
If something goes wrong while retrieving the captcha image, curl will delete the temporary file for you.
Here is a first case with a POST request and content type application/x-www-form-urlencoded.
DATA="action=validate&uid=123456&recaptcha_challenge_field=$CHALLENGE&recaptcha_response_field=$WORD"
RESULT=$(curl -b "$COOKIE_FILE" --data "$DATA" "$URL") || return
Consider passing several -d/--data arguments instead of one (order is not important).
RESULT=$(curl -b "$COOKIE_FILE" -d 'action=validate' \
-d "uid=123456" \
-d "recaptcha_challenge_field=$CHALLENGE" \
-d "recaptcha_response_field=$WORD" \
"$URL") || return
It is better for maintenance.
Second example with a GET request:
URL='http://ab19.hostmyfile.net/upload'
RESULT=$(curl "$URL?X-Progress-ID=12345&premium=1") || return
This can be written in a better way:
URL='http://ab19.hostmyfile.net/upload'
RESULT=$(curl --get -d 'X-Progress-ID=12345' -d 'premium=1' "$URL") || return
You can see a full list of the plowshare public API here.
The core.sh script provides the usual auxiliary functions.
| Do not use | But use |
|---|---|
| basename | basename_file |
| grep -o "^http://[^/]*" | basename_url |
| sleep | wait (must always be ORed with return keyword) |
| grep, grep -i, grep -q | match and matchi |
| sed, awk, perl | parse_* or replace_all, replace |
| head -n1, tail -n1 | first_line, last_line |
| mktemp, tempfile | create_tempfile |
| tr '[A-Z]' '[a-z]' | lowercase |
| tr '[a-z]' '[A-Z]' | uppercase |
| sed ... | strip (delete leading and trailing spaces, tabs), delete_last_line |
| js | detect_javascript and javascript |
| stat -c %s | get_filesize |
| $RANDOM or $$ | random |
| md5sum | md5 or md5_file |
| wget | curl |
The goal here is to avoid calling non-portable commands in modules.
Arguments:
- $1 (optional): how many head lines to take (default is 1). This must be a strictly positive integer.
- stdin: input data (multiline text)
Results:
- $?: 0 on success or $ERR_FATAL (bad argument)
- stdout: result
Examples:
$ echo "$BUFFER1"
line a
line b
line c
line d
$ echo "$BUFFER1" | first_line
line a
$ echo "$BUFFER1" | first_line 3
line a
line b
line c
Arguments:
-
$1: (optional): how many head lines to delete (default is 1). This must be a strictly positive integer. -
stdin: input data (multiline text)
Results:
-
$?:0on success or$ERR_FATAL(bad argument) -
stdout: result
Examples:
$ echo "$BUFFER1"
line a
line b
line c
line d
$ echo "$BUFFER1" | delete_first_line
line b
line c
line d
$ echo "$BUFFER1" | delete_first_line 2
line c
line d
This is a useful function for registered accounts because ID information is stored inside the cookie. This function will send the HTML form for you. It takes 4 or 5 arguments.
Arguments:
-
$1: authentication string 'username:password' (password can contain semicolons) -
$2: cookie file (system existing file) -
$3: string to post (can contain keywords:$USERand$PASSWORD) -
$4: URL -
$5..$n(optional): Additional curl arguments -
stdin: input data (text)
Example:
# comes from command line
AUTH="mylogin:mypassword"
# important: note the single quotes, $USER and $PASSWORD must not be expanded by the shell.
LOGIN_DATA='login=1&redir=1&username=$USER&password=$PASSWORD'
LOGIN_URL="https://xxx.com/login.php"
# or simply use $(create_tempfile)
COOKIES=/tmp/my_cookie_file
post_login "$AUTH" "$COOKIES" "$LOGIN_DATA" "$LOGIN_URL" >/dev/null
Results:
- $?: 0 for success; $ERR_NETWORK or $ERR_LOGIN_FAILED on error (no cookie returned)
- stdout: HTML result of the POST request
A common usage is (snippet taken from the fileserve module):
LOGIN_RESULT=$(post_login "$AUTH" "$COOKIE_FILE" "$LOGIN_DATA" \
    'http://www.fileserve.com/login.php') || return
If no password is provided, post_login will prompt for one.
Warning: Having $?=0 does not mean that your account is valid; it just means that the request (from an HTTP protocol point of view) has been successful. To detect a bad login/password, you'll have to parse the returned HTML content or sometimes the cookie file.
Note: Sometimes, parsing LOGIN_RESULT can be useful to distinguish a free account from a premium account. Sometimes parsing the cookie (looking for a specific entry in it) can help too.
An empty $LOGIN_RESULT is not necessarily an error. You can get, for example, an HTTP redirection. You can follow this redirection by passing the '-L' option to curl:
LOGIN_RESULT=$(post_login "$AUTH" "$COOKIE_FILE" "$LOGIN_DATA" \
"$BASEURL/login.php" -L) || return
Suppose you already have valid entries in $COOKIEFILE (a language setting, for example) and you want to keep them:
LOGIN_RESULT=$(post_login "$AUTH_FREE" "$COOKIEFILE" "$LOGIN_DATA" \
"$BASE_URL/dynamic/login.php?popup=1" -b "$COOKIEFILE") || return
Without this additional -b "$COOKIEFILE" given to curl, the cookie file would be overwritten.
Arguments:
- $1: match regexp (like grep)
- $2: input data (text)
Results:
- $?: 0 for success; non-zero on any error
- stdout: nothing!
The 'i' suffix (matchi) stands for case-insensitive match.
Regexps are basic POSIX (BRE syntax). Reserved characters (to escape) are: . * [ ] $ ^ \.
The coding convention is to use the shortest form:
match 'foo' "$HTML_PAGE" && ... // right
$(match 'foo' "$HTML_PAGE") && ... // wrong (useless subshell creation)
match '\(foo\)' "$HTML_PAGE" && ... // wrong (useless parenthesis)
if (! match 'You are ' "$HTML"); then // wrong (useless subshell creation)
...
fi
Typical use:
if ! match '/js/myfiles\.php/' "$PAGE"; then
log_error "not a folder"
return $ERR_FATAL
fi
if match '<h1>Delete File?</h1>' "$PAGE"; then
...
fi
if match '/error\.php?code=25[14]' "$LOCATION"; then
return $ERR_LINK_DEAD
fi
Simple examples:
match '[0-9][0-9]\+' 'Wait 19 seconds' // true
match '[0-9][0-9]\+' 'Wait 9 seconds' // false
match 'times\?' 'One time ago' // true
match 's/n' 'yes/no' // true
match '(euros)' '3.5 (euros)' // true
match '\[euros\]' '3.5 [euros]' // true
More examples (seen in modules):
match '^http://download' "$LOCATION" // ^ matches beginning of line
match 'errno=999$' "$LOCATION" // $ matches end of line
match '.*/#!index|' "$URL" // . means any character
match 'File \(deleted\|not found\|ID invalid\)' "$ERROR"
// Character classes can be used too (see POSIX bracket expressions)
match 'Password:[[:space:]]*<input' "$HTML"
The first function returns the first match; the second one returns all matches (multiline result). The sed command is used internally here.
Arguments:
- $1: filter regexp (lines to stop on; . or empty to stop on every line)
- $2: parse regexp (enclose with \( \) to retrieve the match)
- $3 (optional): number of lines to skip (default is 0)
- stdin: input data (text)
Results:
- $?: 0 on success or $ERR_FATAL (non-matching or empty result)
- stdout: parsed content (non-null string)
Regexps are basic POSIX (BRE syntax). Reserved characters (to escape) are: . * [ ] $ ^ \.
Note: Remember that Bash can interpret some symbols in double-quoted strings. The following characters must be escaped: $ (dollar sign), " (double quote), and the backtick character. Also, ! (exclamation mark) must be escaped if Bash history expansion is enabled. Use single-quoted strings, it's easier!
Examples:
ID=$(echo "$HTML_PAGE" | parse 'name="freeaccountid"' 'value="\([[:digit:]]*\)"')
HOSTERS=$(echo "$FORM" | parse_all 'checked' '">\([^<]*\)<br')
MSG=$(echo "$RESPONSE" | parse_quiet "ERROR:" "ERROR:[[:space:]]*\(.*\)")
Example using the $ (end-of-line) meta-character:
# Parse: [key]='7be8933035d221026ff2245be258c763';
# Notes:
# - Don't forget to escape `[` in the match regexp.
# - [:cntrl:] is used here to match `\r` because the answer comes from a Windows server.
# - `$` matches end of line.
HASH=$(echo "$PAGE" | parse 'Array\.downloads\[' "\]='\([[:xdigit:]]\+\)';[[:cntrl:]]$")
Always keep in mind that parsing is greedy, so within a line, the last occurrence will be taken. For example:
# Usual greedy behavior. Result: 789
echo 'value=123, value=456, value=789' | parse . '=\([^,]\+\)'
# Modify regex to get second value. Result: 456
echo 'value=123, value=456, value=789' | parse . '=\([^,]\+\),'
# Modify regex to get first value. Result: 123
echo 'value=123, value=456, value=789' | parse . '^value=\([^,]\+\)'
FIXME: Add example with ^
Use xxx_quiet functions when parsing failure is a normal behavior, for example, parsing an optional value.
Typical use:
OPT_RESULT=$(echo "$HTML_PAGE" | parse_quiet 'id="completed"' '">\([^<]*\)<\/font>')
If you actually require a result, do not use xxx_quiet. This way you'll get a sed error message if the parse fails, i.e. when your parse regexp did not capture anything.
Typical use:
WAIT_TIME=$(echo "$HTML_PAGE" | parse '^[[:space:]]*count=' "count=\([[:digit:]]\+\);") || return
Note: Don't use these functions for HTML parsing. Consider using the parse_tag and parse_attr function families
(see below Parsing HTML markers and Parsing HTML attributes).
Use the offset whenever the filter regexp and the parse regexp are not on the same line. A positive value will skip ahead the specified number of lines, while a negative value will apply your parse regexp to a line before the one that matched your filter regexp. See the following examples:
<div class="dl_filename">
FooBar.tar.bz2</div>
We can get the right line by filtering on dl_filename and applying the filename regexp to the second line (the line after). This gives:
echo "$PAGE" | parse 'dl_filename' '\([^<]*\)' 1
Example 2:
function js_fff() {
R4z5sjkNo = "http://...";
DelayTime = 60;
...
Get the URL with:
DL_LINK=$(echo "$PAGE" | parse 'js_fff' '"\([^"]\+\)";' 1) || return
Get the counter value with:
COUNT=$(echo "$PAGE" | parse 'js_fff' '=[[:space:]]*\([[:digit:]]\+\)' 2) || return
Example 3 (negative offset):
<TD><input type="checkbox" name="file_id" value="123456"></TD>
<TD align=left><a href="http://...">FooBar.tar.bz2</a></TD>
To get the file ID that belongs to a known URL you can use:
FILE_ID=$(echo "$PAGE" | parse "$URL" '^\(.*\)$' -1 | parse_form_input_by_name 'file_id') || return
First retrieve the whole line that is directly before the one containing the known URL, then parse the file ID with one of plowshare's form-parsing functions (see below, Parsing HTML forms).
Get the basename (hostname) of a URL.
Argument:
-
$1: string (URL)
Result:
- $?: always 0
- stdout: basename of the URL (if possible) or the unchanged input argument
A=$(basename_url 'http://code.google.com/p/plowshare/wiki/NewModules')
# result: http://code.google.com
B=$(basename_url 'http://code.google.com/')
# result: http://code.google.com
C=$(basename_url 'abc')
# result: abc
Supported protocols: http, https, ftp, ftps, file.
Check if URL is suitable for remote upload.
Arguments:
- $1: string (URL)
- $2..$n (optional): additional URI scheme names to match
Result:
- $?: 0 on success or $ERR_FATAL (not an accepted remote URL)
When called with a single argument, http and https are accepted.
URL='http://www.foo/bar'
if match_remote_url "$URL"; then
...
fi
If you want to accept more schemes, add them to the argument list.
URL='ftp://www.foo/bar'
if match_remote_url "$URL" 'ftp'; then
...
fi
Argument:
- stdin: data (HTTP headers)
Result:
- $?: 0 on success or $ERR_FATAL (non-matching or empty string)
- stdout: parsed header (non-null string)
Suppose you think you have reached the final download URL (let's call it $FINAL_URL), but when you curl it (with the -I/--head option), you get an HTTP answer like this:
HTTP/1.1 301 Moved Permanently
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Location: /download/123/5687/final_filename.xyz
Content-type: text/html
Content-Length: 0
Connection: close
Date: Sun, 17 Jan 2010 14:34:47 GMT
Server: Apache
Use grep_http_header_location to deal with this redirection. Have a look at sendspace module:
HOST=$(basename_url "$FINAL_URL")
PATH=$(curl -I "$FINAL_URL" | grep_http_header_location) || return
echo "${HOST}${PATH}"
Another example, with an absolute URI (from euroshare.eu):
HTTP/1.1 302 Found
Date: Sat, 10 Mar 2012 11:14:31 GMT
Server: Apache/2.2.16 (Debian)
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Set-Cookie: sid=61bu6nt3kkh9nsk92mg7otg501; expires=Sun, 11-Mar-2012 11:14:31 GMT; path=/
Location: http://s1.euroshare.eu/download/3598184/aXa2YWy3ytUhu3uVUsAQEgUzUDUseje3/5344113/myfile.zip
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: x-requested-with
Access-Control-Allow-Headers: x-file-name
Access-Control-Allow-Headers: content-type
Vary: Accept-Encoding
Content-Type: text/html
FILE_URL=$(curl -I "$FINAL_URL" | grep_http_header_location) || return
echo "$FILE_URL"
Note: Like other *_quiet functions, grep_http_header_location_quiet is silent and always returns 0. Use it only in dedicated cases. For example:
FILE_URL=$(echo "$HTML_PAGE" | grep_http_header_location_quiet) || return
if [ -z "$FILE_URL" ]; then
    ... # not premium
Argument:
- stdin: data (HTTP headers)
Result:
- $?: 0 on success or $ERR_FATAL (non-matching or empty string)
- stdout: parsed filename (non-null string)
Sharing websites often return their files as an attachment. curl doesn't care about Content-Disposition:, so it will not parse this HTTP header but keeps the URL as the name reference (see the -O option documentation).
$ curl http://p123.share-site.com/download/dl.php?id=123456456
# saved filename will be: "dl.php?id=123456456"
The reason for that is that a link can have multiple attachments. Note: This is a difference between curl and wget.
Note: This is no longer true. Since curl 7.20.0, the -J/--remote-header-name option has been added (you must combine it with -O/--remote-name). Plowshare does not use it for now.
Have a look at the divshare module:

FILE_NAME=$(curl -I "$FILE_URL" | grep_http_header_content_disposition) || return

Before the plowdown core script makes the final HTTP GET request, the module does an HTTP HEAD request in order to parse the attachment header and get the filename.
$ curl -I http://p123.share-site.com/download/dl.php?id=123456456
HTTP/1.0 200 OK
Date: Sun, 28 Feb 2010 11:41:50 GMT
Server: Apache
Last-Modified: Mon, 12 Oct 2009 10:04:20 GMT
ETag: 9852859-16341905311255341860
Cache-Control: max-age=30
Content-Disposition: attachment; filename="kop_standard.pdf"
Accept-Ranges: bytes
Content-Length: 412848
Vary: User-Agent
Keep-Alive: timeout=300, max=100
Connection: keep-alive
Content-Type: application/octet-stream
Notice that some sharing sites do not allow HTTP HEAD requests (restricting the web server may be a security measure).
There is a possible workaround: the HTTP/1.1 protocol allows making an HTTP GET request for a specific byte range.

FILE_NAME=$(curl -i -r 0-99 "$FILE_URL" | grep_http_header_content_disposition) || return

This is not very classy, but it can work, unless the sharing site allows one (and only one) HTTP request to that final URL (uploaded.to for example). In that case you can't get the attachment filename.
Retrieve a specific HTTP header.
Argument:
- stdin: data (HTTP headers)

Result:
- $?: 0 on success or $ERR_FATAL (non matching or empty string)
- stdout: parsed content (non null string)
| Name | HTTP header |
|---|---|
| `grep_http_header_content_length` | `Content-Length` |
| `grep_http_header_content_location` | `Content-Location` |
| `grep_http_header_content_type` | `Content-Type` |
$ curl --head http://share-site.net/wm8tbV6gZCp
HTTP/1.1 200 OK
Content-Disposition: attachment; filename="foobar"
Content-length: 5156
Content-Type: application/octet-stream
Date: Mon, 30 Sep 2013 06:29:14 GMT
ETag: "bc7f4762443939bd7dccb42370f0d932"
Last-Modified: Mon, 30 Sep 2013 06:28:44 GMT
Server: Apache
Vary: User-Agent
Connection: keep-alive
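As an illustration, here is a runnable sketch. The `grep_http_header_content_length` name is real, but this body is a minimal stand-in written for the example (the actual implementation lives in src/core.sh and is more robust, e.g. it reports $ERR_FATAL on a missing header):

```shell
# Minimal stand-in, for illustration only: extract the Content-Length value
# from HTTP headers read on stdin (case variants as in the dump above).
grep_http_header_content_length() {
    sed -n 's/^[Cc]ontent-[Ll]ength:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

HEADERS='HTTP/1.1 200 OK
Content-length: 5156
Content-Type: application/octet-stream'

FILE_SIZE=$(echo "$HEADERS" | grep_http_header_content_length)
test -n "$FILE_SIZE" || exit 1
echo "$FILE_SIZE"
```

In a module you would feed the function from `curl -I "$URL"` instead of a literal string.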
Arguments:
- $1: entry name
- stdin: data (netscape/mozilla cookie file format)

Result:
- $?: 0 on success or $ERR_FATAL (non matching or empty string)
- stdout: parsed content (non null string)
This is often used to get account settings. Sometimes, for a premium account, the remote site adds an extra key in the cookie file, so it can be convenient to distinguish free accounts from premium accounts.
LOGIN_ID=$(parse_cookie 'Login' < "$COOKIEFILE") || return
PASS_HASH=$(parse_cookie 'Password' < "$COOKIEFILE") || return
# At this point you are sure that $LOGIN_ID and $PASS_HASH are valid (non empty)

Note: Like other *_quiet functions, parse_cookie_quiet is silent and always returns 0. Use it only in dedicated cases. For example:
USERNAME=$(parse_cookie_quiet 'login' < "$COOKIEFILE")
if [ -z "$USERNAME" ]; then
... # invalid account
return $ERR_LOGIN_FAILED
fi

Arguments:
- $1 (optional): filtering regexp.
- $2: tag name. This is case sensitive.
- stdin: data (HTML, XML)

Result:
- $?: 0 on success or $ERR_FATAL (non matching or empty marker)
- stdout: parsed content (non null string)
| Name | Usage example |
|---|---|
| `parse_tag` | `T=$(echo "$LINE" \| parse_tag a) \|\| return` |
| `parse_tag_quiet` | Same as `parse_tag` but doesn't print on parsing error |
| `parse_all_tag` | n/a |
| `parse_all_tag_quiet` | Same as `parse_all_tag` but doesn't print on parsing error |
The _all functions are for multiline content; one tag is parsed per line.
Important: If there are several matching tags on the same line, the first one is taken.
Remember that this is line oriented: if the opening and closing tags are not on the same line, it won't work. It's not perfect, but for now it covers all our needs.
Examples:
LINE='... <a href="link1">Link number 1</a> <a href="javascript:;">Link number 2</a>'
LINK1=$(echo "$LINE" | parse_tag a) || return # First link returned

LINE='... <b></b> ...'
CONTENT=$(echo "$LINE" | parse_tag b) || return # Error: <b> content is empty

# Nested elements: take the deepest one!
WAIT_MSG='<span id="foo">Wait <span id="bar">30</span> seconds</span>'
WAIT_TIME=$(echo "$WAIT_MSG" | parse_tag span) || return # 30

Note: parse_tag b is equivalent to parse_tag . b and parse_tag b b.
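For multiline content, parse_all_tag prints one match per input line. A runnable sketch with a hypothetical hard-wired helper (the real parse_all_tag in src/core.sh is generic over the tag name):

```shell
# Hypothetical helper hard-wired to <a> tags; the real parse_all_tag
# (src/core.sh) takes the tag name as an argument.
parse_all_tag_a() {
    sed -n 's/.*<a[^>]*>\([^<]*\)<\/a>.*/\1/p'
}

LINKS=$(printf '%s\n' \
    '<li><a href="u1">one</a></li>' \
    '<li><a href="u2">two</a></li>' | parse_all_tag_a)
echo "$LINKS"
```

One tag content per input line is printed: here `one` then `two`.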
Arguments:
- $1 (optional): filtering regexp.
- $2: attribute name
- stdin: data (HTML, XML)

Result:
- $?: 0 on success or $ERR_FATAL (non matching or empty attribute)
- stdout: parsed content (non null string)
| Name | Usage example |
|---|---|
| `parse_attr` | `LINK=$(echo "$IMG" \| parse_attr href) \|\| return` |
| `parse_attr_quiet` | Same as `parse_attr` but doesn't print on parsing error |
| `parse_all_attr` | `LINKS=$(echo "$PAGE" \| parse_all_attr href) \|\| return` |
| `parse_all_attr_quiet` | Same as `parse_all_attr` but doesn't print on parsing error |
The _all functions are for multiline content; one attribute is parsed per line.
Quoting is handled according to HTML5 standard:
<div class="foo">
<div class = "foo" >
<div class='foo'>
<div class=foo>
<div class = foo >

Note: In XHTML, all attribute values must be quoted using double quote marks.
Important: If there are several matching attributes on the same line, the last one is taken (parsing is greedy).
Examples:
IMG='<img href="http://foo.com/bar.jpg" alt="">'
CONTENT=$(echo "$IMG" | parse_attr img alt) || return # Error: 'alt' content is empty

PAGE='<a href="http://...">click here to download</a>'
LINK=$(echo "$PAGE" | parse_attr 'download' 'href') || return
log_debug "[$LINK]" # [http://...]

IMG='<img href="http://foo.com/bar.jpg" id = image_id>'
ID=$(echo "$IMG" | parse_attr 'id') || return

Note: parse_attr b is equivalent to parse_attr . b and parse_attr b b.
Some websites return the page as a single big line of HTML (without any EOL). As the parse_xxx functions are line oriented, proper parsing can be difficult. Two functions exist:
break_html_lines and break_html_lines_alt (more aggressive) to split single-line HTML.
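The idea can be illustrated with plain bash string substitution (the real break_html_lines in src/core.sh is more careful about where it splits):

```shell
# Single-line HTML as some hosters return it:
PAGE='<div><a href="u1">one</a><a href="u2">two</a></div>'

# Poor-man's split, for illustration: insert a newline between every "><"
# pair so that each element lands on its own line.
NL=$'\n'
SPLIT_PAGE=${PAGE//></>$NL<}
echo "$SPLIT_PAGE"
```

After splitting, line-oriented parse_* functions can be applied to $SPLIT_PAGE.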
There are 3 helper functions.
Arguments:
- $1: input (X)HTML data
- $2: 1-based index or string
- stdin: data (HTML, XML)

Result:
- $?: 0 on success or $ERR_FATAL (non matching or no such form)
- stdout: parsed content (non null string)
| Name | Usage example |
|---|---|
| `grep_form_by_order` | `FORM_HTML=$(grep_form_by_order "$PAGE" 2)` |
| `grep_form_by_name` | `FORM_HTML=$(grep_form_by_name "$PAGE" 'named_form')` |
| `grep_form_by_id` | `FORM_HTML=$(grep_form_by_id "$PAGE" 'id_form')` |
You are strongly encouraged to append the regular || return error handling.
Note: grep_form_by_order can take a negative index (as argument $2). Get last form of page with -1. Giving 0 or null string will default to 1.
Tip: On some websites, HTML data contains commented HTML or JS code. It can sometimes be useful to strip HTML comments. There is a function doing this named strip_html_comments (input data on stdin, filtered data on stdout).
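A naive runnable sketch of the comment-stripping idea (the name strip_html_comments_line is made up for this example; the real strip_html_comments in src/core.sh also deals with comments spanning several lines):

```shell
# Naive single-line stand-in (made-up name): drop <!-- ... --> comments
# that open and close on the same line.
strip_html_comments_line() {
    sed -e 's/<!--.*-->//g'
}

CLEANED=$(echo 'visible<!-- commented JS or HTML -->text' | strip_html_comments_line)
echo "$CLEANED"
```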
As with other parse functions, the input argument comes through stdin.
| Name | Usage example |
|---|---|
| `parse_form_action` | `ACTION=$(echo "$FORM_HTML" \| parse_form_action) \|\| return` |
| `parse_form_input_by_id` | `VALUE=$(echo "$FORM_HTML" \| parse_form_input_by_id 'id_name') \|\| return` |
| `parse_form_input_by_name` | `VALUE=$(echo "$FORM_HTML" \| parse_form_input_by_name 'input_name') \|\| return` |
| `parse_form_input_by_type` | `VALUE=$(echo "$FORM_HTML" \| parse_form_input_by_type 'submit') \|\| return` |
Example:
FORM_URL=$(grep_form_by_order "$HTML_PAGE" 1 | parse_form_action) || return
# We are sure here, that $HTML_PAGE has a form with an action attribute
# We can safely use $FORM_URL now
Note: parse_form_input_by_id_quiet, parse_form_input_by_name_quiet and parse_form_input_by_type_quiet are available.
Like other *_quiet functions, they print no error message and always return 0.
You generally use them when you want to parse an HTML form field with a possibly empty value. For example:
FORM_SID=$(echo "$FORM_HTML" | parse_form_input_by_id_quiet 'sid')
# $FORM_SID can be empty for anonymous users, and it can be defined
# (non empty: session id) for account users.
core.sh script provides some functions.
Captchas are solved according to the --captchamethod command-line option (in plowdown, plowup and plowdel). If not defined, the method is autodetected (look for an image viewer and prompt for the answer).
Arguments:
- $1: local image file (any format) or URL (which doesn't require cookies)
- $2: captcha type or hint
- $3 (optional): minimum length
- $4 (optional): maximum length

Current captcha types:
- recaptcha (better use recaptcha_process() to get the reload feature)
- solvemedia (better use solvemedia_captcha_process() to get the reload feature)
- digits
- letters

Results:
- stdout (2 lines): captcha answer (ascii text) / transaction id
- $?: 0 for success, or $ERR_CAPTCHA, $ERR_FATAL, $ERR_NETWORK
Typical usage ($CAPTCHA_IMG is a valid image file):
local WI WORD ID
WI=$(captcha_process "$CAPTCHA_IMG" ocr_digit) || return
{ read WORD; read ID; } <<< "$WI"
rm -f "$CAPTCHA_IMG"
Note: If something goes wrong ($? is not 0), the argument image file is deleted.
Argument:
- $1: site key

Results:
- stdout (3 lines): captcha answer (ascii text) / recaptcha challenge / transaction id
- $?: 0 for success, or $ERR_CAPTCHA, $ERR_FATAL, $ERR_NETWORK
Typical usage:
local PUBKEY WCI CHALLENGE WORD ID
PUBKEY='6Lftl70SAAABAItWJueKIVvyG5QfLgmAgtKgVbDT'
WCI=$(recaptcha_process $PUBKEY) || return
{ read WORD; read CHALLENGE; read ID; } <<< "$WCI"
Argument:
- $1: site key

Results:
- stdout (2 lines): verified challenge / transaction id
- $?: 0 for success, or $ERR_CAPTCHA, $ERR_FATAL, $ERR_NETWORK
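The two result lines are split with the same read idiom as for recaptcha_process above. A runnable sketch, with a hypothetical stub standing in for the real solvemedia_captcha_process (which actually talks to the Solve Media service and may fail with $ERR_CAPTCHA):

```shell
# Hypothetical stub standing in for the real solvemedia_captcha_process,
# which contacts the Solve Media service.
solvemedia_captcha_process() {
    printf '%s\n%s\n' 'verified-challenge-token' '12345'
}

WCI=$(solvemedia_captcha_process 'site-key') || exit 1
{ read CHALLENGE; read ID; } <<< "$WCI"
echo "$CHALLENGE"
echo "$ID"
```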
Each time you call captcha_process or recaptcha_process, you get a transaction id as part of the result. Once the captcha answer has been submitted, the module function must acknowledge (or refuse) the captcha transaction reply (some captcha-solving services can refund credits on a wrong answer).
Validation of the captcha answer is made through two functions: captcha_ack and captcha_nack.
Argument:
- $1: transaction id
Typical usage:
if match ... wrong captcha ...; then
captcha_nack $ID
log_error "Wrong captcha"
return $ERR_CAPTCHA
fi
captcha_ack $ID
log_debug "correct captcha"

Note: A module must not loop in case of a wrong captcha; just call captcha_nack and return $ERR_CAPTCHA. The retry mechanism is handled at an upper level with the plowdown -r policy.
Stands for JavaScript Object Notation. The official format standard is RFC 4627.
If you know nothing about JSON, try this:
curl http://twitter.com/users/bob.json | python -mjson.tool
Simple and limited JSON parsing. The sed command is used internally here. This is really a poor line-oriented parser (instead of being tree oriented).
Arguments:
- $1: variable name (string)
- $2 (optional): preprocess option. Accepted values are: join and split.
- stdin: input JSON data

Results:
- $?: 0 on success or $ERR_FATAL (non matching or empty result)
- stdout: parsed content (non null string)
Important notes:
- Single-line parsing oriented (user should strip newlines first): no tree model
- Array and Object types: basic poor support (depth 1, without complex types)
- String type: no support for escaped unicode characters (\uXXXX), but two-character escape sequences are handled (for example: \t)
- No handling of non-standard C/C++ comments (like in JSONP)
- If several entries exist on the same line: the last occurrence is taken (like parse_attr), but consider precedence (order of priority): number, boolean/empty, string.
- If several entries exist on different lines: all are returned (it's a parse_all_json)
Simple usage:
FILE_URL=$(echo "$JSON" | parse_json 'downloadUrl') || return

JSON='{"name":"foo","attr":["size":123,"type":"f","url":"http:\/\/www.bar.org\/4c0476"]}'

# ARR='["size":123,"type":"f"]'
ARR=$(parse_json 'attr' <<< "$JSON")

# URL='http://www.bar.org/4c0476'
# (as you can see, it does not care about hierarchy)
URL=$(parse_json 'url' <<< "$JSON")
Arguments:
- $1: name (string)
- $2: input data (json data)

Results:
- $?: 0 for success; any non-zero value means an error
- stdout: nothing!
This will literally match the true boolean token; a "true" string token or any number will be considered false.
# Assuming that a curl request can result one of two $JSON content:
# JSON='{"err":"Entered digits are incorrect."}'
# JSON='{"ok":true,"dllink":"http:\/\/www.share-me.com\/..."}'
if ! match_json_true 'ok' "$JSON"; then
ERR=$(echo "$JSON" | parse_json_quiet err)
test "$ERR" && log_error "Remote error: $ERR"
return $ERR_FATAL
fi
log_debug "ok answer..."
Arguments:
- stdin: input JavaScript code

Results:
- $?: 0 on success or $ERR_FATAL (js error)
- stdout: result
Example:
JS='print("Hello World!");'
RESULT=$(javascript <<< "$JS") || return
log_debug "result: '$RESULT'"

Modules using the javascript function need to add this line at the top of the module function (for example zippyshare_download):

detect_javascript || return

Important note: Don't use classes that are not in the JavaScript core engine. For example:
var strJson = '{"City":"Paris", "Country":"France"}';
var objJson = JSON.parse(strJson);
var dump = JSON.stringify(objJson, null, 2);
print(dump);

The rhino interpreter knows the JSON object, but spidermonkey does not:
ReferenceError: JSON is not defined
When entering a module function, dedicated module arguments will be processed according to these module variables:
MODULE_XXX_DOWNLOAD_OPTIONSMODULE_XXX_UPLOAD_OPTIONSMODULE_XXX_DELETE_OPTIONSMODULE_XXX_LIST_OPTIONSMODULE_XXX_PROBE_OPTIONS
Assuming module source contains:
MODULE_XXX_DELETE_OPTIONS="
AUTH,a,auth,a=USER:PASSWORD,User account"

Assuming the user is invoking plowdel with an account:

$ plowdel -a 'user:password' 'http://www.sharing-site.com/?delete=12D45G5'

xxx_delete will be called with the environment variable defined:

AUTH='user:password'
AUTH,a,auth,a=USER:PASSWORD,Premium account
AUTH_FREE,b,auth-free,a=USER:PASSWORD,Free account
Most of the time, when a module can deal with both free and premium accounts, you will see a single option:
AUTH,a,auth,a=USER:PASSWORD,User account
For delete, it's quite usual that authentication is mandatory for deleting files; you'll see:
AUTH,a,auth,a=USER:PASSWORD,User account (mandatory)
LINK_PASSWORD,p,link-password,S=PASSWORD,Used in password-protected files
NOMD5,,nomd5,,Disable md5 authentication (use plain text)
Ask for password if not supplied:
log_debug "File is password protected"
if [ -z "$LINK_PASSWORD" ]; then
LINK_PASSWORD=$(prompt_for_password) || return
fi

LINK_PASSWORD,p,link-password,S=PASSWORD,Protect a link with a password
DESCRIPTION,d,description,S=DESCRIPTION,Set file description
TOEMAIL,,email-to,e=EMAIL,<To> field for notification email
FROMEMAIL,,email-from,e=EMAIL,<From> field for notification email
INCLUDE,,include,l=LIST,Provide list of host site (comma separated)
COUNT,,count,n=COUNT,Take COUNT hosters from the available list. Default is 5.
PRIVATE_FILE,,private,,Do not allow others to download the file
FOLDER,,folder,s=FOLDER,Folder to upload files into (account only)
ADMIN_CODE,,admin-code,s=ADMIN_CODE,Admin code (used for file deletion)
| Name | Description |
|---|---|
| a | Authentication string (user:password or user) |
| n | Positive integer (>0) |
| N | Positive integer or zero (>=0) |
| s | Non empty string |
| S | Any string |
| t | Non empty string, multiple command-line switch allowed |
| e | Email address string |
| l | Comma-separated list, strip leading & trailing spaces |
| f | Filename (with read access) |
Reserved argument types (should not be used in modules):
| Name | Description |
|---|---|
| c | Choice list (argument must match a string) |
| C | Same as c type, but empty string is allowed |
| r | Speed rate. Allowed suffixes: Ki, K, k, Mi, M, m. |
| R | Disk size. Allowed suffixes: Mi, m, M, MB, Gi, G, GB. |
| F | Executable (search in $PATH and $HOME/.config/plowshare/exec) |
| D | Directory (with write access) |
Assuming module source contains:
MODULE_XXX_UPLOAD_OPTIONS="
INCLUDE,,include,l=LIST,Provide list of host site (comma separated)"

Assuming the user is invoking plowup this way:
$ plowup xxx --include 'first, second,thir d' myfile.foo
xxx_upload will be called with the environment variable defined:
# This is an array
INCLUDE=( 'first' 'second' 'thir d')- Consider module option variables (
AUTH,LINK_PASSWORD, ...) as read only, don't reassign them. - Because of command-line parsing, modules options with the same name must have the same argument type, this is important. For example: if module1 has an option
--useapiwith types(non empty string), module2 can't have option--useapiwith typen(positive integer).
WAIT_TIME=$(echo $WAIT_HTML | parse 'foo' '.. \(...\) ..')

This won't give you the expected answer if $WAIT_HTML is multiline (which is usually the case).
You should write instead:

WAIT_TIME=$(echo "$WAIT_HTML" | parse 'foo' '.. \(...\) ..')

Consider this example for understanding:
$ MYS=$(seq 3)
$ echo "$MYS"
1
2
3
$ echo $MYS
1 2 3
$ echo $MYS | xxd
0000000: 3120 3220 330a 1 2 3.
More information about word splitting.
Unfortunately, this is not correct:

local HTML_PAGE=$(curl "$URL") || return

If curl returns an error, it won't be caught by || return because of the local keyword.

local HTML_PAGE
...
HTML_PAGE=$(curl "$URL") || return

is correct.
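A runnable demonstration of the pitfall (function names are made up for the example):

```shell
# Made-up demo functions showing why 'local VAR=$(cmd)' hides errors.
bad() {
    # 'local' itself succeeds (status 0), masking the failure of $(false),
    # so '|| return' never triggers.
    local OUT=$(false) || return
    echo 'still here'
}

good() {
    local OUT
    OUT=$(false) || return
    echo 'never printed'
}

BAD_OUT=$(bad)
GOOD_OUT=$(good)
GOOD_RC=$?
echo "bad() printed: '$BAD_OUT'"
echo "good() returned: $GOOD_RC"
```

bad() prints "still here" and returns 0 despite the failed command substitution; good() prints nothing and returns 1.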
$ set -- test
$ [ -z "$1" ] && echo empty || echo nonempty
nonempty
$ set --
$ [ -z "$1" ] && echo empty || echo nonempty
empty

$ set -- test
$ [ -z "$1" ] || echo nonempty && echo empty
nonempty
empty
$ set --
$ [ -z "$1" ] || echo nonempty && echo empty
empty

Looks like "&& ||" is better than "|| &&". But imagine that echo empty does not return $?=0:
$ set --
$ [ -z "$1" ] && echo empty; false || echo nonempty
empty
nonempty

Finally, the classic if/then/else/fi is not so bad!
if [ -z "$1" ]; then
echo empty
else
echo nonempty
fi

See also the shellcheck.net note.
Don't put an '&&' test as the last statement of a function. For example:
myhoster_upload() {
...
echo "$DL_URL"
[ -n "$PUBLIC_FILE" ] && echo "$DEL_URL"
}

If $PUBLIC_FILE is not empty, myhoster_upload() will return $?=0. This is good.
But if $PUBLIC_FILE is empty, echo is not performed (as wished) and myhoster_upload() will return $?=1. Plowup will assume this is an $ERR_FATAL module return. This is not what we want! We only want to display the download link and not the delete link (because it's not available).
So prefer this:
myhoster_upload() {
...
echo "$DL_URL"
[ -z "$PUBLIC_FILE" ] || echo "$DEL_URL"
}

A paranoid version:
myhoster_upload() {
...
echo "$DL_URL"
[ -z "$PUBLIC_FILE" ] || echo "$DEL_URL"
return 0
}

Plowshare runs on lots of unix/linux systems. There are always several ways to write bash code. We try to keep compatibility with the busybox shell.
Things to take care of or avoid in your module functions:
- no awk invocation
- no xargs invocation
- no grep -v (invert match) invocation
- no wc invocation (wc -c can be easily replaced with bash internal string manipulation)
- no infinite loops like while true; or while :;
- no tr -d; try using bash internal replacement. For example: ${MYSTRING//$'\n'} or replace_all for multiline content.
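For instance, the tr -d and wc -c replacements with bash internals look like this:

```shell
MULTILINE=$'one\ntwo\nthree'

# Instead of: echo "$MULTILINE" | tr -d '\n'
JOINED=${MULTILINE//$'\n'}

# Instead of: echo -n "$JOINED" | wc -c
LENGTH=${#JOINED}

echo "$JOINED"   # onetwothree
echo "$LENGTH"   # 11
```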
Bash specific constructs to avoid:
- no bash regexp: [[ =~ ]] (requires bash >= 3.0). Not using it is a historic choice. Behavior has changed over versions (see E14 in the bash FAQ).
- no += string (or array) concatenation operator (requires bash >= 3.1)
- no for-loop brace expansion: for i in {1..10} ; do ... ; done (requires bash >= 3.0). You can use seq instead.
- no printf -v (requires bash >= 3.1)
BSD specific pitfalls:
- base64 -d is a GNU coreutils only short option. Use base64 --decode instead.
- BSD sed has fewer features than GNU sed (it can't use \? or \r for example). Try to use parse_* functions instead.
- stat -c is only available on GNU. Use ls instead.
- readlink -f is only available on GNU.
Busybox specific pitfalls:
- grep -o and grep -w (word-regexp) are not supported by old versions of busybox. Do not use them.
- sleep with s/m suffixes or even a fractional argument (example: sleep 1m). BusyBox may not be compiled with the CONFIG_FEATURE_FANCY_SLEEP option.
- tr with classes (such as [:upper:]). Busybox may not be compiled with the CONFIG_FEATURE_TR_CLASSES option.
- sed does not support \xNN escape sequences. Tested on Busybox 1.13, 1.18 and 1.19.3.
- sed does not support the \r escape sequence before version 1.19. Don't use it, find another way!
- sed does support \s, \S, \w, \W (these are GNU extensions), but prefer the equivalents: [[:space:]], [^[:space:]], [[:alnum:]_], [^[:alnum:]_].
Try being compliant with bash 3.x. Interesting reading:
- Do not create temporary files unless necessary; don't forget to delete them if you used one.
- curl calls should not be invoked with the --silent option. The curl wrapper function takes care of the verbose level.
This is because we want to be as portable as possible. We lose some flexibility, but plowshare can run on slow and old embedded hardware; this is the original starting point of the project. But maybe plowshare with bash 4.0 as a minimum requirement will pop up one day...
- GPL-compatible license.
- No tabs, use 4 spaces. Also use 4 spaces after split \ lines.
- Line lengths should stay within 80 columns.
- Comments (#-style, like Ruby) are written in English. No extra empty line before function declarations. No boxes or ascii art stuff.
- Always declare (with the local keyword) the variables you are using.
- Uppercase variables: this is a historical choice, let's keep traditions. We suggest using underscores. For example: MARY_POPPINS (instead of MARYPOPPINS). This is optional but recommended (especially for names with more than 7 characters). For example APIURL, DESTFILE and FILEURL are accepted. COOKIEFILE is accepted too (but COOKIE_FILE is preferred).
- Use appropriate names to ease maintainability. For example: FILE_URL (instead of MARY_POPPINS). Don't use too long variable names: for example UPLOADED_FILE_JSON_DATA is too descriptive, JSON_DATA or JSON is enough.
- For form parsing, usual names are: FILE_ID, FILE_NAME, FILE_URL, BASE_URL, FORM_HTML, FORM_URL (action parameter), FORM_xxx (input field name in uppercase), ADMIN_URL, DELETE_ID, WAIT_TIME.
- Usual names for curl results are HTML, PAGE, RESPONSE, JSON, STATUS.
Remark: The choice of uppercase is historical, and you can disagree with the approach. The usual convention is lowercase for internal or temporary variables and uppercase for environment or global variables; that convention avoids accidentally overriding environment variables.
- The if/then construct and while/do go on the same line.
- Restrict usage of curly braces:
test "$FILE_URL" || { log_error "location not found"; return $ERR_FATAL; }

should be written:
if test "$FILE_URL"; then
log_error "location not found"
return $ERR_FATAL
fi

3a. In comments, insert a space character after the # symbol.
#get id of file (wrong)
# Get id of file (right)

3b. Avoid meaningless comments
# wait 15 seconds
wait 15 seconds || return

- Proper indentation on continued lines
HTML=$(curl -b "$COOKIE_FILE" 'http://www.foo.bar/long...url...') \
|| return

should be written:
HTML=$(curl -b "$COOKIE_FILE" \
'http://www.foo.bar/long...url...') || return

- Single-quote strings as much as possible (if there is no variable referencing, of course)!
local BASE_URL="http://shareme.com" # wrong
local BASE_URL='http://shareme.com' # right

- Don't quote unless required
return "$ERR_LINK_TEMP_UNAVAILABLE" # wrong
return $ERR_LINK_TEMP_UNAVAILABLE # right

Test and retest your module. A little checklist of possible cases:
- File not found
- File temporarily unavailable
- File unavailable (server busy), come back in X minutes
- Download (quota) limit reached
- Your IP address is already downloading a file
- Password protected link
- Premium link download only
- etc.
Other concerns:
- Check geographical-location-aware sites; location can affect the URL TLD.
- Don't send incomplete scripts or nearly-working stuff.
- Don't use illegal or patented content; if you want to make some tests, use the material here.
- Advanced Bash-Scripting Guide (the bible)
- Bash hackers (very interesting page about bash version features)
- Greg's Wiki (very interesting page about bash pitfalls)
- Interesting mediawiki website (Freddy Vulto)
- Blog about shell scripting