Datalake Scripts
Datalake Scripts is developed by the Datalake developers to help you use the Datalake API.
You can use this repository either as a library or as a CLI.
With Python 3.6+:
$ pip install datalake-scripts
or
$ pip3 install datalake-scripts
To use the library, first create a Datalake instance, then call the methods of the classes it defines.
The library tutorial is available in the following link
Example:
from datalake import Datalake, AtomType, Output

dtl = Datalake(username='username', password='password')
dtl.Threats.lookup(
    atom_value='mayoclinic.org',
    atom_type=AtomType.DOMAIN,
    hashkey_only=False,
    output=Output.JSON
)

The CLI can be used with:
$ ocd-dtl <command> <parameter>
Check ocd-dtl -h for help, including the list of available commands.
You can also use a script directly by using the following command: <script_name> <script_options>.
/!\ Make sure to use UTF-8 without BOM when providing a file as input (-i, --input parameter).
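If an input file was saved with a BOM (some editors add one by default), it can be stripped before the file is passed to a command. The snippet below is a minimal sketch of one way to do that; `strip_utf8_bom` is a hypothetical helper written for this illustration and is not part of datalake-scripts.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path):
    """Rewrite the file at `path` without a leading UTF-8 BOM.

    Hypothetical helper (not part of datalake-scripts).
    Returns True if a BOM was found and removed, False otherwise.
    """
    file_path = Path(path)
    raw = file_path.read_bytes()
    if not raw.startswith(codecs.BOM_UTF8):
        return False  # already BOM-free, leave the file untouched
    # Drop the 3-byte BOM and keep the rest of the content unchanged.
    file_path.write_bytes(raw[len(codecs.BOM_UTF8):])
    return True
```

Calling `strip_utf8_bom("input.txt")` on a file that has no BOM leaves it untouched, so the helper can be run on any input file.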
Common parameters for all commands:
-e, --env <preprod|prod>: Datalake environment. Default is prod.
-o, --output <OUTPUT_PATH>: will set the output file as the API gives it. No default.
-D, --debug: will raise the verbosity of the program (by displaying additional DEBUG messages). Default log level is INFO.
-q, --quiet: will quiet the verbosity of the program (but will still show ERROR / WARNING messages). Default log level is INFO.
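To make the defaults above concrete, here is a small argparse sketch that mirrors the four common parameters and maps the verbosity flags to a log level. It is an illustration only: `build_common_parser` and `log_level` are hypothetical names, the real ocd-dtl parser also handles commands and their own parameters, and whether the real CLI lets -D and -q be combined is not stated above.

```python
import argparse
import logging


def build_common_parser():
    """Hypothetical parser mirroring the common parameters described above."""
    parser = argparse.ArgumentParser(prog="ocd-dtl")
    parser.add_argument("-e", "--env", choices=["preprod", "prod"], default="prod",
                        help="Datalake environment (default: prod)")
    parser.add_argument("-o", "--output", metavar="OUTPUT_PATH", default=None,
                        help="output file, written as the API gives it (no default)")
    parser.add_argument("-D", "--debug", action="store_true",
                        help="also display DEBUG messages")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="only display ERROR / WARNING messages")
    return parser


def log_level(args):
    """Map the verbosity flags to a logging level; INFO is the default."""
    if args.debug:
        return logging.DEBUG
    if args.quiet:
        return logging.WARNING  # WARNING and ERROR messages are still shown
    return logging.INFO
```

In this sketch -D wins if both flags are given; that precedence is an assumption, not documented behaviour.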
Commands can also have additional mandatory or optional parameters.
For information about each command and more, please check the documentation directory
There are two methods of authentication:
- The first one is the use of a username and password. Every request to the API will then use fresh tokens periodically created with these credentials.
- The second one is the use of a long term token. You can create a long term token through the GUI, and it can have more restricted permissions than your account. You can create several long term tokens for one account.
If you don't want to enter credentials for each command and you are on a secured terminal, set these environment variables:
OCD_DTL_LONGTERM_TOKEN: a long term token associated with your Datalake account. Please note that if this variable is set, the long term token will be used for every request to the Datalake API, even if you also set the username and password environment variables below. This is important because some endpoints / requests do not accept long term tokens and need fresh tokens (i.e. a Datalake instance created with username and password). Check whether fresh tokens are needed in each endpoint description here
or
OCD_DTL_USERNAME: email address used to log in to the Datalake API/GUI.
OCD_DTL_PASSWORD: password used to log in to the Datalake API/GUI.
These last two are independent and one can be used without the other if you wish.
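The precedence described above (a long term token, when set, is used instead of the username and password) can be sketched as follows. This is an illustration of the documented rule rather than the library's actual code; `resolve_auth` is a hypothetical helper, and a missing username or password is simply returned as None so the caller can decide how to obtain it.

```python
import os


def resolve_auth(env=None):
    """Hypothetical helper: pick the authentication method from the environment.

    The long term token takes precedence over username/password when it is set.
    """
    env = os.environ if env is None else env
    token = env.get("OCD_DTL_LONGTERM_TOKEN")
    if token:
        return {"method": "longterm_token", "token": token}
    # Username and password are independent: either may be missing (None).
    return {
        "method": "credentials",
        "username": env.get("OCD_DTL_USERNAME"),
        "password": env.get("OCD_DTL_PASSWORD"),
    }
```

Because the token wins whenever it is present, unset OCD_DTL_LONGTERM_TOKEN before calling an endpoint that requires fresh tokens.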
You can set up the following environment variables:
HTTP_PROXY
HTTPS_PROXY
We use the format accepted by the requests Python library. See its documentation for other possible kinds of proxy to set up.
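For reference, requests represents proxies as a dictionary mapping a URL scheme to a proxy URL, and by default it also reads HTTP_PROXY and HTTPS_PROXY from the environment on its own. The sketch below builds that dictionary by hand to show the shape of the values; `proxies_from_env` is a hypothetical helper and the proxy addresses in the usage note are placeholders.

```python
import os


def proxies_from_env(env=None):
    """Hypothetical helper: build a requests-style proxies dict from the environment."""
    env = os.environ if env is None else env
    proxies = {}
    for scheme in ("http", "https"):
        value = env.get(scheme.upper() + "_PROXY")
        if value:
            # requests expects {"http": "<proxy url>", "https": "<proxy url>"}
            proxies[scheme] = value
    return proxies
```

With HTTP_PROXY=http://10.10.1.10:3128 and HTTPS_PROXY=http://10.10.1.10:1080 exported, the helper returns a dictionary with one entry per scheme, which is the form the `proxies` argument of requests takes.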
To throttle requests, these two environment variables can be used:
OCD_DTL_QUOTA_TIME: defines, in seconds, the time before the request limit is reset. Default is 1 second.
OCD_DTL_REQUESTS_PER_QUOTA_TIME: defines the maximum number of requests to make in the given time. Default is 5 requests.
We recommend lowering the OCD_DTL_REQUESTS_PER_QUOTA_TIME value if you encounter too many 429 errors.
Please don't exceed the quota marked here for each endpoint
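To show how the two throttling variables interact, here is a minimal fixed-window throttle: at most N requests are let through per window of T seconds, and the next caller waits until the window resets. This is a sketch of the general technique under that assumption, not the algorithm datalake-scripts actually uses; `QuotaThrottle` and `throttle_from_env` are hypothetical names.

```python
import os
import time


class QuotaThrottle:
    """Hypothetical fixed-window throttle: `max_requests` per `quota_time` seconds."""

    def __init__(self, quota_time=1.0, max_requests=5,
                 clock=time.monotonic, sleep=time.sleep):
        self.quota_time = quota_time
        self.max_requests = max_requests
        self._clock = clock
        self._sleep = sleep
        self._window_start = clock()
        self._used = 0

    def acquire(self):
        """Block until one more request is allowed in the current window."""
        now = self._clock()
        if now - self._window_start >= self.quota_time:
            # The window has elapsed: start a new one.
            self._window_start, self._used = now, 0
        if self._used >= self.max_requests:
            # Quota exhausted: wait for the rest of the window, then reset.
            self._sleep(self.quota_time - (now - self._window_start))
            self._window_start, self._used = self._clock(), 0
        self._used += 1


def throttle_from_env(env=None):
    """Build a throttle from the two variables, using the documented defaults."""
    env = os.environ if env is None else env
    return QuotaThrottle(
        quota_time=float(env.get("OCD_DTL_QUOTA_TIME", 1)),
        max_requests=int(env.get("OCD_DTL_REQUESTS_PER_QUOTA_TIME", 5)),
    )
```

A caller would invoke `acquire()` before each request. Lowering OCD_DTL_REQUESTS_PER_QUOTA_TIME makes the throttle pause sooner, which is why it helps against 429 errors.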
Only network errors and HTTP response codes 429, 500, 502, 503 and 504 trigger retries. You can control the number of retries with the environment variable OCD_DTL_MAX_RETRIES, which defaults to 3.
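The retry rule can be sketched as a small loop. The status codes and the default of 3 come from the text above; everything else is an assumption made for illustration: `send_with_retries` is a hypothetical helper, `send` is any callable returning a `(status, body)` pair, network errors are modelled with the built-in ConnectionError, one initial attempt is followed by up to `max_retries` retries, and a fixed wait is used between attempts.

```python
import os
import time

# Status codes that trigger a retry, as listed above.
RETRY_STATUS_CODES = {429, 500, 502, 503, 504}


def send_with_retries(send, max_retries=None, wait=1.0, sleep=time.sleep):
    """Hypothetical helper: call `send()` again on network errors or retryable codes."""
    if max_retries is None:
        max_retries = int(os.environ.get("OCD_DTL_MAX_RETRIES", 3))
    max_retries = max(0, max_retries)
    for attempt in range(max_retries + 1):
        last_attempt = attempt == max_retries
        try:
            status, body = send()
        except ConnectionError:
            if last_attempt:
                raise  # out of retries: surface the network error
        else:
            if status not in RETRY_STATUS_CODES or last_attempt:
                return status, body
        sleep(wait)  # fixed pause before the next attempt (assumption)
```

Any other status code, such as 404, is returned immediately without a retry.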
To develop on this repository, please refer to this file