@Author: atediarjo@gmail.com
Internal Tooling for DataBrew for Cloud ETL.
This R package lets users retrieve and store research data at scale in the DataBrew AWS environment.
- To enable cloudbrewr to run on your machine, make sure that you have the AWS CLI installed. Installation Link
- SSO Access: please reach out to the DataBrew team (atediarjo@gmail.com, joebrew@gmail.com)
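As a quick sanity check for the first prerequisite, the sketch below tests whether the `aws` executable is on your PATH. The helper name `require_cmd` is hypothetical and only used for illustration; `aws --version` is the standard AWS CLI version flag.

```shell
# Return 0 if the named command is found on PATH, non-zero otherwise.
require_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if require_cmd aws; then
  # Print the installed AWS CLI version.
  aws --version
else
  echo "AWS CLI not found; install it before using cloudbrewr" >&2
fi
```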
Installation can be done through GitHub:
devtools::install_github('databrew/cloudbrewr')
There are two procedures for AWS session authentication:
- Interactive: RStudio, RStudio Shell
- Non-Interactive: Bash, Terminal (Not the one in RStudio)
When running your script in a non-interactive session (terminal, bash, virtual machines), manually export the access keys shown under "Command line or programmatic access" in the SSO Portal.
In bash:
export AWS_ACCESS_KEY_ID='SOME_ACCESS_KEY'
export AWS_SECRET_ACCESS_KEY='SOME_SECRET_ACCESS_KEY'
export AWS_SESSION_TOKEN='SOME_SESSION_TOKEN'
Your R script:
library(cloudbrewr)
cloudbrewr::aws_login() # no need to define role name if using manual export
If you are running on DataBrew AWS compute resources (EC2, Lambda, ECS), please use IAM role-based access.
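For the manual-export flow, it can help to confirm that all three variables are actually set before launching your R script from a non-interactive shell. This is a minimal sketch: the function name `missing_aws_env` and the script name `my_script.R` are hypothetical, and the check only tests that the variables are non-empty, not that the credentials are valid.

```shell
# Print the name of each AWS credential variable that is unset or empty,
# one per line. Prints nothing when all three are present.
missing_aws_env() {
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN; do
    # Look up the variable whose name is stored in $name (POSIX-portable).
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}

missing=$(missing_aws_env)
if [ -n "$missing" ]; then
  echo "Missing AWS variables:" $missing >&2
else
  # my_script.R is a placeholder for your own script.
  echo "All three variables are set; you can now run: Rscript my_script.R"
fi
```

Session tokens copied from the SSO Portal expire, so a passing check here does not guarantee that the login will succeed.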
Run this in RStudio:
library(cloudbrewr)
cloudbrewr::aws_login(
  role_name = 'SSO_ROLE_NAME_FROM_WEB_PORTAL',
  profile_name = 'AWS_PROFILE_NAME_OF_CHOICE'
)
- Pass in your role name, which depends on the user. For example, bohemia-box-team-s3-role for box access or bohemia-ento-team-s3-role for ento folder access
- After you run this code snippet, you will be redirected to our AWS SSO Portal in your default browser
- Enter the username and password that were created from the email invitation
- Click Allow on the following window prompt
- Voila! Your R session is now connected to AWS
- This command creates a profile in ~/.aws/config, which AWS will use as a reference for future authentication
An example of how you can get an S3 object and read it as a .csv file:
object <- cloudbrewr::aws_s3_get_object(
  bucket = 'bohemia-datalake',
  key = 'bohemia-minicensus/clean_minicensus_main.csv'
)
# read the downloaded file from the path stored in the object metadata
read.csv(object$file_path)
To read more about other features, check out our team vignettes on GitHub Pages.
- Check ~/.aws/config to make sure the same profile name is not already set up for a different AWS account, as this package appends to your config file
- Check your AWS access by using the STS service
To run STS in R:
library(paws)
sts <- paws::sts()
sts$get_caller_identity()
Confirm that your account is the DataBrew AWS Account 354598940118
- For further issues, post to GitHub Issues
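The first check in the list above can be done by eye, but a small script makes repeated profile names easier to spot. The sketch below assumes the usual `[profile NAME]` / `[default]` section headers of the AWS config file; the function names `list_profiles` and `duplicate_profiles` are hypothetical.

```shell
# List the profile names declared in an AWS config file
# (defaults to ~/.aws/config when no path is given).
list_profiles() {
  config_file="${1:-$HOME/.aws/config}"
  [ -f "$config_file" ] || return 0
  # Section headers look like "[profile NAME]" or "[default]".
  sed -n \
    -e 's/^\[profile[[:space:]]\{1,\}\(.*\)][[:space:]]*$/\1/p' \
    -e 's/^\[default][[:space:]]*$/default/p' \
    "$config_file"
}

# Print every profile name that appears more than once.
duplicate_profiles() {
  list_profiles "$1" | sort | uniq -d
}

# Prints nothing when every profile name is unique.
duplicate_profiles
```

If a name is printed, open the config file and compare the `sso_account_id` values of the repeated sections to see whether they point at different AWS accounts.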
Although this is an edge case, the aws configure process can fail in a way that leaves the AWS profile information parsed by this package unwritten to the .aws/config file. In that case, set it up manually by creating both the folder and the file in your home directory.
- In the home directory, create a folder named .aws and create an empty text file named config under it
- Add these SSO settings, adjusted to your own parameters, to the config file
[profile my-dev-profile]
sso_start_url = https://my-sso-portal.awsapps.com/start
sso_region = us-east-1
sso_account_id = 123456789011
sso_role_name = readOnly
region = us-west-2
output = json
- Restart everything (R, Command Prompt) and log in as usual
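On macOS or Linux, the manual steps above could be scripted roughly as follows. This is a sketch, not part of the package: `write_sso_profile` is a hypothetical helper, and every value in the profile is the placeholder from the example above and must be replaced with your own settings.

```shell
# Create the .aws folder if needed and append an SSO profile to its config file.
# The optional argument overrides the target folder (default: ~/.aws).
write_sso_profile() {
  aws_dir="${1:-$HOME/.aws}"
  mkdir -p "$aws_dir"
  # Placeholder values copied from the example; edit before use.
  cat >> "$aws_dir/config" <<'EOF'
[profile my-dev-profile]
sso_start_url = https://my-sso-portal.awsapps.com/start
sso_region = us-east-1
sso_account_id = 123456789011
sso_role_name = readOnly
region = us-west-2
output = json
EOF
}

# Uncomment to write to ~/.aws/config:
# write_sso_profile
```

Because the function appends, running it twice produces two sections with the same profile name, so run it once and then edit the file by hand.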
In-Dev by atediarjo@gmail.com