diff --git a/Chapter_1/01-Chapter1.Rmd b/Chapter_1/01-Chapter1.Rmd
index e1af6d7..62499f5 100644
--- a/Chapter_1/01-Chapter1.Rmd
+++ b/Chapter_1/01-Chapter1.Rmd
@@ -754,7 +754,430 @@ Test Your Knowledge
Try creating your own Github profile, set up a practice repo with subfolders, and a detailed READ.md file paralleling the suggested formatting and content detailed above for your own data analyses!
:::
-# 1.4 Data Wrangling in Excel
+# 1.4 Cloud Computing
+This training module was developed by Alexis Payton, Jessie Chappel, Timothy M. Weigand, and Julia E. Rager.
+
+All input files (script, data, and figures) can be downloaded from the [UNC-SRP TAME2 GitHub website](https://github.com/UNCSRP/TAME2). Additional training materials and access to accounts with $500 credit can be found through the [NIH Cloud Lab](https://cloud.nih.gov/resources/cloudlab/).
+
+## Introduction to Training Module
+
+Traditionally, computing operations (including data analyses and visualizations) have been performed using on-premises infrastructure on local computers. This approach requires organizations to continuously update software, manage data privacy and storage, and pay for hardware and platforms regardless of fluctuations in demand. Gaining in popularity over recent years, cloud computing provides a flexible and scalable solution to many of these challenges.
+
+At the core of cloud computing are two fundamental components: **storage** and **compute**. Cloud storage allows organizations to save and retrieve data remotely, eliminating the need for on-premises hardware. Major cloud providers offer scalable object storage solutions (AWS S3, Google Cloud Storage, and Azure Blob Storage) which provide durable and secure storage for datasets, models, and results of any size. For compute, cloud platforms provide virtual machine services that allow users to run analyses on demand, provisioning servers with customizable CPU, memory, and storage configurations and paying only for the resources they use. The most widely used of these services include AWS EC2 (Elastic Compute Cloud), Google Compute Engine, and Azure Virtual Machines. For data science and machine learning workflows specifically, managed cloud environments such as AWS SageMaker, Google Vertex AI, and Azure Machine Learning build on these capabilities by offering fully integrated platforms for building, training, and deploying machine learning models, removing much of the overhead of configuring and managing underlying infrastructure.
+
+In this module, we'll explore the following concepts:
+
++ Deployment Models
++ Creating a Cloud Account
++ Storing Data in the Cloud
++ Computing in the Cloud
+
+## Introduction to Cloud Computing
+
+**Cloud computing** refers to the on-demand delivery of computing services over the internet, often with pay-as-you-go pricing. Common examples of cloud-based services that you likely use in your daily life include databases (e.g., PubMed), web servers (e.g., Apache), storage (e.g., iCloud), email (e.g., Gmail), software (e.g., Microsoft 365), analytics (e.g., Ingenuity Pathway Analysis), and artificial intelligence (e.g., ChatGPT).
+
+Some key benefits of cloud computing are:
+
++ It **reduces upfront cost** by shifting the burden of building, maintaining, and powering servers to a third-party cloud provider (at least for public and hybrid clouds).
++ It **increases reliability** in the event of a disaster or system failure. Cloud providers typically maintain multiple redundant servers so that if one becomes compromised, others can serve as backups. Providers also employ staff who are well-versed in the ever-changing landscape of cloud security and infrastructure.
++ It **improves scalability** by allowing customers to use (and pay for) only the infrastructure their project requires. Many providers let you configure the number of virtual machines, computing power, and memory to match your needs.
++ It **enhances collaboration** by enabling teams to access and work on shared resources virtually, regardless of location.
+
+Traditionally, these resources were stored **on premises**, meaning infrastructure and data were maintained locally, either on a personal computer or entirely within an organization's own facilities, rather than accessed through the cloud. The following section describes several **deployment models** that outline how computing resources can be divided between on-premises and cloud environments.
+
+
+## Deployment Models
+Deployment models are paradigms that dictate the accessibility, management, and ownership of computing infrastructure. Three commonly used models are described below:
+
+1. A **Public Cloud** is when a cloud provider makes computing resources available over the public internet. The public cloud provider is responsible for owning, managing, and maintaining the infrastructure necessary to run these services. Some public clouds are completely free, while others operate on a pay-as-you-go model, offering flexibility and scalability based on the needs of the project. Well-known public cloud providers include Amazon Web Services (AWS), Microsoft Azure, and Google Cloud.
+
+2. A **Private Cloud** is when both the software and hardware are dedicated to a single customer or organization. The infrastructure can either be housed on-site or managed through a third-party cloud provider, but access remains restricted to that organization.
+
+3. A **Hybrid Cloud** combines features of both public and private clouds. This model is well-suited for organizations such as federal agencies that require the protection of sensitive data within a private cloud, while still being able to use public cloud resources to share non-sensitive data with the public or partner organizations.
+
+```{r 01-Chapter1-1-bis, echo=FALSE, out.width = "55%", fig.align='center'}
+knitr::include_graphics("Chapter_1/Module1_4_Input/deployment_models.png")
+```
+
+We've covered the basics of cloud computing and various applications that provide this service; the video [Cloud Computing in 6 Minutes](https://www.youtube.com/watch?v=M988_fsOSWo) provides helpful visuals that further illustrate these topics. From here, we'll focus on public clouds and applications that support them.
+
+
+
+## Public Cloud Providers
+
+Amazon Web Services (AWS) leads the cloud computing market, offers the widest range of services, and has strong interoperability with third-party organizations. Other popular public cloud providers include Microsoft Azure and Google Cloud. The National Institutes of Health (NIH) offers a training program to help researchers integrate cloud computing technologies into their work through the [NIH Cloud Lab](https://cloud.nih.gov/resources/cloudlab/), which provides guided resources for AWS, Microsoft Azure, and Google Cloud.
+
+
+Another public cloud option worth noting is [Binder](https://jupyter.org/binder), which creates a coding environment directly from a public GitHub repository. This allows users to interact with Jupyter notebooks and datasets housed in that repository without needing to install any software locally. Binder is particularly useful for teaching, as students and researchers can launch a fully configured environment — with data and packages already loaded — and avoid the operating system conflicts that often arise when setting up software on local computers.
+
+
+```{r 01-Chapter1-2-bis, echo=FALSE, out.width = "65%", fig.align='center'}
+knitr::include_graphics("Chapter_1/Module1_4_Input/binder.png")
+```
+
+Choosing the right cloud provider depends largely on the needs of your project and organization. For a detailed comparison of features, pricing, and use cases, see [AWS vs Azure vs Google: Cloud Services Comparison](https://www.varonis.com/blog/aws-vs-azure-vs-google). For the purposes of this module, we will focus on AWS.
+
+
+
+
+## Creating a Cloud Account
+
+For this tutorial, we will use AWS as our cloud provider. To get started with AWS, head to [aws.amazon.com](https://aws.amazon.com) and click **Create an AWS Account**. You will be prompted to enter your email address, create a password, and provide billing information. AWS offers a [Free Tier](https://aws.amazon.com/free/) that gives new users access to a range of services at no cost for the first 12 months, which is a good way to explore the platform before committing to paid resources. Once your account is created and verified, you will have access to the AWS Management Console, where you can launch and manage cloud services. Alternatively, the NIH STRIDES Initiative provides access to [NIH Cloud Lab](https://cloud.nih.gov/resources/cloudlab/) accounts that come with $500 in credits to explore cloud computing.
+
+## Identity and Access Management (IAM)
+
+Identity and Access Management (IAM) is the AWS service that controls who and what has
+access to your AWS resources. For a full overview, see the
+[AWS IAM Getting Started Guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/getting-started.html).
+
+As a personal AWS account holder, you have full access to IAM. The two things you will
+interact with most are **Roles** and **Policies**:
+
+- **Roles** are assigned to AWS services (such as a compute instance through EC2) to grant them permission
+ to interact with other AWS services.
+- **Policies** define what actions are allowed. You attach policies to roles to grant
+ specific permissions (for example, allowing an EC2 compute instance to read and write to storage via S3).
+
+For this tutorial, you will need to create an IAM role for your EC2 instance so that it
+can access S3. To do this, navigate to **IAM → Roles → Create Role**. Select
+**AWS Service** as the trusted entity and choose **EC2** as the use case. On the next
+screen, search for and attach the **AmazonS3FullAccess** policy, give your role a name
+(such as `ec2-s3-access`), and click **Create Role**. You can then attach this role to
+any EC2 instance at launch, or after the fact via
+**EC2 → Actions → Security → Modify IAM Role**.
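+
+If you prefer the command line, the same role can be created with the AWS CLI. The sketch below assumes the role name `ec2-s3-access` used above and that your account has permission to manage IAM; it also creates the instance profile that the console normally creates for you behind the scenes:
+```bash
+# Write a trust policy allowing EC2 to assume the role
+cat > trust-policy.json <<'EOF'
+{
+  "Version": "2012-10-17",
+  "Statement": [{
+    "Effect": "Allow",
+    "Principal": {"Service": "ec2.amazonaws.com"},
+    "Action": "sts:AssumeRole"
+  }]
+}
+EOF
+
+# Create the role and attach the S3 access policy
+aws iam create-role --role-name ec2-s3-access \
+    --assume-role-policy-document file://trust-policy.json
+aws iam attach-role-policy --role-name ec2-s3-access \
+    --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
+
+# Create an instance profile and add the role to it
+# (the console does this automatically; the CLI requires it explicitly)
+aws iam create-instance-profile --instance-profile-name ec2-s3-access
+aws iam add-role-to-instance-profile --instance-profile-name ec2-s3-access \
+    --role-name ec2-s3-access
+```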
+
+For more information see:
+
+- [AWS IAM Documentation](https://docs.aws.amazon.com/iam/)
+- [IAM Roles for EC2](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html)
+- [AWS Managed Policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_managed-vs-inline.html)
+
+## Storing Data in the Cloud
+
+Because the computing will be done in the cloud, the first step is to upload all necessary data to the cloud. When working in the cloud, you rent access to a remote computer (known as an instance) that runs on servers maintained by the cloud provider. Much like a personal computer, this instance has its own processing power, memory, and storage, but it exists virtually and can be started, stopped, and scaled up or down as needed. There are two types of storage associated with cloud instances: permanent storage and temporary storage. Permanent storage, via **Amazon Simple Storage Service (S3)**, keeps your data safe and accessible regardless of whether your instance is running, making it ideal for long-term storage of datasets, results, and backups. Temporary storage, such as **Amazon Elastic Block Store (EBS)** or local instance storage, is only available while your instance is active and is used during computation to hold intermediate files and data that require fast read and write access. Note that EBS is not set up independently: EBS volumes are virtual hard drives attached directly to compute instances, meaning they are configured as part of the EC2 instance setup process rather than through S3 or any separate service. They will be covered in the next section.
+
+### 1. Creating a S3 Bucket
+
+Before uploading any data, you will need to create an S3 bucket. A bucket is simply a container in the cloud where your files and datasets will be stored, similar to a folder on your personal computer. To get started, log in to your AWS account and navigate to the S3 service, which can be found by searching "S3" in the AWS search bar at the top of the screen.
+
+```{r 01-Chapter1-3-bis, echo=FALSE, out.width = "100%",fig.align='center'}
+knitr::include_graphics("Chapter_1/Module1_4_Input/s3_bucket.png")
+```
+
+Once on the S3 dashboard, click the **Create bucket** button to begin. You will be presented with several configuration options. For most users, the following settings are recommended:
+
+**General Configuration** Select the region closest to your location as the AWS Region. Under Bucket Type, select **General purpose**, which is recommended for most use cases. For the **Bucket namespace**, select **Account Regional namespace (recommended)**, which ensures your bucket name is unique to your account. Finally, enter a **Bucket name prefix**, keeping it short and descriptive, such as "myproject-data".
+
+**Object Ownership** Leave this set to **ACLs disabled (recommended)**. This means your account will own all objects stored in the bucket and access is managed through policies rather than individual file permissions.
+
+**Block Public Access** Leave **Block all public access** turned on. This ensures your data remains private and is the recommended setting for research and analysis purposes.
+
+**Bucket Versioning** This can be left as **Disabled** for most use cases. Enabling versioning allows AWS to keep a history of changes to your files, which can be useful but will increase storage costs over time.
+
+**Default Encryption** Leave this set to the default **Server-side encryption with Amazon S3 managed keys (SSE-S3)**. This ensures your data is automatically encrypted when stored, adding a layer of security without any additional configuration required.
+
+Once you have reviewed these settings, scroll to the bottom of the page and click **Create bucket**. Your bucket will now appear in your S3 dashboard and is ready to receive data.
+
+```{r 01-Chapter1-4-bis, echo=FALSE, out.width = "100%",fig.align='center'}
+knitr::include_graphics("Chapter_1/Module1_4_Input/aws_create_bucket.png")
+```
+
+
+
+To find your bucket at any time, navigate back to the S3 service and select **Buckets** from the left-hand menu, where all of your buckets will be listed by name.
+
+```{r 01-Chapter1-5-bis, echo=FALSE, out.width = "100%",fig.align='center'}
+knitr::include_graphics("Chapter_1/Module1_4_Input/aws_bucket.png")
+```
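+
+If you prefer the command line, the same bucket can be created and listed with the AWS CLI (the bucket name and region below are examples; CLI setup is covered later in this module):
+```bash
+# Create a bucket (names must be globally unique) in a chosen region
+aws s3 mb s3://myproject-data --region us-east-1
+
+# List all buckets in your account
+aws s3 ls
+```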
+
+
+### 2. Uploading to an S3 Bucket
+
+Data can be uploaded to or downloaded from S3 through the AWS Management Console or programmatically via the AWS Command Line Interface (CLI). Importantly, data stored in S3 persists independently of any virtual machine, meaning that even if you stop or terminate your compute instance, your data will remain safely stored in your account.
+
+As a cost reference, AWS S3 storage is priced at approximately $0.023 per GB per month for standard storage (as of March 13, 2026). For example, storing a 10 GB dataset for one month would cost roughly $0.23. Data transfer and request fees may also apply, so it is worth reviewing the [AWS S3 pricing page](https://aws.amazon.com/s3/pricing/) before uploading large datasets.
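+
+As a quick sanity check for storage budgeting, the standard-storage charge scales linearly with size and duration. A minimal sketch, using the rate quoted above (which should be re-checked against the pricing page before relying on it):
+```python
+# Rate quoted above for S3 Standard storage; verify current pricing before use
+RATE_PER_GB_MONTH = 0.023  # USD per GB per month
+
+def s3_storage_cost(size_gb, months=1, rate=RATE_PER_GB_MONTH):
+    """Approximate storage cost in USD (excludes request and transfer fees)."""
+    return size_gb * months * rate
+
+print(round(s3_storage_cost(10), 2))  # 10 GB for one month, roughly $0.23
+```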
+
+**Uploading via the AWS Management Console**
+
+The AWS Management Console provides a straightforward point-and-click interface for uploading files and is recommended for users who are new to cloud storage or are uploading a small number of files.
+
+1. Log in to your AWS account and navigate to the S3 service by searching "S3" in the AWS search bar.
+2. Select **Buckets** from the left-hand menu and click on the name of the bucket you created in the previous step.
+3. Once inside your bucket, click the **Upload** button.
+4. Click **Add files** to select individual files from your computer, or **Add folder** to upload an entire folder and its contents. You can also drag and drop files directly into the upload window.
+5. Once your files have been added, a list of the selected items will appear on the screen. Review this list to confirm the correct files have been selected.
+6. Leave all other settings as their defaults unless you have specific access or encryption requirements.
+7. Click **Upload** at the bottom of the page. A progress bar will appear showing the status of your upload. Once complete, a confirmation message will be displayed and your files will be visible inside the bucket.
+
+
+**Uploading via the Command Line Interface (CLI)**
+
+For users who are comfortable working with a command line terminal, the AWS CLI provides a faster and more flexible method for uploading data, particularly for large datasets or when automating workflows. There are two ways to use the AWS CLI: through **AWS CloudShell**, a browser-based terminal built into the AWS Management Console that requires no setup, or through your **local terminal**, which requires installation and credential configuration.
+
+**Option 1: AWS CloudShell (Recommended for Organizational Accounts)**
+
+If your AWS account is managed by an organization, CloudShell is the simplest option as it comes pre-configured with your credentials, requiring no installation or setup.
+
+To open CloudShell, log in to the AWS Management Console and click the **CloudShell icon** in the toolbar at the top of the screen (it looks like a small terminal window). A terminal will open at the bottom of your screen after a few moments.
+
+Before uploading files from your local computer, you must first transfer them into the CloudShell environment. To do this, click the **Actions** menu in the top right corner of the CloudShell panel and select **Upload file**. Browse to the file on your computer and confirm the upload. The file will appear in the home directory of your CloudShell session and can then be moved to S3.
+
+To upload a single file, use the following command, replacing the file name and bucket name with your own:
+```bash
+aws s3 cp filename.csv s3://your-bucket-name/
+```
+
+To upload an entire folder and all of its contents, use the following command:
+```bash
+aws s3 cp folder-name/ s3://your-bucket-name/folder-name/ --recursive
+```
+
+To verify that your files have been uploaded successfully, you can list the contents of your bucket with the following command:
+```bash
+aws s3 ls s3://your-bucket-name/
+```
+
+**Option 2: Local Terminal**
+
+If you prefer to upload files directly from your local computer without going through CloudShell, you can install the AWS CLI and configure it with your credentials. Note that if your account is managed by an organization, you may need to contact your IT or cloud administrator to obtain credentials before proceeding.
+
+First, install the AWS CLI by following the instructions on the [AWS CLI documentation page](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) for your operating system. Once installed, open your terminal and run the following command to configure your credentials:
+```bash
+aws configure
+```
+
+You will be prompted to enter four pieces of information:
+```
+AWS Access Key ID [None]: YOUR_ACCESS_KEY
+AWS Secret Access Key [None]: YOUR_SECRET_KEY
+Default region name [None]: YOUR_REGION
+Default output format [None]: json
+```
+
+Your Access Key ID and Secret Access Key can be generated by logging in to the AWS Management Console, clicking your account name in the top right corner, and selecting **Security Credentials**. Scroll down to the **Access keys** section and click **Create access key**. Copy both values and enter them when prompted by `aws configure`.
+
+> **Important:** This is the only time AWS will display your Secret Access Key. If you lose it, you will need to create a new one. Never share your credentials or commit them to a code repository.
+
+Once configured, your credentials are saved locally and will be used automatically for all future AWS CLI commands. To verify the configuration is working correctly, run the following command, which will list all S3 buckets in your account:
+```bash
+aws s3 ls
+```
+
+You can then upload files using the same commands as in Option 1, replacing the file name and bucket name with your own:
+```bash
+aws s3 cp /path/to/your/file.csv s3://your-bucket-name/
+```
+
+### 3. Downloading from an S3 Bucket
+
+Data can be downloaded from S3 back to your local computer or to a cloud instance at any time. To download a file via the Console, navigate to your bucket, check the box next to the file you wish to download, and select **Download** from the **Actions** menu.
+
+To download a file to your CloudShell environment, use the following command, then click the **Actions** menu in the CloudShell panel and select **Download file** to save it to your local computer:
+```bash
+aws s3 cp s3://your-bucket-name/file.csv ./
+```
+
+To download a file directly to your local machine using the local CLI, use the following command:
+```bash
+aws s3 cp s3://your-bucket-name/file.csv /path/to/local/destination/
+```
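+
+All of the commands above address objects using `s3://bucket/key` URIs. The following sketch (a hypothetical helper for illustration, not part of the AWS tooling) shows how such a URI decomposes into a bucket name and object key:
+```python
+def parse_s3_uri(uri):
+    """Split an s3://bucket/key URI into (bucket, key)."""
+    if not uri.startswith("s3://"):
+        raise ValueError(f"not an S3 URI: {uri}")
+    bucket, _, key = uri[len("s3://"):].partition("/")
+    return bucket, key
+
+print(parse_s3_uri("s3://your-bucket-name/folder-name/file.csv"))
+# → ('your-bucket-name', 'folder-name/file.csv')
+```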
+
+### 4. Additional Information for S3
+Additional information can be found in the [Getting started with Amazon S3 guide](https://docs.aws.amazon.com/AmazonS3/latest/userguide/GetStartedWithS3.html).
+
+## Computing in the Cloud
+
+Now that your data is stored in the cloud, the next step is to select and launch a virtual machine using **Elastic Compute Cloud (EC2)**. EC2 allows you to rent virtual servers, referred to as instances, with configurable amounts of CPU, memory, storage, and networking capacity. You can choose from a wide range of instance types depending on the computational demands of your analysis. For example, a general-purpose instance (e.g., `t3.medium`) is appropriate for lightweight tasks, while a compute-optimized instance (e.g., `c5.4xlarge`) is better suited for parallelized or compute-intensive analyses. AWS provides a [guided walkthrough](https://us-east-1.console.aws.amazon.com/ec2/home?region=us-east-1#GetStarted) that covers the available options.
+
+### 1. Creating an EC2 Instance
+
+**Step 1: Navigate to EC2**
+Log in to the AWS Management Console and search for "EC2" in the search bar at the top of the screen. Click on **EC2** in the results to open the EC2 dashboard. From here, click the **Launch instance** button to begin the setup process.
+
+**Step 2: Name Your Instance**
+Enter a descriptive name for your instance in the **Name** field at the top of the page, such as "my-analysis-instance". This helps you identify it later if you have multiple instances running.
+
+**Step 3: Choose an Amazon Machine Image (AMI)**
+An Amazon Machine Image (AMI) is the operating system and software that will be pre-installed on your instance. For general data analysis, select **Amazon Linux**, which is free tier eligible and comes with many common tools pre-installed. A range of other operating systems is also available, including Windows, Ubuntu, and macOS; these can be browsed in the AMI selection list.
+
+**Step 4: Choose an Instance Type**
+The instance type determines how much processing power and memory your virtual machine will have. For light data analysis tasks, **t2.micro** or **t3.micro** are free tier eligible and sufficient for getting started. For more demanding analyses involving larger datasets, consider a **t3.medium** or **t3.large**, which provide more memory and processing power at a modest hourly cost. A full list of instance types and their pricing can be found on the [AWS EC2 pricing page](https://aws.amazon.com/ec2/pricing/).
+
+**Step 5: Configure a Key Pair**
+A key pair is used to securely connect to your instance from your terminal. Click **Create new key pair**, enter a name for your key pair, leave the settings as their defaults, and click **Create key pair**. A file with a **.pem** extension will automatically download to your computer. Store this file somewhere safe, as it cannot be downloaded again and will be needed every time you connect to your instance.
+
+**Step 6: Configure Storage**
+Under the **Configure storage** section, you will see that a default EBS volume has already been attached to your instance. The default size is 8 GB, which is sufficient for most small analyses. If you anticipate working with larger datasets directly on the instance, increase this value by typing a new number into the size field.
+
+**Step 7: Assign an IAM Role**
+Before launching, it is important to assign an IAM role to your instance. Without one, your instance will have no permissions to interact with any other AWS service — meaning you will not be able to read from or write to S3, or use any other AWS service from within the instance. Note that an IAM role has nothing to do with connecting to your instance via SSH; you can always SSH in with your .pem key regardless of whether a role is attached. The IAM role purely controls what AWS services the instance itself can access once it is running.
+
+To assign a role, expand the **Advanced details** section and find the **IAM instance profile** dropdown. Select the role you created earlier (such as `ec2-s3-access`). If you have not yet created a role, see the [IAM section](#identity-and-access-management-iam) of this guide before proceeding.
+
+**Step 8: Launch the Instance**
+Leave all remaining settings as their defaults and click **Launch instance**. AWS will take a moment to start your instance. You can monitor its status from the EC2 dashboard — once the **Instance state** shows **Running** and the **Status check** shows **2/2 checks passed**, your instance is ready to connect to.
+
+```{r 01-Chapter1-6-bis, echo=FALSE, out.width = "100%",fig.align='center'}
+knitr::include_graphics("Chapter_1/Module1_4_Input/launch_ec2.png")
+```
+
+
+### 2. Connecting to an EC2 Instance
+
+You can monitor your instance's status by navigating back to the EC2 dashboard and selecting
+**Instances** from the left-hand menu. Once the **Instance state** column shows **Running**,
+your instance is ready to use.
+
+There are several ways to connect to an EC2 instance. The right method depends on your
+account type and setup:
+
+---
+
+**Option 1: SSH**
+
+SSH (Secure Shell) allows you to connect to your instance from your local terminal and
+control it as if you were typing directly on it. This is the most common method for personal
+AWS accounts and requires your **.pem** key file.
+
+Open your terminal and navigate to the folder where your **.pem** key file was saved. First,
+set the correct permissions on the key file:
+```bash
+chmod 400 your-key-name.pem
+```
+
+Then connect using the following command, replacing the IP address with your instance's
+**Public IPv4 address**, which can be found by selecting your instance in the EC2 dashboard:
+```bash
+ssh -i your-key-name.pem ec2-user@your-public-ip-address
+```
+
+> **Note:** If you are using an Ubuntu AMI, replace `ec2-user` with `ubuntu`. For other
+> AMIs, the default username may vary — check the AMI documentation if you are unsure.
+
+---
+
+**Option 2: EC2 Instance Connect (Browser-Based)**
+
+EC2 Instance Connect allows you to connect directly from your browser without needing a
+.pem key file. This is a convenient option for quick access or if you have misplaced your
+key file.
+
+To use it, select your instance in the EC2 dashboard, click the **Connect** button at the
+top of the page, select the **EC2 Instance Connect** tab, and click **Connect**. A terminal
+will open in your browser.
+
+> **Note:** EC2 Instance Connect requires your instance to have a public IP address and the
+> correct security group rules allowing inbound SSH traffic on port 22.
+
+---
+
+**Option 3: AWS Systems Manager Session Manager**
+
+Session Manager allows you to connect to your instance through the AWS Management Console
+without needing SSH, a public IP address, or an open port. This is the recommended method
+for **federated or organizational accounts** (such as university or institutional AWS
+accounts) where SSH access may be restricted by VPN or firewall rules.
+
+To connect via Session Manager:
+
+1. Navigate to **AWS Systems Manager** by searching for it in the AWS search bar.
+2. Select **Session Manager** from the left-hand menu.
+3. Click **Start session**.
+4. Select your instance from the list and click **Start session**. A terminal will open
+ in your browser.
+
+Alternatively, you can access Session Manager directly from the EC2 dashboard by selecting
+your instance, clicking the **Connect** button, and choosing the **Session Manager** tab.
+
+> **Note:** Session Manager requires your EC2 instance to have an IAM role with the
+> **AmazonSSMManagedInstanceCore** policy attached, and the SSM agent must be running on
+> the instance. Amazon Linux 2 and Amazon Linux 2023 AMIs come with the SSM agent
+> pre-installed.
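+
+Sessions can also be started from a local terminal with the AWS CLI, assuming the Session Manager plugin is installed and your credentials are configured (the instance ID below is a placeholder):
+```bash
+aws ssm start-session --target i-0123456789abcdef0
+```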
+
+---
+
+Once connected by any of the above methods, you will see a command prompt indicating that you are now working inside your cloud instance and can begin running your analysis.
+
+### 3. Running in an EC2 Instance
+
+TODO: Installing packages for both R and Python, Access Your S3 Data, Run Analysis, Save Your Results Back to S3
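+
+In the meantime, a minimal sketch of this workflow on an Amazon Linux instance is shown below (bucket, file, and script names are placeholders; the Python package is an example, and R installation varies by AMI):
+```bash
+# Install analysis software (Amazon Linux; use apt on Ubuntu AMIs)
+sudo yum install -y python3 python3-pip
+pip3 install pandas   # example package; install what your analysis needs
+
+# Pull input data down from S3 (the attached IAM role supplies credentials)
+aws s3 cp s3://your-bucket-name/input.csv .
+
+# Run the analysis script
+python3 analysis.py input.csv results.csv
+
+# Push results back to S3 before stopping the instance
+aws s3 cp results.csv s3://your-bucket-name/results/
+```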
+
+### 4. Creating a SageMaker Notebook
+
+**Amazon SageMaker** provides a fully managed environment for data analysis and machine learning, removing the need to manually configure and connect to a virtual machine. Rather than working in a terminal, SageMaker allows you to work in a **Jupyter Notebook**, a familiar browser-based interface where you can write and run code interactively alongside text, figures, and outputs. This makes SageMaker particularly well suited for exploratory data analysis and iterative workflows; note, however, that the environment is restricted to Python.
+
+**Step 1: Navigate to SageMaker**
+
+Log in to the AWS Management Console and search for "SageMaker" in the search bar. Click on **Amazon SageMaker** in the results to open the SageMaker dashboard. Links to tutorials and documentation are included on this landing page.
+
+**Step 2: Open SageMaker Studio**
+
+From the landing page, click **Open** to launch SageMaker Studio Classic in a new browser tab.
+
+```{r 01-Chapter1-7-bis, echo=FALSE, out.width = "100%",fig.align='center'}
+knitr::include_graphics("Chapter_1/Module1_4_Input/sagemaker_studio.png")
+```
+
+**Step 3: Create a New Notebook**
+
+Once inside SageMaker Studio, click **Notebooks** in the left-hand menu, which will open a new page. Then select **Create notebook** and you will be redirected to a new Jupyter Notebook.
+
+```{r 01-Chapter1-8-bis, echo=FALSE, out.width = "100%",fig.align='center'}
+knitr::include_graphics("Chapter_1/Module1_4_Input/sagemaker_notebook.png")
+```
+
+**Step 4: Managing Your SageMaker Notebook**
+
+The specific properties of the notebook can be managed via the left-hand menu, represented by a series of icons. This menu provides access to several key configuration areas:
+
+- **Files & Data** View and manage the files and datasets shared with the notebook, including the ability to upload additional files from your local machine.
+- **Compute Instance** View and change the EC2 instance currently powering your notebook environment.
+- **Python Packages** Browse the Python packages currently installed in your environment, and search for and install additional packages as needed.
+
+The currently selected instance type is also displayed at the top right of the notebook interface. Clicking on it allows you to switch instances based on the demands of your analysis. Choosing the right instance is important both for performance and cost management. A full list of SageMaker instance pricing can be found on the [AWS SageMaker pricing page](https://aws.amazon.com/sagemaker/pricing/).
+
+### 5. Running a SageMaker Notebook Analysis
+
+TODO: Access Your S3 Data, Run Analysis, Save Your Results Back to S3
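+
+In the meantime, a minimal sketch of the notebook workflow is shown below (bucket and file names are placeholders; it assumes `pandas` and `boto3` are available, which SageMaker Python environments typically include, and the summary step stands in for a real analysis):
+```python
+import boto3
+import pandas as pd
+
+bucket = "your-bucket-name"  # placeholder bucket name
+s3 = boto3.client("s3")
+
+# 1. Access your S3 data
+s3.download_file(bucket, "input.csv", "input.csv")
+df = pd.read_csv("input.csv")
+
+# 2. Run the analysis (a summary table as a stand-in for a real analysis)
+summary = df.describe()
+summary.to_csv("summary.csv")
+
+# 3. Save your results back to S3
+s3.upload_file("summary.csv", bucket, "results/summary.csv")
+```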
+
+
+
+
+## Best Practices
+
+When working in AWS, keeping costs under control requires deliberate habits. A few key recommendations:
+
++ **Stop instances when not in use.** EC2 instances continue to accrue charges as long as they are running, even if no computation is actively taking place. Always stop or terminate your instance when you are finished working.
++ **Be strategic about data transfers.** AWS charges for data transferred out of the cloud, and large transfers are slow in either direction. Rather than uploading and downloading files repeatedly, prepare all your input data locally, upload everything at once, run your full analysis, and then download your results in a single batch.
++ **Monitor your usage.** Use the AWS Cost Explorer and set up billing alerts to avoid unexpected charges. It is easy to lose track of running instances or stored data that is no longer needed.
++ **Clean up unused resources.** Delete S3 objects, snapshots, and other resources you no longer need to avoid ongoing storage fees.
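To make the storage-cleanup point concrete, here is a hypothetical helper that estimates what forgotten S3 storage costs per month. The per-GB rate is an illustrative placeholder, not a quoted AWS price; actual S3 pricing varies by region and storage class.

```python
# Illustrative only: estimate the monthly cost of S3 buckets you may have
# forgotten about. The per-GB rate is a placeholder, not an actual AWS price.
STORAGE_RATE_PER_GB_MONTH = 0.023  # placeholder $/GB-month

def monthly_storage_cost(bucket_sizes_gb):
    """Return the estimated monthly cost (USD) for each bucket."""
    return {
        name: round(gb * STORAGE_RATE_PER_GB_MONTH, 2)
        for name, gb in bucket_sizes_gb.items()
    }

# Hypothetical bucket names and sizes in GB:
costs = monthly_storage_cost({"tame-inputs": 120.0, "old-results": 800.0})
print(costs)
```

Even at a small per-GB rate, hundreds of gigabytes of stale results accumulate a meaningful recurring charge, which is why periodic cleanup belongs in your routine.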
+
+
+
+## Concluding Remarks
+
+Cloud computing offers researchers a powerful and flexible alternative to traditional on-premises computing. By leveraging services like AWS S3 for storage and EC2 for computation, analyses that would be impractical on a local machine — due to data size, memory requirements, or processing time — become manageable and reproducible. As cloud platforms continue to evolve and become more accessible, familiarity with these tools is increasingly valuable for researchers across disciplines. We encourage you to explore the [NIH Cloud Lab](https://cloud.nih.gov/resources/cloudlab/) and the broader AWS documentation to continue building these skills beyond this module.
+
+
+
+### Additional Resources
+
++ [Cloud Computing](https://www.ibm.com/think/topics/cloud-computing)
++ [On Premises vs Cloud](https://www.cleo.com/blog/knowledge-base-on-premise-vs-cloud)
++ [Domain-Specific Tutorials through NIH](https://github.com/STRIDES/NIHCloudLabAWS/tree/main/?tab=readme-ov-file#bio)
+
+
+
+
+
+:::tyk
+Try creating your own AWS account, uploading practice data to an S3 bucket, and launching a SageMaker notebook on an instance type suited to your analysis. Remember to stop the instance when you're finished!
+:::
+
+
+# 1.5 Data Wrangling in Excel
This training module was developed by Alexis Payton, Elise Hickman, and Julia E. Rager.
@@ -787,7 +1210,7 @@ Open Microsoft Excel and prior to **ANY** edits, click “File” --> “Save As
Let's first view what the dataset currently looks like:
```{r 01-Chapter1-28, echo=FALSE, fig.width=4, fig.height=5, fig.align='center'}
-knitr::include_graphics("Chapter_1/Module1_4_Input/Module1_4_Image1.png")
+knitr::include_graphics("Chapter_1/Module1_5_Input/Module1_5_Image1.png")
```
@@ -820,7 +1243,7 @@ Before we can begin organizing the data, we need to remove the entirely blank ro
After removing the blank rows, the file should look like the screenshot below.
```{r 01-Chapter1-29, echo=FALSE, fig.width=4, fig.height=5, fig.align='center'}
-knitr::include_graphics("Chapter_1/Module1_4_Input/Module1_4_Image2.png")
+knitr::include_graphics("Chapter_1/Module1_5_Input/Module1_5_Image2.png")
```
@@ -845,7 +1268,7 @@ Metadata explains what each column represents in the dataset. Metadata is now a
+ Within the metadata tab, create three columns: the first, "Column Identifier", contains each of the column names found in the data tab; the second, "Code", contains the individual variable/abbreviation for each column identifier; the third, "Description", contains additional information and definitions for abbreviations.
```{r 01-Chapter1-30, echo=FALSE, fig.width=4, fig.height=5, fig.align='center'}
-knitr::include_graphics("Chapter_1/Module1_4_Input/Module1_4_Image3.png")
+knitr::include_graphics("Chapter_1/Module1_5_Input/Module1_5_Image3.png")
```
@@ -869,7 +1292,7 @@ For this dataset, the following variables were edited:
Once the categorical data have been abbreviated, add those abbreviations to the metadata and describe what they symbolize.
```{r 01-Chapter1-31, echo=FALSE, fig.width=4, fig.height=5, fig.align='center'}
-knitr::include_graphics("Chapter_1/Module1_4_Input/Module1_4_Image4.png")
+knitr::include_graphics("Chapter_1/Module1_5_Input/Module1_5_Image4.png")
```
@@ -889,7 +1312,7 @@ Analysis-specific subjects are created to give an ordinal subject number to each
When data are imported into R, spaces in column names are converted to periods, so it's common practice to replace spaces with underscores when doing data analysis in R. Avoid using dashes in column names or anywhere else in the dataset.
```{r 01-Chapter1-32, echo=FALSE, fig.width=4, fig.height=5, fig.align='center'}
-knitr::include_graphics("Chapter_1/Module1_4_Input/Module1_4_Image5.png")
+knitr::include_graphics("Chapter_1/Module1_5_Input/Module1_5_Image5.png")
```
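This module does the renaming by hand in Excel, but the same cleanup can also be scripted. As one sketch, in Python with pandas (the column names below are hypothetical stand-ins), spaces and dashes can be swapped for underscores in a single line:

```python
import pandas as pd

# Hypothetical column names as they might arrive from an Excel export:
df = pd.DataFrame(columns=["Subject No", "Childhood Score", "Group-Name"])

# Replace spaces and dashes with underscores so names are analysis-friendly:
df.columns = df.columns.str.replace(r"[ \-]", "_", regex=True)
print(list(df.columns))
```

Standardizing names programmatically is handy when a dataset has too many columns to rename comfortably by hand.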
@@ -906,7 +1329,7 @@ These data will likely be shared with collaborators, uploaded onto data depositi
For example, here, it is nice to bold column identifiers and center the data, as shown below:
```{r 01-Chapter1-33, echo=FALSE, fig.width=4, fig.height=5, fig.align='center'}
-knitr::include_graphics("Chapter_1/Module1_4_Input/Module1_4_Image6.png")
+knitr::include_graphics("Chapter_1/Module1_5_Input/Module1_5_Image6.png")
```
@@ -921,7 +1344,7 @@ The subject identifier column labeled, “Group_Subject_No”, combines the subj
+ Label the second column "Group_Subject_No" and delete the first column.
```{r 01-Chapter1-34, echo=FALSE, fig.width=4, fig.height=5, fig.align='center'}
-knitr::include_graphics("Chapter_1/Module1_4_Input/Module1_4_Image7.png")
+knitr::include_graphics("Chapter_1/Module1_5_Input/Module1_5_Image7.png")
```
## Separate Subject Demographic Data from Experimental Measurements
@@ -936,14 +1359,14 @@ A wide format contains values that **DO NOT** repeat the subject identifier colu
**Wide Format**
```{r 01-Chapter1-35, echo=FALSE, fig.width=4, fig.height=5, fig.align='center'}
-knitr::include_graphics("Chapter_1/Module1_4_Input/Module1_4_Image8.png")
+knitr::include_graphics("Chapter_1/Module1_5_Input/Module1_5_Image8.png")
```
A long format contains values that **DO** repeat the subject identifier column. For this dataset, that means a new column was created entitled "Variable" containing all the mediator names and a column entitled "Value" containing all their corresponding values. In the screenshot, an additional column, "Category", was added to help with the categorization of mediators in R analyses.
**Long Format**
```{r 01-Chapter1-36, echo=FALSE, fig.width=4, fig.height=5, fig.align='center'}
-knitr::include_graphics("Chapter_1/Module1_4_Input/Module1_4_Image9.png")
+knitr::include_graphics("Chapter_1/Module1_5_Input/Module1_5_Image9.png")
```
The reason a long format is preferred is because it makes visualizations and statistical analyses more efficient in R. In the long format, we were able to add a column entitled "Category" to categorize the mediators into "AL Biomarker" or "Cytokine" allowing us to more easily subset the mediators in R. Read more about wide and long formats [here](https://towardsdatascience.com/long-and-wide-formats-in-data-explained-e48d7c9a06cb).
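For comparison, the wide-to-long reshape that the Power Query "Unpivot columns" step performs below can be expressed in one call in Python with pandas (the column and biomarker names here are hypothetical stand-ins):

```python
import pandas as pd

# Hypothetical wide-format data: one row per subject, one column per biomarker
wide = pd.DataFrame({
    "Group_Subject_No": ["NS_1", "CS_1"],
    "Cortisol": [10.2, 4.8],
    "IL-6": [3.1, 1.5],
})

# Unpivot the biomarker columns into long format:
long = wide.melt(
    id_vars="Group_Subject_No",  # identifier column that stays fixed
    var_name="Variable",         # new column holding biomarker names
    value_name="Value",          # new column holding the measurements
)
print(long)
```

Each biomarker column becomes a row per subject, which is exactly the structure that makes downstream grouping and plotting efficient.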
@@ -958,23 +1381,23 @@ To do this, a power query in Excel will be used. Note: If you are working on a M
3. It will ask you to choose a data source. Click "Blank table" in the bottom row.
4. Paste the data into the table. (Hint: Use the shortcut Ctrl + "v"). At this point, your screen should look like the screenshot below.
```{r 01-Chapter1-37, echo=FALSE, fig.width=4, fig.height=5, fig.align='center'}
-knitr::include_graphics("Chapter_1/Module1_4_Input/Module1_4_Image10.png")
+knitr::include_graphics("Chapter_1/Module1_5_Input/Module1_5_Image10.png")
```
5. Click "Use first row as headers" and then click "Next" in the bottom right hand corner.
6. Select all the columns with biomarker names. That should be the column "Cortisol" through the end.
```{r 01-Chapter1-38, echo=FALSE, fig.width=4, fig.height=5, fig.align='center'}
-knitr::include_graphics("Chapter_1/Module1_4_Input/Module1_4_Image11.png")
+knitr::include_graphics("Chapter_1/Module1_5_Input/Module1_5_Image11.png")
```
7. Click the "Transform" button in the upper left-hand corner. Then click "Unpivot columns" in the middle of the pane. The final result should look like the screenshot below with all the biomarkers now in one column entitled "Attribute" and their corresponding values in another column entitled "Value".
```{r 01-Chapter1-39, echo=FALSE, fig.width=4, fig.height=5, fig.align='center'}
-knitr::include_graphics("Chapter_1/Module1_4_Input/Module1_4_Image12.png")
+knitr::include_graphics("Chapter_1/Module1_5_Input/Module1_5_Image12.png")
```
8. To save this, go back to the "Home" tab and click "Close & load". You should see something similar to the screenshot below.
```{r 01-Chapter1-40, echo=FALSE, fig.width=4, fig.height=5, fig.align='center'}
-knitr::include_graphics("Chapter_1/Module1_4_Input/Module1_4_Image13.png")
+knitr::include_graphics("Chapter_1/Module1_5_Input/Module1_5_Image13.png")
```
9. In the upper right with all the shaded tables (within the "Table" tab), click the arrow to the left of the green table until you see one with no shading. Then click the table with no colors.
@@ -983,7 +1406,7 @@ knitr::include_graphics("Chapter_1/Module1_4_Input/Module1_4_Image13.png")
We have successfully wrangled our data and the final dataset now looks like this:
```{r 01-Chapter1-41, echo=FALSE, fig.width=4, fig.height=5, fig.align='center'}
-knitr::include_graphics("Chapter_1/Module1_4_Input/Module1_4_Image14.png")
+knitr::include_graphics("Chapter_1/Module1_5_Input/Module1_5_Image14.png")
```
@@ -993,17 +1416,17 @@ A PivotTable is a tool in Excel used to summarize numerical data. It’s called
To create a PivotTable, start by selecting all of the data. (Hint: Try using the keyboard shortcut mentioned above.) Click "Insert" tab on the upper left-hand side, click "PivotTable", and click "OK". The new PivotTable should be available in a new sheet as seen in the screenshot below.
```{r 01-Chapter1-42, echo=FALSE, fig.width=4, fig.height=5, fig.align='center'}
-knitr::include_graphics("Chapter_1/Module1_4_Input/Module1_4_Image15.png")
+knitr::include_graphics("Chapter_1/Module1_5_Input/Module1_5_Image15.png")
```
A PivotTable will be constructed based on the column headers that can be dragged into the PivotTable fields located on the right-hand side. For example, what if we were interested in determining if there were differences in average expression between non-smokers and cigarette smokers in each category of biomarkers? As seen below, drag the "Group" variable under the "Rows" field and drag the "Value" variable under the "Values" field.
```{r 01-Chapter1-43, echo=FALSE, fig.width=4, fig.height=5, fig.align='center'}
-knitr::include_graphics("Chapter_1/Module1_4_Input/Module1_4_Image16.png")
+knitr::include_graphics("Chapter_1/Module1_5_Input/Module1_5_Image16.png")
```
Notice that it automatically calculates the sum of the expression values for each group. To change the function to average, click the "i" icon and select "Average". The output should mirror what's below with non-smokers having an average expression that's more than double that of cigarette smokers.
```{r 01-Chapter1-44, echo=FALSE, fig.width=4, fig.height=5, fig.align='center'}
-knitr::include_graphics("Chapter_1/Module1_4_Input/Module1_4_Image17.png")
+knitr::include_graphics("Chapter_1/Module1_5_Input/Module1_5_Image17.png")
```
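The group-average summary that the PivotTable produces can also be reproduced programmatically; as one sketch, pandas' `pivot_table` in Python performs the same aggregation (the data values below are hypothetical stand-ins for the module's dataset):

```python
import pandas as pd

# Hypothetical long-format data resembling the dataset in this module:
df = pd.DataFrame({
    "Group": ["NS", "NS", "CS", "CS"],
    "Variable": ["Cortisol", "IL-6", "Cortisol", "IL-6"],
    "Value": [10.2, 3.1, 4.8, 1.5],
})

# Average expression per group, mirroring the Excel PivotTable with
# the aggregation function switched from sum to mean:
pivot = pd.pivot_table(df, values="Value", index="Group", aggfunc="mean")
print(pivot)
```

With these stand-in numbers, the non-smoker (NS) average is more than double the cigarette-smoker (CS) average, mirroring the pattern described above.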
diff --git a/Chapter_1/Module1_4_Input/aws_bucket.png b/Chapter_1/Module1_4_Input/aws_bucket.png
new file mode 100644
index 0000000..55d5f4c
Binary files /dev/null and b/Chapter_1/Module1_4_Input/aws_bucket.png differ
diff --git a/Chapter_1/Module1_4_Input/aws_create_bucket.png b/Chapter_1/Module1_4_Input/aws_create_bucket.png
new file mode 100644
index 0000000..8180460
Binary files /dev/null and b/Chapter_1/Module1_4_Input/aws_create_bucket.png differ
diff --git a/Chapter_1/Module1_4_Input/binder.png b/Chapter_1/Module1_4_Input/binder.png
new file mode 100644
index 0000000..396352e
Binary files /dev/null and b/Chapter_1/Module1_4_Input/binder.png differ
diff --git a/Chapter_1/Module1_4_Input/deployment_models.png b/Chapter_1/Module1_4_Input/deployment_models.png
new file mode 100644
index 0000000..538e512
Binary files /dev/null and b/Chapter_1/Module1_4_Input/deployment_models.png differ
diff --git a/Chapter_1/Module1_4_Input/launch_ec2.png b/Chapter_1/Module1_4_Input/launch_ec2.png
new file mode 100644
index 0000000..0416dd5
Binary files /dev/null and b/Chapter_1/Module1_4_Input/launch_ec2.png differ
diff --git a/Chapter_1/Module1_4_Input/s3_bucket.png b/Chapter_1/Module1_4_Input/s3_bucket.png
new file mode 100644
index 0000000..0e0be85
Binary files /dev/null and b/Chapter_1/Module1_4_Input/s3_bucket.png differ
diff --git a/Chapter_1/Module1_4_Input/sagemaker_notebook.png b/Chapter_1/Module1_4_Input/sagemaker_notebook.png
new file mode 100644
index 0000000..db52d3c
Binary files /dev/null and b/Chapter_1/Module1_4_Input/sagemaker_notebook.png differ
diff --git a/Chapter_1/Module1_4_Input/sagemaker_studio.png b/Chapter_1/Module1_4_Input/sagemaker_studio.png
new file mode 100644
index 0000000..a8ab6fe
Binary files /dev/null and b/Chapter_1/Module1_4_Input/sagemaker_studio.png differ
diff --git a/Chapter_1/Module1_4_Input/Module1_4_Image1.png b/Chapter_1/Module1_5_Input/Module1_5_Image1.png
similarity index 100%
rename from Chapter_1/Module1_4_Input/Module1_4_Image1.png
rename to Chapter_1/Module1_5_Input/Module1_5_Image1.png
diff --git a/Chapter_1/Module1_4_Input/Module1_4_Image10.png b/Chapter_1/Module1_5_Input/Module1_5_Image10.png
similarity index 100%
rename from Chapter_1/Module1_4_Input/Module1_4_Image10.png
rename to Chapter_1/Module1_5_Input/Module1_5_Image10.png
diff --git a/Chapter_1/Module1_4_Input/Module1_4_Image11.png b/Chapter_1/Module1_5_Input/Module1_5_Image11.png
similarity index 100%
rename from Chapter_1/Module1_4_Input/Module1_4_Image11.png
rename to Chapter_1/Module1_5_Input/Module1_5_Image11.png
diff --git a/Chapter_1/Module1_4_Input/Module1_4_Image12.png b/Chapter_1/Module1_5_Input/Module1_5_Image12.png
similarity index 100%
rename from Chapter_1/Module1_4_Input/Module1_4_Image12.png
rename to Chapter_1/Module1_5_Input/Module1_5_Image12.png
diff --git a/Chapter_1/Module1_4_Input/Module1_4_Image13.png b/Chapter_1/Module1_5_Input/Module1_5_Image13.png
similarity index 100%
rename from Chapter_1/Module1_4_Input/Module1_4_Image13.png
rename to Chapter_1/Module1_5_Input/Module1_5_Image13.png
diff --git a/Chapter_1/Module1_4_Input/Module1_4_Image14.png b/Chapter_1/Module1_5_Input/Module1_5_Image14.png
similarity index 100%
rename from Chapter_1/Module1_4_Input/Module1_4_Image14.png
rename to Chapter_1/Module1_5_Input/Module1_5_Image14.png
diff --git a/Chapter_1/Module1_4_Input/Module1_4_Image15.png b/Chapter_1/Module1_5_Input/Module1_5_Image15.png
similarity index 100%
rename from Chapter_1/Module1_4_Input/Module1_4_Image15.png
rename to Chapter_1/Module1_5_Input/Module1_5_Image15.png
diff --git a/Chapter_1/Module1_4_Input/Module1_4_Image16.png b/Chapter_1/Module1_5_Input/Module1_5_Image16.png
similarity index 100%
rename from Chapter_1/Module1_4_Input/Module1_4_Image16.png
rename to Chapter_1/Module1_5_Input/Module1_5_Image16.png
diff --git a/Chapter_1/Module1_4_Input/Module1_4_Image17.png b/Chapter_1/Module1_5_Input/Module1_5_Image17.png
similarity index 100%
rename from Chapter_1/Module1_4_Input/Module1_4_Image17.png
rename to Chapter_1/Module1_5_Input/Module1_5_Image17.png
diff --git a/Chapter_1/Module1_4_Input/Module1_4_Image2.png b/Chapter_1/Module1_5_Input/Module1_5_Image2.png
similarity index 100%
rename from Chapter_1/Module1_4_Input/Module1_4_Image2.png
rename to Chapter_1/Module1_5_Input/Module1_5_Image2.png
diff --git a/Chapter_1/Module1_4_Input/Module1_4_Image3.png b/Chapter_1/Module1_5_Input/Module1_5_Image3.png
similarity index 100%
rename from Chapter_1/Module1_4_Input/Module1_4_Image3.png
rename to Chapter_1/Module1_5_Input/Module1_5_Image3.png
diff --git a/Chapter_1/Module1_4_Input/Module1_4_Image4.png b/Chapter_1/Module1_5_Input/Module1_5_Image4.png
similarity index 100%
rename from Chapter_1/Module1_4_Input/Module1_4_Image4.png
rename to Chapter_1/Module1_5_Input/Module1_5_Image4.png
diff --git a/Chapter_1/Module1_4_Input/Module1_4_Image5.png b/Chapter_1/Module1_5_Input/Module1_5_Image5.png
similarity index 100%
rename from Chapter_1/Module1_4_Input/Module1_4_Image5.png
rename to Chapter_1/Module1_5_Input/Module1_5_Image5.png
diff --git a/Chapter_1/Module1_4_Input/Module1_4_Image6.png b/Chapter_1/Module1_5_Input/Module1_5_Image6.png
similarity index 100%
rename from Chapter_1/Module1_4_Input/Module1_4_Image6.png
rename to Chapter_1/Module1_5_Input/Module1_5_Image6.png
diff --git a/Chapter_1/Module1_4_Input/Module1_4_Image7.png b/Chapter_1/Module1_5_Input/Module1_5_Image7.png
similarity index 100%
rename from Chapter_1/Module1_4_Input/Module1_4_Image7.png
rename to Chapter_1/Module1_5_Input/Module1_5_Image7.png
diff --git a/Chapter_1/Module1_4_Input/Module1_4_Image8.png b/Chapter_1/Module1_5_Input/Module1_5_Image8.png
similarity index 100%
rename from Chapter_1/Module1_4_Input/Module1_4_Image8.png
rename to Chapter_1/Module1_5_Input/Module1_5_Image8.png
diff --git a/Chapter_1/Module1_4_Input/Module1_4_Image9.png b/Chapter_1/Module1_5_Input/Module1_5_Image9.png
similarity index 100%
rename from Chapter_1/Module1_4_Input/Module1_4_Image9.png
rename to Chapter_1/Module1_5_Input/Module1_5_Image9.png