AI6 Workshop 1

Corndel - Level 6 Applied AI Engineering

This is the first workshop of the Level 6 AI/ML Engineer programme. Read it together with model_card.md, risks_grid.md and lessons_learned.md. Also consider the EU AI Act Cheatsheets that we compiled for you.

Phase 1: Run the Pipeline in JupyterLab

1. Create a JupyterLab space

Access the AWS Management Console by your preferred method. For example, launch an AWS AI Cloud Sandbox from Pluralsight.

The AWS Management Console is a web-based interface that allows users to access and manage AWS services visually. It provides tools for configuring, monitoring, and deploying cloud resources without needing to use the command line.

Pluralsight is an online learning platform that specialises in cloud computing education, offering courses and hands-on labs for AWS, Azure, Google Cloud, and other technologies. You should be able to access it with the Pluralsight credentials you were given when you joined the programme. If you haven't got these, or you've had trouble with the account, please let your Coach know and they'll be able to help.

Navigate to the Amazon SageMaker AI service in the AWS environment. (Note that Amazon SageMaker AI was formerly named Amazon SageMaker. The old name still appears in the AWS Console search, but navigating there only shows you a notice about the name change, so choose Amazon SageMaker AI.)

Amazon SageMaker AI is a fully managed service from AWS that simplifies the process of building, training, and deploying machine learning (ML) models at scale. It provides an integrated development environment, pre-built algorithms, automated model tuning, and tools for data preparation, monitoring, and governance, making it easier for developers and data scientists to create production-ready AI solutions.

Click "Studio" under "Applications and IDEs" on the left.
Click "Open Studio", which will open SageMaker Studio in a new tab.
Select JupyterLab from the icons on the left. Create a new JupyterLab Space by clicking the "Create JupyterLab space" button.

JupyterLab is an open-source web-based interactive development environment for working with notebooks, code, and data. It supports multiple programming languages (like Python, R, and Julia) and provides a flexible interface for data science, machine learning, and scientific computing workflows.

Name your JupyterLab space (e.g. QuickLoan) and click "Create space".
Open your JupyterLab space by clicking "Run space". This can take a minute or so. (Pay attention to the notification at the bottom of the screen, which will give you a time estimate for completion.)
Once the space is ready, click "Open JupyterLab".

2. Upload the Code (and a copy of the data)

Using the file browser on the left, upload the deploy_pipeline.py script and the src folder (with its contents) from the GitHub repository into your JupyterLab environment.
Upload a copy of the cs-training.csv file from the AI6-Workshop-1-PDE/data directory here as well, so that it's in the top-level folder (not in src but adjacent to it).

3. Create and Run a Notebook

In the JupyterLab menu, click the "Python 3 (ipykernel)" button under the "Notebook" heading in the Launcher tab, or click File -> New -> Notebook. Select the default Python 3 kernel.
A new .ipynb notebook will open. It's important that this has opened at the top level (i.e. adjacent to the src folder, not within it), otherwise subsequent commands won't work. Your folder structure should look like this:

src/
    evaluate.py
    process.py
user-default-efs/
cs-training.csv
deploy_pipeline.py
Untitled.ipynb
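
If you want to confirm the layout before going further, a small check like the one below can be pasted into a notebook cell. It is only a sketch: the file names are taken from the folder structure above, and the helper name missing_items is made up for this example.

```python
from pathlib import Path

# Paths taken from the folder structure shown above.
EXPECTED_FILES = [
    "deploy_pipeline.py",
    "cs-training.csv",
    "src/evaluate.py",
    "src/process.py",
]


def missing_items(root="."):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_items()
    if missing:
        print("Missing from the current folder:", ", ".join(missing))
    else:
        print("Layout looks right: the notebook is at the top level.")
```

If anything is reported as missing, the notebook has probably been created inside src, or an upload did not land in the top-level folder.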

In the first cell of the notebook, type the following command that executes your script: !python deploy_pipeline.py. (You could alternatively run this from a new Terminal, if you're feeling adventurous!)
Click the "Run" button (a ▶ play icon) in the notebook toolbar to execute the cell.
You will see the script's output directly in the notebook. While it's still running, you can navigate to SageMaker Studio -> Pipelines, then click through to your pipeline, and finally through to a specific execution of that pipeline, in order to see the visual graph of your pipeline running.
It will take 10-15 minutes for the pipeline to complete.
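
If you would rather check progress from code than from the Studio UI, the sketch below asks SageMaker for the most recent execution of a pipeline using the boto3 call list_pipeline_executions. It makes some assumptions: boto3 is available in your JupyterLab space, your role is allowed to list pipeline executions, and you know the pipeline's name (this guide does not state it, so "your-pipeline-name" is a placeholder). The helper names are made up for this example.

```python
def latest_execution_status(sm_client, pipeline_name):
    """Return the status of the most recent execution, or None if there is none."""
    response = sm_client.list_pipeline_executions(
        PipelineName=pipeline_name,
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    summaries = response.get("PipelineExecutionSummaries", [])
    if not summaries:
        return None
    return summaries[0]["PipelineExecutionStatus"]


def check_from_notebook(pipeline_name):
    """Convenience wrapper that builds a real SageMaker client (needs AWS access)."""
    import boto3  # imported here so the helper above has no AWS dependency

    return latest_execution_status(boto3.client("sagemaker"), pipeline_name)


# Example call, with a placeholder name to replace with your own:
# print(check_from_notebook("your-pipeline-name"))
```

The API reports statuses such as Executing, Succeeded or Failed, so the wording may differ slightly from the labels shown in Studio.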

Phase 2: Reflect on the Ethics of this Pipeline

Your coach will guide you in your discussion of the Ethics task related to this pipeline. You can see the pipeline running when you go back to SageMaker Studio and click on "Pipelines" in the menu on the left. It should say "Running" for approx. 10-15 minutes, after which it should say "Succeeded".

We have created for you a cheat sheet (above) of the most important ethical tasks at each stage.

By now, your pipeline has been executing for several minutes. The data has been processed, and the XGBoost model is being trained on a powerful cloud instance. Soon, a file named model.tar.gz will be created and saved to S3. While the pipeline automates the technical steps, this is the perfect time to reflect on what this model file truly represents and the responsibilities that come with it.
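
To make "what this file represents" more concrete, you can look inside the archive once the pipeline has finished. The sketch below assumes you have downloaded a copy of model.tar.gz into your JupyterLab folder (the S3 location is not given here, so that step is left to you). It only lists the contents and does not extract anything. The function name list_archive is made up for this example.

```python
import tarfile


def list_archive(path):
    """Return (name, size_in_bytes) for each regular file in a .tar.gz archive."""
    with tarfile.open(path, "r:gz") as archive:
        return [(m.name, m.size) for m in archive.getmembers() if m.isfile()]


# Example call, assuming a local copy exists next to the notebook:
# for name, size in list_archive("model.tar.gz"):
#     print(f"{size:>12} bytes  {name}")
```

Listing rather than extracting is a deliberate choice: it lets you see what the artefact holds without writing any of its files to disk.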

The Model as a Regulated Asset

A trained model file like model.tar.gz isn't just a technical asset; it's a concentration of data and decision-making logic, making it subject to numerous policies and regulations. For the QuickLoan model you are building, key considerations would include:

Data Privacy: The cs-training.csv file contains sensitive financial information. Even if personally identifiable information is removed, the model is still a derivative of this data and falls under regulations like GDPR. Policies must govern its access to ensure it cannot be reverse-engineered to reveal information about the individuals in the training set.

Intellectual Property: The trained model is a valuable corporate asset for the fictional "QuickLoan" company. The model.tar.gz file would be protected as a trade secret. Internal policies would strictly control who can access or copy this file to protect the company's investment.

Fairness and Safety: Since this is a financial model for loan applications, fairness regulations from bodies like the UK's Financial Conduct Authority (FCA) are paramount. The model you are creating cannot be an unexplainable "black box." In a real-world scenario, the company would be required to prove that the model's decisions are fair and not discriminatory based on protected characteristics.
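
As one illustration of what evidence of fairness might look like, the sketch below compares approval rates between groups of applicants. It is only a starting point under stated assumptions: the group labels and decisions are toy values invented for the example, the function names are hypothetical, and comparing approval rates is just one of several possible fairness checks, not something this guide or any regulator prescribes.

```python
from collections import defaultdict


def approval_rates(groups, approved):
    """Return the share of approved applications within each group."""
    totals = defaultdict(int)
    approvals = defaultdict(int)
    for group, ok in zip(groups, approved):
        totals[group] += 1
        approvals[group] += int(bool(ok))
    return {group: approvals[group] / totals[group] for group in totals}


def rate_ratio(rates):
    """Lowest group approval rate divided by the highest (1.0 means equal rates)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0


if __name__ == "__main__":
    # Toy values for illustration only; these are not taken from cs-training.csv.
    groups = ["group_a", "group_a", "group_b", "group_b"]
    approved = [1, 0, 1, 1]
    rates = approval_rates(groups, approved)
    print(rates, rate_ratio(rates))
```

A ratio well below 1.0 would not prove discrimination on its own, but it is the kind of signal that should prompt a closer look at the data and the model.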

🔎 Mini Exercise: Model.tar.gz Meets the EU AI Act

Your pipeline has created a model.tar.gz file — a compressed artefact holding the trained QuickLoan model.
Imagine you are presenting this file to a regulator or compliance officer.

Risk Category

Using the EU AI Act cheatsheet, how would you categorise this model?

Is it high-risk (e.g. access to essential services like credit scoring)? Could it even approach unacceptable risk in certain contexts?

Obligations Check

Pick three obligations from the cheatsheet (e.g. data governance, transparency, human oversight).

How would you evidence compliance for this specific file?

What documentation, testing or oversight mechanisms would you provide?

Lifecycle Thinking

Remember: the model file isn’t static. It will be versioned, deployed, monitored, and eventually decommissioned.

Using the Council of Europe Framework Convention cheatsheet, what safeguards should be applied at different lifecycle stages (e.g. testing, monitoring, documenting risks)?

⚖️ Prompt for Discussion

If this model.tar.gz were leaked, misused, or audited tomorrow, what would be the biggest ethical and legal risks for QuickLoan?
What steps could you take today to reduce those risks?

Security and Export Controls: While less likely for this specific model, advanced AI models can be classified as dual-use technology. In such cases, transferring a model.tar.gz file across international borders could be restricted under national export control laws.

As you watch your pipeline complete in the AWS console, consider how these policies would shape the way your model.tar.gz file is stored, versioned, and ultimately deployed. This intersection of technology, law, and ethics is central to the role of a modern AI Engineer.
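
One small, practical habit that supports storage and versioning policies is recording a fingerprint of each model artefact. The sketch below computes a SHA-256 digest of a file using Python's standard library. It assumes a local copy of model.tar.gz, and the function name sha256_of_file is made up for this example.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example call, assuming a local copy exists next to the notebook:
# print(sha256_of_file("model.tar.gz"))
```

Noting the digest alongside the model card gives you a simple way to show later that the file being deployed or audited is the same one that was trained and reviewed.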
