Code for lecture USW 2025
Every folder contains exercises related to the PCÜ sessions throughout the semester.
Create a virtual environment by running python -m venv venv, activate it (source venv/bin/activate on WSL/Linux),
and install all requirements for the exercises (pip install -r requirements.txt).
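To confirm that the virtual environment is actually active before installing anything, a small check like the following can help. It is a minimal sketch: the function name in_virtualenv is made up for illustration, and the check relies on the standard behaviour that sys.prefix differs from sys.base_prefix inside an environment created with python -m venv.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a venv, sys.prefix points at the venv directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

If this reports that no environment is active, run the activation command for your platform again before calling pip.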
Activate the virtual environment on Windows with PowerShell:
.\venv\Scripts\Activate.ps1
or on Windows with Command Prompt:
.\venv\Scripts\activate.bat

Every exercise that uses Jupyter Notebooks requires the Jupyter server to run locally. Start the server with the following command:
jupyter notebook

In this first exercise, we develop a supervised machine learning model for spam detection, based on the following dataset: SMS SPAM Collection.
This exercise will be carried out with Jupyter Notebooks.
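To give an idea of what a supervised spam classifier does, here is a minimal sketch of a multinomial Naive Bayes model with Laplace smoothing, written with the standard library only. It is not the notebook's implementation: the four training messages are made up for illustration (the exercise uses the SMS Spam Collection dataset), and the names TRAIN, tokenize, train, and predict are hypothetical.

```python
import math
import re
from collections import Counter

# Made-up toy messages for illustration only; the exercise itself
# uses the SMS Spam Collection dataset.
TRAIN = [
    ("spam", "Win a free prize now, call this number"),
    ("spam", "Free entry, claim your cash prize today"),
    ("ham", "Are we still meeting for lunch today"),
    ("ham", "Call me when you get home"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(examples):
    """Count messages and words per label from (label, text) pairs."""
    doc_counts = Counter()
    word_counts = {}
    for label, text in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return doc_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        # Start from the log prior of the label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


if __name__ == "__main__":
    model = train(TRAIN)
    print(predict(model, "claim your free prize"))
    print(predict(model, "meeting for lunch"))
```

With a real dataset, the same idea applies, but the data should be split into training and test sets so that the model is evaluated on messages it has not seen.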
This exercise uses the scrapy framework. Find more information about the architecture of the framework here.
In this tutorial we also cover rendering websites with JavaScript.
JavaScript rendering requires installing a headless browser. To set it up:
- Run pip install -r requirements.txt again
- On Linux: sudo playwright install-deps (on Windows: playwright install-deps)
- Install headless Chromium using playwright install chromium
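After the steps above, a quick way to see whether the Python packages are importable is a check like the following. It is a sketch under assumptions: the module names scrapy and playwright are inferred from the commands used in this section, and the helper missing_modules is made up for illustration. It does not verify that the Chromium browser itself was downloaded.

```python
import importlib.util

# Module names assumed from the commands in this section; adjust them
# if requirements.txt lists different packages.
REQUIRED_MODULES = ("scrapy", "playwright")


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("missing packages:", ", ".join(missing))
    else:
        print("all required packages are importable")
```

If a package is reported as missing, make sure the virtual environment is active and run pip install -r requirements.txt again.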
Once the installation is finished, you are ready to run the code.
- Go into the directory 02_web_and_news_scraping/scrapy_tutorial
- Run scrapy crawl htw_berlin in the command prompt/bash