Amsterdam University College -- Text Mining -- Winter/Spring 2021.
You can use the Hello World notebooks to check that everything is working.
| Week | Topic | Materials |
|---|---|---|
| 1 | Introduction and Python refresher | slides + notebooks 1, 2, 3, 4, 5 |
| 2 | Introduction to NLP and NLP pipelines | slides + notebook |
| 3 | Language modelling | slides + notebooks 1, 2 |
| 4 | Vector space semantics | slides + notebook |
| 5 | Word embeddings | slides + notebook |
| 6 | Machine learning fundamentals and PyTorch | slides + notebook |
| 7 | Text classification | |
| 8 | Advanced architectures and NER | |
| 9 | Web scraping and APIs | notebook |
| 10 | Recommender systems | slides + notebook |
| 11 | Creating annotated corpora and sentiment analysis | slides + notebook |
| 12 | Clustering and topic modelling | slides + notebook |
| 13 | Trendy research topics | |
See the projects folder for info.
- Clone the repository locally: `git clone https://github.com/Giovanni1085/AUC_TMCI_2021.git`
- Get updates (from time to time): `git pull`
- Create a conda environment: `conda create -n myenv python=3.7 anaconda` (where `myenv` is the environment name)
- Activate it: `conda activate myenv`
- Install packages (see the `requirements.txt` file), e.g. `conda install pandas`
- Launch a Jupyter notebook: `jupyter notebook`
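
Once the environment is activated, a quick sanity check can confirm that the interpreter and key packages are visible before opening the notebooks. The following is a minimal sketch, not part of the course materials: the helper name `check_environment` is hypothetical, and the package list (`pandas`, `torch`) is only an illustrative assumption; the `requirements.txt` file remains the authoritative list.

```python
import importlib.util
import sys


def check_environment(packages, min_python=(3, 7)):
    """Report whether Python is recent enough and which packages are importable.

    `packages` is a list of top-level import names (hypothetical example list
    below; see requirements.txt for what the course actually needs).
    """
    python_ok = sys.version_info >= min_python
    # find_spec returns None when a top-level module cannot be located.
    status = {name: importlib.util.find_spec(name) is not None for name in packages}
    return python_ok, status


# Illustrative package names only (assumption, not taken from requirements.txt).
python_ok, status = check_environment(["pandas", "torch"])
print("Python version OK:", python_ok)
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

Any package reported as missing can then be installed into the active environment, for example with `conda install` as in the steps above.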
- More on conda environments
- Conda cheatsheet
- Getting started with Jupyter notebooks
- On using git and GitHub for version control
Alternatively, use Binder (link above).
A more detailed guide to setting up your environment, with multiple options.
- The previous-year edition of this course.
- Michael Repplinger, who ran the 2018/19 edition, and Gianluca Lebani, who ran the 2017/18 edition.
- Giovanni Colavizza and Matteo Romanello, Applied Data Analysis course for the Oxford Digital Humanities Summer School
- James Hetherington and Giovanni Colavizza, Research Software Engineering with Python
Everything in this repository which is not already attributed to someone else is released under CC BY 4.0.