- In class: syllabus overview, intro/transcription exercise, Voyant
- Before class:
- Read: Farhad Manjoo, "How Do You Know a Human Wrote This?"
- Read: Emily M. Bender and Timnit Gebru et al., “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?”
- Spend at least 30 minutes playing AI Dungeon
- In class: Discussion of readings
- Before class:
- HW0 due: Video intro (on Canvas)
- In class: GPT-2 exercise; intro to Python, Colab
- Before class:
- HW1 due: Strings, lists, and dictionaries
- In class: web scraping and HTML parsing using BeautifulSoup
- Before class:
- Read: Xavier Adam, “An Illustrated Introduction to APIs” and “API Whispering 101”
- HW2 due: APIs
- In class: Scraping song lyrics using the Genius API
- Before class:
- Read: David Zentgraf, "What Every Programmer Absolutely, Positively Needs to Know about Encodings and Character Sets to Work with Text"; Aditya Mukerjee, “I Can Text You A Pile of Poo But I Can’t Write My Name”
- Optional but interesting: Miriam Sweeney and Kelsea Whaley, “Technically White: Emoji Skin-Tone Modifiers as American Technoculture”
- In class: text parsing and regex with song lyrics
- Before class:
- Watch: Andrew Norman Wilson, "Workers Leaving the Googleplex"
- Optional (will be covered in class): Lilly Irani, “Justice for ‘Data Janitors’”; Ishan Misra et al., "Seeing Through the Human Reporting Bias"
- Quiz 1 due: Creating a dataset using an API
- In class: Discussion of readings
- Before class:
- Read: Ethan Reed, “Poems with Pattern and VADER, Part 1: Quincy Troupe and Part 2: Nikki Giovanni”; Maria Antoniak et al., “Narrative Paths and Negotiation of Power in Birth Stories”
- In class: sentiment analysis
- Before class:
- Read: Leonardo Nicoletti and Sahiti Sarva, “When Women Make Headlines”; Maarten Sap et al., “Connotation Frames of Power and Agency in Modern Films”
- In class: word counts, n-grams, lexicons
- Before class:
- Read: Matt Daniels, “The Language of Hip Hop”; Sara Key, “Yelp Reviewers’ Authenticity Fetish is White Supremacy in Action”
- HW3 due: NLP 101
- In class: intro to scikit-learn and TF-IDF
- Quiz 2 due: Sentiment analysis of Yelp reviews
- Before class:
- Read: Catherine D’Ignazio and Lauren Klein, “The Numbers Don’t Speak for Themselves,” from Data Feminism; Timnit Gebru et al., “Datasheets for Datasets”
- In class: Discussion of data and context, final project brainstorming session (datasets)
- Before class:
- Read: Lucy Li and David Bamman, “Gender and Representation Bias in GPT-3 Generated Stories”; Richard Jean So, “Consecration: The Canon and Racial Inequality,” from Redlining Culture (Canvas)
- In class: topic modeling
- Before class:
- Optional (will be discussed in class): Lauren Klein and Sandeep Soni, “How Words Lead to Justice”; Laura K. Nelson, “Leveraging the Alignment Between Machine Learning and Intersectionality” (Canvas)
- HW4 due: word embeddings
- In class: word embeddings, discussion of papers
- Before class:
- Quiz 3 due: Exploratory research exercise
- In class: Pandas, paper catchup, more project brainstorming (research questions)
- Before class:
- Read: Lucy Li and David Bamman, “Characterizing English Variation across Social Media Communities with BERT”
- In class: Guest lecture, Lucy Li, UC Berkeley
- Before class:
- Final project prep (FPP) #1 due: Datasheet
- In class: more project brainstorming (methods)
- Before class:
- No reading or homework for this class meeting; start working on your project proposals!
- In class: classification
- Final project prep (FPP) #2 due: Project proposal
- Before class:
- Read: Catherine D’Ignazio et al., “Feminicide and Machine Learning”; Terra Blevins et al., “Automatically Processing Tweets from Gang-Involved Youth: Towards Detecting Loss and Aggression”; Dan Sinykin, “How Capitalism Changed American Literature”
- In class: discussion of papers
- Before class:
- Read: Ben Schmidt, "Genre, Manifolds, and AI"; Matthew Wilkens, "Genre, Computation, and the Varieties of 20th Century U.S. Fiction" (Canvas)
- In class: clustering
- Before class:
- Revisit Li and Bamman (from 10/13 class meeting); Hoyt Long, “Learning to Live with Machine Translation” (PDF); Suchin Gururangan et al., “Whose Language Counts as High Quality?”
- In class: sentiment analysis with BERT and next sentence prediction
- Before class:
- FPP #3 due: Final project first pass
- In class: classification with BERT

This syllabus draws from previous iterations of QTM 340 taught by me and Dan Sinykin. It also incorporates materials and resources developed by Melanie Walsh, Jinho Choi, Alison Parrish, David Mimno, David Bamman, Ryan Cordell, and Ben Schmidt, as well as suggestions and other input from Heather Froehlich, Ted Underwood, Jacob Eisenstein, Jim Casey, Taylor Arnold, Lauren Tilton, Lisa Rhody, Eileen Clancy, and the Colored Conventions Project Team.