This repository contains the code for the course project of UW CSE599J, Data-Centric Machine Learning. The topic is unsupervised domain adaptation for binary classification under the covariate shift assumption, on the Amazon reviews dataset [Ni et al. (2019)], using transport maps learned with normalizing flows. Specifically, we use masked autoregressive flows [Papamakarios et al. (2021)] to learn transport maps between source and target domain features.
Let
Text embeddings are created by:
- Fine-tuning a BERT model on a random set of 20,000 data points for an empirical risk minimization task.
- Using the penultimate layer of the fine-tuned model, i.e. BERT's last hidden layer, to obtain the embeddings.
The code for fine-tuning BERT, creating the source and target domain splits, and obtaining the embeddings can be found in:
./create_dataset.ipynb
The code for training the masked autoregressive flows can be found in:
./amazon.ipynb
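
For readers unfamiliar with masked autoregressive flows, the following is a minimal NumPy sketch of what a single affine autoregressive layer computes. It is not the code from the notebook. The function names (`make_masked_weights`, `maf_forward`, `maf_inverse`) are hypothetical, and the conditioners here are assumed to be simple masked linear maps rather than the masked neural networks a real flow would use. It only illustrates the autoregressive structure: each output dimension depends on the preceding input dimensions, so the Jacobian is triangular and its log-determinant is cheap to compute.

```python
import numpy as np


def make_masked_weights(dim, rng):
    """Random conditioner weights with a strictly lower-triangular mask,
    so that output i depends only on inputs j < i (illustrative only)."""
    mask = np.tril(np.ones((dim, dim)), k=-1)
    w_mu = rng.normal(scale=0.5, size=(dim, dim)) * mask
    w_alpha = rng.normal(scale=0.5, size=(dim, dim)) * mask
    return w_mu, w_alpha


def maf_forward(x, w_mu, w_alpha):
    """Map features x to base-space u in a single pass.

    Returns u and log|det(du/dx)| for each row of x.
    """
    mu = x @ w_mu.T                    # shift: mu_i is a function of x_{<i}
    alpha = np.tanh(x @ w_alpha.T)     # bounded log-scale, also a function of x_{<i}
    u = (x - mu) * np.exp(-alpha)
    log_det = -alpha.sum(axis=1)       # triangular Jacobian with diagonal exp(-alpha_i)
    return u, log_det


def maf_inverse(u, w_mu, w_alpha):
    """Invert the layer one dimension at a time (sequential by construction)."""
    x = np.zeros_like(u)
    for i in range(u.shape[1]):
        # Row i of each weight matrix is zero from column i onward,
        # so only the already-computed x[:, :i] contributes here.
        mu_i = x @ w_mu[i]
        alpha_i = np.tanh(x @ w_alpha[i])
        x[:, i] = u[:, i] * np.exp(alpha_i) + mu_i
    return x


def log_prob(x, w_mu, w_alpha):
    """Log-density of x under a standard normal base distribution."""
    u, log_det = maf_forward(x, w_mu, w_alpha)
    base = -0.5 * (u ** 2).sum(axis=1) - 0.5 * x.shape[1] * np.log(2 * np.pi)
    return base + log_det


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_mu, w_alpha = make_masked_weights(4, rng)
    x = rng.normal(size=(5, 4))        # stand-in for a batch of embeddings
    u, _ = maf_forward(x, w_mu, w_alpha)
    assert np.allclose(maf_inverse(u, w_mu, w_alpha), x)
```

In an actual flow, several such layers would typically be stacked with the dimension ordering permuted between layers, and the weights would be fit by maximizing `log_prob` on the training features.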