Xinyan Yang (xy975)
Wenjie Sun (ws854)
This Python code provides:
- Code to extract information from the British National Corpus (BNC)
- Code to convert the BNC .xml files to .txt
- Code to move all the BNC .txt files into a single folder
- Code to move all the Open American National Corpus (OANC) .txt files into a single folder
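As a rough illustration of the conversion and collection steps listed above, here is a minimal Python sketch. It is not the repository's actual script: the function names `xml_to_text` and `collect_txt_files` are hypothetical, the text extraction simply drops every tag (so header text is kept as well), and it assumes file names are unique across subfolders.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def xml_to_text(xml_path: Path) -> str:
    """Return the text of an XML file with all tags dropped (hypothetical helper)."""
    root = ET.parse(xml_path).getroot()
    # itertext() yields every text node in document order; header text is included.
    raw = "".join(root.itertext())
    # Collapse runs of whitespace left behind by the markup.
    return " ".join(raw.split())


def collect_txt_files(source_root: Path, target_dir: Path) -> int:
    """Move every .txt file under source_root into target_dir and return the count.

    Assumption: file names are unique, so a same-named file may be overwritten.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    moved = 0
    # sorted() materializes the listing before any file is moved.
    for txt_path in sorted(source_root.rglob("*.txt")):
        if txt_path.parent.resolve() == target_dir.resolve():
            continue  # already in the target folder
        shutil.move(str(txt_path), str(target_dir / txt_path.name))
        moved += 1
    return moved
```

A call such as `collect_txt_files(Path("corpus"), Path("corpus_txt"))` would flatten a nested folder tree into one folder; both paths here are placeholders.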
This R code provides:
- Code to integrate the BNC metadata with the BNC
- Code to filter the BNC down to the desired subset
- Code to import the OANC
- Code to convert both corpora into document-feature matrices (dfm)
- Code to calculate the Flesch Reading Ease (FRE) score and plot it
- Code to compute the weight of each feature in both corpora using a bag-of-words framework
- Code to analyze the tokens that are not included in the Dale-Chall Easy Word List
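For reference, the FRE score mentioned above follows the standard Flesch Reading Ease formula. The Python sketch below shows that formula only; the function name `flesch_reading_ease` is hypothetical, the word, sentence, and syllable counts are assumed to be computed elsewhere, and the R script may well rely on a library routine instead.

```python
def flesch_reading_ease(n_words: int, n_sentences: int, n_syllables: int) -> float:
    """Standard Flesch Reading Ease: higher scores mean easier text."""
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("n_words and n_sentences must be positive")
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
```

Longer sentences and longer words both lower the score, which is why the measure is a convenient summary when comparing the readability of two corpora.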
This R code provides:
- Code to analyze the bigram features and report the findings
- Code to generate a CSV file of the bigrams whose frequencies differ between the two language varieties (full_df_bi.csv)
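The bigram comparison described above could look roughly like the following Python sketch. It is an illustration under assumptions, not the R script's method: the function names, the column names, and the criterion for "different" (raw counts that are not equal) are all hypothetical, and the actual layout of full_df_bi.csv may differ.

```python
import csv
from collections import Counter
from typing import List, Tuple


def bigram_counts(tokens: List[str]) -> Counter:
    """Count adjacent token pairs in a tokenized text."""
    return Counter(zip(tokens, tokens[1:]))


def differing_bigrams(counts_a: Counter, counts_b: Counter) -> List[Tuple[str, int, int]]:
    """List bigrams whose raw counts differ between the two corpora."""
    rows = []
    for bigram in sorted(set(counts_a) | set(counts_b)):
        # A Counter returns 0 for a bigram it has never seen.
        if counts_a[bigram] != counts_b[bigram]:
            rows.append((" ".join(bigram), counts_a[bigram], counts_b[bigram]))
    return rows


def write_bigram_csv(rows: List[Tuple[str, int, int]], path: str) -> None:
    """Write the rows to a CSV file; the column names are hypothetical."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["bigram", "count_a", "count_b"])
        writer.writerows(rows)
```

In practice, relative frequencies would be a fairer basis for comparison than raw counts when the two corpora differ in size.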