textasdata final project

Authors:

Xinyan Yang (xy975)
Wenjie Sun (ws854)

Pre-processing Data

Pre-process.ipynb

This Python code provides:

Code to extract information from the British National Corpus (BNC)
Code to convert the BNC .xml files to .txt
Code to move all the BNC .txt files into one single folder
Code to move all the Open American National Corpus (OANC) .txt files into one single folder
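The notebook itself is not reproduced here. As a rough Python sketch of the conversion and collection steps above, the functions below flatten each BNC XML file to plain text (one sentence per line) and gather .txt files into a single folder. They assume the BNC XML edition layout, with sentences in `<s>` elements, words in `<w>` and punctuation in `<c>`, and no XML namespace. The function names `bnc_xml_to_text`, `convert_tree` and `collect_txt` are hypothetical, and the sketch copies files rather than moving them.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def bnc_xml_to_text(xml_path: Path) -> str:
    """Flatten one BNC XML file to plain text, one sentence per line.

    Assumption: sentences are <s> elements, words are <w> elements and
    punctuation marks are <c> elements, with no XML namespace.
    """
    root = ET.parse(xml_path).getroot()
    lines = []
    for sentence in root.iter("s"):
        # Keep only word/punctuation tokens; strip any whitespace stored in the
        # element text and re-join the tokens with single spaces.
        tokens = [
            el.text.strip()
            for el in sentence.iter()
            if el.tag in ("w", "c") and el.text and el.text.strip()
        ]
        lines.append(" ".join(tokens))
    return "\n".join(lines)


def convert_tree(src_dir: Path, out_dir: Path) -> int:
    """Convert every .xml file under src_dir (searched recursively) into a
    same-named .txt file in the single flat folder out_dir.
    Returns the number of files converted."""
    out_dir.mkdir(parents=True, exist_ok=True)
    xml_files = sorted(src_dir.rglob("*.xml"))  # materialise the list before writing
    for xml_path in xml_files:
        target = out_dir / (xml_path.stem + ".txt")
        target.write_text(bnc_xml_to_text(xml_path), encoding="utf-8")
    return len(xml_files)


def collect_txt(src_dir: Path, out_dir: Path) -> int:
    """Copy every .txt file found under src_dir (recursively) into out_dir.
    Files with the same name overwrite each other. Returns the number copied."""
    out_dir.mkdir(parents=True, exist_ok=True)
    txt_files = sorted(src_dir.rglob("*.txt"))  # materialise the list before copying
    for txt_path in txt_files:
        shutil.copy2(txt_path, out_dir / txt_path.name)
    return len(txt_files)
```

For example, `convert_tree(Path("BNC/Texts"), Path("bnc_txt"))`, where both paths are hypothetical. Replacing `shutil.copy2` with `shutil.move` would match the notebook's "move" wording. Keep `out_dir` outside `src_dir` so that files already gathered are not picked up again on a re-run.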

Analysis 1.Rmd

This R code provides:

Code to integrate the BNC metadata with the BNC
Code to filter the BNC down to the desired texts
Code to import the OANC corpus
Code to convert both corpora into a document-feature matrix (dfm)
Code to calculate the Flesch Reading Ease (FRE) score and plot it
Code that uses a bag-of-words framework to compute the weight of each feature in both corpora
Code to analyze the tokens that are not included in the Dale-Chall easy word list
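For reference, the Flesch Reading Ease score is 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words), and higher scores indicate easier text. The Rmd works in R, so the following is only a Python sketch of that formula and of a count of tokens missing from an easy-word list. The regex tokeniser and the vowel-group syllable counter are crude assumptions, so the scores will not match those of an R readability package. The `easy_words` argument is a hypothetical stand-in for the Dale-Chall list, which is not reproduced here.

```python
import re
from collections import Counter

_SENTENCE_END = re.compile(r"[.!?]+")
_WORD = re.compile(r"[A-Za-z']+")


def count_syllables(word: str) -> int:
    """Crude heuristic: count groups of consecutive vowels (y included),
    dropping a silent final 'e'; every word has at least one syllable."""
    w = word.lower()
    n = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def flesch_reading_ease(text: str) -> float:
    """FRE = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    words = _WORD.findall(text)
    sentences = [s for s in _SENTENCE_END.split(text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text needs at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


def words_outside_list(text: str, easy_words: set[str]) -> Counter:
    """Count lower-cased tokens that do not appear in the given easy-word list."""
    return Counter(w.lower() for w in _WORD.findall(text) if w.lower() not in easy_words)
```

A text made only of short words in short sentences scores high: "The cat sat on the mat." has 6 words, 1 sentence and 6 syllables under this heuristic, giving 206.835 - 6.09 - 84.6 = 116.145.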

Analysis 2.R

This R code provides:

Code to analyze bigram features and report the findings
Code to generate a .csv file (full_df_bi.csv) of the bigrams whose frequencies differ between British and American English
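A minimal Python sketch of one way to compare bigram frequencies across the two corpora and write them to a .csv file. It assumes a simple lower-cased regex tokeniser, add-0.5 smoothing, and a log2 ratio of relative frequencies as the measure of difference; the function names and column names are hypothetical and need not match those in Analysis 2.R or full_df_bi.csv.

```python
import csv
import math
import re
from collections import Counter
from typing import Iterable

_WORD = re.compile(r"[a-z']+")
FIELDS = ["bigram", "count_bnc", "count_oanc", "log2_ratio"]


def bigram_counts(texts: Iterable[str]) -> Counter:
    """Count adjacent word pairs in lower-cased texts
    (bigrams never cross the boundary between two texts)."""
    counts = Counter()
    for text in texts:
        tokens = _WORD.findall(text.lower())
        counts.update(zip(tokens, tokens[1:]))
    return counts


def compare_bigrams(bnc: Counter, oanc: Counter, smoothing: float = 0.5) -> list[dict]:
    """Smoothed log2 ratio of each bigram's relative frequency in the two corpora
    (positive = relatively more frequent in the BNC), sorted by absolute ratio."""
    total_bnc = sum(bnc.values()) or 1
    total_oanc = sum(oanc.values()) or 1
    rows = []
    for bigram in bnc.keys() | oanc.keys():
        f_bnc = (bnc[bigram] + smoothing) / total_bnc
        f_oanc = (oanc[bigram] + smoothing) / total_oanc
        rows.append({
            "bigram": " ".join(bigram),
            "count_bnc": bnc[bigram],
            "count_oanc": oanc[bigram],
            "log2_ratio": math.log2(f_bnc / f_oanc),
        })
    rows.sort(key=lambda r: abs(r["log2_ratio"]), reverse=True)
    return rows


def write_csv(rows: list[dict], path: str) -> None:
    """Write the comparison table with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Smoothing keeps the ratio finite for bigrams that occur in only one corpus; a frequency threshold would usually be applied before reading much into the top of the list.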
