**Miroo Lee (mil136@pitt.edu) 12-13-2021**
This is Miroo Lee's project repo for Data Science (LING 2340). The goal of this project is to investigate how L2 learners' speech develops the rhythmic properties of prosody, by examining temporal modifications of phonetic segments as a function of lexical stress and domain-initial boundary lengthening.
The data set I started my project with comes from the PELIC speech corpus from the University of Pittsburgh. The PELIC speech corpus is a large learner corpus, and the current project examined 2-minute semi-spontaneous monologues by Korean students. You can find more information about the corpus here.
In addition to this README file, there are four folders and eleven other files.
In the root folder:
- `final_report.md` describes the results of the data analysis.
- `README.md` is the current document you are reading.
- `LICENSE` describes the licensing terms for the project.
- `.gitignore` lists the files ignored by git.
- `project_plan.md` describes the initial plan for the project.
- `project_progress.md` shows three progress reports written throughout the semester.
- `presentation.pdf` contains the slides from the presentation I gave at the end of the semester. This presentation only included the preliminary data analysis. More detailed results are documented in `final_report.md`.
- `search_wav.Rmd` contains code for identifying wav file names by filtering on L1, level, and task type.
- `search_wav.md` is the same as the above, but as an md file.
- `KOR_mono.csv` is an output of `search_wav.md`. It is a list of two-minute monologue speech files of Korean speakers who were enrolled for three semesters.
- `KOR_mono_scripts.csv` is another output of `search_wav.md`. It is a list of transcripts for the corresponding speech files.
- `export_from_three_tires.praat` is a Praat script that compiles annotated information from multiple Praat TextGrids into a single txt file.
- `wordList.csv` contains a list of words found in `wav_SAMPLES`. The list also contains the syllable structure and lexical stress information of each word.
- `new_wordList.csv` contains a list of words found in three wav files from the speaker ea4.
- `data_analysis.Rmd` contains code for data cleaning and analysis.
- `data_analysis.md` is the same as the above, but as an md file.
- `plots` has plots from `data_analysis.Rmd`.
- `scratchpad` has code I tried and documented for my project.
- `wav` has 129 wav files listed in `KOR_mono.csv`.
- `wav_SAMPLES` has the subset of wav files from `wav` that are annotated in TextGrid files.