jvlab/low-level_language_processing

This document accompanies the code available online at http://www.github.com/jvlab/low-level_language_processing for the analysis pipeline reported in Low level language processing in severe brain injury patients (Jain, Conte, Voss, Victor, and Schiff, Brain Communications, 2023) and serves as a guide to analyzing electroencephalography (EEG) data to assess low-level language processing. Specifically, it walks through the steps to analyze differential phoneme class-specific responses (DPR) and tracking of the natural speech envelope (NSE). The DPR pipeline can be extended to study responses to syllables and words, provided the analysis involves responses with uniform time intervals. Sample analysis files for both analyses are available in the scripts folder: makeFigures_1_2_3_HC for DPR and makeSupplementaryFigures_2_3_HC for NSE. They generate the pipeline figures reported for healthy controls in the manuscript for the DPR analysis and in the supplementary material for the NSE analysis.

The code was written in MATLAB 2018b, and we recommend using the same version, as other versions may run into issues such as deprecated or renamed functions. Regardless of the MATLAB version, the code requires the DSP toolbox to be installed (installation details are not provided here).

There are four folders:

- sampleInputFiles contains the input files provided for the demo analysis.
- code contains the scripts used for analysis.
- scripts contains the sample files and other files they use for plotting, as well as demo scripts for both analyses, which can run other datasets with custom arguments.
- eeglab contains the version of EEGLAB used for making topoplots.

The main folder has the file setupPath.m, which should be run first to set up appropriate paths. After this, the user can run the sample scripts makeFigures_1_2_3_HC (DPR) and makeSupplementaryFigures_2_3_HC (NSE) from anywhere; they generate the left panels of Figures 1, 2, and 3 and of Supplementary Figures 2 and 3, respectively.

The accompanying code was written by Parul Jain, who can be contacted at jainparul321@gmail.com.

1. Differential phoneme class-specific responses (DPR)

This section walks through the analysis pipeline for DPR by following the healthy control example reported in Figure 1 (left panels). The results can be generated by running the makeFigures_1_2_3_HC script.

1.1 Required files

Four text files are required for this analysis: one containing the EEG responses to the audio stimulus for a trial and three files that will be used to map the annotated phonemes to their phoneme classes for analysis.

Note: The three files used for mapping phonemes (1.1.2-1.1.4) must have a header line naming each column, the columns must be in the order provided in the sample files, a delimiter must separate the columns, and all three files must use the same delimiter. If your files contain phonetic symbols, all three must be saved with UTF-16BE encoding, which is required to encode the phoneme symbols. In Notepad, choose "Save as" from the File menu and pick the encoding from the drop-down list. In Sublime Text, choose "Save with encoding" from the File menu and pick UTF-16 BE or UTF-16 BE with BOM.

1.1.1 file containing EEG responses

This file contains the timestamps, record IDs, the EEG responses (which must include at least all the recorded channels), and the triggers marking the start and end of the stimulus.

The file for the sample HC (healthy control), named sample_HC_data.txt, is in the folder sampleInputFiles. Metadata and timestamps were edited to de-identify the dataset. Since the file was exported from XLTek, the first 15 lines contain metadata; we skip these lines and start reading from the first record on line 16. Each record has the timestamp, a unique record ID, the 128 channels in the head box, and the trigger marker at the end. The marker takes the value ON for the records corresponding to the start and end of the stimulus and is OFF otherwise. This marker is controlled by the Presentation software, which feeds the audio to the head box via a separate input. We used a 37-electrode montage to record the EEG responses, corresponding to the first 37 channels in the text file.

1.1.2 file containing phoneme start and end times

Annotation of an audio file – manual or automatic – results in a text file with the start and end times of the entities (phonemes, words, etc.) specific to the annotation software/scheme.

We used the P2FA toolkit, which annotated phonemes, words, and silence and exported the annotation as phonemeTimes.txt. We use the file and drop the extra entities when we map the phonemes to their classes.

1.1.3 file containing phoneme-to-phoneme class mapping

This file maps the annotated phonemes to their classes. By including mapping for only the entities of interest, we can drop the other entities (silence, words). The labels can be changed to accommodate the classification of interest. For example, you can classify the phonemes as vowels and consonants or study them as individual phonemes (identity mapping).

We mapped all phonemes to the five phoneme classes used for analysis in file phonemeClassMap.txt. The phonemes are classified based on the manner of articulation as approximants, fricatives, nasals, plosives, and vowels.

1.1.4 file containing phoneme class to phoneme class-ID (numeric) mapping

This file maps the phoneme classes to unique numeric values for ease of processing in the analysis. The mapping from phonemes to numbers is done in two steps to retain human-readable mappings for recordkeeping. Again, the mappings can be changed to accommodate the classification of interest; for example, to study phonemes individually, each phoneme would be given a unique numeric ID.

We created the file classIDMap.txt, numbering the phoneme classes from 1 to 5 in alphabetical order.

1.2 Labelling phonemes with their phoneme class IDs for analysis

We generate a MATLAB array with three columns containing the start time, phoneme class ID, and stop time of the phoneme instances using the files in 1.1.2-1.1.4.

First, we create a structure with the paths of the input files and the delimiter, in this case a tab (default value '\t'). The structure allows us to keep track of the parameters and files used for the analysis.

Next, we pass this structure to loadAnnotationFile, which generates the desired array and its human-readable form in three steps:

a. Read the file containing phoneme start and end times (section 1.1.2) using the delimiter.

b. Read the phoneme-to-phoneme-class mapping file (section 1.1.3) and use it to replace the phoneme values with their phoneme classes, discarding entities with no mapping. The resulting array has a human-readable mapping format and is returned as the second output of the function.

c. Read the file containing the phoneme class to ID mapping (section 1.1.4) and use it to replace the phoneme class values with unique numeric IDs, discarding phoneme classes without an ID mapping. The resulting array with mapped phoneme classes is the first output and is used for further analysis.

In makeFigures_1_2_3_HC.m, setupAnnotationSpecs loads the structure annotationSpecs containing the default values for the analysis. The structure has four fields with the details of the three files and their delimiter. For the analysis done in the paper, including Figures 1-3, 'phonemeTimes.txt' (default value of field timeMarkerFile) contains the phoneme time markers extracted by annotating the audio file (section 1.1.2), 'phonemeClassMap.txt' (default value of field entityClassMapFile) contains the mapping of phonemes to phoneme classes (section 1.1.3), 'classIDMap.txt' (default value of field entityClassIDMapFile) contains the mapping of phoneme classes to unique integer IDs (section 1.1.4), and '\t' (default value of field delimiter) is the delimiter used in the three files. This structure is passed as input to loadAnnotationFile, which gives two outputs: the structure idTaggedTimeMarkers, with field data containing the phoneme class IDs and their start and end times and field annotationSpecs with the specifications for recordkeeping, and the array classifiedTimeMarkers, containing the same details as idTaggedTimeMarkers.data in a human-readable format with phoneme class labels instead of phoneme class IDs.
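As a minimal sketch (assuming the default field names above; the sample script sets the same values via setupAnnotationSpecs), this step amounts to:

```matlab
% Sketch of section 1.2 with the default annotation specifications.
annotationSpecs.timeMarkerFile       = 'phonemeTimes.txt';    % section 1.1.2
annotationSpecs.entityClassMapFile   = 'phonemeClassMap.txt'; % section 1.1.3
annotationSpecs.entityClassIDMapFile = 'classIDMap.txt';      % section 1.1.4
annotationSpecs.delimiter            = '\t';

[idTaggedTimeMarkers, classifiedTimeMarkers] = loadAnnotationFile(annotationSpecs);
% idTaggedTimeMarkers.data: [start time, phoneme class ID, end time] per instance
% classifiedTimeMarkers: the same rows with human-readable class labels
```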

1.3 Extraction of phoneme class-specific responses

Using the array generated in section 1.2, we extract the phoneme class-specific responses from the input dataset file (section 1.1.1) in three steps.

1.3.1 Importing subject dataset into MATLAB

We first load the complete dataset, irrespective of the triggers, using readXLTekRecordingTextFile, which takes a single input: a structure containing details about the dataset and parameters for processing. The structure includes the name of the file to import, the number of header lines to skip, the regex describing a record, the maximum number of electrodes that the head box can record (which determines the number of columns), and an array of recorded channel numbers indicating which columns to read. Other fields used in the following steps and discussed in the relevant sections are included in the same structure for recordkeeping. The function also checks for skipped or doubled records to flag corrupt input files. It returns the structure recordingData containing fields for data, dataSpecs, and other record details included in the input file.

The dataset should ideally be filtered to reduce noise after this step. One way is to use the filterData function, which applies a 60 Hz notch filter and a band-pass filter. You can also use another MATLAB function or write your own.

In makeFigures_1_2_3_HC.m, setupDataSpecs.m loads the structure dataSpecs with the default values: the file name (fileName) is sample_HC_data.txt, the frequency of data collection (dataCollectionFrequency) is 250 Hz, the number of header lines to skip (numHeaderLines) is 15, the maximum number of electrodes that the head box can record (numHeadboxElectrodes) is 128, the regex of the records (textFormat) is set to read the date, time, record ID, and all 128 channels, and the indices of the recorded channels (channelsRecorded) are 1:37. Default values for other steps are also defined in dataSpecs and are discussed in the relevant sections.

This structure is passed to readXLTekRecordingTextFile, and the output is the structure recordingData, with its data field containing the EEG responses. The output is filtered using filterData.m with the frequency range 2-15 Hz, and the filtered output replaces the unfiltered EEG in the data field of recordingData.
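A minimal sketch of this import-and-filter step (the exact filterData signature is repo-specific, so the call form below is an assumption; see filterData.m in the code folder):

```matlab
% Sketch of section 1.3.1. setupDataSpecs populates dataSpecs with the
% defaults described above (fileName, numHeaderLines, textFormat, ...).
setupDataSpecs;                                         % loads dataSpecs
recordingData = readXLTekRecordingTextFile(dataSpecs);  % recordingData.data: raw EEG
% Band-pass to 2-15 Hz plus the 60 Hz notch; this one-argument call form
% is an assumption, the frequency range may be passed explicitly instead.
recordingData.data = filterData(recordingData.data);
```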

1.3.2 Extracting response to a stimulus using trigger markers

After reading the recorded data into a MATLAB structure, passing it to the extractStimulusSpecificResponse function produces a structure containing the response specific to the stimulus, extracted using the triggers. The function uses a trigger defined in the dataset specification structure, marking the start and end of the stimulus in the file. The relevant response is extracted and standardized per channel to zero mean and unit variance. It returns a structure containing the standardized responses and the dataset specification details.

In makeFigures_1_2_3_HC, extractStimulusSpecificResponse is called with the structure recordingData as input and produces the structure stimulusSpecificResponse as output. The data field of the output structure contains the response specific to the audio stimulus, extracted using the defined trigger marker. The trigger (triggerString) is defined by dataSpecs and passed on to recordingData in 1.3.1.
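Continuing the sketch (the trigger string travels from dataSpecs into recordingData, so no extra argument is needed here):

```matlab
% Sketch of section 1.3.2: extract and standardize the stimulus-locked response.
stimulusSpecificResponse = extractStimulusSpecificResponse(recordingData);
% stimulusSpecificResponse.data: per-channel response between the ON triggers,
% standardized to zero mean and unit variance.
```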

Note: extractStimulusSpecificResponse extracts the response to only one stimulus, with its start and end marked by the trigger. If the paradigm has multiple stimuli, for example forward audio, rest, and backward audio, the responses to the forward and backward audio must be exported in separate files such that each file has only two triggers, marking the start and end of the forward (or backward) audio. If both are included in the same file, readXLTekRecordingTextFile will read it, but extractStimulusSpecificResponse will throw an error telling you that the file has four triggers but needs two. Similarly, if only one of the markers makes it into the text file while exporting data from the input file, readXLTekRecordingTextFile will read it, but extractStimulusSpecificResponse will throw an error telling you that the file has only one trigger but needs two.

1.3.3 Extracting phoneme class-specific responses using time markers

We can extract phoneme class-specific responses by passing the time markers obtained in section 1.2 and stimulus-specific responses extracted in section 1.3.2 as inputs to extractEntitySpecificResponses.

The function uses several details included in the dataset specification structure: the interval to extract (in seconds), referenced to the start of a phoneme; the noisy intervals to discard (in seconds), referenced to the start of the stimulus; the frequency at which the data were collected; and an amplitude threshold above which an extracted phoneme class-specific response is automatically removed as containing artifacts. Phonemes are short and vary in length (mean interval 74 ms, range 4 ms to 312 ms), but we extracted uniform responses starting 200 ms before and lasting 500 ms after the start of each phoneme. We defined the intervals to discard by manually inspecting the stimulus-specific response in EEGLAB for artifacts; datasets with many artifacts were discarded entirely. The data used here were collected at 250 Hz. We used ten standard deviations as the threshold for the automated removal of artifact-ridden responses; since the stimulus-specific response is standardized, the threshold value used is 10.

In makeFigures_1_2_3_HC, extractEntitySpecificResponses is called with the structures stimulusSpecificResponse (from section 1.3.2) and idTaggedTimeMarkers (from section 1.2), producing the structure phonemeSpecificResponses. The output structure has a field for the phoneme-specific responses and another field tracking their respective phoneme class IDs. The phoneme mappings are provided in the field annotationSpecs.

Some default parameters used for extracting phonemes are defined in setupDataSpecs.m: two intervals are discarded (intervalsToDiscard = [0 3; 145 150], in seconds), the extraction interval for phonemes in seconds (extractionInterval) is [-0.2 0.5], and the amplitude cutoff for automatic removal of phoneme responses (maxCutoff) is 10.
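A sketch of this extraction call, under the same assumptions as the earlier sketches:

```matlab
% Sketch of section 1.3.3 with the default extraction parameters
% (extractionInterval = [-0.2 0.5] s, maxCutoff = 10, intervalsToDiscard).
phonemeSpecificResponses = extractEntitySpecificResponses( ...
    stimulusSpecificResponse, idTaggedTimeMarkers);
% Output fields: the uniform per-phoneme responses and the phoneme class ID
% of each response; annotation details travel along in annotationSpecs.
```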

Note: Using extractEntitySpecificResponses, extraction can be done at the level of other entities, such as syllables and words, given an appropriate extraction interval.

1.4 Statistical analysis and false discovery rate (FDR) correction

We conduct pairwise statistical testing for all channels and time points to measure the difference in the extracted responses between phoneme classes. This testing can be done using theoretical p-value estimates looked up from a table or by generating null datasets by shuffling the labels of the entities. The pairwiseStatisticalTesting function takes the structure containing the phoneme class-specific responses and a structure with the specifications for statistical testing. These specifications include the handle of the statistical test function, the IDs of the phoneme class pairs to compare, and a flag selecting empirical (true) or theoretical (false) estimation of p-values. The IDs of phoneme class pairs can be a list of pairs or the string 'all' to compare all pairs. If the flag for empirical estimation is true, p-values are estimated using shuffled datasets, and the number of shuffles and the number of blocks into which to break the responses must also be provided. If the number of blocks is 1, the shuffling destroys both short- and long-range correlations; with more blocks, short-range correlations can be retained. The statistical test specification structure also allows defining seeds for shuffling so that the results are reproducible. If the flag is false, p-values are estimated theoretically using the Wilcoxon rank-sum test. For either value of the flag, the result is a structure with the spatiotemporal arrays of p-values and statistic values and all specification details (data, annotation, and statistical). For empirical testing, a second output provides the statistic values of the null distribution.

For FDR correction, we must define the rate and select the strategy (across time, space, or both). Here, we used a rate of 0.05 and corrected across time for each channel using the Benjamini-Hochberg method. Along with this structure, we feed the p-values to allPairsFDRCorrection, which returns a map of spatiotemporal points with a significant difference in the responses to the individual phoneme class pairs, together with the FDR-corrected cutoff p-values. The resulting significance maps can be combined across pairs, repeats, and subjects.

In makeFigures_1_2_3_HC, setupStatisticalTestSpecs.m loads the structure statisticalTestSpecs with the default values: the test function used (statisticalTestFunction) is the handle to estimateRanksum.m, the test is two-tailed (tailed set to 'both'), the IDs to compare (idPairsToCompare) are set to 'all' to compare all phoneme class pairs, and the p-value estimation flag (empiricalEstimateFlag) is false, which invokes the Wilcoxon rank-sum test. If the flag is set to true, default values for the number of shuffles (numShuffles = 1000), the number of blocks to shuffle (numBlocks = 10), and the seeds for shuffling (seeds = 1:1000) are also initialized.

After loading statisticalTestSpecs, pairwiseStatisticalTesting is called with structures statisticalTestSpecs and phonemeSpecificResponses (from section 1.3), which results in the structure statisticalTestResults.
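A sketch of the testing call (the argument order below follows the description in section 1.4 and is an assumption; see the sample script for the exact invocation):

```matlab
% Sketch of the pairwise-testing step with the default specifications.
setupStatisticalTestSpecs;   % loads statisticalTestSpecs (ranksum, 'all' pairs)
statisticalTestResults = pairwiseStatisticalTesting( ...
    phonemeSpecificResponses, statisticalTestSpecs);
% statisticalTestResults holds spatiotemporal arrays of p-values and statistics.
```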

Next, for FDR correction, setupFDRSpecs.m loads the structure fdrSpecs with the default specifications: the rate (rate) is 0.05, applied temporally (strategy = 'time').

Finally, allPairsFDRCorrection is called with statisticalTestResults and fdrSpecs, producing significanceMaps containing the details about significant interactions for each phoneme pair. These maps can be averaged across trials and subjects.
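And the corresponding FDR sketch:

```matlab
% Sketch of the FDR step: Benjamini-Hochberg at rate 0.05, across time per channel.
setupFDRSpecs;    % loads fdrSpecs (rate = 0.05, strategy = 'time')
significanceMaps = allPairsFDRCorrection(statisticalTestResults, fdrSpecs);
% significanceMaps marks spatiotemporal points of significant pairwise differences.
```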

1.5 Visualization

First, loadColors loads the color specifications used in the paper.

1.5.1 Visualization of Figure 1A

To plot the responses to a phoneme class, makeResponsePlot can be called with the responses to that class from phonemeSpecificResponses and a structure with the figure specifications. To highlight temporal windows, markSignificanceInResponsePlot can be called with the spatiotemporal significance map for the phoneme pair from significanceMaps and a structure with the figure specifications.

In makeFigures_1_2_3_HC, setupResponsePlotSpecs loads the default specifications for the response plot as figureSpecs. Since we plot two phoneme classes (plosives and vowels) and the windows where their responses are significantly different, we call makeResponsePlot twice with different colors, once for each phoneme class (from phonemeSpecificResponses). The color of the plot can be changed via the field color of figureSpecs. To highlight temporal windows with significance, we change the figure color to yellow and call markSignificanceInResponsePlot with the plosives-versus-vowels comparison and figureSpecs, as sketched below.
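The subsetting of phonemeSpecificResponses by class and of significanceMaps by pair is repo-specific, so plosiveResponses, vowelResponses, plosiveVowelMap, and the color variables below are hypothetical stand-ins:

```matlab
% Sketch of section 1.5.1. plosiveResponses, vowelResponses, and plosiveVowelMap
% are hypothetical subsets of phonemeSpecificResponses and significanceMaps;
% colorPlosives/colorVowels stand in for values loaded by loadColors.
setupResponsePlotSpecs;                    % loads figureSpecs
figureSpecs.color = colorPlosives;
makeResponsePlot(plosiveResponses, figureSpecs);
figureSpecs.color = colorVowels;
makeResponsePlot(vowelResponses, figureSpecs);
figureSpecs.color = 'y';                   % yellow highlight for significant windows
markSignificanceInResponsePlot(plosiveVowelMap, figureSpecs);
```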

1.5.2 Visualization of Figure 2A and 3A

Script makeSummaryPlot makes the summary plots reported in the paper. It takes two arguments, the 2D significance array to be plotted and the specifications of the figure.

In makeFigures_1_2_3_HC, setupSummaryPlotSpecs loads the default specifications for the summary plot. We change the y-limits of the subfigures through the field ylim, a structure with three fields: both, corresponding to the spatiotemporal plot; time, corresponding to the temporal plot; and space, corresponding to the spatial bar plot and the topoplot. Then, we call makeSummaryPlot with the significance array of the phoneme pair to plot (from significanceMaps) and figureSpecs to plot Figure 2A. Similarly, for Figure 3A, we change the ylim values and call makeSummaryPlot with the average of the significance arrays across phoneme pairs (from significanceMaps) and figureSpecs.
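A sketch of the summary plots (the ylim values and the significance-map slices here are hypothetical placeholders):

```matlab
% Sketch of section 1.5.2. pairMap and meanMap are hypothetical slices of
% significanceMaps; the ylim values are placeholders, set them as needed.
setupSummaryPlotSpecs;                  % loads figureSpecs
figureSpecs.ylim.both  = [0 1];         % spatiotemporal plot (hypothetical)
figureSpecs.ylim.time  = [0 0.5];       % temporal plot (hypothetical)
figureSpecs.ylim.space = [0 0.5];       % spatial bar plot and topoplot (hypothetical)
makeSummaryPlot(pairMap, figureSpecs);  % one phoneme pair (Figure 2A)
makeSummaryPlot(meanMap, figureSpecs);  % average across pairs (Figure 3A)
```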

2 Tracking of Natural Speech Envelope (NSE)

This section walks through the analysis pipeline for NSE by following the healthy control example reported in Jain et al., Supplementary Figures 2 and 3. The results can be generated by running the makeSupplementaryFigures_2_3_HC script, which takes about an hour to run.

2.1 Required files

In addition to the dataset file, we need the audio stimulus as a .wav file to extract the speech envelope. The audio stimulus is provided as alice.wav in folder sampleInputFiles.

2.2 Extraction of NSE

To extract the speech envelope from the audio file, we call preprocessStimulusNSE with a structure containing details about the stimulus, such as the name of the stimulus file, the frequency band to filter (same as the data), the data-collection frequency at which to resample the envelope, and the name of the file in which the output will be saved. The script extracts the envelope in four steps:

a. Read the audio file, which results in an array with separate inputs for the left and right ear.

b. Calculate the magnitude of the Hilbert transform of the inputs. This is the most time-consuming step in the process.

c. Filter the magnitude to retain the frequencies of interest.

d. Resample the filtered output at the data-collection frequency.

It returns the input structure with added fields corresponding to the steps above. This structure is also saved to a file that can be loaded for subsequent runs or other trials. Since there are two inputs, one for each ear, we use the average envelope for further analysis (averaging first and then extracting gives comparable results). As this is a time-consuming step, we provide the extracted NSE, band-pass filtered to 2-15 Hz (to match the phoneme analysis) and downsampled to 250 Hz (the frequency of EEG collection), in the file alice_B2Hz_15Hz_F250Hz.mat. Using it reduces the runtime from about an hour to less than 20 minutes.

In makeSupplementaryFigures_2_3_HC, setupStimulusSpecs loads the structure stimulusSpecs with the default details about the stimulus and its processing: the stimulus file (fileName) is 'alice.wav', the frequency band for band-pass filtering (frequencyBand) is the same as the data, i.e., 2-15 Hz, the frequency of the EEG recording (frequency) is 250 Hz, at which the envelope is resampled, and the name of the file where the extracted NSE will be saved is processedFileName. This structure is the input to preprocessStimulusNSE, which returns the speech envelope in the data field of the structure stimulus. For recordkeeping, the stimulus specifications are returned in the field stimulusSpecs of the structure stimulus.
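A sketch of the envelope extraction (the two-element-vector form of frequencyBand is an assumption; see setupStimulusSpecs for the exact representation):

```matlab
% Sketch of section 2.2 with the default stimulus specifications.
stimulusSpecs.fileName          = 'alice.wav';
stimulusSpecs.frequencyBand     = [2 15];   % Hz; assumed two-element form
stimulusSpecs.frequency         = 250;      % Hz, EEG rate for resampling
stimulusSpecs.processedFileName = 'alice_B2Hz_15Hz_F250Hz.mat';

stimulus = preprocessStimulusNSE(stimulusSpecs);
% stimulus.data: speech envelope; stimulus.stimulusSpecs: specs for recordkeeping.
```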

2.3 Measuring cross-correlation between NSE and EEG responses to it

To evaluate how the NSE is tracked in the EEG responses, we measure the cross-correlation between the two signals using estimateNSE and compare it to a null distribution generated by shuffling the NSE segments while keeping the EEG responses fixed. The function takes three inputs: the stimulus-specific response extracted in section 1.3.2, the speech envelope estimated in section 2.2, and a structure containing details for the estimation. The structure contains the block size of the segments, the maximum lag for the cross-correlation, the number of shuffles, and the number of blocks for creating the null datasets. The function returns a structure containing the cross-correlation values; the null dataset can be accessed as the second output.

In makeSupplementaryFigures_2_3_HC, setupNSESpecs.m loads the structure nseSpecs containing the default specifications: the block size of the segments (blockSize) is 2, the maximum lag for cross-correlation (maxLag) is 0.5 seconds, the number of shuffles (numShuffles) is 10,000, and the number of blocks (numBlocks) is 10. Then, estimateNSE is called with stimulusSpecificResponse (from section 1.3.2), stimulus (from section 2.2), and nseSpecs. The first output, nseResponse, contains the cross-correlation values in the field data and the corresponding p-values in the field pValues. The second output, shuffledNSEResponse, contains the values for the null dataset.
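A sketch of the cross-correlation step with these defaults:

```matlab
% Sketch of section 2.3 with the default NSE specifications.
nseSpecs.blockSize   = 2;       % block size of segments
nseSpecs.maxLag      = 0.5;     % seconds
nseSpecs.numShuffles = 10000;
nseSpecs.numBlocks   = 10;

[nseResponse, shuffledNSEResponse] = estimateNSE( ...
    stimulusSpecificResponse, stimulus, nseSpecs);
% nseResponse.data: cross-correlations; nseResponse.pValues: empirical p-values.
```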

2.4 Statistical analysis and FDR correction

The p-values for cross-correlation values are estimated using the shuffled datasets and returned in the field pValues in the output of estimateNSE. FDR correction is done as in section 1.4 and returns a map containing spatiotemporal points marking significant cross-correlation between NSE and its EEG response. These spatiotemporal maps can be averaged across trials and subjects.

2.5 Visualization

Similar to section 1.5, we first call loadColors. Panels corresponding to healthy controls in Supplementary Figures 2 and 3 can be made following steps in sections 1.5.1 and 1.5.2.
