add a function--ApplyAddAdditiveNoise #10
LvHang wants to merge 23 commits into pegahgh:xvector-feat-extraction
| // In the version, we ask the noise_cols >= input_cols. If mfcc, the cols are equal. | ||
| // If raw data, we ask the noise_cols > input_cols. | ||
| int32 input_rows = input_eg.NumRows(), input_cols = input_eg.NumCols(); | ||
| KALDI_ASSERT(noise_eg.NumCols() >= input_cols); |
The dimensions of the noise eg and the input should be equal: noise_eg.NumCols() == input_cols.
@pegahgh
Hi Pegah,
I know that noise_eg.NumCols() should equal input_cols in the feature domain, e.g. for MFCCs.
I allowed noise_eg.NumCols() > input_cols only because I wanted to do something like what you wrote in ApplyPerturbation(): it lets noise_eg be a little longer than input_eg, which may be useful in the raw-data case.
I just wanted to make sure. Thanks a lot.
Hang
For now, we can focus on the MFCC domain; if it gives us an improvement, we can switch to raw waveforms.
We may need to write a different perturbation class for raw waveforms, since we have more flexibility in the raw-waveform domain.
| // This function add the noise to the orginial signal. We should not normalize | ||
| // the signal level of the orginial signal. According to SNR, we rescale the noise | ||
| // and add it. So that the perturbed signal is created. | ||
| void ApplyAddAdditiveNoise(const int32 &SNR, |
Change the name to ApplyAdditiveNoise
| start_col_ind, input_cols)); | ||
| // compute the energy of noise and input | ||
| Matrix<BaseFloat> input_energy_mat(input_rows, input_cols); | ||
| input_energy_mat.AddMatMatElements(1.0, input_eg, input_eg, 1.0); |
Although input_energy_mat is initialized to zero, this should be AddMatMatElements(1.0, input_eg, input_eg, 0.0).
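To illustrate why the last argument matters, here is a minimal pure-Python sketch of the update rule `this = beta * this + alpha * (A .* B)` that AddMatMatElements applies. The helper name `add_mat_mat_elements` and the list-of-lists matrices are assumptions made for illustration; this is not Kaldi code.

```python
def add_mat_mat_elements(this, alpha, a, b, beta):
    """In place: this = beta * this + alpha * (a .* b), element by element."""
    for i, row in enumerate(this):
        for j in range(len(row)):
            row[j] = beta * row[j] + alpha * a[i][j] * b[i][j]


input_eg = [[1.0, -2.0], [3.0, 0.5]]

# beta = 0.0 overwrites the destination, so stale contents cannot leak in.
energy = [[9.0, 9.0], [9.0, 9.0]]  # deliberately non-zero to show the effect
add_mat_mat_elements(energy, 1.0, input_eg, input_eg, 0.0)

# beta = 1.0 accumulates onto whatever was already stored.
accumulated = [[9.0, 9.0], [9.0, 9.0]]
add_mat_mat_elements(accumulated, 1.0, input_eg, input_eg, 1.0)
```

With beta = 0.0 the result does not depend on how the destination matrix was initialized, which makes the intent explicit.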
It is not a good idea to design the code like this. You should write this function in signal-distort.h, and add-noise and snr should be added as options to the XvectorPerturbOptions struct.
The function should be ApplyAdditiveNoise(const VectorBase input, const VectorBase noise, BaseFloat snr, Vector *noisy_input)
In class PerturbXvectorSignal you have ApplyDistortion, which is a general function that applies all types of distortion to the input.
It applies the distortions according to opts_.
You need to add a function PerturbExamples(const XvectorOptions opts, const Matrix &input_egs, Matrix *perturbed_egs).
This function is called in nnet3-xvector-signal-perturb.cc; it creates a PerturbXvectorSignal object, vectorizes the input, and calls ApplyDistortion to apply the different types of distortion to the input.
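As a rough sketch of what a vector-based ApplyAdditiveNoise could compute, the Python below rescales the noise (not the signal) so that the mixture reaches a requested SNR in dB. The function names, the equal-length requirement, and the handling of silent noise are assumptions for illustration only, not the PR's implementation.

```python
import math


def apply_additive_noise(signal, noise, snr_db):
    """Return signal + scale * noise so that the mix has the requested SNR (dB)."""
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    signal_energy = sum(x * x for x in signal)
    noise_energy = sum(n * n for n in noise)
    if noise_energy == 0.0:
        return list(signal)  # silent noise: nothing to add
    # From SNR_dB = 10 * log10(signal_energy / (scale^2 * noise_energy)).
    scale = math.sqrt(signal_energy / (noise_energy * 10.0 ** (snr_db / 10.0)))
    return [x + scale * n for x, n in zip(signal, noise)]


def measured_snr_db(signal, noisy):
    """SNR in dB of `noisy` relative to the clean `signal`."""
    signal_energy = sum(x * x for x in signal)
    added_energy = sum((y - x) ** 2 for x, y in zip(signal, noisy))
    return 10.0 * math.log10(signal_energy / added_energy)


clean = [0.5, -1.0, 0.25, 0.75]
noise = [0.1, 0.2, -0.3, 0.05]
noisy = apply_additive_noise(clean, noise, snr_db=10.0)
```

Because the energies here are sums of squares, the exponent under the square root is snr_db / 10; measuring the SNR of `noisy` against `clean` with `measured_snr_db` is a quick way to check the scale factor.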
src/feat/signal-distort.cc
Outdated
| void PerturbXvectorSignal::ApplyAdditiveNoise(const MatrixBase<BaseFloat> &input_eg, | ||
| const Matrix<BaseFloat> &noise_eg, | ||
| const int32 &SNR, | ||
| Matrix<BaseFloat> *perturb_eg) { |
src/feat/signal-distort.cc
Outdated
| // and add it. So that the perturbed signal is created. | ||
| void PerturbXvectorSignal::ApplyAdditiveNoise(const MatrixBase<BaseFloat> &input_eg, | ||
| const Matrix<BaseFloat> &noise_eg, | ||
| const int32 &SNR, |
You don't need to define snr here. SNR is defined in XvectorPerturbOptions, so you can use opts_.snr.
Also, you should not use uppercase in the names of function variables.
The names of variables (including function parameters) and data members are all lowercase, with underscores between words.
src/feat/signal-distort.cc
Outdated
| const kaldi::nnet3::NnetIo &noise_eg_io = noise_eg.io[0]; | ||
| Matrix<BaseFloat> noise_eg_mat; | ||
| noise_eg_io.features.CopyToMat(&noise_eg_mat); | ||
| int32 SNR = opts_.snr; |
Add these lines to the nnet3-perturb-egs binary.
As I told you, you just have to call the PerturbExamples function in nnet3-perturb-egs.cc. The loop for reading egs belongs in the nnet3 binary, not here!
PerturbExamples should be defined as a separate function, not as a member function of this class. The point is that inside PerturbExamples you create an object of class PerturbXvectorSignal and call ApplyDistortion.
src/feat/signal-distort.h
Outdated
| #include "feat/resample.h" | ||
| #include "matrix/matrix-functions.h" | ||
| #include "cudamatrix/cu-matrix.h" | ||
| #include "nnet3/nnet-example.h" |
Remove it; it is a wrong dependency!
src/feat/signal-distort.h
Outdated
|
|
||
| void ApplyAdditiveNoise(const MatrixBase<BaseFloat> &input_eg, | ||
| const Matrix<BaseFloat> &noise_eg, | ||
| const int32 &SNR, |
| const Matrix<BaseFloat> &input_egs, | ||
| Matrix<BaseFloat> *perturb_egs) { | ||
| //new a PerturbXvectorSignal object and call ApplyDistortion | ||
| PerturbXvectorSignal perturb_xvector(opts); |
- Change perturb_egs to perturbed_egs
- Change perturb_xvector to perturb_egs.
src/feat/signal-distort.cc
Outdated
| void PerturbXvectorSignal::ApplyDistortion(const MatrixBase<BaseFloat> &input_egs, | ||
| Matrix<BaseFloat> *perturb_egs) { | ||
| // conduct ApplyAdditiveNoise | ||
| if (!opts_.add_noise_rspecifier.empty()) { |
I think the best strategy is to have an add-noise option in XvectorPerturbOptions that takes a noise rspecifier rather than noise examples,
e.g. --add-noise=noise.scp, where noise.scp lists the features for the different noises. You can then select among the noises at random.
That way you no longer need to pass a noise matrix to PerturbExamples; you can simply pass the noise rspecifier with the --add-noise option.
You don't need to change the ApplyAdditiveNoise function. In ApplyDistortion, just check whether add-noise is nonempty, then read the noise matrix using BaseFloatMatrixReader and pass it to ApplyAdditiveNoise.
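The selection step described above can be sketched in Python as follows. A plain dict stands in for the table that the noise rspecifier would point to, and `pick_noise` together with the noise names are hypothetical; reading a real scp/ark table is outside the scope of this sketch.

```python
import random


def pick_noise(noise_table, rng):
    """Pick one (noise_name, noise_mat) entry uniformly at random."""
    if not noise_table:
        raise ValueError("noise table is empty")
    noise_list = sorted(noise_table)  # stable key order for reproducibility
    noise_name = noise_list[rng.randrange(len(noise_list))]
    return noise_name, noise_table[noise_name]


# Stand-in for the features listed in noise.scp (values are made up).
noise_table = {
    "babble": [[0.1, 0.2]],
    "street": [[0.3, 0.4]],
    "fan": [[0.0, 0.1]],
}
noise_name, noise_mat = pick_noise(noise_table, random.Random(0))
```

Passing the random generator in explicitly keeps the choice reproducible when a fixed seed is used.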
src/feat/signal-distort.cc
Outdated
| if (!opts_.add_noise.empty()) { | ||
| // choose a noise from the noise.scp/ark | ||
| // 1) we need to record the keys of noise_egs | ||
| std::vector<std::string> list_noise_egs; |
These are no longer noise_egs; the name should be noise_list!
src/feat/signal-distort.cc
Outdated
| noise_seq_reader.Close(); | ||
|
|
||
| // 2) we random choose an noise example | ||
| int32 num_noise_egs = list_noise_egs.size(); |
num_noises is a better name than num_noise_egs!
src/feat/signal-distort.cc
Outdated
| ApplyAdditiveNoise(input_egs, *noise_egs_, perturb_egs); | ||
| // conduct others | ||
| // TODO | ||
| } else { // deal with the opts_.noise_egs situation |
No need for the else branch!
We can compose several different perturbations to generate perturbed_egs.
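A minimal Python sketch of this idea: each perturbation is guarded by its own independent `if`, so any combination can be enabled at once. The option keys `noise` and `gain` are hypothetical placeholders, and the second perturbation is invented purely to show composition.

```python
def apply_distortion(opts, input_egs):
    """Apply every enabled perturbation in turn; none excludes the others."""
    perturbed_egs = [row[:] for row in input_egs]  # start from a copy
    if opts.get("noise") is not None:
        # Additive noise, assumed already scaled and the same shape as the input.
        perturbed_egs = [[x + n for x, n in zip(row, noise_row)]
                         for row, noise_row in zip(perturbed_egs, opts["noise"])]
    if opts.get("gain", 1.0) != 1.0:
        # Hypothetical second perturbation: a constant gain.
        perturbed_egs = [[opts["gain"] * x for x in row] for row in perturbed_egs]
    return perturbed_egs


both = apply_distortion({"noise": [[1.0, 1.0]], "gain": 2.0}, [[1.0, 2.0]])
neither = apply_distortion({}, [[1.0, 2.0]])
```

With an `else`, enabling the first perturbation would silently disable every later one.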
src/feat/signal-distort.cc
Outdated
| } | ||
| } | ||
|
|
||
| // This function is a entrance. It calls ApplyDistortion to apply different |
src/feat/signal-distort.h
Outdated
| opts->Register("noise-egs", &noise_egs, "If supplied, the additive noise is added to input signal."); | ||
| opts->Register("rand_distort", &rand_distort, "If true, the signal is slightly changes" | ||
| "using some designed FIR filter with no zeros."); | ||
| opts->Register("add-noise", &add_noise, "specify a file contains some noise egs"); |
Change the description! E.g.: "Noise rspecifier for additive noises; if nonempty, an additive noise is randomly selected and added to the input egs."
| } | ||
| }; | ||
|
|
||
| class PerturbXvectorSignal { |
Add a comment describing the PerturbXvectorSignal class.
src/feat/signal-distort.h
Outdated
| public: | ||
| PerturbXvectorSignal(XvectorPerturbOptions opts): opts_(opts) { }; | ||
|
|
||
| inline void SetNoiseEgs(const Matrix<BaseFloat> &noise_egs) { |
Remove this. You don't need noise_egs_ as a private member of the class.
src/feat/signal-distort.h
Outdated
| XvectorPerturbOptions opts_; | ||
| // if we want use many examples in once ApplyDistortion, we can expand the point | ||
| // to a point vector. | ||
| const Matrix<BaseFloat> *noise_egs_; |
src/feat/signal-distort.cc
Outdated
| std::string key_noise_eg = list_noise_egs[index_noise_eg]; | ||
| RandomAccessBaseFloatMatrixReader noise_random_reader(opts_.add_noise); | ||
| Matrix<BaseFloat> noise_eg_mat = noise_random_reader.Value(key_noise_eg); | ||
| SetNoiseEgs(noise_eg_mat); |
Remove this line, and also rename noise_eg_mat to noise_mat and key_noise_eg to noise_name!
src/feat/signal-distort.cc
Outdated
| Matrix<BaseFloat> noise_eg_mat = noise_random_reader.Value(key_noise_eg); | ||
| SetNoiseEgs(noise_eg_mat); | ||
|
|
||
| ApplyAdditiveNoise(input_egs, *noise_egs_, perturb_egs); |
You can use noise_mat directly; why do you use noise_egs_?
| input_end_point - input_start_point + 1); | ||
| SubVector<BaseFloat> noise_part(noise, noise_start_point, | ||
| noise_end_point - noise_start_point + 1); | ||
| Vector<BaseFloat> selected_noise(input_part.Dim()); |
| while (the_rest > noise_part.Dim()) { | ||
| selected_noise.Range(selected_noise.Dim()-the_rest, | ||
| noise_part.Dim()).CopyFromVec(noise_part); | ||
| the_rest = the_rest - noise_part.Dim(); |
the_rest is not an appropriate name.
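For reference, the loop in this hunk repeats the noise segment until it covers the input segment. A Python sketch of that tiling, using `num_remaining` as one possible more descriptive name (an assumption, not a name proposed in the review), might look like this:

```python
def tile_noise(noise_part, target_len):
    """Repeat noise_part until target_len samples are filled, truncating the last copy."""
    if not noise_part:
        raise ValueError("noise_part must be non-empty")
    selected_noise = []
    num_remaining = target_len  # samples of the input segment still uncovered
    while num_remaining > 0:
        chunk = noise_part[:num_remaining]  # whole copy, or the final partial copy
        selected_noise.extend(chunk)
        num_remaining -= len(chunk)
    return selected_noise


tiled = tile_noise([1, 2, 3], 7)
```

A counter named after what it counts makes the loop's termination condition easier to check at a glance.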
| "nnet3-fvector-perturb-signal --noise-range-file=uttid.range.n --add-noise-list=" | ||
| "scp:noise.scp --input-channel=0 input.wav output.wav\n"; | ||
| "nnet3-fvector-perturb-signal --noise=scp:noise.scp --noise-range=" | ||
| "\"head -n 5 a.noiserange | tail -n 1\" --input-channel=0 input.wav " |
Don't write head -n ... in the usage message. Write actual values for the range line in the description.
| controller.push_back(NoiseController(wav_t_start, wav_t_end, noise_uttid, | ||
| noise_t_start, noise_t_end, snr)); | ||
| } | ||
| if (noise_range != "") { |
| BaseFloat scale_factor = sqrt(input_energy/ noise_energy/ (pow(10, snr/20)) ); | ||
| output.Range(input_start_point, input_part.Dim()).AddVec(scale_factor, selected_noise); | ||
| } | ||
| ApplyNoise(noise, controller, input, &output); |
Rename "controller" in the code.
Then you can combine the wav2perturbedwav files generated by different perturbation functions in the future. 2.2. You should pass ranges.scp, wav2perturbedwav, and data/train/wav.scp as input to the Python script generate-pertub-wav-specifier.py, which will generate a new wav.scp, e.g. data/train_pertubed/wav.scp
Force-pushed: d846e5d to c6ec39e
Hi Pegah,
I will take care of updating utt2spk and the other files in the data directory. The examples have been sent to your mailbox.
Hang
| wav_extended_files[current_wav_index], | ||
| args.noise, | ||
| perturbed_range_contents[current_perturbed_wav_index]),file=g) | ||
| g.close() |
None of the variable names you used are informative, but we can fix that later.
I expect the code to compile and run properly.
| "<utterance-id> <approx-num-frames>)") | ||
| parser.add_argument("oriutt2allutt", | ||
| help="oriutt2allutt to be used as input (format is: " | ||
| "<ori-utt-id> <ori-utt-id> <p1-utt-id> ... <pn-utt-id>)") |
Do you mean origutt2allutt? It would be better to name it utt2perturbedutt!
| f.close() | ||
|
|
||
| f = open(args.egs_dir + "/temp/" + prefix + "outputs." + str(job + 1), "w") | ||
| if f is None: |
I don't really like temp as a directory name. You can rename it to aux_info.
| utt_len = lengths[utt_index] | ||
| offset = GetRandomOffsets(utt_len, args.frames_per_chunk) | ||
| this_egs.append( (utt_index, offset) ) | ||
| all_egs.append(this_egs) |
In some situations the variable all_egs gets large, which can cause I/O problems and slow down the program.
So it may be better to write out and read back the egs separately for each archive. What is the running speed of allocate_egs in your case?
Also, do you discard the utterances for which you have already generated pairs?
From line 180 to 189:
I generate an 'egs list' for each archive. Each item in the 'egs list' is a tuple (utt, offset).
Later in the code, I use each tuple to generate the final information that nnet3-fvector-get-egs.cc parses.
I use 'utt' to randomly select two utterances from ' ... ' to generate ' '. 'offset' is used to assign "", which is the distance between the beginning of the utterance and the start frame.
So, in the end, I generate the information as follows:
I'm not sure about the running speed; I have only used a few utterances to test whether the code is correct. I use the same method as David for this part.
| this_egs = [ ] # this will be an array of 2-tuples (utterance-index, start-frame). | ||
| for n in range(this_num_egs): | ||
| utt_index = RandomUttAtLeastThisLong(args.frames_per_chunk) | ||
| utt_len = lengths[utt_index] |
You should have a variable exclude_utt: add utt_index to exclude_utt, and then, for each archive, choose a random utterance only from (all_utt - exclude_utt).
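One way to read this suggestion is sketched below in Python. The function `allocate_archive` and its arguments are illustrative assumptions, not the PR's allocate_egs code; the sketch only shows the bookkeeping that prevents an utterance from being drawn twice within one archive.

```python
import random


def allocate_archive(utt_lengths, frames_per_chunk, num_egs, rng):
    """Return (utt_index, offset) pairs, using each utterance at most once."""
    exclude_utt = set()  # utterances already used in this archive
    this_egs = []
    for _ in range(num_egs):
        candidates = [i for i, num_frames in enumerate(utt_lengths)
                      if num_frames >= frames_per_chunk and i not in exclude_utt]
        if not candidates:
            break  # every long-enough utterance has been used
        utt_index = rng.choice(candidates)
        exclude_utt.add(utt_index)
        # Any start frame that leaves room for a full chunk is valid.
        offset = rng.randint(0, utt_lengths[utt_index] - frames_per_chunk)
        this_egs.append((utt_index, offset))
    return this_egs


lengths = [100, 50, 200, 10]
egs = allocate_archive(lengths, frames_per_chunk=40, num_egs=5, rng=random.Random(0))
```

In this example only three utterances are long enough, so fewer examples come back than were requested; whether to stop early or to reset the exclusion set is a design decision left open here.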
|
|
||
| std::string utt_a = fields[0], | ||
| utt_b = fields[1], | ||
| start_frame_str = fields[4], |
fields[2] and fields[3]. You just copied from David's code without changing the name!!
What do you mean?
The format of the information is:
<source-utterance1> <source-utterance2> <relative-archive-index> <absolute-archive-index> <offset> <frame-length>
So fields[2] and fields[3] are the same as David's '<relative-archive-index> <absolute-archive-index>'.
Note: <relative-archive-index> is the zero-based offset of the archive index within the subset of archives that a particular ranges file corresponds to; <absolute-archive-index> is the 1-based numeric index of the destination archive among the entire list of archives, which forms part of the archive's filename (e.g. egs/egs.<absolute-archive-index>.ark). <absolute-archive-index> is kept only for debugging purposes, so you can see which archive each line corresponds to.
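To make the six-field layout concrete, here is a small Python sketch that parses one such line. The function names, dictionary keys, and sample utterance ids are hypothetical; only the field order and the egs/egs.<absolute-archive-index>.ark naming come from the description above.

```python
def parse_range_line(line):
    """Split one ranges line into its six named fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError("expected 6 fields, got %d: %r" % (len(fields), line))
    return {
        "utt_a": fields[0],
        "utt_b": fields[1],
        "relative_archive_index": int(fields[2]),  # zero-based, per ranges file
        "absolute_archive_index": int(fields[3]),  # one-based, kept for debugging
        "start_frame": int(fields[4]),             # the <offset> field
        "num_frames": int(fields[5]),              # the <frame-length> field
    }


def archive_filename(entry, egs_dir="egs"):
    """Destination archive name built from the absolute archive index."""
    return "%s/egs.%d.ark" % (egs_dir, entry["absolute_archive_index"])


entry = parse_range_line("uttA-p1 uttA-p2 0 3 120 200")
```

Naming the fields once at parse time avoids scattering bare indices such as fields[4] through the rest of the code.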
@LvHang
Meanwhile, we can start preparing our raw-waveform setup for fvector. (I think all the code for this experiment is also ready. Am I right?)
@pegahgh
Hi Pegah,
I have finished the function that adds additive noise; it is enabled with the "--add-noise" option.
Please check it.
Thank you for your guidance.
Hang