
Add a function: ApplyAddAdditiveNoise #10

Open
LvHang wants to merge 23 commits into pegahgh:xvector-feat-extraction from LvHang:xvector-feat-extraction

Conversation

LvHang commented Nov 29, 2016

@pegahgh
Hi Pegah,
I have finished the function that adds additive noise, enabled with the option "--add-noise".
Please check it.
Thank you for your guidance.

Hang

// In this version, we require noise_cols >= input_cols. For MFCC the column counts are equal;
// for raw data we require noise_cols > input_cols.
int32 input_rows = input_eg.NumRows(), input_cols = input_eg.NumCols();
KALDI_ASSERT(noise_eg.NumCols() >= input_cols);
Owner

The dimensions of the noise eg and the input should be equal: noise_eg.NumCols() == input_cols.

Author

@pegahgh
Hi, pegah.
I know noise_eg.NumCols() and input_cols should be equal in the feature domain, e.g. for MFCC.
I allowed noise_eg.NumCols() > input_cols only because I wanted to do something like what you wrote in ApplyPerturbation(): it makes noise_eg slightly longer than input_eg, which may be useful for raw data.
I just want to make sure. Thanks a lot.
Hang
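A minimal sketch of the idea Hang describes (a noise signal longer than the input, from which a random slice of input length is taken), written in plain Python rather than Kaldi C++. `select_noise_window` and its arguments are hypothetical names used only for illustration; the length check mirrors the `KALDI_ASSERT(noise_eg.NumCols() >= input_cols)` in the patch.

```python
import random


def select_noise_window(noise, input_len, rng):
    """Return a random contiguous slice of `noise` with length `input_len`.

    Hypothetical helper: assumes len(noise) >= input_len.
    """
    if len(noise) < input_len:
        raise ValueError("noise must be at least as long as the input")
    # Valid start offsets are 0 .. len(noise) - input_len, inclusive.
    start = rng.randint(0, len(noise) - input_len)
    return noise[start:start + input_len]


rng = random.Random(0)
noise = [0.1 * i for i in range(10)]
window = select_noise_window(noise, 4, rng)
```

When the noise is exactly as long as the input (the MFCC case the reviewer asks for), the only valid offset is 0 and the whole noise is returned.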

Owner

For now, we can focus on MFCC domain, and if it gives us improvement, we can switch to raw waveform.
We may need to write a different perturbation class for raw waveforms, since we have more flexibility in the raw-waveform domain.

Author

LvHang commented Nov 29, 2016

@pegahgh
Hi, Pegah
I have modified it. Please check it and review the rest. Thanks!
Hang

// This function adds noise to the original signal. We should not normalize
// the level of the original signal. Instead, we rescale the noise according to
// the SNR and add it, which creates the perturbed signal.
void ApplyAddAdditiveNoise(const int32 &SNR,
Owner

Change the name to ApplyAdditiveNoise

start_col_ind, input_cols));
// compute the energy of noise and input
Matrix<BaseFloat> input_energy_mat(input_rows, input_cols);
input_energy_mat.AddMatMatElements(1.0, input_eg, input_eg, 1.0);
Owner

Although input_energy_mat is initialized with zeros, the call should be AddMatMatElements(1.0, input_eg, input_eg, 0.0).
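The reviewer's point follows from the semantics of Kaldi's `MatrixBase::AddMatMatElements(alpha, A, B, beta)`, which computes `this = beta * this + alpha * A .* B` elementwise. The pure-Python emulation below is only an illustration, not Kaldi code: with `beta = 0.0` the result is just the elementwise energy, whereas `beta = 1.0` silently accumulates whatever was already in the buffer.

```python
def add_mat_mat_elements(out, alpha, a, b, beta):
    """Emulate out <- beta * out + alpha * (a .* b), elementwise, in place."""
    for r in range(len(out)):
        for c in range(len(out[r])):
            out[r][c] = beta * out[r][c] + alpha * a[r][c] * b[r][c]


x = [[1.0, -2.0], [3.0, 0.5]]
# A buffer left over from earlier use, i.e. not zero-initialized.
energy = [[9.0, 9.0], [9.0, 9.0]]
add_mat_mat_elements(energy, 1.0, x, x, 0.0)  # beta = 0: old contents ignored
```

Passing `beta = 0.0` states the intent (overwrite) explicitly, so the result stays correct even if the matrix is later reused or allocated without zeroing.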

Owner

It is not a good idea to design the code like this. You should write this function in signal-distort.h, and add-noise and snr should be added as options to the XvectorPerturbationOptions struct.
The function should be ApplyAdditiveNoise(const VectorBase input, const VectorBase noise, BaseFloat snr, Vector *noisy_input)

In class PerturbXvectorSignal, you have ApplyDistortion, a general function that applies all types of distortion to the input according to opts_.

You need to add a function PerturbExamples(const XvectorOptions opts, const Matrix &input_egs, Matrix *perturbed_egs).
This function should be called in nnet3-xvector-signal-perturb.cc; it creates an object of class PerturbXvectorSignal, vectorizes the input, and calls ApplyDistortion to apply the different types of distortion to the input.
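A language-neutral sketch of the structure suggested above, written in Python rather than Kaldi C++: a free function (`perturb_examples`, standing in for PerturbExamples) builds one perturber object and calls `apply_distortion` on each example. All class, field and function names here are hypothetical simplifications; the noise is added unscaled for brevity, and `volume_scale` is an invented second perturbation that only serves to show how distortions compose.

```python
from dataclasses import dataclass


@dataclass
class PerturbOptions:
    # Hypothetical stand-ins for fields of the options struct.
    add_noise: bool = False
    volume_scale: float = 1.0  # invented second perturbation, illustration only


class SignalPerturber:
    """Stand-in for PerturbXvectorSignal: stores the options and applies
    every enabled distortion to one vectorized example."""

    def __init__(self, opts):
        self.opts = opts

    def apply_distortion(self, signal, noise=None):
        out = list(signal)
        # Independent `if` blocks (no else-chain), so several distortions
        # can be composed on the same example.
        if self.opts.add_noise and noise is not None:
            out = [s + n for s, n in zip(out, noise)]  # unscaled, for brevity
        if self.opts.volume_scale != 1.0:
            out = [self.opts.volume_scale * s for s in out]
        return out


def perturb_examples(opts, input_egs, noise=None):
    """Free function, not a class member: build one perturber and apply it
    to each example; the caller (the binary) owns the read/write loop."""
    perturber = SignalPerturber(opts)
    return [perturber.apply_distortion(eg, noise) for eg in input_egs]
```

Keeping the loop over egs in the binary and the per-example logic in the class means new distortion types only touch `apply_distortion` and the options.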

Author

LvHang commented Dec 4, 2016

@pegahgh
Hi Pegah,
Following your suggestions, I have modified the files. There may still be some unsatisfactory points.
Could you give me some suggestions?
Thank you very much for your patience and guidance.
Hang

void PerturbXvectorSignal::ApplyAdditiveNoise(const MatrixBase<BaseFloat> &input_eg,
const Matrix<BaseFloat> &noise_eg,
const int32 &SNR,
Matrix<BaseFloat> *perturb_eg) {
Owner

the name should be perturbed_eg

// and add it. So that the perturbed signal is created.
void PerturbXvectorSignal::ApplyAdditiveNoise(const MatrixBase<BaseFloat> &input_eg,
const Matrix<BaseFloat> &noise_eg,
const int32 &SNR,
Owner

You don't need an SNR parameter: snr is defined in XvectorPerturbOptions, so you can use opts_.snr.
Also, you should not use uppercase in function variable names.
The names of variables (including function parameters) and data members are all lowercase, with underscores between words.

const kaldi::nnet3::NnetIo &noise_eg_io = noise_eg.io[0];
Matrix<BaseFloat> noise_eg_mat;
noise_eg_io.features.CopyToMat(&noise_eg_mat);
int32 SNR = opts_.snr;
Owner

pegahgh Dec 5, 2016

Add these lines to the nnet3-perturb-egs binary.
As I told you, you only need to call the PerturbExamples function in nnet3-perturb-egs.cc. The loop for reading egs belongs in the nnet3 binary, not here!
PerturbExamples should be defined as a separate function, not as a member of this class. The point is that in the PerturbExamples function, you create an object of class PerturbXvectorSignal and call ApplyDistortion.

#include "feat/resample.h"
#include "matrix/matrix-functions.h"
#include "cudamatrix/cu-matrix.h"
#include "nnet3/nnet-example.h"
Owner

Remove it; it is a wrong dependency!


void ApplyAdditiveNoise(const MatrixBase<BaseFloat> &input_eg,
const Matrix<BaseFloat> &noise_eg,
const int32 &SNR,
Owner

remove SNR

const Matrix<BaseFloat> &input_egs,
Matrix<BaseFloat> *perturb_egs) {
// Create a PerturbXvectorSignal object and call ApplyDistortion.
PerturbXvectorSignal perturb_xvector(opts);
Owner

pegahgh Dec 5, 2016

  1. Change perturb_egs to perturbed_egs
  2. Change perturb_xvector to perturb_egs.

Author

LvHang commented Dec 6, 2016

@pegahgh
Hi Pegah,
As we discussed, I have modified the files.
I checked some other option structs. In general, their members are scalar types such as int, double and string, so I didn't add a matrix to XvectorPerturbOptions. Instead, I added a private pointer in class PerturbXvectorSignal. Maybe we can discuss and find a better solution.
Please check the binaries. Thank you very much for your guidance.
Hang

void PerturbXvectorSignal::ApplyDistortion(const MatrixBase<BaseFloat> &input_egs,
Matrix<BaseFloat> *perturb_egs) {
// conduct ApplyAdditiveNoise
if (!opts_.add_noise_rspecifier.empty()) {
Owner

change option name to add_noise

Owner

I think the best strategy is to have an add-noise option in PertubXvectorOption that takes a noise rspecifier, not noise examples.
--add-noise=noise.scp, where noise.scp points to features for different noises, so you can randomly select different noises.
Then you no longer need to pass a noise matrix to PerturbExample; you can simply pass the noise rspecifier using the --add-noise option.
You don't need to change ApplyAdditiveNoise. You just need to check in ApplyDistortion whether add-noise is nonempty, then read the noise matrix using BaseFloatMatrixReader and pass it to ApplyAdditiveNoise.
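A Python sketch of the random-selection step described above. The dictionary stands in for the table a matrix reader over the `--add-noise` rspecifier would yield; `pick_random_noise` is a hypothetical helper, not part of the patch or of Kaldi. Keys are sorted so that a seeded generator gives reproducible choices.

```python
import random


def pick_random_noise(noise_table, rng):
    """Pick one noise uniformly at random from a {noise_name: samples} table."""
    if not noise_table:
        raise ValueError("noise table is empty")
    names = sorted(noise_table)
    noise_name = names[rng.randrange(len(names))]
    return noise_name, noise_table[noise_name]
```

The selected samples would then be passed straight to the additive-noise routine, so no noise matrix has to be stored as class state.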

Author

LvHang commented Dec 6, 2016

@pegahgh
Hi Pegah,
I have modified it. Now we pass a filename, such as noise.scp, to the add-noise option, and we choose a noise matrix in the ApplyDistortion function. I hope I have understood your intention correctly.
Please check it. Thanks a lot for your patience.
Hang

if (!opts_.add_noise.empty()) {
// choose a noise from the noise.scp/ark
// 1) we need to record the keys of noise_egs
std::vector<std::string> list_noise_egs;
Owner

pegahgh Dec 7, 2016

It is no longer noise_egs; the name should be noise_list!

noise_seq_reader.Close();

// 2) randomly choose a noise example
int32 num_noise_egs = list_noise_egs.size();
Owner

pegahgh Dec 7, 2016

num_noises is a better name than num_noise_egs!

ApplyAdditiveNoise(input_egs, *noise_egs_, perturb_egs);
// conduct others
// TODO
} else { // deal with the opts_.noise_egs situation
Owner

No need for the else branch!
We can compose several different perturbations to generate perturbed_egs.

}
}

// This function is a entrance. It calls ApplyDistortion to apply different
Owner

Change the comment!

opts->Register("noise-egs", &noise_egs, "If supplied, the additive noise is added to input signal.");
opts->Register("rand_distort", &rand_distort, "If true, the signal is slightly changed "
"using some designed FIR filter with no zeros.");
opts->Register("add-noise", &add_noise, "specify a file contains some noise egs");
Owner

Change the description! e.g. "Noise rspecifier for additive noises; if nonempty, an additive noise is randomly selected and added to the input egs."

}
};

class PerturbXvectorSignal {
Owner

Add a comment describing the PerturbXvectorSignal class.

public:
PerturbXvectorSignal(XvectorPerturbOptions opts): opts_(opts) { };

inline void SetNoiseEgs(const Matrix<BaseFloat> &noise_egs) {
Owner

Remove this. You don't need noise_egs_ as a private member of the class.

XvectorPerturbOptions opts_;
// If we want to use many examples in one ApplyDistortion call, we can expand
// the pointer to a vector of pointers.
const Matrix<BaseFloat> *noise_egs_;
Owner

remove noise_egs_!

std::string key_noise_eg = list_noise_egs[index_noise_eg];
RandomAccessBaseFloatMatrixReader noise_random_reader(opts_.add_noise);
Matrix<BaseFloat> noise_eg_mat = noise_random_reader.Value(key_noise_eg);
SetNoiseEgs(noise_eg_mat);
Owner

Remove this line, and also rename noise_eg_mat to noise_mat and key_noise_eg to noise_name!

Matrix<BaseFloat> noise_eg_mat = noise_random_reader.Value(key_noise_eg);
SetNoiseEgs(noise_eg_mat);

ApplyAdditiveNoise(input_egs, *noise_egs_, perturb_egs);
Owner

You can use noise_mat directly; why do you use noise_egs_?

input_end_point - input_start_point + 1);
SubVector<BaseFloat> noise_part(noise, noise_start_point,
noise_end_point - noise_start_point + 1);
Vector<BaseFloat> selected_noise(input_part.Dim());
Owner

pegahgh Dec 29, 2016

Add a comment for this part.

while (the_rest > noise_part.Dim()) {
selected_noise.Range(selected_noise.Dim()-the_rest,
noise_part.Dim()).CopyFromVec(noise_part);
the_rest = the_rest - noise_part.Dim();
Owner

the_rest is not an appropriate name.
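The loop above appears to fill `selected_noise` by repeating `noise_part` end to end until it is as long as `input_part`. The Python sketch below reflects that reading of the partial hunk (it is an interpretation, not the actual patch), using `remaining` as a more descriptive name than `the_rest`; `tile_noise` is a hypothetical helper name.

```python
def tile_noise(noise_part, length):
    """Repeat `noise_part` end to end to fill exactly `length` samples;
    the last copy is truncated if needed."""
    if not noise_part:
        raise ValueError("noise_part must be non-empty")
    selected_noise = []
    remaining = length  # number of samples still to fill
    while remaining > 0:
        take = min(remaining, len(noise_part))
        selected_noise.extend(noise_part[:take])
        remaining -= take
    return selected_noise
```

Handling the final partial copy inside the same loop avoids a separate copy step after the loop exits.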

"nnet3-fvector-perturb-signal --noise-range-file=uttid.range.n --add-noise-list="
"scp:noise.scp --input-channel=0 input.wav output.wav\n";
"nnet3-fvector-perturb-signal --noise=scp:noise.scp --noise-range="
"\"head -n 5 a.noiserange | tail -n 1\" --input-channel=0 input.wav "
Owner

Don't write head -n ... in the usage message. Write an actual range line as the example.

controller.push_back(NoiseController(wav_t_start, wav_t_end, noise_uttid,
noise_t_start, noise_t_end, snr));
}
if (noise_range != "") {
Owner

Use if (!noise_range.empty()) instead.

BaseFloat scale_factor = sqrt(input_energy / noise_energy / pow(10.0, snr / 10.0));
output.Range(input_start_point, input_part.Dim()).AddVec(scale_factor, selected_noise);
}
ApplyNoise(noise, controller, input, &output);
Owner

pegahgh Dec 29, 2016

Rename "controller" in the code.
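For reference, if energies are sums of squares over equal-length segments, the SNR in dB after scaling the noise by alpha is SNR = 10 log10(E_s / (alpha^2 E_n)), so alpha = sqrt(E_s / (E_n * 10^(SNR/10))) = sqrt(E_s / E_n) / 10^(SNR/20). The small Python check below verifies that relation numerically; the function names are hypothetical and not taken from the patch.

```python
import math


def energy(x):
    """Sum of squares of a sample sequence."""
    return sum(v * v for v in x)


def snr_scale_factor(input_energy, noise_energy, snr_db):
    """Scale s.t. 10*log10(input_energy / (scale**2 * noise_energy)) == snr_db."""
    return math.sqrt(input_energy / (noise_energy * 10.0 ** (snr_db / 10.0)))


def add_noise_at_snr(signal, noise, snr_db):
    """Add `noise` (same length as `signal`) rescaled to the target SNR."""
    scale = snr_scale_factor(energy(signal), energy(noise), snr_db)
    return [s + scale * n for s, n in zip(signal, noise)]
```

Using floating-point exponents (`snr_db / 10.0`) also avoids integer division when the SNR is stored as an integer.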

Owner

pegahgh commented Dec 29, 2016

  1. I reviewed your code and added comments.
    1.1. You should remove unnecessary files from git.
    1.2. You should add the pertubed-wav-id at the beginning of each range line.
    1.3. You should write the ranges for all waves in a single file, ranges.

  2. In the next step, you need to generate two files using Python scripts.
    2.1. wav2pertubedwav, which is similar to spk2utt:
    wav1 wav1-p1 wav1-p2 wav1-p3
    In the future, it can contain different perturbed versions of a wav, such as reverberated wavs or combined speakers.
    It is better to generate this file in generate-noise-ranges.py.

Then, in the future, you can combine the wav2pertubedwav files generated by different perturbation functions.

2.2. You should use ranges.scp, wav2perturbedwav and data/train/wav.scp as inputs to the Python script generate-pertub-wav-specifier.py, which will generate a new wav.scp, e.g. data/train_pertubed/wav.scp
wav.scp -> it is the same as the old wav.scp, but with the perturbed versions of each wav added to the file
e.g.
wav1 sph2pipe -f wav -p -c 1 $path/wav1.sph |
wav1-p1 sph2pipe -f wav -p -c 1 $path/wav1.sph | nnet3-fvector-perturb-signal --noise-scp=scp:noise.scp --noise-range="range-p1-for-wav1" - |
wav1-p2 sph2pipe -f wav -p -c 1 $path/wav1.sph | nnet3-fvector-perturb-signal --noise-scp=scp:noise.scp --noise-range="range-p2-for-wav1" - |
...
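A Python sketch of what generate-pertub-wav-specifier.py is asked to produce: each original wav.scp entry is kept, and each perturbed id gets the original command piped into the perturbation binary with its own range line. The flag spellings are modeled on the example lines above and may not match the final binary; `build_perturbed_wav_scp` and its plain-dict inputs are hypothetical, and every original entry is assumed to be a piped command ending in `|`.

```python
def build_perturbed_wav_scp(wav_scp, wav2perturbedwav, ranges, noise_rspecifier):
    """Return wav.scp lines: every original entry plus one piped entry
    per perturbed copy listed in wav2perturbedwav."""
    lines = []
    for wav_id in sorted(wav_scp):
        cmd = wav_scp[wav_id]  # assumed to be a command ending in '|'
        lines.append(f"{wav_id} {cmd}")
        for pert_id in wav2perturbedwav.get(wav_id, []):
            lines.append(
                f"{pert_id} {cmd} nnet3-fvector-perturb-signal "
                f"--noise-scp={noise_rspecifier} "
                f'--noise-range="{ranges[pert_id]}" - |')
    return lines
```

Entries given as plain file paths (no trailing `|`) would need to be wrapped in a command first; this sketch does not handle that case.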

Author

LvHang commented Dec 30, 2016

@pegahgh
Hi Pegah,
Following your suggestions, I fixed generate_noise_range.py, add_noise.sh and nnet3-fvector-perturb-signal.cc.
Please check them.
I will submit the files for step 2 later tonight.

Thanks
Hang

LvHang force-pushed the xvector-feat-extraction branch from d846e5d to c6ec39e on December 31, 2016 06:17
Author

LvHang commented Dec 31, 2016

@pegahgh
Hi Pegah,
Following your suggestions, I modified generate_noise_range.py and add_noise.sh. Now:
1.1 I removed the unnecessary files from GitHub.
1.2 I added the pertubed-wav-id at the beginning of each range line and wrote the ranges for all waves to the same file. I also generate wav2pertubedwav in generate_noise_range.py.
1.3 I wrote generate-pertub-wav-specifier.py, which generates perturbed_wav.scp as you showed.
However, there is a problem in generate-pertub-wav-specifier.py: when != , it produces wrong output.
The reason is that the general wav.scp format is . However, when the segments file ( ) exists, the utt2dur format is . Our ranges are generated from utt2dur, so we use the . It will differ from the , which causes the mistake.
When the data directory has no segments file, == , and the Python code works well.
I think we may need an extra mapping through the segments file. I will try it tomorrow.
Please check it. Thank you for your guidance.

Hang

Author

LvHang commented Jan 3, 2017

Hi Pegah,
I modified the files in the steps/nnet3/fvector directory.

  1. If the segments file exists, I temporarily remove it so that the utt2dur format is , and I use the new utt2dur file to produce perturbed_wav.scp. This avoids the " != " problem.
  2. I changed "wav1 wav1-p1 wav1-p2 wav1-p3" to "wav1 p1-wav1 p2-wav1 p3-wav1", following the older Kaldi scripts, which always use a prefix rather than a suffix.

Next, I will update utt2spk and the other files in the data directory. I have sent the examples to your mailbox.

Hang

Author

LvHang commented Jan 4, 2017

@pegahgh
Hi Pegah,
I wrote apply_map_one2mult.pl to perform one-to-many mapping. It helps generate utt2spk, segments, text, spk2utt and reco2file_and_channel.
I modified add_noise.sh; it now generates a complete directory, such as data/perturb/, that contains all the necessary files.

Best wishes,
Hang
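The Perl script itself is not shown in this thread, so the following Python sketch is only one plausible reading of a one-to-many map: every key of a Kaldi-style table (e.g. utt2spk) is kept, and each new id listed for that key receives a copy of its value. `map_one_to_many` is a hypothetical name.

```python
def map_one_to_many(table, one2many):
    """Expand a key -> value table: keep every key, and give each new id in
    one2many[key] a copy of that key's value."""
    out = {}
    for key, value in table.items():
        out[key] = value
        for new_key in one2many.get(key, []):
            out[new_key] = value
    return out
```

For files such as segments, the value side (the recording id) would also need remapping to the perturbed recording, which this sketch does not do.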

wav_extended_files[current_wav_index],
args.noise,
perturbed_range_contents[current_perturbed_wav_index]),file=g)
g.close()
Owner

pegahgh Jan 10, 2017

The variable names you used are not informative, but we can fix that later.

Owner

pegahgh commented Jan 12, 2017

I expect the code to compile and run properly.
The next step is to generate MFCC features and examples for perturbed data.

  1. You can easily generate MFCC features using the modified wav.scp, so you need to generate a separate data dir as a copy of the original one, with the new wav.scp generated by generate-perturb-wav-specifier.py.
    utt2spk and spk2utt also need to be modified to use the new utterances.
    You can also use wav2perturbedwav and spk2utt to generate utt2pertubedutt.
    orig-utt1 utt1-p1 utt1-p2 utt1-p3
  2. In the next step, you can use utt2perturbedutt to generate egs (the same way as allocate_examples.py in xvector)
    e.g.
    utt{i}-p{j} utt{i}-p{k}
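A Python sketch of how a pair like `utt{i}-p{j} utt{i}-p{k}` could be drawn from one utt2perturbedutt line. Whether the original utterance may appear in a pair is left as a switch, since that is still an open question in the thread; both function names are hypothetical. Duplicates of the key are dropped, so a line that repeats the original id after the key is also handled.

```python
import random


def parse_utt2perturbedutt(line):
    """Split '<orig-utt> <version> ...' into (orig, [versions])."""
    fields = line.split()
    return fields[0], fields[1:]


def sample_version_pair(orig_utt, perturbed_utts, rng, include_original=True):
    """Return two distinct versions of the same utterance."""
    perturbed = [u for u in perturbed_utts if u != orig_utt]
    versions = ([orig_utt] if include_original else []) + perturbed
    if len(versions) < 2:
        raise ValueError("need at least two versions to form a pair")
    first, second = rng.sample(versions, 2)
    return first, second
```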

Author

LvHang commented Jan 16, 2017

Hi Pegah,

  1. I fixed the segments bug; add_noise.sh now generates the correct segments file.
  2. I think the file "perturb_utt_map" can be treated as utt2pertubedutt.
  3. Based on xvector/allocate_examples.py, I wrote fvector/allocate_examples.py.
    3.1. I want to confirm one thing. In fvector, assume we have M archives, each containing N examples, and each example has two chunks with lengths len1 and len2. I know len1 == len2 and that the chunk length is the same within a given archive. Is the chunk length also the same across different archives? (In my Python code, all chunks have one specified length.) Is that right?
    3.2. For each utterance, we have ori_utt and pertubed{i}_utt. So I think a chunk pair can be 'ori_utt{i} p{j}-utt{i}' or 'p{j}-utt{i} p{k}-utt{i}', right?
  4. I'm writing nnet3-fvector-get-egs, which parses the output of allocate_examples.py. I will finish it today.
    Please check it. What should we do next?

Best wishes,
Hang

Author

LvHang commented Jan 17, 2017

Hi Pegah,
I completed nnet3-fvector-get-egs.cc, which processes the output of allocate_examples and generates the egs. Please check it.
I will write a shell script to connect all the steps (generate the perturbed data directory, make MFCCs, get egs) and test them.
Maybe we can then move on to some simple experiments.

Best wishes,
Hang

"<utterance-id> <approx-num-frames>)")
parser.add_argument("oriutt2allutt",
help="oriutt2allutt to be used as input (format is: "
"<ori-utt-id> <ori-utt-id> <p1-utt-id> ... <pn-utt-id>)")
Owner

Do you mean origutt2allutt? It would be better to name it utt2perturbedutt!

f.close()

f = open(args.egs_dir + "/temp/" + prefix + "outputs." + str(job + 1), "w")
if f is None:
Owner

I don't really like temp as a directory name. You can rename it aux_info.

utt_len = lengths[utt_index]
offset = GetRandomOffsets(utt_len, args.frames_per_chunk)
this_egs.append( (utt_index, offset) )
all_egs.append(this_egs)
Owner

In some situations, the variable all_egs gets large, which causes I/O problems and may slow down the program.
So it may be better to write out and read back the egs for each archive separately. How fast does allocate_egs run in your case?
Also, do you discard the utterances you have already generated pairs for?

Author

From line 180 to 189:
I generate an 'egs list' for each archive. Each item in the 'egs list' is a tuple (utt, offset).
Later in the code, I use each tuple to generate the final information, which is parsed by nnet3-fvector-get-egs.cc.
I use 'utt' to randomly select two utterances from ' ... ' to generate ' '. 'offset' is used to assign "", i.e. the distance between the beginning of the utterance and the start frame.
Finally, I generate the information in the following format:

I'm not sure about the running speed; I only used a few utterances to test whether the code is correct. I used the same method as David for this part.

this_egs = [ ] # this will be an array of 2-tuples (utterance-index, start-frame).
for n in range(this_num_egs):
utt_index = RandomUttAtLeastThisLong(args.frames_per_chunk)
utt_len = lengths[utt_index]
Owner

You should keep a variable excluded_utt, add utt_index to it, and then choose random utterances from (all_utt - excluded_utt) for each archive.
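A Python sketch of the reviewer's suggestion: draw utterances for one archive only from those not yet used and long enough for a chunk, then record the chosen ones in `excluded_utt`. `choose_utts_for_archive` and its arguments are hypothetical; the real allocate_examples.py may organize this differently.

```python
import random


def choose_utts_for_archive(lengths, min_len, num_egs, excluded_utt, rng):
    """Pick `num_egs` distinct utterance indices from (all_utt - excluded_utt)
    whose length is at least `min_len` frames; mark them as used."""
    candidates = [i for i, n in enumerate(lengths)
                  if n >= min_len and i not in excluded_utt]
    if len(candidates) < num_egs:
        raise ValueError("not enough unused utterances left")
    chosen = rng.sample(candidates, num_egs)
    excluded_utt.update(chosen)
    return chosen
```

Because `excluded_utt` is a set shared across calls, each utterance is used at most once over all archives.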


std::string utt_a = fields[0],
utt_b = fields[1],
start_frame_str = fields[4],
Owner

fields[2] and fields[3]. You just copied from David's code without changing the name!!

Author

LvHang Jan 20, 2017

What do you mean?
The form of the information is:
<source-utterance1> <source-utterance2> <relative-archive-index> <absolute-archive-index> <offset> <frame-length>
So fields[2] and fields[3] are the same as David's '<relative-archive-index> <absolute-archive-index>'.
Note: <relative-archive-index> is the zero-based offset of the archive-index within the subset of archives that a particular ranges file corresponds to; and is the 1-based numeric index of the destination archive among the entire list of archives, which will form part of the archive's filename (e.g. egs/egs.<absolute-archive-index>.ark); <absolute-archive-index> is only kept for debug purposes so you can see which archive each line corresponds to.
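A Python sketch of parsing one line in the six-field format listed above, with named fields so that index 4 is clearly the offset (start frame), matching `start_frame_str = fields[4]` in the C++ hunk. `RangeLine` and `parse_range_line` are hypothetical names, and the example values in the usage are made up.

```python
from typing import NamedTuple


class RangeLine(NamedTuple):
    utt_a: str
    utt_b: str
    relative_archive_index: int
    absolute_archive_index: int
    offset: int
    frame_length: int


def parse_range_line(line):
    """Parse '<utt1> <utt2> <rel-idx> <abs-idx> <offset> <frame-length>'."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    utt_a, utt_b, rel, abs_idx, offset, length = fields
    return RangeLine(utt_a, utt_b, int(rel), int(abs_idx),
                     int(offset), int(length))
```

Usage: `parse_range_line("utt1-p1 utt1-p2 0 3 120 200").offset` gives `120`.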

Owner

pegahgh commented Jan 20, 2017

@LvHang
The setup is almost ready for preliminary experiments; you can start some preliminary experiments with it.
For now, you can experiment on a small dataset (e.g. 20% of SWBD, ~60 hours) and generate 5-10 perturbed versions of the data using noise.scp (for now you can use 20% of the speakers).
Do you have any sample egs generated using this setup?
The next step for generating noise is as follows:

  1. Use the music and point-source noises in /export/a09/pegahgh/kaldi-xvector/egs/aspire/s5/RIR_NOISES
    to generate perturbed data using wsj/s5/steps/nnet3/fvector/generate_perturb_wav_specifier.py.
    (I think you need to generate noise.scp.)

  2. In the next step, use egs/aspire/s5/local/multi_condition/reverberate_data_dir.sh to generate reverberated wavs using RIR_NOISES.

  3. Combine the data dirs from (1) and (2) and start generating egs for the fvector experiment.
    Be careful to have at most 10 perturbed versions of each waveform.
    You need to generate features for all of them and generate egs for fvector.
    Let me know when you have prepared egs for the fvector experiments.

Meanwhile, we can start preparing our raw-waveform setup for fvector (I think all the code for this experiment is also ready; am I right?).

Author

LvHang commented Jan 20, 2017

@pegahgh

  1. First, I will use a small dataset, part of the SWBD corpus, to test and refine my code.
  2. Then, I will generate the egs using RIR_NOISE and reverberated RIR_NOISE.
  3. I think you are right. We can prepare our raw-waveform setup; along the way, we can find what is missing and add it.

Author

LvHang commented Jan 30, 2017

Hi Pegah,
I have finished the test. Now we can use steps/nnet3/fvector/add_noise.sh to complete the process of generating the egs.
I used the music noises in /export/a09/pegahgh/kaldi-xvector/egs/aspire/s5/RIR_NOISES (downsampled to 8 kHz with sox) to generate perturbed versions of swbd/data/train. The number of perturbed versions is 7, with min-snr=-10 and max-snr=-20. You can find the perturbed data dir in /export/a11/hlyu/kaldi-now/egs/swbd/s5c/data/perturbed_music and the egs in /export/a11/hlyu/kaldi-now/egs/swbd/s5c/exp/fvector_a/perturbed_music/egs.

Two problems:

  1. The point-source noises are always 1 second long, but the average SWBD file is 200 seconds long, so we would need about 200 noise files to cover one SWBD recording. Do we really have to do this?
  2. For the reverberated noises, which directory holds the impulse responses? Do we use them to reverberate the SWBD data and treat the reverberated SWBD as the noise? Is my understanding right?

Best wishes
Hang


2 participants