Joint training with new l2reg technique #4
pegahgh wants to merge 37 commits into danpovey:leaky-hmm-merge-xent from
Conversation
There were some stray tabs in the Windows INSTALL file.
Annoyingly, github's zip download doesn't seem to preserve line endings. See http://stackoverflow.com/questions/17347611/downloading-a-zip-from-github-removes-newlines-from-text-files
Patch on windows requires CRLF. This should leave them unchanged.
Should be fixed in the windows_line_endings branch.
Previous version would only look in the root dir which obviously doesn't work.
The previous commits stop git mangling the file endings, but the file endings in the repository were LF, where they should have been CRLF.
…ns, to work for larger matrices.
Windows line endings
Windows docs
It previously didn't mention that you need --enable-openblas (if using OpenBLAS).
…s the most recent commit).
arpa-file-parser.cc: Added a warning when declared count of n-grams is 0
const-arpa-lm.cc: Print an extended error message instead of simple ASSERT
Separated ARPA parsing from const LM construction
Windows docs
…n leaky-hmm (got rid of the 'special state').
… linear function of cross-entropy
Pegah, development is now in the 'chain' branch in the official kaldi.
Dan
On Thu, Jan 28, 2016 at 7:28 PM, pegahgh notifications@github.com wrote:
…-training-l2reg
src/chain/chain-training.cc
Outdated
Pegah, I don't think this equation is right. You are first optimizing w.r.t. 'scale' and then 'offset'-- they need to be optimized jointly.
btw, this is linear regression in one dimension. you could just look up the equations if you want.
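For reference, the closed-form equations Dan is pointing to are those of one-dimensional least squares with a scale and an offset fitted jointly. A minimal illustrative sketch (not Kaldi code and not part of this PR; the function and variable names are invented here):

```cpp
// Jointly fit scale and offset minimizing sum_j (scale * x[j] + offset - y[j])^2.
// Here x would play the role of the cross-entropy output and y the chain output
// for a single dimension, with j running over the minibatch.
#include <cassert>
#include <cstddef>
#include <vector>

struct ScaleOffset { double scale; double offset; };

ScaleOffset FitScaleOffset(const std::vector<double> &x,
                           const std::vector<double> &y) {
  std::size_t n = x.size();
  assert(n >= 2 && y.size() == n);
  double sum_x = 0.0, sum_y = 0.0, sum_xy = 0.0, sum_xx = 0.0;
  for (std::size_t j = 0; j < n; ++j) {
    sum_x += x[j];
    sum_y += y[j];
    sum_xy += x[j] * y[j];
    sum_xx += x[j] * x[j];
  }
  double mean_x = sum_x / n, mean_y = sum_y / n;
  // scale = covariance(x, y) / variance(x);  offset = mean_y - scale * mean_x.
  // Both come from setting the two partial derivatives to zero simultaneously.
  double scale = (sum_xy - n * mean_x * mean_y) / (sum_xx - n * mean_x * mean_x);
  double offset = mean_y - scale * mean_x;
  return {scale, offset};
}
```

Note that the fitted scale uses x and y centered by their means; that centering is exactly what disappears if the offset is dropped from the joint solution.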
i'm deleting your relevant jobs till this is fixed.
Dear Dan
Hi
No, I didn't optimize first w.r.t. scale and then offset. I solved them jointly: I substituted offset as a function of scale in the first equation and solved for scale.
On Thu, Jan 28, 2016 at 8:42 PM, Daniel Povey notifications@github.com wrote:
In src/chain/chain-training.cc
#4 (comment):
- BaseFloat scale = supervision.weight * opts.l2_regularize;
- *l2_term = -0.5 * scale * TraceMatMat(nnet_output, nnet_output, kTrans);
- if (nnet_output_deriv)
-   nnet_output_deriv->AddMat(-1.0 * scale, nnet_output);
+ BaseFloat scale_coeff = supervision.weight * opts.l2_regularize;
+ // If xent_output provided, l2 penalty is trying to regress the chain output
+ // to be a linear function of cross-entropy output.
+ // It minimizes -0.5 * l2_regularize * l2_norm(diag(scale) * x + offset - y)^2,
+ // where x is cross-entropy output and y is chain output.
+ if (xent_output) {
+   // compute offset and scale
+   // The objective is to minimize L w.r.t. scale_i, offset_i,
+   // L = -0.5 * l2_regularize *
+   //     \sum_{j=1}^{minibatch_size} (\sum_i (nnet_output_ji - target_ji)^2),
+   // where target_ji = scale_i * xent_output_ji + offset_i.
+   // scale_i = \sum_j (nnet_output_ji * xent_output_ji) / \sum_j (xent_output_ji^2)
i'm deleting your relevant jobs till this is fixed.
Something doesn't seem right about the equation; it doesn't seem like it has the correct shift invariance. I think you missed a term involving scale * offset when you computed the derivatives and solved.
Dan
On Fri, Jan 29, 2016 at 2:11 AM, pegahgh notifications@github.com wrote:
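To make the shift-invariance point above concrete, here is the standard joint derivation in the notation of the code comments (x = xent_output, y = nnet_output, j indexing the minibatch). This is a sketch added for clarity, not text from the PR:

```latex
% Per dimension i, minimize jointly over scale_i and offset_i:
%   L_i = \sum_j ( scale_i \, x_{ji} + offset_i - y_{ji} )^2 .
\begin{align*}
\frac{\partial L_i}{\partial\, \text{offset}_i} = 0
  &\;\Rightarrow\; \text{offset}_i = \bar{y}_i - \text{scale}_i \, \bar{x}_i , \\
\frac{\partial L_i}{\partial\, \text{scale}_i} = 0
  &\;\Rightarrow\; \text{scale}_i
   = \frac{\sum_j (x_{ji} - \bar{x}_i)(y_{ji} - \bar{y}_i)}
          {\sum_j (x_{ji} - \bar{x}_i)^2} ,
\end{align*}
% where \bar{x}_i and \bar{y}_i are the minibatch means.
```

The formula in the diff, scale_i = \sum_j x_{ji} y_{ji} / \sum_j x_{ji}^2, omits the mean terms (the scale_i * offset_i cross term mentioned above), so adding a constant to x or y changes the fitted scale; with the offset solved jointly, the fit is invariant to such shifts.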