I have two questions about your code.
The first question concerns the 0.6 dropout you use in the early stages, where the dropout function is applied directly to the indicator. Because of dropout's inverse-scaling property, the output is rescaled according to the dropout rate, so the indicator values become either scaled up (to roughly 1/0.6) or dropped to 0 instead of staying binary. Only after the early stages, when the dropout rate is removed or reset, does the indicator return to 0 or 1. This behavior doesn't make sense to me, so I wonder if it is a bug. (PS: I also don't understand why passing a dropout rate of 100 raises no exception.)
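To make the scaling concrete, here is a minimal sketch assuming PyTorch (your code may use a different framework or a custom dropout, in which case the convention could differ). The `indicator` tensor here is a hypothetical stand-in for your channel indicator:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
indicator = torch.ones(1000)  # hypothetical 0/1 channel indicator

# In training mode, surviving entries are scaled by 1/(1 - p),
# so for p = 0.6 the indicator becomes 0.0 or 1/0.4 = 2.5, not 0/1.
out = F.dropout(indicator, p=0.6, training=True)
print(set(out.unique().tolist()))  # subset of {0.0, 2.5}

# In eval mode the indicator passes through unchanged (0/1 again).
out_eval = F.dropout(indicator, p=0.6, training=False)
print(out_eval.unique().tolist())  # [1.0]

# Note that F.dropout itself rejects rates outside [0, 1],
# so a rate of 100 would have to slip through a custom wrapper.
try:
    F.dropout(indicator, p=100, training=True)
except ValueError as e:
    print("rate outside [0, 1] raises:", e)
```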
The second question is that you apply the per-channel weight mask before BN, which in my opinion distorts the BN statistics: pruned channels still contribute all-zero batches to the running mean and variance. I suggest applying the mask after BN instead, which seems more reasonable.
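A small sketch of the effect, again assuming PyTorch and a hypothetical 0/1 channel `mask` (names are illustrative, not from your code):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 4, 16, 16)  # N, C, H, W
mask = torch.tensor([1.0, 0.0, 1.0, 1.0]).view(1, -1, 1, 1)

# Masking BEFORE BN: the pruned channel feeds an all-zero batch into BN,
# so its batch mean/var are 0 and its running stats are dragged toward 0.
bn = nn.BatchNorm2d(4)
bn.train()
_ = bn(x * mask)
print(bn.running_mean[1].item())  # 0.0 for the masked channel
print(bn.running_var[1].item())   # pulled toward 0 (0.9 after one step)

# Masking AFTER BN: BN sees the real activations, so the statistics of
# every channel stay meaningful, and the mask still zeroes the output.
bn2 = nn.BatchNorm2d(4)
bn2.train()
y = bn2(x) * mask
```

With the default momentum of 0.1, one masked forward pass already moves the pruned channel's running variance from 1.0 to 0.9, and repeated passes drive it toward 0, which makes eval-mode normalization of that channel degenerate.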