About dropout and BN #11

@AmoseKang

I have two questions about your code.
First, you use a dropout value of 0.6 in the early stages, and the dropout function is applied directly to the indicator. Because dropout rescales the values it keeps, the output depends on the rate: each indicator entry becomes either 1/0.6 or 0. After the early stages of training, when the dropout is removed or its rate is reset, the indicator goes back to being 0 or 1. This behavior doesn't make sense to me, and I wonder whether it is a bug. (PS: I also don't understand why you use a dropout rate of 100, or why that doesn't raise an exception.)
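To make the scaling concrete, here is a minimal pure-Python sketch of inverted dropout applied to a binary indicator. It is not the repository's code: `inverted_dropout`, `keep_prob`, and `indicator` are hypothetical names, and I am assuming that 0.6 is the keep probability, which is what the 1/0.6 factor implies.

```python
import random


def inverted_dropout(values, keep_prob, rng):
    """Keep each element with probability keep_prob and rescale the kept
    elements by 1/keep_prob, so the expected value stays unchanged."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)

# Hypothetical binary channel indicator with every channel switched on.
indicator = [1.0] * 8

# Assumption: 0.6 is the keep probability.
dropped = inverted_dropout(indicator, keep_prob=0.6, rng=rng)

# Every entry is now either 0.0 or 1/0.6, so the indicator is no longer binary.
print(sorted(set(dropped)))
```

Once the dropout is switched off, the same indicator is passed through unchanged, so its nonzero entries jump from 1/0.6 back to 1.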
Second, you apply the channel weight mask before BN, which in my opinion completely ruins the BN statistics. I suggest applying the mask after BN instead, which seems more reasonable.
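Here is a small sketch of the difference for a single channel, again in plain Python rather than the repository's framework. The `batch_norm` helper, the `gamma`/`beta` values, and the activations are all made up for illustration; it only computes training-mode batch statistics for one channel.

```python
import math


def batch_norm(column, gamma=1.0, beta=0.5, eps=1e-5):
    """Training-mode batch norm for one channel: normalize with the batch
    mean and (biased) variance, then apply the affine gamma/beta."""
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    out = [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in column]
    return out, mean, var


# Hypothetical activations of one channel across a batch of four samples.
activations = [1.0, 2.0, 3.0, 4.0]
mask = 0.0  # this channel is switched off by the indicator

# Mask before BN: BN only ever sees zeros for this channel.
out_before, mean_before, var_before = batch_norm([x * mask for x in activations])

# Mask after BN: BN sees the real activations, then the output is zeroed.
out_after_raw, mean_after, var_after = batch_norm(activations)
out_after = [y * mask for y in out_after_raw]

print("mask before BN:", out_before, mean_before, var_before)
print("mask after BN: ", out_after, mean_after, var_after)
```

With the mask before BN, the batch mean and variance for the masked channel collapse to 0, and the channel still outputs `beta` instead of 0. With the mask after BN, the statistics come from the real activations and the masked channel is exactly 0.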
