
Miscellanea


Training with Quantization Noise for Extreme Model Compression [code] (Apr 2020)
Applies quantization during training to random subsets of the network's weights. This yields a small, quantized model whose performance is close to that of the unquantized model.
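
A minimal sketch of the idea (not the fairseq implementation): during the forward pass, fake-quantize a random fraction of the weight entries and leave the rest in full precision, with a straight-through estimator for the backward pass. `quant_noise_weight` and its parameters are hypothetical names.

```python
import torch

def quant_noise_weight(weight, p=0.3, bits=8):
    """Fake-quantize a random fraction p of the weight entries (uniform int
    quantization), leaving the rest full precision. Straight-through estimator:
    the backward pass behaves as if no quantization happened. Sketch only."""
    scale = weight.abs().max() / (2 ** (bits - 1) - 1) + 1e-8
    quantized = torch.round(weight / scale) * scale        # dequantized values
    mask = (torch.rand_like(weight) < p).float()           # entries that receive noise
    noisy = mask * quantized + (1.0 - mask) * weight
    return weight + (noisy - weight).detach()              # forward: noisy, backward: identity

# usage inside a layer's forward pass (training only):
w = torch.randn(64, 128, requires_grad=True)
w_train = quant_noise_weight(w, p=0.3, bits=8)
```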

AutoML-Zero: Evolving Machine Learning Algorithms From Scratch [code] (Mar 2020)
Uses AutoML to discover machine learning algorithms from scratch; among other things, it rediscovered gradient descent.

Proving the Lottery Ticket Hypothesis: Pruning is All You Need (Feb 2020)
Given an over-parameterized neural network, a smaller network with the same accuracy as a target network can be found by pruning neurons.

Rigging the Lottery: Making All Tickets Winners (Dec 2019)
A method to train a sparse network efficiently by dropping and growing connections, without ever starting from a dense network.
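
A rough sketch of one drop-and-grow step in the spirit of RigL (not the paper's code): drop the smallest-magnitude active weights and grow the inactive connections with the largest gradient magnitude. `rigl_update` and `k` are hypothetical names.

```python
import torch

def rigl_update(weight, grad, mask, k):
    """Drop the k smallest-magnitude active weights and grow the k inactive
    connections with the largest gradient magnitude (sketch)."""
    active = mask.bool()
    # drop: k active weights with the smallest |w|
    w_scores = weight.abs().masked_fill(~active, float("inf"))
    drop_idx = torch.topk(w_scores.flatten(), k, largest=False).indices
    # grow: k inactive positions with the largest |grad|
    g_scores = grad.abs().masked_fill(active, float("-inf"))
    grow_idx = torch.topk(g_scores.flatten(), k, largest=True).indices
    new_mask = mask.clone().flatten()
    new_mask[drop_idx] = 0.0
    new_mask[grow_idx] = 1.0
    return new_mask.view_as(mask)
```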

Deep Verifier Networks: Verification of Deep Discriminative Models with Deep Generative Models (Nov 2019)
Improves anomaly detection by verifying the predictions of deep discriminative models with deep generative models. Deep Verifier Networks (DVNs) are based on conditional variational auto-encoders with disentanglement constraints.

Ranked List Loss for Deep Metric Learning [code] (Mar 2019)
Similar to the triplet loss, but using a list of elements instead of two elements and an anchor.

A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data (Nov 2018)
Correlation matrices are built over windows of the different signals and fed to an autoencoder with convolutional and recurrent layers; differences between the input and the decoded matrices reveal the anomalies.
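
A small sketch, under the assumption that the "correlation" matrix is a pairwise inner-product (signature) matrix over a sliding window; the autoencoder itself is omitted and `signature_matrices` is a hypothetical helper.

```python
import numpy as np

def signature_matrices(series, window=30, stride=10):
    """series: (T, n_signals). Returns (num_windows, n_signals, n_signals)
    pairwise inner-product matrices over sliding windows (sketch)."""
    T, n = series.shape
    mats = []
    for start in range(0, T - window + 1, stride):
        w = series[start:start + window]      # (window, n_signals)
        mats.append(w.T @ w / window)         # correlation-like matrix
    return np.stack(mats)

# anomaly score: reconstruction error between input and decoded matrices, e.g.
# score = ((mats - reconstructed) ** 2).mean(axis=(1, 2))
```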

An Empirical Model of Large-Batch Training (Dec 2018)
The gradient noise scale predicts the parallelizability of neural network training. Complex tasks tend to have noisier gradients, so increasingly large batch sizes are likely to become useful.
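
A rough numeric sketch of the paper's "simple" noise scale, B_simple = tr(Σ) / |G|², estimated from per-example gradients; the paper uses unbiased estimators across data-parallel shards, and `simple_noise_scale` is a hypothetical helper.

```python
import numpy as np

def simple_noise_scale(per_example_grads):
    """per_example_grads: (batch, n_params) array of individual gradients.
    Estimates B_simple = trace(Sigma) / |G|^2 (rough sketch)."""
    mean_grad = per_example_grads.mean(axis=0)
    trace_sigma = per_example_grads.var(axis=0).sum()    # sum of per-parameter variances
    return trace_sigma / (np.sum(mean_grad ** 2) + 1e-12)
```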

Learning deep representations by mutual information estimation and maximization (Oct 2018)
Unsupervised learning of representations by maximizing mutual information between an input and the output of a deep neural network encoder.

Dropout is a special case of the stochastic delta rule: faster and more accurate deep learning [code] (Aug 2018)
The weights of the network are normal distributions instead of single values, which makes the output of each neuron slightly noisy. The mean and variance are updated with the gradient rule.
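
A minimal sketch of a layer with Gaussian weights sampled per forward pass; the paper's specific update rules for mean and variance differ in detail, and `SDRLinear` is a hypothetical class.

```python
import torch

class SDRLinear(torch.nn.Module):
    """Linear layer where each weight is a Gaussian N(mu, sigma^2) sampled
    anew on every forward pass (stochastic-delta-rule-style sketch)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.mu = torch.nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.log_sigma = torch.nn.Parameter(torch.full((out_features, in_features), -3.0))

    def forward(self, x):
        eps = torch.randn_like(self.mu)
        weight = self.mu + eps * self.log_sigma.exp()   # reparameterized sample
        return torch.nn.functional.linear(x, weight)
```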

DARTS: Differentiable Architecture Search [code] (Jun 2018)
Efficiently design high-performance convolutional architectures for image classification and recurrent architectures for language modeling based on continuous relaxation and gradient descent in the architecture space.
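
A minimal sketch of the continuous relaxation: each edge outputs a softmax-weighted mixture of candidate operations, and the architecture weights are learned by gradient descent alongside the model weights. `MixedOp` is a hypothetical class name, not the authors' code.

```python
import torch

class MixedOp(torch.nn.Module):
    """DARTS-style mixed operation (sketch): the edge output is a
    softmax-weighted sum of candidate ops; `alpha` are architecture parameters."""
    def __init__(self, ops):
        super().__init__()
        self.ops = torch.nn.ModuleList(ops)                      # candidate operations
        self.alpha = torch.nn.Parameter(torch.zeros(len(ops)))   # architecture weights

    def forward(self, x):
        weights = torch.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# e.g. ops = [torch.nn.Conv2d(16, 16, 3, padding=1), torch.nn.Identity()]
```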

Relational recurrent neural networks (Jun 2018)
Adds a memory module with multi-head attention to improve on tasks that require understanding the relations between steps. Applied to RL and language modeling.

Large scale distributed neural network training through online distillation (Apr 2018)
Adds to the loss the difference between the predictions of the current model and those of a stale copy of the model trained on another dataset or a subsample of the dataset. Helps speed up distributed training.
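
A hedged sketch of such a loss: standard cross-entropy plus a penalty for disagreeing with the stale model's predictions. `codistillation_loss` and `alpha` are hypothetical names; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def codistillation_loss(logits, targets, stale_logits, alpha=0.5):
    """Cross-entropy plus a KL term towards the predictions of a stale
    copy of the model (online distillation sketch)."""
    ce = F.cross_entropy(logits, targets)
    distill = F.kl_div(F.log_softmax(logits, dim=-1),
                       F.softmax(stale_logits, dim=-1),
                       reduction="batchmean")
    return ce + alpha * distill
```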

Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN [code] (Mar 2018)
IndRNNs use an element-wise vector multiplication u * state, meaning each neuron has a single recurrent weight connected to its own last hidden state. Neurons within a recurrent layer are independent of each other.
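
A minimal sketch of the recurrence, h_t = relu(W x_t + u * h_{t-1}); `IndRNNCell` is a hypothetical class name.

```python
import torch

class IndRNNCell(torch.nn.Module):
    """IndRNN cell (sketch): `u` is an element-wise recurrent weight vector,
    so each neuron only sees its own previous hidden state."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.W = torch.nn.Linear(input_size, hidden_size)
        self.u = torch.nn.Parameter(torch.ones(hidden_size))  # one recurrent weight per neuron

    def forward(self, x_t, h_prev):
        return torch.relu(self.W(x_t) + self.u * h_prev)
```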

One Model To Learn Them All (Jun 2017)
A single model is trained concurrently on ImageNet, multiple translation tasks, image captioning (COCO dataset), a speech recognition corpus, and an English parsing task.

Squared Earth Mover’s Distance-based Loss for Training Deep Neural Networks (Apr 2017)
A loss function in which the relationships between classes are taken into account; for example, ranking elements when the outputs are scores (from 1 to 10).
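
A small sketch of the squared EMD loss over ordered classes: the sum of squared differences between the predicted and target cumulative distributions. `squared_emd_loss` is a hypothetical name, and the inputs are assumed to be normalized probabilities.

```python
import torch

def squared_emd_loss(probs, target_probs):
    """Squared Earth Mover's Distance between distributions over ordered classes
    (sketch). probs, target_probs: (batch, n_classes), rows summing to 1."""
    cdf_diff = torch.cumsum(probs, dim=-1) - torch.cumsum(target_probs, dim=-1)
    return (cdf_diff ** 2).sum(dim=-1).mean()
```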

Nothing Else Matters: Model-Agnostic Explanations By Identifying Prediction Invariance (Nov 2016)
A model-agnostic technique that produces high-precision rule-based explanations of a model's outputs in terms of its inputs (which inputs are most relevant for each output).

Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization [code] (Oct 2016)
Shows the positive gradient-weighted activations of a convolutional layer with respect to a target prediction of the model.
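
A minimal sketch of the heatmap computation, assuming the layer's activations and gradients have already been captured with forward/backward hooks; `grad_cam` is a hypothetical helper.

```python
import torch

def grad_cam(activations, gradients):
    """Grad-CAM heatmap (sketch): channel weights are the spatially averaged
    gradients of the target score w.r.t. a conv layer's activations; the map
    is the ReLU of the weighted sum of activation channels.
    activations, gradients: (C, H, W)."""
    weights = gradients.mean(dim=(1, 2))                              # (C,)
    cam = torch.relu((weights[:, None, None] * activations).sum(dim=0))
    return cam / (cam.max() + 1e-8)
```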
