Info Dump
A place I dump information, for later sorting. Do not expect logical thoughts, or things here to be correct!
Last updated
Vanishing and exploding gradients for sigmoid, ReLU
Gradient Clipping
Stable Sigmoid
Leaky ReLU
Learning Rate needed for ReLU is roughly 10x smaller than for sigmoid, since the sigmoid derivative is bounded by 0.25 while the ReLU derivative is 1 for positive inputs
Flipping the kernel in backpropagation
He and Xavier Initialization + reasons
Importance of weight initialization
im2col
Importance of Non-Linearity (via activation functions)
Batch Normalization (mean 0, std 1) vs scaling inputs to [0, 1] with /= 255.0
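A rough sketch for a few of the items above (stable sigmoid, the derivative bounds of sigmoid vs ReLU, Leaky ReLU, gradient clipping). It is plain Python with no framework; the function names, the alpha = 0.01 default and clipping by L2 norm are my own assumptions for illustration, not taken from any particular library.

```python
import math


def stable_sigmoid(x):
    """Sigmoid that never calls exp() on a large positive argument."""
    if x >= 0:
        # exp(-x) is at most 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so z lies in [0, 1) and cannot overflow
    return z / (1.0 + z)


def sigmoid_grad(x):
    """Derivative s(x) * (1 - s(x)); its maximum is 0.25, reached at x = 0."""
    s = stable_sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def leaky_relu_grad(x, alpha=0.01):
    """Derivative of Leaky ReLU: small slope alpha instead of 0 for x <= 0."""
    return 1.0 if x > 0 else alpha


def clip_by_norm(grad, max_norm):
    """Rescale a gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]
```

With these bounds, ten sigmoid layers scale a gradient by at most 0.25 ** 10, which is roughly 1e-6 (vanishing), while ReLU passes it through unchanged on active units and blocks it completely on inactive ones. The naive form 1 / (1 + exp(-x)) raises an OverflowError in plain Python for something like x = -1000, which is what the branch on the sign of x avoids.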
CNNs
Translation Invariance
FC NN vs CNN for MNIST
Dead Kernels (constantly 0)
Learning Rate for the conv kernels is much smaller than for the FC NN at the end, due to the kernels' size
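A minimal sketch of the im2col idea and the kernel flip from the list above, assuming a single-channel 2D input, stride 1 and no padding. Everything is plain Python lists; the names im2col, conv2d_via_im2col and flip180 are hypothetical helpers for illustration. Patches are stored as rows here (the transpose of the usual column layout), which does not change the result.

```python
def im2col(image, kh, kw):
    """Unroll every kh x kw patch (stride 1, no padding) into one flat row."""
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            rows.append([image[i + di][j + dj]
                         for di in range(kh) for dj in range(kw)])
    return rows


def conv2d_via_im2col(image, kernel):
    """Cross-correlation (no kernel flip) written as patch-matrix times kernel."""
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [v for row in kernel for v in row]
    patches = im2col(image, kh, kw)
    out_w = len(image[0]) - kw + 1
    # One dot product per patch gives one output pixel.
    flat_out = [sum(p * k for p, k in zip(patch, flat_kernel))
                for patch in patches]
    # Reshape the flat result back into rows of the output feature map.
    return [flat_out[r:r + out_w] for r in range(0, len(flat_out), out_w)]


def flip180(kernel):
    """Rotate a kernel by 180 degrees (reverse the rows and the columns)."""
    return [row[::-1] for row in kernel[::-1]]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
result = conv2d_via_im2col(image, kernel)  # [[6, 8], [12, 14]]
```

The point of im2col is that the whole convolution becomes a single matrix product, which is fast with a BLAS-backed library. For a stride-1 forward pass like this one, the gradient with respect to the input is a full (zero-padded) convolution of the output gradient with flip180(kernel), which is where the flip in backpropagation comes from.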