LeNet-5
Yann LeCun
- Mainly used for digit classification
- We take an image of 32 x 32 x 1.
- Back then people used average pooling.
- The output of the last conv/pool layer flattens to 400 units (5 x 5 x 16) before the fully connected layers.
- As we go deeper in the network $n_H$ and $n_W$ decrease while $n_C$ increases.
- Conv → Pool → Conv → Pool → FC → FC → Output
- They used sigmoid and tanh activations, not ReLU.
- The nonlinearity was applied after each average pooling layer.
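The shrinking of $n_H$ and $n_W$ through the layers can be traced with the standard conv output formula, $\lfloor (n + 2p - f)/s \rfloor + 1$. A minimal sketch, assuming the usual LeNet-5 layer sizes (5 x 5 convolutions with 6 then 16 filters, 2 x 2 average pooling with stride 2):

```python
def conv_out(n, f, s=1, p=0):
    """Spatial output size of a conv/pool layer: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

n = 32                       # input: 32 x 32 x 1
n = conv_out(n, f=5)         # conv1, 6 filters 5x5   -> 28 x 28 x 6
n = conv_out(n, f=2, s=2)    # avg pool 2x2, stride 2 -> 14 x 14 x 6
n = conv_out(n, f=5)         # conv2, 16 filters 5x5  -> 10 x 10 x 16
n = conv_out(n, f=2, s=2)    # avg pool 2x2, stride 2 -> 5 x 5 x 16
print(n, n * n * 16)         # 5 400 -- the 400 units fed to the FC layers
```

The same helper works for any valid/padded conv layer, which makes it easy to sanity-check an architecture on paper.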
AlexNet
Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton
- Image size of 227 x 227 x 3.
- This paper applied Max Pooling.
- The output of the last Max Pooling layer flattens to 9,216 units (6 x 6 x 256).
- Used ReLU.
- Training was split across two GPUs.
- LeNet-5 had 60k parameters, AlexNet had 60M parameters.
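The 9,216 figure can be checked the same way, assuming the commonly cited AlexNet layer sizes (11 x 11 conv with stride 4, 3 x 3 max pooling with stride 2, "same"-padded 5 x 5 and 3 x 3 convolutions):

```python
def conv_out(n, f, s=1, p=0):
    """Spatial output size of a conv/pool layer: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

n = 227                           # input: 227 x 227 x 3
n = conv_out(n, f=11, s=4)        # conv1, 96 filters 11x11, s=4 -> 55 x 55 x 96
n = conv_out(n, f=3, s=2)         # max pool 3x3, s=2            -> 27 x 27 x 96
n = conv_out(n, f=5, p=2)         # conv2, 256 filters 5x5, same -> 27 x 27 x 256
n = conv_out(n, f=3, s=2)         # max pool 3x3, s=2            -> 13 x 13 x 256
n = conv_out(n, f=3, p=1)         # conv3, 384 filters 3x3, same -> 13 x 13 x 384
n = conv_out(n, f=3, p=1)         # conv4, 384 filters 3x3, same -> 13 x 13 x 384
n = conv_out(n, f=3, p=1)         # conv5, 256 filters 3x3, same -> 13 x 13 x 256
n = conv_out(n, f=3, s=2)         # max pool 3x3, s=2            -> 6 x 6 x 256
print(n, n * n * 256)             # 6 9216 -- flattened units before FC-4096, FC-4096, softmax-1000
```

Note the same pattern as LeNet-5: $n_H$ and $n_W$ shrink (227 → 6) while $n_C$ grows (3 → 256), just at a much larger scale.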