Getting Started with Neural Networks
Implementing and training a neural network can be complicated. Below is a checklist for modeling with PyTorch.
Preprocessing
- Input:
- Normalization / mean subtraction; reason: a better-conditioned loss surface makes optimization easier
- Features should be on the same scale
- Outliers/very large values should be capped or rescaled: for large inputs, the gradients of sigmoid and tanh are close to zero (saturation), which slows down optimization
- For CV tasks: resize images to the same dimensions
- For NLP tasks: text cleaning and tokenization
- Output:
- For regression problems, standardize $y$
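The preprocessing steps above can be sketched as follows; the tensor shapes, feature scales, and the clipping threshold are illustrative assumptions:

```python
import torch

torch.manual_seed(0)
X = torch.randn(8, 3) * torch.tensor([1.0, 100.0, 0.01])  # features on very different scales
y = torch.randn(8, 1) * 50 + 10                           # regression target, not centered

# Standardize features column-wise: zero mean, unit variance per feature
X_mean, X_std = X.mean(dim=0), X.std(dim=0)
X_norm = (X - X_mean) / X_std

# Standardize the regression target as well
y_mean, y_std = y.mean(), y.std()
y_norm = (y - y_mean) / y_std

# Cap extreme values so sigmoid/tanh layers do not saturate on outliers
X_norm = X_norm.clamp(-3.0, 3.0)
```

Remember to keep `X_mean`/`X_std` (and `y_mean`/`y_std`) from the training set and reuse them at inference time.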
Network architecture
- Base network
- Activation
- Regularization
- Dropout (torch.nn.X in Dropout Layers)
- Batch normalization (torch.nn.X in Normalization Layers)
- Weight regularization (weight decay, set via the weight_decay argument of torch.optim.X)
- Ordering of batch norm and dropout
- Loss function
- Eval mode: model.eval() disables Dropout and makes BatchNorm use running statistics
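A minimal sketch of these pieces together, assuming a toy MLP whose layer sizes are purely illustrative:

```python
import torch
import torch.nn as nn

# Hypothetical architecture: 20 input features, one regression output
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.BatchNorm1d(64),   # normalize activations over the batch
    nn.ReLU(),
    nn.Dropout(p=0.5),    # one common ordering: batch norm before dropout
    nn.Linear(64, 1),
)

x = torch.randn(16, 20)

model.train()             # Dropout active, BatchNorm uses batch statistics
out_train = model(x)

model.eval()              # Dropout off, BatchNorm uses running statistics
with torch.no_grad():
    out_eval = model(x)
```

In eval mode repeated forward passes on the same input are deterministic; in train mode Dropout makes them vary.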
Optimizer
- Optimizer
- Learning rate
- LR scheduler: remember to call scheduler.step()
- ReduceLROnPlateau
- CyclicLR
- Early stopping
- Sanity check
- LR estimation: run a grid search/random search over $\log(\text{lr})$ for a few epochs
- Check the network can overfit a small batch of data
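The overfit-a-small-batch sanity check, combined with a ReduceLROnPlateau scheduler, might look like this sketch; the model size, learning rate, and step count are arbitrary assumptions:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A healthy model/optimizer pair should drive the loss on a single
# tiny batch close to zero (deliberate overfitting).
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
x, y = torch.randn(4, 10), torch.randn(4, 1)  # one small batch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.5, patience=10
)
loss_fn = nn.MSELoss()

for step in range(300):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())  # ReduceLROnPlateau needs the monitored metric
```

If the loss does not approach zero here, debug the model/optimizer before training on the full dataset.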
Dataloader
- Batch size
- Restricted by GPU VRAM
- Very large batch sizes can hurt generalization
- Increasing the batch size has a similar effect to decaying the learning rate
- Dataset
- Dataloader
- torchvision.transforms
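A minimal Dataset/DataLoader sketch; ToyDataset and its shapes are hypothetical:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Hypothetical in-memory dataset wrapping feature/label tensors."""

    def __init__(self, n=100):
        self.x = torch.randn(n, 3)
        self.y = torch.randint(0, 2, (n,))

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

# shuffle=True reshuffles samples each epoch
loader = DataLoader(ToyDataset(), batch_size=16, shuffle=True)
xb, yb = next(iter(loader))
```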
Model Serialization
- state_dict
- Save: torch.save(model.state_dict(), PATH)
- Load into existing model: model.load_state_dict(torch.load(PATH))
- Save/load the entire model (pickles the full object): torch.save(model, PATH); model = torch.load(PATH)
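A sketch of the recommended state_dict round trip; make_model and the temporary path are illustrative:

```python
import os
import tempfile
import torch
import torch.nn as nn

def make_model():
    # The architecture must be re-created before loading a state_dict
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

model = make_model()
path = os.path.join(tempfile.mkdtemp(), "model.pt")

# Persist only the parameters/buffers, not the whole pickled object
torch.save(model.state_dict(), path)

# Re-create the architecture, then load the weights into it
restored = make_model()
restored.load_state_dict(torch.load(path))
restored.eval()
```

Saving the state_dict is more portable than pickling the whole model, which ties the checkpoint to the exact class/module layout at save time.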
Diagnostics
- Get parameters: model.parameters()
- CUDA availability: device = 'cuda' if torch.cuda.is_available() else 'cpu'
- torch version: torch.__version__
TensorBoard
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter()
writer.add_scalar("Loss/train", loss, epoch)
Host TensorBoard
tensorboard --logdir PATH
[ssh port forwarding](https://www.ssh.com/academy/ssh/tunneling/example): ssh -L 16006:127.0.0.1:6006 usr@server_ip