Fluent Numbers 🌱

Jan 20, 2024

regularization


Contents

- Note
- Regularization for shallow models
  - Methods
    - Lasso (L1)
    - Ridge
    - ElasticNet
- Regularization for deep neural networks
- Resources

Note

regularization constrains model complexity and helps fight overfitting and multicollinearity
it typically increases the error on the training set in exchange for a lower error on the validation set
some methods modify the loss function (for example, by adding a penalty on the weights), while others modify the data (for example, data augmentation)

Regularization for shallow models

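feature normalization (scaling) is required for most of these algorithms, because the penalty acts on coefficient magnitudes, and those depend on the scale of each feature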
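As a rough illustration of the scaling step, here is a minimal standard-library sketch that standardizes a feature column to zero mean and unit standard deviation. The `income` and `age` columns are made-up values, and `standardize` is a hypothetical helper, not part of any library.

```python
from statistics import mean, pstdev

def standardize(column):
    """Rescale a feature column to zero mean and unit (population) standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)  # assumes the column is not constant (sigma > 0)
    return [(v - mu) / sigma for v in column]

income = [20_000.0, 40_000.0, 60_000.0]  # hypothetical large-scale feature
age = [20.0, 40.0, 60.0]                 # hypothetical small-scale feature

income_std = standardize(income)
age_std = standardize(age)
```

After standardization both columns live on the same scale, so a single penalty strength treats their coefficients comparably.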

Methods

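λ is a hyperparameter that controls the strength of the penalty; it is tuned experimentally, e.g. by cross-validation
each method minimizes the sum of two functions, the loss $L(w)$ and a penalty $R(w)$ on the weights:
$$\min_w \; L(w) + \lambda R(w)$$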

Lasso (L1)

Lasso (L1) - penalizes the sum of the absolute values of the coefficients, $\lambda \sum_j |w_j|$ ⇒ some coefficients go to exactly 0, which acts as feature selection
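To see why an L1 penalty produces exact zeros, consider a sketch for the simplest case: one feature, no intercept, and the objective 0.5 * Σ(y - w·x)² + λ|w|. Under these assumptions the minimizer is a soft-thresholded least-squares estimate. The function names and toy data below are illustrative only.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; return exactly 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_1d(xs, ys, lam):
    """Minimize 0.5 * sum((y - w*x)^2) + lam * |w| over a single weight w."""
    xy = sum(x * y for x, y in zip(xs, ys))
    xx = sum(x * x for x in xs)
    return soft_threshold(xy, lam) / xx

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # y = 2x exactly

w_unpenalized = lasso_1d(xs, ys, lam=0.0)  # 2.0, the least-squares fit
w_shrunk = lasso_1d(xs, ys, lam=7.0)       # 1.5, shrunk toward zero
w_zeroed = lasso_1d(xs, ys, lam=30.0)      # 0.0, the penalty dominates
```

Once λ exceeds |Σ x·y| (28 for this data), the weight is set to exactly zero rather than merely made small.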

Ridge

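Ridge (L2) - penalizes large coefficients ⇒ makes the model robust to small changes in the input data; the penalty is differentiable everywhere
For linear regression with squared-error loss, feature matrix $X$ and target vector $y$, the minimization task becomes a sum of the loss function and the squared weights:
$$\min_w \; \|Xw - y\|_2^2 + \lambda \|w\|_2^2$$
Minimizing the above yields the solution for weights $w$:
$$w = (X^\top X + \lambda I)^{-1} X^\top y$$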
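For linear regression with squared-error loss, the ridge weights have the standard closed form w = (XᵀX + λI)⁻¹Xᵀy. A minimal NumPy sketch of that formula follows, assuming no intercept and features that are already scaled; `ridge_fit` and the toy data are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge weights: solve (X^T X + lam * I) w = X^T y."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    # Solving the linear system is more stable than forming the inverse explicitly.
    return np.linalg.solve(A, X.T @ y)

# Toy data: y = 2*x1 - 1*x2 exactly (no noise, no intercept).
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = X @ np.array([2.0, -1.0])

w_ols = ridge_fit(X, y, lam=0.0)     # lam = 0 reduces to ordinary least squares
w_ridge = ridge_fit(X, y, lam=10.0)  # a larger lam shrinks the weights toward zero
```

With λ = 0 the true weights are recovered, while λ = 10 gives a weight vector with a smaller norm.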

ElasticNet

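ElasticNet (L1+L2) - combines the Lasso and Ridge penalties, so it can zero out some coefficients while still shrinking the rest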
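The combined penalty can be sketched as a weighted sum of the L1 and L2 terms. The parameter names `lam1` and `lam2` are assumptions made for this example; libraries often parameterize the mix differently (for instance, with one overall strength and a mixing ratio).

```python
def elastic_net_penalty(weights, lam1, lam2):
    """Return lam1 * sum(|w|) + lam2 * sum(w^2)."""
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return lam1 * l1 + lam2 * l2

weights = [0.5, -2.0, 0.0]

lasso_only = elastic_net_penalty(weights, lam1=1.0, lam2=0.0)  # pure L1 term: 2.5
ridge_only = elastic_net_penalty(weights, lam1=0.0, lam2=1.0)  # pure L2 term: 4.25
mixed = elastic_net_penalty(weights, lam1=0.5, lam2=0.5)       # 3.375
```

Setting either coefficient to zero recovers the Lasso or the Ridge penalty as a special case.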

Regularization for deep neural networks

data augmentation
dropout
weight decay
label smoothing
normalization
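Two of the items above are simple enough to sketch in a few lines of plain Python: label smoothing and weight decay. This is an illustrative sketch, not how any particular framework implements them; `eps`, `lr` and `decay` are arbitrary example values, and the weight-decay step shown is the plain-SGD form.

```python
def smooth_labels(one_hot, eps):
    """Label smoothing: move eps of the probability mass to a uniform distribution."""
    k = len(one_hot)
    return [(1.0 - eps) * p + eps / k for p in one_hot]

def sgd_step_with_weight_decay(weights, grads, lr, decay):
    """One SGD step in which each weight is also shrunk by lr * decay * w."""
    return [w - lr * (g + decay * w) for w, g in zip(weights, grads)]

targets = smooth_labels([0.0, 1.0, 0.0, 0.0], eps=0.1)
# targets is approximately [0.025, 0.925, 0.025, 0.025] and still sums to 1

new_weights = sgd_step_with_weight_decay([1.0, -2.0], grads=[0.0, 0.0], lr=0.1, decay=0.5)
# with zero gradients only the decay acts: approximately [0.95, -1.9]
```

For plain SGD, this weight-decay update matches the gradient step on a loss with an added L2 penalty of (decay / 2) * Σ w².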

Resources

Meet the linear models (Знакомьтесь, линейные модели) / Habr

Cheat sheet
