  1. neural networks - Why would regularization reduce training error ...

    Feb 11, 2026 · Though it may be that back in 2012, "primitive" training practices and lack of batch norm meant that L2 regularization helped SGD. @GaslightDeceiveSubvert having trained several million …

  2. What is regularization in plain english? - Cross Validated

    Is regularization really ever used to reduce underfitting? In my experience, regularization is applied on a complex/sensitive model to reduce complexity/sensitivity, but never on a simple/insensitive model to …

  3. L1 & L2 double role in Regularization and Cost functions?

    Mar 19, 2023 · Regularization is a way of sacrificing the training loss value in order to improve some other facet of performance, a major example being to sacrifice the in-sample fit of a machine learning …
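    As a worked-equation sketch of the trade-off this snippet describes: a regularized objective adds a penalty term to the training loss, so the optimizer accepts a slightly worse in-sample fit in exchange for smaller (or sparser) weights. The symbols below ($\lambda$, $\Omega$) are generic placeholders, not notation taken from the linked answer.

```latex
% Regularized empirical risk: trade training fit against a penalty on the weights.
% \lambda >= 0 controls how much in-sample fit is sacrificed; \Omega is the penalty,
% e.g. \Omega(w) = \|w\|_2^2 for ridge or \Omega(w) = \|w\|_1 for lasso.
\hat{w} = \arg\min_{w} \;
  \underbrace{\frac{1}{n}\sum_{i=1}^{n} \ell\big(y_i, f(x_i; w)\big)}_{\text{training loss}}
  \; + \; \lambda \, \underbrace{\Omega(w)}_{\text{penalty}}
```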

  4. What are Regularities and Regularization? - Cross Validated

    Is regularization a way to ensure regularity, i.e. to capture regularities? Why do ensembling methods like dropout, and normalization methods, all claim to be doing regularization?

  5. Why is the L2 regularization equivalent to Gaussian prior?

    Dec 13, 2019 · I keep reading this and intuitively I can see this but how does one go from L2 regularization to saying that this is a Gaussian Prior analytically? Same goes for saying L1 is …
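    A hedged sketch of the derivation the question asks for (the standard MAP argument, not quoted from the linked answers): place an i.i.d. Gaussian prior on the weights and maximize the log-posterior; the log-prior turns into an L2 penalty.

```latex
% MAP estimation with a Gaussian prior w_j ~ N(0, \tau^2) on each weight:
%   log p(w | D) = log p(D | w) + log p(w) + const.
% The Gaussian log-prior is -\|w\|_2^2 / (2\tau^2) + const, so maximizing the
% posterior equals minimizing the negative log-likelihood plus an L2 penalty.
\hat{w}_{\text{MAP}}
  = \arg\max_{w} \Big[ \log p(D \mid w) - \tfrac{1}{2\tau^2}\lVert w \rVert_2^2 \Big]
  = \arg\min_{w} \Big[ -\log p(D \mid w) + \lambda \lVert w \rVert_2^2 \Big],
\qquad \lambda = \tfrac{1}{2\tau^2}.
% The same argument with a Laplace prior p(w_j) \propto e^{-|w_j|/b} yields the L1 penalty.
```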

  6. Boosting: why is the learning rate called a regularization parameter?

    The learning rate parameter ($\nu \in [0,1]$) in Gradient Boosting shrinks the contribution of each new base model (typically a shallow tree) that is added in the series. It was shown to dramatically …
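    A one-line sketch of why $\nu$ behaves like a regularization parameter (standard gradient-boosting notation, assumed rather than quoted from the linked thread): each stage's contribution is shrunk before being added, so the ensemble fits the training data more slowly and needs many more stages before it can overfit.

```latex
% Stagewise update in gradient boosting with shrinkage (learning rate) \nu \in (0, 1];
% h_m is the new base learner (typically a shallow tree) fit to the current residuals.
F_m(x) = F_{m-1}(x) + \nu \, h_m(x)
```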

  7. Difference between weight decay and L2 regularization

    Apr 6, 2025 · I'm reading [Ilya Loshchilov's work][1] on decoupled weight decay and regularization. The big takeaway seems to be that weight decay and $L_2$ norm regularization are the same for SGD …
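    A minimal numerical sketch of the point in this snippet, using plain NumPy and made-up values (nothing here is taken from the paper itself): for vanilla SGD, putting $(\lambda/2)\lVert w \rVert^2$ in the loss and applying decoupled weight decay produce the same update, which is why the two are often conflated.

```python
import numpy as np

# Toy setup: one parameter vector, a fixed "data gradient" g, a learning rate and a decay factor.
w0 = np.array([1.0, -2.0, 0.5])
g = np.array([0.3, -0.1, 0.2])   # gradient of the data loss at w0 (made up)
lr, lam = 0.1, 0.01

# (1) L2 regularization: add (lam/2) * ||w||^2 to the loss, so its gradient contributes lam * w.
w_l2 = w0 - lr * (g + lam * w0)

# (2) Decoupled weight decay: take the data-gradient step, then shrink the weights directly.
w_wd = w0 - lr * g - lr * lam * w0

# For plain SGD the two updates coincide; they diverge once the gradient is rescaled
# per-parameter (e.g. by Adam), which is the paper's argument for decoupling.
print(np.allclose(w_l2, w_wd))  # True
```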

  8. Why do smaller weights result in simpler models in regularization?

    Dec 24, 2015 · Regularization like ridge regression, reduces the model space because it makes it more expensive to be further away from zero (or any number). Thus when the model is faced with a choice …
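    A small illustration of the shrinkage described here, using scikit-learn's Ridge on synthetic data (the data and penalty values are invented for the example): as the penalty grows, the fitted coefficients are pulled toward zero, i.e. the effective model space contracts.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([3.0, -2.0, 0.0, 1.5, 0.0])
y = X @ true_w + rng.normal(scale=0.5, size=200)

# Increasing alpha makes it "more expensive" for coefficients to sit far from zero,
# so the fitted weights shrink toward zero (though, unlike L1, they rarely hit exactly zero).
for alpha in [0.01, 1.0, 100.0]:
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    print(alpha, np.round(coef, 2))
```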

  9. Why do we only see $L_1$ and $L_2$ regularization but not other norms?

    Mar 27, 2017 · I am just curious why there are usually only $L_1$ and $L_2$ norm regularization. Are there proofs of why these are better?
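    For context on what "other norms" would look like (generic notation, not drawn from the linked question): the general $L_p$ penalty is sketched below. $p=2$ is smooth and strongly convex, $p=1$ is the smallest exponent that keeps the penalty convex while still producing exact zeros, and $p<1$ gives sparser solutions at the cost of a non-convex problem.

```latex
% Generic L_p-penalized objective; L1 and L2 are the special cases p = 1 and p = 2.
\hat{w} = \arg\min_{w} \; \frac{1}{n}\sum_{i=1}^{n} \ell\big(y_i, f(x_i; w)\big)
        + \lambda \sum_{j} |w_j|^{p}
```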

  10. When will L1 regularization work better than L2 and vice versa?

    Nov 29, 2015 · Note: I know that L1 has feature selection property. I am trying to understand which one to choose when feature selection is completely irrelevant. How to decide which regularization (L1 or …
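    A small empirical sketch of the usual decision rule (synthetic data, invented coefficients, scikit-learn estimators): when only a few features truly matter, the L1 penalty tends to zero out the rest, while the L2 penalty keeps all coefficients small but nonzero; when many features each contribute a little, L2 typically predicts better.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))
true_w = np.zeros(20)
true_w[:3] = [4.0, -3.0, 2.0]            # only 3 of 20 features are truly relevant
y = X @ true_w + rng.normal(scale=0.5, size=300)

lasso = Lasso(alpha=0.1).fit(X, y)        # L1 penalty
ridge = Ridge(alpha=10.0).fit(X, y)       # L2 penalty

print("L1 nonzero coefficients:", np.sum(lasso.coef_ != 0))   # close to 3
print("L2 nonzero coefficients:", np.sum(ridge.coef_ != 0))   # all 20
```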