These random errors and fluctuations are learned by the model during training so well that the model's accuracy on the training data becomes very high, causing it to overfit. Training for longer than necessary can also lead to overfitting.

source : https://stats.stackexchange.com/

So, what can be done to avoid the problem of overfitting? It is easy to observe that larger weights tend to produce a more complex, more non-linear fit, so a simple approach is to penalize the weights while they are updated. There are two common techniques that use this idea.

1. L1 norm : This works by adding a penalty term, scaled by a parameter λ, to the error function we need to minimize:

Cost = Error(y, ŷ) + λ Σ |wᵢ|

where w is the weight vector. λ is a hyperparameter whose value is decided by us. If λ is high, it adds a heavy penalty to the error term, making the learned decision boundary almost linear; if λ is close to 0, it has almost no effect on the error term and causes no regularization. L1 regularization is often seen as a feature-selection technique too, as it zeroes out the weights of undesired features. L1 is also computationally inefficient in non-sparse cases.

2. L2 norm : L2 regularization may not look very different from L1, but its impact is quite different. Here the weights are squared individually before being summed:

Cost = Error(y, ŷ) + λ Σ wᵢ²

L1's feature-selection property is lost here, but L2 is more efficient in non-sparse cases. L2 regularization is the basis of ridge regression, while L1 underlies lasso. The parameter λ works the same way as in L1.

Early Stopping : So far we have added penalties to control the values of the weights, but there are other ways to regularize. What if, while training, we stop as soon as the training error keeps going down but the test (validation) error starts going up? This gives us the desired trade-off between train and test error.
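As a rough illustration of the two penalties, here is a minimal gradient-descent sketch in plain Python. The toy data, learning rate, and λ value are arbitrary choices for the demonstration, not from the original text. The target depends only on the first feature; the second is pure noise, and the L1 penalty pushes its weight toward zero while L2 merely shrinks all weights.

```python
import random

random.seed(0)

# Toy data: y depends only on the first feature; the second is noise.
n = 200
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(n)]
y = [3.0 * x1 + random.gauss(0, 0.1) for x1, _ in X]

def fit(X, y, lam, penalty, lr=0.05, epochs=500):
    """Gradient descent on MSE + lam * penalty(w); penalty is 'l1' or 'l2'."""
    w = [0.0, 0.0]
    m = len(y)
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for xi, yi in zip(X, y):
            err = w[0] * xi[0] + w[1] * xi[1] - yi
            grad[0] += 2 * err * xi[0] / m
            grad[1] += 2 * err * xi[1] / m
        for j in range(2):
            if penalty == 'l1':
                # Subgradient of lam * |w_j|
                grad[j] += lam * (1 if w[j] > 0 else -1 if w[j] < 0 else 0)
            else:
                # Gradient of lam * w_j^2
                grad[j] += 2 * lam * w[j]
            w[j] -= lr * grad[j]
    return w

w_l1 = fit(X, y, lam=0.5, penalty='l1')
w_l2 = fit(X, y, lam=0.5, penalty='l2')
print("L1 weights:", w_l1)  # weight on the noise feature stays near zero
print("L2 weights:", w_l2)  # both weights shrunk, neither driven to zero
```

Note that both penalties also shrink the useful weight away from its true value of 3.0; that bias is the price paid for the reduction in variance.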
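The early-stopping rule described above can be sketched as a small helper (a hypothetical `early_stopping_epoch` function, not from the original text) that tracks the best validation loss seen so far and stops once it has not improved for `patience` epochs:

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the best epoch, scanning until `patience`
    consecutive epochs pass with no improvement (simulating stopping)."""
    best_loss = float('inf')
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # validation error stopped improving: stop training here
    return best_epoch

# Synthetic validation curve: falls, then rises as overfitting sets in.
curve = [1.0, 0.7, 0.5, 0.4, 0.35, 0.36, 0.4, 0.5, 0.65, 0.8, 1.0]
print(early_stopping_epoch(curve))  # → 4, the epoch just before the rise
```

In practice one would also save the model weights at the best epoch and restore them after stopping, rather than keeping the last (overfit) weights.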