This is because the architectures uncovered by pruning are harder to train from the beginning and bring down the accuracy significantly.
Objective of this Paper

This paper aims to show that there exist smaller sub-networks that train effectively from the start.
These networks learn at least as fast as their larger counterparts while achieving similar test accuracy.
Such sub-networks do not turn up by chance, though. For instance, the researchers randomly sample and train sub-networks from a fully connected network for MNIST and from convolutional networks for CIFAR10. In the paper's accompanying plot, the dashed lines trace the iteration of minimum validation loss and the test accuracy at that iteration across various levels of sparsity.
For these randomly sampled sub-networks, the sparser the network, the slower the learning and the lower the eventual test accuracy.
This observation is what leads the researchers to state their lottery ticket hypothesis.

Lottery Ticket Hypothesis

A randomly-initialized, dense neural network contains a sub-network, called a winning ticket.
This sub-network is initialized such that, when trained in isolation, it can match the test accuracy of the original network after training for at most the same number of iterations.
Identifying Winning Tickets

A winning ticket is identified by training the network and pruning its smallest-magnitude weights.
The remaining, unpruned connections constitute the architecture of the winning ticket.
Each unpruned connection’s value is then reset to its initialization from the original network before it was trained.
Finding a winning ticket combines training with pruning.
I have summarized it in four steps:
1. Randomly initialize a neural network.
2. Train the network until it converges.
3. Prune a fraction of the trained network's smallest-magnitude weights.
4. To extract the winning ticket, reset the weights of the remaining portion of the network to their values from step 1.
When this sequence is run only once, the pruning is called one-shot.
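The procedure above can be sketched in a few lines of plain Python. This is only a toy illustration, not the authors' implementation: the "network" is a six-weight linear model, `train` is a stand-in SGD loop, and the names `magnitude_mask`, `train`, and `find_winning_ticket`, the toy data, and the 50% pruning fraction are all assumptions made for the example.

```python
import random

def magnitude_mask(weights, prune_fraction):
    """Return a 0/1 mask that drops the smallest-magnitude weights."""
    n_prune = int(len(weights) * prune_fraction)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask

def train(weights, mask, data, lr=0.05, epochs=200):
    """Toy stand-in for training: masked linear regression with SGD."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            pred = sum(wi * mi * xi for wi, mi, xi in zip(w, mask, x))
            err = pred - y
            # Pruned weights (mask 0) receive no update.
            w = [wi - lr * err * xi * mi for wi, xi, mi in zip(w, x, mask)]
    return w

def find_winning_ticket(init_weights, data, prune_fraction):
    dense_mask = [1] * len(init_weights)
    trained = train(init_weights, dense_mask, data)         # step 2: train
    mask = magnitude_mask(trained, prune_fraction)          # step 3: prune
    ticket = [w0 * m for w0, m in zip(init_weights, mask)]  # step 4: reset
    return ticket, mask

rng = random.Random(0)
# Hypothetical toy task: only the first two of six inputs matter.
true_w = [2.0, -3.0, 0.0, 0.0, 0.0, 0.0]
data = []
for _ in range(50):
    x = [rng.uniform(-1, 1) for _ in true_w]
    data.append((x, sum(t * xi for t, xi in zip(true_w, x))))

init_weights = [rng.gauss(0, 0.1) for _ in true_w]          # step 1: initialize
ticket, mask = find_winning_ticket(init_weights, data, prune_fraction=0.5)

# Train the sparse ticket in isolation, starting from the original init.
retrained = train(ticket, mask, data)
```

The key detail is the last line of `find_winning_ticket`: the surviving weights go back to their values in `init_weights`, not to their trained values, which is what separates a winning ticket from ordinary prune-and-fine-tune.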
The paper, however, focuses on iterative pruning, which repeatedly trains, prunes, and resets the network over n rounds.
Each round prunes p^(1/n) % of the weights that survive the previous round.
As a result, iterative pruning finds winning tickets that match the accuracy of the original network at smaller sizes than one-shot pruning does.
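To get a feel for how quickly iterative pruning shrinks a network, it helps to work out the compounding. The sketch below assumes a fixed per-round pruning rate `r` applied to the surviving weights, so a fraction `(1 - r)^n` of the original weights is left after `n` rounds. This parameterization, the helper names, and the 20% rate are illustrative assumptions, not the article's notation.

```python
def remaining_after_rounds(per_round_rate, rounds):
    """Fraction of the original weights left after each pruning round."""
    remaining = 1.0
    history = []
    for _ in range(rounds):
        # Each round removes a fixed share of the weights that survived so far.
        remaining *= 1.0 - per_round_rate
        history.append(remaining)
    return history

def per_round_rate_for_target(target_remaining, rounds):
    """Per-round rate needed to leave `target_remaining` after `rounds` rounds."""
    return 1.0 - target_remaining ** (1.0 / rounds)

# Pruning 20% of the survivors per round for 5 rounds:
history = remaining_after_rounds(0.2, 5)
# history is approximately [0.8, 0.64, 0.512, 0.4096, 0.32768]

# Inverse question: which per-round rate leaves about 32.8% after 5 rounds?
rate = per_round_rate_for_target(0.32768, 5)  # approximately 0.2
```

Because each round only removes a share of what is left, the network shrinks gradually, and every intermediate size gets its own round of training before the next cut.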
Applications

A question that comes to everyone's mind when reading these research papers: where in the world can we apply this? It's all well and good experimenting and coming up trumps with a new approach.
But the jackpot is in converting it into a real-world application.
This paper is useful as a recipe for finding winning tickets in an existing network.
The researchers find winning tickets in fully-connected networks trained on MNIST and in convolutional networks trained on CIFAR10, increasing both the complexity of the learning problem and the size of the network.
Existing work on neural network pruning demonstrates that the functions learned by a neural network can often be represented with fewer parameters.
Pruning typically proceeds by training the original network, removing connections, and further fine-tuning.
In effect, the initial training initializes the weights of the pruned network so that it can learn in isolation during fine-tuning.
The Importance of Winning Ticket Initialization

A winning ticket learns more slowly and achieves lower test accuracy when it is randomly reinitialized.
This suggests that initialization is important to its success.
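The comparison behind this claim can be made concrete with a small sketch. Both networks share the same sparse structure (the same mask). Only the starting values differ. The helper names, the mask, and the Gaussian initialization with standard deviation 0.1 are hypothetical choices for illustration, not details taken from the paper.

```python
import random

def winning_ticket_init(init_weights, mask):
    """Keep the surviving connections at their original initial values."""
    return [w * m for w, m in zip(init_weights, mask)]

def random_reinit(mask, rng, std=0.1):
    """Same sparse structure, but freshly sampled initial values (the control)."""
    return [rng.gauss(0.0, std) * m for m in mask]

rng = random.Random(0)
init_weights = [rng.gauss(0.0, 0.1) for _ in range(8)]  # hypothetical original init
mask = [1, 0, 1, 1, 0, 0, 1, 0]                         # hypothetical pruning mask

ticket = winning_ticket_init(init_weights, mask)   # original init on kept weights
control = random_reinit(mask, rng)                 # new random values, same mask
```

Training `ticket` and `control` with the same procedure and comparing their learning curves is the kind of experiment the statement above summarizes.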
The Importance of Winning Ticket Structure

The initialization that gives rise to a winning ticket is arranged in a particular sparse architecture.
Since the researchers uncover winning tickets through heavy use of training data, they hypothesize that the structure of the winning tickets encodes an inductive bias customized to the learning task at hand.
Limitations and Future Work

The researchers are aware that this isn't the finished product yet.
The current approach has a few limitations which could be addressed in the future:
- Larger datasets are not investigated. Only vision-centric classification tasks on smaller datasets are considered. The researchers intend to explore more efficient methods for finding winning tickets that will make it possible to study the lottery ticket hypothesis in more resource-intensive settings.
- Sparse pruning is the only method used for finding winning tickets. The researchers intend to study other pruning methods from the extensive contemporary literature, such as structured pruning (which would produce networks optimized for contemporary hardware) and non-magnitude pruning methods (which could produce smaller winning tickets or find them earlier).
- The winning tickets found have initializations that allow them to match the performance of the unpruned networks at sizes too small for randomly-initialized networks to do the same. The researchers intend to study the properties of these initializations that, in concert with the inductive biases of the pruned network architectures, make these networks particularly adept at learning.

End Notes

In this article, we thoroughly discussed the two best research papers published at ICLR.
I learned so much while going through these papers and understanding the thought process of these research experts.
I encourage you to go through these papers yourself once you finish this article.
A couple more research-focused conferences are coming up soon.
The International Conference on Machine Learning (ICML) and the Computer Vision and Pattern Recognition (CVPR) conferences are lined up in the coming months.
Stay tuned!