Warning: mkdir(): No space left on device in /var/www/hottg/post.php on line 59

Warning: file_put_contents(aCache/aDaily/2024-05-29/post/datastorieslanguages/--): Failed to open stream: No such file or directory in /var/www/hottg/post.php on line 72
​​Explaining grokking through circuit efficiency @Data, Stories and Languages
TG Telegram Group & Channel
Data, Stories and Languages | United States America (US)
Create: Update:

​​Explaining grokking through circuit efficiency

The paper explores the phenomenon of "grokking" in neural networks, where a network that initially performs poorly on new data eventually excels without any change in training setup. According to the authors, grokking occurs when two conditions are present: a memorizing solution and a generalizing solution. The generalizing solution takes longer to learn but is more efficient in terms of computational resources. The authors propose a "critical dataset size" at which the efficiencies of memorizing and generalizing are equal, providing a pivot point for the network to switch from memorization to generalization.

Furthermore, the paper introduces two new behaviors: "ungrokking" and "semi-grokking." Ungrokking describes a situation where a well-performing network reverts to poor performance when trained on a smaller dataset. Semi-grokking refers to a scenario where the network, instead of achieving full generalization, reaches a state of partial but improved performance.

Paper link: https://arxiv.org/abs/2309.02390

My overview of the paper:
https://andlukyane.com/blog/paper-review-un-semi-grokking
https://artgor.medium.com/paper-review-explaining-grokking-through-circuit-efficiency-1f420d6aea5f

#paperreview

​​Explaining grokking through circuit efficiency

The paper explores the phenomenon of "grokking" in neural networks, where a network that initially performs poorly on new data eventually excels without any change in training setup. According to the authors, grokking occurs when two conditions are present: a memorizing solution and a generalizing solution. The generalizing solution takes longer to learn but is more efficient in terms of computational resources. The authors propose a "critical dataset size" at which the efficiencies of memorizing and generalizing are equal, providing a pivot point for the network to switch from memorization to generalization.

Furthermore, the paper introduces two new behaviors: "ungrokking" and "semi-grokking." Ungrokking describes a situation where a well-performing network reverts to poor performance when trained on a smaller dataset. Semi-grokking refers to a scenario where the network, instead of achieving full generalization, reaches a state of partial but improved performance.

Paper link: https://arxiv.org/abs/2309.02390

My overview of the paper:
https://andlukyane.com/blog/paper-review-un-semi-grokking
https://artgor.medium.com/paper-review-explaining-grokking-through-circuit-efficiency-1f420d6aea5f

#paperreview


>>Click here to continue<<

Data, Stories and Languages






Share with your best friend
VIEW MORE

United States America Popular Telegram Group (US)


Fatal error: Uncaught TypeError: shuffle(): Argument #1 ($array) must be of type array, null given in /var/www/hottg/post.php:344 Stack trace: #0 /var/www/hottg/post.php(344): shuffle() #1 /var/www/hottg/route.php(63): include_once('...') #2 {main} thrown in /var/www/hottg/post.php on line 344