Wednesday, November 6, 2024

The Premise of a New S-Curve in AI by @ttunguz

Since July, have you noticed how much better your AI model has become? Measuring models is hard to do. All we can do is quantify the vibe: is this one better than that one?

Elo is a score that measures how often one model wins against another, as judged by a human. Which model answers the prompt “Describe the differences in texture between a Pink Lady and a Macoun apple” better? On average, the one with the higher Elo score.[1]
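To make the link between human votes and a score concrete, here is a minimal sketch of fitting Bradley-Terry strengths from pairwise win counts, using the standard iterative (minorization-maximization) update. The function names, the example tallies, and the 1000-point anchor are illustrative assumptions and not taken from the leaderboard's actual pipeline, which may differ in detail.

```python
import math

def fit_bradley_terry(wins, n_iter=200):
    """Estimate Bradley-Terry strengths from pairwise results.

    wins[i][j] is the number of times model i beat model j.
    Assumes every model has at least one win and one comparison;
    otherwise its strength collapses to zero and the log below fails.
    """
    n = len(wins)
    strength = [1.0] * n
    for _ in range(n_iter):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strength[i] + strength[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        # Strengths are only defined up to a constant factor,
        # so rescale them to a geometric mean of 1.
        log_mean = sum(math.log(s) for s in updated) / n
        strength = [s / math.exp(log_mean) for s in updated]
    return strength

def to_elo_scale(strength, anchor=1000.0):
    """Map strengths onto an Elo-like scale (400 points per factor of 10).

    The anchor of 1000 is an arbitrary assumption for illustration.
    """
    return [anchor + 400.0 * math.log10(s) for s in strength]

# Hypothetical tallies: model 0 beat model 1 in 60 of 100 human votes.
example_wins = [[0, 60], [40, 0]]
example_strength = fit_bradley_terry(example_wins)
example_elo = to_elo_scale(example_strength)
```

With only two models, the fitted strength ratio equals the ratio of wins (60 to 40), so the model that wins more human comparisons ends up with the higher score.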

In the last four months, the top 100 models have improved their Elo by about 60 points, with the top models now at 1339 vs 1287 in July.
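For a sense of what a gap of that size means head-to-head, the conventional Elo formula converts a rating difference into an expected win probability. The sketch below assumes the usual 400-point logistic scale; the function name is illustrative.

```python
def elo_win_probability(rating_a, rating_b, scale=400.0):
    """Expected probability that A beats B under the standard Elo formula.

    The 400-point scale is the conventional choice and an assumption here.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))

# A 1339-rated model against a 1287-rated one (the figures quoted above).
p_top_now_vs_july = elo_win_probability(1339, 1287)
print(f"{p_top_now_vs_july:.1%}")
```

Under these assumptions, a 52-point gap corresponds to the higher-rated model winning about 57% of comparisons.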

The biggest performance gains occurred in the middle part of the distribution. Researchers have pushed significantly more performance with innovations in algorithms.

Model Size | Win Probability Increase (%) | Definition
Small      | 32.0%                        | < 10b parameters
Medium     | 22.4%                        | 10b – 100b parameters
Large      | 29.6%                        | 100b – 200b parameters
Mega       | 25.9%                        | 200b+ parameters

The smallest models have improved the most. As of October, they have increased their win rates by nearly a third in four months. Models of every size have improved their competitive win rates by more than 20%.

In July, we posed the question: what happens when model performance asymptotes? Progress in small, medium, & large models is linear in Elo terms.

But the mega models show more data points of inflection, suggesting that the recent innovations in reasoning & scale (the largest models have grown from 200b parameters to more than 400b) have produced the beginning of a new high-growth S-curve.
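One simple way to tell linear progress from the start of an S-curve is to look at second differences of a score series: they hover near zero for a straight line and turn positive when growth accelerates. The sketch below uses made-up numbers purely for illustration; they are not the Elo values behind the charts in this post.

```python
def second_differences(series):
    """Discrete second differences: positive values indicate acceleration."""
    return [
        series[i + 1] - 2 * series[i] + series[i - 1]
        for i in range(1, len(series) - 1)
    ]

# Hypothetical monthly scores, NOT data from this post.
linear_like = [1200, 1215, 1230, 1245, 1260]   # steady +15 per month
accelerating = [1200, 1205, 1215, 1235, 1270]  # gains grow each month

print(second_differences(linear_like))
print(second_differences(accelerating))
```

A handful of points is not enough to confirm an S-curve, so a check like this is best read as a hint that growth is accelerating and not as proof.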


[1] See the Bradley-Terry model.
