Stability AI unveiled a smaller, more efficient 1.6B language model as part of its ongoing innovation.


When it comes to large language models (LLMs), size certainly matters because it affects where the model can run.

Stability AI, the vendor perhaps best known for its Stable Diffusion text-to-image generative AI technology, today released one of its smallest models ever with the debut of Stable LM 2 1.6B. Stable LM is a text-generation LLM that Stability AI first launched in April 2023 with 3 billion and 7 billion parameter models. The new Stable LM model is actually the second model released by Stability AI in 2024, following the company’s Stable Code 3B earlier this week.

The new compact yet powerful Stable LM model aims to lower barriers and enable more developers to participate in the generative AI ecosystem by incorporating multilingual data in seven languages: English, Spanish, German, Italian, French, Portuguese and Dutch. The model uses recent algorithmic advances in language modeling to strike what Stability AI hopes is an optimal balance between speed and efficiency.
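The barrier really is low at this scale: a 1.6B-parameter model fits on a single consumer GPU, or even a laptop CPU. Below is a minimal sketch of loading and prompting such a model with the Hugging Face transformers library; the model identifier stabilityai/stablelm-2-1_6b is an assumption about where the weights are published, and the prompt is purely illustrative.

```python
# Minimal sketch: load a small causal LM and generate text with Hugging Face
# transformers. The model id below is an assumption, not a confirmed location.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-2-1_6b"  # assumed Hugging Face identifier

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~3.2 GB of weights in fp16 for 1.6B parameters
    device_map="auto",          # falls back to CPU if no GPU is available
    trust_remote_code=True,
)

prompt = "Small language models are useful because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```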

“In general, larger models trained on similar data with a similar training composition outperform smaller models,” Carlos Riquelme, head of the language team at Stability AI, told VentureBeat. “However, over time, as new models implement better algorithms and are trained on more and higher-quality data, we sometimes see newer smaller models outperform older larger models.”

Why Smaller Is Better With Stable LM (This Time)

According to Stability AI, the model outperforms other small language models on most benchmarks, including Microsoft’s Phi-2 (2.7B), TinyLlama 1.1B and Falcon 1B, despite itself having fewer than 2 billion parameters.

The new smaller Stable LM is also able to outperform some larger models, including Stability AI’s own first Stable LM 3B model.

“Stable LM 2 1.6B outperforms some of the larger models that were trained a few months ago,” Riquelme said. “If you think about computers, televisions or microchips, we see roughly the same trend: they got smaller, thinner and better over time.”

To be clear, the smaller Stable LM 2 1.6B has some drawbacks due to its size. In its release notes for the new model, Stability AI warns that, “…due to the nature of small, low-capacity language models, Stable LM 2 1.6B may similarly exhibit common issues such as high hallucination rates or potentially toxic language.”

Transparency and More Data Are Fundamental to the New Model Release

The move toward smaller, more powerful LLM options is one Stability AI has been pushing for the past few months.

In December 2023, the company released the StableLM Zephyr 3B model, bringing more performance to StableLM in a smaller size than the initial iteration from April.

Riquelme explained that the new Stable LM 2 models are trained on more data, including multilingual documents in six languages beyond English (Spanish, German, Italian, French, Portuguese and Dutch). Another interesting aspect Riquelme highlighted is the order in which data is presented to the model during training: focusing on different types of data during different training phases can pay off.
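Stability AI has not published its exact schedule, but the general idea of staging data during training can be sketched in a few lines. In this toy example the phase boundaries, source names and mixing weights are all invented for illustration; only the technique itself, changing the sampling mix as training progresses, reflects what Riquelme described.

```python
import random

# Illustrative only: a staged data schedule that shifts which sources are
# emphasized as training progresses. Phase boundaries and weights are made up.
PHASES = [
    # (fraction of total steps, {source: sampling weight})
    (0.6, {"web_english": 0.8, "multilingual": 0.1, "code": 0.1}),
    (0.3, {"web_english": 0.5, "multilingual": 0.4, "code": 0.1}),
    (0.1, {"curated_high_quality": 0.7, "multilingual": 0.3}),  # late-stage finish
]

def sample_source(step: int, total_steps: int) -> str:
    """Pick a data source for this step according to the current phase's mix."""
    progress = step / total_steps
    cumulative = 0.0
    for fraction, mix in PHASES:
        cumulative += fraction
        if progress < cumulative:
            sources, weights = zip(*mix.items())
            return random.choices(sources, weights=weights, k=1)[0]
    return "curated_high_quality"  # final step lands in the last phase

# Example: which source feeds the model at three points in a 10,000-step run
for step in (100, 7_000, 9_900):
    print(step, sample_source(step, total_steps=10_000))
```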

Going a step further, Stability AI is making the new models available in pre-trained and fine-tuned options, as well as in a format the researchers describe as “…the last model checkpoint before the pre-training cooldown.”

“Our goal here is to give individual developers more tools and artifacts to innovate, tweak and build on top of our current model,” said Riquelme. “Here we’re providing a specific half-baked model for people to play with.”

During training, the model is sequentially updated and its performance improves, Riquelme explained. In this scenario, the very first model knows nothing, while the last one has seen and hopefully learned most aspects of the data. At the same time, Riquelme said, models may become less malleable toward the end of training, as they are forced to wrap up their learning.

“We decided to provide the model in its current form just before starting the last training stage, so that hopefully it will be easier to specialize it on other tasks or datasets people may want to use,” he said. “We’re not sure if it will work well, but we really believe in people’s ability to leverage new tools and models in great and surprising ways.”
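In practical terms, a developer picking up such a pre-cooldown checkpoint would continue training it on their own corpus rather than starting from scratch. The sketch below uses the Hugging Face Trainer; the checkpoint id, dataset file and hyperparameters are illustrative assumptions, not Stability AI’s published recipe.

```python
# Sketch: continue causal-LM training from a late-stage checkpoint on a custom
# text corpus. Names and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

checkpoint_id = "stabilityai/stablelm-2-1_6b"  # swap in the pre-cooldown variant
tokenizer = AutoTokenizer.from_pretrained(checkpoint_id, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for batch padding
model = AutoModelForCausalLM.from_pretrained(checkpoint_id, trust_remote_code=True)

# A plain-text domain corpus supplied by the developer (hypothetical file).
dataset = load_dataset("text", data_files={"train": "my_domain_corpus.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal objective

args = TrainingArguments(
    output_dir="stablelm2-continued",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,  # modest LR: the checkpoint is already near convergence
    num_train_epochs=1,
    logging_steps=50,
)

Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```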
