SLMs: Beyond the hype

By Dr Joe Mullen, Director of Data Science for SciBite at Elsevier

Interest in small language models (SLMs) is booming as GenAI remains at the top of the business agenda. But these models are not as solidly defined as you might think – with many different ideas about where the line between SLMs and large language models (LLMs) lies.

Typically, an SLM refers to a model trained on a small, specific dataset – although ‘small’ in this context still refers to potentially tens of billions of parameters. In this sense, SLMs are defined in opposition to LLMs, which have so far dominated the AI market and can have trillions of parameters that cover a huge scope of human knowledge.

Already there are smaller versions of several GenAI LLMs available, including Mistral, Gemma and Llama. The uptick in interest in SLMs comes from an increasing recognition that LLMs can’t solve every problem that organizations might use GenAI for – and that sometimes more specific tasks require more specific solutions.

There is also a growing need for more lightweight models that can be used in a variety of situations and on a wider range of devices.

But in truth there’s no official border where an LLM becomes an SLM. Instead of thinking of these models as a binary, organizations need to look beyond strict definitions and make sure they are aligning the right model type – whether large or small – with the specific task they are facing.

Small or large?

Leaving binary distinctions behind, we can instead focus on some specific types of models that may work particularly well in certain situations.

For example, a very small model trained on an extremely specific dataset can be useful in situations where high levels of precision are needed. R&D and evidence-based industries, such as life sciences, chemistry or engineering, are particularly suited to such models because of their many complex and discrete workflows.

Consider a pharma company working on a precision medicine project: a model could be trained on a dataset covering only the relevant disease and patient population, then used to generate ideas for potential new drug candidates.
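
To make this concrete, here is a minimal sketch of that kind of domain-specific training run, assuming the Hugging Face transformers and datasets libraries. The checkpoint name, the disease_corpus.txt file and the hyperparameters are illustrative placeholders, not details from the article or a recommended recipe.

```python
# Minimal sketch: adapting a small open model to a narrow domain corpus.
# All names and hyperparameters below are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "meta-llama/Llama-3.2-1B"  # any small open checkpoint would do

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Hypothetical domain corpus: one document per line of plain text.
dataset = load_dataset("text", data_files={"train": "disease_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="slm-precision-medicine",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        learning_rate=2e-5,
    ),
    train_dataset=tokenized,
    # mlm=False gives standard causal language-model labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

In practice, teams often use parameter-efficient methods such as LoRA rather than full fine-tuning, but the overall shape of the workflow is the same: a small base model plus a tightly scoped corpus.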

Alternatively, an organization might want something with a larger number of parameters that is also lightweight in terms of resource use, and can therefore be used on mobile devices or in conjunction with edge computing.

Such a model could be installed on tablets used by doctors in hospitals, giving them access to a GenAI tool that could advise on treatment options while they are visiting patients, or on mobile devices used by engineers working in the field.

Security might also be a factor. By limiting a language model to a specific internal dataset with no connection to cloud systems, organizations can be more confident in the privacy of their inputs. Healthcare is again a strong use case here – instead of sending sensitive patient data over a network, a small model could run locally to ensure data stays within the system.
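
As a minimal illustration of that local-only pattern, the sketch below loads a small checkpoint from a local directory and generates text entirely on the device, so nothing is sent over a network. The directory path and the prompt are hypothetical; local_files_only=True simply prevents any attempt to download files.

```python
# Minimal sketch: running a small model entirely on-device so sensitive
# text never leaves the machine. Paths and the prompt are hypothetical.
from transformers import AutoModelForCausalLM, AutoTokenizer

LOCAL_MODEL_DIR = "./models/small-clinical-model"  # pre-downloaded checkpoint

# local_files_only=True refuses to reach out to any remote hub.
tokenizer = AutoTokenizer.from_pretrained(LOCAL_MODEL_DIR, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(LOCAL_MODEL_DIR, local_files_only=True)

prompt = "Summarise the key risk factors noted in this patient record: ..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```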

Looking to larger models, many organizations have found it beneficial to have an enterprise-wide model with a high parameter count that allows employees to question and search all the data the enterprise holds via a conversational, chatbot-style GenAI interface.

Likewise, larger models can be more useful when implementing AI agents, since the more data an agent has access to, the greater its contextual understanding and learning capabilities – which in turn gives it more scope to act autonomously.

What to ask when choosing a model

With all this in mind, it’s clear that instead of asking “Do I need an SLM or LLM?”, organizations should look at a more nuanced set of considerations, namely:

  • What datasets they have in house
  • What datasets will need to be acquired externally
  • What time/resources they have
  • What domain expertise they have access to
  • Which AI models they are already using
  • What hardware they will need the AI to run on
  • How broadly applicable the outcomes need to be
  • What specific security issues they will need to be aware of

Although ‘SLM’ is a hyped-up term at the moment, it’s likely that in the long term we’ll move away from a hard SLM/LLM distinction towards thinking of GenAI as a continuum. At one end is an AI that works on a small, extremely specific dataset. At the other end is something like ChatGPT, with trillions of parameters and access to a wide corpus of human knowledge.

For that reason, it’s unlikely that SLMs will come to replace LLMs entirely, or vice versa. For many organizations, the most useful model for their specific tasks will actually lie between those two extremes – which is why it’s exciting to see so many new options emerging with every passing month.
