Small AI, Big Impact: Boosting Productivity with Small Language Models


Krystian Bergman

Updated Apr 29, 2024 • 11 min read

In 2024, industry giants IBM and Microsoft have identified small language models as one of the most significant trends in artificial intelligence.

Valued for their focus on operational efficiency, these models reduce energy consumption and increase cost-effectiveness. Such advantages allow for innovation with fewer resources compared to large language models.

Their capacity to perform billions or even trillions of operations per second across billions of parameters lets small language models provide genuinely useful help for a wide range of human needs.

In the last few years, we've already seen what large language models can do. One of the most illustrative demos I've witnessed is Google Duplex, where AI schedules a telephone appointment in a human-like manner. This is possible thanks to the combination of speech recognition, natural language understanding, and text-to-speech.

The difficulty is that immense capability comes with immense power draw, computational cost, and latency. An LLM's storage requirements exceed what a portable device can carry, so the service depends on network connectivity. A more compact alternative is the SLM, or small language model, and these are already on the way.

What is a small language model?

If you've ever utilized Copilot to tackle intricate queries, you've witnessed the prowess of large language models. These models demand substantial computing resources to operate efficiently, making the emergence of small language models a significant breakthrough.

SLMs, though still sizable with several billion parameters, stand in stark contrast to the hundreds of billions found in LLMs. They possess a crucial advantage: they are compact enough to run seamlessly on a smartphone, even offline. Parameters serve as the adjustable elements within a model, dictating its behavior and functionality.

“Small language models can make AI more accessible due to their size and affordability,” says Sebastien Bubeck, who leads the Machine Learning Foundations group at Microsoft Research. “At the same time, we’re discovering new ways to make them as powerful as large language models.”

Microsoft researchers have developed two small language models, named Phi and Orca, which demonstrate comparable or superior performance to large language models in specific domains. All the while, they require less computational power. This disproves the common belief that bigger models are always more effective.

In contrast to LLMs, which are trained on encyclopedic datasets sourced from the internet, these smaller language models leverage curated and very high-quality training data. This method has revealed new ways to balance model size with performance. Looking at the market, I expect to see new, improved models this year that will speed up research and innovation.

Difference between LLMs and SLMs

The distinction between SLMs and LLMs comes down to scale: SLMs are scaled-down versions of their larger counterparts. While LLMs like GPT-4 boast hundreds of billions of parameters, SLMs operate with only a few billion. They are optimized to handle less complex tasks efficiently, without requiring extensive computational resources.

However, despite their simplicity, SLMs remain highly practical. They can undertake tasks such as text generation, question answering, and language translation, though they may have lower accuracy and versatility compared to larger models.

Despite these limitations, small language models are easy to use, quick to train, and adaptable for many uses, from chatbots to language learning tools. Their smaller size also makes it possible to use them on devices with limited resources, such as IoT and mobile devices.

What are the benefits of SLMs?

The benefits of small language models over large language models are numerous. One of the most significant advantages lies in their streamlined design, characterized by fewer parameters and reduced requirements for training.

While LLMs may demand hours, if not days, of training, SLMs can be ready for deployment in minutes to a few hours. This efficiency not only accelerates implementation but also makes SLMs feasible for on-site or smaller-device applications.
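To make that concrete, here's a minimal sketch of running a small model entirely on local hardware with the Hugging Face transformers library. The checkpoint (microsoft/phi-2, roughly 2.7 billion parameters) is just one plausible choice, not a requirement; any model that fits your hardware would do:

```python
# A minimal sketch of running a small language model fully on local
# hardware, with no server round trip. "microsoft/phi-2" is an
# illustrative choice; recent transformers releases support it natively.
from transformers import pipeline

generator = pipeline("text-generation", model="microsoft/phi-2")

result = generator(
    "Small language models are useful because",
    max_new_tokens=40,
)
print(result[0]["generated_text"])
```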

Indeed, the flexibility of SLMs allows for customization to cater to specific and niche applications. Unlike LLMs, which often require extensive datasets, SLMs can excel in scenarios where training data is limited.

This adaptability makes them particularly appealing for companies seeking language models optimized for specialized domains or industries, where precision is needed.
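To give a rough idea of what that customization looks like in practice, the sketch below fine-tunes a small causal language model on a handful of domain snippets using the Hugging Face Trainer API. The checkpoint and the training texts are illustrative placeholders only:

```python
# A minimal fine-tuning sketch for a small causal LM on limited,
# domain-specific data. Model and example texts are hypothetical
# placeholders, not recommendations.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "distilgpt2"  # small illustrative checkpoint (~82M parameters)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # distilgpt2 has no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# A real project would load curated in-house documents here.
texts = [
    "Policy 12.3: claims above 10,000 EUR require a second reviewer.",
    "Policy 4.1: customer data is retained for 24 months at most.",
]
dataset = Dataset.from_dict({"text": texts})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="slm-domain", num_train_epochs=3,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    # mlm=False makes the collator build causal-LM labels from input_ids
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```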

SLMs can also improve data security, addressing growing concerns about data privacy and protection. Their smaller codebases and reduced parameter counts present a smaller attack surface. This helps businesses keep tighter control over their AI systems, protect sensitive information, and strengthen their overall cybersecurity.

From a financial perspective, SLMs offer a compelling, cost-effective solution for organizations looking to leverage AI capabilities.

By minimizing infrastructure costs and resource demands, SLMs help businesses manage expenses while adhering to resource constraints. This affordability enhances the appeal of SLMs as a practical and sustainable choice for integrating AI technologies into business operations.

Uses of small language models

Small language models are versatile and efficient, and their lightweight design lets them work well across different platforms and applications. They can be integrated in a variety of contexts. For example:

Mobile applications

Small language models boost mobile apps by requiring less memory and processing power, making them ideal for smartphones. They enable offline functionalities such as nearly-human chatbots, language translation, text generation, and text summarization. Additionally, they reduce costs by cutting cloud reliance and enhance user experience with faster, on-device processing.

Web browsers

Within web applications, SLMs can enhance the user experience with language-related functions such as auto-completion while typing, grammar correction, and sentiment analysis, which can spot emotionally loaded terms and suggest alternatives. For example, instead of a blunt “Just do it,” the tool could suggest a softer phrasing such as “Maybe you would consider…”
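As a rough illustration of the sentiment-analysis piece (shown in Python for brevity, though an in-browser integration would more likely run on a JavaScript runtime), a compact classifier can flag emotionally loaded phrasing:

```python
# A minimal sketch of flagging emotionally loaded phrasing with a small
# sentiment classifier. The checkpoint is a compact distilbert model,
# chosen here purely for illustration.
from transformers import pipeline

sentiment = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

for phrase in ["Just do it.", "Maybe you would consider trying it?"]:
    label = sentiment(phrase)[0]
    print(phrase, "->", label["label"], round(label["score"], 2))
```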

IoT devices

In IoT devices, small language models enable functions like voice recognition, natural language processing, and personalized assistance without heavy reliance on cloud services, which benefits both performance and privacy. Today, Alexa and other home devices have to consult remote servers just to turn your smart lights or other IoT devices on and off. That interaction should be entirely local, with no outside round trip, and with SLMs it can be.
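As a sketch of what fully local command handling might look like on a smart-home hub, the example below maps an already-transcribed utterance to an intent with a small zero-shot classifier. The checkpoint and intent labels are illustrative assumptions:

```python
# A minimal sketch of on-device intent detection for a smart-home hub,
# with no cloud round trip. The checkpoint is a small NLI model used
# here purely for illustration.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="typeform/distilbert-base-uncased-mnli",
)

command = "it's way too dark in the living room"
intents = ["turn lights on", "turn lights off",
           "set thermostat", "play music"]

result = classifier(command, candidate_labels=intents)
print(result["labels"][0])  # highest-scoring intent, e.g. "turn lights on"
```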

Edge computing

Small language models shine in edge computing environments, where data processing occurs virtually at the data source. Deployed on edge devices such as routers, gateways, or edge servers, they can execute language-related tasks in real time. This setup lowers delay and reduces reliance on central servers, improving cost-efficiency and responsiveness.

What are real-world examples of small language models?

LLaMA – a breakthrough in mobile text generation

At Netguru, we delivered a proof-of-concept (POC) application that integrates the LLaMA model with Apple's Transformers architecture, allowing us to deploy this advanced machine learning model on iPhone devices. This accomplishment was the result of extensive feasibility research, prototyping, and rigorous performance testing conducted on iPhone hardware.

The application offers users a unique experience by allowing them to input text and receive relevant information. It supports various uses like summarizing documents, assisting with writing, and generating creative content.

A standout instance of this application involves generating comprehensive content from minimal text inputs. If I were to provide a brief description of my favorite restaurant, the application would generate an in-depth review covering factors such as the ambiance of the venue, the quality of the food, and even a summary of the overall experience. This functionality has the potential to change how users access and interact with information, streamlining the process.

With our society's notable decrease in attention span, summarizing lengthy documents can be extremely useful. The model can also aid in essay composition and even create humorous content. Its ability to accelerate text generation while maintaining simplicity is especially beneficial for users who need quick summaries or creative content on the go.

Another striking feature of the application is its text prediction capability—you've probably experienced that with your smartphone trying to help you out by supplying a likely "next word." Beyond that simple task, users can now input a sentence, prompting the application to generate further information or complete thoughts.

This assists in the writing process, stimulates creative ideas, and facilitates any task that requires more text based on the initial input. Arguably, the machine itself is not creative; it can only repeat what it has been trained to recognize as "creative." But that output can still stimulate human creativity by presenting a new way to look at a subject.
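Under the hood, that "next word" feature boils down to the model scoring candidate tokens. A minimal sketch, using distilgpt2 purely as an illustrative small checkpoint:

```python
# A minimal sketch of next-word suggestion: score every candidate token
# and surface the top few, much as a keyboard app might.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

text = "I'll meet you at the"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # scores for the next token

top_ids = torch.topk(logits, k=5).indices
print([tokenizer.decode(int(i)).strip() for i in top_ids])
```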

Phi-1 – the compact transformer elevating Python coding

The language model phi-1 stands out as a specialized transformer with 1.3 billion parameters, designed for fundamental Python coding tasks. Despite its small size compared to its contemporaries, phi-1 offers remarkable performance. Its training drew on varied data sources, including filtered subsets of Python code, code from programming competitions, and synthetic Python "textbooks" and exercises created by GPT-3.5.

Over a span of four days, phi-1 was trained on eight A100 GPUs, using a combination of "textbook quality" data from the web (6B tokens) and the aforementioned textbooks and exercises synthetically generated with GPT-3.5 (1B tokens).

Despite its modest scale in comparison to contemporary LLMs, phi-1 has showcased an impressive accuracy rate, surpassing 50% on the simple Python coding benchmark, HumanEval, and nearly 56% on MBPP.
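The phi-1 weights were later published on the Hugging Face Hub as microsoft/phi-1, so a quick experiment looks roughly like this, assuming a transformers release recent enough to support the Phi architecture:

```python
# A minimal sketch of prompting phi-1 for a Python completion. Assumes
# a recent transformers release with native Phi support.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1")

prompt = (
    "def is_palindrome(s: str) -> bool:\n"
    '    """Return True if s reads the same forwards and backwards."""\n'
)
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```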

The phi-1 family also includes two notable variants. phi-1-base is the same model before fine-tuning on a dataset of coding exercises, and phi-1-small is an even smaller model with 350 million parameters, trained with the same pipeline as phi-1, which still achieved a commendable 45% accuracy on HumanEval.

The future of small language models

Looking ahead, the future of small language models (SLMs) appears promising. With ongoing advancements in training techniques and architectural enhancements, small language models are set to greatly improve in capabilities. These enhancements will equip SLMs to handle tasks traditionally performed by larger models.

As their functionality increases, SLMs are expected to become more integrated into everyday technology, such as in synthetic personal assistants and more intuitive device interfaces.

Ultimately, the future will put privacy first: instead of sending all your data to an AI model provider, models will run entirely on-device, locally, rather than consuming large amounts of server-side computing power.

Finally, we can expect efficiency through specialization: models tuned to a specific use case deliver answers with higher determinism, because you'll know what to expect from them.
