One of the most remarkable and challenging things about the Artificial Intelligence (AI) space is the pace of change. If you read newsletters or follow the news around emerging AI trends, each morning you can discover a new company or a new toolset that could change the way you think and work.
If you provide tools for your team to explore AI and Machine Learning (ML), you need to plan for frequent change. Ideally, you would build that flexibility right into the fundamental layers of the solution. Unfortunately, most organizations explore AI through prototypes or early-adoption experiments that are not built for change.
Properly preparing for change means having a process in place. As new language models are developed, or new toolsets become available, they should be assessed by a knowledgeable team and, where they provide value, integrated into your ecosystem. If you set yourself up for success when you plan to integrate AI into your business, you'll be ready to take advantage of the latest tools rather than being stuck with older versions. Your team will need a framework or playbook to evaluate new tools as they emerge.
HOW TO BRING THE POWER OF ARTIFICIAL INTELLIGENCE INTO YOUR BUSINESS
The pace of change, and the risk that adopting a new tool breaks your security or privacy, is a significant enough challenge that many businesses are not adopting AI at all. While there are risks and challenges, it seems clear that, just as the computer revolutionized business in the past, AI will one day be a staple in business. You and your team will eventually need to integrate AI into how you work, and the sooner you do, the more competitive you will be. The companies that grapple with this now and figure out a safe and predictable way to integrate AI will be better positioned than those who do not.
IT departments and leadership face many challenges in rolling AI out across their organization: ensuring data is kept safe and secure in these platforms, ensuring the AI service is stable and will still be around a year from now, and communicating to their internal teams how to use these tools and which new features are available (and it can seem like there are new features every day).
One of the challenges you'll face is educating and orienting your team on how to use new AI tools, and especially keeping track of any changes, new features, or areas to avoid or treat with care. A great approach for controlling access and educating your team is to integrate these tools right into specific channels or areas of your chat tools.
Whether it's Teams or Slack, if you use an API to integrate tools into your workflow, you can monitor every request sent by your team and every response provided by the Small Language Model (SLM), provided you have an intermediary like LangSmith or Galileo in your ecosystem (more on that later). A chat your whole team can access is a great place to have meta-conversations around the results. For example, suppose you're building a sales strategy using data from past campaigns to identify where to focus your advertising dollars in an emerging market. If the SLM suggests a strategy that seems perfect, or not quite what you'd expect, everyone from a new hire to an executive can see the exchange and offer their opinion on the response. You also benefit when an expert SLM user makes requests that teach the rest of your team how to craft a really successful prompt. You can flag newly rolled-out features, or address performance gaps, right inline with the chat conversation. If you integrate an SLM into your chat, the conversation around the agent's responses becomes a way to train, educate, discuss, and dialogue with the information that comes back.
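As a rough illustration, here is a minimal sketch of that kind of integration in Python, assuming Slack's Bolt SDK and a self-hosted SLM that exposes an OpenAI-compatible chat endpoint. The endpoint URL, model name, and logging approach are placeholders for whatever your IT team actually stands up, not a specific product's API:

```python
import os
import requests
from slack_bolt import App

# Sketch only: assumes a Slack app built with the Bolt SDK and a self-hosted
# SLM exposing an OpenAI-compatible /v1/chat/completions endpoint.
app = App(
    token=os.environ["SLACK_BOT_TOKEN"],
    signing_secret=os.environ["SLACK_SIGNING_SECRET"],
)

SLM_URL = "http://internal-slm.example.local/v1/chat/completions"  # hypothetical internal host


@app.event("app_mention")
def handle_mention(event, say, logger):
    prompt = event["text"]
    resp = requests.post(
        SLM_URL,
        json={
            "model": "llama-3.2-1b-instruct",  # whichever SLM you host internally
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=30,
    )
    answer = resp.json()["choices"][0]["message"]["content"]

    # Log both sides of the exchange so an intermediary (LangSmith, Galileo,
    # or your own store) can review every request and response later.
    logger.info("slm_request=%s slm_response=%s", prompt, answer)

    # Reply in the channel so the whole team can see and discuss the answer.
    say(answer)


if __name__ == "__main__":
    app.start(port=3000)
```

The same pattern applies in Teams through its bot framework; the key point is that every prompt and answer passes through one place where it can be logged, reviewed, and discussed by the team.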
HIGH-LEVEL DIAGRAM

This diagram gives a high-level overview of the various components and layers described in this article. The services can all be run on infrastructure you control, through Azure or Amazon Web Services, so that you keep full control of data and requests and can roll out proper governance and monitoring around the SLM.
CHAT INTERFACE
Your team is already familiar with chat interfaces, and chat is a great way to take a wide range of requests and respond in real time. In addition, if you add an AI assistant to your chat, it's easy for you or your team to prompt the AI to identify solutions. Instead of running a training session on how to use the tool, you can simply prompt it, or trigger it, to show rather than tell how it's used. You can also observe the AI tool's responses in real time and evaluate whether it's performing as expected or needs a slight correction or nudge in the right direction. You won't need to get everyone together at the same time: most chat conversations are asynchronous, so when someone has a moment they can catch up on the latest updates or read through how someone on the team is using the newest features.
GOVERNANCE AND DATA SECURITY
One of the core challenges with using a publicly available LLM is that you may need to upload sensitive data to get the type of answers you want, which most CIOs and CTOs would recommend against. It's too early to know how well sensitive data is protected in the OpenAI and Microsoft ecosystems. With financial data, there are standards and audits to validate that data is kept safe. For the emerging LLM platforms, there is an ISO standard, but not yet enough audit history to provide that assurance.
One way to mitigate this data security and privacy risk is to set up an internal server, or spin up a virtual server in the cloud that you control, and run an open-source SLM that you can keep locked down to the standards of your IT department. This allows your internal team to audit and track the data and ensure it stays within your control.
The other key advantage of running a server under your control is data segmentation. You likely don't want the SLM that onboards your new hires to also have access to personal Human Resources data, or to executive-only financial and corporate strategy information. You may want your HR team to be able to quickly pull a report on when each staff member last received a raise and do a quick calculation to ensure their compensation is still fair for the market. However, you would not want that same source of information available to your new interns. Building proper controls around data access is something most publicly available LLMs don't provide, although Anthropic is getting closer with its new Enterprise solution. Even there, however, there are still no audits showing how it works or validating that there is no data leakage across staff or partners.
PERMISSIONS
Using chat, you can also keep the right information in the right channels. You don't need to build an additional layer or system to manage access; by keeping SLMs with data access in specific chat threads or groups, you can control access through your chat tool's permissions. If you have a Teams or Slack channel that is locked down to the leadership group, you can make a bot with access to sensitive data available only in that channel. Then you don't need to add a whole separate layer of permissions to the bot – it's already managed by the chat software. When you add someone new to the leadership team and they get access to the channel with sensitive data, it's all handled for you.
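As a sketch of how little extra machinery this requires, the routing can be a simple lookup from channel to model, with channel membership itself doing the access control. The channel IDs, model names, and data scopes below are illustrative placeholders:

```python
# Sketch: route each chat channel to the model allowed to serve it.
# Access to the channel itself is already governed by Teams/Slack permissions,
# so this layer only has to decide which model (and data scope) answers where.
CHANNEL_MODELS = {
    "C-LEADERSHIP": {"model": "finance-slm", "data_scope": "hr_and_financials"},
    "C-GENERAL":    {"model": "onboarding-slm", "data_scope": "public_docs"},
}

RESTRICTED_DEFAULT = {"model": "onboarding-slm", "data_scope": "public_docs"}


def resolve_model(channel_id: str) -> dict:
    # Unknown channels fall back to the most restricted model.
    return CHANNEL_MODELS.get(channel_id, RESTRICTED_DEFAULT)
```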
As a leadership team you may want to run scenarios of who you’d hire next if there is growth next quarter, or who you would trim from the team if profits decline, but you would likely not want to run this model in a public or general channel.
DATA ACCESS AND SEGMENTATION
Using Small Language Models reduces the complexity, cost, and compute required to run each model, so you can feasibly set up multiple models in your organization. You might set up one for financial modeling that uses an SLM well-tuned to financial transactions and data. You might use another SLM for executive strategy brainstorming, load it with data that should only be accessible to key executives, and lock down access to key roles within the company. You might also create a third, customer-facing SLM that helps troubleshoot common issues.
If you used a Large Language Model for each of these scenarios, you'd be wasting resources, as it would have far more horsepower than you need. It would be like driving a bus to get groceries – you don't need that much capacity. Recent tests show that a well-configured Small Language Model provides around 97% of the quality of a Large Language Model. You can get the benefit of AI without the cost and complexity of an LLM.
The examples above are a great illustration of why you would not want data to cross over from one group to another, or a language model to answer a query with information intended for a smaller, more secure audience. Small Language Models also allow you to dial in the responses for each audience. You might allow greater flexibility or creativity in the executive strategy brainstorming model, while tightly restricting and preventing hallucination in the customer-facing model. You could also tune hallucination down in the financial model, so any response that may include one is immediately labeled or removed.
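One way to picture this tuning is a small per-audience configuration, assuming your serving layer accepts a temperature setting and your monitoring layer reports a hallucination-risk score. The profile names, model names, and numbers below are illustrative, not taken from any specific product:

```python
# Sketch: per-audience settings. Higher temperature allows more creativity;
# max_hallucination_risk is the threshold your monitoring layer would enforce.
AUDIENCE_PROFILES = {
    "executive_brainstorm": {"temperature": 0.9, "max_hallucination_risk": 0.5},
    "financial_modeling":   {"temperature": 0.2, "max_hallucination_risk": 0.1},
    "customer_support":     {"temperature": 0.1, "max_hallucination_risk": 0.05},
}


def build_request(audience: str, prompt: str) -> dict:
    profile = AUDIENCE_PROFILES[audience]
    return {
        "model": f"{audience}-slm",  # one small model per audience (placeholder names)
        "temperature": profile["temperature"],
        "messages": [{"role": "user", "content": prompt}],
    }
```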
RESPONSE QUALITY
The infrastructure plan you put in place to support your language model rollout is an important consideration; it needs to match the sophistication of your organization. If you're a small business, you may want to rely on an existing LLM from Microsoft, OpenAI, or Perplexity. However, if you're a company with multiple departments or regions, then just as your software team has a developer operations (DevOps) role, you may need to consider an AIOps role, or additional training for your DevOps team to cover it. Assuming you have a common access point for all language model requests (company chat is a great medium for this), you can inject an intermediary such as LangSmith or Galileo to track every request and the quality of the response from the language model. These tools keep analytics on what your team is requesting and how 'confident' each response is, and they also include hallucination detection: if a model responds in a way that looks incorrect or made up, you can intercept that response and either remove it or label it as 'risky', or apply whatever strategy you see fit.
Running a tool between the requests and the model helps you understand what's working well in your ecosystem and where the model is returning results that are unhelpful or potentially problematic. These intermediaries provide a dashboard with insight into what your team is requesting and how the language model is responding. You can tune results, decide whether you need to block hallucinations, or identify whether a better model should be swapped in to improve results. Maybe you set up a marketing channel with a model that performs well in translation, but your marketing team is asking financial questions and you need to respond to this new use case. Perhaps you're taking an internal tool, opening it up to some public access, and want to adjust the controls so the public only gets safe responses. Using internal infrastructure with Small Language Models gives you a lot of flexibility and insight into how your models are working and where they need refinement.
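If you want to see the shape of what these intermediaries do before committing to one, a rough stand-in is a wrapper that records every request, response, and latency to a local log. LangSmith and Galileo add the dashboards, scoring, and hallucination detection on top of this kind of record; the function and file names here are placeholders:

```python
import json
import time

# Sketch of the bookkeeping an intermediary such as LangSmith or Galileo
# handles for you: capture every request, response, and latency so you can
# spot unhelpful or risky answers and decide whether to tune or swap the model.


def traced_completion(call_model, channel: str, prompt: str) -> str:
    started = time.time()
    answer = call_model(prompt)  # your existing SLM call
    record = {
        "channel": channel,
        "prompt": prompt,
        "response": answer,
        "latency_s": round(time.time() - started, 2),
    }
    # A local JSONL file stands in for the hosted dashboard these tools provide.
    with open("slm_trace_log.jsonl", "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return answer
```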
LANGSMITH / GALILEO
LangSmith and Galileo are not the same product, but they fill a similar role in the infrastructure.
These two tools sit as an intermediary between the language model and your customers. Just as Google Analytics tracks where a user goes on your web property, you want to understand what requests are being made of your AI assistant and how it is responding. These tools provide that tracking of requests and responses. You can see what requests are being made, along with a percentage-based score of how effective the answers are, so you can evaluate which data sources you're providing and how to improve the answers over time.
These two tools also provide hallucination detection and prevention. Properly implemented, you can roll out your AI assistant with confidence, knowing that you can detect low-quality answers and prevent them from reaching your customer – you're in control of the quality of response. Is 50% correct good enough? Does it have to be 80%+ correct before it's returned as a response? These are variables you can control when you're using these tools.
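Conceptually, that control is just a threshold check before an answer is released. The sketch below assumes an evaluator function (here a placeholder called score_answer) that returns a quality score between 0 and 1, which is the kind of signal these tools attach to each response:

```python
# Sketch: only return answers that clear a configurable confidence bar.
CONFIDENCE_THRESHOLD = 0.8  # e.g. require 80%+ before a customer sees the answer


def gated_response(prompt: str, answer: str, score_answer) -> str:
    # score_answer is a placeholder for the evaluation an intermediary provides.
    score = score_answer(prompt, answer)  # expected range: 0.0 to 1.0
    if score >= CONFIDENCE_THRESHOLD:
        return answer
    # Label or withhold risky answers instead of passing them through.
    return "[risky] This answer did not meet the confidence threshold and was withheld."
```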
OPEN-SOURCE SMALL LANGUAGE MODEL
The emerging Large Language Models are amazing resources, but recent tests have shown that Small Language Models can be almost as effective while providing answers faster and at lower cost. For example, Meta released Llama 70B and Llama 1B; the large version is 70 times the size of the 1B version, yet when the same requests were put to both models, the smaller version achieved 97% of the accuracy of the larger one. Depending on the type of information you're requesting and the controls you have in place (see the LangSmith / Galileo protection layer), you can greatly reduce your cost and increase the speed of response without losing much fidelity. Where you need precise answers, you can choose the larger model; where a pretty good answer is enough, set up a smaller model, which will increase your efficiency and reduce your costs. Only the queries that require high precision use the larger model.
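The routing decision itself can be very simple. The sketch below sends a few named high-precision task types to a large model and everything else to a small one; the task names and model identifiers are examples, not a prescription:

```python
# Sketch: send only high-precision work to the large (slower, costlier) model
# and let the small model handle everything else.
HIGH_PRECISION_TASKS = {"financial_report", "legal_review"}


def pick_model(task_type: str) -> str:
    if task_type in HIGH_PRECISION_TASKS:
        return "llama-3.1-70b-instruct"  # large model for precise answers
    return "llama-3.2-1b-instruct"       # small model: faster and far cheaper
```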
DATA SOURCES
Identifying the data sources you would like to use is a complicated process. You need to ensure your data sources are clean, well-structured and well-labeled. Then you need to identify which data sources should go to which audience. It may make sense to create more than one AI assistant.
For example, you may have one assistant focused on an internal audience of staff and partners, and another focused on customers. If it's a truly useful tool that lets your team do more with less, you may want to customize several AI assistants for different purposes, perhaps one that is only for executives and can search across salary, performance reviews, and other sensitive information that you would not want your customers or other staff members to access.
Identifying the data types, and scope of that data is likely an ongoing process that will be informed by the requests that come into the AI assistant and the quality of the responses.
If your staff are asking questions like "Who are our competitors and how can we outperform them?", you could add a competitor list and a SWOT analysis to the model's data sources to improve the responses and empower your team.
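In practice, responding to that gap can be as simple as registering a new document with the assistant. The sketch below just folds the files into the prompt as context; a production setup would more likely index them in a retrieval store, and the file names are placeholders:

```python
# Sketch: grow the assistant's context as new question themes appear.
DATA_SOURCES = {
    "competitors": "competitor_list.md",  # added after the question above revealed a gap
    "swot": "swot_analysis.md",
}


def build_prompt(question: str) -> str:
    context = []
    for name, path in DATA_SOURCES.items():
        with open(path, encoding="utf-8") as f:
            context.append(f"## {name}\n{f.read()}")
    # Prepend the reference material so the SLM can ground its answer in it.
    return "\n\n".join(context) + f"\n\nQuestion: {question}"
```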
SUMMARY
AI is an incredible tool, but just as we use different software for different jobs, Microsoft Word for writing documents and Microsoft Excel for planning and working with numbers and more structured information, I think we'll soon realize that we need more than one type of AI tool and will begin to customize our tools to fit our needs more precisely. Meta, the parent company of Facebook, is strongly aligned with the idea that many smaller, open-source language models are where the future is headed, which is why it releases its models as open source and produces a wide variety of sizes for the community. I'm curious to hear whether you're implementing AI in your business, and whether this strategy would help, or is one you're already using.
If you’d like to discuss how to integrate AI into your business, reach out for a consultation about building a strategy specific to your company.