Run it Fast Locally

Getting all of this running so that it scales and is easy to manage is a challenge. Our systems kernel combines the latest advances in web-scale distributed computing with the special needs of generative AI models. We transparently run models across PCs, phones, department servers, corporate machines, and the cloud, giving you the best tradeoff between cost and performance.

The technology has matured significantly in the last year. The critical piece is the Small Agent Framework for the Enterprise (SAFE™ AI). Instead of one large, uncontrolled model, the heart of the new approach takes small large language models (sLLMs), programs each one individually, and puts them into a framework that checks the input and output of each. Such a system is controllable and easy to deploy. Here is how it works:

Small means really small: we work with models that are 100-1000x smaller than things like ChatGPT. Small models (under 8B parameters) have some terrific advantages. They run fast because the number of computations is significantly reduced. They require less memory, so they fit on smaller machines. All of this combines into models that run nearly instantly on PCs, phones, or inexpensive commodity servers. This reduces cost and enables a whole generation of open-source models that become your proprietary models and cannot leak data out of your enterprise.
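As a rough illustration of why small models fit on commodity hardware, here is a back-of-the-envelope estimate of weight memory. The parameter counts and byte widths below are illustrative assumptions, not measurements of any specific model:

```python
# Back-of-the-envelope memory footprint for model weights.
# Assumptions (illustrative): only the weights are counted, at the given
# precision; activations, KV cache, and runtime overhead are ignored.

def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight memory in gigabytes."""
    return params_billion * 1e9 * bytes_per_param / 1e9

# A 7B-parameter sLLM quantized to 4 bits (0.5 bytes per parameter):
small = weight_memory_gb(7, 0.5)    # ~3.5 GB: fits on a laptop or phone
# A hypothetical 700B-parameter model at 16-bit precision:
large = weight_memory_gb(700, 2.0)  # ~1400 GB: needs a GPU cluster

print(f"7B @ 4-bit:  ~{small:.1f} GB")
print(f"700B @ fp16: ~{large:.0f} GB")
```

The same arithmetic explains the speed advantage: a forward pass touches every weight, so a model 100x smaller does roughly 100x less work per token.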

Small models know less than bigger ones, so we solve that problem by deploying many agents working together. Each agent is specialized and highly tuned, which makes it harder for them to hallucinate and easier for us to control them. By deploying a whole series of agents, we get the same power as a large model, but with the ability to tune and control each one.

While most systems require you to write a custom program to wire these agents into a workflow, we use the latest frameworks to execute dozens, if not hundreds, of agents in a coordinated way.

And we do this without generating more hard-to-maintain code, but by creating static manifests that are easy to read, maintain, and deploy.
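To make the idea concrete, here is a minimal sketch of a declarative manifest driving a checked agent pipeline. The schema, agent names, and check functions are purely illustrative assumptions, not the actual SAFE AI manifest format:

```python
# A declarative manifest: plain data, no control flow, easy to read and diff.
MANIFEST = [
    {"agent": "extractor",  "input_check": "non_empty", "output_check": "non_empty"},
    {"agent": "summarizer", "input_check": "non_empty", "output_check": "max_500_words"},
]

# Validation rules applied to the input and output of every agent.
CHECKS = {
    "non_empty": lambda text: bool(text.strip()),
    "max_500_words": lambda text: len(text.split()) <= 500,
}

# Placeholder agents; in a real system each would call a small local model.
AGENTS = {
    "extractor": lambda text: text.upper(),
    "summarizer": lambda text: text[:100],
}

def run_pipeline(manifest, text):
    """Run agents in manifest order, validating the input and output of each."""
    for step in manifest:
        if not CHECKS[step["input_check"]](text):
            raise ValueError(f"input check failed before {step['agent']}")
        text = AGENTS[step["agent"]](text)
        if not CHECKS[step["output_check"]](text):
            raise ValueError(f"output check failed after {step['agent']}")
    return text

print(run_pipeline(MANIFEST, "quarterly revenue grew 12 percent"))
```

Because the manifest is static data rather than code, adding or reordering agents is an edit to a list, not a change to a program.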

Most important, we use all these components to provide enterprise features. We have special agents that do nothing but explain the methodology behind a recommendation. We use other agents as “skeptics”: they analyze a memo and tell you all the reasons it might be inaccurate, so you can fix those weaknesses and create an even more air-tight argument. Finally, we have logging agents that record events and report what is happening in the system.
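A toy sketch of the skeptic and logging roles described above. The critique heuristics and log format are illustrative assumptions; a real skeptic agent would be a tuned sLLM, not a pair of rules:

```python
import json
import time

def skeptic(memo: str) -> list:
    """Return reasons a memo might be inaccurate (toy heuristics)."""
    concerns = []
    if "always" in memo or "never" in memo:
        concerns.append("absolute claim with no cited evidence")
    if not any(ch.isdigit() for ch in memo):
        concerns.append("no quantitative support")
    return concerns

def logged(agent, name):
    """Wrap any agent so every call is recorded as a structured event."""
    def wrapper(payload):
        result = agent(payload)
        print(json.dumps({"ts": time.time(), "agent": name,
                          "n_findings": len(result)}))
        return result
    return wrapper

memo = "Our product always outperforms the competition."
print(logged(skeptic, "skeptic")(memo))
```

The logging wrapper pattern means observability is added around agents, not written into each one.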