Trust is Triangulation: Cracking the Code of Trust in LLMs
Today’s LLM Architectures
Without going too far into philosophy, I assert that the things we trust are mostly grounded in triangulation. We seek confirmation from multiple sources and judge the result more or less trustworthy based on the relative agreement or disagreement among those sources. One can quickly go down a rabbit hole here: how trustworthy are the sources? What if they only sort of agree? These are real challenges, but the general idea still applies.
Trust in LLMs today is quite limited, and with good reason. While these systems have enormous range and an astonishing ability to both interpret and generate plausible human language, they are also capable of “hallucinations”: language that contains significant factual errors. As a result, when we need a trustworthy system, we rule out the use of LLMs. For driving a car, making a medical diagnosis, or any other application where the output must be “trustworthy”, LLMs are not (yet) an option.
Creating trustworthy systems using LLMs requires adding some sort of validation that the LLM output is comparable to what we would get from a system we already trust, whether another computer system or a human one. For example, home security systems often combine a computer system with in-home sensors and remote monitoring by humans. LLM makers might achieve this validation through modifications or additions to their own products, or LLMs could be made part of larger systems whose other components lead to improved trustworthiness.
For example, LLM maker Anthropic has been publishing research on Constitutional AI. A gross oversimplification of this approach is that one AI is used to check another: triangulation. Anthropic also recently published a paper attempting to better understand how LLM output is generated, which might eventually make it possible to modify LLMs’ internal models to produce more trustworthy output.
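To make the “one AI checks another” idea concrete, here is a minimal sketch. The llm_complete helper, the model names, and the AGREE convention are hypothetical placeholders I am introducing for illustration; this is not Anthropic’s actual method or any particular vendor’s API.

```python
# Hypothetical sketch: one model generates, a second model critiques.
# llm_complete() and the model names are placeholders for whatever
# LLM client library an actual deployment would use.

def llm_complete(model: str, prompt: str) -> str:
    """Placeholder for a call to an LLM provider's completion API."""
    raise NotImplementedError("wire this up to a real client library")

def generate_with_critic(question: str) -> dict:
    # First model produces a candidate answer.
    answer = llm_complete("generator-model", question)

    # Second model is asked only to critique, not to answer.
    critique = llm_complete(
        "critic-model",
        "Review the following answer for factual errors or unsupported claims.\n"
        f"Question: {question}\nAnswer: {answer}\n"
        "Reply with the single word AGREE, or list the specific problems.",
    )

    # Triangulation: release the answer only when the critic concurs.
    trusted = critique.strip().upper().startswith("AGREE")
    return {"answer": answer, "critique": critique, "trusted": trusted}
```

Even in this toy form, the open question is visible: the critic is just another probabilistic model, so agreement between the two is evidence, not proof, of correctness.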
These and other efforts are laudable, but the results are not in, and it’s not clear that they will solve the problem of trust in LLM output. LLM output is probabilistic. The goal embodied in the software algorithms of LLMs is to produce plausible language output, both syntactically and semantically. It is remarkable how far they have come in such a short time, but correctness of output is not an LLM goal. In fact, LLMs don’t understand what correctness is. They don’t understand what trust is. LLMs don’t have goals of this nature because they can’t have goals of this nature. This is a critical concept for everyone to understand: LLMs can’t have trustworthiness as a goal because their construction doesn’t contain that level of semantics. This characteristic makes it hard to determine whether adding multiple LLMs to check on each other will ever produce a truly trustworthy system.
LLMs don’t contain any actual model of the world “out there”. LLM output is determined by a vast set of dials that rate how likely humans are to find that output plausible. They do more than just string together syntactically correct phrases and sentences; LLMs have a notion of context, of which words make sense when used together in a certain order. However, this makes it more likely that LLMs will confuse us, because the prose they produce is confident and convincing even when it consists of entirely fabricated content.
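As a toy illustration of that probabilistic machinery, the example below samples a next word from a made-up distribution. The candidate words and scores are invented for illustration; the point is that nothing in the procedure checks which continuation is true, only which is likely.

```python
# Toy illustration: a language model scores candidate next tokens and then
# samples from the resulting distribution. The candidates and scores below
# are made up; nothing here represents or checks facts about the world.

import math
import random

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for the word following "The capital of Australia is".
candidates = ["Canberra", "Sydney", "Melbourne"]
logits = [2.1, 1.8, 0.4]

probs = softmax(logits)
next_word = random.choices(candidates, weights=probs, k=1)[0]

print({c: round(p, 2) for c, p in zip(candidates, probs)}, "->", next_word)
```

A plausible but wrong continuation can be chosen simply because its dial is turned up high enough, which is all a hallucination is at this level.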
Current triangulation architectures
In my last article, I talked about Retrieval-Augmented Generation (RAG), the most popular current architecture for deploying LLMs. RAG is an early attempt at a type of triangulation. The LLM is augmented with a vector database that supplies additional context to the LLM. The critical point is that the vector database contains information deemed trustworthy and far more relevant to the task at hand. For instance, the vector database for an auto repair shop might contain information from the repair manuals of all the cars the shop services.
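The sketch below shows this basic RAG flow. The vector_store.search(...) and llm_complete(...) calls are hypothetical stand-ins for whatever embedding database and LLM client a real deployment would use, and the prompt wording is illustrative.

```python
# Minimal RAG sketch. `vector_store` and `llm_complete` are hypothetical
# stand-ins for a real embedding database and LLM client.

def retrieve_context(vector_store, question, k=3):
    """Return the k stored passages most similar to the question,
    e.g. excerpts from the repair manuals in the auto-shop example."""
    return vector_store.search(question, top_k=k)

def rag_answer(vector_store, llm_complete, question):
    passages = retrieve_context(vector_store, question, k=3)
    prompt = (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        "Context:\n" + "\n---\n".join(passages) +
        f"\n\nQuestion: {question}\nAnswer:"
    )
    # The retrieved context steers the model toward grounded answers,
    # but the generated output itself is still probabilistic and is
    # not validated anywhere in this flow.
    return llm_complete(prompt)
```

Notice that nothing after the final call checks the answer against the retrieved passages; that gap is exactly the limitation discussed next.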
Early uses of RAG have shown it to be very effective. RAG attempts to create more trust in the system’s output by adding context to its input. Modifying the input in this way steers the LLM toward more trustworthy output. However, the LLM output still hasn’t been validated, and the fundamental problem of probabilistic output remains. I am skeptical of claims that the use of RAG eliminates hallucinations. To be trustworthy, the LLM output must be validated.
“Human in the loop” is the phrase used to describe another current form of triangulation: a human evaluates and, as needed, modifies the LLM output. This is the ultimate form of triangulation. However, “human in the loop” has issues of its own. Humans are not perfect either, and mistakes can still result, though generally very different types of mistakes. In fact, extra diligence is required when humans evaluate LLM output, because that output is so convincing that it is nearly impossible not to treat the LLM as if it were human (read Thinking, Fast and Slow for more on this type of problem). As a result, we give LLM output more credence than is desirable for validation purposes.
Systems with humans in the loop are also much harder to scale to deal rapidly with large volumes of LLM output, so both vendors and consumers are motivated to find solutions that rely on less human involvement. Today, however, “human in the loop” is still the gold standard for building trustworthy LLM systems.
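A common way to wire a human into the loop is a simple review gate, sketched below. It assumes each LLM output arrives with some confidence or risk score attached; how that score is produced, the threshold value, and the function names are all assumptions made for illustration.

```python
# Hypothetical review gate: confident outputs pass through automatically,
# everything else waits for a person. The scoring mechanism and threshold
# are illustrative assumptions, not part of any specific product.

REVIEW_THRESHOLD = 0.8  # below this, a human must look at the output

review_queue = []

def route_output(output, score):
    """Release high-confidence output; queue the rest for human review."""
    if score >= REVIEW_THRESHOLD:
        return output              # released without human inspection
    review_queue.append(output)    # held until a reviewer approves or edits it
    return None

def drain_reviews(approve):
    """`approve` is a person's judgment on each queued output.
    Scaling this human step is the bottleneck described above."""
    released = [o for o in review_queue if approve(o)]
    review_queue.clear()
    return released
```

The design choice is plain: the threshold trades human workload against how much unvalidated output reaches users, which is why the pressure to reduce human involvement never goes away.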
Next up: Future triangulation
In my next article, I will discuss reasoning systems that do have a model of the world “out there”. When used for triangulation, these systems can potentially provide validation approaching that of a “human in the loop”.