When we ask whether an AI “knows” anything, we are not, strictly speaking, talking about memory or experience in the human sense. We are describing a complex mathematical process by which the AI predicts what comes next in a stretch of language. Seen this way, an AI is not a source of truth; it is a system that simulates understanding through patterns, probabilities, and memory architecture. This article traces how AI converts text into knowledge-like predictions, from tokens and embeddings to the hardware that carries out these operations.
From Words to Tokens
AI does not interpret language the way humans do. When it encounters the sentence “The moral of Snow White is to never eat …,” it first converts it into a sequence of tokens, the smallest units it can process. A token can be a whole word, part of a word, a punctuation mark, or a space. For example, the sentence above might be tokenized as:
[“The” | ” moral” | ” of” | ” Snow” | ” White” | ” is” | ” to” | ” never” | ” eat”]
This conversion is only the first step in a highly structured process that turns human language into something an AI can work with.
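The step above can be sketched in a few lines of code. This is a toy illustration only: real systems use learned subword vocabularies (such as byte-pair encoding), and the vocabulary and greedy matching here are invented for demonstration.

```python
# Toy tokenizer: maps text to integer token IDs. A real tokenizer
# uses a learned subword vocabulary (e.g. BPE); this tiny hand-made
# vocabulary only covers the example sentence.
vocab = {"The": 0, " moral": 1, " of": 2, " Snow": 3, " White": 4,
         " is": 5, " to": 6, " never": 7, " eat": 8}

def tokenize(text, vocab):
    """Greedily match the longest known token at each position."""
    tokens = []
    while text:
        match = max((t for t in vocab if text.startswith(t)),
                    key=len, default=None)
        if match is None:
            raise ValueError(f"no token matches {text!r}")
        tokens.append(vocab[match])
        text = text[len(match):]
    return tokens

print(tokenize("The moral of Snow White is to never eat", vocab))
# one integer ID per token piece
```

The key point is that the model never sees words as such, only these integer IDs.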
Embeddings: From Tokens to Numbers
After tokenization, each token is mapped to an embedding: an abstract numerical representation that captures statistical relationships between words. These embeddings live in a high-dimensional embedding space, a kind of map of word associations learned by analyzing vast volumes of text. Words that appear in similar contexts cluster together, not because the AI “understands” them in the human sense, but because the statistical patterns of language suggest they are related. For instance, “pirouette” and “arabesque” might cluster together, just as “apples” and “caramel” do. The AI does not comprehend these words in human terms; it simply recognizes patterns of co-occurrence.
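A minimal sketch of this idea, with made-up three-dimensional vectors (real embeddings have hundreds or thousands of dimensions, and their values are learned, not hand-picked):

```python
import numpy as np

# Hypothetical miniature embedding table. The vectors are invented
# purely to illustrate clustering; real values come from training.
embeddings = {
    "apple":     np.array([0.9, 0.1, 0.7]),
    "caramel":   np.array([0.8, 0.2, 0.6]),
    "pirouette": np.array([0.1, 0.9, 0.2]),
    "arabesque": np.array([0.2, 0.8, 0.1]),
}

def cosine(a, b):
    """Cosine similarity: close to 1 for vectors pointing the same way."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words that co-occur in similar contexts end up close together:
print(cosine(embeddings["apple"], embeddings["caramel"]))    # high
print(cosine(embeddings["apple"], embeddings["pirouette"]))  # low
```

Cosine similarity is one common way to measure “closeness” in such a space; nothing about it requires the model to understand what an apple is.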
Simulated Knowledge
Human beings derive meaning from experience, culture, and sensation. AI, by contrast, simulates knowledge. Asked to complete the sentence, it might produce “food from strangers,” “a poisoned apple,” or simply “apples.” Each is statistically plausible, yet none comes from comprehension. AI predicts what is likely to come next, not what is “true” in the human sense.
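In practice, “statistically plausible” means the model assigns each candidate continuation a score, converts the scores into probabilities, and picks (or samples) from them. The scores below are invented for illustration; a real model produces them over its entire vocabulary.

```python
import math

# Hypothetical scores (logits) a model might assign to candidate
# continuations of "...never eat". The numbers are made up.
logits = {"food from strangers": 2.0, "a poisoned apple": 1.6, "apples": 0.9}

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    exps = {k: math.exp(v) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}

probs = softmax(logits)
best = max(probs, key=probs.get)  # greedy decoding: take the most likely
print(probs)
print(best)
```

Whether the winning continuation is factually true never enters the calculation; only likelihood does.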
The Abstract World of the Embedding Space
Embedding space is where an AI’s predictions live. Each word becomes a point in hundreds or thousands of dimensions, each loosely corresponding to patterns of meaning, syntax, and context. In a simplified 2D space, for example, “apple” might cluster near “fruit” and “red.” Add more dimensions, and it could also relate to “knowledge,” “temptation,” or even “technology,” reflecting its cultural and contextual associations.
Because these spaces are so high-dimensional, they cannot be visualized directly, but they serve as the backdrop for an AI’s language prediction. The AI does not weigh concepts or narrative tension; it calculates statistically coherent sequences.
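The 2D picture described above can be made concrete with invented coordinates. The positions here are hypothetical; in a real model they would be learned, and a third or fourth dimension could encode an axis like “temptation.”

```python
import numpy as np

# A toy 2D embedding space with made-up coordinates.
space_2d = {
    "apple": np.array([0.9, 0.8]),
    "fruit": np.array([0.85, 0.75]),
    "red":   np.array([0.95, 0.7]),
    "car":   np.array([0.1, 0.2]),
}

def nearest(word, space):
    """Return the other words sorted by Euclidean distance to `word`."""
    return sorted((w for w in space if w != word),
                  key=lambda w: np.linalg.norm(space[w] - space[word]))

print(nearest("apple", space_2d))  # "fruit" and "red" come before "car"
```

With only four points the result is obvious, but the same nearest-neighbor logic applies unchanged in a thousand dimensions.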
From Math to Memory
These embeddings are not just theoretical matrices; they require physical memory. Each token’s embedding consists of hundreds or thousands of numerical entries, which are stored in memory systems and operated on by hardware. As models grow larger and handle more tokens, memory becomes a major constraint on the speed and complexity of predictions.
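A back-of-the-envelope calculation makes the scale tangible. The vocabulary size, dimension count, and 16-bit precision below are plausible but hypothetical choices, not figures from any specific model.

```python
# Rough memory footprint of an embedding table alone.
# Hypothetical sizes: 50,000-token vocabulary, 4,096 dimensions,
# 2 bytes per entry (16-bit floating point).
vocab_size = 50_000
dims = 4_096
bytes_per_entry = 2

total_bytes = vocab_size * dims * bytes_per_entry
print(f"{total_bytes / 2**30:.2f} GiB")  # roughly 0.38 GiB
```

And that is just the lookup table; the model’s other weights and its working activations typically dwarf it.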
Originally developed for scientific computing, high-bandwidth memory (HBM) is now applied to AI so that models can efficiently handle enormous amounts of data. Memory is no longer merely a storage device; it determines how much context an AI can retain and how quickly it can access that information to make predictions.
Looking Ahead
What an AI can “know” has always depended on what it can hold in memory. Longer conversations and more complicated prompts require more tokens and embeddings, and therefore more memory. These limits shape how the AI represents context and maintains coherence in the text it generates.
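The growth is easy to estimate. In transformer-style models, each token in the context keeps cached activation vectors per layer (often called the KV cache); the dimension, layer count, and precision below are hypothetical round numbers for illustration.

```python
# Illustrative only: how per-context memory scales with length.
# Hypothetical model shape: 4,096-dim activations, 32 layers,
# 2 bytes per entry, keys and values cached for every token.
dims = 4_096
layers = 32
bytes_per_entry = 2
per_token = 2 * dims * layers * bytes_per_entry  # keys + values

for context in (1_000, 10_000, 100_000):
    mib = context * per_token / 2**20
    print(f"{context:>7} tokens -> {mib:,.0f} MiB")
```

Memory grows linearly with context length under these assumptions, which is why very long conversations are expensive to sustain.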
Understanding AI’s statistical and hardware foundations does not undermine its usefulness; rather, it reframes AI as a highly complex system of probabilities and memory rather than a form of conscious understanding.
(This article has been adapted and modified from content on Micron.)