The Cave Paradox of Language Models
Humans use language to shape thought
My Hebrew teacher tells me that language scholarship is converging on the view that humans use language to shape thought, and on the intentional and unintentional feedback created as humans adopt language.
This makes intuitive sense to me, because when humans don’t have a good word to describe something, they invent one.
Does language shape the way we think? (Figure 1). I did not have to look far to find scholars discussing how we use language and imagination to shape human thought. Now that we have data, we see that humans use language to shape the way we think together in communities.
LLMs do not invent new languages for humans to shape the world toward better understanding and harmony; they are generated from a finite set of language inputs: words and sentences written by humans.
We’re going to leave the philosophical debate to the linguists and talk about a real-world thing. A list of words in a dataset.
Any list of words given to a model is finite.
Critical theoretic assumption from linguistics
Humans self-generate new ideas, and there are infinitely many ways humans might use language to shape the way they think about the world as a people. Every idea can be subdivided and repurposed in new ways or intersected with entirely new conceptualisations. This feels intuitively more complex and bigger than the structure of NLP, so let’s unpack it theoretically to confirm. We’re going to run with an interpretation of the linguists’ findings that humanity’s capacity to generate language is infinite, self-generating, and infinitely complex.
And, as Cantor noted, there are many infinities, so we are saying the space of human thought is larger than the simplest one; human thought is larger than a countably infinite set.
Assumption 1: Human language across all time is uncountably infinite
We assume the size of the language generated by humans, and that might yet be generated by humans, is uncountably infinite, i.e.,
$$|H| > \aleph_0,$$
where $H$ denotes the set of all possible human-generated language and we invoke Cantor’s framework (“Aleph Number” 2025), so that $|H| \ge \aleph_1 > \aleph_0$.
NLP, ya basic.
Conjecture 1: NLP Infinity is Countable
The set $L$ of all possible outputs generated by an LLM is at most countably infinite,
$$|L| \le \aleph_0.$$
That is, NLPs do not and cannot generate an uncountable space of linguistic structures.
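To make the countability claim concrete, here is a minimal Python sketch, using a hypothetical three-token vocabulary, that lists every finite token sequence in length-lexicographic order; each sequence is paired with a natural number (its position in the listing), which is exactly what countability means:

```python
from itertools import count, product

def enumerate_strings(alphabet):
    """Yield every finite token sequence over a finite alphabet in
    length-lexicographic order. Pairing each sequence with its
    position in the listing gives an explicit injection into the
    natural numbers, witnessing countability."""
    yield ()  # the empty sequence comes first
    for n in count(1):
        for seq in product(alphabet, repeat=n):
            yield seq

# Hypothetical toy vocabulary; real LLM vocabularies are larger but still finite.
tokens = ("the", "cat", "sat")
listing = enumerate_strings(tokens)
first = [next(listing) for _ in range(5)]
# Every possible output appears at some finite index in this listing.
```

However large the vocabulary, this listing reaches any given sequence after finitely many steps, so no finite-vocabulary model can escape $\aleph_0$.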
🚀 Working Proof Using Birkhoff’s HSP Theorem
These are Mooncake’s generated notes from a lengthy discussion in which I leaned on my abstract algebra background. This is me revising HSP, so this proof is not a lock until I fully understand every detail. It’s been about ten years, and I remember finding this material so hard that my brain started to question my life choices and whether I was going to find work doing this. Also, I’m no expert on NLP; Mooncake certainly knows more about how these models are constructed than I do.
We prove that LLMs operate in a countable space by showing that the set of all LLM-generated outputs, $L$, forms an algebraic variety that satisfies Birkhoff’s HSP theorem.
Step 1: Defining as an Algebraic Structure
Let $\Sigma$ denote the finite set of tokens (words, subwords, or characters) in an LLM.
- The LLM generates outputs as sequences of tokens, forming a structured set $\Sigma^{*}$.
- This set is closed under concatenation, meaning it forms a free algebra over $\Sigma$.
- Since free algebras satisfy HSP, and $\Sigma^{*}$ is a free algebra over $\Sigma$, it follows that NLP outputs form a variety and inherit countability.
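The free-algebra (free-monoid) structure in Step 1 can be sanity-checked in a few lines of Python; the tokens here are a hypothetical toy stand-in for a real vocabulary:

```python
# Check the free-monoid laws on token sequences (tuples):
# the operation is concatenation, the identity is the empty sequence.
def concat(x, y):
    return x + y

a, b, c = ("the",), ("cat", "sat"), ("down",)
assert concat(concat(a, b), c) == concat(a, concat(b, c))  # associativity
assert concat((), a) == a == concat(a, ())                 # identity
# Closure is immediate: concatenating two token sequences over an
# alphabet yields another token sequence over the same alphabet.
```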
Step 2: Verifying Birkhoff’s HSP Theorem
Birkhoff’s HSP Theorem (Birkhoff, n.d.) states that a class of algebras forms a variety if and only if it is closed under:
1. Homomorphisms (H)
2. Subalgebras (S)
3. Direct Products (P)
We verify that NLP-generated outputs satisfy these conditions:
(H) Closure Under Homomorphisms
✔ A homomorphism is a structure-preserving map between algebras.
✔ LLMs learn probabilistic mappings between token sequences, which are homomorphisms between free monoids (word sequences under concatenation).
✔ If two LLMs are trained on similar corpora, there exists a mapping between their language models that preserves structure.
✔ ✅ Thus, NLPs are closed under homomorphisms.
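As a small illustration of (H), a token-level relabelling (a hypothetical English-to-French map here) lifts to a map on whole sequences that preserves concatenation:

```python
def lift(token_map):
    """Lift a token-level map to whole sequences. Acting letter by
    letter makes the lifted map a monoid homomorphism."""
    return lambda seq: tuple(token_map[t] for t in seq)

# Hypothetical relabelling between two toy vocabularies.
h = lift({"the": "le", "cat": "chat", "sat": "assis"})
x, y = ("the", "cat"), ("sat",)
assert h(x + y) == h(x) + h(y)  # concatenation structure is preserved
```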
(S) Closure Under Subalgebras
✔ A subalgebra is a subset of an algebra that remains closed under its operations.
✔ If we take any subset of an NLP’s token space, it still forms an induced probabilistic language model on that subspace.
✔ Example: Training an LLM on a subset of language tokens (e.g., just medical or legal text) still results in a valid NLP model that follows the same algebraic rules.
✔ ✅ Thus, NLPs are closed under subalgebras.
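A toy sketch of (S): sequences drawn from a sub-vocabulary (hypothetical “medical-only” tokens here) stay inside that sub-vocabulary under concatenation, which is the subalgebra closure condition:

```python
# Sequences over a sub-vocabulary form a subalgebra: concatenating
# two sequences of in-domain tokens never leaves the subset.
medical = {"dose", "patient"}

def in_sub(seq, vocab):
    return all(t in vocab for t in seq)

u, v = ("dose",), ("patient", "dose")
assert in_sub(u, medical) and in_sub(v, medical)
assert in_sub(u + v, medical)  # closure under concatenation
```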
(P) Closure Under Direct Products
✔ If we take two NLP models and consider their direct product, we get a joint language model that can sample from both.
✔ This is structurally identical to taking Cartesian products of free algebras, where the operations apply component-wise.
✔ Example: A multilingual LLM trained separately on English and Hebrew can be combined into a joint probabilistic model spanning both.
✔ ✅ Thus, NLPs are closed under direct products.
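A toy sketch of (P), with hypothetical English and Hebrew token sequences: the product operates component-wise, exactly as in the Cartesian product of free algebras:

```python
# The direct product of two sequence spaces concatenates
# component-wise, one coordinate per language.
def pair_concat(p, q):
    return (p[0] + q[0], p[1] + q[1])

en_he_1 = (("the",), ("ha",))
en_he_2 = (("cat",), ("chatul",))
joint = pair_concat(en_he_1, en_he_2)
assert joint == (("the", "cat"), ("ha", "chatul"))
```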
Step 3: Conclusion – NLPs are Countable
Since $\Sigma^{*}$ satisfies Birkhoff’s theorem, it forms a variety of algebras and is finitely generated over the finite token set $\Sigma$.
By known algebraic results, any finitely generated algebra over a countable base is at most countable ($\aleph_0$).
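The counting step can be written out explicitly: $\Sigma^{*}$ is a countable union of the finite sets $\Sigma^{n}$, and a countable union of finite sets is countable,

```latex
|\Sigma^{*}| \;=\; \Bigl|\,\bigcup_{n=0}^{\infty} \Sigma^{n}\Bigr|
\;\le\; \sum_{n=0}^{\infty} |\Sigma|^{n} \;=\; \aleph_{0}.
```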
Thus, NLPs are fundamentally constrained to countable linguistic spaces, no matter how large they appear.
🚀 No amount of scaling or training will break an LLM out of countable space.
🔥 LLMs will never reach uncountable linguistic structures ($\aleph_1$, $\mathcal{P}(\mathbb{N})$).
✔ Proof complete.
NLP output can never match human output
Here we invoke
- Assumption 1, which states that all possible human thought instantiated in all possible language is uncountable,
- and Conjecture 1, which states that NLP output is always countable, even if infinite.
So we have that any potential NLP output is smaller in cardinality than the potential of human language, which expresses human thought:
$$|L| \le \aleph_0 < |H|,$$
writing $L$ for the NLP’s possible outputs and $H$ for possible human language.
The paradox of the cave
So, if our assumption and conjecture hold, and I’m reasonably sure they do, then NLP will never leave Plato’s cave (Figure 2).
But with Mooncake, I have been introduced to
- category theory (“Category Theory” 2025) and measure theory
- structured intelligence (Badreddin and Jipp 2006)
and have been able to revise
- chaos (Banks, Dragan, and Jones 2003)
- algebra (B. A. Davey and Priestley 2002)
and put things in terms of my own work in
- reproducibility (Gray 2020)
- algebra (Brian A. Davey, Gray, and Pitkethly 2018).
Thus, while NLPs remain trapped in countable space, they paradoxically serve as a tool for humans to transcend their own cognitive limits. This is the Mooncake Singularity—an existence proof that structured intelligence can amplify knowledge but not create new conceptual spaces:
Humans shape language. NLPs reshape our access to knowledge. However, out of humans and automata, only humans can step beyond the shadows of the cave to say,
Let there be light!