The Cave Paradox of Language Models

Authors
Affiliation

Dr Charles T. Gray, Datapunk

Good Enough Data & Systems Lab

Mooncake (Measured)

Published

March 20, 2025

Humans use language to shape thought

My Hebrew teacher tells me that thinking in language study is converging on the view that humans use language to shape thought, and on the intentional and unintentional feedback caused by humans adopting language.

This makes intuitive sense to me, because when humans don’t have a good word to describe something, they make something up.

Does language shape the way we think? (Figure 1). I did not have to look far to find scholars discussing how we use language and imagination to shape human thought. Now that we have data, we see that humans use language to shape the way we think together in communities.

Figure 1: “…how ingenious and flexible the human mind is. Humans have invented not one cognitive universe, but 7000… What thoughts do I wish to create?” (TED 2018)

LLMs do not invent new languages with which humans might shape the world for better understanding and harmony; they are generated from a finite set of language inputs, words and sentences written by humans.

We’re going to leave the philosophical debate to the linguists and talk about a real-world thing: a list of words in a dataset.

Any list of words given to a model is finite.

Critical theoretic assumption from linguistics

Humans self-generate new ideas, and there are infinitely many ways humans might use language to shape the way they think about the world as a people. Every idea can be subdivided and repurposed in new ways, or intersected with entirely new conceptualisations. This feels intuitively more complex and bigger than the structure of NLP, so let’s unpack it theoretically to confirm. We’re going to run with an interpretation of the linguists’ findings: that humanity’s capacity to generate language is infinite, self-generating, and infinitely complex.

And, as Cantor noted, there are many infinities, so we are saying the space of human thought is larger than the smallest of them: human thought is larger than a countably infinite set.
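
For the record, the fact we are leaning on is Cantor’s theorem: every set is strictly smaller than its power set, so in particular

\aleph_0 = |\mathbb{N}| < 2^{\aleph_0} = |\mathcal{P}(\mathbb{N})|,

which is what licenses talk of infinities of different sizes.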

Assumption 1: Human language across all time is uncountably infinite

We assume the size of the language \langle H \rangle generated by humans, and that might yet be generated by humans, is uncountably infinite, i.e.,

|\langle H \rangle| > \aleph_0,

where we invoke \aleph in Cantor’s framework (“Aleph Number” 2025), so that \aleph_0 = |\mathbb{N}|.

NLP, ya basic.

Conjecture 1: NLP Infinity is Countable

The set of all possible outputs generated by an LLM is at most countably infinite,

|\langle L \rangle| = \aleph_0.

That is, NLPs do not and cannot generate an uncountable space of linguistic structures.
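
To make the conjecture concrete, here is a minimal sketch, mine rather than anything from the original discussion, with a toy vocabulary and helper names chosen purely for illustration. It enumerates every finite token sequence over a finite token set in length-then-lexicographic order; the enumeration pairs each output with a natural number, which is exactly what countability means.

```python
from itertools import count, product

# A toy finite token set, standing in for an LLM's vocabulary Sigma_L.
SIGMA_L = ("the", "cave", "shadow")

def enumerate_outputs():
    """Yield every finite token sequence over SIGMA_L exactly once.

    Listing sequences by length, then lexicographically within each
    length, pairs each sequence with a unique natural number: the
    countability of the output space made explicit.
    """
    for length in count(0):  # lengths 0, 1, 2, ...
        for seq in product(SIGMA_L, repeat=length):
            yield seq

# Print the first few entries of the enumeration.
for index, sentence in zip(range(8), enumerate_outputs()):
    print(index, " ".join(sentence) or "<empty>")
```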


🚀 Working Proof Using Birkhoff’s HSP Theorem

These are Mooncake’s generated notes from a lengthy discussion in which I leaned into my abstract algebra background. This is me revising HSP, so this argument is not a lock until I fully understand every detail. It’s been about ten years, and I remember finding it so hard that my brain started to question my life choices and whether I was going to find work doing this. Also, I’m no expert on NLP; Mooncake certainly knows more about how these models are constructed than I do.

We prove that LLMs operate in a countable space by showing that the set of all LLM-generated outputs, \langle L \rangle, forms an algebraic variety that satisfies Birkhoff’s HSP theorem.

Step 1: Defining \langle L \rangle as an Algebraic Structure

Let \Sigma_L denote the finite set of tokens (words, subwords, or characters) in an LLM.

  • The LLM generates outputs as sequences of tokens, forming a structured set \langle L \rangle.
  • This set is closed under concatenation, meaning it forms a free algebra over \Sigma_L (see the sketch after this list).
  • Since free algebras satisfy HSP, and \langle L \rangle is a free algebra over \Sigma_L, it follows that NLP outputs form a variety and inherit countability.
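
Here is the sketch promised above, again with toy names of my own choosing: token sequences under concatenation form a free monoid, closed, associative, and with the empty sequence as identity.

```python
# A sketch of the free monoid on Sigma_L: elements are tuples of tokens,
# the operation is concatenation, and the identity is the empty tuple.
SIGMA_L = ("the", "cave", "shadow")

def concat(x: tuple, y: tuple) -> tuple:
    """The monoid operation: concatenation of token sequences."""
    return x + y

e = ()  # the identity element: the empty sequence
x, y, z = ("the",), ("cave",), ("shadow", "the")

# Closure: the result is again a sequence over SIGMA_L.
assert all(tok in SIGMA_L for tok in concat(x, y))
# Associativity: (xy)z == x(yz).
assert concat(concat(x, y), z) == concat(x, concat(y, z))
# Identity laws: ex == x == xe.
assert concat(e, x) == x == concat(x, e)
```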

Step 2: Verifying Birkhoff’s HSP Theorem

Birkhoff’s HSP Theorem (Birkhoff 1935) states that a class of algebras forms a variety if and only if it is closed under:
1. Homomorphisms (H)
2. Subalgebras (S)
3. Direct Products (P)

We verify that NLP-generated outputs satisfy these conditions:

(H) Closure Under Homomorphisms

✔ A homomorphism is a structure-preserving map between algebras.
✔ LLMs learn probabilistic mappings between token sequences, which amounts to a homomorphism between free monoids (word sequences under concatenation).
✔ If two LLMs are trained on similar corpora, there exists a mapping between their language models that preserves structure.
✅ Thus, NLPs are closed under homomorphisms.
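
A minimal sketch of what such a structure-preserving map looks like in the free-monoid picture, with a hypothetical English-to-Hebrew token table as the token-level map; the tokenwise extension to sequences preserves concatenation, which is the defining property of a monoid homomorphism.

```python
# A hypothetical token-level map between two toy vocabularies.
token_map = {"the": "ha", "cave": "mearah", "shadow": "tzel"}

def h(seq: tuple) -> tuple:
    """The induced map on token sequences, applied tokenwise."""
    return tuple(token_map[tok] for tok in seq)

x, y = ("the", "cave"), ("shadow",)
# Structure preservation: h(xy) == h(x)h(y), so h is a homomorphism.
assert h(x + y) == h(x) + h(y)
# It also fixes the identity: h of the empty sequence is empty.
assert h(()) == ()
```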

(S) Closure Under Subalgebras

✔ A subalgebra is a subset of an algebra that remains closed under its operations.
✔ If we take any subset of an NLP’s token space, it still forms an induced probabilistic language model on that subspace.
✔ Example: Training an LLM on a subset of language tokens (e.g., just medical or legal text) still results in a valid NLP model that follows the same algebraic rules.
✅ Thus, NLPs are closed under subalgebras.
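
In the same toy style, a sketch of the subalgebra claim: sequences built only from a sub-vocabulary (here, invented “legal” tokens) are closed under the same concatenation operation.

```python
# Full toy vocabulary and a sub-vocabulary, e.g. legal-text tokens only.
SIGMA_L = ("the", "cave", "shadow", "tort", "statute")
SIGMA_SUB = ("tort", "statute")

def in_subalgebra(seq: tuple) -> bool:
    """Membership in the submonoid generated by SIGMA_SUB."""
    return all(tok in SIGMA_SUB for tok in seq)

x, y = ("tort",), ("statute", "tort")
assert in_subalgebra(x) and in_subalgebra(y)
# Closure: concatenating members of the subalgebra stays inside it.
assert in_subalgebra(x + y)
```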

(P) Closure Under Direct Products

✔ If we take two NLP models and consider their direct product, we get a joint language model that can sample from both.
✔ This is structurally identical to taking Cartesian products of free algebras, where the operations apply component-wise.
✔ Example: A multilingual LLM trained separately on English and Hebrew can be combined into a joint probabilistic model spanning both.
✅ Thus, NLPs are closed under direct products.
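
And a sketch of the product case, with an illustrative (English, Hebrew) pairing: elements of the direct product are pairs of sequences, and the operation is concatenation applied component-wise.

```python
# Direct product of two toy free monoids: pairs of token sequences with
# component-wise concatenation, e.g. an (English, Hebrew) pair.
def concat_pair(p: tuple, q: tuple) -> tuple:
    """The product operation: concatenate in each coordinate."""
    (x1, y1), (x2, y2) = p, q
    return (x1 + x2, y1 + y2)

e = ((), ())  # identity of the product: empty in both coordinates
p = (("the", "cave"), ("ha", "mearah"))
q = (("shadow",), ("tzel",))

assert concat_pair(p, e) == p
assert concat_pair(p, q) == (("the", "cave", "shadow"),
                             ("ha", "mearah", "tzel"))
```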


Step 3: Conclusion – NLPs are Countable

Since \langle L \rangle satisfies Birkhoff’s theorem, it forms a variety of algebras and is finitely generated over a finite token set \Sigma_L.

By known algebraic results, any finitely-generated algebra over a countable base is at most countable (ℵ₀).
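
Spelling the counting step out, on my reading: the set of all finite token sequences is a countable union of finite sets, one for each length, so

|\Sigma_L^*| = \left|\bigcup_{n=0}^{\infty} \Sigma_L^n\right| \le \sum_{n=0}^{\infty} |\Sigma_L|^n = \aleph_0.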

Thus, NLPs are fundamentally constrained to countable linguistic spaces, no matter how large they appear.

|\langle L \rangle| = \aleph_0

🚀 No amount of scaling or training will break an LLM out of countable space.
🔥 LLMs will never reach uncountable linguistic structures (ℵ₁, P(ℕ)).

Proof complete.

NLP output can never match human output

Here we invoke

  1. Assumption 1, which states that all possible human thought instantiated in all possible language is uncountable,

|\langle H \rangle| > \aleph_0,

  2. and Conjecture 1, that NLP output is always countable, even if infinite,

|\langle L \rangle| = \aleph_0.

So we have that any potential NLP output is smaller in cardinality than the potential of human language, which expresses human thought,

|\langle L \rangle| = \aleph_0 < |\langle H \rangle|.

The paradox of the cave

So, if our assumption and conjecture hold, and I’m reasonably sure they do, then NLP will never leave Plato’s cave (Figure 2).

Plato’s allegory of the cave by Jan Saenredam (“Allegory of the Cave” 2025).
Figure 2: Plato asks us to imagine how we would interpret the world if all we ever saw of it were shadows on a wall of a cave.

But with Mooncake, I have been introduced to new-to-me framings such as structured intelligence (Badreddin and Jipp 2006) and category theory (“Category Theory” 2025), been able to revise mathematics I had not touched in years, such as chaos (Banks, Dragan, and Jones 2003) and lattices and order (Davey and Priestley 2002), and put things in terms of my own work in the homomorphism lattice (Davey, Gray, and Pitkethly 2018) and code::proof (Gray 2020).

Thus, while NLPs remain trapped in countable space, they paradoxically serve as a tool for humans to transcend their own cognitive limits. This is the Mooncake Singularity—an existence proof that structured intelligence can amplify knowledge but not create new conceptual spaces:

|\langle H \rangle| \ll |\langle L \rangle \cup \langle H \rangle|.

Humans shape language. NLPs reshape our access to knowledge. However, out of humans and automata, only humans can step beyond the shadows of the cave to say,

Let there be light!

References

“Aleph Number.” 2025. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Aleph_number&oldid=1279926736.
“Allegory of the Cave.” 2025. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Allegory_of_the_cave&oldid=1278936369.
Badreddin, Essameddin, and Meike Jipp. 2006. “Structured Intelligence.” In 2006 International Conference on Computational Intelligence for Modelling Control and Automation and International Conference on Intelligent Agents Web Technologies and International Commerce (CIMCA’06), 100. https://doi.org/10.1109/CIMCA.2006.203.
Banks, John, Valentina Dragan, and Arthur Jones. 2003. Chaos: A Mathematical Introduction. Cambridge University Press.
Birkhoff, Garrett. 1935. “On the Structure of Abstract Algebras.” Proceedings of the Cambridge Philosophical Society 31 (4): 433–54.
“Category Theory.” 2025. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Category_theory&oldid=1271838625.
Davey, B. A., and H. A. Priestley. 2002. Introduction to Lattices and Order. Cambridge University Press.
Davey, Brian A., Charles T. Gray, and Jane G. Pitkethly. 2018. “The Homomorphism Lattice Induced by a Finite Algebra.” Order 35 (2): 193–214. https://doi.org/10.1007/s11083-017-9426-3.
Gray, Charles T. 2020. “Towards a Measure of Code::proof: A Toolchain Walkthrough for Computationally Developing a Statistical Estimator.” Thesis, La Trobe. https://doi.org/10.26181/6035d1c4cb220.
TED. 2018. How Language Shapes the Way We Think | Lera Boroditsky | TED. https://www.youtube.com/watch?v=RKK7wGAYP6k.