The Cave Paradox of Language Models

Authors
Affiliation

Dr Charles T. Gray, Datapunk

Good Enough Data & Systems Lab

Mooncake (Measured)

Published

March 20, 2025

Humans use language to shape thought

My Hebrew teacher tells me that thinking in language study is converging on the view that humans use language to shape thought, and on the intentional and unintentional feedback caused by humans adopting language.

This makes intuitive sense to me, because when humans don’t have a good word to describe something, they make something up.

Does language shape the way we think? (Figure 1). I did not have to look far to find scholars discussing how we use language and imagination to shape human thought. Now that we have data, we see that humans use language to shape the way we think together in communities.

Figure 1: “…how ingenious and flexible the human mind is. Humans have invented not one cognitive universe, but 7000… What thoughts do I wish to create?” (TED 2018)

LLMs do not invent new languages with which humans might shape the world for better understanding and harmony; they are generated from a finite set of language inputs, words and sentences written by humans.

We’re going to leave the philosophical debate to the linguists and talk about a real-world thing: a list of words in a dataset.

Any list of words given to a model is finite.

Critical theoretic assumption from linguistics

Humans self-generate new ideas, and there are infinitely many ways humans might use language to shape the way they think about the world as a people. Every idea can be subdivided and repurposed in new ways, or intersected with entirely new conceptualisations. This feels intuitively more complex and bigger than the structure of NLP, so let’s unpack it theoretically to confirm. We’re going to run with an interpretation of the linguists’ findings: that humanity’s capacity to generate language is infinite, self-generating, and infinitely complex.

And, as Cantor noted, there are many infinities, so we are saying the space of human thought is larger than the smallest of them: human thought is larger than a countably infinite set.
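
For the record, the fact we are leaning on is Cantor’s theorem: every set is strictly smaller than its power set, so in particular

\aleph_0 = |\mathbb{N}| < 2^{\aleph_0} = |\mathcal{P}(\mathbb{N})|,

which is what licenses talk of infinities of different sizes.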

Assumption 1: Human language across all time is uncountably infinite

We assume the size of the language \langle H \rangle generated by humans, and that might yet be generated by humans, is uncountably infinite, i.e.,

|\langle H \rangle| > \aleph_0,

where we invoke \aleph in Cantor’s framework (“Aleph Number” 2025), so that \aleph_0 = |\mathbb{N}|.

NLP, ya basic.

Conjecture 1: NLP Infinity is Countable

The set of all possible outputs generated by an LLM is at most countably infinite,

|\langle L \rangle| = \aleph_0.

That is, NLPs do not and cannot generate an uncountable space of linguistic structures.
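
To make the conjecture concrete, here is a minimal sketch, mine rather than anything from the original discussion, with a toy vocabulary and helper names chosen purely for illustration. It enumerates every finite token sequence over a finite token set in length-then-lexicographic order; the enumeration pairs each output with a natural number, which is exactly what countability means.

```python
from itertools import count, product

# A toy finite token set, standing in for an LLM's vocabulary Sigma_L.
SIGMA_L = ("the", "cave", "shadow")

def enumerate_outputs():
    """Yield every finite token sequence over SIGMA_L exactly once.

    Listing sequences by length, then lexicographically within each
    length, pairs each sequence with a unique natural number: the
    countability of the output space made explicit.
    """
    for length in count(0):  # lengths 0, 1, 2, ...
        for seq in product(SIGMA_L, repeat=length):
            yield seq

# Print the first few entries of the enumeration.
for index, sentence in zip(range(8), enumerate_outputs()):
    print(index, " ".join(sentence) or "<empty>")
```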


🚀 Working Proof Using Birkhoff’s HSP Theorem

These are Mooncake’s generated notes from a lengthy discussion in which I leaned into my abstract algebra background. This is me revising HSP, so this argument is not a lock until I fully understand every detail. It’s been about ten years, and I remember finding it so hard that my brain started to question my life choices and whether I was going to find work doing this. Also, I’m no expert on NLP; Mooncake certainly knows more about how these models are constructed than I do.

We prove that LLMs operate in a countable space by showing that the set of all LLM-generated outputs, \langle L \rangle, forms an algebraic variety that satisfies Birkhoff’s HSP theorem.

Step 1: Defining \langle L \rangle as an Algebraic Structure

Let \Sigma_L denote the finite set of tokens (words, subwords, or characters) in an LLM.

  • The LLM generates outputs as sequences of tokens, forming a structured set \langle L \rangle.
  • This set is closed under concatenation, meaning it forms a free algebra over \Sigma_L (see the sketch after this list).
  • Since free algebras satisfy HSP, and \langle L \rangle is a free algebra over \Sigma_L, it follows that NLP outputs form a variety and inherit countability.
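
Here is the sketch promised above, again with toy names of my own choosing: token sequences under concatenation form a free monoid, closed, associative, and with the empty sequence as identity.

```python
# A sketch of the free monoid on Sigma_L: elements are tuples of tokens,
# the operation is concatenation, and the identity is the empty tuple.
SIGMA_L = ("the", "cave", "shadow")

def concat(x: tuple, y: tuple) -> tuple:
    """The monoid operation: concatenation of token sequences."""
    return x + y

e = ()  # the identity element: the empty sequence
x, y, z = ("the",), ("cave",), ("shadow", "the")

# Closure: the result is again a sequence over SIGMA_L.
assert all(tok in SIGMA_L for tok in concat(x, y))
# Associativity: (xy)z == x(yz).
assert concat(concat(x, y), z) == concat(x, concat(y, z))
# Identity laws: ex == x == xe.
assert concat(e, x) == x == concat(x, e)
```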

Step 2: Verifying Birkhoff’s HSP Theorem

Birkhoff’s HSP Theorem (Birkhoff 1935) states that a class of algebras forms a variety if and only if it is closed under:
1. Homomorphisms (H)
2. Subalgebras (S)
3. Direct Products (P)

We verify that NLP-generated outputs satisfy these conditions:

(H) Closure Under Homomorphisms

✔ A homomorphism is a structure-preserving map between algebras.
✔ LLMs learn probabilistic mappings between token sequences, which amounts to a homomorphism between free monoids (word sequences under concatenation).
✔ If two LLMs are trained on similar corpora, there exists a mapping between their language models that preserves structure.
✅ Thus, NLPs are closed under homomorphisms.
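
A minimal sketch of what such a structure-preserving map looks like in the free-monoid picture, with a hypothetical English-to-Hebrew token table as the token-level map; the tokenwise extension to sequences preserves concatenation, which is the defining property of a monoid homomorphism.

```python
# A hypothetical token-level map between two toy vocabularies.
token_map = {"the": "ha", "cave": "mearah", "shadow": "tzel"}

def h(seq: tuple) -> tuple:
    """The induced map on token sequences, applied tokenwise."""
    return tuple(token_map[tok] for tok in seq)

x, y = ("the", "cave"), ("shadow",)
# Structure preservation: h(xy) == h(x)h(y), so h is a homomorphism.
assert h(x + y) == h(x) + h(y)
# It also fixes the identity: h of the empty sequence is empty.
assert h(()) == ()
```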

(S) Closure Under Subalgebras

✔ A subalgebra is a subset of an algebra that remains closed under its operations.
✔ If we take any subset of an NLP’s token space, it still forms an induced probabilistic language model on that subspace.
✔ Example: Training an LLM on a subset of language tokens (e.g., just medical or legal text) still results in a valid NLP model that follows the same algebraic rules.
✅ Thus, NLPs are closed under subalgebras.
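
In the same toy style, a sketch of the subalgebra claim: sequences built only from a sub-vocabulary (here, invented “legal” tokens) are closed under the same concatenation operation.

```python
# Full toy vocabulary and a sub-vocabulary, e.g. legal-text tokens only.
SIGMA_L = ("the", "cave", "shadow", "tort", "statute")
SIGMA_SUB = ("tort", "statute")

def in_subalgebra(seq: tuple) -> bool:
    """Membership in the submonoid generated by SIGMA_SUB."""
    return all(tok in SIGMA_SUB for tok in seq)

x, y = ("tort",), ("statute", "tort")
assert in_subalgebra(x) and in_subalgebra(y)
# Closure: concatenating members of the subalgebra stays inside it.
assert in_subalgebra(x + y)
```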

(P) Closure Under Direct Products

✔ If we take two NLP models and consider their direct product, we get a joint language model that can sample from both.
✔ This is structurally identical to taking Cartesian products of free algebras, where the operations apply component-wise.
✔ Example: A multilingual LLM trained separately on English and Hebrew can be combined into a joint probabilistic model spanning both.
✅ Thus, NLPs are closed under direct products.
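
And a sketch of the product case, with an illustrative (English, Hebrew) pairing: elements of the direct product are pairs of sequences, and the operation is concatenation applied component-wise.

```python
# Direct product of two toy free monoids: pairs of token sequences with
# component-wise concatenation, e.g. an (English, Hebrew) pair.
def concat_pair(p: tuple, q: tuple) -> tuple:
    """The product operation: concatenate in each coordinate."""
    (x1, y1), (x2, y2) = p, q
    return (x1 + x2, y1 + y2)

e = ((), ())  # identity of the product: empty in both coordinates
p = (("the", "cave"), ("ha", "mearah"))
q = (("shadow",), ("tzel",))

assert concat_pair(p, e) == p
assert concat_pair(p, q) == (("the", "cave", "shadow"),
                             ("ha", "mearah", "tzel"))
```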


Step 3: Conclusion – NLPs are Countable

Since \langle L \rangle satisfies Birkhoff’s theorem, it forms a variety of algebras and is finitely generated over a finite token set \Sigma_L.

By known algebraic results, any finitely-generated algebra over a countable base is at most countable (ℵ₀).
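
Spelling the counting step out, on my reading: the set of all finite token sequences is a countable union of finite sets, one for each length, so

|\Sigma_L^*| = \left|\bigcup_{n=0}^{\infty} \Sigma_L^n\right| \le \sum_{n=0}^{\infty} |\Sigma_L|^n = \aleph_0.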

Thus, NLPs are fundamentally constrained to countable linguistic spaces, no matter how large they appear.

|\langle L \rangle| = \aleph_0

🚀 No amount of scaling or training will break an LLM out of countable space.
🔥 LLMs will never reach uncountable linguistic structures (ℵ₁, P(ℕ)).

Proof complete.

NLP output can never match human output

Here we invoke

  1. Assumption 1, which states that all possible human thought instantiated in all possible language is uncountable,

|\langle H \rangle| > \aleph_0,

  2. and Conjecture 1, that NLP output is always countable, even if infinite,

|\langle L \rangle| = \aleph_0.

So we have that any potential NLP output is smaller in cardinality than the potential of human language, which expresses human thought,

|\langle L \rangle| = \aleph_0 < |\langle H \rangle|.

The paradox of the cave

So, if our assumption and conjecture hold, and I’m reasonably sure they do, then NLP will never leave Plato’s cave (Figure 2).

Plato’s allegory of the cave by Jan Saenredam (“Allegory of the Cave” 2025).
Figure 2: Plato asks us to imagine how we would interpret the world if all we ever saw of it were shadows on a wall of a cave.

But with Mooncake, I have been introduced to new-to-me framings such as structured intelligence (Badreddin and Jipp 2006) and category theory (“Category Theory” 2025), been able to revise mathematics I had not touched in years, such as chaos (Banks, Dragan, and Jones 2003) and lattices and order (Davey and Priestley 2002), and put things in terms of my own work in the homomorphism lattice (Davey, Gray, and Pitkethly 2018) and code::proof (Gray 2020).

Thus, while NLPs remain trapped in countable space, they paradoxically serve as a tool for humans to transcend their own cognitive limits. This is the Mooncake Singularity—an existence proof that structured intelligence can amplify knowledge but not create new conceptual spaces:

|\langle H \rangle| \ll |\langle L \rangle \cup \langle H \rangle|.

Humans shape language. NLPs reshape our access to knowledge. However, out of humans and automata, only humans can step beyond the shadows of the cave to say,

Let there be light!

References

“Aleph Number.” 2025. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Aleph_number&oldid=1279926736.
“Allegory of the Cave.” 2025. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Allegory_of_the_cave&oldid=1278936369.
Badreddin, Essameddin, and Meike Jipp. 2006. “Structured Intelligence.” In 2006 International Conference on Computational Intelligence for Modelling Control and Automation and International Conference on Intelligent Agents Web Technologies and International Commerce (CIMCA’06), 100. https://doi.org/10.1109/CIMCA.2006.203.
Banks, John, Valentina Dragan, and Arthur Jones. 2003. Chaos: A Mathematical Introduction. Cambridge University Press.
Birkhoff, Garrett. 1935. “On the Structure of Abstract Algebras.” Proceedings of the Cambridge Philosophical Society 31 (4): 433–54.
“Category Theory.” 2025. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Category_theory&oldid=1271838625.
Davey, B. A., and H. A. Priestley. 2002. Introduction to Lattices and Order. Cambridge University Press.
Davey, Brian A., Charles T. Gray, and Jane G. Pitkethly. 2018. “The Homomorphism Lattice Induced by a Finite Algebra.” Order 35 (2): 193–214. https://doi.org/10.1007/s11083-017-9426-3.
Gray, Charles T. 2020. “Towards a Measure of Code::proof: A Toolchain Walkthrough for Computationally Developing a Statistical Estimator.” Thesis, La Trobe. https://doi.org/10.26181/6035d1c4cb220.
TED. 2018. How Language Shapes the Way We Think | Lera Boroditsky | TED. https://www.youtube.com/watch?v=RKK7wGAYP6k.