
LLMs are Bayesian, in Expectation, not in Realization

ai-framework · uncertainty-quantification · transformers
arxiv.org

Large language models (LLMs) exhibit a striking ability to learn and adapt to new tasks on the fly—commonly referred to as in-context learning. Without modifying their underlying parameters, these models can generalize from just a few examples, a trait that has drawn comparisons to implicit Bayesian updating. However, emerging research highlights a key theoretical contradiction: transformers, the foundational architecture behind most LLMs, consistently violate the martingale property—a central requirement for Bayesian reasoning when dealing with exchangeable data. This insight invites a deeper exploration into the mathematical assumptions underlying modern AI systems, particularly in scenarios where quantifiable uncertainty is essential.
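This martingale requirement has a concrete, testable consequence: a Bayesian posterior over exchangeable data depends only on which observations were seen, not on their order, so its predictive probability must be invariant to permutations of the in-context examples. The sketch below is one way to probe this, assuming a hypothetical `logprob(prompt, completion)` wrapper around whatever LLM API is available; any nonzero spread across orderings is evidence of the violation described above.

```python
import itertools
import statistics

def logprob(prompt: str, completion: str) -> float:
    """Hypothetical wrapper (an assumption, not a real API): return the
    model's log-probability of `completion` given `prompt`, e.g. via an
    LLM API's logprobs field."""
    raise NotImplementedError

def permutation_spread(demos: list[str], query: str, answer: str) -> float:
    """Standard deviation of the predictive log-probability across all
    orderings of the in-context demonstrations. A Bayesian predictor on
    exchangeable data would score every ordering identically (spread == 0)."""
    scores = []
    for perm in itertools.permutations(demos):
        prompt = "\n".join(perm) + "\n" + query
        scores.append(logprob(prompt, answer))
    return statistics.stdev(scores)

# Example: a few labeled demonstrations for a toy sentiment task.
demos = [
    "Review: great film. Sentiment: positive",
    "Review: dull plot. Sentiment: negative",
    "Review: loved it. Sentiment: positive",
]
# spread = permutation_spread(demos, "Review: awful acting. Sentiment:", " negative")
```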

A recent theoretical analysis offers a nuanced view of the inner workings of transformers and how they handle uncertainty. Among the findings, positional encodings, integral to transformers' sequence awareness, are shown to be a primary source of deviation from exact Bayesian behavior, leading to martingale violations of logarithmic order. Interestingly, despite these violations, the models still achieve information-theoretic optimality in expected prediction risk: averaged over sequences they behave like a Bayesian learner, even though any individual realization breaks the martingale property, which is precisely the distinction in the paper's title. This suggests that transformers are not failing to reason probabilistically, but rather are operating via a fundamentally different, yet still highly efficient, mechanism.
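For readers who want the property stated precisely: under exchangeable data, a Bayesian posterior predictive must be a martingale, meaning that averaging the updated prediction over the next observation recovers the current prediction. A schematic rendering follows; the gap Δₙ is the quantity the analysis bounds at logarithmic order, and its exact rate and constants are in the paper, not reproduced here.

```latex
% Martingale property of a Bayesian posterior predictive under exchangeability:
\mathbb{E}_{X_{n+1} \sim p(\cdot \mid x_{1:n})}
  \bigl[ p(y \mid x_{1:n}, X_{n+1}) \bigr] = p(y \mid x_{1:n})

% The analysis attributes a nonzero gap to positional encodings:
\Delta_n = \mathbb{E}\bigl[ p_\theta(y \mid x_{1:n}, X_{n+1}) \bigr]
         - p_\theta(y \mid x_{1:n}),
\qquad \lvert \Delta_n \rvert \neq 0 \text{, of logarithmic order in } n.
```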

One particularly notable contribution is the derivation of the optimal length for chain-of-thought reasoning, presenting a concrete formula that balances computational cost with inference quality. Empirical tests on GPT-3 align closely with these theoretical predictions, showing near-perfect entropy efficiency within only 20 demonstration examples. These insights not only strengthen our understanding of how LLMs process new information, but also introduce practical tools for improving performance and reliability in real-world deployments. For developers and researchers alike, these findings present a compelling framework to rethink model interpretability, inference costs, and uncertainty quantification.
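The entropy-efficiency claim also suggests a measurement that is easy to reproduce: track how the model's average negative log-likelihood on held-out queries approaches the task's entropy floor as demonstrations are added. Below is a minimal sketch under two assumptions flagged here: it reuses the hypothetical `logprob` helper from the earlier sketch, and it operationalizes "entropy efficiency" as the ratio of task entropy to achieved code length, which may differ from the paper's exact definition.

```python
import math

# Reuses the hypothetical `logprob(prompt, completion)` wrapper from the
# earlier sketch; both that helper and this efficiency definition are
# illustrative assumptions, not the paper's implementation.

def avg_nll(demos: list[str], eval_pairs: list[tuple[str, str]]) -> float:
    """Average negative log-likelihood (nats) of the correct answers,
    conditioning on the given in-context demonstrations."""
    prompt = "\n".join(demos)
    return -sum(logprob(prompt + "\n" + q, a) for q, a in eval_pairs) / len(eval_pairs)

def entropy_efficiency_curve(all_demos: list[str],
                             eval_pairs: list[tuple[str, str]],
                             h_task: float) -> list[float]:
    """h_task / NLL(n) for n = 1..len(all_demos) demonstrations; a value
    near 1.0 means the model's code length is near the entropy floor."""
    return [h_task / avg_nll(all_demos[:n], eval_pairs)
            for n in range(1, len(all_demos) + 1)]

# For a balanced binary-label task the entropy floor is log 2 nats, so one
# would check whether the curve saturates near 1.0 by around n = 20:
# curve = entropy_efficiency_curve(demos, eval_pairs, h_task=math.log(2))
```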
