Treat LLMs as 'naive interns'
A trending thought piece suggests that for reliable AI workflows, large language models should be treated as "naive interns" rather than experts. This approach emphasizes providing LLMs with strong structural guardrails and breaking down complex tasks into a graph of smaller, verifiable calls. The goal is to build robust systems by managing the model's inherent non-determinism.
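As a rough illustration of "smaller, verifiable calls," the sketch below chains steps that each pair a model call with a cheap deterministic check and retry on failure. It assumes a linear chain (the simplest possible graph) and replaces the model with a scripted stub; `Step`, `run_pipeline`, and `make_scripted_model` are hypothetical names chosen for this example, not part of any framework mentioned here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    run: Callable[[str], str]      # stand-in for a single model call
    check: Callable[[str], bool]   # cheap, deterministic verifier for that call


def run_pipeline(steps: List[Step], task: str, max_retries: int = 2) -> str:
    """Run steps in order; retry a step until its output passes its check."""
    data = task
    for step in steps:
        for _attempt in range(max_retries + 1):
            out = step.run(data)
            if step.check(out):
                data = out
                break
        else:
            # No attempt produced a verifiable output: fail loudly.
            raise ValueError(f"step {step.name!r} failed validation")
    return data


def make_scripted_model(outputs: List[str]) -> Callable[[str], str]:
    """Hypothetical stub: returns the scripted outputs one call at a time."""
    it = iter(outputs)
    return lambda _prompt: next(it)


steps = [
    # First reply is unusable prose; the retry yields a parseable number.
    Step("extract_total", make_scripted_model(["about forty-two", "42"]), str.isdigit),
    Step("format", lambda s: f"total={s}", lambda s: s.startswith("total=")),
]

result = run_pipeline(steps, "Invoice: total is 42 dollars")
print(result)  # total=42
```

The point of the design is that each step's output is checked by ordinary code before the next step sees it, so a bad generation is caught at the step that produced it rather than at the end of the workflow.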
- The "naive intern" approach is part of a broader shift from "prompt engineering" to "flow engineering," in which developers focus on designing robust, multi-step workflows for LLMs. Frameworks like LangChain and LlamaIndex provide tools for chaining model calls, managing tool use, and structuring outputs, which makes results more predictable.
- Non-determinism in LLMs arises from factors like floating-point arithmetic and the variable batching of user requests on servers, which can change the order of computation and thus the final output. Mitigations include low "temperature" settings, which reduce sampling randomness in responses, and batch-invariant computation methods, which address the numerical variation that a low temperature alone does not remove.
- A key technique for providing guardrails is to structure a complex problem as a graph, in which the LLM is guided through a series of steps (nodes). This can be combined with Retrieval-Augmented Generation (RAG) over a knowledge graph (GraphRAG) to ground the model's reasoning in factual, interconnected data, which is especially effective for multi-hop questions.
- Andrej Karpathy, a prominent AI researcher, has advocated for similar reliability-focused architectures, such as his "LLM Council" concept. It involves querying several different AI models simultaneously, having them critique each other's responses, and then synthesizing a final, more reliable answer, much like a human board meeting.
- The "programming, not prompting" philosophy is embodied by frameworks like DSPy from Stanford NLP, which replaces hand-crafted prompts with programmable modules. DSPy uses optimizers to automatically tune prompts, and even model weights, against performance metrics, creating a more maintainable and self-improving system.
- The "naive intern" analogy highlights that while LLMs can generate creative and novel ideas, human oversight is critical to validate the correctness and efficiency of their outputs. Engineers must be able to distinguish between functional and optimal solutions, as LLMs can hallucinate or produce logically flawed code without understanding the underlying requirements.
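The floating-point point in the list above can be seen in a few lines of plain Python. This is only a toy sketch: it assumes that a server's batching changes how a sum is split into partial sums, and `naive_sum` and `split_sum` are hypothetical helpers written for this example. Real inference kernels are far more involved, but the underlying effect is the same: floating-point addition is not associative, so a different grouping can change the last bits of the result.

```python
def naive_sum(values):
    """Left-to-right accumulation with plain float addition."""
    total = 0.0
    for v in values:
        total += v
    return total


def split_sum(values, split):
    """Sum two partial results, as if the work had been divided at `split`."""
    return naive_sum(values[:split]) + naive_sum(values[split:])


values = [0.1, 0.2, 0.3]

grouped_late = split_sum(values, 1)   # 0.1 + (0.2 + 0.3)
grouped_early = split_sum(values, 2)  # (0.1 + 0.2) + 0.3

print(grouped_late)                   # 0.6
print(grouped_early)                  # 0.6000000000000001
print(grouped_late == grouped_early)  # False

# Keeping the split fixed gives bit-identical results on every run.
print(split_sum(values, 2) == split_sum(values, 2))  # True
```

A discrepancy this small is harmless on its own, but inside a model it can occasionally change which token scores highest, after which the rest of the output diverges. Keeping the reduction order fixed regardless of batch size is the idea behind batch-invariant computation.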