OpenAI publishes gpt-oss under Apache 2.0

- On August 5, 2025, OpenAI released gpt-oss-120b and gpt-oss-20b, two Apache 2.0-licensed open-weight reasoning models for local, on-premises and hosted deployment. (openai.com)
- The key number is memory: OpenAI says gpt-oss-20b fits in about 16GB, while gpt-oss-120b fits on a single H100 GPU. (developers.openai.com)
- OpenAI points developers to its model card, GitHub repository and Hugging Face downloads for setup and deployment details. (github.com)

OpenAI has published two open-weight models under a permissive Apache 2.0 license, and that changes the conversation from “can you call the API?” to “can you run the model yourself?” The release covers gpt-oss-120b and gpt-oss-20b, which OpenAI says are designed for reasoning, tool use and agentic workflows, with compatibility for deployments on infrastructure a customer controls. (openai.com) (developers.openai.com)

The practical significance is less about a single benchmark claim than about where inference can happen. Apache 2.0 lowers legal friction for commercial use, modification and redistribution compared with more restrictive model terms, while OpenAI’s own materials emphasize consumer hardware, on-premises systems and third-party hosting providers as target environments. (github.com)

### Why does the license matter more than the launch language?

Apache 2.0 is a permissive software license, and OpenAI is explicitly using it for the model weights. That means companies can generally deploy, adapt and commercialize the models without the copyleft obligations that often complicate enterprise adoption, subject to OpenAI’s separate usage policy and the license terms themselves. (openai.com)

For infrastructure buyers, that matters because the decision shifts from vendor access to operating economics. A team can choose its own cloud, a private cluster, or an air-gapped environment, and compare hardware cost, latency and compliance tradeoffs against a metered API model. (openai.com) That does not remove operating complexity, but it does widen the set of viable deployment paths. This is an inference drawn from the licensing and deployment materials.

### What exactly did OpenAI release?

OpenAI says the two models are gpt-oss-120b and gpt-oss-20b, both described as open-weight reasoning models. (openai.com) The company’s model pages describe the larger model as 117 billion parameters with 5.1 billion active parameters, and the smaller model as 21 billion parameters with 3.6 billion active parameters, reflecting a Mixture-of-Experts design in which only part of the network is active for a given token.

The context window is 128,000 tokens, according to OpenAI’s model card and product materials. (openai.com) OpenAI also says the models are text-only and built for strong instruction following, tool use and compatibility with its Responses API.

### How much hardware do these models actually need?

OpenAI’s published guidance makes the smaller model the more disruptive one for local use. The company says gpt-oss-20b fits in roughly 16GB of memory, while cookbook material says it was designed to run in more resource-constrained environments, including Google Colab. (developers.openai.com)

The larger model is aimed higher. OpenAI’s model page says gpt-oss-120b fits into a single Nvidia H100 GPU, which typically means about 80GB of memory, a configuration enterprises already use for production inference. (openai.com)

### Why does Mixture-of-Experts change the cost picture?

Mixture-of-Experts matters because total parameter count and active parameter count are not the same thing. OpenAI’s published figures show only a fraction of each model’s parameters are active during inference, which is one reason the company can pair large headline sizes with lower hardware requirements than dense models of similar scale. (github.com)

For operators, that can change the economics of self-hosting. Lower active compute and memory demands can make it easier to run a model on a single accelerator, place it in private infrastructure, or offer hosted access at lower cost. (developers.openai.com) That is an inference from OpenAI’s architecture and deployment claims rather than a quoted company statement about pricing.

### Where do developers go next?

OpenAI has published the release across its blog, GitHub repository, model card and help center, with downloads available through Hugging Face links referenced in those materials. (developers.openai.com) The GitHub repository also includes guides, examples and compatibility tooling for developers testing local inference and API-style integrations.

OpenAI’s current documentation points developers to gpt-oss-120b for higher-reasoning production use cases and to gpt-oss-20b for lower-latency or more resource-constrained deployments. The next concrete step is in those setup guides and model pages, where OpenAI has published configuration details, usage policy documents and download paths for both checkpoints. (openai.com) (developers.openai.com)
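
As a rough check on the parameter and memory figures quoted above, the arithmetic can be sketched in a few lines of Python. The total and active parameter counts come from the article. The bits-per-parameter value is an assumption introduced here for illustration only, not a figure from OpenAI's materials as cited in this piece, and the estimate covers weights alone, so it should be read as a back-of-the-envelope sketch rather than a sizing guide.

```python
# Figures quoted in the article, in billions of parameters.
MODELS = {
    "gpt-oss-120b": {"total_b": 117.0, "active_b": 5.1},
    "gpt-oss-20b": {"total_b": 21.0, "active_b": 3.6},
}

# ASSUMPTION (not from the article): average storage cost per weight, in bits.
# Change this to model a different storage format.
ASSUMED_BITS_PER_PARAM = 4.25


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters active per token in a Mixture-of-Experts model."""
    return active_b / total_b


def weight_memory_gb(total_b: float, bits_per_param: float) -> float:
    """Rough weight footprint in GB (10^9 bytes).

    Billions of parameters times bytes per parameter gives GB directly.
    Ignores KV cache, activations and runtime overhead.
    """
    return total_b * bits_per_param / 8


for name, p in MODELS.items():
    frac = active_fraction(p["total_b"], p["active_b"])
    mem = weight_memory_gb(p["total_b"], ASSUMED_BITS_PER_PARAM)
    print(f"{name}: {frac:.1%} active per token, ~{mem:.0f} GB of weights (assumed format)")
```

Under that assumed storage cost, both estimates land below the 80GB and 16GB figures OpenAI cites. At 16 bits per parameter the same formula gives 234 GB for the larger model, which suggests the storage format matters as much as the headline parameter count.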
