In a bid to compete with rival AI companies such as Google, OpenAI has introduced a new API option called Flex processing. It offers lower prices for AI model usage in exchange for slower response times and occasional resource unavailability.
Flex processing is currently in beta for OpenAI's newly launched o3 and o4-mini reasoning models. It is designed for lower-priority, "non-production" tasks such as model evaluations, data enrichment, and asynchronous workloads, according to OpenAI.
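Per OpenAI's API documentation, Flex is opted into on a per-request basis via the `service_tier` field of a request body. The sketch below builds such a payload without sending it; the `service_tier` field name follows OpenAI's published API, but the model choice and prompt are illustrative:

```python
import json

# Sketch of a Chat Completions request body opting into Flex processing.
# "service_tier": "flex" follows OpenAI's published API reference;
# the model name and prompt here are illustrative placeholders.
payload = {
    "model": "o3",
    "messages": [
        {"role": "user", "content": "Classify the sentiment of: 'Great product!'"}
    ],
    # Lower price, slower responses; requests may fail when capacity is tight,
    # so non-production callers should be prepared to retry.
    "service_tier": "flex",
}

print(json.dumps(payload, indent=2))
```

Because Flex requests can hit temporary resource unavailability, batch and asynchronous pipelines are the natural fit: they can simply retry later rather than block a user-facing response.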
Flex processing cuts API costs exactly in half. For the o3 model, Flex is priced at $5 per million input tokens (approximately 750,000 words) and $20 per million output tokens, compared to the standard rate of $10 per million input tokens and $40 per million output tokens. For the o4-mini model, Flex pricing drops to $0.55 per million input tokens and $2.20 per million output tokens, from the regular $1.10 per million input tokens and $4.40 per million output tokens.
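The savings are straightforward to work out from the quoted per-million-token rates. A minimal sketch, using the prices above and hypothetical job sizes:

```python
# Per-million-token rates in USD, (input, output), from the quoted pricing.
RATES = {
    "o3":      {"standard": (10.00, 40.00), "flex": (5.00, 20.00)},
    "o4-mini": {"standard": (1.10, 4.40),   "flex": (0.55, 2.20)},
}

def job_cost(model: str, tier: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a job of the given size at the quoted rates."""
    in_rate, out_rate = RATES[model][tier]
    return (input_tokens / 1_000_000) * in_rate + (output_tokens / 1_000_000) * out_rate

# Hypothetical batch job: 2M input tokens, 0.5M output tokens on o3.
standard = job_cost("o3", "standard", 2_000_000, 500_000)
flex = job_cost("o3", "flex", 2_000_000, 500_000)
print(standard, flex)  # prints "40.0 20.0" -- Flex is exactly half
```

Since both the input and output rates are halved, the 50% discount holds regardless of a job's input/output token mix.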
The introduction of Flex processing comes as the cost of frontier AI continues to rise and competitors launch more cost-efficient models. Google recently introduced Gemini 2.5 Flash, a reasoning model that matches or surpasses DeepSeek's R1 in performance while offering a lower cost per input token.
In an email announcing the Flex pricing launch, OpenAI also told customers that developers in tiers 1-3 of its usage hierarchy must complete a recently introduced ID verification process to gain access to the o3 model. Tiers are determined by the amount spent on OpenAI services. Reasoning summaries and streaming API support for o3 and certain other models are likewise gated behind verification.
OpenAI has previously communicated that the ID verification initiative aims to prevent misuse of its services by bad actors.