OpenAI launches experimental GPT-4o Long Output model with 16X token capacity

Nadia24x7Official, July 31, 2024



OpenAI is reportedly eyeing a cash crunch, but that isn’t stopping the preeminent generative AI company from continuing to release a steady stream of new models and updates.

Yesterday, the company quietly posted a webpage announcing a new large language model (LLM): GPT-4o Long Output, which is a variation on its signature GPT-4o model from May, but with a massively extended output size: up to 64,000 tokens of output instead of GPT-4o’s initial 4,000 — a 16-fold increase.

Tokens, as you may recall, are the numerical representations of words, word fragments, punctuation, and other character sequences that an LLM processes behind the scenes.

The word “Hello” is one token, for example, but so too is “hi.” You can see an interactive demo of tokens in action via OpenAI’s Tokenizer. Machine learning researcher Simon Willison also has a great interactive token encoder/decoder.

By offering a 16X increase in token outputs with the new GPT-4o Long Output variant, OpenAI is now giving users — and more specifically, third-party developers building atop its application programming interface (API) — the opportunity to have the chatbot return far longer responses, up to about a 200-page novel in length.


Why is OpenAI launching a longer output model?

OpenAI’s decision to introduce this extended output capability stems from customer feedback indicating a need for longer output contexts.

An OpenAI spokesperson explained to VentureBeat: “We heard feedback from our customers that they’d like a longer output context. We are always testing new ways we can best serve our customers’ needs.”

The alpha testing phase is expected to last for a few weeks, allowing OpenAI to gather data on how effectively the extended output meets user needs.

This enhanced capability is particularly advantageous for applications requiring detailed and extensive output, such as code editing and writing improvement.

By offering more extended outputs, the GPT-4o model can provide more comprehensive and nuanced responses, which can significantly benefit these use cases.

Distinction between context and output

Since launch, GPT-4o has offered a maximum context window of 128,000 tokens: the number of tokens the model can handle in any one interaction, including both input and output tokens.

For GPT-4o Long Output, this maximum context window remains at 128,000.

So how is OpenAI able to increase the number of output tokens 16-fold from 4,000 to 64,000 tokens while keeping the overall context window at 128,000?

It all comes down to some simple math: even though the original GPT-4o from May had a total context window of 128,000 tokens, its single output message was limited to 4,000.

Similarly, for the new GPT-4o mini model, the total context is 128,000 but the maximum output has been raised to 16,000 tokens.

That means for GPT-4o, the user can provide up to 124,000 tokens as an input and receive up to 4,000 maximum output from the model in a single interaction. They can also provide more tokens as input but receive fewer as output, while still adding up to 128,000 total tokens.

For GPT-4o mini, the user can provide up to 112,000 tokens as an input in order to get a maximum output of 16,000 tokens back.

For GPT-4o Long Output, the total context window is still capped at 128,000. Now, however, the user can provide up to 64,000 tokens’ worth of input in exchange for a maximum of 64,000 tokens back out, provided the user or developer of an application built atop it wants to prioritize longer LLM responses while limiting the inputs.

In all cases, the user or developer must make a choice or trade-off: do they want to sacrifice some input tokens in favor of longer outputs while still remaining at 128,000 tokens total? For users who want longer answers, the GPT-4o Long Output now offers this as an option.
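The trade-off described above can be sketched in a few lines of Python. This is only an illustration using the round figures quoted in this article; the model keys and the `max_input_tokens` helper are hypothetical names chosen for the sketch, not OpenAI API identifiers.

```python
# Figures are the round numbers quoted in the article; the model keys are
# hypothetical labels for this sketch, not confirmed API identifiers.
CONTEXT_WINDOW = 128_000

MAX_OUTPUT_TOKENS = {
    "gpt-4o": 4_000,
    "gpt-4o-mini": 16_000,
    "gpt-4o-long-output": 64_000,
}


def max_input_tokens(model: str, desired_output: int) -> int:
    """Return the input budget left after reserving `desired_output` tokens."""
    cap = MAX_OUTPUT_TOKENS[model]
    if not 0 <= desired_output <= cap:
        raise ValueError(f"{model} allows at most {cap} output tokens")
    # Input and output share one window, so the input gets whatever remains.
    return CONTEXT_WINDOW - desired_output


# Show the input budget for each model when the maximum output is reserved.
for name, cap in MAX_OUTPUT_TOKENS.items():
    print(f"{name}: {max_input_tokens(name, cap):,} input + {cap:,} output")
```

Reserving less than the maximum output frees the difference for input, which is why the choice is a budget split rather than a fixed setting.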

Priced aggressively and affordably

The new GPT-4o Long Output model is priced as follows:

$6 USD per 1 million input tokens
$18 per 1 million output tokens

Compare that to the regular GPT-4o pricing, which is $5 per million input tokens and $15 per million output tokens, or even the new GPT-4o mini at $0.15 per million input and $0.60 per million output, and you can see it is priced rather aggressively: only 20% above regular GPT-4o on both input and output, continuing OpenAI’s recent refrain that it wants to make powerful AI affordable and accessible to wide swaths of the developer userbase.
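To make the comparison concrete, here is a rough cost estimator built on the per-million-token prices quoted in this article. The model keys and the `request_cost` function are hypothetical names for this sketch, and actual billing may differ from this simple arithmetic.

```python
from decimal import Decimal

# USD per 1 million tokens as (input, output), taken from the article.
# The model keys are hypothetical labels, not confirmed API identifiers.
PRICE_PER_MILLION = {
    "gpt-4o": (Decimal("5"), Decimal("15")),
    "gpt-4o-mini": (Decimal("0.15"), Decimal("0.60")),
    "gpt-4o-long-output": (Decimal("6"), Decimal("18")),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request at the listed prices."""
    in_price, out_price = PRICE_PER_MILLION[model]
    total = in_price * input_tokens + out_price * output_tokens
    return total / Decimal(1_000_000)


# A request that fills the Long Output window evenly: 64,000 in, 64,000 out.
print(request_cost("gpt-4o-long-output", 64_000, 64_000))
```

Under these prices, a request that uses the full 64,000 input and 64,000 output tokens on GPT-4o Long Output would cost about $1.54.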

Currently, access to this experimental model is limited to a small group of trusted partners. The spokesperson added, “We’re conducting alpha testing for a few weeks with a small number of trusted partners to see if longer outputs help their use cases.”

Depending on the outcomes of this testing phase, OpenAI may consider expanding access to a broader customer base.

Future prospects

The ongoing alpha test will provide valuable insights into the practical applications and potential benefits of the extended output model.

If the feedback from the initial group of partners is positive, OpenAI may consider making this capability more widely available, enabling a broader range of users to benefit from the enhanced output capabilities.

Clearly, with the GPT-4o Long Output model, OpenAI hopes to address an even wider range of customer requests and power applications requiring detailed responses.