Monday, October 7, 2024

OpenAI Just Announced 4 New AI Features, and They’re Available Now

OpenAI announced a slew of updates to its API services at a developer day event today in San Francisco. The updates let developers further customize models, build new speech-based applications, pay less for repetitive prompts, and get better performance out of smaller models. OpenAI announced four major API updates during the event: model distillation, prompt caching, vision fine-tuning, and the introduction of a new API service called Realtime. For the uninitiated, an API (application programming interface) enables software developers to integrate features from an external application into their own product.

Model Distillation

The company introduced model distillation, a new way to enhance the capabilities of smaller models like GPT-4o mini by fine-tuning them on the outputs of larger models. In a blog post, the company said that “until now, distillation has been a multi-step, error-prone process, which required developers to manually orchestrate multiple operations across disconnected tools, from generating datasets to fine-tuning models and measuring performance improvements.” To make the process more efficient, OpenAI built a model distillation suite into its API platform. The platform lets developers build their own datasets by using advanced models like GPT-4o and o1-preview to generate high-quality responses, fine-tune a smaller model to follow those responses, and then create and run custom evaluations to measure how the model performs at specific tasks.

OpenAI says it will offer 2 million free training tokens per day on GPT-4o mini and 1 million free training tokens per day on GPT-4o until October 31 to help developers get started with distillation. (Tokens are chunks of data that AI models process in order to understand requests.) The cost of training and running a distilled model is the same as OpenAI’s standard fine-tuning prices.

Prompt Caching

OpenAI has been laser-focused on driving down the price of its API services, and it has taken another step in that direction with prompt caching, a new feature that lets developers reuse commonly occurring prompts without paying full price every time. Many applications that use OpenAI’s models include lengthy prefixes in front of prompts that detail how the model should act when completing a specific task, like directing the model to respond to all requests in a chipper tone or to always format responses as bullet points. Longer prefixes typically improve the model’s output and help keep responses consistent, but they also increase the cost of each API call. Now, OpenAI says the API will automatically save, or “cache,” lengthy prefixes for up to an hour. If the API detects a new prompt with the same prefix, it will automatically apply a 50 percent discount to the input cost. For developers of AI applications with very focused use cases, the new feature could save a significant amount of money. OpenAI rival Anthropic introduced prompt caching to its own family of models in August.

Vision Fine-Tuning

Developers will now be able to fine-tune GPT-4o with images in addition to text, which OpenAI says will enhance the model’s ability to understand and recognize images, enabling “applications like enhanced visual search functionality, improved object detection for autonomous vehicles or smart cities, and more accurate medical image analysis.” By uploading a dataset of labeled images to OpenAI’s platform, developers can hone the model’s performance at understanding images.
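For developers wondering what that workflow looks like in practice, the sketch below shows one plausible way to prepare a small labeled-image dataset and start a vision fine-tuning job with OpenAI’s Python SDK. The prompts, image URLs, and model snapshot name are illustrative placeholders rather than details from OpenAI’s announcement, so check the official documentation before relying on them.

```python
# Sketch: preparing and submitting a vision fine-tuning job with the OpenAI
# Python SDK. The prompts, image URLs, and model snapshot below are assumed
# placeholders, not details from OpenAI's announcement.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Each training example pairs an image (plus an optional text prompt) with the
# labeled answer the fine-tuned model should learn to produce.
examples = [
    {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What UI component is shown in this screenshot?"},
                    {"type": "image_url", "image_url": {"url": "https://example.com/screens/nav-bar.png"}},
                ],
            },
            {"role": "assistant", "content": "A sticky top navigation bar with a centered logo."},
        ]
    },
    # ... more labeled examples ...
]

# Write the dataset as JSONL, one example per line.
with open("vision_training.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Upload the dataset and start the fine-tuning job.
training_file = client.files.create(file=open("vision_training.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",  # assumed snapshot name; see OpenAI's docs for current options
)
print(job.id, job.status)
```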
OpenAI says that Coframe, a startup building an AI-powered growth engineering assistant, has used vision fine-tuning to improve the assistant’s ability to generate code for websites. By giving GPT-4o hundreds of images of websites and the code used to create them, “they improved the model’s ability to generate websites with consistent visual style and correct layout by 26% compared to base GPT-4o.” To get developers started, OpenAI will give out 1 million free training tokens every day during the month of October. From November on, fine-tuning GPT-4o with images will cost $25 per 1 million tokens.

Realtime

Last week, OpenAI made its human-sounding advanced voice mode available to all ChatGPT subscribers. Now, the company is letting developers build speech-to-speech applications using its technology. Previously, a developer who wanted to create an AI-powered application that could speak to users first needed to transcribe the audio, pass the text to a language model like GPT-4 for processing, and then send the output to a text-to-speech model. OpenAI says this approach “often resulted in loss of emotion, emphasis, and accents, plus noticeable latency.” With the Realtime API, audio is processed directly by the API without the need to chain multiple applications together, making it much faster, cheaper, and more responsive. The API also supports function calling, meaning applications powered by it will be able to take actions, like ordering a pizza or making an appointment. Realtime will eventually be updated to handle multimodal experiences of all kinds, including video.

For text, the API will cost $5 per 1 million input tokens and $20 per 1 million output tokens. For audio, it will charge $100 per 1 million input tokens and $200 per 1 million output tokens. OpenAI says this equates to “approximately $0.06 per minute of audio input and $0.24 per minute of audio output.”
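For the curious, here is a rough sketch of what opening a Realtime session and requesting a spoken response might look like from Python. The WebSocket endpoint, model name, headers, and event shapes are assumptions drawn from OpenAI’s developer documentation at launch rather than details spelled out in the announcement itself, so treat this as a starting point, not a definitive recipe.

```python
# Sketch: opening a Realtime API session over WebSocket and asking for a spoken
# reply. Endpoint, model name, headers, and event shapes are assumptions based
# on OpenAI's Realtime docs at launch, not details from this article.
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main():
    # Note: newer releases of the websockets package name this argument
    # "additional_headers" instead of "extra_headers".
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # Ask the model to respond with both audio and a text transcript.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["audio", "text"],
                "instructions": "Greet the caller and ask how you can help.",
            },
        }))
        # Stream server events; audio arrives as base64-encoded chunks in
        # delta events until the response is marked done.
        async for message in ws:
            event = json.loads(message)
            print(event["type"])
            if event["type"] == "response.done":
                break

asyncio.run(main())
```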
