Amazon Bedrock Intelligent Prompt Routing now generally available
Investing.com -- Amazon (NASDAQ: AMZN) has announced the general availability of Bedrock Intelligent Prompt Routing. The tool, previewed in December, offers a single serverless endpoint that routes requests between different foundation models within the same model family. It dynamically predicts the response quality each candidate model would deliver for a given request, then directs the request to the most suitable model based on cost and response quality.
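For developers, invoking a prompt router looks much like invoking a single model. The sketch below, a minimal example using the AWS SDK for Python (boto3), passes a prompt router ARN in place of a model ID to the Bedrock Converse API; the router ARN, account ID, and region shown are hypothetical placeholders, not values from this announcement.

```python
import boto3

# A prompt router is called through the regular Bedrock runtime,
# with the router's ARN supplied where a model ID would normally go.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Hypothetical placeholder ARN; the actual router name, account ID,
# and region depend on your own setup.
ROUTER_ARN = (
    "arn:aws:bedrock:us-east-1:123456789012:"
    "default-prompt-router/my-router"
)

response = client.converse(
    modelId=ROUTER_ARN,
    messages=[
        {
            "role": "user",
            "content": [{"text": "Summarize this paragraph in one sentence."}],
        }
    ],
)

# Print the reply; if the response carries routing trace information,
# it indicates which underlying model the router actually selected.
print(response["output"]["message"]["content"][0]["text"])
print(response.get("trace", {}))
```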
Over the past few months, Amazon has made several enhancements to intelligent prompt routing, driven by customer feedback and extensive internal testing. The aim is to enable automated, optimal routing between large language models (LLMs) through Amazon Bedrock Intelligent Prompt Routing. The tool builds on a deep understanding of model behaviors within each model family and incorporates state-of-the-art methods for training routers across different sets of models, tasks, and prompts.
Users can now either use Amazon Bedrock Intelligent Prompt Routing with the default prompt routers provided by Amazon Bedrock or configure their own prompt routers, which allows performance to be tuned linearly between that of the two candidate LLMs. Amazon Bedrock provides a default prompt router for each supported model family. These routers come with predefined settings and are designed to work out of the box with specific foundation models, offering a ready-to-use option with no routing settings to configure.
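As an illustration of the configurable option, the following sketch defines a custom router over two candidate models using the Bedrock CreatePromptRouter API. The model ARNs are placeholders (substitute real model or inference profile ARNs), and the expected range of the response-quality threshold is an assumption that should be checked against the API reference.

```python
import boto3

# Routers are managed through the Bedrock control-plane client,
# not the bedrock-runtime client used for inference.
bedrock = boto3.client("bedrock", region_name="us-east-1")

# Placeholder ARNs for two candidate models from the same family.
SMALL_MODEL = (
    "arn:aws:bedrock:us-east-1::foundation-model/"
    "anthropic.claude-3-5-haiku-20241022-v1:0"
)
LARGE_MODEL = (
    "arn:aws:bedrock:us-east-1::foundation-model/"
    "anthropic.claude-3-5-sonnet-20241022-v2:0"
)

response = bedrock.create_prompt_router(
    promptRouterName="my-example-router",
    description="Routes between a small and a large Claude model.",
    models=[{"modelArn": SMALL_MODEL}, {"modelArn": LARGE_MODEL}],
    fallbackModel={"modelArn": LARGE_MODEL},
    # The threshold controls how much predicted quality difference is
    # tolerated before escalating to the larger model; the value range
    # assumed here should be verified in the API reference.
    routingCriteria={"responseQualityDifference": 0.5},
)

print(response["promptRouterArn"])
```

A router created this way would then be invoked like the default routers above, by passing its ARN to the Converse API.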
Amazon Bedrock Intelligent Prompt Routing now supports more models from the Amazon Nova, Anthropic Claude, and Meta (NASDAQ: META) Llama model families. Users can also define their own routing configurations tailored to specific needs and preferences.
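To see which routers are available per family in a given region, a sketch like the one below could query the ListPromptRouters API; the region and the response-field handling are assumptions based on the public API shape.

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# List the default routers Bedrock provides; pass type="custom"
# instead to see routers you have created yourself.
response = bedrock.list_prompt_routers(type="default")

for router in response.get("promptRouterSummaries", []):
    print(router.get("promptRouterName"), "->", router.get("promptRouterArn"))
```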
Amazon has also reduced the latency overhead added by the routing layer by over 20%, to approximately 85 ms. Because the router preferentially invokes the less expensive model while maintaining the same baseline accuracy on the task, users can expect an overall latency and cost benefit compared with always using the larger, more expensive model, despite this added overhead.