6 Key Insights into Amazon Bedrock's New Prompt Optimization and Migration Tool

Amazon Bedrock launches Advanced Prompt Optimization: auto-tune prompts, compare 5 models, support multimodal inputs, and get cost/latency insights.

Xtcworld · 2026-05-16 16:23:54 · Cloud Computing

Amazon Bedrock has just unveiled a powerful new tool designed to supercharge the way you refine prompts and migrate between large language models. The Advanced Prompt Optimization feature takes the guesswork out of prompt engineering by automating the optimization process, allowing you to compare original and improved prompts across up to five different models at once. Whether you're moving to a new model or simply seeking better performance from your current setup, this tool promises to save time, reduce regressions, and boost accuracy. Below, we break down the six essential things you need to know about this game-changing addition.

1. What Is Amazon Bedrock Advanced Prompt Optimization?

At its core, this new tool is an automated prompt engineer that operates within the Amazon Bedrock console. It takes your existing prompt templates, sample user inputs, ground truth answers, and an evaluation metric, then iteratively refines the prompt to maximize performance on that metric. The result? A version of your prompt that is measurably better at generating accurate, relevant responses. The tool works in a feedback loop—adjusting the prompt, testing the output, and repeating until it converges on the best possible formulation. This means you no longer have to manually tweak prompts countless times; the optimization engine does the heavy lifting for you.
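The feedback loop described above can be sketched in plain Python. Everything here is illustrative: `propose_revision`, `run_inference`, and `score` are hypothetical stand-ins for the work Bedrock does internally, not real API calls.

```python
# Illustrative sketch of the metric-driven feedback loop: adjust the prompt,
# test the output, repeat until the metric stops improving. All callables
# here are hypothetical stand-ins, not Bedrock APIs.

def optimize_prompt(prompt, samples, score, propose_revision, run_inference,
                    max_iters=10):
    """Iteratively refine `prompt` to maximize the average metric score."""
    def evaluate(p):
        # Average the metric over every evaluation sample.
        return sum(score(run_inference(p, s["input"]), s["ground_truth"])
                   for s in samples) / len(samples)

    best, best_score = prompt, evaluate(prompt)
    for _ in range(max_iters):
        candidate = propose_revision(best, best_score)  # e.g. an LLM rewrite
        candidate_score = evaluate(candidate)
        if candidate_score <= best_score:               # converged: no gain
            break
        best, best_score = candidate, candidate_score
    return best, best_score
```

The key idea is that the loop is driven entirely by the evaluation metric: any revision that does not improve the average score on your samples is discarded.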

Source: aws.amazon.com

2. Compare Up to Five Models Simultaneously

One of the standout features is the ability to run optimization and evaluation across up to five different inference models concurrently. If you are migrating from one model to another, you can designate your current model as a baseline and then test up to four other models. This allows you to see side-by-side performance comparisons, including evaluation scores, cost estimates, and latency figures. Even if you are not changing models, you can select just your current model to see before-and-after optimization results. This parallel testing ensures you have all the data you need to make an informed decision without running multiple separate experiments.

3. What Inputs Does the Optimizer Require?

To get started, you need to prepare your prompt templates in a specific JSONL format. Each JSON object must be on a single line and include a version identifier (bedrock-2026-05-14), a template ID, the prompt template itself, and an array of evaluation samples. Each sample contains input variables (with their corresponding values) and a ground truth answer. You also need to specify an evaluation method: either a custom metric label paired with a Lambda function or an LLM-as-a-judge configuration, or simply a short natural language description that guides the optimization. The tool uses these inputs to drive its metric-driven feedback loop.
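To make the shape concrete, here is a sketch of one JSONL record built in Python. The version string comes from the article; the key names (`templateId`, `promptTemplate`, `evaluationSamples`, and so on) are assumptions for illustration, not a published schema, so check the Bedrock documentation for the exact field names.

```python
import json

# One illustrative prompt-template record. Key names are assumptions;
# only the version identifier is taken from the announcement.
record = {
    "version": "bedrock-2026-05-14",          # required version identifier
    "templateId": "invoice-extractor-v1",     # hypothetical template ID
    "promptTemplate": "Extract the total amount from this invoice: {{invoice_text}}",
    "evaluationSamples": [
        {
            "inputVariables": {"invoice_text": "Total due: $412.50"},
            "groundTruth": "$412.50",
        },
    ],
}

# JSONL requires the whole object on a single line.
line = json.dumps(record)
assert "\n" not in line
```

Writing the records programmatically like this avoids the most common JSONL mistake: pretty-printed objects that span multiple lines.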

4. Multimodal Inputs Are Supported

Prompt optimization isn't limited to text alone. The tool supports multimodal user inputs, including PNG, JPG, and PDF files. This means you can optimize prompts for tasks like document analysis, image interpretation, or any other scenario where visual data is part of the input. For example, if your workflow involves extracting information from invoices or analyzing medical images, the optimizer can handle those file types seamlessly. The prompt templates themselves remain text-based, but the evaluation samples can include references to these multimodal files, allowing the system to assess how well the prompt performs when given visual context.
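An evaluation sample with a multimodal input might look like the sketch below. The file-reference shape (`file`, `mediaType`) is purely illustrative, since the article does not spell out the schema; only the supported file types (PNG, JPG, PDF) come from the announcement.

```python
import json

# Hypothetical evaluation sample referencing a PDF input. The field names
# for the file reference are assumptions, not a documented schema.
sample = {
    "inputVariables": {
        "document": {
            "file": "s3://my-bucket/invoices/inv-0042.pdf",
            "mediaType": "application/pdf",   # PNG, JPG, and PDF are supported
        },
    },
    "groundTruth": "Invoice total: $1,280.00",
}

line = json.dumps(sample)
```

The prompt template itself stays text-based; only the sample's input carries the pointer to the visual document.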

Source: aws.amazon.com

5. Custom Evaluation Methods Offer Flexibility

You are not locked into a single evaluation approach. Amazon Bedrock provides three ways to define how the optimizer judges prompt quality:

  • AWS Lambda function – Write a custom scoring function that runs serverless and can implement any logic you need.
  • LLM-as-a-judge rubric – Use another language model to rate responses against a set of criteria you provide in a custom prompt.
  • Natural language description – Simply describe what a good response looks like, and the optimizer will interpret that as a guideline.

This flexibility lets you tailor the evaluation to your specific domain, whether you need strict factual accuracy, creative fluency, or adherence to brand voice.
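For the Lambda option, a minimal scoring function might look like this. The event shape (`modelOutput` and `groundTruth` keys) and the returned `score` field are assumptions for illustration; consult the Bedrock documentation for the actual contract your function receives.

```python
# Minimal sketch of a custom scoring Lambda: exact-match scoring, but any
# logic works (regex checks, numeric tolerance, brand-voice heuristics...).
# The event keys and return shape are hypothetical, not a documented contract.

def lambda_handler(event, context):
    """Return a score in [0, 1] comparing the model output to ground truth."""
    output = event.get("modelOutput", "").strip().lower()
    truth = event.get("groundTruth", "").strip().lower()
    return {"score": 1.0 if output == truth else 0.0}
```

Because the function runs serverless, you can pull in domain-specific checks (say, validating an extracted invoice total parses as currency) without touching the optimizer itself.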

6. Outputs: Scores, Cost, and Latency

After optimization completes, you receive a comprehensive report for each model tested. This includes evaluation scores for both the original and optimized prompts, estimated costs per inference, and latency measurements. You can see exactly how much better the optimized prompt performs, and at what additional (or reduced) cost. These metrics are presented in a clear, comparative format, making it easy to decide which model-prompt combination to deploy. The final optimized prompt template is also provided, ready for use in production. This transparency helps you balance performance against budget and speed constraints.
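One way to act on that report is to treat deployment as a constrained choice: maximize score subject to cost and latency budgets. The sketch below uses made-up numbers and field names purely to illustrate the trade-off; the actual report format may differ.

```python
# Toy decision helper over per-model results like those in the report.
# Model names, numbers, and field names are invented for illustration.
results = [
    {"model": "model-a", "score": 0.91, "cost_per_1k": 0.008, "p50_latency_ms": 620},
    {"model": "model-b", "score": 0.88, "cost_per_1k": 0.003, "p50_latency_ms": 410},
    {"model": "model-c", "score": 0.93, "cost_per_1k": 0.015, "p50_latency_ms": 980},
]

def pick(results, max_cost, max_latency_ms):
    """Best-scoring model within the cost and latency budgets, or None."""
    eligible = [r for r in results
                if r["cost_per_1k"] <= max_cost
                and r["p50_latency_ms"] <= max_latency_ms]
    return max(eligible, key=lambda r: r["score"]) if eligible else None

choice = pick(results, max_cost=0.01, max_latency_ms=800)  # → model-a
```

Here model-c scores highest but blows the cost budget, so the helper settles on model-a: exactly the kind of performance-versus-budget call the report is meant to support.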

Conclusion

Amazon Bedrock's Advanced Prompt Optimization is a smart addition for any team working with large language models. By automating the tedious work of prompt tuning, enabling multi-model comparisons, and supporting multimodal inputs with flexible evaluation methods, it simplifies the journey to better AI performance. Whether you are migrating to a newer model or just want to squeeze more accuracy out of your current one, this tool provides the metrics and insights you need to move forward with confidence. Give it a try on the Bedrock console and see how much your prompts can improve.
