Like college students who skip assigned reading, generative AI chatbots have a reputation for confidently spouting wrong answers. Amazon Web Services (AWS) is looking to curb that habit with a new tool that will force the AI to show its work.
The tech giant’s cloud arm is rolling out a new option called a “contextual grounding check” that will compel large language models (LLMs) to back up their output with a reference text. Enterprise AI users can set the confidence level of accuracy they demand, and Amazon claims the tool can filter out as much as 75% of hallucinations on retrieval-augmented generation (RAG) and summarization tasks.
The tool joins other customizable guardrails already in place on Amazon’s Bedrock generative AI platform that allow users to filter out objectionable content, such as offensive words, personally identifiable information, or simply irrelevant topics. AWS also announced that these guardrails, first made widely available in April, will now be offered as a standalone API.
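For developers, the standalone path could look roughly like the sketch below, which uses the AWS SDK for Python (boto3) to run an existing guardrail against arbitrary text; the client name, method, and field shapes shown are assumptions for illustration, not details confirmed in Amazon’s announcement.

```python
import boto3

# Bedrock runtime client (region is illustrative)
runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Apply an existing guardrail to arbitrary text, e.g. output from a
# self-hosted or third-party model, without invoking a Bedrock model.
# Identifiers and values below are placeholders.
result = runtime.apply_guardrail(
    guardrailIdentifier="example-guardrail-id",
    guardrailVersion="1",
    source="OUTPUT",  # screen model output (use "INPUT" for user prompts)
    content=[{"text": {"text": "Candidate answer to check goes here."}}],
)

# "GUARDRAIL_INTERVENED" indicates the content tripped one of the filters.
print(result["action"])
```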
Confidence boost: The trustworthiness of AI generation continues to be an obstacle as companies scramble to develop LLM tools for everything from customer service to summarization. Safeguarding tools like these aim to help set AWS’s platforms apart as a safer place, especially for companies in highly regulated industries like banking or healthcare, according to AWS VP of AI Products Matt Wood.
“Now we can protect against erroneous, confidently wrong answers that the model might accidentally generate,” Wood told Tech Brew at a New York event this week. “We have seen from customers in regulated industries and many others, that as they move these systems to production, guardrails are just a ‘do not pass go, do not collect $200’ kind of capability.”
Users employing a contextual grounding check will have two knobs to turn. One is the threshold of confidence that the LLM’s output is factually grounded in a reference text and contains no information beyond it. The other is the threshold of confidence that the output is relevant to the user’s query.
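A minimal sketch of what turning those two knobs might look like when creating a guardrail with boto3 follows; the parameter names and threshold values are assumptions for illustration, not details drawn from AWS’s announcement.

```python
import boto3

# Bedrock control-plane client (region is illustrative)
bedrock = boto3.client("bedrock", region_name="us-east-1")

# Create a guardrail with a contextual grounding check.
# The two thresholds correspond to the "knobs" described above:
#   GROUNDING - how confident the check must be that the answer is
#               supported by the supplied reference text
#   RELEVANCE - how confident the check must be that the answer is
#               relevant to the user's query
response = bedrock.create_guardrail(
    name="rag-grounding-guardrail",  # placeholder name
    contextualGroundingPolicyConfig={
        "filtersConfig": [
            {"type": "GROUNDING", "threshold": 0.85},
            {"type": "RELEVANCE", "threshold": 0.50},
        ]
    },
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't provide a grounded answer to that.",
)

print(response["guardrailId"], response["version"])
```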
Custom risk: AWS Responsible AI Lead Diya Wynn said the goal is to let organizations tailor their strictness depending on their needs, whether it’s a classroom setting guarding against inappropriate material or a financial institution avoiding sensitive investment information. But ultimately, there also needs to be a baseline trust in the AI’s accuracy.
“No one’s going to want to use the product…if they feel like, every time I ask a question…I can’t actually confirm or feel comfortable that I’m getting something that’s real,” Wynn said.