Konferenzprogramm

Automated testing of Large Language Models: A Live Demo

How do you test a large language model's shape-shifting outputs using automated tests? 

In this session, I will demonstrate the inner workings of a Retrieval Augmented Generation (RAG) model, and how you can qualify it using automated tests. Spoiler alert: testing an LLM's non-deterministic outputs will involve using another LLM to judge this output and score it. 

Once we run the tests, we will examine the stack traces to debug an unexpected result. We will then subject the RAG model to an indirect injection attack. Join me in this live demonstration, where we pit one LLM against another, and also expose a security flaw.

Target Audience: Testers, Test Automation Engineers, Developers, Product Managers
Prerequisites: Basic knowledge of LLMs
Level: Basic

Extended Abstract:

The talk starts with an outline of how LLMs and RAG models work. I will explain how context-window size limits LLM performance, which creates the need for RAG models. I will then explain how 

  • Datasources are chunked and stored as embeddings in a vector database
  • When a user query is submitted, the most relevant chunks are retrieved from the database
  • The LLM then generates the response by using the retrieved chunks as the context. 

With this background, I can start the demonstration. I will step through an existing RAG model, explaining what is happening in the background. I then submit a user query to the RAG model and examine its response. Once this response is generated, we will submit it to an automated evaluation using the RAG assessment framework, Ragas. 

We will then peel back the layers of this framework to see how the evaluation is performed. With the help of stack traces, I will explain one or two of the metrics in detail, demonstrating how the results were arrived at, and how we can debug unexpected results. 

I will finally demonstrate a common security vulnerability in RAG models - an indirect injection attack. This type of attack happens if the retrieved context contains instructions that override the RAG model's original instructions to produce unexpected results. I will then discuss how we can mitigate this vulnerability.

AnuKrit
Test Solution Architect

Anupam ist ein freiberuflicher Test Solution Architect. Er hat schon in vielen Bereichen gearbeitet: Raumfahrt, Softwareentwicklung, Strategieberatung und Prozessautomatisierung. Als Manager, der Softwareentwickler wurde, arbeitet er ständig daran, die Grenzen zwischen diesen beiden Welten aufzulösen.

Anupam setzt sich dafür ein, die Qualität von Software immer weiter zu verbessern - indem er Prozesse vereinfacht und Verschwendung reduziert. Wenn er Teams dabei nicht unterstützt, spielt er gern Online-Schach, läuft lange Strecken mit einem History-Podcast im Ohr oder liest ein Buch in einer ruhigen Ecke.

Anupam Krishnamurthy
11:20 - 11:55
Vortrag: Do 1.2

Vortrag Teilen