Ověření schopností LLM generovat použitelné testy software

Velc, Matyáš Josef

dc.contributor.advisor	Lipka Richard, Ing. Ph.D.	cs
dc.contributor.author	Velc, Matyáš Josef	cs
dc.contributor.referee	Herout Pavel, doc. Ing. Ph.D.	cs
dc.date.accepted	2025-06-10
dc.date.accessioned	2026-02-21T00:18:58Z
dc.date.available	2024-09-30
dc.date.available	2026-02-21T00:18:58Z
dc.date.issued	2025-05-05
dc.date.submitted	2025-05-05
dc.description.abstract	Tato bakalářská práce zkoumá schopnosti různých velkých jazykových modelů (LLM) generovat automatizované testy pro webové aplikace. Práce navazuje na předchozí výzkum v oblasti generování testů a rozšiřuje ho o systematické porovnání sedmi modelů od předních poskytovatelů (Google, OpenAI, Anthropic a Mistral AI). Experimentálně jsem implementoval systém pro automatizované generování, spouštění a vyhodnocování testů v prostředí Robot Framework včetně schopnosti automatické opravy chybných testů a měření pokrytí kódu. Na základě experimentů s testováním webové aplikace TbUIS jsou analyzovány schopnosti jednotlivých modelů z hlediska úspěšnosti generovaných testů, jejich schopnosti detekovat chyby, časové náročnosti generování, schopnosti automatických oprav a pokrytí kódu. Výsledky ukazují výrazné rozdíly mezi modely, přičemž nejvyšší úspěšnost dosáhl Claude 3.7 Sonnet (91,7%), následovaný modely Gemini Pro 2.5 (79,2%) a Claude 3 Opus (75,0%). Práce přináší empiricky podložená doporučení pro využití různých LLM v procesu testování softwaru.	cs
dc.description.abstract-translated	This bachelor thesis investigates the capabilities of various large language models (LLMs) to generate automated tests for web applications. The thesis builds on previous research in test generation and extends it with a systematic comparison of seven models from leading providers (Google, OpenAI, Anthropic, and Mistral AI). I experimentally implemented a system for automated generation, execution, and evaluation of tests in the Robot Framework environment, including the ability to automatically repair failed tests and measure code coverage. Based on experiments with testing the TbUIS web application, the capabilities of individual models are analyzed in terms of success rate of generated tests, their ability to detect errors, time requirements for generation, and automatic repair capabilities. The results show significant differences between models, with Claude 3.7 Sonnet achieving the highest success rate (91.7%), followed by Gemini Pro 2.5 (79.2%) and Claude 3 Opus (75.0%). The thesis provides empirically based recommendations for using various LLMs in the software testing process.	en
dc.description.department	Katedra informatiky a výpočetní techniky	cs
dc.description.result	Obhájeno	cs
dc.format	53
dc.identifier	100586
dc.identifier.uri	http://hdl.handle.net/11025/66472
dc.language.iso	cs
dc.publisher	Západočeská univerzita v Plzni	cs
dc.rights	Plný text práce je přístupný bez omezení	cs
dc.rights.access	openAccess	cs
dc.subject	Robot Framework	cs
dc.subject	velký jazykový model	cs
dc.subject	generování testů	cs
dc.subject	automatizované testování	cs
dc.subject.translated	Robot Framework	en
dc.subject.translated	large language model	en
dc.subject.translated	test generation	en
dc.subject.translated	automated testing	en
dc.thesis.degree-grantor	Západočeská univerzita v Plzni. Fakulta aplikovaných věd	cs
dc.thesis.degree-level	Bakalářský	cs
dc.thesis.degree-name	Bc.	cs
dc.thesis.degree-program	Informatika a výpočetní technika	cs
dc.title	Ověření schopností LLM generovat použitelné testy software	cs
dc.title.alternative	Analysis of the LLM's ability to generate useful software tests	en
dc.type	bakalářská práce	cs
local.files.count	6	*
local.files.size	949742474	*
local.has.files	yes	*
local.relation.IS	https://portal.zcu.cz/StagPortletsJSR168/CleanUrl?urlid=prohlizeni-prace-detail&praceIdno=100586