This appendix provides a detailed overview of the literature review process and the intermediate results for our study on LLM-based autonomous testing agents.
The data is organized into directories representing different stages of the review process.
Initial search:
- Contents:
- Search queries used for each digital library.
- Excel sheets containing all retrieved studies and decisions on inclusion/exclusion for the initial screening.
- Purpose:
Documents the first step of our review: identifying potential papers using systematic search queries.
Detailed (full-text) screening:
- Contents:
- Excel sheet listing all studies that passed the initial screening (43 studies).
- Decisions on whether each study was included after full-text screening.
- Purpose:
Tracks the detailed evaluation of each candidate paper to determine eligibility for the review.
Backward snowballing:
- Contents:
- Excel sheet listing all studies from the detailed screening (18 studies).
- Papers considered during backward snowballing (i.e., references of included papers), together with the inclusion/exclusion decisions.
- Purpose:
Ensures coverage of additional relevant studies that may not have appeared in the initial search.
Final set of included studies:
- Contents:
- Excel sheet listing all final included studies (21 studies).
- Purpose:
Provides a consolidated record of the studies that form the basis of our review and analysis.
The Excel sheet (Primary_Sources.xlsx) contains the detailed classification of all 21 studies included in our systematic literature review.
It provides a structured view of study metadata, agent characteristics, testing targets, and evaluation details.
This sheet makes our classification process fully transparent, enabling:
- Easy replication of the literature review.
- Filtering studies by agent type, autonomy, oracle, or other dimensions (see the sketch below).
- Reference for future meta-analyses or research synthesis.
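As an illustration of the filtering use case, the following sketch loads the sheet with pandas and selects studies by autonomy level and oracle type. It assumes the workbook's column headers and cell values match the documented names verbatim (e.g., "Level of autonomy", "Fully autonomous") and that pandas with an Excel engine such as openpyxl is available.

```python
import pandas as pd

# Load the classification sheet (column names assumed to match the table below).
studies = pd.read_excel("Primary_Sources.xlsx")

# Studies classified as fully autonomous.
autonomous = studies[studies["Level of autonomy"] == "Fully autonomous"]

# Studies whose oracle includes crash detection; the Oracle column may list
# several values, so a substring match is used here.
crash_oracle = studies[
    studies["Oracle"].str.contains("Simple crash detection", na=False)
]

print(autonomous[["Paper Nr.", "Title", "Year"]])
print(len(crash_oracle), "studies rely on crash detection")
```

The table below describes each column of the sheet.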
| Column | Description |
|---|---|
| Paper Nr. | Sequential number assigned to each paper. |
| Paper Link | Direct link to the paper. |
| Title | Full title of the paper. |
| Year | Year of publication. |
| Publication Venue | Conference, journal, or workshop name. |
| Publication Type | Type of publication (conference, journal, workshop, preprint). |
| Application domain | Domain where AI is applied for testing (e.g., web, mobile, API, embedded). |
| Testing Target | Target system or component under test. |
| Testing focus | Focus of the testing, e.g., functional, usability, or performance. |
| Agent Framework | Whether an agent framework is used and, if so, which one (e.g., AutoGen). |
| Number of LLM Agents | Total number of LLM instances used. |
| Testing framework / automation library | Automation tools used for executing tests. |
| Other tools | Additional tools integrated into the testing process. |
| LLM used | The Large Language Model(s) employed. |
| Fine-tuning done | Whether LLMs were fine-tuned for the study. |
| Agent architecture | One of: Single-agent iterative; Single autonomous agent + auxiliary LLM utilities; Multi-agent collaborative; Multi-agent independent. |
| Agent collaboration | For multi-agent collaborative setups, the collaboration mechanism: message passing; shared memory; orchestrator. |
| Level of autonomy | One of: Fully human-specified goals; Semi-autonomous; Fully autonomous. |
| Oracle | One or more of: Explicit; LLM intrinsic; System specification; Simple crash detection; Human-In-the-Loop; Metric-based. |
| Granularity of Actions | Low-level (click, API call) or high-level (scenario execution, task). |
| State Representation for LLM | One or more of: Complete Structural State; Filtered Context State; Visual State; Symbolic/Abstracted State. |
| Comments | Any additional notes about the study. |
| Number of systems evaluated on | How many systems the study evaluated its approach on. |
| System type | Industrial, open-source, or academic systems. |
| Evaluation Metric | Metrics used to assess testing performance (e.g., coverage, faults found, execution time). |
| Baseline comparison | Whether the study compares results to existing methods or baselines. |
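For meta-analyses or research synthesis, the classification can be summarised per dimension. The sketch below tallies a few of the columns listed above; it again assumes the documented column names, and the ";" delimiter used to split multi-valued cells (such as Oracle) is an assumption that may need adjusting to the actual cell format.

```python
import pandas as pd

studies = pd.read_excel("Primary_Sources.xlsx")

# Single-valued dimensions: a plain value count per column is enough.
for column in ["Publication Type", "Agent architecture", "Level of autonomy"]:
    print(studies[column].value_counts(dropna=False), "\n")

# Multi-valued dimension (e.g., Oracle): split each cell into one row per
# value before counting. The ";" separator is an assumed delimiter.
oracles = (
    studies["Oracle"]
    .dropna()
    .str.split(";")
    .explode()
    .str.strip()
)
print(oracles.value_counts())
```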