software-competence-center-hagenberg/2026-SWQD-Autonomous-Testing

Online Appendix: LLM Agents for Autonomous System Testing: A Semi-Structured Literature Review

This appendix provides a detailed overview of the literature review process and the intermediate results for our study on LLM-based autonomous testing agents.
The data is organized into directories representing different stages of the review process.


Literature Search

0-search_process

  • Contents:
    • Search queries used for each digital library.
    • Excel sheets containing all retrieved studies and decisions on inclusion/exclusion for the initial screening.
  • Purpose:
    Documents the first step of our review: identifying potential papers using systematic search queries.

1-detailed-screening

  • Contents:
    • Excel sheet listing all studies that passed the initial screening (43 studies).
    • Decisions on whether each study was included after full-text screening.
  • Purpose:
    Tracks the detailed evaluation of each candidate paper to determine eligibility for the review.

2-snowballing

  • Contents:
    • Excel sheet listing all studies that passed the detailed screening (18 studies).
    • Papers considered during backward snowballing (i.e., references of included papers) and the inclusion/exclusion decision for each.
  • Purpose:
    Ensures coverage of additional relevant studies that may not have appeared in the initial search.

3-final-results

  • Contents:
    • Excel sheet listing all final included studies (21 studies).
  • Purpose:
    Provides a consolidated record of the studies that form the basis of our review and analysis.

Primary Sources Classification

The Excel sheet (Primary_Sources.xlsx) contains the detailed classification of all 21 studies included in our semi-structured literature review.

It provides a structured view of study metadata, agent characteristics, testing targets, and evaluation details.

This sheet provides full transparency of our classification process, enabling:

  • Easy replication of the literature review.
  • Filtering studies by agent type, autonomy, oracle, or other dimensions.
  • Reference for future meta-analyses or research synthesis.
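The filtering use case above can be sketched in a few lines of Python. This is a minimal illustration, not part of the appendix: it assumes the sheet has been exported to CSV with the column names listed under Columns Description, and it assumes multi-valued cells (such as Oracle) are separated by semicolons, which the actual sheet may not do. The sample rows and the helper names `load_rows`, `split_values`, and `filter_studies` are hypothetical.

```python
import csv
import io

# Hypothetical CSV export of Primary_Sources.xlsx. The rows are invented
# placeholders for illustration, not data from the review.
SAMPLE_CSV = """Paper Nr.,Agent architecture,Level of autonomy,Oracle
1,Multi-agent collaborative,Semi-autonomous,Explicit; LLM intrinsic
2,Single-agent iterative,Fully autonomous,Simple crash detection
3,Multi-agent collaborative,Fully autonomous,LLM intrinsic
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def split_values(cell, sep=";"):
    """Split a multi-valued cell into a set of trimmed values.

    The separator is an assumption; adjust it to match the real sheet.
    """
    return {value.strip() for value in cell.split(sep) if value.strip()}


def filter_studies(rows, criteria):
    """Keep rows where every column in `criteria` contains the wanted value."""
    return [
        row
        for row in rows
        if all(
            wanted in split_values(row.get(column, ""))
            for column, wanted in criteria.items()
        )
    ]


rows = load_rows(SAMPLE_CSV)
selected = filter_studies(
    rows,
    {"Agent architecture": "Multi-agent collaborative", "Oracle": "LLM intrinsic"},
)
print([row["Paper Nr."] for row in selected])
```

Passing the criteria as a dictionary keeps the column names exactly as they appear in the sheet, including spaces, and lets several dimensions be combined in one call.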

Column Descriptions

| Column | Description |
|---|---|
| Paper Nr. | Sequential number assigned to each paper. |
| Paper Link | Direct link to the paper. |
| Title | Full title of the paper. |
| Year | Year of publication. |
| Publication Venue | Conference, journal, or workshop name. |
| Publication Type | Type of publication (conference, journal, workshop, preprint). |
| Application domain | Domain where AI is applied for testing (e.g., web, mobile, API, embedded). |
| Testing Target | Target system or component under test. |
| Testing focus | Aspect under test (e.g., functionality, usability, performance). |
| Agent Framework | Whether an agent framework is used (e.g., AutoGen). |
| Number of LLM Agents | Total number of LLM instances used. |
| Testing framework / automation library | Automation tools used for executing tests. |
| Other tools | Additional tools integrated into the testing process. |
| LLM used | The Large Language Model(s) employed. |
| Fine-tuning done | Whether LLMs were fine-tuned for the study. |
| Agent architecture | One of: Single-agent iterative; Single autonomous agent + auxiliary LLM utilities; Multi-agent collaborative; Multi-agent independent. |
| Agent collaboration | For multi-agent collaborative setups: message passing; shared memory; orchestrator. |
| Level of autonomy | One of: Fully human-specified goals; Semi-autonomous; Fully autonomous. |
| Oracle | One or more of: Explicit; LLM intrinsic; System specification; Simple crash detection; Human-In-the-Loop; Metric-based. |
| Granularity of Actions | Low-level (click, API call) or high-level (scenario execution, task). |
| State Representation for LLM | One or more of: Complete Structural State; Filtered Context State; Visual State; Symbolic/Abstracted State. |
| Comments | Any additional notes about the study. |
| Number of systems evaluated on | Number of systems on which the study evaluated its approach. |
| System type | Industrial, open-source, or academic systems. |
| Evaluation Metric | Metrics used to assess testing performance (e.g., coverage, faults found, execution time). |
| Baseline comparison | Whether the study compares results to existing methods or baselines. |
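Because several columns use a fixed vocabulary, a reader who edits or extends the sheet may want to check entries against the allowed values. The following is a minimal sketch under assumptions, not a tool shipped with this appendix: the allowed values are copied from the column descriptions above, while the semicolon separator for multi-valued cells, the sample rows, and the names `ALLOWED` and `find_invalid` are hypothetical.

```python
# Allowed values per categorical column, copied from the column descriptions.
ALLOWED = {
    "Agent architecture": {
        "Single-agent iterative",
        "Single autonomous agent + auxiliary LLM utilities",
        "Multi-agent collaborative",
        "Multi-agent independent",
    },
    "Level of autonomy": {
        "Fully human-specified goals",
        "Semi-autonomous",
        "Fully autonomous",
    },
    "Oracle": {
        "Explicit",
        "LLM intrinsic",
        "System specification",
        "Simple crash detection",
        "Human-In-the-Loop",
        "Metric-based",
    },
    "State Representation for LLM": {
        "Complete Structural State",
        "Filtered Context State",
        "Visual State",
        "Symbolic/Abstracted State",
    },
}


def find_invalid(row, sep=";"):
    """Return (column, value) pairs in `row` that are outside the vocabulary.

    `row` maps column names to cell text. Cells may hold several values
    joined by `sep` (an assumed convention). Empty cells are skipped.
    """
    problems = []
    for column, allowed in ALLOWED.items():
        cell = row.get(column, "")
        for value in (part.strip() for part in cell.split(sep)):
            if value and value not in allowed:
                problems.append((column, value))
    return problems


# Invented example rows, not data from the review.
good_row = {
    "Agent architecture": "Single-agent iterative",
    "Level of autonomy": "Semi-autonomous",
    "Oracle": "Explicit; Metric-based",
    "State Representation for LLM": "Visual State",
}
bad_row = {
    "Agent architecture": "Multi-agent",
    "Oracle": "Explicit; Crash",
}

print(find_invalid(good_row))
print(find_invalid(bad_row))
```

The check is exact-match and case-sensitive by design, so spelling variants such as "Human-in-the-loop" are reported instead of being silently accepted.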
