    :align: right



-SDialog: Synthetic Dialog Generation, Evaluation, and Interpretability
-=======================================================================
+
+SDialog: A Python Toolkit for End-to-End Dialogue Generation, Agent Building, Simulation, and Evaluation
+========================================================================================================

 SDialog is an MIT-licensed open-source toolkit for building, simulating, and evaluating LLM-based conversational agents end-to-end. It aims to bridge **agent construction → dialog generation → evaluation → (optionally) interpretability** in a single reproducible workflow, so you can build reliable, controllable dialog systems and generate dialog data at scale.

-It standardizes a Dialog schema and offers persona‑driven multi‑agent simulation with LLMs, composable orchestration, built‑in metrics, and mechanistic interpretability.
+It standardizes a Dialog schema and offers persona-driven multi-agent simulation with LLMs, composable orchestration, built-in metrics, and mechanistic interpretability.
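+
+As a minimal taste of the schema, here is a hedged sketch of a JSON round-trip (``Dialog`` is the documented schema type, but the helper and field names used below, e.g. ``from_file``/``to_file``, are assumptions; see the API reference for exact names):
+
+.. code-block:: python
+
+    from sdialog import Dialog
+
+    # Load a dialog stored in the standardized JSON format (assumed helper).
+    dialog = Dialog.from_file("support_call.json")
+
+    # Each turn carries at least a speaker and a text (assumed field names).
+    for turn in dialog.turns:
+        print(turn.speaker, ":", turn.text)
+
+    # Export back to JSON for sharing or dataset building (assumed helper).
+    dialog.to_file("support_call_copy.json")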

 ✨ Key Features
 ---------------

 - **Standard dialog schema** with JSON import/export *(aiming to standardize dialog dataset formats with your help 🙏)*
-- **Persona‑driven multi‑agent simulation** with contexts, tools, and thoughts
+- **Persona-driven multi-agent simulation** with contexts, tools, and thoughts
 - **Composable orchestration** for precise control over behavior and flow
-- **Built‑in evaluation** (metrics + LLM‑as‑judge) for comparison and iteration
+- **Built-in evaluation** (metrics + LLM-as-judge) for comparison and iteration
 - **Native mechanistic interpretability** (inspect and steer activations)
 - **Easy creation of user-defined components** by inheriting from base classes (personas, metrics, orchestrators, etc.)
 - **Interoperability** across OpenAI, Hugging Face, Ollama, AWS Bedrock, Google GenAI, Anthropic, and more
 - **Audio generation** for converting text dialogs to realistic audio conversations

-If you are building conversational systems, benchmarking dialog models, producing synthetic training corpora, simulating diverse users to test or probe conversational systems, or analyzing internal model behavior, SDialog provides an end‑to‑end workflow.
+If you are building conversational systems, benchmarking dialog models, producing synthetic training corpora, simulating diverse users to test or probe conversational systems, or analyzing internal model behavior, SDialog provides an end-to-end workflow.

 Quick Links
 -----------
@@ -64,7 +65,7 @@ Alternatively, a ready-to-use Apptainer image (.sif) with SDialog and all depend |
 🏁 Quickstart Tour
 ------------------

-Here's a short, hands‑on example: a support agent helps a customer disputing a double charge. We add a small refund rule and two simple tools, generate three dialogs for evaluation, then serve the agent on port 1333 for Open WebUI or any OpenAI‑compatible client.
+Here's a short, hands-on example: a support agent helps a customer disputing a double charge. We add a small refund rule and two simple tools, generate three dialogs for evaluation, then serve the agent on port 1333 for Open WebUI or any OpenAI-compatible client.

 .. code-block:: python

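+    # Hedged sketch of the core flow described above (names such as
+    # config.llm, Persona, Agent, and dialog_with are assumptions and may
+    # differ from the released API; the refund rule, the two tools, and the
+    # serving step on port 1333 are elided here).
+    import sdialog
+    from sdialog.personas import Persona
+    from sdialog.agents import Agent
+
+    sdialog.config.llm("openai:gpt-4o-mini")  # pick any supported backend
+
+    support = Agent(persona=Persona(role="billing support agent"), name="Support")
+    customer = Agent(persona=Persona(role="customer disputing a double charge"), name="Customer")
+
+    # Generate three dialogs for later evaluation.
+    dialogs = [support.dialog_with(customer) for _ in range(3)]
+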
@@ -140,12 +141,12 @@ Core Capabilities |
 Testing Remote Systems
 ^^^^^^^^^^^^^^^^^^^^^^

-Probe OpenAI‑compatible deployed systems with controllable simulated users and capture dialogs for evaluation.
+Probe deployed OpenAI-compatible systems with controllable simulated users and capture the resulting dialogs for evaluation.

-You can use SDialog as a controllable test harness for any OpenAI‑compatible system such as **vLLM**-based ones by role‑playing realistic or adversarial users against your deployed system:
+You can use SDialog as a controllable test harness for any OpenAI-compatible system, such as a **vLLM**-based deployment, by role-playing realistic or adversarial users against it (a minimal sketch follows the list):

-- Black‑box functional checks (Does the system follow instructions? Handle edge cases?)
-- Persona / use‑case coverage (Different goals, emotions, domains)
+- Black-box functional checks (Does the system follow instructions? Handle edge cases?)
+- Persona / use-case coverage (Different goals, emotions, domains)
 - Regression testing (Run the same persona batch each release; diff dialogs)
 - Safety / robustness probing (Angry, confused, or noisy users)
 - Automated evaluation (Pipe generated dialogs directly into evaluators)
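+
+A hedged sketch of that harness (assumptions: the backend selector accepts an OpenAI-compatible ``base_url``, and ``Persona``/``Agent`` mirror the simulation API; exact signatures may differ):
+
+.. code-block:: python
+
+    import sdialog
+    from sdialog.personas import Persona
+    from sdialog.agents import Agent
+
+    # Point the backend at the deployed system (e.g. a vLLM server); the
+    # endpoint URL and model name below are hypothetical.
+    sdialog.config.llm("openai:my-deployed-model",
+                       base_url="http://localhost:8000/v1",
+                       api_key="EMPTY")
+
+    # Role-play an adversarial user against the system under test.
+    user = Agent(persona=Persona(role="angry customer with a billing issue"),
+                 name="User")
+    system = Agent(name="System")  # assumed: proxies the configured backend
+
+    # Capture the dialog for regression diffs and automated evaluation.
+    dialog = user.dialog_with(system)
+    dialog.to_file("probe_angry_user.json")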
@@ -194,7 +195,7 @@ Import, export, and transform dialogs from JSON, text, CSV, or Hugging Face data |
 Evaluation and Comparison
 ^^^^^^^^^^^^^^^^^^^^^^^^^

-Score dialogs with built‑in metrics and LLM judges, and compare datasets with aggregators and plots.
+Score dialogs with built-in metrics and LLM judges, and compare datasets with aggregators and plots.

 .. code-block:: python

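+    # Hedged sketch of the evaluation flow described above; the evaluator
+    # names (LLMJudgeRealDialog, DatasetComparator) are assumptions and may
+    # differ from the released API -- see the evaluation module docs.
+    from sdialog import Dialog
+    from sdialog.evaluation import LLMJudgeRealDialog, DatasetComparator
+
+    dialogs = [Dialog.from_file(f"dialog_{i:03d}.json") for i in range(3)]
+
+    # An LLM-as-judge scores how "real" each dialog reads; a comparator
+    # aggregates the scores so datasets can be compared and plotted.
+    judge = LLMJudgeRealDialog()
+    comparator = DatasetComparator([judge])
+    results = comparator(dialogs)
+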
@@ -223,7 +224,7 @@ Score dialogs with built‑in metrics and LLM judges, and compare datasets with |
 Mechanistic Interpretability
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Capture per‑token activations and steer models via Inspectors for analysis and interventions.
+Capture per-token activations and steer models via Inspectors for analysis and interventions.

 .. code-block:: python

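+    # Hedged sketch; Inspector is the documented concept, but the hook-point
+    # string, the `|` composition, and the accessors below are assumptions
+    # and may differ from the released API.
+    from sdialog.agents import Agent
+    from sdialog.interpretability import Inspector
+
+    agent = Agent(name="Support")
+    inspector = Inspector(target="model.layers.15")  # hypothetical hook point
+
+    # Attaching the inspector captures per-token activations while the agent
+    # talks; composition mirrors the toolkit's orchestration style (assumed).
+    agent = agent | inspector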