@@ -192,27 +192,29 @@ See [Dialog section](https://sdialog.readthedocs.io/en/latest/sdialog/index.html
 <summary>Score dialogs with built‑in metrics and LLM judges, and compare datasets with aggregators and plots.</summary>
 
 Dialogs can be evaluated using the different components available inside the `sdialog.evaluation` module.
-Use [built‑in metrics](https://sdialog.readthedocs.io/en/latest/api/sdialog.html#module-sdialog.evaluation) (readability, flow, linguistic features, LLM judges) or easily create new ones, then aggregate and compare datasets (sets of dialogs) via `DatasetComparator`.
+Use [built‑in metrics](https://sdialog.readthedocs.io/en/latest/api/sdialog.html#module-sdialog.evaluation)—conversational features, readability, embedding-based, LLM-as-judge, flow-based, functional correctness (30+ metrics across six categories)—or easily create new ones, then aggregate and compare datasets (sets of dialogs) via `Comparator`.
 
 ```python
-from sdialog.evaluation import LLMJudgeRealDialog, LinguisticFeatureScore
-from sdialog.evaluation import FrequencyEvaluator, MeanEvaluator
-from sdialog.evaluation import DatasetComparator
-
-reference = [...] # list[Dialog]
-candidate = [...] # list[Dialog]
+from sdialog import Dialog
+from sdialog.evaluation import LLMJudgeYesNo, ToolSequenceValidator
+from sdialog.evaluation import FrequencyEvaluator, Comparator
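
The hunk above is truncated, so the rest of the updated snippet is not visible in this diff. As a rough sketch only, the new imports might be wired together along the following lines; the constructor arguments, the evaluator composition, and the comparator call convention are assumptions for illustration, not taken from the diff:

```python
from sdialog import Dialog
from sdialog.evaluation import LLMJudgeYesNo, ToolSequenceValidator
from sdialog.evaluation import FrequencyEvaluator, Comparator

# Two datasets (sets of dialogs) to compare; how they are loaded is out of scope here.
reference = [...]  # list[Dialog]
candidate = [...]  # list[Dialog]

# Assumed wiring: a yes/no LLM judge and a tool-sequence validator give per-dialog
# verdicts, FrequencyEvaluator aggregates those verdicts into a rate per dataset,
# and Comparator runs every evaluator over both datasets so they can be compared.
judge = LLMJudgeYesNo("Does this dialog sound like a real human conversation?")
tool_check = ToolSequenceValidator()

comparator = Comparator([
    FrequencyEvaluator(judge),
    FrequencyEvaluator(tool_check),
])
comparator({"reference": reference, "candidate": candidate})
```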