Commit 2ffde47

Merge pull request #526 from idiap/dev
v0.27.3
2 parents 3194998 + 3250edd commit 2ffde47

35 files changed, +715 -181 lines changed

.github/workflows/tests.yml

Lines changed: 3 additions & 3 deletions
@@ -76,7 +76,7 @@ jobs:
           if [ "${{ matrix.python-version }}" == "3.10" ]; then
             resolution=lowest-direct
           fi
-          uv run --resolution=$resolution --extra server --extra languages make ${{ matrix.subset }}
+          uv run --resolution=$resolution --extra codec --extra server --extra languages make ${{ matrix.subset }}
       - name: Upload coverage data
         uses: actions/upload-artifact@v4
         with:
@@ -119,7 +119,7 @@ jobs:
           if [ "${{ matrix.python-version }}" == "3.10" ]; then
             resolution=lowest-direct
           fi
-          uv run --resolution=$resolution --extra languages coverage run -m pytest -x -v --durations=0 $shard_tests
+          uv run --resolution=$resolution --extra codec --extra languages coverage run -m pytest -x -v --durations=0 $shard_tests
       - name: Upload coverage data
         uses: actions/upload-artifact@v4
         with:
@@ -154,7 +154,7 @@ jobs:
             uv add git+https://github.com/idiap/coqui-ai-coqpit --branch ${{ github.event.inputs.coqpit_branch }}
           fi
       - name: Zoo tests
-        run: uv run --extra server --extra languages make test_zoo
+        run: uv run --extra codec --extra server --extra languages make test_zoo
         env:
           NUM_PARTITIONS: 3
           TEST_PARTITION: ${{ matrix.partition }}

Dockerfile

Lines changed: 8 additions & 3 deletions
@@ -1,6 +1,7 @@
 ARG BASE=nvidia/cuda:12.8.1-base-ubuntu24.04
 FROM ${BASE}

+ARG BASE=nvidia/cuda:12.8.1-base-ubuntu24.04
 RUN apt-get update && apt-get upgrade -y
 RUN apt-get install -y --no-install-recommends \
     gcc g++ make python3 python3-dev \
@@ -9,8 +10,7 @@ RUN apt-get install -y --no-install-recommends \

 # Install uv
 COPY --from=ghcr.io/astral-sh/uv:0.8.15 /uv /uvx /bin/
-ENV UV_NO_CACHE=1 \
-    UV_TORCH_BACKEND=auto
+ENV UV_NO_CACHE=1

 RUN uv venv /opt/venv
 ENV VIRTUAL_ENV=/opt/venv PATH="/opt/venv/bin:$PATH"
@@ -19,7 +19,12 @@ WORKDIR /app

 # Install dependencies first for better caching
 COPY pyproject.toml /app
-RUN uv pip install -r pyproject.toml --extra all
+RUN if echo "$BASE" | grep -q "cuda"; then \
+        UV_TORCH_BACKEND=cu128; \
+    else \
+        UV_TORCH_BACKEND=cpu; \
+    fi && \
+    uv pip install -r pyproject.toml --extra all --torch-backend=${UV_TORCH_BACKEND}

 # Copy the rest of the application
 COPY . /app
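
Reviewer sketch (not part of the commit): the new `RUN` step picks the torch backend from a plain substring test on the base image name. The Python below mirrors that decision so it can be checked in isolation. The function name `select_torch_backend` and the non-CUDA image name in the usage note are illustrative assumptions, not taken from the repository.

```python
def select_torch_backend(base_image: str) -> str:
    """Mirror the Dockerfile check: `grep -q "cuda"` on the base image name.

    The match is case-sensitive, like grep without -i, so only a lowercase
    "cuda" in the image name selects the CUDA 12.8 backend.
    """
    return "cu128" if "cuda" in base_image else "cpu"


if __name__ == "__main__":
    # Default base image from the Dockerfile.
    print(select_torch_backend("nvidia/cuda:12.8.1-base-ubuntu24.04"))
```

Any base image whose name lacks the lowercase string `cuda` (for instance a hypothetical `python:3.12-slim`) would fall through to the `cpu` backend under this logic.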

Makefile

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ lint: ## run linters.
 	uv run --only-dev ruff format ${target_dirs} --check

 system-deps: ## install linux system deps
-	sudo apt-get install -y libsndfile1-dev
+	sudo apt-get install -y libsndfile1-dev ffmpeg

 install: ## install 🐸 TTS
 	uv sync --all-extras

README.md

Lines changed: 13 additions & 2 deletions
@@ -24,7 +24,8 @@

 ## 📣 News
 - **Fork of the [original, unmaintained repository](https://github.com/coqui-ai/TTS). New PyPI package: [coqui-tts](https://pypi.org/project/coqui-tts)**
-- 0.25.0: [OpenVoice](https://github.com/myshell-ai/OpenVoice) models now available for voice conversion.
+- 0.27.0: [Caching mechanism](https://coqui-tts.readthedocs.io/en/latest/cloning.html) for cloned voices.
+- 0.25.2: [OpenVoice](https://github.com/myshell-ai/OpenVoice) and [kNN-VC](https://github.com/bshall/knn-vc) models now available for voice conversion.
 - 0.24.2: Prebuilt wheels are now also published for macOS and Windows (in addition to Linux as before) for easier installation across platforms.
 - 0.20.0: XTTSv2 is here with 17 languages and better performance across the board. XTTS can stream with <200ms latency.
 - 0.19.0: XTTS fine-tuning code is out. Check the [example recipes](https://github.com/idiap/coqui-ai-TTS/tree/dev/recipes/ljspeech).
@@ -117,7 +118,9 @@ You can also help us implement more models.
 ## Installation

 🐸TTS is tested on Ubuntu 24.04 with **python >= 3.10, < 3.14**, but should also
-work on Mac and Windows.
+work on Mac and Windows. Depending on your platform, you might first want to
+separately install Pytorch, `torchaudio`, and `torchcodec` with their
+[official instructions](https://pytorch.org/get-started/locally/).

 If you are only interested in [synthesizing speech](https://coqui-tts.readthedocs.io/en/latest/inference.html) with the pretrained 🐸TTS models, installing from PyPI is the easiest option.

@@ -140,6 +143,7 @@ The following extras allow the installation of optional dependencies:
 | Name | Description |
 |------|-------------|
 | `all` | All optional dependencies |
+| `codec` | Installs torchcodec needed with Pytorch>=2.9 |
 | `notebooks` | Dependencies only used in notebooks |
 | `server` | Dependencies to run the TTS server |
 | `bn` | Bangla G2P |
@@ -227,6 +231,10 @@ From version 0.27.0 you can [cache cloned
 voices](https://coqui-tts.readthedocs.io/en/latest/cloning.html) with a custom
 `speaker` ID, so you only need to pass audio files in `speaker_wav` once.

+> [!NOTE]
+> For more control or additional outputs, e.g. timestamps, use the lower-level
+> [Synthesizer API](https://coqui-tts.readthedocs.io/en/latest/main_classes/synthesizer.html).
+
 #### Single speaker model

 ```python
@@ -287,6 +295,9 @@ api.tts_to_file(
 )
 ```

+**Note:** Some Fairseq models need the romanization library `uroman` to be
+installed. For this you can install `coqui-tts` with the `languages` extra.
+
 ### Command-line interface `tts`

 <!-- begin-tts-readme -->

TTS/bin/synthesize.py

Lines changed: 13 additions & 1 deletion
@@ -4,6 +4,7 @@

 import argparse
 import contextlib
+import importlib.metadata
 import logging
 import sys
 from argparse import RawTextHelpFormatter
@@ -288,7 +289,12 @@ def parse_args(arg_list: list[str] | None) -> argparse.Namespace:
         "--voice_dir",
         type=str,
         default=None,
-        help="Voice dir for tortoise model",
+        help="Custom directory for caching of cloned voices.",
+    )
+    parser.add_argument(
+        "--version",
+        action="store_true",
+        help="Print the Coqui TTS version number and exit.",
     )

     args = parser.parse_args(arg_list)
@@ -304,6 +310,7 @@ def parse_args(arg_list: list[str] | None) -> argparse.Namespace:
         args.model_info_by_name,
         args.source_wav,
         args.target_wav,
+        args.version,
     ]
     if not any(check_args):
         parser.parse_args(["-h"])
@@ -338,6 +345,11 @@ def main(arg_list: list[str] | None = None) -> None:
     vc_config_path = None
     model_dir = None

+    # 0) Print version number
+    if args.version:
+        logger.info(importlib.metadata.version("coqui-tts"))
+        sys.exit(0)
+
     # 1) List pre-trained TTS models
     if args.list_models:
         manager.list_models()

TTS/tts/configs/vits_config.py

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ class VitsConfig(BaseTTSConfig):
             Parameters for the learning rate scheduler of the discriminator. Defaults to `{'gamma': 0.999875, "last_epoch":-1}`.

         scheduler_after_epoch (bool):
-            If true, step the schedulers after each epoch else after each step. Defaults to `False`.
+            If true, step the schedulers after each epoch else after each step. Defaults to `True`.

         optimizer (str):
             Name of the optimizer to use with both the generator and the discriminator networks. One of the

TTS/tts/datasets/dataset.py

Lines changed: 22 additions & 2 deletions
@@ -3,18 +3,19 @@
 import logging
 import os
 import random
+from math import floor
 from typing import Any

 import numpy as np
 import numpy.typing as npt
 import torch
-import torchaudio
 import tqdm
 from torch.utils.data import Dataset

 from TTS.tts.utils.data import prepare_data, prepare_stop_target, prepare_tensor
 from TTS.utils.audio import AudioProcessor
 from TTS.utils.audio.numpy_transforms import compute_energy as calculate_energy
+from TTS.utils.generic_utils import is_pytorch_at_least_2_9

 logger = logging.getLogger(__name__)

@@ -47,6 +48,20 @@ def string2filename(string: str) -> str:
     return base64.urlsafe_b64encode(string.encode("utf-8")).decode("utf-8", "ignore")


+def _get_audio_size_torchcodec(audiopath: str | os.PathLike[Any]) -> int:
+    try:
+        from torchcodec.decoders import AudioDecoder
+    except ImportError as e:
+        msg = "torchcodec not installed (available in the `codec` extra)"
+        raise ImportError(msg) from e
+    except RuntimeError as e:
+        msg = "Error while importing torchcodec, see the stacktrace for details."
+        raise ImportError(msg) from e
+
+    metadata = AudioDecoder(audiopath).metadata
+    return floor(metadata.duration_seconds_from_header * metadata.sample_rate)
+
+
 def get_audio_size(audiopath: str | os.PathLike[Any]) -> int:
     """Return the number of samples in the audio file."""
     if not isinstance(audiopath, str):
@@ -57,7 +72,12 @@ def get_audio_size(audiopath: str | os.PathLike[Any]) -> int:
         raise RuntimeError(msg)

     try:
-        return torchaudio.info(audiopath).num_frames
+        if is_pytorch_at_least_2_9():
+            return _get_audio_size_torchcodec(audiopath)
+        else:
+            import torchaudio
+
+            return torchaudio.info(audiopath).num_frames
     except RuntimeError as e:
         msg = f"Failed to decode {audiopath}"
         raise RuntimeError(msg) from e
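
Reviewer sketch (not part of the commit): the change above gates on the PyTorch version and, on the torchcodec path, derives the sample count as `floor(duration * sample_rate)`. The body of `is_pytorch_at_least_2_9` is not shown in this diff, so the version check below is only one plausible way to write such a gate. The names `is_at_least` and `samples_from_header` are hypothetical, and the sketch needs neither torch nor torchcodec.

```python
import re
from math import floor


def is_at_least(version: str, minimum: tuple[int, int]) -> bool:
    """Compare the (major, minor) prefix of a version string with a minimum.

    Comparing integer tuples avoids the string-comparison pitfall where
    "2.10" would sort before "2.9".
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


def samples_from_header(duration_seconds: float, sample_rate: int) -> int:
    """Estimate the number of samples from header duration and sample rate."""
    return floor(duration_seconds * sample_rate)


if __name__ == "__main__":
    print(is_at_least("2.9.0+cu128", (2, 9)))
    print(samples_from_header(1.5, 16000))
```

Because the count is computed from a duration reported in the file header, it may differ slightly from an exact frame count when that duration is rounded or imprecise. Whether this matters for length-based filtering or bucketing in the dataset is worth confirming on real files.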
