This API server enables audio transcription using the OpenAI Whisper models.
- Download the .exe from Releases
- Just run it!
- GCC compiler installed and available in your PATH (you can get it from here)
- Install Go (https://go.dev/doc/install)
Before building, make sure the CGO_ENABLED environment variable is set to 1 (PowerShell syntax):
$env:CGO_ENABLED = "1"
You can verify the current value with:
go env
You also need a 64-bit (x64) GCC installed, for example via MSYS2.
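As a quick sanity check before building, the two prerequisites above can also be verified from a small Go program. This is only a sketch: the helper names `cgoEnabled` and `findCompiler` are made up for illustration and are not part of this project.

```go
package main

import (
	"fmt"
	"go/build"
	"os/exec"
)

// cgoEnabled reports whether cgo is enabled as seen by the go/build package.
// It honors the CGO_ENABLED environment variable when it is set and falls
// back to the platform default otherwise.
func cgoEnabled() bool {
	return build.Default.CgoEnabled
}

// findCompiler returns the full path of the named compiler if it is in PATH.
func findCompiler(name string) (string, error) {
	return exec.LookPath(name)
}

func main() {
	fmt.Println("cgo enabled:", cgoEnabled())

	if path, err := findCompiler("gcc"); err != nil {
		fmt.Println("gcc not found in PATH:", err)
	} else {
		fmt.Println("gcc found at:", path)
	}
}
```

If cgo is reported as disabled or gcc is missing, fix the environment first; otherwise the build step is likely to fail.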
Download the sources and use go build.
For example, you can build using the following command:
go build -ldflags "-s -w" -o server.exe main.go
Make a request to the server using the following command:
curl http://localhost:3000/v1/audio/transcriptions \
-H "Content-Type: multipart/form-data" \
-F file="@/path/to/file/audio.mp3"
Receive a response in JSON format:
{
"text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger. This is a place where you can get to do that."
}
Usage with Obsidian
- Install the Obsidian voice recognition plugin.
- Open the plugin's settings.
- Set the following values:
  - API KEY: sk-1
  - API URL: http://localhost:3000/v1/audio/transcriptions
  - Model: whisper-1
- Implement automatic model downloading from Hugging Face
- Implement automatic Whisper.dll downloading from GitHub releases
- Provide prebuilt binaries for Windows
- Include instructions for running on Linux with Wine (likely possible).
- Use flags to override the model path
- Use flags to override the port
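The two flag-related items could be approached with Go's standard `flag` package. The sketch below is only an illustration and not the project's implementation: the flag names `-model` and `-port`, the `config` type, and the default model path are hypothetical placeholders, while the default port 3000 matches the URL used in the examples above.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// config holds the values that the flags may override (hypothetical type).
type config struct {
	ModelPath string
	Port      int
}

// parseFlags parses command-line arguments into a config.
// It uses its own FlagSet so it can be called with any argument slice.
func parseFlags(args []string) (config, error) {
	fs := flag.NewFlagSet("server", flag.ContinueOnError)

	var cfg config
	// "model.bin" is a placeholder default, not the project's real model path.
	fs.StringVar(&cfg.ModelPath, "model", "model.bin", "path to the model file")
	fs.IntVar(&cfg.Port, "port", 3000, "port to listen on")

	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	if cfg.Port < 1 || cfg.Port > 65535 {
		return config{}, fmt.Errorf("invalid port %d", cfg.Port)
	}
	return cfg, nil
}

func main() {
	cfg, err := parseFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("model=%s addr=:%d\n", cfg.ModelPath, cfg.Port)
}
```

With such flags, an invocation might look like `server.exe -model path\to\model.bin -port 8080`, and omitting both would keep the defaults.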
- Const-me/Whisper project
- goConstmeWhisper for the remarkable Go bindings for Const-me/Whisper
- Georgi Gerganov for GGML models