odidere

odidere is a voice assistant service that orchestrates speech-to-text (STT), large language model (LLM), and text-to-speech (TTS) services. It accepts audio or text input with attachments, transcribes speech via whisper-server, queries an OpenAI-compatible LLM (with optional tool calling), and returns synthesized audio via kokoro-fastapi.

This project is under active development and breaking changes are possible.

Features

HTTP API and web UI for voice interaction
Speech-to-text via whisper-server
LLM queries via OpenAI-compatible API (hosted or local)
Text-to-speech via kokoro-fastapi
Configurable concurrency, either to serialize access to a GPU or to allow parallel message processing
Streaming responses via Server-Sent Events (SSE)
Tools defined with YAML, parsed with Go templates, and executed as subprocesses
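
One common way to get the configurable concurrency listed above is a counting semaphore. The following is a minimal sketch of that technique, not odidere's actual implementation; `limiter`, `newLimiter`, `do`, and `maxInFlight` are hypothetical names. A limit of 1 serializes work (for example, on a single GPU), and a higher limit allows that many messages to be processed in parallel.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// limiter is a hypothetical counting semaphore built on a buffered channel.
type limiter struct {
	slots chan struct{}
}

// newLimiter allows at most n concurrent calls to do; n = 1 serializes them.
func newLimiter(n int) *limiter {
	return &limiter{slots: make(chan struct{}, n)}
}

// do runs fn once a slot is free, or returns early if ctx is cancelled first.
func (l *limiter) do(ctx context.Context, fn func()) error {
	select {
	case l.slots <- struct{}{}: // acquire a slot
	case <-ctx.Done():
		return ctx.Err()
	}
	defer func() { <-l.slots }() // release the slot
	fn()
	return nil
}

// maxInFlight pushes jobs tasks through a limiter of size n and reports the
// highest number of tasks that were ever running at the same time.
func maxInFlight(n, jobs int) int {
	l := newLimiter(n)
	var (
		mu        sync.Mutex
		cur, peak int
		wg        sync.WaitGroup
	)
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = l.do(context.Background(), func() {
				mu.Lock()
				cur++
				if cur > peak {
					peak = cur
				}
				mu.Unlock()
				// A real handler would run STT, LLM, and TTS work here.
				mu.Lock()
				cur--
				mu.Unlock()
			})
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("peak with limit 1:", maxInFlight(1, 8))
}
```

Passing a context to `do` lets a queued request give up when its client disconnects instead of waiting for a slot indefinitely.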

Limitations

Tools do not support variadic arguments. The {{ json . }} template function may serve as a workaround when a tool accepts a JSON blob.
Audio is base64 encoded in JSON requests and responses.
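
The `{{ json . }}` workaround can be illustrated with Go's `text/template` package. This is a sketch under stated assumptions: it assumes each argument template renders to exactly one argv element, and the `json` helper, `renderArgs`, the `--input` flag, and the `paths` parameter are all hypothetical stand-ins rather than odidere's real definitions.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"text/template"
)

// funcs registers a hypothetical json helper; odidere's real helper may differ.
var funcs = template.FuncMap{
	"json": func(v any) (string, error) {
		b, err := json.Marshal(v)
		return string(b), err
	},
}

// renderArgs renders each argument template against params and returns
// exactly one argv element per template (an assumption of this sketch).
func renderArgs(templates []string, params map[string]any) ([]string, error) {
	argv := make([]string, 0, len(templates))
	for _, src := range templates {
		t, err := template.New("arg").Funcs(funcs).Parse(src)
		if err != nil {
			return nil, err
		}
		var buf bytes.Buffer
		if err := t.Execute(&buf, params); err != nil {
			return nil, err
		}
		argv = append(argv, buf.String())
	}
	return argv, nil
}

func main() {
	// A list of unknown length cannot become a variable number of argv
	// elements here, so it is passed as a single JSON blob instead.
	params := map[string]any{"paths": []string{"a.txt", "b.txt"}}
	argv, err := renderArgs([]string{"--input", "{{ json . }}"}, params)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", argv)
}
```

The receiving tool then has to parse the JSON argument itself, which is why this only helps for tools that accept a JSON blob.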

Security

odidere can expose its host to a variety of security risks:

Denial-of-Service: local LLM inference consumes significant compute; hosted providers incur billing. Large audio files and frequent requests are further avenues of attack.
Prompt injection (direct, indirect, and crescendo attacks)
Unexpected costs (from tokens, tools, retries)
Data exposure, when using a hosted LLM provider or through manipulated tools
Arbitrary command execution and all its consequences, depending on the tools exposed.

Mitigations to consider include:

Constraining the runtime and sandboxing odidere
Sandboxing tools (jails, containers)
Opting for narrow tools over broad ones (e.g., scripts versus bash -c)
Monitoring and constraining resource usage or LLM API costs
Auditing logs
Defensive system prompts that constrain model behavior
Network isolation or authentication for the HTTP server

This software is provided as-is; refer to the license.

Requirements

Go 1.25+
An LLM server with OpenAI-compatible API (e.g., llama.cpp's llama-server)
whisper-server for speech-to-text
kokoro-fastapi for text-to-speech

Installation

From source:

git clone https://code.chimeric.al/odidere.git
cd odidere
make

The binary is written to bin/odidere.

To install to $GOBIN:

make install

Configuration

Environment variables can be referenced in the configuration file with ${VAR} syntax.

See config.example.yaml for all options.
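
For intuition, `${VAR}` substitution can be sketched with Go's standard `os.Expand`. This is an assumption about the general technique, not a description of odidere's loader: the source does not say whether expansion happens before or after YAML parsing, or how unset variables are handled. `expandConfig` and the `api_key` / `LLM_API_KEY` names are hypothetical. Note that `os.Expand` also expands the brace-less `$VAR` form.

```go
package main

import (
	"fmt"
	"os"
)

// expandConfig replaces ${VAR} references in raw configuration text with the
// values returned by lookup. Unset variables become empty strings when lookup
// is os.Getenv.
func expandConfig(raw string, lookup func(string) string) string {
	return os.Expand(raw, lookup)
}

func main() {
	// Hypothetical configuration fragment; see config.example.yaml for the
	// real option names.
	raw := "api_key: ${LLM_API_KEY}\n"
	fmt.Print(expandConfig(raw, os.Getenv))
}
```

Keeping secrets such as API keys in the environment means the configuration file itself can be committed or shared without exposing them.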

Deployment

The path to the configuration file is required:

odidere -c ~/.config/odidere/config.yaml # or /etc/odidere/config.yaml

Example systemd service files for system or user-level services are in init/systemd/.

API Endpoints

GET / — Web UI
GET /status — Health check
GET /static/* — Embedded static assets
POST /v1/chat/voice — Voice API (JSON request/response with base64 audio)
POST /v1/chat/voice/stream — Streaming voice API (SSE with incremental messages and audio)
GET /v1/voices — List available TTS voices
GET /v1/models — List available LLM models

License

MIT