odidere
odidere is a voice assistant service that orchestrates speech-to-text (STT), large language model (LLM), and text-to-speech (TTS) services. It accepts audio or text input with attachments, transcribes speech via whisper-server, queries an OpenAI-compatible LLM (with optional tool calling), and returns synthesized audio via kokoro-fastapi.
This project is under active development and breaking changes are possible.
Features
- HTTP API and web UI for voice interaction
- Speech-to-text via whisper-server
- LLM queries via OpenAI-compatible API (hosted or local)
- Text-to-speech via kokoro-fastapi
- Configurable concurrency, either to serialize access to a GPU or to allow parallel message processing
- Streaming responses via Server-Sent Events (SSE)
- Tools defined with YAML, parsed with Go templates, and executed as subprocesses
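The concurrency setting can be pictured as a counting semaphore: a limit of 1 serializes work so only one job touches the GPU at a time, while a larger limit lets that many messages be processed in parallel. The following is a minimal sketch of that idea, not odidere's actual implementation; the names `limiter`, `newLimiter`, `run`, and `peakInFlight` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// limiter is a hypothetical counting semaphore backed by a buffered channel.
// Capacity 1 serializes jobs; capacity n allows up to n jobs at once.
type limiter struct {
	slots chan struct{}
}

func newLimiter(n int) *limiter {
	return &limiter{slots: make(chan struct{}, n)}
}

// run blocks until a slot is free, executes fn, then releases the slot.
func (l *limiter) run(fn func()) {
	l.slots <- struct{}{}
	defer func() { <-l.slots }()
	fn()
}

// peakInFlight pushes `jobs` short jobs through a limiter of size n and
// reports the largest number that were running at the same moment.
func peakInFlight(n, jobs int) int {
	l := newLimiter(n)
	var mu sync.Mutex
	var wg sync.WaitGroup
	cur, peak := 0, 0
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.run(func() {
				mu.Lock()
				cur++
				if cur > peak {
					peak = cur
				}
				mu.Unlock()
				time.Sleep(time.Millisecond) // stand-in for real work
				mu.Lock()
				cur--
				mu.Unlock()
			})
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("peak with limit 1:", peakInFlight(1, 8)) // peak with limit 1: 1
}
```

A buffered channel keeps the sketch short; raising the limit trades GPU contention for throughput.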
Limitations
- Tools do not support variadic arguments. The {{ json . }} template function may serve as a workaround when a tool accepts a JSON blob.
- Audio is base64 encoded in JSON requests and responses.
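To illustrate the JSON workaround, here is a rough sketch of how a `json` template function could turn all tool-call arguments into a single JSON argument using Go's `text/template`. The helper below is an assumed stand-in for odidere's function of the same name, and `renderArg` and the argument keys are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"text/template"
)

// renderArg expands one argument template against the tool-call arguments.
// The "json" function registered here is an assumption about how such a
// helper might behave; it is not taken from odidere's source.
func renderArg(tmpl string, args map[string]any) (string, error) {
	funcs := template.FuncMap{
		"json": func(v any) (string, error) {
			b, err := json.Marshal(v)
			return string(b), err
		},
	}
	t, err := template.New("arg").Funcs(funcs).Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, args); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Hypothetical arguments an LLM might pass to a tool.
	args := map[string]any{"city": "Oslo", "days": 3}
	out, err := renderArg("{{ json . }}", args)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"city":"Oslo","days":3}
}
```

The rendered string can then be passed to the subprocess as one argument, leaving the tool to unpack however many fields it needs.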
Security
odidere can expose its host to a variety of security risks:
- Denial-of-Service: local LLM inference consumes significant compute; hosted providers incur billing. Large audio files and frequent requests are another avenue of attack.
- Prompt injection (direct, indirect, and crescendo attacks)
- Unexpected costs (from tokens, tools, retries)
- Data exposure, if using a hosted LLM provider, or via manipulated tools.
- Arbitrary command execution and all its consequences, depending on the tools exposed.
Mitigations which a user can consider include:
- Constraining the runtime and sandboxing odidere
- Sandboxing tools (jails, containers)
- Opting for narrow tools over broad ones (e.g., scripts versus bash -c)
- Monitoring and constraining resource usage or LLM API costs
- Auditing logs
- Defensive system prompts that constrain model behavior
- Network isolation or authentication for the HTTP server
This software is provided as-is; refer to the license.
Requirements
- Go 1.25+
- An LLM server with OpenAI-compatible API (e.g., llama.cpp's llama-server)
- whisper-server for speech-to-text
- kokoro-fastapi for text-to-speech
Installation
From source:
git clone https://code.chimeric.al/odidere.git
cd odidere
make
The binary is written to bin/odidere.
To install to $GOBIN:
make install
Configuration
Environment variables can be referenced with ${VAR} syntax.
See config.example.yaml for all options.
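As an illustration of ${VAR} substitution, the sketch below expands environment references in a configuration string with Go's `os.Expand`. This shows the general technique only; odidere's loader may differ (for example, `os.Expand` also accepts `$VAR` without braces). The function name `expandConfig`, the key `api_key`, and the variable `LLM_API_KEY` are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// expandConfig replaces ${VAR} references in raw using lookup.
// Passing os.Getenv as lookup reads values from the process environment;
// variables that are unset expand to the empty string.
func expandConfig(raw string, lookup func(string) string) string {
	return os.Expand(raw, lookup)
}

func main() {
	// Hypothetical variable and config key, set here only for the demo.
	os.Setenv("LLM_API_KEY", "example-secret")
	raw := "api_key: ${LLM_API_KEY}"
	fmt.Println(expandConfig(raw, os.Getenv)) // api_key: example-secret
}
```

Keeping secrets in the environment rather than in the YAML file means the configuration can be committed or shared without leaking credentials.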
Deployment
The path to the configuration file is required:
odidere -c ~/.config/odidere/config.yaml # or /etc/odidere/config.yaml
Example systemd service files for system or user-level services are in init/systemd/.
API Endpoints
- GET /: Web UI
- GET /status: Health check
- GET /static/*: Embedded static assets
- POST /v1/chat/voice: Voice API (JSON request/response with base64 audio)
- POST /v1/chat/voice/stream: Streaming voice API (SSE with incremental messages and audio)
- GET /v1/voices: List available TTS voices
- GET /v1/models: List available LLM models
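A client of the streaming endpoint has to parse Server-Sent Events. The sketch below reads the `data:` payload of each event from any `io.Reader`, such as an HTTP response body. It follows the generic SSE wire format and makes no assumption about the JSON schema odidere emits; the function names and the sample stream are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readSSE collects the data payload of every complete event in an SSE stream.
// Multiple data lines within one event are joined with newlines, comment
// lines (starting with ':') are skipped, and a blank line ends an event.
func readSSE(r io.Reader) ([]string, error) {
	sc := bufio.NewScanner(r)
	// Base64 audio can exceed the scanner's default 64 KiB line limit,
	// so allow lines of up to 16 MiB.
	sc.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)
	var events, data []string
	for sc.Scan() {
		line := sc.Text()
		switch {
		case line == "":
			if len(data) > 0 {
				events = append(events, strings.Join(data, "\n"))
				data = nil
			}
		case strings.HasPrefix(line, ":"):
			// Comment or keep-alive line; ignore.
		case strings.HasPrefix(line, "data:"):
			payload := strings.TrimPrefix(line, "data:")
			data = append(data, strings.TrimPrefix(payload, " "))
		}
	}
	return events, sc.Err()
}

// readSSEString is a convenience wrapper for in-memory streams.
func readSSEString(s string) ([]string, error) {
	return readSSE(strings.NewReader(s))
}

func main() {
	// Illustrative stream; the payloads are placeholders, not odidere output.
	stream := "data: first\n\n: keep-alive\n\ndata: line one\ndata: line two\n\n"
	events, err := readSSEString(stream)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(events)) // 2
}
```

In practice the reader would be the body of a POST to /v1/chat/voice/stream, with each payload decoded according to the server's actual message format.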
License
MIT