odidere

odidere is a voice assistant service that orchestrates speech-to-text (STT), large language model (LLM), and text-to-speech (TTS) services. It accepts audio or text input with attachments, transcribes speech via whisper-server, queries an OpenAI-compatible LLM (with optional tool calling), and returns synthesized audio via kokoro-fastapi.

This project is under active development and breaking changes are possible.

Features

  • HTTP API and web UI for voice interaction
  • Speech-to-text via whisper-server
  • LLM queries via OpenAI-compatible API (hosted or local)
  • Text-to-speech via kokoro-fastapi
  • Configurable concurrency, either to serialize access to a GPU or to allow parallel message processing
  • Streaming responses via Server-Sent Events (SSE)
  • Tools defined with YAML, parsed with Go templates, and executed as subprocesses
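
The concurrency setting can be pictured as a counting semaphore: a limit of 1 serializes requests so that only one reaches the GPU at a time, while a larger limit lets several messages be processed in parallel. The following is a minimal sketch under that assumption, not odidere's actual implementation; the names limiter, newLimiter, run, and maxConcurrent are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// limiter is a hypothetical counting semaphore built on a buffered channel.
// odidere's real mechanism may differ.
type limiter chan struct{}

// newLimiter allows at most n jobs to run at once; n = 1 serializes them.
func newLimiter(n int) limiter { return make(limiter, n) }

// run blocks until a slot is free, runs job, then releases the slot.
func (l limiter) run(job func()) {
	l <- struct{}{}
	defer func() { <-l }()
	job()
}

// maxConcurrent pushes jobs through a limiter of size n and reports the
// peak number of jobs observed running at the same time.
func maxConcurrent(n, jobs int) int {
	l := newLimiter(n)
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		running int
		peak    int
	)
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.run(func() {
				// Record entry into the limited section.
				mu.Lock()
				running++
				if running > peak {
					peak = running
				}
				mu.Unlock()

				// Record exit from the limited section.
				mu.Lock()
				running--
				mu.Unlock()
			})
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("peak with limit 1:", maxConcurrent(1, 8))
	fmt.Println("peak within limit 4:", maxConcurrent(4, 8) <= 4)
}
```

With a limit of 1 the peak can never exceed one running job, which is the behavior wanted when a single GPU backs the STT, LLM, or TTS service.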

Limitations

  • Tools do not support variadic arguments. The {{ json . }} template function may serve as a workaround when a tool accepts a JSON blob.
  • Audio is base64 encoded in JSON requests and responses.
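
To illustrate the {{ json . }} workaround, the sketch below renders one tool argument with Go's text/template and a hand-registered json function. It assumes the function simply marshals its input with encoding/json; how odidere registers the function, and the shape of the data passed to the template, are not described here, so renderArg and the sample arguments are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"text/template"
)

// renderArg renders a single command-line argument from a template string.
// The "json" function is registered by hand for illustration only.
func renderArg(tmpl string, args map[string]any) (string, error) {
	funcs := template.FuncMap{
		// json marshals any value to its compact JSON encoding.
		"json": func(v any) (string, error) {
			b, err := json.Marshal(v)
			return string(b), err
		},
	}
	t, err := template.New("arg").Funcs(funcs).Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, args); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Hypothetical arguments chosen by the model for a tool call.
	args := map[string]any{"tags": []string{"a", "b"}, "limit": 2}
	out, err := renderArg("{{ json . }}", args)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints {"limit":2,"tags":["a","b"]}
}
```

The whole argument set reaches the subprocess as one JSON string, so a tool that parses JSON can accept a list of any length even though the template itself cannot expand into a variable number of arguments.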

Security

odidere can expose its host to a variety of security risks:

  • Denial-of-Service: local LLM inference consumes significant compute, and hosted providers bill per request. Large audio files and frequent requests are further avenues of attack.
  • Prompt injection (direct, indirect, and crescendo attacks)
  • Unexpected costs (from tokens, tools, retries)
  • Data exposure, when using a hosted LLM provider or through manipulated tools
  • Arbitrary command execution and all its consequences, depending on the tools exposed.

Mitigations to consider include:

  • Constraining the runtime and sandboxing odidere
  • Sandboxing tools (jails, containers)
  • Opting for narrow tools over broad ones (e.g., scripts versus bash -c)
  • Monitoring and constraining resource usage or LLM API costs
  • Auditing logs
  • Defensive system prompts that constrain model behavior
  • Network isolation or authentication for the HTTP server

This software is provided as-is; refer to the license.

Requirements

  • Go 1.25+
  • An LLM server with OpenAI-compatible API (e.g., llama.cpp's llama-server)
  • whisper-server for speech-to-text
  • kokoro-fastapi for text-to-speech

Installation

From source:

git clone https://code.chimeric.al/odidere.git
cd odidere
make

The binary is written to bin/odidere.

To install to $GOBIN:

make install

Configuration

Environment variables can be referenced with ${VAR} syntax.

See config.example.yaml for all options.
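
As a rough picture of what ${VAR} substitution involves, the sketch below replaces each ${VAR} reference in a string with a value from a lookup function. It is an illustration under stated assumptions, not odidere's parser: the expand helper, the EXAMPLE_API_KEY variable, and the api_key line are hypothetical, and turning unset variables into empty strings is an assumed behavior.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// varRef matches ${NAME}, where NAME is a conventional environment
// variable name. Bare $NAME references are deliberately not matched.
var varRef = regexp.MustCompile(`\$\{([A-Za-z_][A-Za-z0-9_]*)\}`)

// expand replaces each ${VAR} in s with lookup("VAR").
// Unset variables become empty strings here; that choice is an assumption.
func expand(s string, lookup func(string) string) string {
	return varRef.ReplaceAllStringFunc(s, func(m string) string {
		name := varRef.FindStringSubmatch(m)[1]
		return lookup(name)
	})
}

func main() {
	// Hypothetical variable and config line, for illustration only.
	os.Setenv("EXAMPLE_API_KEY", "secret")
	fmt.Println(expand("api_key: ${EXAMPLE_API_KEY}", os.Getenv))
}
```

Keeping secrets such as API keys in the environment rather than in the file lets the configuration be committed or shared without exposing them.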

Deployment

The path to the configuration file is required:

odidere -c ~/.config/odidere/config.yaml # or /etc/odidere/config.yaml

Example systemd service files for system or user-level services are in init/systemd/.

API Endpoints

  • GET / — Web UI
  • GET /status — Health check
  • GET /static/* — Embedded static assets
  • POST /v1/chat/voice — Voice API (JSON request/response with base64 audio)
  • POST /v1/chat/voice/stream — Streaming voice API (SSE with incremental messages and audio)
  • GET /v1/voices — List available TTS voices
  • GET /v1/models — List available LLM models

License

MIT
