* odidere
~odidere~ is a voice assistant service that orchestrates speech-to-text (STT), large language model (LLM), and text-to-speech (TTS) services. It accepts audio or text input with attachments, transcribes speech via whisper-server, queries an OpenAI-compatible LLM (with optional tool calling), and returns synthesized audio via kokoro-fastapi.
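
The request flow can be pictured as a three-stage pipeline. The following is a minimal Go sketch, not odidere's implementation. The ~Transcriber~, ~ChatModel~, and ~Synthesizer~ interfaces, ~handleVoice~, and the stub backends are hypothetical names standing in for whisper-server, the LLM, and kokoro-fastapi. Tool calling and attachments are omitted.

#+begin_src go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Hypothetical backend interfaces; odidere's real types may differ.
type Transcriber interface {
	Transcribe(ctx context.Context, audio []byte) (string, error)
}

type ChatModel interface {
	Complete(ctx context.Context, prompt string) (string, error)
}

type Synthesizer interface {
	Synthesize(ctx context.Context, text string) ([]byte, error)
}

// handleVoice runs one request through STT -> LLM -> TTS.
// If text is non-empty, transcription is skipped.
func handleVoice(ctx context.Context, stt Transcriber, llm ChatModel, tts Synthesizer, audio []byte, text string) (string, []byte, error) {
	if text == "" {
		t, err := stt.Transcribe(ctx, audio)
		if err != nil {
			return "", nil, fmt.Errorf("transcribe: %w", err)
		}
		text = t
	}
	reply, err := llm.Complete(ctx, text)
	if err != nil {
		return "", nil, fmt.Errorf("complete: %w", err)
	}
	speech, err := tts.Synthesize(ctx, reply)
	if err != nil {
		return "", nil, fmt.Errorf("synthesize: %w", err)
	}
	return reply, speech, nil
}

// Stub backends standing in for whisper-server, an LLM, and kokoro-fastapi.
type fakeSTT struct{}

func (fakeSTT) Transcribe(_ context.Context, audio []byte) (string, error) { return string(audio), nil }

type fakeLLM struct{}

func (fakeLLM) Complete(_ context.Context, p string) (string, error) { return "echo: " + p, nil }

type fakeTTS struct{}

func (fakeTTS) Synthesize(_ context.Context, t string) ([]byte, error) {
	return []byte(strings.ToUpper(t)), nil
}

func main() {
	reply, speech, err := handleVoice(context.Background(), fakeSTT{}, fakeLLM{}, fakeTTS{}, []byte("hello"), "")
	if err != nil {
		panic(err)
	}
	fmt.Println(reply)          // echo: hello
	fmt.Println(string(speech)) // ECHO: HELLO
}
#+end_src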

This project is under active development and breaking changes are possible.

** Features
- HTTP API and web UI for voice interaction
- Speech-to-text via whisper-server
- LLM queries via OpenAI-compatible API (hosted or local)
- Text-to-speech via kokoro-fastapi
- Configurable concurrency, to either serialize access to a GPU or allow parallel message processing
- Streaming responses via Server-Sent Events (SSE)
- Tools defined with YAML, parsed with Go templates, and executed as subprocesses
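
Capping concurrency is commonly done with a counting semaphore. The sketch below assumes a setting along the lines of "maximum in-flight requests"; ~limiter~ and ~peak~ are hypothetical helpers, not odidere's API. A size of 1 serializes GPU access, while larger sizes let messages be processed in parallel.

#+begin_src go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// limiter is a counting semaphore built on a buffered channel.
// Hypothetical stand-in for odidere's concurrency setting.
type limiter chan struct{}

func newLimiter(n int) limiter { return make(limiter, n) }

// do waits for a free slot (or for ctx to be cancelled), then runs fn.
func (l limiter) do(ctx context.Context, fn func()) error {
	select {
	case l <- struct{}{}:
	case <-ctx.Done():
		return ctx.Err()
	}
	defer func() { <-l }() // release the slot when fn returns
	fn()
	return nil
}

// peak pushes jobs through a limiter of size n and reports the highest
// number of jobs observed running at the same time.
func peak(n, jobs int) int64 {
	l := newLimiter(n)
	var running, highest atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = l.do(context.Background(), func() {
				cur := running.Add(1)
				for { // record cur as the new maximum if it is larger
					h := highest.Load()
					if cur <= h || highest.CompareAndSwap(h, cur) {
						break
					}
				}
				time.Sleep(10 * time.Millisecond) // simulate inference
				running.Add(-1)
			})
		}()
	}
	wg.Wait()
	return highest.Load()
}

func main() {
	fmt.Println(peak(1, 5)) // 1: requests are serialized
	fmt.Println(peak(3, 5)) // never more than 3
}
#+end_src

Selecting on ~ctx.Done()~ lets a queued request give up, for example when the client disconnects, instead of holding its place in line.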

** Limitations
- Tools do not support variadic arguments. The ~{{ json . }}~ template function may serve as a workaround when a tool accepts a JSON blob.
- Audio is base64-encoded in JSON requests and responses, which inflates audio payloads by roughly a third.
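
To illustrate how templated tool arguments and a ~json~ function fit together, here is a sketch assuming a tool reduces to a command plus a list of argument templates. The ~Tool~ struct, ~renderArgs~, and ~run~ are hypothetical; the real YAML schema and template functions are defined by odidere (see ~config.example.yaml~). Because arguments occupy fixed positions, a variable-length set of values can travel as a single JSON argument.

#+begin_src go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"os/exec"
	"strings"
	"text/template"
)

// Tool is a hypothetical in-memory form of a YAML tool definition.
type Tool struct {
	Name    string
	Command string
	Args    []string // each element is a Go text/template
}

// funcs registers a json function similar to the one the README mentions.
var funcs = template.FuncMap{
	"json": func(v any) (string, error) {
		b, err := json.Marshal(v)
		return string(b), err
	},
}

// renderArgs executes each argument template against the model's tool-call arguments.
func renderArgs(t Tool, params map[string]any) ([]string, error) {
	out := make([]string, 0, len(t.Args))
	for i, a := range t.Args {
		tmpl, err := template.New(fmt.Sprintf("%s[%d]", t.Name, i)).Funcs(funcs).Parse(a)
		if err != nil {
			return nil, err
		}
		var sb strings.Builder
		if err := tmpl.Execute(&sb, params); err != nil {
			return nil, err
		}
		out = append(out, sb.String())
	}
	return out, nil
}

// run executes the tool as a subprocess without a shell, so each rendered
// value is passed as one argv entry.
func run(ctx context.Context, t Tool, params map[string]any) (string, error) {
	args, err := renderArgs(t, params)
	if err != nil {
		return "", err
	}
	out, err := exec.CommandContext(ctx, t.Command, args...).Output()
	return string(out), err
}

func main() {
	tool := Tool{Name: "echo", Command: "echo", Args: []string{"{{ .city }}", "{{ json . }}"}}
	params := map[string]any{"city": "Oslo", "units": "metric"}

	args, err := renderArgs(tool, params)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", args) // ["Oslo" "{\"city\":\"Oslo\",\"units\":\"metric\"}"]

	out, err := run(context.Background(), tool, params)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
#+end_src

Invoking the command directly with ~exec.CommandContext~ rather than through a shell means rendered values cannot inject shell syntax, though the tool itself must still treat its input as untrusted.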

** Security
~odidere~ can expose its host to a variety of security risks:

- Denial of service: local LLM inference consumes significant compute, and hosted providers incur usage charges. Large audio files and frequent requests are further avenues of attack.
- Prompt injection ([[https://genai.owasp.org/llmrisk/llm01-prompt-injection/][direct, indirect, and crescendo attacks]])
- Unexpected costs (from tokens, tools, retries)
- Data exposure to a hosted LLM provider, or through manipulated tools.
- Arbitrary command execution and all its consequences, depending on the tools exposed.

Mitigations to consider include:
- Constraining the runtime and sandboxing ~odidere~
- Sandboxing tools (jails, containers)
- Opting for narrow tools over broad ones (e.g., dedicated scripts versus ~bash -c~)
- Monitoring and constraining resource usage or LLM API costs
- Auditing logs
- Defensive system prompts that constrain model behavior
- Network isolation or authentication for the HTTP server
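
If odidere is reachable beyond localhost, a bearer-token check in front of it is one simple form of authentication. The following Go middleware is a hypothetical sketch (~requireToken~ is not part of odidere); in practice the same check is often configured in a reverse proxy instead.

#+begin_src go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// requireToken rejects requests whose Authorization header does not equal
// "Bearer <token>". ConstantTimeCompare avoids timing leaks for
// equal-length inputs.
func requireToken(token string, next http.Handler) http.Handler {
	want := []byte("Bearer " + token)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := []byte(r.Header.Get("Authorization"))
		if subtle.ConstantTimeCompare(got, want) != 1 {
			w.Header().Set("WWW-Authenticate", "Bearer")
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { fmt.Fprint(w, "ok") })
	h := requireToken("s3cret", ok)

	for _, hdr := range []string{"", "Bearer wrong", "Bearer s3cret"} {
		req := httptest.NewRequest(http.MethodGet, "/status", nil)
		if hdr != "" {
			req.Header.Set("Authorization", hdr)
		}
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, req)
		fmt.Println(rec.Code) // 401, 401, then 200
	}
}
#+end_src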

This software is provided as-is; refer to the license.

** Requirements
- Go 1.25+
- An LLM server with OpenAI-compatible API (e.g., llama.cpp's ~llama-server~)
- whisper-server for speech-to-text
- kokoro-fastapi for text-to-speech

** Installation
From source:

#+begin_src sh
git clone https://code.chimeric.al/odidere.git
cd odidere
make
#+end_src

The binary is written to ~bin/odidere~.

To install to ~$GOBIN~:

#+begin_src sh
make install
#+end_src

** Configuration

Environment variables can be referenced with ~${VAR}~ syntax.

See ~config.example.yaml~ for all options.
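
The ~${VAR}~ substitution can be thought of as a text pass over the raw file before YAML parsing, which can keep secrets such as API keys out of the configuration file itself. The sketch below assumes only the braced form is expanded and that unset variables become empty strings; odidere's actual rules may differ, and ~expandEnv~ and the ~api_key~ key are hypothetical.

#+begin_src go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// envRef matches ${NAME} references only; a bare $NAME is left untouched.
var envRef = regexp.MustCompile(`\$\{[A-Za-z_][A-Za-z0-9_]*\}`)

// expandEnv replaces each ${NAME} in raw config text with the value of the
// environment variable NAME, or the empty string if it is unset.
func expandEnv(raw string) string {
	return envRef.ReplaceAllStringFunc(raw, func(m string) string {
		return os.Getenv(m[2 : len(m)-1]) // strip "${" and "}"
	})
}

func main() {
	os.Setenv("LLM_API_KEY", "sk-test")
	fmt.Println(expandEnv("api_key: ${LLM_API_KEY}")) // api_key: sk-test
	fmt.Println(expandEnv("cost: $5"))                 // cost: $5
}
#+end_src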

** Deployment

~odidere~ requires the path to a configuration file:

#+begin_src sh
odidere -c ~/.config/odidere/config.yaml # or /etc/odidere/config.yaml
#+end_src

Example ~systemd~ service files for system or user-level services are in ~init/systemd/~.

** API Endpoints
- ~GET /~ — Web UI
- ~GET /status~ — Health check
- ~GET /static/*~ — Embedded static assets
- ~POST /v1/chat/voice~ — Voice API (JSON request/response with base64 audio)
- ~POST /v1/chat/voice/stream~ — Streaming voice API (SSE with incremental messages and audio)
- ~GET /v1/voices~ — List available TTS voices
- ~GET /v1/models~ — List available LLM models
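
The streaming endpoint sends Server-Sent Events, so a client can read the response body line by line. The sketch below implements basic SSE framing (~event:~ and ~data:~ fields, comment lines, blank-line dispatch) and ignores ~id:~ and ~retry:~. The event names and payloads odidere actually sends are not documented here, so the ~text~ event in the example is hypothetical. The scanner buffer is enlarged because a line carrying base64 audio can exceed ~bufio.Scanner~'s 64 KiB default.

#+begin_src go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// Event is one Server-Sent Event.
type Event struct {
	Type string // "message" unless an event: field was sent
	Data string // data: lines joined with "\n"
}

// readSSE parses an SSE stream (e.g. the body of a POST /v1/chat/voice/stream
// response) with \n or \r\n line endings and calls fn for each complete event.
// An event still pending at EOF is discarded, as the SSE spec requires.
func readSSE(r io.Reader, fn func(Event)) error {
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 64*1024), 16*1024*1024) // allow long base64 lines
	var typ string
	var data []string
	for sc.Scan() {
		line := sc.Text()
		if line == "" { // blank line dispatches the buffered event
			if len(data) > 0 {
				t := typ
				if t == "" {
					t = "message"
				}
				fn(Event{Type: t, Data: strings.Join(data, "\n")})
			}
			typ, data = "", nil
			continue
		}
		if strings.HasPrefix(line, ":") {
			continue // comment, often used as a keep-alive
		}
		field, value, _ := strings.Cut(line, ":")
		value = strings.TrimPrefix(value, " ") // drop one leading space
		switch field {
		case "event":
			typ = value
		case "data":
			data = append(data, value)
		}
	}
	return sc.Err()
}

func main() {
	stream := "event: text\ndata: Hello\ndata: world\n\n: keep-alive\n\ndata: {\"done\":true}\n\n"
	_ = readSSE(strings.NewReader(stream), func(e Event) {
		fmt.Printf("%s %q\n", e.Type, e.Data)
	})
	// text "Hello\nworld"
	// message "{\"done\":true}"
}
#+end_src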

** License

MIT