* odidere

~odidere~ is a voice assistant service that orchestrates speech-to-text (STT), large language model (LLM), and text-to-speech (TTS) services. It accepts audio or text input with attachments, transcribes speech via whisper-server, queries an OpenAI-compatible LLM (with optional tool calling), and returns synthesized audio via kokoro-fastapi.

This project is under active development and breaking changes are possible.

** Features

- HTTP API and web UI for voice interaction
- Speech-to-text via whisper-server
- LLM queries via OpenAI-compatible API (hosted or local)
- Text-to-speech via kokoro-fastapi
- Configurable concurrency for serializing access to a GPU, or to allow parallel message processing
- Streaming responses via Server-Sent Events (SSE)
- Tools defined with YAML, parsed with Go templates, and executed as subprocesses

** Limitations

- Tools do not support variadic arguments. The ~{{ json . }}~ template function may serve as a workaround when a tool accepts a JSON blob.
- Audio is base64 encoded in JSON requests and responses.

** Security

~odidere~ can expose its host to a variety of security risks:

- Denial-of-Service: local LLM inference consumes significant compute; hosted providers incur billing. Large audio files and frequent requests are another avenue of attack.
- Prompt injection ([[https://genai.owasp.org/llmrisk/llm01-prompt-injection/][direct, indirect, and crescendo attacks]])
- Unexpected costs (from tokens, tools, retries)
- Data exposure, if using a hosted LLM provider, or via manipulated tools.
- Arbitrary command execution and all its consequences, depending on the tools exposed.
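
To make the large-audio risk above concrete: because audio travels base64 encoded inside JSON, every 3 bytes of raw audio occupy 4 bytes on the wire. The following is a minimal sketch, not taken from the ~odidere~ source, of how a wrapper or reverse proxy written in Go might cap request bodies with the standard library's ~http.MaxBytesReader~. The names ~maxBodyBytes~, ~encodedSize~, ~limitBody~, and ~statusFor~ are hypothetical, and the 10 MiB cap is an arbitrary assumption rather than an ~odidere~ setting.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// maxBodyBytes is a hypothetical cap for illustration; it is not an odidere option.
const maxBodyBytes = 10 << 20 // 10 MiB

// encodedSize reports how many bytes n raw audio bytes occupy after
// standard (padded) base64 encoding.
func encodedSize(n int) int {
	return base64.StdEncoding.EncodedLen(n)
}

// limitBody wraps next so that reading more than max bytes from the
// request body returns an error instead of consuming unbounded memory.
func limitBody(next http.Handler, max int64) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		r.Body = http.MaxBytesReader(w, r.Body, max)
		next.ServeHTTP(w, r)
	})
}

// statusFor sends an n-byte body through a limited stand-in handler and
// returns the resulting HTTP status code.
func statusFor(n int, max int64) int {
	h := limitBody(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Drain the body; the read fails once the cap is exceeded.
		if _, err := io.Copy(io.Discard, r.Body); err != nil {
			http.Error(w, "request body too large", http.StatusRequestEntityTooLarge)
			return
		}
		w.WriteHeader(http.StatusOK)
	}), max)
	body := strings.NewReader(strings.Repeat("a", n))
	req := httptest.NewRequest(http.MethodPost, "/v1/chat/voice", body)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	// 3,000,000 raw bytes become 4,000,000 base64 bytes.
	fmt.Println(encodedSize(3_000_000))
	// A 16-byte body passes a 64-byte cap; a 128-byte body does not.
	fmt.Println(statusFor(16, 64), statusFor(128, 64))
	// How much raw audio fits under the hypothetical cap, ignoring JSON overhead.
	fmt.Println(base64.StdEncoding.DecodedLen(maxBodyBytes))
}
```

A cap like this bounds only the size of a single request. Request frequency would need separate rate limiting, for example at a reverse proxy.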

Mitigations which a user can consider include:

- Constraining the runtime and sandboxing ~odidere~
- Sandboxing tools (jails, containers)
- Opting for narrow tools over broad ones (e.g., scripts versus ~bash -c~)
- Monitoring and constraining resource usage or LLM API costs
- Auditing logs
- Defensive system prompts that constrain model behavior
- Network isolation or authentication for the HTTP server

This software is provided as-is; refer to the license.

** Requirements

- Go 1.25+
- An LLM server with OpenAI-compatible API (e.g., llama.cpp's ~llama-server~)
- whisper-server for speech-to-text
- kokoro-fastapi for text-to-speech

** Installation

From source:

#+begin_src sh
git clone https://code.chimeric.al/odidere.git
cd odidere
make
#+end_src

The binary is written to ~bin/odidere~.

To install to ~$GOBIN~:

#+begin_src sh
make install
#+end_src

** Configuration

Environment variables can be referenced with ~${VAR}~ syntax.

Reference ~config.example.yaml~ for all options.

** Deployment

The path to the configuration file is required:

#+begin_src sh
odidere -c ~/.config/odidere/config.yaml # or /etc/odidere/config.yaml
#+end_src

Example ~systemd~ service files for system or user-level services are in ~init/systemd/~.

** API Endpoints

- ~GET /~: Web UI
- ~GET /status~: Health check
- ~GET /static/*~: Embedded static assets
- ~POST /v1/chat/voice~: Voice API (JSON request/response with base64 audio)
- ~POST /v1/chat/voice/stream~: Streaming voice API (SSE with incremental messages and audio)
- ~GET /v1/voices~: List available TTS voices
- ~GET /v1/models~: List available LLM models

** License

MIT