From 9193fb5a0b0c182d7742b6c9f9636f36b52b517f Mon Sep 17 00:00:00 2001
From: dwrz
Date: Fri, 13 Feb 2026 15:04:26 +0000
Subject: [PATCH] Add README.org

---
 README.org | 89 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)
 create mode 100644 README.org

diff --git a/README.org b/README.org
new file mode 100644
index 0000000..8f5b45c
--- /dev/null
+++ b/README.org
@@ -0,0 +1,89 @@
+* odidere
+~odidere~ is a voice assistant service that orchestrates speech-to-text (STT), large language model (LLM), and text-to-speech (TTS) services. It accepts audio or text input with attachments, transcribes speech via whisper-server, queries an OpenAI-compatible LLM (with optional tool calling), and returns synthesized audio via kokoro-fastapi.
+
+This project is under active development and breaking changes are possible.
+
+** Features
+- HTTP API and web UI for voice interaction
+- Speech-to-text via whisper-server
+- LLM queries via OpenAI-compatible API (hosted or local)
+- Text-to-speech via kokoro-fastapi
+- Configurable concurrency for serializing access to a GPU, or to allow parallel message processing
+- Streaming responses via Server-Sent Events (SSE)
+- Tools defined with YAML, parsed with Go templates, and executed as subprocesses
+
+** Limitations
+- Tools do not support variadic arguments. The ~{{ json . }}~ template function may serve as a workaround when a tool accepts a JSON blob.
+- Audio is base64 encoded in JSON requests and responses.
+
+** Security
+~odidere~ can expose its host to a variety of security risks:
+
+- Denial-of-Service: local LLM inference consumes significant compute; hosted providers incur billing. Large audio files and frequent requests are another avenue of attack.
+- Prompt injection ([[https://genai.owasp.org/llmrisk/llm01-prompt-injection/][direct, indirect, and crescendo attacks]])
+- Unexpected costs (from tokens, tools, retries)
+- Data exposure, if using a hosted LLM provider, or via manipulated tools.
+- Arbitrary command execution and all its consequences, depending on the tools exposed.
+
+Mitigations which a user can consider include:
+- Constraining the runtime and sandboxing ~odidere~
+- Sandboxing tools (jails, containers)
+- Opting for narrow tools over broad ones (e.g., scripts versus bash -c)
+- Monitoring and constraining resource usage or LLM API costs
+- Auditing logs
+- Defensive system prompts that constrain model behavior
+- Network isolation or authentication for the HTTP server
+
+This software is provided as-is; refer to the license.
+
+** Requirements
+- Go 1.25+
+- An LLM server with OpenAI-compatible API (e.g., llama.cpp's ~llama-server~)
+- whisper-server for speech-to-text
+- kokoro-fastapi for text-to-speech
+
+** Installation
+From source:
+
+#+begin_src sh
+git clone https://code.chimeric.al/odidere.git
+cd odidere
+make
+#+end_src
+
+The binary is written to ~bin/odidere~.
+
+To install to ~$GOBIN~:
+
+#+begin_src sh
+make install
+#+end_src
+
+** Configuration
+
+Environment variables can be referenced with ~${VAR}~ syntax.
+
+Reference ~config.example.yaml~ for all options.
+
+** Deployment
+
+The path to the configuration file is required:
+
+#+begin_src sh
+odidere -c ~/.config/odidere/config.yaml # or /etc/odidere/config.yaml
+#+end_src
+
+Example ~systemd~ service files for system or user-level services are in ~init/systemd/~.
+
+** API Endpoints
+- ~GET /~ — Web UI
+- ~GET /status~ — Health check
+- ~GET /static/*~ — Embedded static assets
+- ~POST /v1/chat/voice~ — Voice API (JSON request/response with base64 audio)
+- ~POST /v1/chat/voice/stream~ — Streaming voice API (SSE with incremental messages and audio)
+- ~GET /v1/voices~ — List available TTS voices
+- ~GET /v1/models~ — List available LLM models
+
+** License
+
+MIT