* odidere
~odidere~ is a voice assistant service that orchestrates speech-to-text (STT), large language model (LLM), and text-to-speech (TTS) services. It accepts audio or text input with attachments, transcribes speech via whisper-server, queries an OpenAI-compatible LLM (with optional tool calling), and returns synthesized audio via kokoro-fastapi.

This project is under active development and breaking changes are possible.

** Features
- HTTP API and web UI for voice interaction
- Speech-to-text via whisper-server
- LLM queries via OpenAI-compatible API (hosted or local)
- Text-to-speech via kokoro-fastapi
- Configurable concurrency, either to serialize access to a GPU or to allow parallel message processing
- Streaming responses via Server-Sent Events (SSE)
- Tools defined in YAML, parsed with Go templates, and executed as subprocesses

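The concurrency setting listed above can be pictured as a counting semaphore. The following is a minimal sketch under that assumption, not a description of how ~odidere~ implements it; the ~limiter~ type and the ~maxInFlight~ helper are hypothetical names. A limit of 1 serializes work (useful when a single GPU backs every request), while a larger limit lets several messages proceed in parallel.

#+begin_src go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// limiter is a hypothetical counting semaphore built on a buffered channel.
type limiter struct {
	slots chan struct{}
}

func newLimiter(n int) *limiter {
	return &limiter{slots: make(chan struct{}, n)}
}

// do blocks until a slot is free, runs fn, then releases the slot.
func (l *limiter) do(fn func()) {
	l.slots <- struct{}{}
	defer func() { <-l.slots }()
	fn()
}

// maxInFlight pushes jobs goroutines through a limiter of the given size
// and reports the highest number of jobs seen running at the same time.
func maxInFlight(limit, jobs int) int64 {
	l := newLimiter(limit)
	var running, peak atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.do(func() {
				n := running.Add(1)
				// Raise the recorded peak if this job pushed it higher.
				for {
					p := peak.Load()
					if n <= p || peak.CompareAndSwap(p, n) {
						break
					}
				}
				running.Add(-1)
			})
		}()
	}
	wg.Wait()
	return peak.Load()
}

func main() {
	// With a limit of 1, jobs never overlap.
	fmt.Println("peak with limit 1:", maxInFlight(1, 8)) // prints "peak with limit 1: 1"
}
#+end_src

A buffered channel keeps the sketch dependency-free; a weighted semaphore would be an alternative if requests had different costs.
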
** Limitations
- Tools do not support variadic arguments. The ~{{ json . }}~ template function may serve as a workaround when a tool accepts a JSON blob.
- Audio is base64-encoded in JSON requests and responses.

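To illustrate the ~{{ json . }}~ workaround, here is a minimal sketch of how a ~json~ function could be registered with Go's ~text/template~ so that every argument reaches a tool as a single JSON blob. Only the function name ~json~ comes from this README; the ~renderArg~ helper, the argument map, and the wiring are assumptions, and ~odidere~'s own implementation may differ.

#+begin_src go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"text/template"
)

// renderArg renders one tool argument template against the arguments
// supplied by the model. Hypothetical helper, for illustration only.
func renderArg(tmpl string, args map[string]any) (string, error) {
	funcs := template.FuncMap{
		// json serializes any template value to a compact JSON string.
		"json": func(v any) (string, error) {
			b, err := json.Marshal(v)
			return string(b), err
		},
	}
	t, err := template.New("arg").Funcs(funcs).Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, args); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// A list of unknown length is passed as one JSON argument instead of
	// as variadic command-line arguments.
	args := map[string]any{
		"paths":   []string{"a.txt", "b.txt"},
		"verbose": true,
	}
	out, err := renderArg("{{ json . }}", args)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints {"paths":["a.txt","b.txt"],"verbose":true}
}
#+end_src

The receiving tool then has to parse the JSON itself, which is why this only helps when the tool accepts a JSON blob.
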
** Security
~odidere~ can expose its host to a variety of security risks:

- Denial of service: local LLM inference consumes significant compute, and hosted providers incur costs. Large audio files and frequent requests are further avenues of attack.
- Prompt injection ([[https://genai.owasp.org/llmrisk/llm01-prompt-injection/][direct, indirect, and crescendo attacks]])
- Unexpected costs (from tokens, tools, retries)
- Data exposure, when using a hosted LLM provider or through manipulated tools
- Arbitrary command execution and all its consequences, depending on the tools exposed

Mitigations to consider include:
- Constraining the runtime and sandboxing ~odidere~
- Sandboxing tools (jails, containers)
- Opting for narrow tools over broad ones (e.g., scripts versus ~bash -c~)
- Monitoring and constraining resource usage or LLM API costs
- Auditing logs
- Defensive system prompts that constrain model behavior
- Network isolation or authentication for the HTTP server

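As one illustration of constraining resource usage, a cap on request body size limits the cost of oversized audio uploads. The sketch below assumes such a cap is placed in front of the HTTP server, for example in a small reverse proxy; ~limitBody~ and ~statusFor~ are hypothetical names, the backend handler is a stand-in, and the 10-byte limit is for demonstration only. Nothing here describes ~odidere~'s own code.

#+begin_src go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// limitBody wraps next so that reading more than maxBytes from the
// request body returns an error instead of buffering the whole upload.
func limitBody(maxBytes int64, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		r.Body = http.MaxBytesReader(w, r.Body, maxBytes)
		next.ServeHTTP(w, r)
	})
}

// statusFor sends a body of the given size through limitBody to a
// stand-in backend and returns the resulting HTTP status code.
func statusFor(size int, maxBytes int64) int {
	backend := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if _, err := io.ReadAll(r.Body); err != nil {
			http.Error(w, "request too large", http.StatusRequestEntityTooLarge)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	body := strings.NewReader(strings.Repeat("a", size))
	req := httptest.NewRequest(http.MethodPost, "/v1/chat/voice", body)
	rec := httptest.NewRecorder()
	limitBody(maxBytes, backend).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	// A small body passes; a body over the limit is rejected.
	fmt.Println(statusFor(5, 10), statusFor(20, 10)) // prints "200 413"
}
#+end_src

Because audio is base64-encoded inside JSON, a realistic limit would need to account for the roughly one-third size overhead of that encoding.
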
This software is provided as-is; refer to the license.

** Requirements
- Go 1.25+
- An LLM server with OpenAI-compatible API (e.g., llama.cpp's ~llama-server~)
- whisper-server for speech-to-text
- kokoro-fastapi for text-to-speech

** Installation
From source:

#+begin_src sh
git clone https://code.chimeric.al/odidere.git
cd odidere
make
#+end_src

The binary is written to ~bin/odidere~.

To install to ~$GOBIN~:

#+begin_src sh
make install
#+end_src

** Configuration

Environment variables can be referenced with ~${VAR}~ syntax.

See ~config.example.yaml~ for all options.

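The ~${VAR}~ substitution can be approximated with Go's ~os.Expand~. This is a sketch of the general technique, not necessarily how ~odidere~ loads its configuration: the ~expandConfig~ helper, the ~llm.api_key~ key, and the ~LLM_API_KEY~ variable are all hypothetical. Note that ~os.Expand~ also expands the unbraced ~$VAR~ form, which may or may not match ~odidere~'s behavior.

#+begin_src go
package main

import (
	"fmt"
	"os"
)

// expandConfig replaces ${VAR} references in raw configuration text,
// resolving each name through lookup. Unknown names become "".
func expandConfig(raw string, lookup func(string) string) string {
	return os.Expand(raw, lookup)
}

func main() {
	// Hypothetical configuration fragment; see config.example.yaml for
	// the real option names.
	raw := "llm:\n  api_key: ${LLM_API_KEY}\n"
	fmt.Print(expandConfig(raw, os.Getenv))
}
#+end_src

Passing the lookup function as a parameter keeps the sketch testable without touching the real environment.
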
** Deployment

The path to the configuration file is required:

#+begin_src sh
odidere -c ~/.config/odidere/config.yaml # or /etc/odidere/config.yaml
#+end_src

Example ~systemd~ service files for system or user-level services are in ~init/systemd/~.

** API Endpoints
- ~GET /~ :: Web UI
- ~GET /status~ :: Health check
- ~GET /static/*~ :: Embedded static assets
- ~POST /v1/chat/voice~ :: Voice API (JSON request/response with base64 audio)
- ~POST /v1/chat/voice/stream~ :: Streaming voice API (SSE with incremental messages and audio)
- ~GET /v1/voices~ :: List available TTS voices
- ~GET /v1/models~ :: List available LLM models

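A client of the streaming endpoint has to parse Server-Sent Events. The sketch below reads the standard ~event:~ and ~data:~ fields from any ~io.Reader~, such as an HTTP response body. It covers only that subset of the SSE format, and the event name and JSON payload in ~main~ are invented for the demonstration; this README does not specify the events ~odidere~ emits.

#+begin_src go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// event is one Server-Sent Event: an optional name plus its data payload.
type event struct {
	Name string
	Data string
}

// readEvents parses an SSE stream into events. It handles the "event:"
// and "data:" fields and ignores everything else (comments, id, retry).
func readEvents(r io.Reader) ([]event, error) {
	var events []event
	var cur event
	var data []string

	sc := bufio.NewScanner(r)
	// Base64 audio can exceed the default 64 KiB line limit, so allow
	// lines of up to 16 MiB.
	sc.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)

	for sc.Scan() {
		line := sc.Text()
		switch {
		case line == "":
			// A blank line ends the current event.
			if len(data) > 0 {
				cur.Data = strings.Join(data, "\n")
				events = append(events, cur)
			}
			cur, data = event{}, nil
		case strings.HasPrefix(line, "event:"):
			v := strings.TrimPrefix(line, "event:")
			cur.Name = strings.TrimPrefix(v, " ")
		case strings.HasPrefix(line, "data:"):
			v := strings.TrimPrefix(line, "data:")
			data = append(data, strings.TrimPrefix(v, " "))
		}
	}
	return events, sc.Err()
}

func main() {
	// Hypothetical stream; real event names and payloads may differ.
	stream := "event: message\ndata: {\"text\":\"hi\"}\n\n"
	events, err := readEvents(strings.NewReader(stream))
	if err != nil {
		panic(err)
	}
	for _, e := range events {
		fmt.Printf("%q %q\n", e.Name, e.Data)
	}
}
#+end_src

In a real client, the reader would be the body of the response to ~POST /v1/chat/voice/stream~, and events would be handled as they arrive instead of being collected into a slice.
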
** License

MIT