Deployment Topologies
This page compares local and distributed deployment shapes.
Local Development Topology
Purpose: fast iteration and debugging.
- App services run via compose/dev/docker-compose.yml
- Django, Celery, Redis, Postgres, Node, and inference can run together
- Suitable for feature work and integration checks
Distributed Topology (VPS + GPU Node)
Purpose: production-like separation of concerns.
- VPS node: web app, orchestration, API, websocket handling, task queue, database
- GPU node: dedicated inference service (chat + embeddings + chunking helpers)
- Requests flow primarily from the VPS to the GPU node for model tasks
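A minimal sketch of how a VPS-side service might address the GPU node. It assumes the inference host and port are supplied as environment variables named INFERENCE_HOST and INFERENCE_PORT, and that the inference service exposes POST endpoints at /chat, /embeddings, and /chunk. All of these names are hypothetical; check the actual runtime container env and the inference service's routes. The sketch only builds the request and does not send it.

```python
import json
import os
import urllib.request

# Hypothetical route table for the GPU inference service.
INFERENCE_PATHS = {"chat": "/chat", "embeddings": "/embeddings", "chunking": "/chunk"}


def inference_base_url(env=None):
    """Build the GPU node's base URL from (hypothetical) INFERENCE_HOST/INFERENCE_PORT."""
    env = os.environ if env is None else env
    host = env.get("INFERENCE_HOST")
    port_raw = env.get("INFERENCE_PORT")
    # Fail fast on the VPS side instead of sending requests to a wrong or empty host.
    if not host or not port_raw:
        raise RuntimeError("INFERENCE_HOST and INFERENCE_PORT must both be set")
    port = int(port_raw)
    if not 0 < port < 65536:
        raise ValueError(f"INFERENCE_PORT out of range: {port}")
    return f"http://{host}:{port}"


def build_inference_request(task, payload, env=None):
    """Prepare a JSON POST from the VPS to the GPU node for one model task."""
    url = inference_base_url(env) + INFERENCE_PATHS[task]
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, with INFERENCE_HOST=gpu-node.internal and INFERENCE_PORT=8001, build_inference_request("embeddings", {"texts": ["hello"]}) targets http://gpu-node.internal:8001/embeddings. Sending it (for instance with urllib.request.urlopen and an explicit timeout) would typically happen inside a Celery task so that inference latency stays off the request path.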
Why Split Nodes?
- Isolates model latency and VRAM pressure from user-facing and session services
- Allows orchestration and inference to scale independently
- Makes it clearer which node is responsible when failures or bottlenecks occur
Operational Notes
- Confirm the inference host and port values in the runtime container environment
- Confirm the pgvector extension is enabled in the target database
- Restrict role flow generation permissions to trusted user types
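Two of these notes can be turned into simple preflight checks. The sketch below assumes a DB-API 2.0 cursor that uses the %s paramstyle (as psycopg and Django's PostgreSQL backend do) and relies on pgvector registering itself under the extension name "vector" in pg_extension. The set of trusted user types is a hypothetical placeholder; substitute the project's real roles.

```python
# pgvector appears in pg_extension under the name "vector" once CREATE EXTENSION has run.
PGVECTOR_CHECK_SQL = "SELECT 1 FROM pg_extension WHERE extname = %s"

# Hypothetical allowlist; replace with the user types the project actually trusts.
TRUSTED_USER_TYPES = frozenset({"admin", "staff"})


def pgvector_enabled(cursor) -> bool:
    """Return True if the pgvector extension is installed in the connected database.

    `cursor` is any DB-API 2.0 cursor using the %s paramstyle.
    """
    cursor.execute(PGVECTOR_CHECK_SQL, ("vector",))
    return cursor.fetchone() is not None


def can_generate_role_flow(user_type: str) -> bool:
    """Deny by default: only explicitly trusted user types may generate role flows."""
    return user_type in TRUSTED_USER_TYPES
```

In a Django shell on the VPS, the database check could be run with a cursor from django.db.connection, for example inside `with connection.cursor() as cursor:`. Keeping the permission check as an explicit allowlist means a newly added user type gets no access until someone deliberately grants it.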