# Deployment Topologies

This page compares local and distributed deployment shapes.

## Local Development Topology

Purpose: fast iteration and debugging.

- App services run via `compose/dev/docker-compose.yml`
- Django, Celery, Redis, Postgres, Node, and inference can run together
- Suitable for feature work and integration checks

## Distributed Topology (VPS + GPU Node)

Purpose: production-like separation of concerns.

- **VPS node**: web app, orchestration, API, websocket handling, task queue, database
- **GPU node**: dedicated inference service (chat + embeddings + chunking helpers)
- Request direction is primarily **VPS -> GPU** for model tasks

## Why Split Nodes?

- Keeps model latency/VRAM pressure away from user/session services
- Allows independent scaling of orchestration and inference
- Improves operational clarity around failures and bottlenecks

## Operational Notes

- Confirm inference host/port/protocol values in runtime container env
- Set `INFERENCE_USERNAME` and `INFERENCE_PASSWORD`; the GPU node requires HTTP Basic Auth on all endpoints
- Confirm pgvector extension is enabled in target database
- Keep role flow generation permissions constrained to trusted user types

## Navigation

- [Distributed Runtime Flow](distributed-runtime-flow.md)
- [Application Structure (Detailed)](application-structure.md)
- [Project README](../README.md)
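
The first two operational notes above (inference host/port/protocol and Basic Auth credentials) can be checked with a small preflight helper on the VPS side. The following is a minimal sketch, not the project's actual code: `INFERENCE_USERNAME` and `INFERENCE_PASSWORD` come from this page, while `INFERENCE_PROTOCOL`, `INFERENCE_HOST`, `INFERENCE_PORT`, and both function names are hypothetical and should be replaced with whatever the runtime container actually uses.

```python
import base64


def inference_base_url(env):
    """Build the GPU node base URL from a mapping of environment values.

    INFERENCE_PROTOCOL, INFERENCE_HOST and INFERENCE_PORT are hypothetical
    variable names used only for this illustration.
    """
    protocol = env.get("INFERENCE_PROTOCOL", "http")
    host = env.get("INFERENCE_HOST")
    port = env.get("INFERENCE_PORT")
    if not host or not port:
        raise ValueError("inference host and port must both be set")
    return f"{protocol}://{host}:{port}"


def basic_auth_header(env):
    """Return an HTTP Basic Auth header built from the inference credentials."""
    username = env.get("INFERENCE_USERNAME")
    password = env.get("INFERENCE_PASSWORD")
    if not username or not password:
        raise ValueError("INFERENCE_USERNAME and INFERENCE_PASSWORD must both be set")
    # Basic Auth is the base64 encoding of "username:password".
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


# Example with placeholder values; in a container, pass os.environ instead.
example_env = {
    "INFERENCE_HOST": "gpu-node.internal",
    "INFERENCE_PORT": "8000",
    "INFERENCE_USERNAME": "user",
    "INFERENCE_PASSWORD": "pass",
}
print(inference_base_url(example_env))
print(basic_auth_header(example_env))
```

Both helpers take a plain mapping rather than reading `os.environ` directly, so the same check can run against a container's real environment or against test values. Failing early with a `ValueError` surfaces a misconfigured VPS node before the first model request reaches the GPU node.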