# Deployment Topologies

This page compares the local development topology with the distributed (VPS + GPU node) topology.

## Local Development Topology

Purpose: fast iteration and debugging.

- App services run via `compose/dev/docker-compose.yml`
- Django, Celery, Redis, Postgres, Node, and inference can run together
- Suitable for feature work and integration checks

## Distributed Topology (VPS + GPU Node)

Purpose: production-like separation of concerns.

- **VPS node**: web app, orchestration, API, websocket handling, task queue, database
- **GPU node**: dedicated inference service (chat + embeddings + chunking helpers)
- Request direction is primarily **VPS -> GPU** for model tasks
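
As a rough illustration of the VPS -> GPU direction, the sketch below builds (but does not send) an authenticated request from the VPS to the inference service. `INFERENCE_USERNAME` and `INFERENCE_PASSWORD` come from the operational notes on this page; the variable names `INFERENCE_PROTOCOL`, `INFERENCE_HOST`, and `INFERENCE_PORT`, the `build_inference_request` helper, and any endpoint path or payload passed to it are assumptions for illustration, not the project's actual configuration or client code.

```python
import base64
import json
import urllib.request


def build_inference_request(env, path, payload):
    """Build (but do not send) a POST request from the VPS to the GPU node.

    `env` is any mapping of settings, for example os.environ.
    """
    # Assumed variable names; check the runtime container env for the real ones.
    protocol = env.get("INFERENCE_PROTOCOL", "http")
    host = env["INFERENCE_HOST"]
    port = env["INFERENCE_PORT"]
    url = f"{protocol}://{host}:{port}/{path.lstrip('/')}"

    # HTTP Basic Auth: base64("username:password") in the Authorization header.
    credentials = f"{env['INFERENCE_USERNAME']}:{env['INFERENCE_PASSWORD']}"
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")

    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen(request, timeout=...)`. Building the request separately from sending it keeps the URL and auth logic easy to test without a running GPU node.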

## Why Split Nodes?

- Keeps model latency and VRAM pressure from affecting user-facing and session services
- Allows independent scaling of orchestration and inference
- Improves operational clarity around failures and bottlenecks

## Operational Notes

- Confirm the inference host, port, and protocol values in the runtime container's environment
- Set `INFERENCE_USERNAME` and `INFERENCE_PASSWORD`; the GPU node requires HTTP Basic Auth on all endpoints
- Confirm the pgvector extension is enabled in the target database
- Keep role flow generation permissions restricted to trusted user types
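
Some of these checks can be scripted as a preflight step. The following is a minimal sketch, not the project's actual tooling: `missing_env` and `pgvector_enabled` are hypothetical helpers, and the host, port, and protocol variable names are assumed. The SQL relies on pgvector registering itself in `pg_extension` under the name `vector`.

```python
# INFERENCE_USERNAME / INFERENCE_PASSWORD are named on this page;
# the host, port, and protocol variable names are assumptions.
REQUIRED_ENV = (
    "INFERENCE_HOST",
    "INFERENCE_PORT",
    "INFERENCE_PROTOCOL",
    "INFERENCE_USERNAME",
    "INFERENCE_PASSWORD",
)

# pgvector installs as the Postgres extension named "vector".
PGVECTOR_CHECK_SQL = "SELECT 1 FROM pg_extension WHERE extname = 'vector'"


def missing_env(env):
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in REQUIRED_ENV if not env.get(name)]


def pgvector_enabled(cursor):
    """Return True if the extension exists in the connected database.

    `cursor` is any DB-API 2.0 cursor (for example from psycopg or
    Django's `connection.cursor()`).
    """
    cursor.execute(PGVECTOR_CHECK_SQL)
    return cursor.fetchone() is not None
```

For example, `missing_env(os.environ)` inside the running container lists anything still unset, and `pgvector_enabled(cursor)` can be called with a cursor opened against the target database.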

## Navigation

- [Distributed Runtime Flow](distributed-runtime-flow.md)
- [Application Structure (Detailed)](application-structure.md)
- [Project README](../README.md)