# Deployment Topologies
This page compares local and distributed deployment shapes.
## Local Development Topology
Purpose: fast iteration and debugging.
- App services run via `compose/dev/docker-compose.yml`
- Django, Celery, Redis, Postgres, Node, and the inference service can all run together on one machine
- Suitable for feature work and integration checks
## Distributed Topology (VPS + GPU Node)
Purpose: production-like separation of concerns.
- **VPS node**: web app, orchestration, API, WebSocket handling, task queue, database
- **GPU node**: dedicated inference service (chat + embeddings + chunking helpers)
- Request direction is primarily **VPS -> GPU** for model tasks
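
The VPS -> GPU request direction can be sketched as a small client that runs on the VPS side. This is a minimal illustration, not the project's actual client: the environment variable names `INFERENCE_HOST` and `INFERENCE_PORT`, the `/embed` path, and the JSON payload shape are all assumptions chosen for the example.

```python
import json
import os
import urllib.request


def inference_base_url(env=None):
    """Build the GPU node's base URL from environment variables.

    INFERENCE_HOST and INFERENCE_PORT are hypothetical names; substitute
    whatever the runtime container actually defines.
    """
    env = os.environ if env is None else env
    host = env.get("INFERENCE_HOST", "127.0.0.1")
    port = int(env.get("INFERENCE_PORT", "8000"))
    return f"http://{host}:{port}"


def build_embed_request(texts, env=None):
    """Prepare (but do not send) a POST to a hypothetical /embed route."""
    body = json.dumps({"texts": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=inference_base_url(env) + "/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, env=None, timeout=10.0):
    """Send the request from the VPS to the GPU node and decode the JSON reply."""
    request = build_embed_request(texts, env)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping an explicit `timeout` on the call means a slow or unreachable GPU node surfaces as an error on the VPS rather than as a hung worker.
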
## Why Split Nodes?
- Keeps model latency and VRAM pressure away from user-facing and session services
- Allows independent scaling of orchestration and inference
- Improves operational clarity around failures and bottlenecks
## Operational Notes
- Confirm the inference host and port values in the runtime container environment
- Confirm the pgvector extension is enabled in the target database
- Restrict role flow generation permissions to trusted user types
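
The first two checks in the list above lend themselves to a small preflight script. The sketch below is illustrative only: `port_is_reachable` and `pgvector_enabled` are hypothetical helper names, and `cursor` is assumed to be any DB-API 2.0 cursor (for example one from psycopg) already connected to the target database. It relies on pgvector registering itself in `pg_extension` under the name `vector`.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pgvector_enabled(cursor):
    """Return True if the pgvector extension is installed in the current database.

    `cursor` is any DB-API 2.0 cursor connected to the target database.
    """
    cursor.execute("SELECT 1 FROM pg_extension WHERE extname = 'vector'")
    return cursor.fetchone() is not None
```

If `pgvector_enabled` returns `False`, running `CREATE EXTENSION IF NOT EXISTS vector;` as a sufficiently privileged role is the usual way to enable it.
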
## Navigation
- [Distributed Runtime Flow](distributed-runtime-flow.md)
- [Application Structure (Detailed)](application-structure.md)
- [Project README](../README.md)