diff --git a/README.md b/README.md index 6d636d2..9ea78d0 100644 --- a/README.md +++ b/README.md @@ -1,66 +1,132 @@ # Dynavera: Distributed Agentic Onboarding System -Dynavera is a multi-agent AI platform designed to automate role-specific onboarding. The system utilizes a distributed architecture to separate application logic from high-latency LLM inference, employing the Model Context Protocol (MCP) for internal data retrieval and Retrieval-Augmented Generation (RAG). +Dynavera is a multi-agent onboarding platform that combines role-specific training flows, retrieval from organization documents, and LLM-powered guidance. The system is intentionally distributed so that app orchestration and heavy inference can run independently. + +Repository: https://git.cs.bham.ac.uk/projects-2025-26/vxn217 + +--- + +## Table of Contents + +- [At a Glance](#at-a-glance) +- [Inspector & Supervisor Notes](#inspector--supervisor-notes) +- [Screenshots](#screenshots) +- [System Architecture (High-Level)](#system-architecture-high-level) +- [Project Goals](#project-goals) +- [Tech Stack](#tech-stack) +- [Repository Guide](#repository-guide) +- [Evaluation Credentials](#evaluation-credentials) +- [Recommended Evaluation Walkthrough](#recommended-evaluation-walkthrough) +- [Local Setup (Cross-Platform)](#local-setup-cross-platform) +- [Common Commands](#common-commands) +- [Additional Documentation](#additional-documentation) + +--- + +## At a Glance + +Dynavera focuses on one question: **how do we deliver onboarding that is role-aware, context-aware, and operationally practical?** + +The platform does this by combining: + +- A Django management layer for accounts, roles, sessions, and APIs +- An agentic orchestration loop over WebSockets for responsive interactions +- A retrieval layer using pgvector and organization-provided documents +- A GPU inference service for chat completions, embeddings, and chunking support + +--- + +## Inspector & Supervisor Notes + +Primary locations relevant to technical quality, architecture reasoning, and evaluation: + +- Setup, context, and high-level flow: this `README.md` +- Architecture notes: `docs/` +- Orchestration runtime: `apps/onboarding/consumers.py` +- Retrieval bridge and tool routing: `apps/onboarding/mcp.py` +- Ingestion and vectorization pipeline: `apps/knowledge/tasks.py` +- Inference service entrypoint: `gpu_server.py` + +Evaluation-relevant themes represented in the codebase: + +- Role-scoped onboarding generation and progression +- Retrieval grounding through uploaded training files +- Separation of management services and inference services +- End-to-end flow from upload to onboarding completion + +--- + +## Screenshots + +Placeholder slots for final screenshots. + +### Home Page + +![Home Page Placeholder](docs/images/home-page-placeholder.png) + +### Organization Page + +![Organization Page Placeholder](docs/images/organization-page-placeholder.png) + +### Onboarding Loading / Generation State + +![Onboarding Loading Placeholder](docs/images/onboarding-loading-placeholder.png) + +### Onboarding Content Flow + +![Onboarding Flow Placeholder](docs/images/onboarding-flow-placeholder.png) + +--- + +## System Architecture (High-Level) + +At a high level, Dynavera is split into a management side and an inference side. The orchestrator coordinates user interaction, tool calls, and model responses between the two. + +![High Level System Architecture](docs/high-level-system-architecture.png) + +For the fuller architecture narrative (runtime flow and component placement), see: + +- [Distributed Runtime Flow](docs/distributed-runtime-flow.md) --- ## Project Goals -- [x] Distributed Orchestration: Implementation of a dual-node system (VPS/GPU) to manage real-time user interaction and heavy computational inference independently. - -- [x] Context-Aware Training: Development of a RAG pipeline that utilizes semantic chunking and vector similarity search to provide role-specific guidance. - -- [x] Agentic Workflow: Utilizing an orchestrator to manage stateful conversations, tool calls, and user progress tracking via WebSockets. - -- [x] Automated Ingestion: Creating a pipeline for converting raw organizational documents (PDF/TXT) into searchable vector embeddings. - ---- - -## System Architecture - - - -The application is split into two primary layers: - -### Management Layer (VPS) -* **Framework**: Django 5.x with Django Channels for WebSocket management. -* **Database**: PostgreSQL with the pgvector extension for semantic storage. -* **Task Queue**: Celery and Redis for asynchronous document processing and ingestion. -* **Internal Routing**: `apps/onboarding/mcp.py` serves as the Model Context Protocol router, bridging the agent to the PostgreSQL vector store. - -### Intelligence Layer (GPU Node) -* **Inference Server**: `gpu_server.py` (FastAPI) located in the root, exposing endpoints for LLM chat completions and embeddings. -* **Semantic Processor**: Custom logic within the inference server for smart chunking that detects topic shifts in text to optimize retrieval accuracy. +- [x] Distributed orchestration across VPS and GPU nodes +- [x] Context-aware onboarding with RAG (semantic chunking + vector search) +- [x] Stateful agent workflow over WebSockets +- [x] Automated ingestion from role training documents (PDF/TXT) --- ## Tech Stack -* **Backend**: Django, Django REST Framework, Django Channels. -* **Frontend**: Vue 3, Vite, Pinia. -* **Database**: PostgreSQL (pgvector). -* **AI/ML**: FastAPI, OpenAI-compatible API structures, Sentence-Transformers. -* **Infrastructure**: Docker, Redis, Celery. +- **Backend**: Django, Django REST Framework, Django Channels +- **Frontend**: Vue 3, Vite, Pinia +- **Database**: PostgreSQL with pgvector +- **AI/ML**: FastAPI, Sentence Transformers, llama.cpp-compatible serving +- **Infra**: Docker, Redis, Celery --- -## Application Structure +## Repository Guide -* **apps.accounts**: Manages User, Organization, and Role models, including invite-based onboarding logic. -* **apps.knowledge**: Handles the RAG pipeline, including TrainingFile management and RoleRagDocument vector storage. -* **apps.onboarding**: Contains the core logic for the onboarding experience: - * `consumers.py`: The Agent Orchestrator managing WebSocket handshakes and session loops. - * `mcp.py`: The internal router for Model Context Protocol tool execution. - * `models.py`: Stores AgentConfig (prompts/tools) and OnboardingSession state. -* **gpu_server.py**: The entry point for the Intelligence Layer, handling embedding generation and LLM inference. +Key areas in the repo: + +- `apps/accounts`: user model, organization/role ownership, membership flows +- `apps/knowledge`: file ingestion, chunking pipeline, vector document persistence +- `apps/onboarding`: role flows, sessions, websocket orchestration, MCP-style tool routing +- `config/`: settings, API/ASGI routing, environment wiring +- `compose/`: development and production deployment manifests +- `gpu_server.py`: inference and embedding service + +For a more detailed breakdown: + +- [Application Structure (Detailed)](docs/application-structure.md) --- -## Instructions for Evaluation - -The system is currently pre-loaded with demonstration data from internal configuration files. - -### Access Credentials +## Evaluation Credentials | Role | Email | Password | | :--- | :--- | :--- | @@ -68,36 +134,107 @@ The system is currently pre-loaded with demonstration data from internal configu | **Manager** | haleisaac@example.com | password | | **User** | j.thompson@example.com | password | -### Recommended Technical Walkthrough - -To verify the integration of the Knowledge Pipeline and the Agentic Orchestrator, follow these steps: - -1. **Environment Setup**: Navigate to https://fyp.viswamedha.com. * -2. **Document Ingestion**: Log in as the **Manager** (haleisaac@example.com). Navigate to the **University of Birmingham** organization. Upload a PDF relevant to a specific role. -3. **Vectorization**: Observe the ingestion status. The system will extract text, send it to the GPU node for semantic chunking, and store the resulting 1536-dimension vectors in PostgreSQL. -4. **Agent Interaction**: Access the **Role Onboarding** interface. Initiate a session. -5. **Retrieval Verification**: This will query the agent regarding specific details within the uploaded PDF. The agent in `consumers.py` will trigger a tool call via `mcp.py`, retrieve the relevant document chunks, and provide a contextualized response via onboarding pages. - -*Note: If the website that I hosted is not accessible, please set up the project locally by following the instructions in the Usage section below. +Manager registration code: `MANAGER2026` --- -## Usage +## Recommended Evaluation Walkthrough -1. Clone the repository. -2. Copy the `.env.example` file to `.env` or create a new `.env` file based on `.env.template`, and change the necessary environment variables. * -3. Deploy via Docker Compose: `docker compose -f compose/dev/docker-compose.yml --env-file .env up -d` in the root directory. -4. Access the frontend at the configured port (usually `localhost:8000`). +1. Open https://fyp.viswamedha.com +2. Log in as **Manager** and open the target organization +3. Upload a role-relevant document (PDF recommended) +4. Wait for ingestion and embedding completion +5. Start role onboarding and trigger generation +6. Check if responses are grounded in uploaded material +7. Optionally review progress details and logs -* Note: If you use a different secret key, when the fyp-django-dev container starts, you will need to execute the following command to reset all accounts to default passwords of "admin" for admin users and "password" for manager and user accounts: +If the hosted deployment is unavailable, local setup is documented below. + +--- + +## Local Setup (Cross-Platform) + +### Prerequisites + +- Docker Engine / Docker Desktop +- NVIDIA drivers + NVIDIA Container Toolkit (for GPU inference) + +### 1) Clone + +```bash +git clone https://git.cs.bham.ac.uk/projects-2025-26/vxn217 +cd vxn217 +``` + +### 2) Create `.env` + +**PowerShell** + +```powershell +Copy-Item .env.template .env +``` + +**CMD** + +```cmd +copy .env.template .env +``` + +**macOS/Linux** + +```bash +cp .env.template .env +``` + +Then update `.env` values for your environment. + +### 3) Start services (development) + +```bash +docker compose -f compose/dev/docker-compose.yml --env-file .env up -d --build +``` + +### 4) Access endpoints + +- App: http://localhost:8000 + +### 5) Optional: reset seeded passwords ```bash docker exec -it fyp-django-dev python manage.py reset_passwords ``` -### Warnings +Reset defaults: -* The development compose is used here to allow HMR and easier debugging. Please only use this file. -* Ensure that a GPU is available and CUDA drivers are properly installed for the inference server to function. -* I have tested this on an RTX 3060 with 12GB VRAM, so I am not sure if it will work on other GPUs. -* There is no guarantee that it will load on a CPU-only machine as the batch size and model parameters are configured for GPU inference. +- Admin users: `admin` +- Manager and user accounts: `password` + +--- + +## Common Commands + +Stop services: + +```bash +docker compose -f compose/dev/docker-compose.yml --env-file .env down +``` + +Tail logs: + +```bash +docker compose -f compose/dev/docker-compose.yml --env-file .env logs -f +``` + +Run migrations: + +```bash +docker exec -it fyp-django-dev python manage.py migrate +``` + +--- + +## Additional Documentation + +- [Distributed Runtime Flow](docs/distributed-runtime-flow.md) +- [Application Structure (Detailed)](docs/application-structure.md) +- [Deployment Topologies](docs/deployment-topologies.md) diff --git a/docs/application-structure.md b/docs/application-structure.md new file mode 100644 index 0000000..39a4f0e --- /dev/null +++ b/docs/application-structure.md @@ -0,0 +1,64 @@ +# Application Structure (Detailed) + +This page expands on where responsibilities live in the codebase. + +## Core Apps + +### `apps.accounts` + +Handles identity and tenancy concerns: + +- User model and role flags +- Organization ownership and membership +- Role assignment and invite flows + +### `apps.knowledge` + +Handles ingestion and retrieval data prep: + +- Upload and tracking of training files +- Content extraction and chunking pipeline +- Embedding persistence in role-scoped vector documents + +### `apps.onboarding` + +Handles the agentic onboarding runtime: + +- Session and flow models +- WebSocket consumer orchestrator +- Tool routing (MCP-style handler) +- Flow/session APIs for frontend integration + +## Infrastructure Modules + +### `config/*` + +Framework-level config and wiring: + +- Django settings +- URL/API routing +- ASGI/Channels entry points +- Celery config + +### `compose/*` + +Environment-specific deployment configuration: + +- Development compose stack +- Production compose stack +- Inference compose profile + +### `gpu_server.py` + +Inference service entry point: + +- Chat completions endpoint +- Embeddings endpoint +- Semantic chunking endpoint +- Health checks and model lifecycle + +## Navigation + +- [Distributed Runtime Flow](distributed-runtime-flow.md) +- [Deployment Topologies](deployment-topologies.md) +- [Project README](../README.md) diff --git a/docs/deployment-topologies.md b/docs/deployment-topologies.md new file mode 100644 index 0000000..f1455bf --- /dev/null +++ b/docs/deployment-topologies.md @@ -0,0 +1,37 @@ +# Deployment Topologies + +This page compares local and distributed deployment shapes. + +## Local Development Topology + +Purpose: fast iteration and debugging. + +- App services run via `compose/dev/docker-compose.yml` +- Django, Celery, Redis, Postgres, Node, and inference can run together +- Suitable for feature work and integration checks + +## Distributed Topology (VPS + GPU Node) + +Purpose: production-like separation of concerns. + +- **VPS node**: web app, orchestration, API, websocket handling, task queue, database +- **GPU node**: dedicated inference service (chat + embeddings + chunking helpers) +- Request direction is primarily **VPS -> GPU** for model tasks + +## Why Split Nodes? + +- Keeps model latency/VRAM pressure away from user/session services +- Allows independent scaling of orchestration and inference +- Improves operational clarity around failures and bottlenecks + +## Operational Notes + +- Confirm inference host/port values in runtime container env +- Confirm pgvector extension is enabled in target database +- Keep role flow generation permissions constrained to trusted user types + +## Navigation + +- [Distributed Runtime Flow](distributed-runtime-flow.md) +- [Application Structure (Detailed)](application-structure.md) +- [Project README](../README.md) diff --git a/docs/distributed-runtime-flow.md b/docs/distributed-runtime-flow.md new file mode 100644 index 0000000..3b764e5 --- /dev/null +++ b/docs/distributed-runtime-flow.md @@ -0,0 +1,54 @@ +# Distributed Runtime Flow + +Dynavera behaves like a streaming agentic system rather than a simple CRUD app. Runtime responsibility is split into three buckets. + +## 1) MCP Surface (Django-side tool layer) + +This is the tool-facing layer that lets the model request structured actions such as retrieval and session updates. + +Typical tool intents: + +- `search_knowledge(query, role_uuid)` +- `get_user_progress(user/session context)` +- `update_session_state(session_uuid, patch)` + +Conceptually, this layer translates model tool calls into standard Django queries and vector lookups. + +## 2) Orchestrator (Channels consumer + async control loop) + +The orchestrator lives in the WebSocket runtime and coordinates each user request lifecycle. + +Typical interaction path: + +1. User sends message over WebSocket +2. Orchestrator builds/updates context +3. Orchestrator calls inference endpoint +4. Model requests tool calls when needed +5. Orchestrator executes tool calls and continues generation +6. Streamed/assembled response returns to user + +This is the central control plane for session continuity, tool usage, and response streaming. + +## 3) GPU Inference Pipe (passive engine) + +The GPU service is designed as a passive inference engine: + +- Receives prompts/inference payloads +- Produces chat/embedding outputs +- Does not initiate calls back into the VPS + +Using OpenAI-style request/response patterns keeps integration predictable. + +## Interface Summary + +| Component | Typical Path / Endpoint | Role | +| :--- | :--- | :--- | +| MCP Surface | Internal Django tool handlers (and/or MCP endpoint) | Data/tool translation | +| Orchestrator | `apps.onboarding.consumers` | Coordination + streaming | +| GPU Inference | `gpu_server.py` HTTP endpoints | Generation + embeddings | + +## Navigation + +- [Application Structure (Detailed)](application-structure.md) +- [Deployment Topologies](deployment-topologies.md) +- [Project README](../README.md) diff --git a/docs/high-level-system-architecture.png b/docs/high-level-system-architecture.png new file mode 100644 index 0000000..1c0763e Binary files /dev/null and b/docs/high-level-system-architecture.png differ