diff --git a/README.md b/README.md
index 6d636d2..9ea78d0 100644
--- a/README.md
+++ b/README.md
@@ -1,66 +1,132 @@
 # Dynavera: Distributed Agentic Onboarding System
 
-Dynavera is a multi-agent AI platform designed to automate role-specific onboarding. The system utilizes a distributed architecture to separate application logic from high-latency LLM inference, employing the Model Context Protocol (MCP) for internal data retrieval and Retrieval-Augmented Generation (RAG).
+Dynavera is a multi-agent onboarding platform that combines role-specific training flows, retrieval from organization documents, and LLM-powered guidance. The system is intentionally distributed so that app orchestration and heavy inference can run independently.
+
+Repository: https://git.cs.bham.ac.uk/projects-2025-26/vxn217
+
+---
+
+## Table of Contents
+
+- [At a Glance](#at-a-glance)
+- [Inspector & Supervisor Notes](#inspector--supervisor-notes)
+- [Screenshots](#screenshots)
+- [System Architecture (High-Level)](#system-architecture-high-level)
+- [Project Goals](#project-goals)
+- [Tech Stack](#tech-stack)
+- [Repository Guide](#repository-guide)
+- [Evaluation Credentials](#evaluation-credentials)
+- [Recommended Evaluation Walkthrough](#recommended-evaluation-walkthrough)
+- [Local Setup (Cross-Platform)](#local-setup-cross-platform)
+- [Common Commands](#common-commands)
+- [Additional Documentation](#additional-documentation)
+
+---
+
+## At a Glance
+
+Dynavera focuses on one question: **how do we deliver onboarding that is role-aware, context-aware, and operationally practical?**
+
+The platform does this by combining:
+
+- A Django management layer for accounts, roles, sessions, and APIs
+- An agentic orchestration loop over WebSockets for responsive interactions
+- A retrieval layer using pgvector and organization-provided documents
+- A GPU inference service for chat completions, embeddings, and chunking support
+
+---
+
+## Inspector & Supervisor Notes
+
+Primary locations relevant to technical quality, architecture reasoning, and evaluation:
+
+- Setup, context, and high-level flow: this `README.md`
+- Architecture notes: `docs/`
+- Orchestration runtime: `apps/onboarding/consumers.py`
+- Retrieval bridge and tool routing: `apps/onboarding/mcp.py`
+- Ingestion and vectorization pipeline: `apps/knowledge/tasks.py`
+- Inference service entrypoint: `gpu_server.py`
+
+Evaluation-relevant themes represented in the codebase:
+
+- Role-scoped onboarding generation and progression
+- Retrieval grounding through uploaded training files
+- Separation of management services and inference services
+- End-to-end flow from upload to onboarding completion
+
+---
+
+## Screenshots
+
+Placeholder slots for final screenshots.
+
+### Home Page
+
+![Home Page Placeholder](docs/images/home-page-placeholder.png)
+
+### Organization Page
+
+![Organization Page Placeholder](docs/images/organization-page-placeholder.png)
+
+### Onboarding Loading / Generation State
+
+![Onboarding Loading Placeholder](docs/images/onboarding-loading-placeholder.png)
+
+### Onboarding Content Flow
+
+![Onboarding Flow Placeholder](docs/images/onboarding-flow-placeholder.png)
+
+---
+
+## System Architecture (High-Level)
+
+At a high level, Dynavera is split into a management side and an inference side. The orchestrator coordinates user interaction, tool calls, and model responses between the two.
+
+![High Level System Architecture](docs/high-level-system-architecture.png)
+
+For the fuller architecture narrative (runtime flow and component placement), see:
+
+- [Distributed Runtime Flow](docs/distributed-runtime-flow.md)
 
 ---
 
 ## Project Goals
 
-- [x] Distributed Orchestration: Implementation of a dual-node system (VPS/GPU) to manage real-time user interaction and heavy computational inference independently.
-
-- [x] Context-Aware Training: Development of a RAG pipeline that utilizes semantic chunking and vector similarity search to provide role-specific guidance.
-
-- [x] Agentic Workflow: Utilizing an orchestrator to manage stateful conversations, tool calls, and user progress tracking via WebSockets.
-
-- [x] Automated Ingestion: Creating a pipeline for converting raw organizational documents (PDF/TXT) into searchable vector embeddings.
-
----
-
-## System Architecture
-
-
-
-The application is split into two primary layers:
-
-### Management Layer (VPS)
-* **Framework**: Django 5.x with Django Channels for WebSocket management.
-* **Database**: PostgreSQL with the pgvector extension for semantic storage.
-* **Task Queue**: Celery and Redis for asynchronous document processing and ingestion.
-* **Internal Routing**: `apps/onboarding/mcp.py` serves as the Model Context Protocol router, bridging the agent to the PostgreSQL vector store.
-
-### Intelligence Layer (GPU Node)
-* **Inference Server**: `gpu_server.py` (FastAPI) located in the root, exposing endpoints for LLM chat completions and embeddings.
-* **Semantic Processor**: Custom logic within the inference server for smart chunking that detects topic shifts in text to optimize retrieval accuracy.
+- [x] Distributed orchestration across VPS and GPU nodes
+- [x] Context-aware onboarding with RAG (semantic chunking + vector search)
+- [x] Stateful agent workflow over WebSockets
+- [x] Automated ingestion from role training documents (PDF/TXT)
 
 ---
 
 ## Tech Stack
 
-* **Backend**: Django, Django REST Framework, Django Channels.
-* **Frontend**: Vue 3, Vite, Pinia.
-* **Database**: PostgreSQL (pgvector).
-* **AI/ML**: FastAPI, OpenAI-compatible API structures, Sentence-Transformers.
-* **Infrastructure**: Docker, Redis, Celery.
+- **Backend**: Django, Django REST Framework, Django Channels
+- **Frontend**: Vue 3, Vite, Pinia
+- **Database**: PostgreSQL with pgvector
+- **AI/ML**: FastAPI, Sentence Transformers, llama.cpp-compatible serving
+- **Infra**: Docker, Redis, Celery
 
 ---
 
-## Application Structure
+## Repository Guide
 
-* **apps.accounts**: Manages User, Organization, and Role models, including invite-based onboarding logic.
-* **apps.knowledge**: Handles the RAG pipeline, including TrainingFile management and RoleRagDocument vector storage.
-* **apps.onboarding**: Contains the core logic for the onboarding experience:
-    * `consumers.py`: The Agent Orchestrator managing WebSocket handshakes and session loops.
-    * `mcp.py`: The internal router for Model Context Protocol tool execution.
-    * `models.py`: Stores AgentConfig (prompts/tools) and OnboardingSession state.
-* **gpu_server.py**: The entry point for the Intelligence Layer, handling embedding generation and LLM inference.
+Key areas in the repo:
+
+- `apps/accounts`: user model, organization/role ownership, membership flows
+- `apps/knowledge`: file ingestion, chunking pipeline, vector document persistence
+- `apps/onboarding`: role flows, sessions, WebSocket orchestration, MCP-style tool routing
+- `config/`: settings, API/ASGI routing, environment wiring
+- `compose/`: development and production deployment manifests
+- `gpu_server.py`: inference and embedding service
+
+For a more detailed breakdown:
+
+- [Application Structure (Detailed)](docs/application-structure.md)
 
 ---
 
-## Instructions for Evaluation
-
-The system is currently pre-loaded with demonstration data from internal configuration files.
-
-### Access Credentials
+## Evaluation Credentials
 
 | Role | Email | Password |
 | :--- | :--- | :--- |
@@ -68,36 +134,107 @@ The system is currently pre-loaded with demonstration data from internal configu
 | **Manager** | haleisaac@example.com | password |
 | **User** | j.thompson@example.com | password |
 
-### Recommended Technical Walkthrough
-
-To verify the integration of the Knowledge Pipeline and the Agentic Orchestrator, follow these steps:
-
-1. **Environment Setup**: Navigate to https://fyp.viswamedha.com. *
-2. **Document Ingestion**: Log in as the **Manager** (haleisaac@example.com). Navigate to the **University of Birmingham** organization. Upload a PDF relevant to a specific role.
-3. **Vectorization**: Observe the ingestion status. The system will extract text, send it to the GPU node for semantic chunking, and store the resulting 1536-dimension vectors in PostgreSQL.
-4. **Agent Interaction**: Access the **Role Onboarding** interface. Initiate a session.
-5. **Retrieval Verification**: This will query the agent regarding specific details within the uploaded PDF. The agent in `consumers.py` will trigger a tool call via `mcp.py`, retrieve the relevant document chunks, and provide a contextualized response via onboarding pages.
-
-*Note: If the website that I hosted is not accessible, please set up the project locally by following the instructions in the Usage section below.
+Manager registration code: `MANAGER2026`
 
 ---
 
-## Usage
+## Recommended Evaluation Walkthrough
 
-1. Clone the repository.
-2. Copy the `.env.example` file to `.env` or create a new `.env` file based on `.env.template`, and change the necessary environment variables. *
-3. Deploy via Docker Compose: `docker compose -f compose/dev/docker-compose.yml --env-file .env up -d` in the root directory.
-4. Access the frontend at the configured port (usually `localhost:8000`).
+1. Open https://fyp.viswamedha.com
+2. Log in as **Manager** and open the target organization
+3. Upload a role-relevant document (PDF recommended)
+4. Wait for ingestion and embedding completion
+5. Start role onboarding and trigger generation
+6. Check whether responses are grounded in the uploaded material
+7. Optionally review progress details and logs
 
-* Note: If you use a different secret key, when the fyp-django-dev container starts, you will need to execute the following command to reset all accounts to default passwords of "admin" for admin users and "password" for manager and user accounts:
+If the hosted deployment is unavailable, local setup is documented below.
+
+---
+
+## Local Setup (Cross-Platform)
+
+### Prerequisites
+
+- Docker Engine / Docker Desktop
+- NVIDIA drivers + NVIDIA Container Toolkit (for GPU inference)
+
+### 1) Clone
+
+```bash
+git clone https://git.cs.bham.ac.uk/projects-2025-26/vxn217
+cd vxn217
+```
+
+### 2) Create `.env`
+
+**PowerShell**
+
+```powershell
+Copy-Item .env.template .env
+```
+
+**CMD**
+
+```cmd
+copy .env.template .env
+```
+
+**macOS/Linux**
+
+```bash
+cp .env.template .env
+```
+
+Then update `.env` values for your environment.
+
+### 3) Start services (development)
+
+```bash
+docker compose -f compose/dev/docker-compose.yml --env-file .env up -d --build
+```
+
+### 4) Access endpoints
+
+- App: http://localhost:8000
+
+### 5) Optional: reset seeded passwords
 
 ```bash
 docker exec -it fyp-django-dev python manage.py reset_passwords
 ```
 
-### Warnings
+This is needed if you use a different secret key. Passwords after the reset:
 
-* The development compose is used here to allow HMR and easier debugging. Please only use this file.
-* Ensure that a GPU is available and CUDA drivers are properly installed for the inference server to function.
-* I have tested this on an RTX 3060 with 12GB VRAM, so I am not sure if it will work on other GPUs. 
-* There is no guarantee that it will load on a CPU-only machine as the batch size and model parameters are configured for GPU inference.
+- Admin users: `admin`
+- Manager and user accounts: `password`
+
+---
+
+## Common Commands
+
+Stop services:
+
+```bash
+docker compose -f compose/dev/docker-compose.yml --env-file .env down
+```
+
+Tail logs:
+
+```bash
+docker compose -f compose/dev/docker-compose.yml --env-file .env logs -f
+```
+
+Run migrations:
+
+```bash
+docker exec -it fyp-django-dev python manage.py migrate
+```
+
+---
+
+## Additional Documentation
+
+- [Distributed Runtime Flow](docs/distributed-runtime-flow.md)
+- [Application Structure (Detailed)](docs/application-structure.md)
+- [Deployment Topologies](docs/deployment-topologies.md)
diff --git a/docs/application-structure.md b/docs/application-structure.md
new file mode 100644
index 0000000..39a4f0e
--- /dev/null
+++ b/docs/application-structure.md
@@ -0,0 +1,64 @@
+# Application Structure (Detailed)
+
+This page expands on where responsibilities live in the codebase.
+
+## Core Apps
+
+### `apps.accounts`
+
+Handles identity and tenancy concerns:
+
+- User model and role flags
+- Organization ownership and membership
+- Role assignment and invite flows
+
+### `apps.knowledge`
+
+Handles ingestion and retrieval data prep:
+
+- Upload and tracking of training files
+- Content extraction and chunking pipeline
+- Embedding persistence in role-scoped vector documents
+
+### `apps.onboarding`
+
+Handles the agentic onboarding runtime:
+
+- Session and flow models
+- WebSocket consumer orchestrator
+- Tool routing (MCP-style handler)
+- Flow/session APIs for frontend integration
+
+## Infrastructure Modules
+
+### `config/*`
+
+Framework-level config and wiring:
+
+- Django settings
+- URL/API routing
+- ASGI/Channels entry points
+- Celery config
+
+### `compose/*`
+
+Environment-specific deployment configuration:
+
+- Development compose stack
+- Production compose stack
+- Inference compose profile
+
+### `gpu_server.py`
+
+Inference service entry point:
+
+- Chat completions endpoint
+- Embeddings endpoint
+- Semantic chunking endpoint
+- Health checks and model lifecycle
+
+## Navigation
+
+- [Distributed Runtime Flow](distributed-runtime-flow.md)
+- [Deployment Topologies](deployment-topologies.md)
+- [Project README](../README.md)
diff --git a/docs/deployment-topologies.md b/docs/deployment-topologies.md
new file mode 100644
index 0000000..f1455bf
--- /dev/null
+++ b/docs/deployment-topologies.md
@@ -0,0 +1,37 @@
+# Deployment Topologies
+
+This page compares local and distributed deployment shapes.
+
+## Local Development Topology
+
+Purpose: fast iteration and debugging.
+
+- App services run via `compose/dev/docker-compose.yml`
+- Django, Celery, Redis, Postgres, Node, and inference can run together
+- Suitable for feature work and integration checks
+
+## Distributed Topology (VPS + GPU Node)
+
+Purpose: production-like separation of concerns.
+
+- **VPS node**: web app, orchestration, API, WebSocket handling, task queue, database
+- **GPU node**: dedicated inference service (chat + embeddings + chunking helpers)
+- Request direction is primarily **VPS -> GPU** for model tasks
+
+## Why Split Nodes?
+
+- Keeps model latency/VRAM pressure away from user/session services
+- Allows independent scaling of orchestration and inference
+- Improves operational clarity around failures and bottlenecks
+
+## Operational Notes
+
+- Confirm the inference host/port values in the runtime container environment
+- Confirm the pgvector extension is enabled in the target database
+- Keep role flow generation permissions constrained to trusted user types
+
+## Navigation
+
+- [Distributed Runtime Flow](distributed-runtime-flow.md)
+- [Application Structure (Detailed)](application-structure.md)
+- [Project README](../README.md)
diff --git a/docs/distributed-runtime-flow.md b/docs/distributed-runtime-flow.md
new file mode 100644
index 0000000..3b764e5
--- /dev/null
+++ b/docs/distributed-runtime-flow.md
@@ -0,0 +1,54 @@
+# Distributed Runtime Flow
+
+Dynavera behaves like a streaming agentic system rather than a simple CRUD app. Runtime responsibility is split into three buckets.
+
+## 1) MCP Surface (Django-side tool layer)
+
+This is the tool-facing layer that lets the model request structured actions such as retrieval and session updates.
+
+Typical tool intents:
+
+- `search_knowledge(query, role_uuid)`
+- `get_user_progress(user/session context)`
+- `update_session_state(session_uuid, patch)`
+
+Conceptually, this layer translates model tool calls into standard Django queries and vector lookups.
+
+## 2) Orchestrator (Channels consumer + async control loop)
+
+The orchestrator lives in the WebSocket runtime and coordinates each user request lifecycle.
+
+Typical interaction path:
+
+1. User sends message over WebSocket
+2. Orchestrator builds/updates context
+3. Orchestrator calls inference endpoint
+4. Model requests tool calls when needed
+5. Orchestrator executes tool calls and continues generation
+6. Streamed/assembled response returns to user
+
+This is the central control plane for session continuity, tool usage, and response streaming.
+
+## 3) GPU Inference Pipe (passive engine)
+
+The GPU service is designed as a passive inference engine:
+
+- Receives prompts/inference payloads
+- Produces chat/embedding outputs
+- Does not initiate calls back into the VPS
+
+Using OpenAI-style request/response patterns keeps integration predictable.
+
+## Interface Summary
+
+| Component | Typical Path / Endpoint | Role |
+| :--- | :--- | :--- |
+| MCP Surface | Internal Django tool handlers (and/or MCP endpoint) | Data/tool translation |
+| Orchestrator | `apps.onboarding.consumers` | Coordination + streaming |
+| GPU Inference | `gpu_server.py` HTTP endpoints | Generation + embeddings |
+
+## Navigation
+
+- [Application Structure (Detailed)](application-structure.md)
+- [Deployment Topologies](deployment-topologies.md)
+- [Project README](../README.md)
diff --git a/docs/high-level-system-architecture.png b/docs/high-level-system-architecture.png
new file mode 100644
index 0000000..1c0763e
Binary files /dev/null and b/docs/high-level-system-architecture.png differ