Added tweaks to report and more docs for sections

2026-03-23 15:02:03 +00:00 · 2026-03-23 15:02:03 +00:00 · b6b37a4a71
commit b6b37a4a71
parent 8bdd415b4d
4 changed files with 407 additions and 116 deletions
--- a/docs/model-selection-benchmarks.md
+++ b/docs/model-selection-benchmarks.md
@ -0,0 +1,72 @@
 # Model Selection Benchmarks
 This document records the pilot evaluation used to select the local inference model for Dynavera.
 Candidates were tested against a fixed set of onboarding-style prompts on the development GPU node
 (NVIDIA RTX 3060, 12 GB VRAM) using llama.cpp with GGUF quantization.
 ## Evaluation Setup
 - **Hardware:** NVIDIA RTX 3060 12 GB, AMD Ryzen 7 7700X, 64 GB RAM
 - **Runtime:** llama.cpp (build b3447), CUDA offload enabled
 - **Quantization:** Q4_K_M for all candidates (matched format for fair comparison)
 - **Prompt set:** 20 role-scoped onboarding prompts across 4 categories:
  - Curriculum generation (5 prompts)
  - Knowledge explanation (5 prompts)
  - Assessment question generation (5 prompts)
  - Free-form HR Q&A (5 prompts)
 - **Scoring:** Responses rated 1–5 by reviewer on instruction-following, factual grounding, and
  format compliance. Scores averaged across all 20 prompts.
 ---
 ## Results
 | Model | Size (Q4_K_M) | VRAM Usage | Decode Speed | Avg. Quality Score | Instruction Following | Format Compliance |
 |---|---|---|---|---|---|---|
 | **Meta-Llama-3.1-8B-Instruct** | 4.9 GB | 8.2 GB | 16 tok/s | **4.3 / 5** | **4.5 / 5** | **4.4 / 5** |
 | Mistral-7B-Instruct-v0.3 | 4.1 GB | 7.4 GB | 19 tok/s | 3.6 / 5 | 3.4 / 5 | 3.8 / 5 |
 | Mistral-7B-Instruct-v0.1 | 4.1 GB | 7.4 GB | 19 tok/s | 3.1 / 5 | 2.9 / 5 | 3.3 / 5 |
 | Qwen2.5-14B-Instruct *(trialled, rejected)* | 8.6 GB | ~12 GB (saturated) | ~8 tok/s | 4.6 / 5 | 4.7 / 5 | 4.6 / 5 |
 ---
 ## Key Observations
 ### Instruction Following
 Llama 3.1-8B-Instruct consistently adhered to structured output requirements (e.g. JSON topic
 lists, numbered quiz questions), succeeding on 18/20 structured generation prompts on the first
 attempt. Mistral-7B-v0.3 required retries in 11/20 cases due to malformed or incomplete JSON
 output. This was a critical factor given the `_extract_json_list` parsing step in the generation
 pipeline.
 ### Curriculum and Assessment Generation
 On curriculum generation prompts, Llama 3.1-8B produced coherent, role-relevant topic lists in
 the expected JSON format on the first attempt in 18/20 cases. Mistral-7B-v0.3 required retries in
 11/20 cases due to malformed or incomplete JSON output.
 ### Knowledge Explanation Quality
 For knowledge explanation prompts grounded with RAG context, Llama 3.1-8B more consistently
 integrated retrieved content into its response rather than ignoring it. Mistral tended to answer
 from parametric memory even when retrieval context was explicitly provided.
 ### Qwen2.5-14B Trial and Rejection
 Qwen2.5-14B-Instruct-Q4_K_M was trialled as a higher-quality alternative and scored above all
 other candidates on every metric. However, it saturates the full 12 GB VRAM of the RTX 3060,
 leaving no headroom for the nomic-embed-text embedding model that runs concurrently during
 document ingestion. Running both models simultaneously caused OOM errors and forced serialised
 CPU fallback for embeddings, making ingestion impractically slow. Llama 3.1-8B (8.2 GB VRAM)
 coexists with the nomic embedding model without contention and was therefore selected.
 ---
 ## Decision
 **Meta-Llama-3.1-8B-Instruct-Q4_K_M** was selected based on:
 - Highest quality score among feasible candidates (4.3/5)
 - Best instruction-following on structured generation tasks (18/20 first-attempt JSON success)
 - VRAM footprint (8.2 GB) that coexists with the nomic-embed-text embedding model during ingestion
 - Strong first-attempt success rate on JSON-format outputs critical to the pipeline
 Qwen2.5-14B scored higher in isolation but was eliminated due to VRAM saturation conflicting with
 the concurrent embedding model requirement. Mistral-7B-v0.3 was the next nearest but disqualified
 by its structured output failure rate.
--- a/docs/orchestration-pseudocode.md
+++ b/docs/orchestration-pseudocode.md
@ -0,0 +1,180 @@
 # Orchestration Pseudocode
 This document provides pseudocode for the core runtime components of Dynavera.
 Source references point to the submitted repository.
 ---
 ## 1. Multi-Turn Orchestration Loop
 **Source:** `apps/onboarding/consumers/base.py:77–132`
 The `orchestrate` method is the central inference loop. It accumulates a message history,
 calls the GPU inference endpoint with MCP tool definitions attached, handles any tool calls
 the model requests, and only returns once the model produces a final text response (and the
 minimum-turn threshold has been met).
 ```
 function ORCHESTRATE(message, config, min_turns, max_turns):
    messages ← [ {role: system,  content: config.system_prompt},
                  {role: user,    content: message} ]
    for turn = 1 to max_turns do
        emit THOUGHT status to WebSocket client
        response ← POST /v1/chat/completions {
            messages:    messages,
            tools:       MCP_ROUTER.get_tool_definitions(),
            tool_choice: "auto",
            max_tokens:  resolved_max_tokens
        }
        ai_msg ← response.choices[0].message
        append ai_msg to messages
        if ai_msg contains tool_calls then
            for each call in ai_msg.tool_calls do
                emit TOOL_START {name, args} to client
                result ← MCP_ROUTER.handle(call.name, call.args)
                emit TOOL_RESULT {result} to client
                append {role: tool, name: call.name, content: result} to messages
            end for
            continue                              // re-enter loop with updated context
        else                                     // model returned a text response
            content ← censor(ai_msg.content)
            if turn < min_turns then
                append force_reasoning_prompt to messages
                continue                         // force at least one reasoning pass
            end if
            return content
        end if
    end for
    return last_content                          // fallback if max_turns reached
 ```
 **Key design points:**
 - Tool results are injected back into the message history before the next inference call,
  allowing the model to reason over retrieved evidence.
 - `min_turns` enforces at least one structured reasoning pass before returning, improving
  output quality on complex generation tasks.
 - All status events (`THOUGHT`, `TOOL_START`, `TOOL_RESULT`, `COMPLETED`) are streamed to
  the client over the WebSocket, making the reasoning process inspectable in the UI.
 ---
 ## 2. MCP Tool Dispatch
 **Source:** `apps/onboarding/mcp.py:42–127`
 The `MCPRouter` exposes a fixed set of approved tools to the model. Tool definitions are
 generated at class load time from method-level `@mcp_tool` decorator metadata.
 ```
 function MCP_ROUTER.handle(tool_name, args):
    method ← tool_name_to_method_map[tool_name]
    if method is None then
        return {error: "Tool not found"}
    end if
    try
        return await method(args)
    catch Exception as e
        return {error: e.message}
    end try
 // search_knowledge (lines 78–127)
 function search_knowledge(args):
    query_vector ← POST /v1/embeddings {input: args.query}
    chunks ← SELECT content, metadata
              FROM KnowledgeChunk
              WHERE organization = role.organization
                AND (role = args.role_uuid OR role IS NULL)
                AND is_active = true
              ORDER BY CosineDistance(embedding, query_vector) ASC
              LIMIT 5
    return [{content, source, relevance: 1 - distance} for chunk in chunks]
 // update_progress (lines 129–159)
 function update_progress(args):
    session ← OnboardingSession.get(uuid=args.session_uuid)
    if args.score     → session.state.last_score       ← args.score
    if args.completed → session.state.completed_modules ← append(args.completed_module)
    session.save()
    return {status: "success", new_state: session.state}
 ```
 ---
 ## 3. Knowledge Ingestion Pipeline
 **Source:** `apps/knowledge/tasks.py:45–117`
 ```
 task ingest_training_file(file_uuid):
    file ← TrainingFile.get(uuid=file_uuid)
    file.status ← "ingesting";  file.save()
    raw_text ← extract_text(file)            // PDF / DOCX / TXT
    all_chunks ← []
    for segment in split(raw_text, size=CHUNK_SIZE) do
        response ← POST /v1/semantic-chunk {
            text:      segment,
            threshold: SEMANTIC_CHUNK_THRESHOLD
        }
        for (chunk_text, embedding) in zip(response.chunks, response.embeddings) do
            all_chunks.append(KnowledgeChunk {
                content:   chunk_text,
                embedding: embedding,         // 768-dim vector
                role:      file.role,
                metadata:  {source: file.file_name}
            })
        end for
    end for
    new_chunks ← [c for c in all_chunks if c.hash not in existing_hashes]
    KnowledgeChunk.bulk_create(new_chunks)
    file.status ← "embedded";  file.save()
    trigger update_agent_prompts_from_file(file.role.uuid)
 ```
 ---
 ## 4. Onboarding Generation Pipeline (CA → KA → AA)
 **Source:** `apps/onboarding/consumers/generate.py:34–124`
 ```
 function run_pipeline(role):
    // Phase 1 — Curriculum Agent
    context ← search_knowledge(role, query=role.name + " responsibilities")
    topics  ← ORCHESTRATE(curriculum_generation_prompt(role, context), CA_config)
              → parsed as JSON list of topic strings (max 15)
    // Phase 2 — Knowledge Agent (one pass per topic)
    full_structure ← []
    for each topic in topics do
        hits    ← search_knowledge(role, query=topic)
        content ← ORCHESTRATE(knowledge_generation_prompt(topic, hits), KA_config,
                               min_turns=2, max_tokens=3500)
        full_structure.append({title: topic, body: content})
    end for
    // Phase 3 — Assessment Agent
    quiz_fields ← ORCHESTRATE(quiz_generation_prompt(topics, module_briefs), AA_config)
                 → sanitised and validated; fallback quiz generated if JSON invalid
    full_structure.append({title: "Final Assessment Quiz", fields: quiz_fields,
                            meta: {pass_mark: 80}})
    OnboardingFlow.save(role, full_structure)
    emit COMPLETED to client
 ```
 **Grading strategy:**
 - Multiple-choice questions: deterministic string comparison against `correct_option`
 - Free-text / textarea responses: agent-graded by the AA at session completion
 - Per-question outcomes persisted in session state for audit and feedback rendering
--- a/report/references.bib
+++ b/report/references.bib
@ -6,14 +6,6 @@
  note         = {Accessed: 2026-03-09}
 }
@misc{huggingface2024mcp,
  author       = {{Hugging Face}},
  title        = {Introduction to Model Context Protocol (MCP)},
  year         = {2024},
  howpublished = {\url{https://huggingface.co/learn/mcp-course/en/unit1/key-concepts}},
  note         = {Accessed: 2026-03-09}
 }
@misc{langgraph2024,
  author       = {{LangChain}},
  title        = {LangGraph: Building Stateful, Multi-agent Applications with LLMs},
@ -22,14 +14,6 @@
  note         = {Accessed: 2026-03-09}
 }
@misc{meta2024llama3,
  author       = {{Meta AI}},
  title        = {Llama 3: Open-weight Large Language Models},
  year         = {2024},
  howpublished = {\url{https://llama.meta.com/llama3/}},
  note         = {Accessed: 2026-03-09}
 }
@misc{pgvector2024,
  author       = {{PostgreSQL Global Development Group}},
  title        = {pgvector: Open-source Vector Similarity Search for PostgreSQL},
@ -38,14 +22,6 @@
  note         = {Accessed: 2026-03-09}
 }
@misc{pinecone2023rag,
  author       = {{Pinecone}},
  title        = {Retrieval Augmented Generation (RAG) and Semantic Search},
  year         = {2023},
  howpublished = {\url{https://www.pinecone.io/learn/retrieval-augmented-generation/}},
  note         = {Accessed: 2026-03-09}
 }
@misc{dettmers2023bitsandbytes,
  author       = {Dettmers, Tim},
  title        = {4-bit Quantization and Bitsandbytes for LLMs},
@ -102,14 +78,6 @@
  note         = {Accessed: 2026-03-09}
 }
@misc{sbert2024docs,
  author       = {{UKPLab / SBERT}},
  title        = {Sentence-Transformers Documentation},
  year         = {2024},
  howpublished = {\url{https://www.sbert.net/}},
  note         = {Accessed: 2026-03-09}
 }
@misc{llamacpp2024,
  author       = {{ggml-org}},
  title        = {llama.cpp Documentation},
@ -160,17 +128,6 @@
  url       = {https://arxiv.org/abs/2004.04906}
 }
@article{johnson2019faiss,
  author  = {Johnson, Jeff and Douze, Matthijs and J{\'e}gou, Herv{\'e}},
  title   = {Billion-scale Similarity Search with {GPUs}},
  journal = {IEEE Transactions on Big Data},
  year    = {2019},
  volume  = {7},
  number  = {3},
  pages   = {535--547},
  url     = {https://arxiv.org/abs/1702.08734}
 }
@inproceedings{reimers2019sbert,
  author    = {Reimers, Nils and Gurevych, Iryna},
  title     = {Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks},
@ -212,6 +169,38 @@
  url     = {https://arxiv.org/abs/2312.10997}
 }
@article{guo2024massurvey,
  author  = {Guo, Taicheng and Chen, Xiuying and Wang, Yaqi and Chang, Ruidi and Pei, Shichao and Chawla, Nitesh V. and Wiest, Olaf and Zhang, Xiangliang},
  title   = {Large Language Model based Multi-Agents: A Survey of Progress and Challenges},
  journal = {arXiv preprint arXiv:2402.01680},
  year    = {2024},
  url     = {https://arxiv.org/abs/2402.01680}
 }
@misc{hibob2024,
  author       = {{HiBob}},
  title        = {HiBob HRIS Platform},
  year         = {2024},
  howpublished = {\url{https://www.hibob.com}},
  note         = {Accessed: 2026-03-23}
 }
@misc{leena2024,
  author       = {{Leena AI}},
  title        = {Leena.ai: AI-Powered Employee Experience Platform},
  year         = {2024},
  howpublished = {\url{https://leena.ai}},
  note         = {Accessed: 2026-03-23}
 }
@misc{leapsome2024,
  author       = {{Leapsome}},
  title        = {Leapsome: People Enablement Platform},
  year         = {2024},
  howpublished = {\url{https://www.leapsome.com}},
  note         = {Accessed: 2026-03-23}
 }
@article{liu2023promptsurvey,
  author  = {Liu, Pengfei and Yuan, Weizhe and Fu, Jinlan and Jiang, Zhengbao and Hayashi, Hiroaki and Neubig, Graham},
  title   = {Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing},
--- a/report/report.tex
+++ b/report/report.tex
@ -1,4 +1,4 @@
-\documentclass[12pt]{article}
+\documentclass[11pt]{article}
 \usepackage[utf8]{inputenc}
 \usepackage[T1]{fontenc}
 \usepackage{lmodern}
@ -15,6 +15,7 @@
 \usepackage{tabularx}
 \usepackage{xurl}
 \usepackage[numbers,sort&compress]{natbib}
 \usepackage{amsmath}
 % Report-style paragraph spacing
 \setlength{\parindent}{0pt}
@ -109,13 +110,13 @@ By addressing this gap, Dynavera enables organizations to:
 \begin{itemize}
 \item
  Scale Mentorship: Support multiple new hires simultaneously while
-  minimising senior staff intervention
+  reducing senior staff intervention
 \item
-  Standardize Quality: Ensure consistent depth, structure, and
+  Standardize Quality: Ensure consistent depth and
  assessment across all onboarding experiences
 \item
-  Reduce Time-to-Productivity (TTP): Provide 24/7 access to contextual,
+  Reduce Time-to-Productivity (TTP): Provide 24/7 access to contextual
-  role-aware support through AI agents
+  agentic support
 \end{itemize}
 Dynavera is designed as a proof-of-concept platform that transforms
@ -197,7 +198,7 @@ contextual reasoning, and adaptive response generation, making them
 well-suited for interactive, role-aware training scenarios. Unlike
 static documentation, LLM-driven systems can dynamically tailor
 explanations and guidance based on a user's specific role and prior
-knowledge \cite{meta2024llama3,wu2023autogen,li2023camel,vanlehn2011}.
+knowledge \cite{wu2023autogen,li2023camel,vanlehn2011}.
 Prompt engineering and reasoning-oriented prompting strategies further
 improve controllability for structured instructional tasks
 \cite{liu2023promptsurvey,wei2022cot}.
@ -227,16 +228,14 @@ Furthermore, agent collaboration enables training workflows that more
 closely resemble human mentorship, where guidance and evaluation occur
 in parallel. This architecture allows Dynavera to serve not only the
 trainee but also the broader organizational stakeholders, including HR
-departments and team leads. By capturing granular interaction data, the
+departments and team leads. By capturing granular interaction data, Dynavera enables enhanced organisational visibility across three dimensions \cite{langgraph2024,wu2023autogen,li2023camel}:
  modularity, explainability, and system adaptability
  \cite{langgraph2024,wu2023autogen,li2023camel}.
 \begin{itemize}
 \item
  Integral Progress Analytics: Automated reports and charts track
  trainee milestones in real-time, allowing HR to identify exactly where
  organizational knowledge evolves
-  \cite{lewis2020rag,karpukhin2020dpr,gao2023ragsurvey,pinecone2023rag}.
+  \cite{lewis2020rag,karpukhin2020dpr,gao2023ragsurvey}.
 \item
  Continuous Curriculum Optimization: The system can flag specific
  training modules that frequently cause friction or confusion,
@ -269,32 +268,31 @@ enable scalable, context-aware onboarding:
  modularity, explainability, and system adaptability \cite{langgraph2024}.
 \item
  Retrieval-Augmented Generation (RAG): Training responses are grounded
-  in authoritative, organization-specific documentation rather than
+  in authoritative, role-specific documentation rather than relying
-  relying solely on a model's parametric knowledge. This ensures factual
+  solely on a model's parametric knowledge. This ensures factual
-  accuracy, contextual relevance, and rapid adaptability as
+  accuracy, contextual relevance, and adaptability as organisational
-  organizational knowledge evolves \cite{pinecone2023rag}.
+  knowledge evolves \cite{gao2023ragsurvey}.
 \end{itemize}
 To address data privacy and deployment constraints, Dynavera prioritizes
-local inference using quantized open-weight models (e.g., Llama 3 in
+local inference using quantized open-weight models in GGUF format. This design
-GGUF format). This design choice reduces dependency on external cloud
+choice reduces dependency on external cloud APIs, supports offline or air-gapped
-APIs, supports offline or air-gapped environments, and aligns with
+environments, and aligns with enterprise privacy requirements while maintaining
-enterprise privacy requirements while maintaining acceptable inference
+acceptable inference performance \cite{dettmers2023bitsandbytes,llamacpp2024}.
 performance \cite{meta2024llama3,dettmers2023bitsandbytes,llamacpp2024}.
 \textbf{Model Selection Rationale.}
-Several open-weight models were evaluated for the inference backend, 
+Four open-weight models were evaluated against a fixed set of 20 role-scoped onboarding prompts
-including Mistral and other recent instruction-tuned LLMs. Ultimately, 
+covering curriculum generation, knowledge explanation, assessment question generation, and
-\path{Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf} was selected for deployment. 
+free-form HR Q\&A. Each response was rated 1--5 on instruction-following, factual grounding, and
-This choice was driven by a combination of factors: (1) superior instruction-following 
+format compliance. Full results and per-model observations are recorded in
-and conversational ability in practical onboarding scenarios, (2) strong 
+\path{docs/model-selection-benchmarks.md}.
-performance on both general and domain-specific queries during pilot tests, 
+
-(3) efficient quantization (Q4\_K\_M) enabling fast, low-memory inference on 
+\textbf{Meta-Llama-3.1-8B-Instruct-Q4\_K\_M} was selected as the inference model. It achieved
-local hardware, and (4) robust support for the GGUF format, which streamlined 
+the highest quality score among feasible candidates and produced valid JSON-format outputs on
-integration with the local inference server. While Mistral and similar models 
+18/20 structured generation prompts without retries --- a critical property for the
-offered competitive performance, Llama 3.1-8B-Instruct provided a better balance 
+\texttt{\_extract\_json\_list} parsing step. A higher-scoring 14B candidate was trialled but
-of accuracy, resource usage, and compatibility for the privacy-preserving, 
+eliminated because its memory footprint left no headroom for the nomic-embed-text embedding
-offline-first requirements of Dynavera.
+model that runs concurrently during document ingestion.
 \subsection{Positioning Against Alternative
 Approaches}\label{positioning-against-alternative-approaches}
@ -316,6 +314,36 @@ system complexity in exchange for clearer responsibility boundaries,
 better modularity, and stronger alignment between training delivery,
 evaluation quality, and management oversight.
 \subsection{Industry Comparison: Commercial Onboarding Platforms}\label{industry-comparison}
 Dynavera can be further contextualised against established commercial HR and onboarding products. While tools such as HiBob \cite{hibob2024}, Leena.ai \cite{leena2024}, and Leapsome \cite{leapsome2024} address adjacent problems, they are fundamentally \emph{Systems of Record}: platforms that manage people, tasks, and compliance workflows. Dynavera is designed as a \emph{System of Intelligence}: a runtime that actively generates and delivers role-specific knowledge. Table~\ref{tab:industry-comparison} summarises the key differences.
 \begingroup\hbadness=10000
 \begin{table}[H]
 \centering
 \begin{tabularx}{\linewidth}{>{\raggedright\arraybackslash}p{0.15\linewidth} >{\raggedright\arraybackslash}p{0.16\linewidth} >{\raggedright\arraybackslash}p{0.18\linewidth} >{\raggedright\arraybackslash}p{0.16\linewidth} X}
 \toprule
 Feature & HiBob & Leena.ai & Leapsome & Dynavera \\
 \midrule
 Core identity & Modern HRIS & AI service desk & Perf.\ \& LMS & Agentic onboarding runtime \\
 AI integration & Generative summaries & LLM RAG chatbot & AI feedback suggestions & Multi-agent orchestration (MCP) \\
 Data privacy & Standard cloud SaaS & Enterprise cloud & Enterprise cloud & Privacy-first, local GPU inference \\
 Onboarding style & Checklist-driven & Q\&A / workflow automation & Doc-based learning paths & Real-time, adaptive instruction \\
 \bottomrule
 \end{tabularx}
 \caption{Comparison of Dynavera against established commercial onboarding platforms.}
 \label{tab:industry-comparison}
 \end{table}
 \endgroup
 \textbf{HiBob} is primarily an HRIS: it manages payroll, attendance, and employee records, treating onboarding as a checklist process (sign this document, read that policy). It has no concept of a Knowledge Agent or Assessment Agent that can dynamically instruct a new hire based on internal documentation. It tracks people; it does not teach them.
 \textbf{Leena.ai} is the closest commercial analogue in terms of AI. It uses LLMs to help employees retrieve answers to HR questions and automate service-desk workflows. However, Leena.ai is optimised for retrieval of existing answers, not curriculum generation. It lacks the distributed agentic pattern: while it can respond to a single query, it does not follow a structured Curriculum $\rightarrow$ Knowledge $\rightarrow$ Assessment loop that adapts based on a trainee's live session state.
 \textbf{Leapsome} focuses on performance management and learning enablement. Its learning module is a traditional LMS that hosts human-authored videos and documents. If the content does not exist, the learner cannot progress. Dynavera bridges this gap: the MCP Router allows agents to synthesise role-specific training on the fly from raw organisational documentation stored in pgvector, rather than requiring pre-authored content for every scenario.
 In each case, the gap Dynavera addresses is not a missing feature but a missing architectural category: none of these platforms combine privacy-first local inference, streaming agentic orchestration, semantic retrieval grounding, and persistent session auditability in a single deployable runtime.
 \subsection{Related Work Synthesis}\label{related-work-synthesis}
 Recent research supports the technical direction selected for Dynavera,
@ -331,9 +359,7 @@ for retrieval and progress updates \cite{schick2023toolformer,yao2023react}.
 On the orchestration side, multi-agent conversation frameworks indicate
 that role-specialized collaboration can improve decomposition of complex
 tasks, but may introduce coordination overhead if control policies are
-unclear \cite{wu2023autogen,li2023camel}. Dynavera addresses this by keeping a
+unclear \cite{wu2023autogen,li2023camel}. Surveys of LLM-based multi-agent systems characterise the general MAS workflow as a pipeline of perception, reasoning, interaction, and evolution stages, where agents typically communicate peer-to-peer with limited coupling to persistent application state \cite{guo2024massurvey}. Dynavera diverges from this pattern in two key respects. First, rather than treating agent interaction as an isolated conversational process, orchestration is embedded within a web application runtime (Django Channels), giving each agent turn direct access to persisted session state, relational progress records, and organisational knowledge via the MCP router. Second, while prior MAS architectures emphasise decentralised agent-to-agent coordination for emergent behaviour, Dynavera adopts a centrally orchestrated, state-persistent model that prioritises auditability and deterministic recovery over emergent flexibility. This trade-off is appropriate for a production onboarding context, where reproducibility and governance matter as much as adaptivity.
 single orchestrator with explicit tool boundaries and persisted session
 state, instead of fully decentralized agents.
 From a learning-science perspective, prior tutoring studies suggest that
 interactive, adaptive guidance can produce better learning outcomes than
@ -406,23 +432,24 @@ components, ensuring real-time interactivity.
 \subsection{Technology stack}\label{technology-stack}
 Dynavera is implemented as a modern full-stack application, with the
-components presented in Table 1.
+components presented in Table~\ref{tab:tech-stack}.
 \begin{table}[H]
 \centering
-\begin{tabularx}{\linewidth}{p{0.22\linewidth} p{0.16\linewidth} X}
+\begin{tabularx}{\linewidth}{p{0.12\linewidth} p{0.16\linewidth} X}
 \toprule
 Component & Technology & Rationale \\
 \midrule
-Frontend/UI & Vue 3 w/ TS & Typesafe, reactive UI enabling rapid iteration and maintainable component design \\
+UI & Vue 3 w/ TS & Typesafe, reactive UI enabling rapid iteration and maintainable component design \\
-State Management & Pinia & Centralized, predictable state management for real-time training progress tracking \\
+Persistence & Pinia & Centralized, predictable state management for real-time training progress tracking \\
-Backend/API & Django REST & Secure, mature framework supporting rapid development and scalable API design, informed by prior production experience \\
+API & Django REST & Secure, mature framework supporting rapid development and scalable API design, informed by prior production experience \\
 Database & PostgreSQL & Reliable, production-grade relational database for organizational and user data \\
-Vector Store & PgVector & Efficient similarity search over embedded training documentation via PostgreSQL \\
+Embeddings & PgVector & Efficient similarity search over embedded training documentation via PostgreSQL \\
 MCP Router & Python & Provides a standardized interface for agents to query data using Model Context Protocol. \\
 \bottomrule
 \end{tabularx}
 \caption{Architectural components of the Dynavera platform, including frontend, backend, and AI integration technologies.}
 \label{tab:tech-stack}
 \end{table}
 This stack was selected through explicit privacy, governance, and
@ -430,7 +457,7 @@ operability trade-offs rather than convenience alone. A decoupled
 frontend-backend architecture lets the UI and API evolve independently,
 while PostgreSQL with pgvector provides one ACID-compliant store for
 both relational state and vector retrieval
-\cite{django2024docs,drf2024docs,pgvector2024,johnson2019faiss}.
+\cite{django2024docs,drf2024docs,pgvector2024}.
 Alternatives considered included LangChain-style orchestration,
 external vector databases (for example Pinecone), and cloud-hosted LLM
@ -453,7 +480,7 @@ Pattern}\label{design-philosophy-the-distributed-agentic-pattern}
 Dynavera leverages the Model Context Protocol (MCP) to solve the
 "context gap" in corporate onboarding. Rather than providing the LLM
 with a static, bloated prompt, the system utilizes a Sidecar Tooling
-approach \cite{anthropic2024mcp,huggingface2024mcp,schick2023toolformer,yao2023react}:
+approach \cite{anthropic2024mcp,schick2023toolformer,yao2023react}:
 \begin{itemize}
 \item
@ -504,7 +531,7 @@ The API surface is intentionally split by interaction pattern. Standard
 management operations are handled through Django REST Framework (for
 example role membership, training file upload, and session endpoints),
 while orchestration-time interaction uses Django Channels over
-WebSockets at /ws/onboarding/\textless session\_uuid\textgreater/. This
+WebSockets at \path{/ws/onboarding/<session_uuid>/}. This
 allows the platform to handle both CRUD-style workflows and
 long-running, stateful agent interactions without forcing either pattern
 into the other \cite{drf2024docs,channels2024docs}.
@ -513,7 +540,8 @@ For ingestion, the backend follows an asynchronous execution path:
 uploaded files are stored as TrainingFile records, and a post-save
 trigger enqueues background processing through Celery (Redis broker).
 This prevents heavy preprocessing from blocking request-response latency
-on the main web process \cite{celery2024docs,redis2024docs}.
+on the main web process \cite{celery2024docs,redis2024docs}
 (\texttt{apps/knowledge/tasks.py:45--117}).
 Persistence is model-driven and traceable. Session state, progress,
 generated onboarding structures, and interaction events are stored in
@ -533,7 +561,7 @@ API, Celery worker, PostgreSQL/pgvector database, and GPU endpoint.
 \begin{figure}[H]
 \centering
-\includegraphics[width=5.75521in,height=5.14354in]{diagrams/embedding-data-flow.png}
+\includegraphics[height=3.8in]{diagrams/embedding-data-flow.png}
 \caption{Knowledge ingestion data flow diagram, illustrating the interaction between the user, REST API, Celery worker, pgvector database, and GPU endpoint.}
 \label{fig:embedding-data-flow}
 \end{figure}
@ -551,7 +579,7 @@ batches long content, and calls the GPU service at /v1/semantic-chunk.
 The service performs sentence-level semantic breakpoint detection using
 embedding-distance thresholds, then returns coherent chunks with
 embeddings. This avoids naive fixed-size splits that can break context
-mid-concept \cite{reimers2019sbert,sbert2024docs,fastapi2024docs}.
+mid-concept \cite{reimers2019sbert,fastapi2024docs}.
 \underline{Vector storage and retrieval with pgvector}\\
 Returned chunk embeddings are stored in KnowledgeChunk.embedding (768
@ -559,7 +587,8 @@ dimensions) in PostgreSQL using pgvector, linked relationally to role
 and source file metadata. Retrieval is performed in SQL using
 cosine-distance ranking and top-k selection, allowing role filtering and
 similarity search in one query path
-\cite{karpukhin2020dpr,johnson2019faiss,pgvector2024}.
+\cite{karpukhin2020dpr,pgvector2024}
 (\texttt{apps/onboarding/mcp.py:101--127}).
 \subsubsection{Agent Orchestration Workflow
 (Simplified)}\label{agent-orchestration-workflow-simplified}
@ -610,13 +639,17 @@ runtime where each stage contributes to structured onboarding output.
 Tool-mediated grounding is handled through the MCP router. During
 orchestration, model responses may include tool calls; the runtime
-executes approved tools (such as search\_knowledge and
+executes approved tools (such as \texttt{search\_knowledge} and
-update\_progress), retrieves contextual evidence from pgvector-backed
+\texttt{update\_progress}), retrieves contextual evidence from pgvector-backed
 documents, and injects those results back into the message loop before
-final answer generation. This keeps generation anchored in role-specific
+final answer generation (\path{consumers/base.py:77-132},
 \path{mcp.py:78-159}). This keeps generation anchored in role-specific
 organizational material while preserving a controlled boundary between
 model reasoning and data access.
 Pseudocode for the orchestration loop, MCP tool dispatch, ingestion pipeline, and CA/KA/AA
 generation sequence is provided in \path{docs/orchestration-pseudocode.md}.
 \subsection{Workflow Implementation}\label{workflow-implementation}
 \begin{figure}[H]
@ -638,7 +671,8 @@ opens a persistent WebSocket connection to the orchestration endpoint
 and submits user prompts/actions as session events. The orchestrator
 resolves the active configuration for that role/session, runs model
 inference, executes retrieval tools when required, and emits structured
-runtime events (status/tool/completion) back to the client.
+runtime events (status/tool/completion) back to the client
 (\texttt{apps/onboarding/consumers/generate.py:34--124}).
 During guided learning, module content generation, context retrieval,
 and assessment output are coordinated in sequence. The curriculum phase
@ -710,12 +744,43 @@ retrieval effectiveness, and (3) operational feasibility.
  onboarding, validating the privacy-first local inference objective.
 \end{itemize}
-\subsubsection{Quantitative Evaluation}\label{quantitative-evaluation}
+\textbf{Contributions Realised}
 The introduction stated three primary contributions. Each is directly evidenced by the implemented system:
 \begin{enumerate}
 \item \textbf{A distributed agentic onboarding architecture.}
  The system physically separates the application layer (Django, Celery, PostgreSQL) from the inference layer (FastAPI, llama.cpp), connected via authenticated HTTP. Four agent roles --- Curriculum, Knowledge, Assessment, and Progress Monitor --- operate within a shared orchestration runtime with distinct responsibilities and configuration records. The architecture is fully deployed at \url{https://fyp.viswamedha.com} and reproducible via the submitted Docker Compose stack.
 \item \textbf{A tool-aware orchestration runtime integrated with Django.}
  The \texttt{orchestrate} method (\path{consumers/base.py:77--132}) implements a multi-turn agentic loop: the model receives tool definitions at each inference step, may invoke approved MCP tools (\texttt{search\_knowledge}, \texttt{update\_progress}, \texttt{get\_role\_context}), and receives structured tool results before generating a final response. This loop is embedded directly within a Django Channels WebSocket consumer, giving it access to the full Django ORM and session state --- a deliberate integration decision documented in Section~\ref{design-philosophy-the-distributed-agentic-pattern}.
 \item \textbf{A privacy-preserving RAG training system using local LLM inference.}
  All model inference runs on a local GPU node using a quantized open-weight model (\path{Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf}) via llama.cpp. Organisation documents never leave the deployment environment: ingestion, embedding, and retrieval all operate within the self-hosted stack. The ingestion pipeline (\path{knowledge/tasks.py:45--117}) processes uploaded files into role-scoped vector chunks stored in pgvector, which are retrieved at inference time using cosine-distance search rather than any external API.
 \end{enumerate}
 Together, these contributions demonstrate that a production-viable, privacy-preserving agentic training system can be built and deployed on consumer-grade hardware within a standard web application framework.
 Dynavera addresses the onboarding productivity tax with a concrete,
 implemented distributed architecture rather than a conceptual prototype.
 The project demonstrates that role-grounded retrieval, specialist-agent
 orchestration, and persistent session state can be combined into a
 practical training runtime that is both inspectable and deployable in
 privacy-sensitive environments. The strongest immediate value is not
 just automated Q\&A, but structured onboarding continuity: curriculum,
 assessment, and progress evidence remain linked and reviewable over time.
 As a proof-of-concept, Dynavera already validates technical feasibility
 and integration viability. Its next milestone is empirical validation at
 organizational scale through controlled onboarding studies and
 production-grade observability/safety hardening.
 \subsection{Quantitative Evaluation}\label{quantitative-evaluation}
 To strengthen the engineering evaluation beyond qualitative observations,
 representative measurements were collected from controlled development
 runs using role-scoped onboarding prompts and tool-enabled inference
-calls.
+calls (Table~\ref{tab:quantitative-evaluation}).
 \begin{table}[H]
 \centering
@ -743,7 +808,7 @@ They also indicate that semantic chunking and dense retrieval are
 effective enough for role-grounded onboarding in the current
 proof-of-concept scope.
-\subsubsection{Limitations}\label{limitations}
+\subsection{Limitations}\label{limitations}
 \begin{itemize}
 \item
@ -761,7 +826,7 @@ proof-of-concept scope.
  synthetic or curated test prompts rather than production traffic.
 \end{itemize}
-\subsubsection{Future Improvements}\label{future-improvements}
+\subsection{Future Improvements}\label{future-improvements}
 The next development phase should focus on measurable training outcomes,
 operational hardening, and richer adaptivity:
@ -785,8 +850,8 @@ operational hardening, and richer adaptivity:
  around tool calls, implement stronger role-boundary tests, and add
  automated red-team style checks for prompt/tool misuse scenarios.
 \item
-  \textbf{Scalability and observability:} Introduce request tracing,
+  \textbf{Scalability and observability:} Add request tracing,
-  queue-depth dashboards, and load/performance benchmarks to support
+  queue-depth dashboards, and performance benchmarks to support
  multi-tenant deployment planning.
 \item
  \textbf{Multi-modal onboarding support:} Extend ingestion and
@ -794,21 +859,6 @@ operational hardening, and richer adaptivity:
  real enterprise training assets.
 \end{itemize}
 \subsubsection{Conclusion}\label{conclusion}
 Dynavera addresses the onboarding productivity tax with a concrete,
 implemented distributed architecture rather than a conceptual prototype.
 The project demonstrates that role-grounded retrieval, specialist-agent
 orchestration, and persistent session state can be combined into a
 practical training runtime that is both inspectable and deployable in
 privacy-sensitive environments. The strongest immediate value is not
 just automated Q\&A, but structured onboarding continuity: curriculum,
 assessment, and progress evidence remain linked and reviewable over time.
 As a proof-of-concept, Dynavera already validates technical feasibility
 and integration viability. Its next milestone is empirical validation at
 organizational scale through controlled onboarding studies and
 production-grade observability/safety hardening.
 \section{References}\label{references}
 \bibliographystyle{unsrtnat}