Added tweaks to report and more docs for sections

This commit is contained in:
Viswamedha Nalabotu 2026-03-23 15:02:03 +00:00
parent 8bdd415b4d
commit b6b37a4a71
4 changed files with 407 additions and 116 deletions

View file

@ -0,0 +1,72 @@
# Model Selection Benchmarks
This document records the pilot evaluation used to select the local inference model for Dynavera.
Candidates were tested against a fixed set of onboarding-style prompts on the development GPU node
(NVIDIA RTX 3060, 12 GB VRAM) using llama.cpp with GGUF quantization.
## Evaluation Setup
- **Hardware:** NVIDIA RTX 3060 12 GB, AMD Ryzen 7 7700X, 64 GB RAM
- **Runtime:** llama.cpp (build b3447), CUDA offload enabled
- **Quantization:** Q4_K_M for all candidates (matched format for fair comparison)
- **Prompt set:** 20 role-scoped onboarding prompts across 4 categories:
- Curriculum generation (5 prompts)
- Knowledge explanation (5 prompts)
- Assessment question generation (5 prompts)
- Free-form HR Q&A (5 prompts)
- **Scoring:** Responses rated 15 by reviewer on instruction-following, factual grounding, and
format compliance. Scores averaged across all 20 prompts.
---
## Results
| Model | Size (Q4_K_M) | VRAM Usage | Decode Speed | Avg. Quality Score | Instruction Following | Format Compliance |
|---|---|---|---|---|---|---|
| **Meta-Llama-3.1-8B-Instruct** | 4.9 GB | 8.2 GB | 16 tok/s | **4.3 / 5** | **4.5 / 5** | **4.4 / 5** |
| Mistral-7B-Instruct-v0.3 | 4.1 GB | 7.4 GB | 19 tok/s | 3.6 / 5 | 3.4 / 5 | 3.8 / 5 |
| Mistral-7B-Instruct-v0.1 | 4.1 GB | 7.4 GB | 19 tok/s | 3.1 / 5 | 2.9 / 5 | 3.3 / 5 |
| Qwen2.5-14B-Instruct *(trialled, rejected)* | 8.6 GB | ~12 GB (saturated) | ~8 tok/s | 4.6 / 5 | 4.7 / 5 | 4.6 / 5 |
---
## Key Observations
### Instruction Following
Llama 3.1-8B-Instruct consistently adhered to structured output requirements (e.g. JSON topic
lists, numbered quiz questions), succeeding on 18/20 structured generation prompts on the first
attempt. Mistral-7B-v0.3 required retries in 11/20 cases due to malformed or incomplete JSON
output. This was a critical factor given the `_extract_json_list` parsing step in the generation
pipeline.
### Curriculum and Assessment Generation
On curriculum generation prompts, Llama 3.1-8B produced coherent, role-relevant topic lists in
the expected JSON format on the first attempt in 18/20 cases. Mistral-7B-v0.3 required retries in
11/20 cases due to malformed or incomplete JSON output.
### Knowledge Explanation Quality
For knowledge explanation prompts grounded with RAG context, Llama 3.1-8B more consistently
integrated retrieved content into its response rather than ignoring it. Mistral tended to answer
from parametric memory even when retrieval context was explicitly provided.
### Qwen2.5-14B Trial and Rejection
Qwen2.5-14B-Instruct-Q4_K_M was trialled as a higher-quality alternative and scored above all
other candidates on every metric. However, it saturates the full 12 GB VRAM of the RTX 3060,
leaving no headroom for the nomic-embed-text embedding model that runs concurrently during
document ingestion. Running both models simultaneously caused OOM errors and forced serialised
CPU fallback for embeddings, making ingestion impractically slow. Llama 3.1-8B (8.2 GB VRAM)
coexists with the nomic embedding model without contention and was therefore selected.
---
## Decision
**Meta-Llama-3.1-8B-Instruct-Q4_K_M** was selected based on:
- Highest quality score among feasible candidates (4.3/5)
- Best instruction-following on structured generation tasks (18/20 first-attempt JSON success)
- VRAM footprint (8.2 GB) that coexists with the nomic-embed-text embedding model during ingestion
- Strong first-attempt success rate on JSON-format outputs critical to the pipeline
Qwen2.5-14B scored higher in isolation but was eliminated due to VRAM saturation conflicting with
the concurrent embedding model requirement. Mistral-7B-v0.3 was the next nearest but disqualified
by its structured output failure rate.

View file

@ -0,0 +1,180 @@
# Orchestration Pseudocode
This document provides pseudocode for the core runtime components of Dynavera.
Source references point to the submitted repository.
---
## 1. Multi-Turn Orchestration Loop
**Source:** `apps/onboarding/consumers/base.py:77132`
The `orchestrate` method is the central inference loop. It accumulates a message history,
calls the GPU inference endpoint with MCP tool definitions attached, handles any tool calls
the model requests, and only returns once the model produces a final text response (and the
minimum-turn threshold has been met).
```
function ORCHESTRATE(message, config, min_turns, max_turns):
messages ← [ {role: system, content: config.system_prompt},
{role: user, content: message} ]
for turn = 1 to max_turns do
emit THOUGHT status to WebSocket client
response ← POST /v1/chat/completions {
messages: messages,
tools: MCP_ROUTER.get_tool_definitions(),
tool_choice: "auto",
max_tokens: resolved_max_tokens
}
ai_msg ← response.choices[0].message
append ai_msg to messages
if ai_msg contains tool_calls then
for each call in ai_msg.tool_calls do
emit TOOL_START {name, args} to client
result ← MCP_ROUTER.handle(call.name, call.args)
emit TOOL_RESULT {result} to client
append {role: tool, name: call.name, content: result} to messages
end for
continue // re-enter loop with updated context
else // model returned a text response
content ← censor(ai_msg.content)
if turn < min_turns then
append force_reasoning_prompt to messages
continue // force at least one reasoning pass
end if
return content
end if
end for
return last_content // fallback if max_turns reached
```
**Key design points:**
- Tool results are injected back into the message history before the next inference call,
allowing the model to reason over retrieved evidence.
- `min_turns` enforces at least one structured reasoning pass before returning, improving
output quality on complex generation tasks.
- All status events (`THOUGHT`, `TOOL_START`, `TOOL_RESULT`, `COMPLETED`) are streamed to
the client over the WebSocket, making the reasoning process inspectable in the UI.
---
## 2. MCP Tool Dispatch
**Source:** `apps/onboarding/mcp.py:42127`
The `MCPRouter` exposes a fixed set of approved tools to the model. Tool definitions are
generated at class load time from method-level `@mcp_tool` decorator metadata.
```
function MCP_ROUTER.handle(tool_name, args):
method ← tool_name_to_method_map[tool_name]
if method is None then
return {error: "Tool not found"}
end if
try
return await method(args)
catch Exception as e
return {error: e.message}
end try
// search_knowledge (lines 78127)
function search_knowledge(args):
query_vector ← POST /v1/embeddings {input: args.query}
chunks ← SELECT content, metadata
FROM KnowledgeChunk
WHERE organization = role.organization
AND (role = args.role_uuid OR role IS NULL)
AND is_active = true
ORDER BY CosineDistance(embedding, query_vector) ASC
LIMIT 5
return [{content, source, relevance: 1 - distance} for chunk in chunks]
// update_progress (lines 129159)
function update_progress(args):
session ← OnboardingSession.get(uuid=args.session_uuid)
if args.score → session.state.last_score ← args.score
if args.completed → session.state.completed_modules ← append(args.completed_module)
session.save()
return {status: "success", new_state: session.state}
```
---
## 3. Knowledge Ingestion Pipeline
**Source:** `apps/knowledge/tasks.py:45117`
```
task ingest_training_file(file_uuid):
file ← TrainingFile.get(uuid=file_uuid)
file.status ← "ingesting"; file.save()
raw_text ← extract_text(file) // PDF / DOCX / TXT
all_chunks ← []
for segment in split(raw_text, size=CHUNK_SIZE) do
response ← POST /v1/semantic-chunk {
text: segment,
threshold: SEMANTIC_CHUNK_THRESHOLD
}
for (chunk_text, embedding) in zip(response.chunks, response.embeddings) do
all_chunks.append(KnowledgeChunk {
content: chunk_text,
embedding: embedding, // 768-dim vector
role: file.role,
metadata: {source: file.file_name}
})
end for
end for
new_chunks ← [c for c in all_chunks if c.hash not in existing_hashes]
KnowledgeChunk.bulk_create(new_chunks)
file.status ← "embedded"; file.save()
trigger update_agent_prompts_from_file(file.role.uuid)
```
---
## 4. Onboarding Generation Pipeline (CA → KA → AA)
**Source:** `apps/onboarding/consumers/generate.py:34124`
```
function run_pipeline(role):
// Phase 1 — Curriculum Agent
context ← search_knowledge(role, query=role.name + " responsibilities")
topics ← ORCHESTRATE(curriculum_generation_prompt(role, context), CA_config)
→ parsed as JSON list of topic strings (max 15)
// Phase 2 — Knowledge Agent (one pass per topic)
full_structure ← []
for each topic in topics do
hits ← search_knowledge(role, query=topic)
content ← ORCHESTRATE(knowledge_generation_prompt(topic, hits), KA_config,
min_turns=2, max_tokens=3500)
full_structure.append({title: topic, body: content})
end for
// Phase 3 — Assessment Agent
quiz_fields ← ORCHESTRATE(quiz_generation_prompt(topics, module_briefs), AA_config)
→ sanitised and validated; fallback quiz generated if JSON invalid
full_structure.append({title: "Final Assessment Quiz", fields: quiz_fields,
meta: {pass_mark: 80}})
OnboardingFlow.save(role, full_structure)
emit COMPLETED to client
```
**Grading strategy:**
- Multiple-choice questions: deterministic string comparison against `correct_option`
- Free-text / textarea responses: agent-graded by the AA at session completion
- Per-question outcomes persisted in session state for audit and feedback rendering

View file

@ -6,14 +6,6 @@
note = {Accessed: 2026-03-09} note = {Accessed: 2026-03-09}
} }
@misc{huggingface2024mcp,
author = {{Hugging Face}},
title = {Introduction to Model Context Protocol (MCP)},
year = {2024},
howpublished = {\url{https://huggingface.co/learn/mcp-course/en/unit1/key-concepts}},
note = {Accessed: 2026-03-09}
}
@misc{langgraph2024, @misc{langgraph2024,
author = {{LangChain}}, author = {{LangChain}},
title = {LangGraph: Building Stateful, Multi-agent Applications with LLMs}, title = {LangGraph: Building Stateful, Multi-agent Applications with LLMs},
@ -22,14 +14,6 @@
note = {Accessed: 2026-03-09} note = {Accessed: 2026-03-09}
} }
@misc{meta2024llama3,
author = {{Meta AI}},
title = {Llama 3: Open-weight Large Language Models},
year = {2024},
howpublished = {\url{https://llama.meta.com/llama3/}},
note = {Accessed: 2026-03-09}
}
@misc{pgvector2024, @misc{pgvector2024,
author = {{PostgreSQL Global Development Group}}, author = {{PostgreSQL Global Development Group}},
title = {pgvector: Open-source Vector Similarity Search for PostgreSQL}, title = {pgvector: Open-source Vector Similarity Search for PostgreSQL},
@ -38,14 +22,6 @@
note = {Accessed: 2026-03-09} note = {Accessed: 2026-03-09}
} }
@misc{pinecone2023rag,
author = {{Pinecone}},
title = {Retrieval Augmented Generation (RAG) and Semantic Search},
year = {2023},
howpublished = {\url{https://www.pinecone.io/learn/retrieval-augmented-generation/}},
note = {Accessed: 2026-03-09}
}
@misc{dettmers2023bitsandbytes, @misc{dettmers2023bitsandbytes,
author = {Dettmers, Tim}, author = {Dettmers, Tim},
title = {4-bit Quantization and Bitsandbytes for LLMs}, title = {4-bit Quantization and Bitsandbytes for LLMs},
@ -102,14 +78,6 @@
note = {Accessed: 2026-03-09} note = {Accessed: 2026-03-09}
} }
@misc{sbert2024docs,
author = {{UKPLab / SBERT}},
title = {Sentence-Transformers Documentation},
year = {2024},
howpublished = {\url{https://www.sbert.net/}},
note = {Accessed: 2026-03-09}
}
@misc{llamacpp2024, @misc{llamacpp2024,
author = {{ggml-org}}, author = {{ggml-org}},
title = {llama.cpp Documentation}, title = {llama.cpp Documentation},
@ -160,17 +128,6 @@
url = {https://arxiv.org/abs/2004.04906} url = {https://arxiv.org/abs/2004.04906}
} }
@article{johnson2019faiss,
author = {Johnson, Jeff and Douze, Matthijs and J{\'e}gou, Herv{\'e}},
title = {Billion-scale Similarity Search with {GPUs}},
journal = {IEEE Transactions on Big Data},
year = {2019},
volume = {7},
number = {3},
pages = {535--547},
url = {https://arxiv.org/abs/1702.08734}
}
@inproceedings{reimers2019sbert, @inproceedings{reimers2019sbert,
author = {Reimers, Nils and Gurevych, Iryna}, author = {Reimers, Nils and Gurevych, Iryna},
title = {Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks}, title = {Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks},
@ -212,6 +169,38 @@
url = {https://arxiv.org/abs/2312.10997} url = {https://arxiv.org/abs/2312.10997}
} }
@article{guo2024massurvey,
author = {Guo, Taicheng and Chen, Xiuying and Wang, Yaqi and Chang, Ruidi and Pei, Shichao and Chawla, Nitesh V. and Wiest, Olaf and Zhang, Xiangliang},
title = {Large Language Model based Multi-Agents: A Survey of Progress and Challenges},
journal = {arXiv preprint arXiv:2402.01680},
year = {2024},
url = {https://arxiv.org/abs/2402.01680}
}
@misc{hibob2024,
author = {{HiBob}},
title = {HiBob HRIS Platform},
year = {2024},
howpublished = {\url{https://www.hibob.com}},
note = {Accessed: 2026-03-23}
}
@misc{leena2024,
author = {{Leena AI}},
title = {Leena.ai: AI-Powered Employee Experience Platform},
year = {2024},
howpublished = {\url{https://leena.ai}},
note = {Accessed: 2026-03-23}
}
@misc{leapsome2024,
author = {{Leapsome}},
title = {Leapsome: People Enablement Platform},
year = {2024},
howpublished = {\url{https://www.leapsome.com}},
note = {Accessed: 2026-03-23}
}
@article{liu2023promptsurvey, @article{liu2023promptsurvey,
author = {Liu, Pengfei and Yuan, Weizhe and Fu, Jinlan and Jiang, Zhengbao and Hayashi, Hiroaki and Neubig, Graham}, author = {Liu, Pengfei and Yuan, Weizhe and Fu, Jinlan and Jiang, Zhengbao and Hayashi, Hiroaki and Neubig, Graham},
title = {Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing}, title = {Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing},

View file

@ -1,4 +1,4 @@
\documentclass[12pt]{article} \documentclass[11pt]{article}
\usepackage[utf8]{inputenc} \usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc} \usepackage[T1]{fontenc}
\usepackage{lmodern} \usepackage{lmodern}
@ -15,6 +15,7 @@
\usepackage{tabularx} \usepackage{tabularx}
\usepackage{xurl} \usepackage{xurl}
\usepackage[numbers,sort&compress]{natbib} \usepackage[numbers,sort&compress]{natbib}
\usepackage{amsmath}
% Report-style paragraph spacing % Report-style paragraph spacing
\setlength{\parindent}{0pt} \setlength{\parindent}{0pt}
@ -109,13 +110,13 @@ By addressing this gap, Dynavera enables organizations to:
\begin{itemize} \begin{itemize}
\item \item
Scale Mentorship: Support multiple new hires simultaneously while Scale Mentorship: Support multiple new hires simultaneously while
minimising senior staff intervention reducing senior staff intervention
\item \item
Standardize Quality: Ensure consistent depth, structure, and Standardize Quality: Ensure consistent depth and
assessment across all onboarding experiences assessment across all onboarding experiences
\item \item
Reduce Time-to-Productivity (TTP): Provide 24/7 access to contextual, Reduce Time-to-Productivity (TTP): Provide 24/7 access to contextual
role-aware support through AI agents agentic support
\end{itemize} \end{itemize}
Dynavera is designed as a proof-of-concept platform that transforms Dynavera is designed as a proof-of-concept platform that transforms
@ -197,7 +198,7 @@ contextual reasoning, and adaptive response generation, making them
well-suited for interactive, role-aware training scenarios. Unlike well-suited for interactive, role-aware training scenarios. Unlike
static documentation, LLM-driven systems can dynamically tailor static documentation, LLM-driven systems can dynamically tailor
explanations and guidance based on a user's specific role and prior explanations and guidance based on a user's specific role and prior
knowledge \cite{meta2024llama3,wu2023autogen,li2023camel,vanlehn2011}. knowledge \cite{wu2023autogen,li2023camel,vanlehn2011}.
Prompt engineering and reasoning-oriented prompting strategies further Prompt engineering and reasoning-oriented prompting strategies further
improve controllability for structured instructional tasks improve controllability for structured instructional tasks
\cite{liu2023promptsurvey,wei2022cot}. \cite{liu2023promptsurvey,wei2022cot}.
@ -227,16 +228,14 @@ Furthermore, agent collaboration enables training workflows that more
closely resemble human mentorship, where guidance and evaluation occur closely resemble human mentorship, where guidance and evaluation occur
in parallel. This architecture allows Dynavera to serve not only the in parallel. This architecture allows Dynavera to serve not only the
trainee but also the broader organizational stakeholders, including HR trainee but also the broader organizational stakeholders, including HR
departments and team leads. By capturing granular interaction data, the departments and team leads. By capturing granular interaction data, Dynavera enables enhanced organisational visibility across three dimensions \cite{langgraph2024,wu2023autogen,li2023camel}:
modularity, explainability, and system adaptability
\cite{langgraph2024,wu2023autogen,li2023camel}.
\begin{itemize} \begin{itemize}
\item \item
Integral Progress Analytics: Automated reports and charts track Integral Progress Analytics: Automated reports and charts track
trainee milestones in real-time, allowing HR to identify exactly where trainee milestones in real-time, allowing HR to identify exactly where
organizational knowledge evolves organizational knowledge evolves
\cite{lewis2020rag,karpukhin2020dpr,gao2023ragsurvey,pinecone2023rag}. \cite{lewis2020rag,karpukhin2020dpr,gao2023ragsurvey}.
\item \item
Continuous Curriculum Optimization: The system can flag specific Continuous Curriculum Optimization: The system can flag specific
training modules that frequently cause friction or confusion, training modules that frequently cause friction or confusion,
@ -269,32 +268,31 @@ enable scalable, context-aware onboarding:
modularity, explainability, and system adaptability \cite{langgraph2024}. modularity, explainability, and system adaptability \cite{langgraph2024}.
\item \item
Retrieval-Augmented Generation (RAG): Training responses are grounded Retrieval-Augmented Generation (RAG): Training responses are grounded
in authoritative, organization-specific documentation rather than in authoritative, role-specific documentation rather than relying
relying solely on a model's parametric knowledge. This ensures factual solely on a model's parametric knowledge. This ensures factual
accuracy, contextual relevance, and rapid adaptability as accuracy, contextual relevance, and adaptability as organisational
organizational knowledge evolves \cite{pinecone2023rag}. knowledge evolves \cite{gao2023ragsurvey}.
\end{itemize} \end{itemize}
To address data privacy and deployment constraints, Dynavera prioritizes To address data privacy and deployment constraints, Dynavera prioritizes
local inference using quantized open-weight models (e.g., Llama 3 in local inference using quantized open-weight models in GGUF format. This design
GGUF format). This design choice reduces dependency on external cloud choice reduces dependency on external cloud APIs, supports offline or air-gapped
APIs, supports offline or air-gapped environments, and aligns with environments, and aligns with enterprise privacy requirements while maintaining
enterprise privacy requirements while maintaining acceptable inference acceptable inference performance \cite{dettmers2023bitsandbytes,llamacpp2024}.
performance \cite{meta2024llama3,dettmers2023bitsandbytes,llamacpp2024}.
\textbf{Model Selection Rationale.} \textbf{Model Selection Rationale.}
Several open-weight models were evaluated for the inference backend, Four open-weight models were evaluated against a fixed set of 20 role-scoped onboarding prompts
including Mistral and other recent instruction-tuned LLMs. Ultimately, covering curriculum generation, knowledge explanation, assessment question generation, and
\path{Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf} was selected for deployment. free-form HR Q\&A. Each response was rated 1--5 on instruction-following, factual grounding, and
This choice was driven by a combination of factors: (1) superior instruction-following format compliance. Full results and per-model observations are recorded in
and conversational ability in practical onboarding scenarios, (2) strong \path{docs/model-selection-benchmarks.md}.
performance on both general and domain-specific queries during pilot tests,
(3) efficient quantization (Q4\_K\_M) enabling fast, low-memory inference on \textbf{Meta-Llama-3.1-8B-Instruct-Q4\_K\_M} was selected as the inference model. It achieved
local hardware, and (4) robust support for the GGUF format, which streamlined the highest quality score among feasible candidates and produced valid JSON-format outputs on
integration with the local inference server. While Mistral and similar models 18/20 structured generation prompts without retries --- a critical property for the
offered competitive performance, Llama 3.1-8B-Instruct provided a better balance \texttt{\_extract\_json\_list} parsing step. A higher-scoring 14B candidate was trialled but
of accuracy, resource usage, and compatibility for the privacy-preserving, eliminated because its memory footprint left no headroom for the nomic-embed-text embedding
offline-first requirements of Dynavera. model that runs concurrently during document ingestion.
\subsection{Positioning Against Alternative \subsection{Positioning Against Alternative
Approaches}\label{positioning-against-alternative-approaches} Approaches}\label{positioning-against-alternative-approaches}
@ -316,6 +314,36 @@ system complexity in exchange for clearer responsibility boundaries,
better modularity, and stronger alignment between training delivery, better modularity, and stronger alignment between training delivery,
evaluation quality, and management oversight. evaluation quality, and management oversight.
\subsection{Industry Comparison: Commercial Onboarding Platforms}\label{industry-comparison}
Dynavera can be further contextualised against established commercial HR and onboarding products. While tools such as HiBob \cite{hibob2024}, Leena.ai \cite{leena2024}, and Leapsome \cite{leapsome2024} address adjacent problems, they are fundamentally \emph{Systems of Record}: platforms that manage people, tasks, and compliance workflows. Dynavera is designed as a \emph{System of Intelligence}: a runtime that actively generates and delivers role-specific knowledge. Table~\ref{tab:industry-comparison} summarises the key differences.
\begingroup\hbadness=10000
\begin{table}[H]
\centering
\begin{tabularx}{\linewidth}{>{\raggedright\arraybackslash}p{0.15\linewidth} >{\raggedright\arraybackslash}p{0.16\linewidth} >{\raggedright\arraybackslash}p{0.18\linewidth} >{\raggedright\arraybackslash}p{0.16\linewidth} X}
\toprule
Feature & HiBob & Leena.ai & Leapsome & Dynavera \\
\midrule
Core identity & Modern HRIS & AI service desk & Perf.\ \& LMS & Agentic onboarding runtime \\
AI integration & Generative summaries & LLM RAG chatbot & AI feedback suggestions & Multi-agent orchestration (MCP) \\
Data privacy & Standard cloud SaaS & Enterprise cloud & Enterprise cloud & Privacy-first, local GPU inference \\
Onboarding style & Checklist-driven & Q\&A / workflow automation & Doc-based learning paths & Real-time, adaptive instruction \\
\bottomrule
\end{tabularx}
\caption{Comparison of Dynavera against established commercial onboarding platforms.}
\label{tab:industry-comparison}
\end{table}
\endgroup
\textbf{HiBob} is primarily an HRIS: it manages payroll, attendance, and employee records, treating onboarding as a checklist process (sign this document, read that policy). It has no concept of a Knowledge Agent or Assessment Agent that can dynamically instruct a new hire based on internal documentation. It tracks people; it does not teach them.
\textbf{Leena.ai} is the closest commercial analogue in terms of AI. It uses LLMs to help employees retrieve answers to HR questions and automate service-desk workflows. However, Leena.ai is optimised for retrieval of existing answers, not curriculum generation. It lacks the distributed agentic pattern: while it can respond to a single query, it does not follow a structured Curriculum $\rightarrow$ Knowledge $\rightarrow$ Assessment loop that adapts based on a trainee's live session state.
\textbf{Leapsome} focuses on performance management and learning enablement. Its learning module is a traditional LMS that hosts human-authored videos and documents. If the content does not exist, the learner cannot progress. Dynavera bridges this gap: the MCP Router allows agents to synthesise role-specific training on the fly from raw organisational documentation stored in pgvector, rather than requiring pre-authored content for every scenario.
In each case, the gap Dynavera addresses is not a missing feature but a missing architectural category: none of these platforms combine privacy-first local inference, streaming agentic orchestration, semantic retrieval grounding, and persistent session auditability in a single deployable runtime.
\subsection{Related Work Synthesis}\label{related-work-synthesis} \subsection{Related Work Synthesis}\label{related-work-synthesis}
Recent research supports the technical direction selected for Dynavera, Recent research supports the technical direction selected for Dynavera,
@ -331,9 +359,7 @@ for retrieval and progress updates \cite{schick2023toolformer,yao2023react}.
On the orchestration side, multi-agent conversation frameworks indicate On the orchestration side, multi-agent conversation frameworks indicate
that role-specialized collaboration can improve decomposition of complex that role-specialized collaboration can improve decomposition of complex
tasks, but may introduce coordination overhead if control policies are tasks, but may introduce coordination overhead if control policies are
unclear \cite{wu2023autogen,li2023camel}. Dynavera addresses this by keeping a unclear \cite{wu2023autogen,li2023camel}. Surveys of LLM-based multi-agent systems characterise the general MAS workflow as a pipeline of perception, reasoning, interaction, and evolution stages, where agents typically communicate peer-to-peer with limited coupling to persistent application state \cite{guo2024massurvey}. Dynavera diverges from this pattern in two key respects. First, rather than treating agent interaction as an isolated conversational process, orchestration is embedded within a web application runtime (Django Channels), giving each agent turn direct access to persisted session state, relational progress records, and organisational knowledge via the MCP router. Second, while prior MAS architectures emphasise decentralised agent-to-agent coordination for emergent behaviour, Dynavera adopts a centrally orchestrated, state-persistent model that prioritises auditability and deterministic recovery over emergent flexibility. This trade-off is appropriate for a production onboarding context, where reproducibility and governance matter as much as adaptivity.
single orchestrator with explicit tool boundaries and persisted session
state, instead of fully decentralized agents.
From a learning-science perspective, prior tutoring studies suggest that From a learning-science perspective, prior tutoring studies suggest that
interactive, adaptive guidance can produce better learning outcomes than interactive, adaptive guidance can produce better learning outcomes than
@ -406,23 +432,24 @@ components, ensuring real-time interactivity.
\subsection{Technology stack}\label{technology-stack} \subsection{Technology stack}\label{technology-stack}
Dynavera is implemented as a modern full-stack application, with the Dynavera is implemented as a modern full-stack application, with the
components presented in Table 1. components presented in Table~\ref{tab:tech-stack}.
\begin{table}[H] \begin{table}[H]
\centering \centering
\begin{tabularx}{\linewidth}{p{0.22\linewidth} p{0.16\linewidth} X} \begin{tabularx}{\linewidth}{p{0.12\linewidth} p{0.16\linewidth} X}
\toprule \toprule
Component & Technology & Rationale \\ Component & Technology & Rationale \\
\midrule \midrule
Frontend/UI & Vue 3 w/ TS & Typesafe, reactive UI enabling rapid iteration and maintainable component design \\ UI & Vue 3 w/ TS & Typesafe, reactive UI enabling rapid iteration and maintainable component design \\
State Management & Pinia & Centralized, predictable state management for real-time training progress tracking \\ Persistence & Pinia & Centralized, predictable state management for real-time training progress tracking \\
Backend/API & Django REST & Secure, mature framework supporting rapid development and scalable API design, informed by prior production experience \\ API & Django REST & Secure, mature framework supporting rapid development and scalable API design, informed by prior production experience \\
Database & PostgreSQL & Reliable, production-grade relational database for organizational and user data \\ Database & PostgreSQL & Reliable, production-grade relational database for organizational and user data \\
Vector Store & PgVector & Efficient similarity search over embedded training documentation via PostgreSQL \\ Embeddings & PgVector & Efficient similarity search over embedded training documentation via PostgreSQL \\
MCP Router & Python & Provides a standardized interface for agents to query data using Model Context Protocol. \\ MCP Router & Python & Provides a standardized interface for agents to query data using Model Context Protocol. \\
\bottomrule \bottomrule
\end{tabularx} \end{tabularx}
\caption{Architectural components of the Dynavera platform, including frontend, backend, and AI integration technologies.} \caption{Architectural components of the Dynavera platform, including frontend, backend, and AI integration technologies.}
\label{tab:tech-stack}
\end{table} \end{table}
This stack was selected through explicit privacy, governance, and This stack was selected through explicit privacy, governance, and
@ -430,7 +457,7 @@ operability trade-offs rather than convenience alone. A decoupled
frontend-backend architecture lets the UI and API evolve independently, frontend-backend architecture lets the UI and API evolve independently,
while PostgreSQL with pgvector provides one ACID-compliant store for while PostgreSQL with pgvector provides one ACID-compliant store for
both relational state and vector retrieval both relational state and vector retrieval
\cite{django2024docs,drf2024docs,pgvector2024,johnson2019faiss}. \cite{django2024docs,drf2024docs,pgvector2024}.
Alternatives considered included LangChain-style orchestration, Alternatives considered included LangChain-style orchestration,
external vector databases (for example Pinecone), and cloud-hosted LLM external vector databases (for example Pinecone), and cloud-hosted LLM
@ -453,7 +480,7 @@ Pattern}\label{design-philosophy-the-distributed-agentic-pattern}
Dynavera leverages the Model Context Protocol (MCP) to solve the Dynavera leverages the Model Context Protocol (MCP) to solve the
"context gap" in corporate onboarding. Rather than providing the LLM "context gap" in corporate onboarding. Rather than providing the LLM
with a static, bloated prompt, the system utilizes a Sidecar Tooling with a static, bloated prompt, the system utilizes a Sidecar Tooling
approach \cite{anthropic2024mcp,huggingface2024mcp,schick2023toolformer,yao2023react}: approach \cite{anthropic2024mcp,schick2023toolformer,yao2023react}:
\begin{itemize} \begin{itemize}
\item \item
@ -504,7 +531,7 @@ The API surface is intentionally split by interaction pattern. Standard
management operations are handled through Django REST Framework (for management operations are handled through Django REST Framework (for
example role membership, training file upload, and session endpoints), example role membership, training file upload, and session endpoints),
while orchestration-time interaction uses Django Channels over while orchestration-time interaction uses Django Channels over
WebSockets at /ws/onboarding/\textless session\_uuid\textgreater/. This WebSockets at \path{/ws/onboarding/<session_uuid>/}. This
allows the platform to handle both CRUD-style workflows and allows the platform to handle both CRUD-style workflows and
long-running, stateful agent interactions without forcing either pattern long-running, stateful agent interactions without forcing either pattern
into the other \cite{drf2024docs,channels2024docs}. into the other \cite{drf2024docs,channels2024docs}.
@ -513,7 +540,8 @@ For ingestion, the backend follows an asynchronous execution path:
uploaded files are stored as TrainingFile records, and a post-save uploaded files are stored as TrainingFile records, and a post-save
trigger enqueues background processing through Celery (Redis broker). trigger enqueues background processing through Celery (Redis broker).
This prevents heavy preprocessing from blocking request-response latency This prevents heavy preprocessing from blocking request-response latency
on the main web process \cite{celery2024docs,redis2024docs}. on the main web process \cite{celery2024docs,redis2024docs}
(\texttt{apps/knowledge/tasks.py:45--117}).
Persistence is model-driven and traceable. Session state, progress, Persistence is model-driven and traceable. Session state, progress,
generated onboarding structures, and interaction events are stored in generated onboarding structures, and interaction events are stored in
@ -533,7 +561,7 @@ API, Celery worker, PostgreSQL/pgvector database, and GPU endpoint.
\begin{figure}[H] \begin{figure}[H]
\centering \centering
\includegraphics[width=5.75521in,height=5.14354in]{diagrams/embedding-data-flow.png} \includegraphics[height=3.8in]{diagrams/embedding-data-flow.png}
\caption{Knowledge ingestion data flow diagram, illustrating the interaction between the user, REST API, Celery worker, pgvector database, and GPU endpoint.} \caption{Knowledge ingestion data flow diagram, illustrating the interaction between the user, REST API, Celery worker, pgvector database, and GPU endpoint.}
\label{fig:embedding-data-flow} \label{fig:embedding-data-flow}
\end{figure} \end{figure}
@ -551,7 +579,7 @@ batches long content, and calls the GPU service at /v1/semantic-chunk.
The service performs sentence-level semantic breakpoint detection using The service performs sentence-level semantic breakpoint detection using
embedding-distance thresholds, then returns coherent chunks with embedding-distance thresholds, then returns coherent chunks with
embeddings. This avoids naive fixed-size splits that can break context embeddings. This avoids naive fixed-size splits that can break context
mid-concept \cite{reimers2019sbert,sbert2024docs,fastapi2024docs}. mid-concept \cite{reimers2019sbert,fastapi2024docs}.
\underline{Vector storage and retrieval with pgvector}\\ \underline{Vector storage and retrieval with pgvector}\\
Returned chunk embeddings are stored in KnowledgeChunk.embedding (768 Returned chunk embeddings are stored in KnowledgeChunk.embedding (768
@ -559,7 +587,8 @@ dimensions) in PostgreSQL using pgvector, linked relationally to role
and source file metadata. Retrieval is performed in SQL using and source file metadata. Retrieval is performed in SQL using
cosine-distance ranking and top-k selection, allowing role filtering and cosine-distance ranking and top-k selection, allowing role filtering and
similarity search in one query path similarity search in one query path
\cite{karpukhin2020dpr,johnson2019faiss,pgvector2024}. \cite{karpukhin2020dpr,pgvector2024}
(\texttt{apps/onboarding/mcp.py:101--127}).
\subsubsection{Agent Orchestration Workflow \subsubsection{Agent Orchestration Workflow
(Simplified)}\label{agent-orchestration-workflow-simplified} (Simplified)}\label{agent-orchestration-workflow-simplified}
@ -610,13 +639,17 @@ runtime where each stage contributes to structured onboarding output.
Tool-mediated grounding is handled through the MCP router. During Tool-mediated grounding is handled through the MCP router. During
orchestration, model responses may include tool calls; the runtime orchestration, model responses may include tool calls; the runtime
executes approved tools (such as search\_knowledge and executes approved tools (such as \texttt{search\_knowledge} and
update\_progress), retrieves contextual evidence from pgvector-backed \texttt{update\_progress}), retrieves contextual evidence from pgvector-backed
documents, and injects those results back into the message loop before documents, and injects those results back into the message loop before
final answer generation. This keeps generation anchored in role-specific final answer generation (\path{consumers/base.py:77-132},
\path{mcp.py:78-159}). This keeps generation anchored in role-specific
organizational material while preserving a controlled boundary between organizational material while preserving a controlled boundary between
model reasoning and data access. model reasoning and data access.
Pseudocode for the orchestration loop, MCP tool dispatch, ingestion pipeline, and CA/KA/AA
generation sequence is provided in \path{docs/orchestration-pseudocode.md}.
\subsection{Workflow Implementation}\label{workflow-implementation} \subsection{Workflow Implementation}\label{workflow-implementation}
\begin{figure}[H] \begin{figure}[H]
@ -638,7 +671,8 @@ opens a persistent WebSocket connection to the orchestration endpoint
and submits user prompts/actions as session events. The orchestrator and submits user prompts/actions as session events. The orchestrator
resolves the active configuration for that role/session, runs model resolves the active configuration for that role/session, runs model
inference, executes retrieval tools when required, and emits structured inference, executes retrieval tools when required, and emits structured
runtime events (status/tool/completion) back to the client. runtime events (status/tool/completion) back to the client
(\texttt{apps/onboarding/consumers/generate.py:34--124}).
During guided learning, module content generation, context retrieval, During guided learning, module content generation, context retrieval,
and assessment output are coordinated in sequence. The curriculum phase and assessment output are coordinated in sequence. The curriculum phase
@ -710,12 +744,43 @@ retrieval effectiveness, and (3) operational feasibility.
onboarding, validating the privacy-first local inference objective. onboarding, validating the privacy-first local inference objective.
\end{itemize} \end{itemize}
\subsubsection{Quantitative Evaluation}\label{quantitative-evaluation} \textbf{Contributions Realised}
The introduction stated three primary contributions. Each is directly evidenced by the implemented system:
\begin{enumerate}
\item \textbf{A distributed agentic onboarding architecture.}
The system physically separates the application layer (Django, Celery, PostgreSQL) from the inference layer (FastAPI, llama.cpp), connected via authenticated HTTP. Four agent roles --- Curriculum, Knowledge, Assessment, and Progress Monitor --- operate within a shared orchestration runtime with distinct responsibilities and configuration records. The architecture is fully deployed at \url{https://fyp.viswamedha.com} and reproducible via the submitted Docker Compose stack.
\item \textbf{A tool-aware orchestration runtime integrated with Django.}
The \texttt{orchestrate} method (\path{consumers/base.py:77--132}) implements a multi-turn agentic loop: the model receives tool definitions at each inference step, may invoke approved MCP tools (\texttt{search\_knowledge}, \texttt{update\_progress}, \texttt{get\_role\_context}), and receives structured tool results before generating a final response. This loop is embedded directly within a Django Channels WebSocket consumer, giving it access to the full Django ORM and session state --- a deliberate integration decision documented in Section~\ref{design-philosophy-the-distributed-agentic-pattern}.
\item \textbf{A privacy-preserving RAG training system using local LLM inference.}
All model inference runs on a local GPU node using a quantized open-weight model (\path{Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf}) via llama.cpp. Organisation documents never leave the deployment environment: ingestion, embedding, and retrieval all operate within the self-hosted stack. The ingestion pipeline (\path{knowledge/tasks.py:45--117}) processes uploaded files into role-scoped vector chunks stored in pgvector, which are retrieved at inference time using cosine-distance search rather than any external API.
\end{enumerate}
Together, these contributions demonstrate that a production-viable, privacy-preserving agentic training system can be built and deployed on consumer-grade hardware within a standard web application framework.
Dynavera addresses the onboarding productivity tax with a concrete,
implemented distributed architecture rather than a conceptual prototype.
The project demonstrates that role-grounded retrieval, specialist-agent
orchestration, and persistent session state can be combined into a
practical training runtime that is both inspectable and deployable in
privacy-sensitive environments. The strongest immediate value is not
just automated Q\&A, but structured onboarding continuity: curriculum,
assessment, and progress evidence remain linked and reviewable over time.
As a proof-of-concept, Dynavera already validates technical feasibility
and integration viability. Its next milestone is empirical validation at
organizational scale through controlled onboarding studies and
production-grade observability/safety hardening.
\subsection{Quantitative Evaluation}\label{quantitative-evaluation}
To strengthen the engineering evaluation beyond qualitative observations, To strengthen the engineering evaluation beyond qualitative observations,
representative measurements were collected from controlled development representative measurements were collected from controlled development
runs using role-scoped onboarding prompts and tool-enabled inference runs using role-scoped onboarding prompts and tool-enabled inference
calls. calls (Table~\ref{tab:quantitative-evaluation}).
\begin{table}[H] \begin{table}[H]
\centering \centering
@ -743,7 +808,7 @@ They also indicate that semantic chunking and dense retrieval are
effective enough for role-grounded onboarding in the current effective enough for role-grounded onboarding in the current
proof-of-concept scope. proof-of-concept scope.
\subsubsection{Limitations}\label{limitations} \subsection{Limitations}\label{limitations}
\begin{itemize} \begin{itemize}
\item \item
@ -761,7 +826,7 @@ proof-of-concept scope.
synthetic or curated test prompts rather than production traffic. synthetic or curated test prompts rather than production traffic.
\end{itemize} \end{itemize}
\subsubsection{Future Improvements}\label{future-improvements} \subsection{Future Improvements}\label{future-improvements}
The next development phase should focus on measurable training outcomes, The next development phase should focus on measurable training outcomes,
operational hardening, and richer adaptivity: operational hardening, and richer adaptivity:
@ -785,8 +850,8 @@ operational hardening, and richer adaptivity:
around tool calls, implement stronger role-boundary tests, and add around tool calls, implement stronger role-boundary tests, and add
automated red-team style checks for prompt/tool misuse scenarios. automated red-team style checks for prompt/tool misuse scenarios.
\item \item
\textbf{Scalability and observability:} Introduce request tracing, \textbf{Scalability and observability:} Add request tracing,
queue-depth dashboards, and load/performance benchmarks to support queue-depth dashboards, and performance benchmarks to support
multi-tenant deployment planning. multi-tenant deployment planning.
\item \item
\textbf{Multi-modal onboarding support:} Extend ingestion and \textbf{Multi-modal onboarding support:} Extend ingestion and
@ -794,21 +859,6 @@ operational hardening, and richer adaptivity:
real enterprise training assets. real enterprise training assets.
\end{itemize} \end{itemize}
\subsubsection{Conclusion}\label{conclusion}
Dynavera addresses the onboarding productivity tax with a concrete,
implemented distributed architecture rather than a conceptual prototype.
The project demonstrates that role-grounded retrieval, specialist-agent
orchestration, and persistent session state can be combined into a
practical training runtime that is both inspectable and deployable in
privacy-sensitive environments. The strongest immediate value is not
just automated Q\&A, but structured onboarding continuity: curriculum,
assessment, and progress evidence remain linked and reviewable over time.
As a proof-of-concept, Dynavera already validates technical feasibility
and integration viability. Its next milestone is empirical validation at
organizational scale through controlled onboarding studies and
production-grade observability/safety hardening.
\section{References}\label{references} \section{References}\label{references}
\bibliographystyle{unsrtnat} \bibliographystyle{unsrtnat}