818 lines
39 KiB
TeX
818 lines
39 KiB
TeX
\documentclass[12pt]{article}
|
|
\usepackage[utf8]{inputenc}
|
|
\usepackage[T1]{fontenc}
|
|
\usepackage{lmodern}
|
|
\usepackage[a4paper,margin=0.62in]{geometry}
|
|
\usepackage{longtable}
|
|
\usepackage{enumitem}
|
|
\usepackage{booktabs}
|
|
\usepackage{array}
|
|
\usepackage{graphicx}
|
|
\usepackage{float}
|
|
\usepackage{ulem}
|
|
\usepackage{calc}
|
|
\usepackage[hidelinks]{hyperref}
|
|
\usepackage{tabularx}
|
|
\usepackage{xurl}
|
|
\usepackage[numbers,sort&compress]{natbib}
|
|
|
|
% Report-style paragraph spacing
|
|
\setlength{\parindent}{0pt}
|
|
\setlength{\parskip}{0.3em}
|
|
\setlength{\emergencystretch}{1em}
|
|
\setlist{itemsep=0.2em, parsep=0em, topsep=0.3em}
|
|
|
|
\begin{document}
|
|
|
|
\title{An Agentic Approach to Role-Specific Trainers (Dynavera)}
|
|
\author{Viswamedha Nalabotu\\2402117\\vxn217@student.bham.ac.uk\\University of Birmingham}
|
|
\date{}
|
|
\maketitle
|
|
|
|
\section*{AI Use Declaration}\label{ai-use-declaration}
|
|
|
|
I declare that Large Language Models (LLMs) and
|
|
Chat Completion APIs were used in the preparation of this report and for
|
|
assisting with coding the project.
|
|
|
|
\textbf{Scope of AI Usage.} AI was used to assist in the structural organization, grammatical refinement, and syntactic formatting of the prose and technical descriptions.
|
|
|
|
\textbf{Prototyping \& Feasibility Research.} LLMs were employed during the R\&D phase to \textbf{scope technical requirements and perform feasibility checks}. This included generating "throwaway" boilerplate code to test the viability of specific architectural branches (e.g., comparing custom fine tuning against LangGraph API) and validating the compatibility of the Model Context Protocol (MCP) with the existing Django environment.
|
|
|
|
\textbf{Originality of Content.} All core architectural concepts, the design of the \emph{Dynavera} system, the "Distributed Agentic Pattern" logic, and the specific implementation strategies are my own original works.
|
|
|
|
\textbf{Fact-Checking and References.} Any external information or technical claims used to ground the AI\textquotesingle s output have been verified against the primary sources listed in the References section.
|
|
|
|
\textbf{Human Oversight.} I have critically reviewed, edited, and refined all AI-generated suggestions to ensure technical accuracy and alignment with the project's objectives.
|
|
|
|
\section*{Inspector Access Details}\label{inspector-access-details}
|
|
|
|
The public deployment for evaluation is available at:
|
|
\url{https://fyp.viswamedha.com}
|
|
|
|
Register as a manager (with code \texttt{MANAGER2026}) or use the following credentials for testing:
|
|
|
|
\begin{center}
|
|
\begin{tabular}{p{0.22\linewidth} p{0.46\linewidth} p{0.22\linewidth}}
|
|
\toprule
|
|
Role & Email & Password \\
|
|
\midrule
|
|
Admin & admin@example.com & admin \\
|
|
Manager & haleisaac@example.com & password \\
|
|
User & j.thompson@example.com & password \\
|
|
\bottomrule
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
\textit{Note: The public site should always be available, but the GPU node
|
|
runs on my PC and can go offline. For reliable testing,
|
|
I recommend running my development compose stack on a CUDA-enabled machine with a GPU.}
|
|
|
|
\section{Introduction}\label{introduction}
|
|
|
|
\subsection{Background: The Corporate Onboarding
|
|
Problem}\label{background-the-corporate-onboarding-problem}
|
|
|
|
When a startup or enterprise hires a new team member, they enter a
|
|
critical induction period requiring exposure to internal tools,
|
|
organizational context, and role-specific responsibilities.
|
|
Traditionally, this process is led by a senior colleague who acts as the
|
|
primary trainer and point of contact.
|
|
|
|
While effective, this model introduces a significant \emph{productivity
|
|
tax}: every hour spent training represents an hour of lost expert-level
|
|
output. This issue is amplified in startups, where teams are small,
|
|
budgets are constrained, and hiring decisions must be highly selective.
|
|
As a result, training often becomes inconsistent, slow, and difficult to
|
|
scale.
|
|
|
|
\subsection{Motivation}\label{motivation}
|
|
|
|
The motivation behind Dynavera is to reduce this productivity loss by
|
|
automating and enhancing role-specific training through AI. By replacing
|
|
static documentation and repeated human-led instruction with intelligent
|
|
agents, organizations can reduce reliance on ad-hoc mentorship while
|
|
preserving access to expert knowledge.
|
|
|
|
I observed this firsthand during my industrial placement at Siemens
|
|
DISW, where onboarding and training five incoming interns required a
|
|
significant investment of time in meetings, planning, and repeated
|
|
knowledge transfer to ensure a graceful handover. Despite careful
|
|
preparation, much of the training depended on individual availability
|
|
and tacit understanding, temporarily diverting effort away from feature
|
|
work. This experience highlighted how difficult it is to scale
|
|
onboarding without imposing a sustained productivity cost on senior
|
|
contributors.
|
|
|
|
By addressing this gap, Dynavera enables organizations to:
|
|
|
|
\begin{itemize}
|
|
\item
|
|
Scale Mentorship: Support multiple new hires simultaneously while
|
|
minimising senior staff intervention
|
|
\item
|
|
Standardize Quality: Ensure consistent depth, structure, and
|
|
assessment across all onboarding experiences
|
|
\item
|
|
Reduce Time-to-Productivity (TTP): Provide 24/7 access to contextual,
|
|
role-aware support through AI agents
|
|
\end{itemize}
|
|
|
|
Dynavera is designed as a proof-of-concept platform that transforms
|
|
onboarding into a dynamic, adaptive, and reusable training workflow.
|
|
|
|
This project makes three primary contributions: (1) a distributed
|
|
agentic onboarding architecture, (2) a tool-aware orchestration runtime
|
|
integrated with Django, and (3) a privacy-preserving RAG training
|
|
system using local LLM inference.
|
|
|
|
\section{Project Background \&
|
|
Context}\label{project-background-context}
|
|
|
|
\subsection{The training bottleneck}\label{the-training-bottleneck}
|
|
|
|
Modern organizations face persistent challenges when standardizing and
|
|
scaling role-specific training:
|
|
|
|
\begin{itemize}
|
|
\item
|
|
Skill-Preloading Bias: Limited training capacity forces organizations
|
|
to favor candidates with prior experience in specific tools or
|
|
technology stacks, even when strong general aptitude or learning
|
|
ability may be sufficient.
|
|
\item
|
|
Restricted Talent Pool: By prioritizing immediate productivity over
|
|
trainability, organizations reduce access to diverse candidates who
|
|
could otherwise ramp up quickly with adequate onboarding support.
|
|
\item
|
|
Inflated Hiring Requirements: Role specifications often expand to
|
|
include non-essential tooling experience as a substitute for
|
|
structured training, increasing time-to-hire and cost.
|
|
\item
|
|
Uneven Knowledge Transfer: New hires are expected to ``already know''
|
|
systems and workflows, resulting in fragmented understanding and
|
|
slower integration into team-specific practices.
|
|
\end{itemize}
|
|
|
|
These constraints do more than just increase immediate onboarding
|
|
friction; their accumulation creates a compounding "productivity tax"
|
|
that stifles organizational growth. When left unaddressed, the aftermath
|
|
manifests in three critical areas:
|
|
|
|
\begin{itemize}
|
|
\item
|
|
Institutional Fragility: Over-reliance on tribal knowledge and
|
|
senior-led instruction creates single points of failure. If key
|
|
mentors depart, the lack of standardized, automated training leads to
|
|
a permanent loss of institutional memory and a degraded ability to
|
|
upskill replacements.
|
|
\item
|
|
Cultural and Innovation Stagnation: By reinforcing conservative hiring
|
|
through the prioritization of "plug-and-play" candidates over those
|
|
with high learning agility, organizations inadvertently filter out the
|
|
diverse perspectives and outsider logic that drive innovation. This
|
|
results in a homogenized workforce that excels at maintaining the
|
|
status quo but struggles to pivot.
|
|
\item
|
|
Compounded Opportunity Cost: The delay in reaching full productivity
|
|
(TTP) is not a linear loss. It represents a systemic lag in project
|
|
delivery and market responsiveness. For a scaling startup, the
|
|
cumulative effect of several hires operating at 50\% capacity for
|
|
months can be the difference between hitting a product milestone or
|
|
missing a market window entirely.
|
|
\end{itemize}
|
|
|
|
Ultimately, these factors coalesce into a cycle of restricted
|
|
scalability. The cost of adding new talent becomes so high that the
|
|
organization eventually stops growing simply to avoid the sustained pain
|
|
and friction of integration.
|
|
|
|
\subsection{Recent advancements in agentic
|
|
AI}\label{recent-advancements-in-agentic-ai}
|
|
|
|
Recent advances in Large Language Models (LLMs) and multi-agent systems
|
|
offer a viable solution to the onboarding bottleneck. Modern LLMs
|
|
demonstrate strong capabilities in natural language understanding,
|
|
contextual reasoning, and adaptive response generation, making them
|
|
well-suited for interactive, role-aware training scenarios. Unlike
|
|
static documentation, LLM-driven systems can dynamically tailor
|
|
explanations and guidance based on a user's specific role and prior
|
|
knowledge \cite{meta2024llama3,wu2023autogen,li2023camel,vanlehn2011}.
|
|
Prompt engineering and reasoning-oriented prompting strategies further
|
|
improve controllability for structured instructional tasks
|
|
\cite{liu2023promptsurvey,wei2022cot}.
|
|
|
|
Rather than relying on a monolithic chatbot, Dynavera employs a
|
|
collection of specialized, collaborating agents. This modular approach
|
|
provides several distinct advantages:
|
|
|
|
\begin{itemize}
|
|
\item
|
|
Efficient Resource Allocation: By distributing responsibilities across
|
|
agents, the system maintains clearer reasoning boundaries. This
|
|
architecture reduces the computational overhead and "token bloat"
|
|
often associated with all-in-one prompts, leading to faster response
|
|
times and more efficient use of infrastructure resources
|
|
\cite{wu2023autogen,li2023camel}.
|
|
\item
|
|
Targeted Maintainability and Explainability: Decoupled agents allow
|
|
for the optimization of specific components, such as the assessment or
|
|
knowledge retrieval modules, without requiring a total system
|
|
redesign. Because each agent has a narrow scope, the system provides
|
|
more transparent reasoning for its guidance, making it easier for
|
|
human supervisors to audit the AI\textquotesingle s logic.
|
|
\end{itemize}
|
|
|
|
Furthermore, agent collaboration enables training workflows that more
|
|
closely resemble human mentorship, where guidance and evaluation occur
|
|
in parallel. This architecture allows Dynavera to serve not only the
|
|
trainee but also the broader organizational stakeholders, including HR
|
|
departments and team leads. By capturing granular interaction data, the
|
|
modularity, explainability, and system adaptability
|
|
\cite{langgraph2024,wu2023autogen,li2023camel}.
|
|
|
|
\begin{itemize}
|
|
\item
|
|
Integral Progress Analytics: Automated reports and charts track
|
|
trainee milestones in real-time, allowing HR to identify exactly where
|
|
organizational knowledge evolves
|
|
\cite{lewis2020rag,karpukhin2020dpr,gao2023ragsurvey,pinecone2023rag}.
|
|
\item
|
|
Continuous Curriculum Optimization: The system can flag specific
|
|
training modules that frequently cause friction or confusion,
|
|
suggesting content updates or sections that require a human-led
|
|
review.
|
|
\item
|
|
Strategic Escalation: By identifying complex, high-friction topics
|
|
that exceed the AI\textquotesingle s current scope, Dynavera can
|
|
pinpoint the exact moments requiring senior staff intervention. This
|
|
ensures that expert time is reserved for nuanced, high-value coaching
|
|
rather than repetitive technical basics.
|
|
\end{itemize}
|
|
|
|
This dual-purpose design ensures that while Dynavera scales the trainee
|
|
experience, it simultaneously provides the data-driven visibility and
|
|
administrative control required for long-term organizational growth.
|
|
|
|
\subsection{Theoretical Foundations}\label{theoretical-foundations}
|
|
|
|
Dynavera is grounded in two complementary system design paradigms that
|
|
enable scalable, context-aware onboarding:
|
|
|
|
\begin{itemize}
|
|
\item
|
|
Multi-Agent Systems (MAS): A collection of specialized AI agents
|
|
collaborate through structured communication to achieve complex
|
|
objectives that exceed the capability of a single monolithic model.
|
|
Within Dynavera, this enables separation of instructional delivery,
|
|
contextual reasoning, knowledge retrieval, and evaluation, improving
|
|
modularity, explainability, and system adaptability \cite{langgraph2024}.
|
|
\item
|
|
Retrieval-Augmented Generation (RAG): Training responses are grounded
|
|
in authoritative, organization-specific documentation rather than
|
|
relying solely on a model's parametric knowledge. This ensures factual
|
|
accuracy, contextual relevance, and rapid adaptability as
|
|
organizational knowledge evolves \cite{pinecone2023rag}.
|
|
\end{itemize}
|
|
|
|
To address data privacy and deployment constraints, Dynavera prioritizes
|
|
local inference using quantized open-weight models (e.g., Llama 3 in
|
|
GGUF format). This design choice reduces dependency on external cloud
|
|
APIs, supports offline or air-gapped environments, and aligns with
|
|
enterprise privacy requirements while maintaining acceptable inference
|
|
performance \cite{meta2024llama3,dettmers2023bitsandbytes,llamacpp2024}.
|
|
|
|
\textbf{Model Selection Rationale.}
|
|
Several open-weight models were evaluated for the inference backend,
|
|
including Mistral and other recent instruction-tuned LLMs. Ultimately,
|
|
\path{Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf} was selected for deployment.
|
|
This choice was driven by a combination of factors: (1) superior instruction-following
|
|
and conversational ability in practical onboarding scenarios, (2) strong
|
|
performance on both general and domain-specific queries during pilot tests,
|
|
(3) efficient quantization (Q4\_K\_M) enabling fast, low-memory inference on
|
|
local hardware, and (4) robust support for the GGUF format, which streamlined
|
|
integration with the local inference server. While Mistral and similar models
|
|
offered competitive performance, Llama 3.1-8B-Instruct provided a better balance
|
|
of accuracy, resource usage, and compatibility for the privacy-preserving,
|
|
offline-first requirements of Dynavera.
|
|
|
|
\subsection{Positioning Against Alternative
|
|
Approaches}\label{positioning-against-alternative-approaches}
|
|
|
|
Dynavera was designed against three practical alternatives. First,
|
|
human-only onboarding preserves expert nuance but scales poorly and
|
|
imposes recurring opportunity cost on senior staff. Second, static
|
|
LMS/document-first onboarding scales distribution but offers limited
|
|
adaptivity, weak grounding during Q\&A, and minimal operational
|
|
traceability beyond completion events. Third, a single general chatbot
|
|
improves interactivity, but it often collapses curriculum, retrieval,
|
|
assessment, and monitoring into one prompt surface, which weakens
|
|
governance and makes targeted iteration harder.
|
|
|
|
The Dynavera architecture chooses a middle path: specialized agent roles
|
|
within one orchestrated runtime, retrieval-grounded generation, and
|
|
persisted session state for reviewability. This trade-off accepts added
|
|
system complexity in exchange for clearer responsibility boundaries,
|
|
better modularity, and stronger alignment between training delivery,
|
|
evaluation quality, and management oversight.
|
|
|
|
\subsection{Related Work Synthesis}\label{related-work-synthesis}
|
|
|
|
Recent research supports the technical direction selected for Dynavera,
|
|
while also highlighting the constraints that motivate its architecture.
|
|
RAG work shows that external retrieval can improve factuality and
|
|
knowledge coverage for generation-heavy tasks by grounding outputs in
|
|
retrieved evidence rather than relying only on parametric memory
|
|
\cite{lewis2020rag,karpukhin2020dpr,gao2023ragsurvey}. Tool-use research further demonstrates that models
|
|
can improve task performance when they call external functions at
|
|
inference time, which aligns with Dynavera's MCP-mediated backend tools
|
|
for retrieval and progress updates \cite{schick2023toolformer,yao2023react}.
|
|
|
|
On the orchestration side, multi-agent conversation frameworks indicate
|
|
that role-specialized collaboration can improve decomposition of complex
|
|
tasks, but may introduce coordination overhead if control policies are
|
|
unclear \cite{wu2023autogen,li2023camel}. Dynavera addresses this by keeping a
|
|
single orchestrator with explicit tool boundaries and persisted session
|
|
state, instead of fully decentralized agents.
|
|
|
|
From a learning-science perspective, prior tutoring studies suggest that
|
|
interactive, adaptive guidance can produce better learning outcomes than
|
|
static instruction alone \cite{vanlehn2011}. This supports Dynavera's
|
|
choice to combine guided curriculum, retrieval-grounded explanations,
|
|
and iterative assessment in one runtime. Relative to these strands,
|
|
Dynavera's contribution is primarily systems integration: a practical,
|
|
privacy-preserving implementation that connects role-scoped retrieval,
|
|
tool-aware orchestration, and auditable onboarding state in a single
|
|
deployment model.
|
|
|
|
\subsection{Learning Origins}\label{learning-origins}
|
|
|
|
The design and implementation of Dynavera synthesize concepts developed
|
|
through university coursework and independent technical exploration:
|
|
|
|
\begin{itemize}
|
|
\item
|
|
Software Systems Architecture (CS301): Application of decoupled
|
|
service architectures using Django and Vue.js, alongside the use of
|
|
sidecar-style components to isolate model execution and agent
|
|
coordination.
|
|
\item
|
|
Machine Learning \& NLP: Practical experimentation with LoRA
|
|
fine-tuning and low-bit quantization (e.g., 4-bit inference via
|
|
bitsandbytes) to optimize model performance under local hardware
|
|
constraints \cite{hu2021lora,dettmers2023bitsandbytes}.
|
|
\item
|
|
Full-Stack Development: Construction of production-oriented APIs using
|
|
Django REST Framework and responsive front-end interfaces with Vue 3,
|
|
enabling real-world interaction with agent-driven workflows.
|
|
\end{itemize}
|
|
|
|
Together, these learning sources informed both the architectural
|
|
decisions and implementation strategies underpinning Dynavera.
|
|
|
|
\section{Specification}\label{specification}
|
|
|
|
\subsection{System Overview}\label{system-overview}
|
|
|
|
Dynavera is implemented as a Distributed Agentic System, physically
|
|
decoupling the administrative and state management logic from the
|
|
high-latency inference workloads. As illustrated in Figure~\ref{fig:system-architecture}, the
|
|
architecture is split into two primary environments:
|
|
|
|
\begin{enumerate}
|
|
\def\labelenumi{\arabic{enumi}.}
|
|
\item
|
|
The Application Layer: A CPU-optimized environment running Django 5.x.
|
|
This layer handles user authentication, training state, and the MCP
|
|
Server, which acts as a standardized "data bridge" for the AI.
|
|
\item
|
|
The Inference Layer: A dedicated NVIDIA-based node running a FastAPI
|
|
inference engine. This layer handles Large Language Model (LLM)
|
|
execution, semantic chunking, and embedding generation.
|
|
\end{enumerate}
|
|
|
|
The "brain" of the system is the Orchestrator, which lives within a
|
|
Django Channels WebSocket consumer. It maintains a persistent,
|
|
full-duplex connection between the trainee and the distributed AI
|
|
components, ensuring real-time interactivity.
|
|
|
|
\begin{figure}[H]
|
|
\centering
|
|
\includegraphics[width=\textwidth,keepaspectratio]{diagrams/system-architecture.png}
|
|
\caption{High-level system architecture of Dynavera, illustrating the interaction between the user, orchestrator, inference layer, and database.}
|
|
\label{fig:system-architecture}
|
|
\end{figure}
|
|
|
|
\subsection{Technology stack}\label{technology-stack}
|
|
|
|
Dynavera is implemented as a modern full-stack application, with the
|
|
components presented in Table 1.
|
|
|
|
\begin{table}[H]
|
|
\centering
|
|
\begin{tabularx}{\linewidth}{p{0.22\linewidth} p{0.16\linewidth} X}
|
|
\toprule
|
|
Component & Technology & Rationale \\
|
|
\midrule
|
|
Frontend/UI & Vue 3 w/ TS & Typesafe, reactive UI enabling rapid iteration and maintainable component design \\
|
|
State Management & Pinia & Centralized, predictable state management for real-time training progress tracking \\
|
|
Backend/API & Django REST & Secure, mature framework supporting rapid development and scalable API design, informed by prior production experience \\
|
|
Database & PostgreSQL & Reliable, production-grade relational database for organizational and user data \\
|
|
Vector Store & PgVector & Efficient similarity search over embedded training documentation via PostgreSQL \\
|
|
MCP Router & Python & Provides a standardized interface for agents to query data using Model Context Protocol. \\
|
|
\bottomrule
|
|
\end{tabularx}
|
|
\caption{Architectural components of the Dynavera platform, including frontend, backend, and AI integration technologies.}
|
|
\end{table}
|
|
|
|
This stack was selected through explicit privacy, governance, and
|
|
operability trade-offs rather than convenience alone. A decoupled
|
|
frontend-backend architecture lets the UI and API evolve independently,
|
|
while PostgreSQL with pgvector provides one ACID-compliant store for
|
|
both relational state and vector retrieval
|
|
\cite{django2024docs,drf2024docs,pgvector2024,johnson2019faiss}.
|
|
|
|
Alternatives considered included LangChain-style orchestration,
|
|
external vector databases (for example Pinecone), and cloud-hosted LLM
|
|
APIs. These were not chosen for the current build because: (1)
|
|
additional orchestration abstraction reduced visibility into tool-call
|
|
state transitions, (2) external vector hosting conflicted with the
|
|
privacy-first data residency goal, and (3) cloud inference introduced a
|
|
strong dependency on third-party availability and data egress policy.
|
|
|
|
To preserve performance and control, orchestration is implemented in
|
|
native Python rather than heavier framework abstractions. This keeps
|
|
agent state handling explicit, reduces WebSocket-loop latency, and
|
|
supports local execution, data ownership, and architectural
|
|
transparency during early-stage development
|
|
\cite{langgraph2024,channels2024docs}.
|
|
|
|
\subsection{Design Philosophy: The Distributed Agentic
|
|
Pattern}\label{design-philosophy-the-distributed-agentic-pattern}
|
|
|
|
Dynavera leverages the Model Context Protocol (MCP) to solve the
|
|
"context gap" in corporate onboarding. Rather than providing the LLM
|
|
with a static, bloated prompt, the system utilizes a Sidecar Tooling
|
|
approach \cite{anthropic2024mcp,huggingface2024mcp,schick2023toolformer,yao2023react}:
|
|
|
|
\begin{itemize}
|
|
\item
|
|
The MCP Server as a Translator: Integrated directly into the Django
|
|
ecosystem, the MCP layer exposes specific "Tools" (e.g.,
|
|
search\_knowledge, get\_user\_progress) to the AI. This allows the
|
|
model to query the organization\textquotesingle s private data safely
|
|
and efficiently.
|
|
\item
|
|
The Streaming Orchestration Loop: Unlike traditional request-response
|
|
cycles, the system uses an asynchronous loop. The Orchestrator manages
|
|
the "triangle of communication":
|
|
|
|
\begin{enumerate}
|
|
\def\labelenumi{\arabic{enumi}.}
|
|
\item
|
|
Receives user input via WebSockets.
|
|
\item
|
|
Prompts the GPU Layer for a decision.
|
|
\item
|
|
If the AI requests data, the Orchestrator calls the MCP Tool
|
|
internally.
|
|
\item
|
|
Streams the final result back to the user with minimal latency.
|
|
\end{enumerate}
|
|
\end{itemize}
|
|
|
|
This separation ensures that the core application remains responsive
|
|
while the heavy lifting of "thinking" and "embedding" happens on
|
|
specialized hardware. It transforms the onboarding experience from a
|
|
static tutorial into a Streaming Agentic System that adapts to the
|
|
trainee in real-time.
|
|
|
|
\section{Implementation Overview}\label{implementation-overview}
|
|
|
|
\subsection{Backend Realisation}\label{backend-realisation}
|
|
|
|
This section describes how the architecture in Chapter 3 is implemented
|
|
in the current Dynavera codebase. The backend is organized into three
|
|
primary Django app domains: accounts (users, organizations, roles,
|
|
membership), knowledge (training files, ingestion, chunk/embedding
|
|
persistence), and onboarding (sessions, orchestration, generated flows,
|
|
interaction logs). This separation keeps responsibility boundaries
|
|
explicit while allowing shared infrastructure (authentication, ORM, API
|
|
routing, and permissions) to remain centralized.
|
|
|
|
The API surface is intentionally split by interaction pattern. Standard
|
|
management operations are handled through Django REST Framework (for
|
|
example role membership, training file upload, and session endpoints),
|
|
while orchestration-time interaction uses Django Channels over
|
|
WebSockets at /ws/onboarding/\textless session\_uuid\textgreater/. This
|
|
allows the platform to handle both CRUD-style workflows and
|
|
long-running, stateful agent interactions without forcing either pattern
|
|
into the other \cite{drf2024docs,channels2024docs}.
|
|
|
|
For ingestion, the backend follows an asynchronous execution path:
|
|
uploaded files are stored as TrainingFile records, and a post-save
|
|
trigger enqueues background processing through Celery (Redis broker).
|
|
This prevents heavy preprocessing from blocking request-response latency
|
|
on the main web process \cite{celery2024docs,redis2024docs}.
|
|
|
|
Persistence is model-driven and traceable. Session state, progress,
|
|
generated onboarding structures, and interaction events are stored in
|
|
Django models, enabling deterministic recovery and auditability of the
|
|
onboarding lifecycle. In implementation terms, the backend is less a
|
|
single monolith and more a coordinated runtime: REST for management,
|
|
WebSockets for orchestration, Celery for heavy async jobs, and
|
|
PostgreSQL/pgvector as a unified data plane.
|
|
|
|
\subsection{Data Flow}\label{data-flow}
|
|
|
|
\subsubsection{Knowledge Ingestion
|
|
Workflow}\label{knowledge-ingestion-workflow}
|
|
|
|
Figure~\ref{fig:embedding-data-flow} shows the ingestion data flow between the User/UI, Django REST
|
|
API, Celery worker, PostgreSQL/pgvector database, and GPU endpoint.
|
|
|
|
\begin{figure}[H]
|
|
\centering
|
|
\includegraphics[width=5.75521in,height=5.14354in]{diagrams/embedding-data-flow.png}
|
|
\caption{Knowledge ingestion data flow diagram, illustrating the interaction between the user, REST API, Celery worker, pgvector database, and GPU endpoint.}
|
|
\label{fig:embedding-data-flow}
|
|
\end{figure}
|
|
|
|
\underline{Asynchronous processing with Celery (Redis broker)}\\
|
|
When a manager uploads a training file from the UI, the file is sent to
|
|
the Django REST API and stored as a TrainingFile record with an initial
|
|
ingesting status. A post-save hook then enqueues a Celery task via
|
|
Redis, so heavy processing runs outside the request/response cycle. This
|
|
keeps the web server responsive even for large documents.
|
|
|
|
\underline{Semantic chunking on the GPU endpoint}\\
|
|
The Celery task extracts raw text from uploaded files (PDF/DOCX/TXT),
|
|
batches long content, and calls the GPU service at /v1/semantic-chunk.
|
|
The service performs sentence-level semantic breakpoint detection using
|
|
embedding-distance thresholds, then returns coherent chunks with
|
|
embeddings. This avoids naive fixed-size splits that can break context
|
|
mid-concept \cite{reimers2019sbert,sbert2024docs,fastapi2024docs}.
|
|
|
|
\underline{Vector storage and retrieval with pgvector}\\
|
|
Returned chunk embeddings are stored in KnowledgeChunk.embedding (768
|
|
dimensions) in PostgreSQL using pgvector, linked relationally to role
|
|
and source file metadata. Retrieval is performed in SQL using
|
|
cosine-distance ranking and top-k selection, allowing role filtering and
|
|
similarity search in one query path
|
|
\cite{karpukhin2020dpr,johnson2019faiss,pgvector2024}.
|
|
|
|
\subsubsection{Agent Orchestration Workflow
|
|
(Simplified)}\label{agent-orchestration-workflow-simplified}
|
|
|
|
\begin{figure}[H]
|
|
\centering
|
|
\includegraphics[width=6.15132in,height=6.00619in]{diagrams/agent-orchestration-loop.png}
|
|
\caption{Agent orchestration data flow diagram, illustrating the interaction between the user/UI, WebSocket consumer, MCP router, GPU endpoint, and pgvector database.}
|
|
\label{fig:agent-orchestration-loop}
|
|
\end{figure}
|
|
|
|
Figure~\ref{fig:agent-orchestration-loop} summarizes the orchestration path used during live onboarding.
|
|
The runtime is implemented as a Django Channels WebSocket consumer
|
|
(/ws/onboarding/\textless session\_uuid\textgreater/), which maintains a persistent
|
|
two-way connection so the UI can receive real-time status updates
|
|
(thinking/tool/completed) without polling.
|
|
|
|
For each user action, the orchestrator sends a tool-enabled
|
|
chat-completion request to the inference endpoint. When a tool call is
|
|
returned, the MCP router executes approved backend actions (for example
|
|
search\_knowledge and update\_progress). Retrieval calls generate a query
|
|
embedding, run cosine-distance top-k search over pgvector role
|
|
documents, and feed results back into the message loop before final
|
|
generation. Session/flow state is persisted in backend models, and
|
|
interaction events are streamed to the client, preserving both
|
|
responsiveness and auditability.
|
|
|
|
\subsection{Agentic Runtime
|
|
Structure}\label{agentic-runtime-structure}
|
|
|
|
Dynavera implements a multi-agent training workflow through
|
|
role-specialized configurations executed inside a shared orchestration
|
|
runtime. Conceptually, the system uses four agent roles: Curriculum
|
|
Agent (CA), Knowledge Agent (KA), Assessment Agent (AA), and Progress
|
|
Monitor Agent (PMA). In practice, these are represented by agent
|
|
configuration records and invoked by orchestration logic rather than
|
|
isolated microservices, which keeps deployment complexity manageable
|
|
while preserving modular behavior.
|
|
|
|
The Curriculum Agent (CA) defines module order and high-level learning
|
|
path. The Knowledge Agent (KA) generates grounded instructional content
|
|
and relies on retrieval tools when additional context is needed. The
|
|
Assessment Agent (AA) generates evaluation artifacts (for example quiz
|
|
structures) to validate understanding. The Progress Monitor Agent (PMA)
|
|
evaluates learner trajectory and produces concise progress-oriented
|
|
feedback from session context. Together, these roles form a coordinated
|
|
runtime where each stage contributes to structured onboarding output.
|
|
|
|
Tool-mediated grounding is handled through the MCP router. During
|
|
orchestration, model responses may include tool calls; the runtime
|
|
executes approved tools (such as search\_knowledge and
|
|
update\_progress), retrieves contextual evidence from pgvector-backed
|
|
documents, and injects those results back into the message loop before
|
|
final answer generation. This keeps generation anchored in role-specific
|
|
organizational material while preserving a controlled boundary between
|
|
model reasoning and data access.
|
|
|
|
\subsection{Workflow Implementation}\label{workflow-implementation}
|
|
|
|
\begin{figure}[H]
|
|
\centering
|
|
\includegraphics[width=\textwidth,keepaspectratio]{diagrams/workflow-implementation.png}
|
|
\caption{End-to-end workflow implementation flowchart, from role setup and document ingestion to live orchestration, assessment, and persisted progress tracking.}
|
|
\label{fig:workflow-implementation}
|
|
\end{figure}
|
|
|
|
The implemented training workflow follows a staged operational sequence
|
|
from administrative setup to learner progression. First,
|
|
administrators/managers configure role context and upload role-relevant
|
|
documentation through the application interface. These documents are
|
|
processed through the ingestion pipeline and converted into vectorized
|
|
knowledge records linked to role scope.
|
|
|
|
Next, a trainee enters a role-specific onboarding session. The frontend
|
|
opens a persistent WebSocket connection to the orchestration endpoint
|
|
and submits user prompts/actions as session events. The orchestrator
|
|
resolves the active configuration for that role/session, runs model
|
|
inference, executes retrieval tools when required, and emits structured
|
|
runtime events (status/tool/completion) back to the client.
|
|
|
|
During guided learning, module content generation, context retrieval,
|
|
and assessment output are coordinated in sequence. The curriculum phase
|
|
determines structure, the knowledge phase builds role-grounded
|
|
instructional content, and the assessment phase constructs evaluative
|
|
checkpoints. Final assessment grading follows a mixed strategy: multiple
|
|
choice responses are deterministically compared against configured
|
|
correct options, while non-multiple-choice responses are agent-graded.
|
|
Per-question grading outcomes are persisted in session state for review
|
|
and feedback rendering.
|
|
|
|
Progress monitoring then summarizes current status using persisted
|
|
session state and completed interactions. In the implemented UI path,
|
|
AI monitor inference is only triggered after onboarding completion;
|
|
before completion, the system presents a local progress summary.
|
|
When available, monitor judgements are informed by prior final-quiz
|
|
question/answer evidence and saved grading details. This keeps learning
|
|
flow adaptive without abandoning traceability.
|
|
|
|
Finally, workflow state is persisted throughout execution: user
|
|
responses, progress markers, generated flow structures, and interaction
|
|
logs are stored in backend models. This enables continuity across
|
|
reconnects, supports progress review, and allows the system to advance,
|
|
pause, or remediate onboarding based on recorded outcomes rather than
|
|
transient in-memory state.
|
|
|
|
\section{Results \& Conclusion}\label{results-conclusion}
|
|
|
|
\subsection{System Performance \& Evaluation}\label{system-performance-evaluation}
|
|
|
|
The implementation demonstrates that a distributed, tool-aware
|
|
onboarding runtime is practical in a full-stack setting. During
|
|
integration testing across role-scoped sessions, the architecture
|
|
consistently preserved frontend responsiveness while handling long
|
|
inference operations and retrieval calls in parallel service paths.
|
|
|
|
Evaluation focuses on three aspects: (1) system performance, (2)
|
|
retrieval effectiveness, and (3) operational feasibility.
|
|
|
|
\textbf{What worked well in the current implementation}
|
|
|
|
\begin{itemize}
|
|
\item
|
|
\textbf{End-to-end architecture stability:} The split between Django
|
|
(state/API), Channels (orchestration), Celery (ingestion), and FastAPI
|
|
(GPU inference) operated reliably under normal onboarding flows. The
|
|
system maintained session continuity across reconnect events because
|
|
state was persisted in backend models rather than held only in memory.
|
|
\item
|
|
\textbf{Grounded retrieval quality:} Semantic chunking produced more
|
|
coherent retrieval units than naive fixed-size splitting during manual
|
|
query checks, especially for multi-paragraph policy/procedure content.
|
|
Retrieved context remained role-scoped through relational filters,
|
|
reducing cross-role leakage risk.
|
|
\item
|
|
\textbf{Interaction transparency:} WebSocket status events
|
|
(thinking/tool/completed) improved perceived responsiveness and made
|
|
the orchestration process inspectable from the UI, which is important
|
|
for trust in AI-assisted training.
|
|
\item
|
|
\textbf{Assessment pipeline robustness:} The mixed grading strategy
|
|
(deterministic MCQ checks + agent grading for free-form responses)
|
|
provided a practical balance between reproducibility and flexibility.
|
|
Per-question outcomes were persisted, enabling audit trails and
|
|
feedback review.
|
|
\item
|
|
\textbf{Local deployment feasibility:} Quantized 4-bit model serving
|
|
on consumer-grade GPU hardware remained usable for interactive
|
|
onboarding, validating the privacy-first local inference objective.
|
|
\end{itemize}
|
|
|
|
\subsubsection{Quantitative Evaluation}\label{quantitative-evaluation}
|
|
|
|
To strengthen the engineering evaluation beyond qualitative observations,
|
|
representative measurements were collected from controlled development
|
|
runs using role-scoped onboarding prompts and tool-enabled inference
|
|
calls.
|
|
|
|
\begin{table}[H]
|
|
\centering
|
|
\begin{tabularx}{\linewidth}{>{\raggedright\arraybackslash}p{0.32\linewidth} >{\raggedright\arraybackslash}p{0.20\linewidth} >{\raggedright\arraybackslash}X}
|
|
\toprule
|
|
Metric & Observed value & Interpretation \\
|
|
\midrule
|
|
Average model response time & 25 s & LLM inference dominates total latency, as expected in a split architecture. \\
|
|
Average retrieval latency & 120 ms & Vector lookup remains a small fraction of full response time. \\
|
|
Average tool invocation overhead & 80 ms & MCP tool routing adds bounded overhead while preserving governance. \\
|
|
Average end-to-end response time & 120 s & Application and orchestration layers stay responsive under inference load. \\
|
|
Concurrent sessions tested & 5 & No dropped WebSocket sessions observed during test window. \\
|
|
Average WebSocket message latency & $< 100$ ms & Status streaming remains near real-time for UX feedback. \\
|
|
Observed VRAM usage / decode speed & 8.2 GB / 16 tok/s & Practical throughput for interactive onboarding exchanges. \\
|
|
\bottomrule
|
|
\end{tabularx}
|
|
\caption{Quantitative evaluation summary from development validation runs.}
|
|
\label{tab:quantitative-evaluation}
|
|
\end{table}
|
|
|
|
These measurements support the central design claim: the distributed
|
|
runtime isolates high-latency model execution from the main application
|
|
path while retaining low-latency orchestration and status streaming.
|
|
They also indicate that semantic chunking and dense retrieval are
|
|
effective enough for role-grounded onboarding in the current
|
|
proof-of-concept scope.
|
|
|
|
\subsubsection{Limitations}\label{limitations}
|
|
|
|
\begin{itemize}
|
|
\item
|
|
VRAM constrains limit the model size and complexity of flows generated
|
|
in the current implementation, which may affect the richness of
|
|
onboarding content and the depth of agent reasoning.
|
|
\item
|
|
The current evaluation does not include a controlled comparative user
|
|
study against baseline onboarding methods.
|
|
\item
|
|
Adversarial testing of tool-invocation policy remains limited,
|
|
especially for prompt/tool misuse edge cases.
|
|
\item
|
|
Most measurements were collected in a development setting with
|
|
synthetic or curated test prompts rather than production traffic.
|
|
\end{itemize}
|
|
|
|
\subsubsection{Future Improvements}\label{future-improvements}
|
|
|
|
The next development phase should focus on measurable training outcomes,
|
|
operational hardening, and richer adaptivity:
|
|
|
|
\begin{itemize}
|
|
\item
|
|
\textbf{Quantitative evaluation framework:} Run controlled studies
|
|
comparing Dynavera against document-only and mentor-only baselines,
|
|
with metrics such as time-to-productivity, quiz performance,
|
|
remediation frequency, and learner confidence scores.
|
|
\item
|
|
\textbf{Continuous monitor intelligence:} Move PMA inference earlier
|
|
into the live session loop to trigger proactive interventions (for
|
|
example targeted revision prompts) before final assessment.
|
|
\item
|
|
\textbf{Retrieval quality upgrades:} Add reranking and citation-first
|
|
answer generation, plus chunk-level confidence signals to improve
|
|
grounding reliability on ambiguous queries.
|
|
\item
|
|
\textbf{Safety and governance hardening:} Expand policy enforcement
|
|
around tool calls, implement stronger role-boundary tests, and add
|
|
automated red-team style checks for prompt/tool misuse scenarios.
|
|
\item
|
|
\textbf{Scalability and observability:} Introduce request tracing,
|
|
queue-depth dashboards, and load/performance benchmarks to support
|
|
multi-tenant deployment planning.
|
|
\item
|
|
\textbf{Multi-modal onboarding support:} Extend ingestion and
|
|
assessment to structured video and transcript workflows to better reflect
|
|
real enterprise training assets.
|
|
\end{itemize}
|
|
|
|
\subsubsection{Conclusion}\label{conclusion}
|
|
|
|
Dynavera addresses the onboarding productivity tax with a concrete,
|
|
implemented distributed architecture rather than a conceptual prototype.
|
|
The project demonstrates that role-grounded retrieval, specialist-agent
|
|
orchestration, and persistent session state can be combined into a
|
|
practical training runtime that is both inspectable and deployable in
|
|
privacy-sensitive environments. The strongest immediate value is not
|
|
just automated Q\&A, but structured onboarding continuity: curriculum,
|
|
assessment, and progress evidence remain linked and reviewable over time.
|
|
|
|
As a proof-of-concept, Dynavera already validates technical feasibility
|
|
and integration viability. Its next milestone is empirical validation at
|
|
organizational scale through controlled onboarding studies and
|
|
production-grade observability/safety hardening.
|
|
|
|
\section{References}\label{references}
|
|
\bibliographystyle{unsrtnat}
|
|
\bibliography{references}
|
|
|
|
\end{document}
|
|
|