# Dynavera Benchmark Results

**Date:** 2026-03-24 13:29:55
**Inference endpoint:** `http://fyp-inference-dev:8001`
**Repetitions per benchmark:** 10

## 1. GPU Server Health

| Field | Value |
|---|---|
| Status | OK |
| LLM Ready | True |
| Embed Ready | True |
| Health check RTT | 44.5 ms |

## 2. Embedding Latency

| Query type | Chars | Mean (ms) | Median (ms) | P95 (ms) | Min (ms) | Max (ms) |
|---|---|---|---|---|---|---|
| short | 19 | 25.0 | 25.3 | 31.9 | 20.8 | 31.9 |
| medium | 172 | 24.0 | 22.8 | 31.8 | 21.0 | 31.8 |
| long | 428 | 29.8 | 27.5 | 37.7 | 25.0 | 37.7 |

## 3. Semantic Chunking Latency

| Input size | Chars | Chunks produced | Latency (ms) |
|---|---|---|---|
| small (~200 c) | 200 | 1 | 26.7 |
| medium (~2k c) | 1810 | 1 | 62.7 |
| large (~8k c) | 7740 | 1 | 204.0 |

## 4. LLM Inference Latency

| Prompt type | Elapsed (s) | Prompt tokens | Completion tokens | Tok/s |
|---|---|---|---|---|
| short_qa | 1.26 | 55 | 69 | 54.9 |
| progress_summary | 1.24 | 74 | 68 | 54.9 |
| curriculum_gen | 1.4 | 79 | 76 | 54.4 |
| assessment_gen | 4.75 | 83 | 249 | 52.4 |
| knowledge_explanation | 10.34 | 83 | 541 | 52.3 |

> **Note on end-to-end session time:** A full onboarding session invokes multiple sequential
> inference calls (curriculum generation → knowledge explanation × N modules → assessment generation → progress summary).
> Total wall-clock time accumulates across all turns plus retrieval and tool-call overhead.

## 5. Database Statistics

| Entity | Count |
|---|---|
| Organizations | 3 |
| Roles | 10 |
| Users | 12 |
| Training Files (total) | 1 |
| Training Files (embedded) | 0 |
| Knowledge Chunks (with embeddings) | 8 |
| Onboarding Sessions | 4 |

## 6. pgvector Retrieval Latency

**Role:** fNIRS Specialist
**Organisation:** University of Birmingham
**Query:** "What are the key responsibilities, tools, and procedures for this role?"
**Total chunks in DB:** 8

| Top-K | Results returned | Mean (ms) | Median (ms) | P95 (ms) | Min (ms) | Max (ms) |
|---|---|---|---|---|---|---|
| top_5 | 5 | 2.3 | 2.0 | 5.0 | 1.9 | 5.0 |
| top_10 | 8 | 2.4 | 2.4 | 3.1 | 2.3 | 3.1 |
| top_20 | 8 | 2.3 | 2.3 | 2.6 | 2.2 | 2.6 |

## Raw JSON

```json
{
  "health": {
    "status": "OK",
    "llm_ready": true,
    "embed_ready": true,
    "latency_ms": 44.5
  },
  "embeddings": {
    "short": {
      "query_chars": 19,
      "mean_ms": 25.0,
      "median_ms": 25.3,
      "p95_ms": 31.9,
      "min_ms": 20.8,
      "max_ms": 31.9
    },
    "medium": {
      "query_chars": 172,
      "mean_ms": 24.0,
      "median_ms": 22.8,
      "p95_ms": 31.8,
      "min_ms": 21.0,
      "max_ms": 31.8
    },
    "long": {
      "query_chars": 428,
      "mean_ms": 29.8,
      "median_ms": 27.5,
      "p95_ms": 37.7,
      "min_ms": 25.0,
      "max_ms": 37.7
    }
  },
  "chunking": {
    "small (~200 c)": {
      "chars": 200,
      "chunks_produced": 1,
      "latency_ms": 26.7
    },
    "medium (~2k c)": {
      "chars": 1810,
      "chunks_produced": 1,
      "latency_ms": 62.7
    },
    "large (~8k c)": {
      "chars": 7740,
      "chunks_produced": 1,
      "latency_ms": 204.0
    }
  },
  "llm": {
    "short_qa": {
      "elapsed_s": 1.26,
      "prompt_tokens": 55,
      "completion_tokens": 69,
      "tokens_per_sec": 54.9,
      "response_preview": "A Kubernetes pod is the basic execution unit of a containerized application, and it represents a log"
    },
    "progress_summary": {
      "elapsed_s": 1.24,
      "prompt_tokens": 74,
      "completion_tokens": 68,
      "tokens_per_sec": 54.9,
      "response_preview": "The trainee has demonstrated a strong foundation in the fundamentals of version control with Git, as"
    },
    "curriculum_gen": {
      "elapsed_s": 1.4,
      "prompt_tokens": 79,
      "completion_tokens": 76,
      "tokens_per_sec": 54.4,
      "response_preview": "[ \"Module 1: Introduction to Backend Services\", \"Module 2: Fundamentals of API Design\", \"Modul"
    },
    "assessment_gen": {
      "elapsed_s": 4.75,
      "prompt_tokens": 83,
      "completion_tokens": 249,
      "tokens_per_sec": 52.4,
      "response_preview": "[ { \"question\": \"What is the primary purpose of a Continuous Integration (CI) pipeline?\", "
    },
    "knowledge_explanation": {
      "elapsed_s": 10.34,
      "prompt_tokens": 83,
      "completion_tokens": 541,
      "tokens_per_sec": 52.3,
      "response_preview": "**Git Branching Strategy Best Practices** As a new engineer, understanding Git branching strategies"
    }
  },
  "database": {
    "organizations": 3,
    "roles": 10,
    "users": 12,
    "training_files_total": 1,
    "training_files_embedded": 0,
    "knowledge_chunks_with_embeddings": 8,
    "onboarding_sessions": 4
  },
  "retrieval": {
    "role": "fNIRS Specialist",
    "organization": "University of Birmingham",
    "query": "What are the key responsibilities, tools, and procedures for this role?",
    "total_chunks_in_db": 8,
    "results": {
      "top_5": {
        "results_returned": 5,
        "mean_ms": 2.3,
        "median_ms": 2.0,
        "p95_ms": 5.0,
        "min_ms": 1.9,
        "max_ms": 5.0
      },
      "top_10": {
        "results_returned": 8,
        "mean_ms": 2.4,
        "median_ms": 2.4,
        "p95_ms": 3.1,
        "min_ms": 2.3,
        "max_ms": 3.1
      },
      "top_20": {
        "results_returned": 8,
        "mean_ms": 2.3,
        "median_ms": 2.3,
        "p95_ms": 2.6,
        "min_ms": 2.2,
        "max_ms": 2.6
      }
    }
  }
}
```
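
The note on end-to-end session time in Section 4 describes a sequence of calls whose latencies add up. The sketch below turns that into rough arithmetic using the figures reported in Sections 2, 4, and 6. It rests on several assumptions that the benchmark does not confirm: the function name `estimate_session_seconds` is hypothetical, each LLM call is assumed to be preceded by exactly one "medium" query embedding and one `top_5` retrieval, every call is assumed to take its single measured time, and tool-call overhead (not measured here) is left as a caller-supplied parameter. Treat the result as an illustrative lower bound, not a measured session time.

```python
# Single-run LLM latencies from Section 4, in seconds.
LLM_ELAPSED_S = {
    "curriculum_gen": 1.40,
    "knowledge_explanation": 10.34,
    "assessment_gen": 4.75,
    "progress_summary": 1.24,
}

# Mean per-call overheads, in milliseconds.
EMBED_MEAN_MS = 24.0      # Section 2, "medium" query
RETRIEVAL_MEAN_MS = 2.3   # Section 6, top_5


def estimate_session_seconds(n_modules: int, tool_overhead_s: float = 0.0) -> float:
    """Rough estimate of one onboarding session's wall-clock time.

    Assumed flow: curriculum generation, then one knowledge explanation
    per module, then assessment generation, then a progress summary.
    ASSUMPTION: one embedding plus one retrieval precedes every LLM call.
    """
    if n_modules < 0:
        raise ValueError("n_modules must be non-negative")

    # Sequential LLM calls: three fixed calls plus one per module.
    llm_total_s = (
        LLM_ELAPSED_S["curriculum_gen"]
        + n_modules * LLM_ELAPSED_S["knowledge_explanation"]
        + LLM_ELAPSED_S["assessment_gen"]
        + LLM_ELAPSED_S["progress_summary"]
    )

    # Embedding and retrieval overhead, converted from ms to seconds.
    llm_calls = 3 + n_modules
    rag_total_s = llm_calls * (EMBED_MEAN_MS + RETRIEVAL_MEAN_MS) / 1000.0

    return llm_total_s + rag_total_s + tool_overhead_s


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"N={n}: {estimate_session_seconds(n):.1f} s")
```

Under these assumptions, a five-module session comes to roughly 59.3 s, of which about 0.2 s is embedding and retrieval. With the figures above, knowledge explanation dominates the total, so the estimate scales almost linearly with the number of modules.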