# Dynavera Benchmark Results

**Date:** 2026-03-24 13:29:55

**Inference endpoint:** `http://fyp-inference-dev:8001`

**Repetitions per benchmark:** 10

## 1. GPU Server Health

| Field | Value |
|---|---|
| Status | OK |
| LLM Ready | True |
| Embed Ready | True |
| Health check RTT | 44.5 ms |

## 2. Embedding Latency

| Query type | Chars | Mean (ms) | Median (ms) | P95 (ms) | Min (ms) | Max (ms) |
|---|---|---|---|---|---|---|
| short | 19 | 25.0 | 25.3 | 31.9 | 20.8 | 31.9 |
| medium | 172 | 24.0 | 22.8 | 31.8 | 21.0 | 31.8 |
| long | 428 | 29.8 | 27.5 | 37.7 | 25.0 | 37.7 |

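In every row of the embedding table, P95 equals Max. With 10 repetitions per benchmark, that is what a nearest-rank percentile produces: the 95th percentile lands on rank ceil(0.95 × 10) = 10, which is the largest sample. The sketch below assumes nearest-rank is the method in use (the report does not say) and runs on made-up sample values; `nearest_rank_percentile` and `summarize` are hypothetical names.

```python
import math
import statistics


def nearest_rank_percentile(samples, pct):
    """Return the nearest-rank percentile of samples, with pct in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank into the sorted list
    return ordered[max(rank, 1) - 1]


def summarize(samples_ms):
    """Compute the summary columns used in the latency tables."""
    return {
        "mean_ms": round(statistics.mean(samples_ms), 1),
        "median_ms": round(statistics.median(samples_ms), 1),
        "p95_ms": nearest_rank_percentile(samples_ms, 95),
        "min_ms": min(samples_ms),
        "max_ms": max(samples_ms),
    }


# Illustrative samples only, not the measured data.
samples = [20.8, 22.1, 23.0, 24.4, 25.1, 25.5, 26.0, 26.3, 27.2, 31.9]
stats = summarize(samples)
print(stats)
```

With so few samples the P95 column carries no information beyond Max; a larger repetition count would be needed for the two to diverge.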
## 3. Semantic Chunking Latency

| Input size | Chars | Chunks produced | Latency (ms) |
|---|---|---|---|
| small (~200 c) | 200 | 1 | 26.7 |
| medium (~2k c) | 1810 | 1 | 62.7 |
| large (~8k c) | 7740 | 1 | 204.0 |

## 4. LLM Inference Latency

| Prompt type | Elapsed (s) | Prompt tokens | Completion tokens | Tok/s |
|---|---|---|---|---|
| short_qa | 1.26 | 55 | 69 | 54.9 |
| progress_summary | 1.24 | 74 | 68 | 54.9 |
| curriculum_gen | 1.4 | 79 | 76 | 54.4 |
| assessment_gen | 4.75 | 83 | 249 | 52.4 |
| knowledge_explanation | 10.34 | 83 | 541 | 52.3 |

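The Tok/s column appears to be completion tokens divided by elapsed time, with prompt tokens not counted. The check below recomputes it from the LLM inference table values. Because Elapsed (s) is rounded to two decimals, the recomputed figures differ slightly from the reported ones; the 0.2 tok/s tolerance is an assumption, not something the report specifies.

```python
# (prompt type, elapsed_s, completion_tokens, reported tok/s), copied from the report
ROWS = [
    ("short_qa", 1.26, 69, 54.9),
    ("progress_summary", 1.24, 68, 54.9),
    ("curriculum_gen", 1.4, 76, 54.4),
    ("assessment_gen", 4.75, 249, 52.4),
    ("knowledge_explanation", 10.34, 541, 52.3),
]

TOLERANCE = 0.2  # assumed slack for the rounding of elapsed_s


def tokens_per_sec(completion_tokens, elapsed_s):
    """Decode throughput: generated tokens per second of wall-clock time."""
    return completion_tokens / elapsed_s


diffs = {}
for name, elapsed_s, completion, reported in ROWS:
    recomputed = tokens_per_sec(completion, elapsed_s)
    diffs[name] = abs(recomputed - reported)
    print(f"{name:<22} recomputed={recomputed:5.1f} reported={reported:5.1f}")
```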
> **Note on end-to-end session time:** A full onboarding session invokes multiple sequential
> inference calls (curriculum generation → knowledge explanation × N modules → assessment generation → progress summary).
> Total wall-clock time accumulates across all turns plus retrieval and tool-call overhead.

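As a rough illustration of how these latencies add up, the sketch below sums the single-call timings in the order the note describes. It assumes one curriculum, one assessment, and one summary call per session, assumes each call takes about as long as the benchmarked prompt, and ignores retrieval and tool-call overhead, so it is a lower bound rather than a measured result. `estimate_session_seconds` is a hypothetical helper.

```python
# Single-call timings in seconds, copied from the LLM inference table.
ELAPSED_S = {
    "curriculum_gen": 1.4,
    "knowledge_explanation": 10.34,
    "assessment_gen": 4.75,
    "progress_summary": 1.24,
}


def estimate_session_seconds(n_modules):
    """Sum sequential inference time for a session with n_modules explanations."""
    return (
        ELAPSED_S["curriculum_gen"]
        + n_modules * ELAPSED_S["knowledge_explanation"]
        + ELAPSED_S["assessment_gen"]
        + ELAPSED_S["progress_summary"]
    )


for n in (1, 3, 5):
    print(f"N={n}: about {estimate_session_seconds(n):.2f} s of inference")
```

Under these assumptions the knowledge explanation step dominates: each additional module adds roughly 10 s of inference time.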
## 5. Database Statistics

| Entity | Count |
|---|---|
| Organizations | 3 |
| Roles | 10 |
| Users | 12 |
| Training Files (total) | 1 |
| Training Files (embedded) | 0 |
| Knowledge Chunks (with embeddings) | 8 |
| Onboarding Sessions | 4 |

## 6. pgvector Retrieval Latency

**Role:** fNIRS Specialist

**Organisation:** University of Birmingham

**Query:** "What are the key responsibilities, tools, and procedures for this role?"

**Total chunks in DB:** 8

| Top-K | Results returned | Mean (ms) | Median (ms) | P95 (ms) | Min (ms) | Max (ms) |
|---|---|---|---|---|---|---|
| top_5 | 5 | 2.3 | 2.0 | 5.0 | 1.9 | 5.0 |
| top_10 | 8 | 2.4 | 2.4 | 3.1 | 2.3 | 3.1 |
| top_20 | 8 | 2.3 | 2.3 | 2.6 | 2.2 | 2.6 |

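The Results returned column is capped by the 8 chunks in the database: top_10 and top_20 both return 8 rows, which likely explains why their latencies are nearly identical. The in-memory sketch below mimics that capping with a cosine-distance ranking. It is a stand-in for illustration, not the pgvector query the benchmark ran; the vectors and the `top_k` helper are made up.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def top_k(query, chunks, k):
    """Rank chunks by distance to the query and keep at most k of them."""
    ranked = sorted(chunks, key=lambda c: cosine_distance(query, c["embedding"]))
    return ranked[:k]  # slicing never returns more rows than exist


# Eight toy chunks, matching the chunk count in the report (embeddings are invented).
chunks = [{"id": i, "embedding": [1.0, float(i)]} for i in range(8)]
query = [1.0, 0.0]

counts = {k: len(top_k(query, chunks, k)) for k in (5, 10, 20)}
print(counts)
```

A meaningful comparison across Top-K values would need a corpus larger than the largest K being tested.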
## Raw JSON

```json
{
  "health": {
    "status": "OK",
    "llm_ready": true,
    "embed_ready": true,
    "latency_ms": 44.5
  },
  "embeddings": {
    "short": {
      "query_chars": 19,
      "mean_ms": 25.0,
      "median_ms": 25.3,
      "p95_ms": 31.9,
      "min_ms": 20.8,
      "max_ms": 31.9
    },
    "medium": {
      "query_chars": 172,
      "mean_ms": 24.0,
      "median_ms": 22.8,
      "p95_ms": 31.8,
      "min_ms": 21.0,
      "max_ms": 31.8
    },
    "long": {
      "query_chars": 428,
      "mean_ms": 29.8,
      "median_ms": 27.5,
      "p95_ms": 37.7,
      "min_ms": 25.0,
      "max_ms": 37.7
    }
  },
  "chunking": {
    "small (~200 c)": {
      "chars": 200,
      "chunks_produced": 1,
      "latency_ms": 26.7
    },
    "medium (~2k c)": {
      "chars": 1810,
      "chunks_produced": 1,
      "latency_ms": 62.7
    },
    "large (~8k c)": {
      "chars": 7740,
      "chunks_produced": 1,
      "latency_ms": 204.0
    }
  },
  "llm": {
    "short_qa": {
      "elapsed_s": 1.26,
      "prompt_tokens": 55,
      "completion_tokens": 69,
      "tokens_per_sec": 54.9,
      "response_preview": "A Kubernetes pod is the basic execution unit of a containerized application, and it represents a log"
    },
    "progress_summary": {
      "elapsed_s": 1.24,
      "prompt_tokens": 74,
      "completion_tokens": 68,
      "tokens_per_sec": 54.9,
      "response_preview": "The trainee has demonstrated a strong foundation in the fundamentals of version control with Git, as"
    },
    "curriculum_gen": {
      "elapsed_s": 1.4,
      "prompt_tokens": 79,
      "completion_tokens": 76,
      "tokens_per_sec": 54.4,
      "response_preview": "[ \"Module 1: Introduction to Backend Services\", \"Module 2: Fundamentals of API Design\", \"Modul"
    },
    "assessment_gen": {
      "elapsed_s": 4.75,
      "prompt_tokens": 83,
      "completion_tokens": 249,
      "tokens_per_sec": 52.4,
      "response_preview": "[ { \"question\": \"What is the primary purpose of a Continuous Integration (CI) pipeline?\", "
    },
    "knowledge_explanation": {
      "elapsed_s": 10.34,
      "prompt_tokens": 83,
      "completion_tokens": 541,
      "tokens_per_sec": 52.3,
      "response_preview": "**Git Branching Strategy Best Practices** As a new engineer, understanding Git branching strategies"
    }
  },
  "database": {
    "organizations": 3,
    "roles": 10,
    "users": 12,
    "training_files_total": 1,
    "training_files_embedded": 0,
    "knowledge_chunks_with_embeddings": 8,
    "onboarding_sessions": 4
  },
  "retrieval": {
    "role": "fNIRS Specialist",
    "organization": "University of Birmingham",
    "query": "What are the key responsibilities, tools, and procedures for this role?",
    "total_chunks_in_db": 8,
    "results": {
      "top_5": {
        "results_returned": 5,
        "mean_ms": 2.3,
        "median_ms": 2.0,
        "p95_ms": 5.0,
        "min_ms": 1.9,
        "max_ms": 5.0
      },
      "top_10": {
        "results_returned": 8,
        "mean_ms": 2.4,
        "median_ms": 2.4,
        "p95_ms": 3.1,
        "min_ms": 2.3,
        "max_ms": 3.1
      },
      "top_20": {
        "results_returned": 8,
        "mean_ms": 2.3,
        "median_ms": 2.3,
        "p95_ms": 2.6,
        "min_ms": 2.2,
        "max_ms": 2.6
      }
    }
  }
}
```
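For readers who want to regenerate a table from the raw payload, here is a small sketch. It embeds only the `embeddings.short` entry from the JSON in this report; `embedding_row` is a hypothetical helper, and reading the full payload from a file is left out.

```python
import json

# Excerpt of the raw payload; only the "short" embedding entry is included here.
RAW = """
{
  "embeddings": {
    "short": {
      "query_chars": 19,
      "mean_ms": 25.0,
      "median_ms": 25.3,
      "p95_ms": 31.9,
      "min_ms": 20.8,
      "max_ms": 31.9
    }
  }
}
"""

LATENCY_KEYS = ("mean_ms", "median_ms", "p95_ms", "min_ms", "max_ms")


def embedding_row(name, entry):
    """Format one embeddings entry as a Markdown table row."""
    cells = [name, str(entry["query_chars"])]
    cells += [f"{entry[key]:.1f}" for key in LATENCY_KEYS]
    return "| " + " | ".join(cells) + " |"


data = json.loads(RAW)
rows = [embedding_row(name, entry) for name, entry in data["embeddings"].items()]
print("\n".join(rows))
```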