Dynavera/benchmarks/results_2026-03-24_13-29-55.md
2026-03-24 17:05:46 +00:00

# Dynavera Benchmark Results
**Date:** 2026-03-24 13:29:55

**Inference endpoint:** `http://fyp-inference-dev:8001`

**Repetitions per benchmark:** 10

## 1. GPU Server Health
| Field | Value |
|---|---|
| Status | OK |
| LLM Ready | True |
| Embed Ready | True |
| Health check RTT | 44.5 ms |
## 2. Embedding Latency
| Query type | Chars | Mean (ms) | Median (ms) | P95 (ms) | Min (ms) | Max (ms) |
|---|---|---|---|---|---|---|
| short | 19 | 25.0 | 25.3 | 31.9 | 20.8 | 31.9 |
| medium | 172 | 24.0 | 22.8 | 31.8 | 21.0 | 31.8 |
| long | 428 | 29.8 | 27.5 | 37.7 | 25.0 | 37.7 |
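
In the table above, P95 equals Max in every row. One possible explanation, assuming the benchmark uses a nearest-rank percentile (the script is not shown here, so this is an assumption), is that with 10 repetitions the 95th-percentile rank is ceil(0.95 × 10) = 10, which is the largest sample. A minimal sketch with made-up sample values:

```python
def nearest_rank_percentile(samples, percent):
    """Nearest-rank percentile: the value at rank ceil(percent/100 * n), 1-based."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    n = len(ordered)
    # Integer ceiling division avoids floating-point rounding surprises.
    rank = max(1, -(-percent * n // 100))
    return ordered[rank - 1]


# Hypothetical latencies in ms (NOT the measured data), 10 repetitions.
samples = [24.0, 21.0, 23.0, 30.0, 22.0, 25.0, 26.0, 27.0, 28.0, 29.0]

p95 = nearest_rank_percentile(samples, 95)
print(p95, max(samples))  # with n = 10, both are the same sample
```

With this method P95 only separates from the maximum once there are more than 20 samples, so a higher repetition count would be needed for P95 to carry information beyond Max.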
## 3. Semantic Chunking Latency
| Input size | Chars | Chunks produced | Latency (ms) |
|---|---|---|---|
| small (~200 c) | 200 | 1 | 26.7 |
| medium (~2k c) | 1810 | 1 | 62.7 |
| large (~8k c) | 7740 | 1 | 204.0 |
## 4. LLM Inference Latency
| Prompt type | Elapsed (s) | Prompt tokens | Completion tokens | Tok/s |
|---|---|---|---|---|
| short_qa | 1.26 | 55 | 69 | 54.9 |
| progress_summary | 1.24 | 74 | 68 | 54.9 |
| curriculum_gen | 1.40 | 79 | 76 | 54.4 |
| assessment_gen | 4.75 | 83 | 249 | 52.4 |
| knowledge_explanation | 10.34 | 83 | 541 | 52.3 |
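
The Tok/s column appears consistent with completion tokens divided by elapsed time. Small differences are expected because the elapsed time is rounded to two decimals. A quick check, assuming that definition (the benchmark script is not shown here):

```python
# (elapsed_s, completion_tokens, reported_tok_per_s), copied from the table above.
ROWS = {
    "short_qa": (1.26, 69, 54.9),
    "progress_summary": (1.24, 68, 54.9),
    "curriculum_gen": (1.40, 76, 54.4),
    "assessment_gen": (4.75, 249, 52.4),
    "knowledge_explanation": (10.34, 541, 52.3),
}


def recomputed_tok_per_s(elapsed_s, completion_tokens):
    """Assumed definition: generated tokens per second of wall-clock time."""
    return completion_tokens / elapsed_s


for name, (elapsed_s, completion_tokens, reported) in ROWS.items():
    value = recomputed_tok_per_s(elapsed_s, completion_tokens)
    print(f"{name:22s} recomputed={value:5.1f} reported={reported:5.1f}")
```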
> **Note on end-to-end session time:** A full onboarding session invokes multiple sequential
> inference calls (curriculum generation → knowledge explanation × N modules → assessment generation → progress summary).
> Total wall-clock time accumulates across all turns plus retrieval and tool-call overhead.
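
The accumulation described in the note can be sketched as simple arithmetic over the per-call latencies in the table above. This is a rough lower bound only: it assumes each single measured latency is representative, that the call sequence is exactly the one listed in the note, and it models retrieval and tool-call overhead with a hypothetical `overhead_per_call_s` parameter that was not measured in this report.

```python
# Per-call LLM latencies in seconds, copied from the LLM inference table.
LLM_LATENCY_S = {
    "curriculum_gen": 1.40,
    "knowledge_explanation": 10.34,
    "assessment_gen": 4.75,
    "progress_summary": 1.24,
}


def estimate_session_seconds(n_modules, overhead_per_call_s=0.0):
    """Estimate sequential session time for n_modules knowledge explanations.

    overhead_per_call_s is a hypothetical per-call allowance for retrieval
    and tool calls; it is an assumption, not a measured value.
    """
    if n_modules < 0:
        raise ValueError("n_modules must be non-negative")
    llm_time = (
        LLM_LATENCY_S["curriculum_gen"]
        + n_modules * LLM_LATENCY_S["knowledge_explanation"]
        + LLM_LATENCY_S["assessment_gen"]
        + LLM_LATENCY_S["progress_summary"]
    )
    n_calls = 3 + n_modules  # curriculum + N explanations + assessment + summary
    return llm_time + n_calls * overhead_per_call_s


for n in (1, 3, 5):
    print(f"{n} modules: ~{estimate_session_seconds(n):.2f} s of LLM time")
```

Because the knowledge explanation call dominates at roughly 10 s each, the estimate grows almost linearly with the number of modules.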
## 5. Database Statistics
| Entity | Count |
|---|---|
| Organizations | 3 |
| Roles | 10 |
| Users | 12 |
| Training Files (total) | 1 |
| Training Files (embedded) | 0 |
| Knowledge Chunks (with embeddings) | 8 |
| Onboarding Sessions | 4 |
## 6. pgvector Retrieval Latency
**Role:** fNIRS Specialist

**Organization:** University of Birmingham

**Query:** "What are the key responsibilities, tools, and procedures for this role?"

**Total chunks in DB:** 8

| Top-K | Results returned | Mean (ms) | Median (ms) | P95 (ms) | Min (ms) | Max (ms) |
|---|---|---|---|---|---|---|
| top_5 | 5 | 2.3 | 2.0 | 5.0 | 1.9 | 5.0 |
| top_10 | 8 | 2.4 | 2.4 | 3.1 | 2.3 | 3.1 |
| top_20 | 8 | 2.3 | 2.3 | 2.6 | 2.2 | 2.6 |
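
The top_10 and top_20 rows return 8 results because only 8 chunks are stored, so a top-K query yields min(K, total chunks) rows. The sketch below illustrates that behavior in plain Python. It assumes cosine similarity as the ranking metric (the report does not say which pgvector distance operator is used) and uses hypothetical 2-D vectors in place of real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k):
    """Return the ids of the k most similar chunks, capped at len(chunks)."""
    ranked = sorted(
        chunks,
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in ranked[:k]]


# Eight hypothetical chunk "embeddings" (id, vector); not real data.
chunks = [(i, [math.cos(i * 0.3), math.sin(i * 0.3)]) for i in range(8)]
query = [1.0, 0.0]

for k in (5, 10, 20):
    print(f"top_{k}: {len(top_k(query, chunks, k))} results")
```

With so few rows, latency is expected to be nearly independent of K, which matches the flat 2.3 to 2.4 ms means in the table. A larger corpus would be needed to observe how retrieval scales.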
## Raw JSON
```json
{
"health": {
"status": "OK",
"llm_ready": true,
"embed_ready": true,
"latency_ms": 44.5
},
"embeddings": {
"short": {
"query_chars": 19,
"mean_ms": 25.0,
"median_ms": 25.3,
"p95_ms": 31.9,
"min_ms": 20.8,
"max_ms": 31.9
},
"medium": {
"query_chars": 172,
"mean_ms": 24.0,
"median_ms": 22.8,
"p95_ms": 31.8,
"min_ms": 21.0,
"max_ms": 31.8
},
"long": {
"query_chars": 428,
"mean_ms": 29.8,
"median_ms": 27.5,
"p95_ms": 37.7,
"min_ms": 25.0,
"max_ms": 37.7
}
},
"chunking": {
"small (~200 c)": {
"chars": 200,
"chunks_produced": 1,
"latency_ms": 26.7
},
"medium (~2k c)": {
"chars": 1810,
"chunks_produced": 1,
"latency_ms": 62.7
},
"large (~8k c)": {
"chars": 7740,
"chunks_produced": 1,
"latency_ms": 204.0
}
},
"llm": {
"short_qa": {
"elapsed_s": 1.26,
"prompt_tokens": 55,
"completion_tokens": 69,
"tokens_per_sec": 54.9,
"response_preview": "A Kubernetes pod is the basic execution unit of a containerized application, and it represents a log"
},
"progress_summary": {
"elapsed_s": 1.24,
"prompt_tokens": 74,
"completion_tokens": 68,
"tokens_per_sec": 54.9,
"response_preview": "The trainee has demonstrated a strong foundation in the fundamentals of version control with Git, as"
},
"curriculum_gen": {
"elapsed_s": 1.4,
"prompt_tokens": 79,
"completion_tokens": 76,
"tokens_per_sec": 54.4,
"response_preview": "[ \"Module 1: Introduction to Backend Services\", \"Module 2: Fundamentals of API Design\", \"Modul"
},
"assessment_gen": {
"elapsed_s": 4.75,
"prompt_tokens": 83,
"completion_tokens": 249,
"tokens_per_sec": 52.4,
"response_preview": "[ { \"question\": \"What is the primary purpose of a Continuous Integration (CI) pipeline?\", "
},
"knowledge_explanation": {
"elapsed_s": 10.34,
"prompt_tokens": 83,
"completion_tokens": 541,
"tokens_per_sec": 52.3,
"response_preview": "**Git Branching Strategy Best Practices** As a new engineer, understanding Git branching strategies"
}
},
"database": {
"organizations": 3,
"roles": 10,
"users": 12,
"training_files_total": 1,
"training_files_embedded": 0,
"knowledge_chunks_with_embeddings": 8,
"onboarding_sessions": 4
},
"retrieval": {
"role": "fNIRS Specialist",
"organization": "University of Birmingham",
"query": "What are the key responsibilities, tools, and procedures for this role?",
"total_chunks_in_db": 8,
"results": {
"top_5": {
"results_returned": 5,
"mean_ms": 2.3,
"median_ms": 2.0,
"p95_ms": 5.0,
"min_ms": 1.9,
"max_ms": 5.0
},
"top_10": {
"results_returned": 8,
"mean_ms": 2.4,
"median_ms": 2.4,
"p95_ms": 3.1,
"min_ms": 2.3,
"max_ms": 3.1
},
"top_20": {
"results_returned": 8,
"mean_ms": 2.3,
"median_ms": 2.3,
"p95_ms": 2.6,
"min_ms": 2.2,
"max_ms": 2.6
}
}
}
}
```