Dynavera/benchmarks/results_2026-03-24_13-29-55.md
2026-03-24 17:05:46 +00:00

Dynavera Benchmark Results

Date: 2026-03-24 13:29:55
Inference endpoint: http://fyp-inference-dev:8001
Repetitions per benchmark: 10

1. GPU Server Health

| Field | Value |
|---|---|
| Status | OK |
| LLM Ready | True |
| Embed Ready | True |
| Health check RTT | 44.5 ms |

2. Embedding Latency

| Query type | Chars | Mean (ms) | Median (ms) | P95 (ms) | Min (ms) | Max (ms) |
|---|---|---|---|---|---|---|
| short | 19 | 25.0 | 25.3 | 31.9 | 20.8 | 31.9 |
| medium | 172 | 24.0 | 22.8 | 31.8 | 21.0 | 31.8 |
| long | 428 | 29.8 | 27.5 | 37.7 | 25.0 | 37.7 |
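
The P95 and Max columns are identical in every row. With 10 repetitions per benchmark, that is what a nearest-rank percentile would produce, since ceil(0.95 × 10) = 10 selects the largest sample. The sketch below shows one way such summary statistics could be computed. The function name `summarize` and the sample values are hypothetical, and the actual benchmark script may use a different percentile method.

```python
import math
import statistics


def summarize(samples_ms):
    """Summarize latency samples (in ms) in the shape the tables report.

    Assumption: P95 uses the nearest-rank method; the real benchmark
    script may compute it differently.
    """
    ordered = sorted(samples_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
    return {
        "mean_ms": round(statistics.mean(ordered), 1),
        "median_ms": round(statistics.median(ordered), 1),
        "p95_ms": ordered[rank - 1],
        "min_ms": ordered[0],
        "max_ms": ordered[-1],
    }


# Hypothetical samples for illustration only, not measured data.
samples = [21.0, 22.5, 23.0, 24.0, 25.0, 25.5, 26.0, 27.0, 28.0, 31.0]
stats = summarize(samples)
```

With only 10 samples, a single slow repetition sets both P95 and Max, so the median is likely the more stable figure to compare across runs.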

3. Semantic Chunking Latency

| Input size | Chars | Chunks produced | Latency (ms) |
|---|---|---|---|
| small (~200 c) | 200 | 1 | 26.7 |
| medium (~2k c) | 1810 | 1 | 62.7 |
| large (~8k c) | 7740 | 1 | 204.0 |

4. LLM Inference Latency

| Prompt type | Elapsed (s) | Prompt tokens | Completion tokens | Tok/s |
|---|---|---|---|---|
| short_qa | 1.26 | 55 | 69 | 54.9 |
| progress_summary | 1.24 | 74 | 68 | 54.9 |
| curriculum_gen | 1.4 | 79 | 76 | 54.4 |
| assessment_gen | 4.75 | 83 | 249 | 52.4 |
| knowledge_explanation | 10.34 | 83 | 541 | 52.3 |
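
The Tok/s column is consistent with completion tokens divided by elapsed seconds. The small differences from the reported figures are presumably because the elapsed times shown are rounded. A quick check under that assumption follows; the formula is inferred from the numbers, not confirmed by the benchmark script.

```python
# (elapsed_s, completion_tokens, reported_tok_s) copied from the table above.
rows = {
    "short_qa": (1.26, 69, 54.9),
    "progress_summary": (1.24, 68, 54.9),
    "curriculum_gen": (1.4, 76, 54.4),
    "assessment_gen": (4.75, 249, 52.4),
    "knowledge_explanation": (10.34, 541, 52.3),
}


def throughput(elapsed_s, completion_tokens):
    """Assumed definition: generated tokens per second of wall-clock time."""
    return completion_tokens / elapsed_s


# Recompute throughput and find the largest gap to the reported value.
recomputed = {name: throughput(e, c) for name, (e, c, _) in rows.items()}
max_gap = max(abs(recomputed[name] - rows[name][2]) for name in rows)
```

Under this reading, throughput stays within a narrow band (about 52 to 55 tok/s) across prompt types, so elapsed time is driven almost entirely by the number of completion tokens.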

Note on end-to-end session time: A full onboarding session makes several sequential inference calls (curriculum generation → knowledge explanation × N modules → assessment generation → progress summary). Because the calls run one after another, total wall-clock time is the sum of their individual latencies, plus retrieval and tool-call overhead.
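
As a rough illustration of how these latencies add up, the sketch below sums the single-call figures from the table for a session with N modules. The function `estimate_session_s`, the `overhead_s` parameter, and the choice of N = 5 are hypothetical. Real sessions will differ, since completion lengths vary per call and the overhead is not included in the measurements above.

```python
# Measured single-call latencies in seconds, from the LLM inference table.
LATENCY_S = {
    "curriculum_gen": 1.4,
    "knowledge_explanation": 10.34,
    "assessment_gen": 4.75,
    "progress_summary": 1.24,
}


def estimate_session_s(n_modules, overhead_s=0.0):
    """Estimate sequential LLM time for one onboarding session.

    Assumes one curriculum call, one explanation per module, one
    assessment call, and one progress summary. overhead_s is a
    placeholder for retrieval and tool-call time.
    """
    return (
        LATENCY_S["curriculum_gen"]
        + n_modules * LATENCY_S["knowledge_explanation"]
        + LATENCY_S["assessment_gen"]
        + LATENCY_S["progress_summary"]
        + overhead_s
    )


# Hypothetical five-module session with no overhead: about 59 seconds.
estimate = estimate_session_s(5)
```

In this estimate the knowledge explanation calls dominate, contributing 51.7 s of the roughly 59 s total.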

5. Database Statistics

| Entity | Count |
|---|---|
| Organizations | 3 |
| Roles | 10 |
| Users | 12 |
| Training Files (total) | 1 |
| Training Files (embedded) | 0 |
| Knowledge Chunks (with embeddings) | 8 |
| Onboarding Sessions | 4 |

6. pgvector Retrieval Latency

Role: fNIRS Specialist
Organization: University of Birmingham
Query: "What are the key responsibilities, tools, and procedures for this role?"
Total chunks in DB: 8

| Top-K | Results returned | Mean (ms) | Median (ms) | P95 (ms) | Min (ms) | Max (ms) |
|---|---|---|---|---|---|---|
| top_5 | 5 | 2.3 | 2.0 | 5.0 | 1.9 | 5.0 |
| top_10 | 8 | 2.4 | 2.4 | 3.1 | 2.3 | 3.1 |
| top_20 | 8 | 2.3 | 2.3 | 2.6 | 2.2 | 2.6 |
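
Both top_10 and top_20 return 8 results because the database holds only 8 chunks, so those two rows measure essentially the same work. The sketch below shows what a top-K pgvector query might look like and how the result count is capped. The table and column names (`knowledge_chunks`, `embedding`, `role_id`) are hypothetical, and using pgvector's cosine-distance operator `<=>` is an assumption, since the benchmark does not state which distance metric it uses.

```python
def topk_sql(table="knowledge_chunks"):
    """Build a parameterized top-K similarity query (hypothetical schema).

    Parameters at execution time would be: role id, query embedding, K.
    """
    return (
        f"SELECT id, content FROM {table} "
        "WHERE role_id = %s "
        "ORDER BY embedding <=> %s::vector "  # pgvector cosine distance
        "LIMIT %s"
    )


def expected_results(k, total_chunks):
    """LIMIT k can never return more rows than exist."""
    return min(k, total_chunks)


# With 8 chunks in the database, as in this run.
returned = [expected_results(k, 8) for k in (5, 10, 20)]
```

With so few rows, these latencies mostly reflect query round-trip cost, so they say little about how retrieval would scale with a larger corpus.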

Raw JSON

{
  "health": {
    "status": "OK",
    "llm_ready": true,
    "embed_ready": true,
    "latency_ms": 44.5
  },
  "embeddings": {
    "short": {
      "query_chars": 19,
      "mean_ms": 25.0,
      "median_ms": 25.3,
      "p95_ms": 31.9,
      "min_ms": 20.8,
      "max_ms": 31.9
    },
    "medium": {
      "query_chars": 172,
      "mean_ms": 24.0,
      "median_ms": 22.8,
      "p95_ms": 31.8,
      "min_ms": 21.0,
      "max_ms": 31.8
    },
    "long": {
      "query_chars": 428,
      "mean_ms": 29.8,
      "median_ms": 27.5,
      "p95_ms": 37.7,
      "min_ms": 25.0,
      "max_ms": 37.7
    }
  },
  "chunking": {
    "small  (~200 c)": {
      "chars": 200,
      "chunks_produced": 1,
      "latency_ms": 26.7
    },
    "medium (~2k c)": {
      "chars": 1810,
      "chunks_produced": 1,
      "latency_ms": 62.7
    },
    "large  (~8k c)": {
      "chars": 7740,
      "chunks_produced": 1,
      "latency_ms": 204.0
    }
  },
  "llm": {
    "short_qa": {
      "elapsed_s": 1.26,
      "prompt_tokens": 55,
      "completion_tokens": 69,
      "tokens_per_sec": 54.9,
      "response_preview": "A Kubernetes pod is the basic execution unit of a containerized application, and it represents a log"
    },
    "progress_summary": {
      "elapsed_s": 1.24,
      "prompt_tokens": 74,
      "completion_tokens": 68,
      "tokens_per_sec": 54.9,
      "response_preview": "The trainee has demonstrated a strong foundation in the fundamentals of version control with Git, as"
    },
    "curriculum_gen": {
      "elapsed_s": 1.4,
      "prompt_tokens": 79,
      "completion_tokens": 76,
      "tokens_per_sec": 54.4,
      "response_preview": "[   \"Module 1: Introduction to Backend Services\",   \"Module 2: Fundamentals of API Design\",   \"Modul"
    },
    "assessment_gen": {
      "elapsed_s": 4.75,
      "prompt_tokens": 83,
      "completion_tokens": 249,
      "tokens_per_sec": 52.4,
      "response_preview": "[   {     \"question\": \"What is the primary purpose of a Continuous Integration (CI) pipeline?\",     "
    },
    "knowledge_explanation": {
      "elapsed_s": 10.34,
      "prompt_tokens": 83,
      "completion_tokens": 541,
      "tokens_per_sec": 52.3,
      "response_preview": "**Git Branching Strategy Best Practices**  As a new engineer, understanding Git branching strategies"
    }
  },
  "database": {
    "organizations": 3,
    "roles": 10,
    "users": 12,
    "training_files_total": 1,
    "training_files_embedded": 0,
    "knowledge_chunks_with_embeddings": 8,
    "onboarding_sessions": 4
  },
  "retrieval": {
    "role": "fNIRS Specialist",
    "organization": "University of Birmingham",
    "query": "What are the key responsibilities, tools, and procedures for this role?",
    "total_chunks_in_db": 8,
    "results": {
      "top_5": {
        "results_returned": 5,
        "mean_ms": 2.3,
        "median_ms": 2.0,
        "p95_ms": 5.0,
        "min_ms": 1.9,
        "max_ms": 5.0
      },
      "top_10": {
        "results_returned": 8,
        "mean_ms": 2.4,
        "median_ms": 2.4,
        "p95_ms": 3.1,
        "min_ms": 2.3,
        "max_ms": 3.1
      },
      "top_20": {
        "results_returned": 8,
        "mean_ms": 2.3,
        "median_ms": 2.3,
        "p95_ms": 2.6,
        "min_ms": 2.2,
        "max_ms": 2.6
      }
    }
  }
}