🧠 BetelgeuseBytes — Full AI Infrastructure Stack

Authoritative README, Architecture & Onboarding Guide

This repository documents the entire self-hosted AI infrastructure stack running on a Kubernetes cluster hosted on Hetzner dedicated servers.

The stack currently powers an Islamic Hadith Scholar AI, but it is intentionally designed as a general-purpose, sovereign AI, MLOps, and data platform that can support many future projects.

This document is the single source of truth for:

  • architecture (logical & physical)
  • infrastructure configuration
  • networking & DNS
  • every deployed component
  • why each component exists
  • how to build new systems on top of the platform

1. Mission & Design Philosophy

Current Mission

Build an AI system that can:

  • Parse classical Islamic texts
  • Extract Sanad (chains of narrators) and Matn (hadith text)
  • Identify narrators and their relationships:
    • teacher / student
    • familial lineage
  • Construct a verifiable knowledge graph
  • Support human scholarly review
  • Provide transparent and explainable reasoning
  • Operate fully on-prem, CPU-first, without SaaS or GPU dependency
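The Sanad/Matn extraction target above can be sketched as a minimal data model. This is a hypothetical illustration (class and field names are not taken from the repo), showing how teacher/student edges for the knowledge graph fall out of an ordered chain:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Narrator:
    """A single transmitter in an isnād (hypothetical model)."""
    name: str

@dataclass
class Hadith:
    """A hadith as the pipeline sees it: a sanad (chain) plus a matn (text)."""
    sanad: list  # ordered Narrator list, earliest authority last
    matn: str

    def teacher_student_pairs(self):
        """Derive teacher/student edges for the knowledge graph:
        each narrator heard the text from the next one in the chain."""
        return [(self.sanad[i + 1], self.sanad[i]) for i in range(len(self.sanad) - 1)]

h = Hadith(sanad=[Narrator("C"), Narrator("B"), Narrator("A")], matn="…")
# Edges point teacher -> student: A taught B, B taught C.
print(h.teacher_student_pairs())
```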

Core Principles

  • Sovereignty — no external cloud lock-in
  • Explainability — graph + provenance, not black boxes
  • Human-in-the-loop — scholars remain in control
  • Observability-first — everything is measurable and traceable
  • Composable — every part can be reused or replaced

2. Physical Infrastructure (Hetzner)

Nodes

  • Provider: Hetzner
  • Type: Dedicated servers
  • Architecture: x86_64
  • GPU: None (CPU-only by design)
  • Storage: Local NVMe / SSD

Node Roles (Logical)

| Node Type | Responsibilities |
|---|---|
| Control / Worker | Kubernetes control plane + workloads |
| Storage-heavy | Databases, MinIO, observability data |
| Compute-heavy | LLM inference, embeddings, pipelines |

The cluster is intentionally single-region and on-prem-like, optimized for predictability and data locality.


3. Kubernetes Infrastructure Configuration

Kubernetes

  • Runtime for all services
  • Namespaced isolation
  • Explicit PersistentVolumeClaims
  • Declarative configuration (GitOps)

Namespaces (Conceptual)

| Namespace | Purpose |
|---|---|
| ai | LLMs, embeddings, labeling |
| vec | Vector database |
| graph | Knowledge graph |
| db | Relational databases |
| storage | Object storage |
| mlops | MLflow |
| ml | Argo Workflows |
| auth | Keycloak |
| observability | LGTM stack |
| hadith | Custom apps (orchestrator, UI) |

4. Networking & DNS

Ingress

  • NGINX Ingress Controller
  • HTTPS termination at ingress
  • Internal services communicate via ClusterIP

TLS

  • cert-manager
  • Let's Encrypt
  • Automatic renewal
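Certificate issuance is driven per-Ingress via a cert-manager annotation. A representative manifest might look like the sketch below — the issuer name, service name, and port are illustrative placeholders, not values taken from the repo:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: observability
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod   # assumed issuer name
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - grafana.betelgeusebytes.io
      secretName: grafana-tls   # cert-manager creates/renews this Secret
  rules:
    - host: grafana.betelgeusebytes.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 80
```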

Public Endpoints

| URL | Service |
|---|---|
| https://llm.betelgeusebytes.io | LLM inference (Ollama / llama.cpp) |
| https://embeddings.betelgeusebytes.io | Text Embeddings Inference |
| https://vector.betelgeusebytes.io | Qdrant + UI |
| https://neo4j.betelgeusebytes.io | Neo4j Browser |
| https://hadith-api.betelgeusebytes.io | FastAPI Orchestrator |
| https://hadith-admin.betelgeusebytes.io | Admin / Curation UI |
| https://label.betelgeusebytes.io | Label Studio |
| https://mlflow.betelgeusebytes.io | MLflow |
| https://minio.betelgeusebytes.io | MinIO Console |
| https://argo.betelgeusebytes.io | Argo Workflows |
| https://auth.betelgeusebytes.io | Keycloak |
| https://grafana.betelgeusebytes.io | Grafana |

5. Full Logical Architecture

```mermaid
flowchart LR
  User --> AdminUI --> Orchestrator

  Orchestrator --> LLM
  Orchestrator --> TEI --> Qdrant
  Orchestrator --> Neo4j
  Orchestrator --> PostgreSQL
  Orchestrator --> Redis

  LabelStudio --> MinIO
  MinIO --> ArgoWF --> MLflow
  MLflow --> Models --> Orchestrator

  Kafka --> ArgoWF

  Alloy --> Prometheus --> Grafana
  Alloy --> Loki --> Grafana
  Alloy --> Tempo --> Grafana
```
6. AI & Reasoning Layer

Ollama / llama.cpp (CPU LLM)

Current usage

  • JSON-structured extraction
  • Sanad / matn reasoning
  • Deterministic outputs
  • No GPU dependency

Future usage

  • Offline assistants
  • Document intelligence
  • Agent frameworks
  • Replaceable by vLLM when GPUs are added
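Deterministic, JSON-structured extraction only works if the orchestrator validates what the model returns. A minimal guard might look like this — the sanad/matn field names are illustrative, not a documented schema:

```python
import json

REQUIRED_FIELDS = {"sanad", "matn"}  # hypothetical schema for extraction output

def parse_extraction(raw: str) -> dict:
    """Validate an LLM response that should be a JSON object with a
    sanad (list of narrator names) and a matn (hadith text)."""
    obj = json.loads(raw)  # raises ValueError on malformed output
    missing = REQUIRED_FIELDS - obj.keys()
    if missing:
        raise ValueError(f"LLM output missing fields: {sorted(missing)}")
    if not isinstance(obj["sanad"], list):
        raise ValueError("sanad must be a list of narrator names")
    return obj

out = parse_extraction('{"sanad": ["A", "B"], "matn": "..."}')
```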

Text Embeddings Inference (TEI)

Current usage

  • Embeddings for hadith texts and biographies

Future usage

  • RAG systems
  • Semantic search
  • Deduplication
  • Similarity clustering

Qdrant (Vector Database)

Current usage

  • Stores embeddings
  • Similarity search

Future usage

  • Recommendation systems
  • Agent memory
  • Multimodal retrieval

Includes a web UI.
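Similarity search over stored embeddings boils down to nearest-neighbour ranking by cosine similarity. A dependency-free sketch of the idea — toy vectors and names, not the Qdrant API:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy "index": id -> embedding (in production these live in Qdrant).
index = {
    "hadith-1": [1.0, 0.0, 0.0],
    "hadith-2": [0.0, 1.0, 0.0],
    "hadith-3": [0.7, 0.7, 0.0],
}

def search(query_vec, k=2):
    """Rank stored vectors by similarity to the query, highest first."""
    ranked = sorted(index, key=lambda i: cosine(query_vec, index[i]), reverse=True)
    return ranked[:k]

print(search([1.0, 0.1, 0.0]))  # ['hadith-1', 'hadith-3']
```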

7. Knowledge & Data Layer

Neo4j (Graph Database)

Current usage

  • Isnād chains
  • Narrator relationships

Future usage

  • Knowledge graphs
  • Trust networks
  • Provenance systems
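The kind of question the graph answers — "could narrator X have transmitted to narrator Y?" — is a path query. Below is a toy stand-in for that traversal using a plain adjacency set; the TAUGHT relationship and narrator names are hypothetical, and in production this would be a Cypher query against Neo4j:

```python
from collections import deque

# Hypothetical TAUGHT edges (teacher -> students).
taught = {
    "A": {"B"},
    "B": {"C", "D"},
    "C": set(),
    "D": set(),
}

def transmission_path(teacher, student):
    """BFS from teacher along TAUGHT edges; returns a path of narrators
    if the claimed chain is possible, else None."""
    queue = deque([[teacher]])
    seen = {teacher}
    while queue:
        path = queue.popleft()
        if path[-1] == student:
            return path
        for nxt in taught.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(transmission_path("A", "D"))  # ['A', 'B', 'D']
```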

PostgreSQL

Current usage

  • App data
  • MLflow backend
  • Label Studio DB

Future usage

  • Feature stores
  • Metadata catalogs
  • Transactional apps

Redis

Current usage

  • Caching
  • Temporary state

Future usage

  • Job queues
  • Rate limiting
  • Sessions
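The caching pattern Redis serves here is key/value with expiry. An in-process sketch of the same contract (this is not a Redis client, just the semantics):

```python
import time

class TTLCache:
    """Tiny in-process stand-in for the Redis caching pattern:
    set(key, value, ttl) and get(key) that expires stale entries."""
    def __init__(self):
        self._store = {}

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazy eviction on read
            return default
        return value

cache = TTLCache()
cache.set("narrator:A", {"name": "A"}, ttl_seconds=60)
```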

Kafka

Current usage

  • Optional async backbone

Future usage

  • Streaming ingestion
  • Event-driven ML
  • Audit pipelines

MinIO (S3)

Current usage

  • Datasets
  • Model artifacts
  • Pipeline outputs

Future usage

  • Data lake
  • Backups
  • Feature storage

8. MLOps & Human-in-the-Loop

Label Studio

Current usage

  • Human annotation of narrators & relations

Future usage

  • Any labeling task (text, image, audio)

MLflow

Current usage

  • Experiment tracking
  • Model registry

Future usage

  • Governance
  • Model promotion
  • Auditing

Argo Workflows

Current usage

  • ETL & training pipelines

Future usage

  • Batch inference
  • Scheduled automation
  • Data engineering

9. Authentication & Security

Keycloak

Current usage

  • SSO for Admin UI, MLflow, Label Studio

Future usage

  • API authentication
  • Multi-tenant access
  • Organization-wide IAM
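Downstream services receive Keycloak access tokens as JWTs. The sketch below only decodes the claims segment — it deliberately skips signature verification, which a real service must perform against Keycloak's JWKS endpoint before trusting any claim; the demo token and claim value are fabricated for illustration:

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    """Decode the (UNVERIFIED) payload segment of a JWT.
    Real services must verify the signature first."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))

# Fabricated token: header and signature segments are dummies.
demo_payload = base64.urlsafe_b64encode(
    json.dumps({"preferred_username": "scholar"}).encode()
).rstrip(b"=").decode()
demo_token = f"x.{demo_payload}.y"
print(jwt_claims(demo_token))
```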

10. Observability Stack (LGTM)

Components

  • Grafana
  • Prometheus
  • Loki
  • Tempo
  • Grafana Alloy
  • kube-state-metrics
  • node-exporter

Capabilities

  • Metrics, logs, traces
  • Automatic correlation
  • OTLP-native
  • Local SSD persistence

11. Design Rules for All Custom Services

All services must:

  • be stateless
  • use env vars & Kubernetes Secrets
  • authenticate via Keycloak
  • emit:
    • Prometheus metrics
    • OTLP traces
    • structured JSON logs
  • be deployable via kubectl & Argo CD
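The "structured JSON logs" rule can be met with the standard library alone. A minimal formatter sketch — the field names are a suggestion, not the stack's actual log schema:

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line, a format
    Loki / Alloy can parse without extra regex pipelines."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("orchestrator")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("pipeline started")  # emits one JSON object on stdout
```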

12. Future Use Cases (Beyond Hadith)

This platform can support:

  • General Knowledge Graph AI
  • Legal / scholarly document analysis
  • Enterprise RAG systems
  • Research data platforms
  • Explainable AI systems
  • Internal search engines
  • Agent-based systems
  • Provenance & trust scoring engines
  • Digital humanities projects
  • Offline sovereign AI deployments