# 🧠 BetelgeuseBytes — Full AI Infrastructure Stack
## Authoritative README, Architecture & Onboarding Guide
This repository documents the **entire self-hosted AI infrastructure stack** running on a Kubernetes cluster hosted on **Hetzner dedicated servers**.
The stack currently powers an **Islamic Hadith Scholar AI**, but it is intentionally designed as a **general-purpose, sovereign AI, MLOps, and data platform** that can support many future projects.
This document is the **single source of truth** for:
- architecture (logical & physical)
- infrastructure configuration
- networking & DNS
- every deployed component
- why each component exists
- how to build new systems on top of the platform
---
## 1. Mission & Design Philosophy
### Current Mission
Build an AI system that can:
- Parse classical Islamic texts
- Extract **Sanad** (chains of narrators) and **Matn** (hadith text)
- Identify narrators and their relationships:
  - teacher / student
  - familial lineage
- Construct a **verifiable knowledge graph**
- Support **human scholarly review**
- Provide **transparent and explainable reasoning**
- Operate **fully on-prem**, CPU-first, without SaaS or GPU dependency
### Core Principles
- **Sovereignty** — no external cloud lock-in
- **Explainability** — graph + provenance, not black boxes
- **Human-in-the-loop** — scholars remain in control
- **Observability-first** — everything is measurable and traceable
- **Composable** — every part can be reused or replaced
---
## 2. Physical Infrastructure (Hetzner)
### Nodes
- **Provider:** Hetzner
- **Type:** Dedicated servers
- **Architecture:** x86_64
- **GPU:** None (CPU-only by design)
- **Storage:** Local NVMe / SSD
### Node Roles (Logical)
| Node Type | Responsibilities |
|---------|------------------|
| Control / Worker | Kubernetes control plane + workloads |
| Storage-heavy | Databases, MinIO, observability data |
| Compute-heavy | LLM inference, embeddings, pipelines |
> The cluster is intentionally **single-region and on-prem-like**, optimized for predictability and data locality.
---
## 3. Kubernetes Infrastructure Configuration
### Kubernetes
- Runtime for **all services**
- Namespaced isolation
- Explicit PersistentVolumeClaims
- Declarative configuration (GitOps)
### Namespaces (Conceptual)
| Namespace | Purpose |
|--------|--------|
| `ai` | LLMs, embeddings, labeling |
| `vec` | Vector database |
| `graph` | Knowledge graph |
| `db` | Relational databases |
| `storage` | Object storage |
| `mlops` | MLflow |
| `ml` | Argo Workflows |
| `auth` | Keycloak |
| `observability` | LGTM stack |
| `hadith` | Custom apps (orchestrator, UI) |
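Each namespace in the table is created declaratively like any other resource. A minimal manifest sketch (the namespace name is from the table; the label is illustrative, not taken from the real manifests):

```yaml
# Namespace for LLM, embeddings, and labeling workloads.
apiVersion: v1
kind: Namespace
metadata:
  name: ai
  labels:
    stack: betelgeusebytes   # illustrative label
```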
---
## 4. Networking & DNS
### Ingress
- **NGINX Ingress Controller**
- HTTPS termination at ingress
- Internal services communicate via ClusterIP
### TLS
- **cert-manager**
- Let's Encrypt
- Automatic renewal
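The cert-manager setup above boils down to an ACME issuer; a minimal `ClusterIssuer` sketch (the resource name, contact email, and secret name are assumptions, not the real values):

```yaml
# Hypothetical ClusterIssuer for Let's Encrypt via HTTP-01 on NGINX ingress.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # Let's Encrypt production ACME endpoint.
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@betelgeusebytes.io          # illustrative contact address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key     # assumed secret name
    solvers:
      - http01:
          ingress:
            class: nginx
```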
### Public Endpoints
| URL | Service |
|----|--------|
| https://llm.betelgeusebytes.io | LLM inference (Ollama / llama.cpp) |
| https://embeddings.betelgeusebytes.io | Text Embeddings Inference |
| https://vector.betelgeusebytes.io | Qdrant + UI |
| https://neo4j.betelgeusebytes.io | Neo4j Browser |
| https://hadith-api.betelgeusebytes.io | FastAPI Orchestrator |
| https://hadith-admin.betelgeusebytes.io | Admin / Curation UI |
| https://label.betelgeusebytes.io | Label Studio |
| https://mlflow.betelgeusebytes.io | MLflow |
| https://minio.betelgeusebytes.io | MinIO Console |
| https://argo.betelgeusebytes.io | Argo Workflows |
| https://auth.betelgeusebytes.io | Keycloak |
| https://grafana.betelgeusebytes.io | Grafana |
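Each public endpoint follows the same pattern: an Ingress terminating TLS and routing to a ClusterIP Service. A sketch for the LLM endpoint (the Service name, port, issuer name, and TLS secret are assumptions; the hostname is from the table above):

```yaml
# Illustrative Ingress for llm.betelgeusebytes.io.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: llm
  namespace: ai
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod   # assumed issuer name
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - llm.betelgeusebytes.io
      secretName: llm-tls                # assumed; cert-manager fills it in
  rules:
    - host: llm.betelgeusebytes.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ollama             # assumed Service name
                port:
                  number: 11434          # Ollama's default port
```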
---
## 5. Full Logical Architecture
```mermaid
flowchart LR
User --> AdminUI --> Orchestrator
Orchestrator --> LLM
Orchestrator --> TEI --> Qdrant
Orchestrator --> Neo4j
Orchestrator --> PostgreSQL
Orchestrator --> Redis
LabelStudio --> MinIO
MinIO --> ArgoWF --> MLflow
MLflow --> Models --> Orchestrator
Kafka --> ArgoWF
Alloy --> Prometheus --> Grafana
Alloy --> Loki --> Grafana
Alloy --> Tempo --> Grafana
```
---
## 6. AI & Reasoning Layer
### Ollama / llama.cpp (CPU LLM)
**Current usage**
- JSON-structured extraction
- Sanad / matn reasoning
- Deterministic outputs
- No GPU dependency

**Future usage**
- Offline assistants
- Document intelligence
- Agent frameworks
- Replaceable by vLLM when GPUs are added
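JSON-structured, deterministic extraction maps directly onto Ollama's `/api/generate` route. A minimal sketch of building the request payload (the model name and prompt wording are illustrative; `format: "json"` and `temperature: 0` are the documented knobs for constrained, deterministic output):

```python
import json


def build_extraction_request(hadith_text: str, model: str = "llama3") -> dict:
    """Build an Ollama /api/generate payload for deterministic JSON extraction.

    `model` is illustrative. `format="json"` asks Ollama to emit valid JSON
    only, and temperature 0 makes generation deterministic.
    """
    prompt = (
        "Extract the sanad (chain of narrators) and matn (text) from the "
        "following hadith. Respond only with JSON of the form "
        '{"sanad": [...], "matn": "..."}.\n\n' + hadith_text
    )
    return {
        "model": model,
        "prompt": prompt,
        "format": "json",               # constrain output to valid JSON
        "stream": False,                # one response body, easier to parse
        "options": {"temperature": 0},  # deterministic outputs
    }


payload = build_extraction_request("Narrated Abu Hurairah: ...")
```

POSTing `payload` to the LLM endpoint and `json.loads`-ing the `response` field yields a structure the orchestrator can validate before writing to the graph.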
### Text Embeddings Inference (TEI)
**Current usage**
- Embeddings for hadith texts and biographies

**Future usage**
- RAG systems
- Semantic search
- Deduplication
- Similarity clustering
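TEI exposes a simple batch route, `POST /embed`, which returns one vector per input in order. A sketch of building the request and comparing the returned vectors (the `truncate` flag is TEI's option for clipping over-long inputs; cosine similarity is the usual comparison for dedup and clustering):

```python
import math


def build_embed_request(texts: list[str]) -> dict:
    """Payload for TEI's POST /embed route: a batch of raw input strings."""
    return {"inputs": texts, "truncate": True}  # clip over-long inputs server-side


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```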
### Qdrant (Vector Database)
**Current usage**
- Stores embeddings
- Similarity search

**Future usage**
- Recommendation systems
- Agent memory
- Multimodal retrieval

Includes a Web UI.
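Similarity search against Qdrant goes through `POST /collections/<name>/points/search` in its REST API. A sketch of the request body (the collection name is whatever the orchestrator created; `with_payload` returns the stored metadata alongside scores):

```python
def build_qdrant_search(vector: list[float], limit: int = 5) -> dict:
    """Body for Qdrant's POST /collections/<name>/points/search route."""
    return {
        "vector": vector,        # query embedding from TEI
        "limit": limit,          # top-k results
        "with_payload": True,    # include stored metadata, not just IDs/scores
    }
```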
---
## 7. Knowledge & Data Layer
### Neo4j (Graph Database)
**Current usage**
- Isnād chains
- Narrator relationships

**Future usage**
- Knowledge graphs
- Trust networks
- Provenance systems
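Narrator relationships reduce to idempotent `MERGE` statements in Cypher. A sketch, assuming a hypothetical schema of `(:Narrator {name})` nodes and `:TAUGHT` edges (the real labels and properties may differ); parameterized queries keep user-supplied names out of the query text:

```python
# Hypothetical schema: (:Narrator {name}) nodes, :TAUGHT relationships.
TEACHER_OF = """
MERGE (t:Narrator {name: $teacher})
MERGE (s:Narrator {name: $student})
MERGE (t)-[:TAUGHT]->(s)
"""


def teacher_of_params(teacher: str, student: str) -> dict:
    """Parameters for TEACHER_OF; parameterizing avoids Cypher injection."""
    return {"teacher": teacher, "student": student}
```

The orchestrator would run `session.run(TEACHER_OF, teacher_of_params(...))` via the official Neo4j driver; `MERGE` makes re-ingesting the same chain a no-op.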
### PostgreSQL
**Current usage**
- App data
- MLflow backend
- Label Studio DB

**Future usage**
- Feature stores
- Metadata catalogs
- Transactional apps

### Redis
**Current usage**
- Caching
- Temporary state

**Future usage**
- Job queues
- Rate limiting
- Sessions

### Kafka
**Current usage**
- Optional async backbone

**Future usage**
- Streaming ingestion
- Event-driven ML
- Audit pipelines

### MinIO (S3)
**Current usage**
- Datasets
- Model artifacts
- Pipeline outputs

**Future usage**
- Data lake
- Backups
- Feature storage
---
## 8. MLOps & Human-in-the-Loop
### Label Studio
**Current usage**
- Human annotation of narrators & relations

**Future usage**
- Any labeling task (text, image, audio)

### MLflow
**Current usage**
- Experiment tracking
- Model registry

**Future usage**
- Governance
- Model promotion
- Auditing

### Argo Workflows
**Current usage**
- ETL & training pipelines

**Future usage**
- Batch inference
- Scheduled automation
- Data engineering
---
## 9. Authentication & Security
### Keycloak
**Current usage**
- SSO for Admin UI, MLflow, Label Studio

**Future usage**
- API authentication
- Multi-tenant access
- Organization-wide IAM
---
## 10. Observability Stack (LGTM)
### Components
- Grafana
- Prometheus
- Loki
- Tempo
- Grafana Alloy
- kube-state-metrics
- node-exporter

### Capabilities
- Metrics, logs, and traces
- Automatic correlation
- OTLP-native
- Local SSD persistence
---
## 11. Design Rules for All Custom Services
All services must:
- be stateless
- use env vars & Kubernetes Secrets
- authenticate via Keycloak
- emit:
  - Prometheus metrics
  - OTLP traces
  - structured JSON logs
- be deployable via kubectl & Argo CD
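The structured-JSON-logs rule can be met with the standard library alone; a minimal formatter sketch (the field names and the `hadith-orchestrator` logger name are illustrative, not the real service's schema):

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line (Loki-friendly)."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("hadith-orchestrator")  # illustrative service name
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("service started")
```

One JSON object per line means Alloy can ship stdout to Loki unchanged, and fields stay queryable without regex parsing.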
---
## 12. Future Use Cases (Beyond Hadith)
This platform can support:
- General Knowledge Graph AI
- Legal / scholarly document analysis
- Enterprise RAG systems
- Research data platforms
- Explainable AI systems
- Internal search engines
- Agent-based systems
- Provenance & trust scoring engines
- Digital humanities projects
- Offline sovereign AI deployments