# 🧠 BetelgeuseBytes — Full AI Infrastructure Stack
## Authoritative README, Architecture & Onboarding Guide
This repository documents the **entire self-hosted AI infrastructure stack** running on a Kubernetes cluster hosted on **Hetzner dedicated servers**.
The stack currently powers an **Islamic Hadith Scholar AI**, but it is intentionally designed as a **general-purpose, sovereign AI, MLOps, and data platform** that can support many future projects.
This document is the **single source of truth** for:
- architecture (logical & physical)
- infrastructure configuration
- networking & DNS
- every deployed component
- why each component exists
- how to build new systems on top of the platform
---
## 1. Mission & Design Philosophy
### Current Mission
Build an AI system that can:
- Parse classical Islamic texts
- Extract **Sanad** (chains of narrators) and **Matn** (hadith text)
- Identify narrators and their relationships:
  - teacher / student
  - familial lineage
- Construct a **verifiable knowledge graph**
- Support **human scholarly review**
- Provide **transparent and explainable reasoning**
- Operate **fully on-prem**, CPU-first, without SaaS or GPU dependency
### Core Principles
- **Sovereignty** — no external cloud lock-in
- **Explainability** — graph + provenance, not black boxes
- **Human-in-the-loop** — scholars remain in control
- **Observability-first** — everything is measurable and traceable
- **Composable** — every part can be reused or replaced
---
## 2. Physical Infrastructure (Hetzner)
### Nodes
- **Provider:** Hetzner
- **Type:** Dedicated servers
- **Architecture:** x86_64
- **GPU:** None (CPU-only by design)
- **Storage:** Local NVMe / SSD
### Node Roles (Logical)
| Node Type | Responsibilities |
|---------|------------------|
| Control / Worker | Kubernetes control plane + workloads |
| Storage-heavy | Databases, MinIO, observability data |
| Compute-heavy | LLM inference, embeddings, pipelines |
> The cluster is intentionally **single-region and on-prem-like**, optimized for predictability and data locality.
---
## 3. Kubernetes Infrastructure Configuration
### Kubernetes
- Runtime for **all services**
- Namespaced isolation
- Explicit PersistentVolumeClaims
- Declarative configuration (GitOps)
### Namespaces (Conceptual)
| Namespace | Purpose |
|--------|--------|
| `ai` | LLMs, embeddings, labeling |
| `vec` | Vector database |
| `graph` | Knowledge graph |
| `db` | Relational databases |
| `storage` | Object storage |
| `mlops` | MLflow |
| `ml` | Argo Workflows |
| `auth` | Keycloak |
| `observability` | LGTM stack |
| `hadith` | Custom apps (orchestrator, UI) |
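Each namespace in the table is created declaratively like any other resource. A minimal manifest sketch (the namespace name is from the table; the label is illustrative, not taken from the real manifests):

```yaml
# Namespace for LLM, embeddings, and labeling workloads.
apiVersion: v1
kind: Namespace
metadata:
  name: ai
  labels:
    stack: betelgeusebytes   # illustrative label
```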
---
## 4. Networking & DNS
### Ingress
- **NGINX Ingress Controller**
- HTTPS termination at ingress
- Internal services communicate via ClusterIP
### TLS
- **cert-manager**
- Let's Encrypt
- Automatic renewal
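The cert-manager setup above boils down to an ACME issuer; a minimal `ClusterIssuer` sketch (the resource name, contact email, and secret name are assumptions, not the real values):

```yaml
# Hypothetical ClusterIssuer for Let's Encrypt via HTTP-01 on NGINX ingress.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # Let's Encrypt production ACME endpoint.
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@betelgeusebytes.io          # illustrative contact address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key     # assumed secret name
    solvers:
      - http01:
          ingress:
            class: nginx
```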
### Public Endpoints
| URL | Service |
|----|--------|
| https://llm.betelgeusebytes.io | LLM inference (Ollama / llama.cpp) |
| https://embeddings.betelgeusebytes.io | Text Embeddings Inference |
| https://vector.betelgeusebytes.io | Qdrant + UI |
| https://neo4j.betelgeusebytes.io | Neo4j Browser |
| https://hadith-api.betelgeusebytes.io | FastAPI Orchestrator |
| https://hadith-admin.betelgeusebytes.io | Admin / Curation UI |
| https://label.betelgeusebytes.io | Label Studio |
| https://mlflow.betelgeusebytes.io | MLflow |
| https://minio.betelgeusebytes.io | MinIO Console |
| https://argo.betelgeusebytes.io | Argo Workflows |
| https://auth.betelgeusebytes.io | Keycloak |
| https://grafana.betelgeusebytes.io | Grafana |
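Each public endpoint follows the same pattern: an Ingress terminating TLS and routing to a ClusterIP Service. A sketch for the LLM endpoint (the Service name, port, issuer name, and TLS secret are assumptions; the hostname is from the table above):

```yaml
# Illustrative Ingress for llm.betelgeusebytes.io.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: llm
  namespace: ai
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod   # assumed issuer name
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - llm.betelgeusebytes.io
      secretName: llm-tls                # assumed; cert-manager fills it in
  rules:
    - host: llm.betelgeusebytes.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ollama             # assumed Service name
                port:
                  number: 11434          # Ollama's default port
```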
---
## 5. Full Logical Architecture
```mermaid
flowchart LR
User --> AdminUI --> Orchestrator
Orchestrator --> LLM
Orchestrator --> TEI --> Qdrant
Orchestrator --> Neo4j
Orchestrator --> PostgreSQL
Orchestrator --> Redis
LabelStudio --> MinIO
MinIO --> ArgoWF --> MLflow
MLflow --> Models --> Orchestrator
Kafka --> ArgoWF
Alloy --> Prometheus --> Grafana
Alloy --> Loki --> Grafana
Alloy --> Tempo --> Grafana
```
---
## 6. AI & Reasoning Layer
### Ollama / llama.cpp (CPU LLM)
**Current usage**
- JSON-structured extraction
- Sanad / matn reasoning
- Deterministic outputs
- No GPU dependency

**Future usage**
- Offline assistants
- Document intelligence
- Agent frameworks
- Replaceable by vLLM when GPUs are added
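JSON-structured, deterministic extraction maps directly onto Ollama's `/api/generate` route. A minimal sketch of building the request payload (the model name and prompt wording are illustrative; `format: "json"` and `temperature: 0` are the documented knobs for constrained, deterministic output):

```python
import json


def build_extraction_request(hadith_text: str, model: str = "llama3") -> dict:
    """Build an Ollama /api/generate payload for deterministic JSON extraction.

    `model` is illustrative. `format="json"` asks Ollama to emit valid JSON
    only, and temperature 0 makes generation deterministic.
    """
    prompt = (
        "Extract the sanad (chain of narrators) and matn (text) from the "
        "following hadith. Respond only with JSON of the form "
        '{"sanad": [...], "matn": "..."}.\n\n' + hadith_text
    )
    return {
        "model": model,
        "prompt": prompt,
        "format": "json",               # constrain output to valid JSON
        "stream": False,                # one response body, easier to parse
        "options": {"temperature": 0},  # deterministic outputs
    }


payload = build_extraction_request("Narrated Abu Hurairah: ...")
```

POSTing `payload` to the LLM endpoint and `json.loads`-ing the `response` field yields a structure the orchestrator can validate before writing to the graph.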
### Text Embeddings Inference (TEI)
**Current usage**
- Embeddings for hadith texts and biographies

**Future usage**
- RAG systems
- Semantic search
- Deduplication
- Similarity clustering
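TEI exposes a simple batch route, `POST /embed`, which returns one vector per input in order. A sketch of building the request and comparing the returned vectors (the `truncate` flag is TEI's option for clipping over-long inputs; cosine similarity is the usual comparison for dedup and clustering):

```python
import math


def build_embed_request(texts: list[str]) -> dict:
    """Payload for TEI's POST /embed route: a batch of raw input strings."""
    return {"inputs": texts, "truncate": True}  # clip over-long inputs server-side


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```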
### Qdrant (Vector Database)
**Current usage**
- Stores embeddings
- Similarity search

**Future usage**
- Recommendation systems
- Agent memory
- Multimodal retrieval

Includes a Web UI.
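Similarity search against Qdrant goes through `POST /collections/<name>/points/search` in its REST API. A sketch of the request body (the collection name is whatever the orchestrator created; `with_payload` returns the stored metadata alongside scores):

```python
def build_qdrant_search(vector: list[float], limit: int = 5) -> dict:
    """Body for Qdrant's POST /collections/<name>/points/search route."""
    return {
        "vector": vector,        # query embedding from TEI
        "limit": limit,          # top-k results
        "with_payload": True,    # include stored metadata, not just IDs/scores
    }
```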
---
## 7. Knowledge & Data Layer
### Neo4j (Graph Database)
**Current usage**
- Isnād chains
- Narrator relationships

**Future usage**
- Knowledge graphs
- Trust networks
- Provenance systems
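Narrator relationships reduce to idempotent `MERGE` statements in Cypher. A sketch, assuming a hypothetical schema of `(:Narrator {name})` nodes and `:TAUGHT` edges (the real labels and properties may differ); parameterized queries keep user-supplied names out of the query text:

```python
# Hypothetical schema: (:Narrator {name}) nodes, :TAUGHT relationships.
TEACHER_OF = """
MERGE (t:Narrator {name: $teacher})
MERGE (s:Narrator {name: $student})
MERGE (t)-[:TAUGHT]->(s)
"""


def teacher_of_params(teacher: str, student: str) -> dict:
    """Parameters for TEACHER_OF; parameterizing avoids Cypher injection."""
    return {"teacher": teacher, "student": student}
```

The orchestrator would run `session.run(TEACHER_OF, teacher_of_params(...))` via the official Neo4j driver; `MERGE` makes re-ingesting the same chain a no-op.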
### PostgreSQL
**Current usage**
- App data
- MLflow backend
- Label Studio DB

**Future usage**
- Feature stores
- Metadata catalogs
- Transactional apps

### Redis
**Current usage**
- Caching
- Temporary state

**Future usage**
- Job queues
- Rate limiting
- Sessions

### Kafka
**Current usage**
- Optional async backbone

**Future usage**
- Streaming ingestion
- Event-driven ML
- Audit pipelines

### MinIO (S3)
**Current usage**
- Datasets
- Model artifacts
- Pipeline outputs

**Future usage**
- Data lake
- Backups
- Feature storage
---
## 8. MLOps & Human-in-the-Loop
### Label Studio
**Current usage**
- Human annotation of narrators & relations

**Future usage**
- Any labeling task (text, image, audio)

### MLflow
**Current usage**
- Experiment tracking
- Model registry

**Future usage**
- Governance
- Model promotion
- Auditing

### Argo Workflows
**Current usage**
- ETL & training pipelines

**Future usage**
- Batch inference
- Scheduled automation
- Data engineering
---
## 9. Authentication & Security
### Keycloak
**Current usage**
- SSO for Admin UI, MLflow, Label Studio

**Future usage**
- API authentication
- Multi-tenant access
- Organization-wide IAM
---
## 10. Observability Stack (LGTM)
### Components
- Grafana
- Prometheus
- Loki
- Tempo
- Grafana Alloy
- kube-state-metrics
- node-exporter

### Capabilities
- Metrics, logs, and traces
- Automatic correlation
- OTLP-native
- Local SSD persistence
---
## 11. Design Rules for All Custom Services
All services must:
- be stateless
- use env vars & Kubernetes Secrets
- authenticate via Keycloak
- emit:
  - Prometheus metrics
  - OTLP traces
  - structured JSON logs
- be deployable via kubectl & Argo CD
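The structured-JSON-logs rule can be met with the standard library alone; a minimal formatter sketch (the field names and the `hadith-orchestrator` logger name are illustrative, not the real service's schema):

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line (Loki-friendly)."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("hadith-orchestrator")  # illustrative service name
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("service started")
```

One JSON object per line means Alloy can ship stdout to Loki unchanged, and fields stay queryable without regex parsing.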
---
## 12. Future Use Cases (Beyond Hadith)
This platform can support:
- General Knowledge Graph AI
- Legal / scholarly document analysis
- Enterprise RAG systems
- Research data platforms
- Explainable AI systems
- Internal search engines
- Agent-based systems
- Provenance & trust scoring engines
- Digital humanities projects
- Offline sovereign AI deployments