# 🧠 BetelgeuseBytes — Full AI Infrastructure Stack

Authoritative README, architecture & onboarding guide.
This repository documents the entire self-hosted AI infrastructure stack running on a Kubernetes cluster hosted on Hetzner dedicated servers.
The stack currently powers an Islamic Hadith Scholar AI, but it is intentionally designed as a general-purpose, sovereign platform for AI, MLOps, and data that can support many future projects.
This document is the single source of truth for:
- architecture (logical & physical)
- infrastructure configuration
- networking & DNS
- every deployed component
- why each component exists
- how to build new systems on top of the platform
## 1. Mission & Design Philosophy

### Current Mission
Build an AI system that can:
- Parse classical Islamic texts
- Extract Sanad (chains of narrators) and Matn (hadith text)
- Identify narrators and their relationships:
  - teacher / student
  - familial lineage
- Construct a verifiable knowledge graph
- Support human scholarly review
- Provide transparent and explainable reasoning
- Operate fully on-prem, CPU-first, without SaaS or GPU dependency
### Core Principles
- Sovereignty — no external cloud lock-in
- Explainability — graph + provenance, not black boxes
- Human-in-the-loop — scholars remain in control
- Observability-first — everything is measurable and traceable
- Composable — every part can be reused or replaced
## 2. Physical Infrastructure (Hetzner)

### Nodes
- Provider: Hetzner
- Type: Dedicated servers
- Architecture: x86_64
- GPU: None (CPU-only by design)
- Storage: Local NVMe / SSD
### Node Roles (Logical)
| Node Type | Responsibilities |
|---|---|
| Control / Worker | Kubernetes control plane + workloads |
| Storage-heavy | Databases, MinIO, observability data |
| Compute-heavy | LLM inference, embeddings, pipelines |
The cluster is intentionally single-region and on-prem-like, optimized for predictability and data locality.
## 3. Kubernetes Infrastructure Configuration

### Kubernetes
- Runtime for all services
- Namespaced isolation
- Explicit PersistentVolumeClaims
- Declarative configuration (GitOps)
### Namespaces (Conceptual)

| Namespace | Purpose |
|---|---|
| ai | LLMs, embeddings, labeling |
| vec | Vector database |
| graph | Knowledge graph |
| db | Relational databases |
| storage | Object storage |
| mlops | MLflow |
| ml | Argo Workflows |
| auth | Keycloak |
| observability | LGTM stack |
| hadith | Custom apps (orchestrator, UI) |
## 4. Networking & DNS

### Ingress
- NGINX Ingress Controller
- HTTPS termination at ingress
- Internal services communicate via ClusterIP
### TLS
- cert-manager
- Let’s Encrypt
- Automatic renewal
### Public Endpoints
| URL | Service |
|---|---|
| https://llm.betelgeusebytes.io | LLM inference (Ollama / llama.cpp) |
| https://embeddings.betelgeusebytes.io | Text Embeddings Inference |
| https://vector.betelgeusebytes.io | Qdrant + UI |
| https://neo4j.betelgeusebytes.io | Neo4j Browser |
| https://hadith-api.betelgeusebytes.io | FastAPI Orchestrator |
| https://hadith-admin.betelgeusebytes.io | Admin / Curation UI |
| https://label.betelgeusebytes.io | Label Studio |
| https://mlflow.betelgeusebytes.io | MLflow |
| https://minio.betelgeusebytes.io | MinIO Console |
| https://argo.betelgeusebytes.io | Argo Workflows |
| https://auth.betelgeusebytes.io | Keycloak |
| https://grafana.betelgeusebytes.io | Grafana |
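A quick way to verify the endpoints in the table above is a liveness sweep. The sketch below is illustrative, not part of the deployed tooling; it only assumes that each service answers an HTTPS GET at its root (some, behind Keycloak, may return a redirect or 4xx instead of 200).

```python
import urllib.request

# A subset of the public endpoints listed above.
ENDPOINTS = {
    "llm": "https://llm.betelgeusebytes.io",
    "vector": "https://vector.betelgeusebytes.io",
    "mlflow": "https://mlflow.betelgeusebytes.io",
    "grafana": "https://grafana.betelgeusebytes.io",
}

def check(url, timeout=5):
    """Return the HTTP status of a GET against the endpoint root,
    or a DOWN marker on DNS/TLS/connection failures."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except Exception as exc:
        return f"DOWN ({exc})"
```

Usage: `for name, url in ENDPOINTS.items(): print(name, check(url))`.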
## 5. Full Logical Architecture

```mermaid
flowchart LR
User --> AdminUI --> Orchestrator
Orchestrator --> LLM
Orchestrator --> TEI --> Qdrant
Orchestrator --> Neo4j
Orchestrator --> PostgreSQL
Orchestrator --> Redis
LabelStudio --> MinIO
MinIO --> ArgoWF --> MLflow
MLflow --> Models --> Orchestrator
Kafka --> ArgoWF
Alloy --> Prometheus --> Grafana
Alloy --> Loki --> Grafana
Alloy --> Tempo --> Grafana
```
### AI & Reasoning Layer

#### Ollama / llama.cpp (CPU LLM)

Current usage:
- JSON-structured extraction
- Sanad / matn reasoning
- Deterministic outputs
- No GPU dependency

Future usage:
- Offline assistants
- Document intelligence
- Agent frameworks
- Replaceable by vLLM when GPUs are added
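A minimal sketch of JSON-structured extraction against the LLM endpoint, using Ollama's `/api/generate` route with its `format: "json"` constraint. The model name and the prompt wording are assumptions, not the deployed configuration.

```python
import json
import urllib.request

LLM_URL = "https://llm.betelgeusebytes.io"  # public LLM endpoint

def build_payload(hadith_text, model="llama3"):
    """Request body for Ollama's /api/generate.

    The model name is an assumption; substitute whatever model is
    actually pulled into the Ollama instance.
    """
    return {
        "model": model,
        "prompt": (
            "Return a JSON object with keys 'sanad' (list of narrator "
            "names, in order) and 'matn' (the hadith text) for:\n"
            + hadith_text
        ),
        "format": "json",  # constrains Ollama to emit valid JSON
        "stream": False,   # one complete response, not a token stream
    }

def extract(hadith_text):
    """POST the payload and parse the model's JSON answer."""
    req = urllib.request.Request(
        f"{LLM_URL}/api/generate",
        data=json.dumps(build_payload(hadith_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return json.loads(body["response"])
```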
#### Text Embeddings Inference (TEI)

Current usage:
- Embeddings for hadith texts and biographies

Future usage:
- RAG systems
- Semantic search
- Deduplication
- Similarity clustering
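Embedding a batch of texts is a single call to TEI's `/embed` route, which takes an `inputs` array and returns one vector per input. A minimal stdlib-only sketch:

```python
import json
import urllib.request

TEI_URL = "https://embeddings.betelgeusebytes.io"

def build_request(texts):
    """TEI /embed request body: a JSON object with an 'inputs' batch."""
    return {"inputs": list(texts)}

def embed(texts):
    """Return one embedding vector (list of floats) per input string."""
    req = urllib.request.Request(
        f"{TEI_URL}/embed",
        data=json.dumps(build_request(texts)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```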
#### Qdrant (Vector Database)

Current usage:
- Stores embeddings
- Similarity search

Future usage:
- Recommendation systems
- Agent memory
- Multimodal retrieval

Includes a web UI.
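Similarity search against Qdrant can go through its REST API (`POST /collections/{name}/points/search`). The collection name used in the usage note is illustrative only, not the deployed schema:

```python
import json
import urllib.request

QDRANT_URL = "https://vector.betelgeusebytes.io"

def build_search(vector, limit=5):
    """Body for Qdrant's REST search: nearest neighbours of `vector`."""
    return {"vector": vector, "limit": limit, "with_payload": True}

def search(collection, vector, limit=5):
    """Run a similarity search and return the result list."""
    req = urllib.request.Request(
        f"{QDRANT_URL}/collections/{collection}/points/search",
        data=json.dumps(build_search(vector, limit)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["result"]
```

Usage (hypothetical collection name): `search("hadith_texts", embed(["query"])[0])`.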
### Knowledge & Data Layer

#### Neo4j (Graph Database)

Current usage:
- Isnād chains
- Narrator relationships

Future usage:
- Knowledge graphs
- Trust networks
- Provenance systems
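Transmission chains map naturally onto variable-length path queries in Cypher. The sketch below builds such a query; the `Narrator` label and `TAUGHT` relationship type are assumptions about the graph model, not the deployed schema, and the query would be executed via the official `neo4j` Python driver against the Neo4j endpoint.

```python
def transmission_chain_query(max_hops=4):
    """Cypher sketch: teacher-to-student paths between two narrators.

    Label `Narrator` and relationship `TAUGHT` are hypothetical names
    standing in for the real schema.
    """
    return (
        f"MATCH p = (a:Narrator {{name: $start}})"
        f"-[:TAUGHT*1..{max_hops}]->"
        f"(b:Narrator {{name: $end}}) "
        f"RETURN [n IN nodes(p) | n.name] AS chain"
    )
```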
#### PostgreSQL

Current usage:
- App data
- MLflow backend
- Label Studio DB

Future usage:
- Feature stores
- Metadata catalogs
- Transactional apps

#### Redis

Current usage:
- Caching
- Temporary state

Future usage:
- Job queues
- Rate limiting
- Sessions

#### Kafka

Current usage:
- Optional async backbone

Future usage:
- Streaming ingestion
- Event-driven ML
- Audit pipelines

#### MinIO (S3)

Current usage:
- Datasets
- Model artifacts
- Pipeline outputs

Future usage:
- Data lake
- Backups
- Feature storage
### MLOps & Human-in-the-Loop

#### Label Studio

Current usage:
- Human annotation of narrators & relations

Future usage:
- Any labeling task (text, image, audio)

#### MLflow

Current usage:
- Experiment tracking
- Model registry

Future usage:
- Governance
- Model promotion
- Auditing

#### Argo Workflows

Current usage:
- ETL & training pipelines

Future usage:
- Batch inference
- Scheduled automation
- Data engineering
### Authentication & Security

#### Keycloak

Current usage:
- SSO for Admin UI, MLflow, Label Studio

Future usage:
- API authentication
- Multi-tenant access
- Organization-wide IAM
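For service-to-service API authentication, a client can obtain a token from Keycloak's OIDC token endpoint (`/realms/{realm}/protocol/openid-connect/token`) using the client-credentials grant. The realm and client names below are placeholders for whatever is configured in the deployed Keycloak:

```python
import json
import urllib.parse
import urllib.request

AUTH_URL = "https://auth.betelgeusebytes.io"

def build_token_body(client_id, client_secret):
    """Form fields for the OIDC client-credentials grant."""
    return {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }

def fetch_token(realm, client_id, client_secret):
    """POST the form-encoded grant and return the access token."""
    body = urllib.parse.urlencode(
        build_token_body(client_id, client_secret)
    ).encode("utf-8")
    req = urllib.request.Request(
        f"{AUTH_URL}/realms/{realm}/protocol/openid-connect/token",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["access_token"]
```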
### Observability Stack (LGTM)

Components:
- Grafana
- Prometheus
- Loki
- Tempo
- Grafana Alloy
- kube-state-metrics
- node-exporter

Capabilities:
- Metrics, logs, traces
- Automatic correlation
- OTLP-native
- Local SSD persistence
## Design Rules for All Custom Services

All services must:
- be stateless
- use env vars & Kubernetes Secrets
- authenticate via Keycloak
- emit:
  - Prometheus metrics
  - OTLP traces
  - structured JSON logs
- be deployable via kubectl & Argo CD
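The structured-JSON-logs rule is easy to satisfy with the standard library alone; a minimal sketch (field names are a suggestion, not a mandated schema) is:

```python
import json
import logging
import sys
import time

class JsonFormatter(logging.Formatter):
    """Render each log record as one structured JSON line,
    which Alloy can ship to Loki without extra parsing rules."""
    def format(self, record):
        return json.dumps({
            "ts": time.strftime(
                "%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)
            ),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        })

def make_logger(name="hadith-orchestrator"):
    """Logger that writes JSON lines to stdout (the Kubernetes-native sink)."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    return logger
```

Usage: `make_logger().info("pipeline started")` emits one JSON object per line on stdout.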
## Future Use Cases (Beyond Hadith)

This platform can support:
- General knowledge-graph AI
- Legal / scholarly document analysis
- Enterprise RAG systems
- Research data platforms
- Explainable AI systems
- Internal search engines
- Agent-based systems
- Provenance & trust-scoring engines
- Digital humanities projects
- Offline sovereign AI deployments