# 🧠 BetelgeuseBytes — Full AI Infrastructure Stack

## Authoritative README, Architecture & Onboarding Guide

This repository documents the **entire self-hosted AI infrastructure stack** running on a Kubernetes cluster hosted on **Hetzner dedicated servers**. The stack currently powers an **Islamic Hadith Scholar AI**, but it is intentionally designed as a **general-purpose, sovereign AI, MLOps, and data platform** that can support many future projects.

This document is the **single source of truth** for:

- architecture (logical & physical)
- infrastructure configuration
- networking & DNS
- every deployed component
- why each component exists
- how to build new systems on top of the platform

---

## 1. Mission & Design Philosophy

### Current Mission

Build an AI system that can:

- Parse classical Islamic texts
- Extract **Sanad** (chains of narrators) and **Matn** (hadith text)
- Identify narrators and their relationships:
  - teacher / student
  - familial lineage
- Construct a **verifiable knowledge graph**
- Support **human scholarly review**
- Provide **transparent and explainable reasoning**
- Operate **fully on-prem**, CPU-first, without SaaS or GPU dependency

### Core Principles

- **Sovereignty** — no external cloud lock-in
- **Explainability** — graph + provenance, not black boxes
- **Human-in-the-loop** — scholars remain in control
- **Observability-first** — everything is measurable and traceable
- **Composable** — every part can be reused or replaced

---
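The mission above centers on one data shape: a hadith record carrying its sanad and matn plus provenance and review status. A minimal sketch of what that record could look like in code — all names and fields here are illustrative assumptions, not the platform's actual schema:

```python
# Hypothetical extraction target: a hadith record with sanad, matn, and provenance.
# Field names and the example content are illustrative only.
from dataclasses import dataclass, field


@dataclass
class Narrator:
    name: str
    # e.g. {"teacher": [...], "student": [...]} — hypothetical relation keys
    relations: dict[str, list[str]] = field(default_factory=dict)


@dataclass
class HadithRecord:
    matn: str                # the hadith text itself
    sanad: list[Narrator]    # chain of narrators
    source: str              # provenance: which classical text it came from
    reviewed: bool = False   # True only after human scholarly review


record = HadithRecord(
    matn="Actions are judged by intentions...",
    sanad=[Narrator("Yahya ibn Sa'id"), Narrator("Muhammad ibn Ibrahim"), Narrator("Umar ibn al-Khattab")],
    source="Sahih al-Bukhari 1",
)
```

Keeping `reviewed` explicit in the record, rather than in a side table, is one way to make the human-in-the-loop principle visible at every layer of the stack.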
## 2. Physical Infrastructure (Hetzner)

### Nodes

- **Provider:** Hetzner
- **Type:** Dedicated servers
- **Architecture:** x86_64
- **GPU:** None (CPU-only by design)
- **Storage:** Local NVMe / SSD

### Node Roles (Logical)

| Node Type | Responsibilities |
|-----------|------------------|
| Control / Worker | Kubernetes control plane + workloads |
| Storage-heavy | Databases, MinIO, observability data |
| Compute-heavy | LLM inference, embeddings, pipelines |

> The cluster is intentionally **single-region and on-prem-like**, optimized for predictability and data locality.

---

## 3. Kubernetes Infrastructure Configuration

### Kubernetes

- Runtime for **all services**
- Namespaced isolation
- Explicit PersistentVolumeClaims
- Declarative configuration (GitOps)

### Namespaces (Conceptual)

| Namespace | Purpose |
|-----------|---------|
| `ai` | LLMs, embeddings, labeling |
| `vec` | Vector database |
| `graph` | Knowledge graph |
| `db` | Relational databases |
| `storage` | Object storage |
| `mlops` | MLflow |
| `ml` | Argo Workflows |
| `auth` | Keycloak |
| `observability` | LGTM stack |
| `hadith` | Custom apps (orchestrator, UI) |

---
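Since everything is declarative, the namespace table above maps directly to manifests. A minimal sketch of rendering one such Namespace manifest as JSON (which `kubectl apply -f -` accepts); the label key is a hypothetical convention, not the cluster's actual one:

```python
# Render a Kubernetes Namespace manifest as a plain dict and emit JSON.
# kubectl accepts JSON as well as YAML, so this can be piped to `kubectl apply -f -`.
import json


def namespace_manifest(name: str, purpose: str) -> dict:
    """Build a v1 Namespace manifest with a purpose label (label key is hypothetical)."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": {"betelgeusebytes.io/purpose": purpose},
        },
    }


manifest = namespace_manifest("vec", "vector-database")
print(json.dumps(manifest, indent=2))
```

In practice these manifests would live in Git and be applied by the GitOps tooling rather than generated ad hoc; the sketch only shows the shape of the object.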
## 4. Networking & DNS

### Ingress

- **NGINX Ingress Controller**
- HTTPS termination at the ingress
- Internal services communicate via ClusterIP

### TLS

- **cert-manager**
- Let’s Encrypt
- Automatic renewal

### Public Endpoints

| URL | Service |
|-----|---------|
| https://llm.betelgeusebytes.io | LLM inference (Ollama / llama.cpp) |
| https://embeddings.betelgeusebytes.io | Text Embeddings Inference |
| https://vector.betelgeusebytes.io | Qdrant + UI |
| https://neo4j.betelgeusebytes.io | Neo4j Browser |
| https://hadith-api.betelgeusebytes.io | FastAPI Orchestrator |
| https://hadith-admin.betelgeusebytes.io | Admin / Curation UI |
| https://label.betelgeusebytes.io | Label Studio |
| https://mlflow.betelgeusebytes.io | MLflow |
| https://minio.betelgeusebytes.io | MinIO Console |
| https://argo.betelgeusebytes.io | Argo Workflows |
| https://auth.betelgeusebytes.io | Keycloak |
| https://grafana.betelgeusebytes.io | Grafana |

---

## 5. Full Logical Architecture

```mermaid
flowchart LR
    User --> AdminUI --> Orchestrator
    Orchestrator --> LLM
    Orchestrator --> TEI --> Qdrant
    Orchestrator --> Neo4j
    Orchestrator --> PostgreSQL
    Orchestrator --> Redis
    LabelStudio --> MinIO
    MinIO --> ArgoWF --> MLflow
    MLflow --> Models --> Orchestrator
    Kafka --> ArgoWF
    Alloy --> Prometheus --> Grafana
    Alloy --> Loki --> Grafana
    Alloy --> Tempo --> Grafana
```

---

## 6. AI & Reasoning Layer

### Ollama / llama.cpp (CPU LLM)

**Current usage**

- JSON-structured extraction
- Sanad / matn reasoning
- Deterministic outputs
- No GPU dependency

**Future usage**

- Offline assistants
- Document intelligence
- Agent frameworks
- Replaceable by vLLM when GPUs are added

### Text Embeddings Inference (TEI)

**Current usage**

- Embeddings for hadith texts and biographies

**Future usage**

- RAG systems
- Semantic search
- Deduplication
- Similarity clustering

### Qdrant (Vector Database)

**Current usage**

- Stores embeddings
- Similarity search

**Future usage**

- Recommendation systems
- Agent memory
- Multimodal retrieval

Includes a web UI.

---
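The "JSON-structured extraction" above means the orchestrator expects the LLM to return machine-readable JSON rather than free text. A minimal sketch of validating such a response before it touches the graph or vector layers — the payload shape shown is an assumption for illustration, not the orchestrator's actual contract:

```python
# Validate a hypothetical JSON-structured extraction response from the LLM endpoint.
# The field names ("sanad", "matn", "relations") are illustrative assumptions.
import json

raw_response = """
{
  "sanad": ["Yahya ibn Sa'id", "Muhammad ibn Ibrahim"],
  "matn": "Actions are judged by intentions...",
  "relations": [
    {"from": "Yahya ibn Sa'id", "to": "Muhammad ibn Ibrahim", "type": "student_of"}
  ]
}
"""


def parse_extraction(text: str) -> dict:
    """Enforce the minimal contract: sanad is a list, matn is a string."""
    data = json.loads(text)
    if not isinstance(data.get("sanad"), list) or not isinstance(data.get("matn"), str):
        raise ValueError("malformed extraction payload")
    return data


extraction = parse_extraction(raw_response)
```

Rejecting malformed payloads at this boundary is what makes "deterministic outputs" enforceable: a response that does not parse never reaches human review or the knowledge graph.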
## 7. Knowledge & Data Layer

### Neo4j (Graph Database)

**Current usage**

- Isnād chains
- Narrator relationships

**Future usage**

- Knowledge graphs
- Trust networks
- Provenance systems

### PostgreSQL

**Current usage**

- App data
- MLflow backend
- Label Studio DB

**Future usage**

- Feature stores
- Metadata catalogs
- Transactional apps

### Redis

**Current usage**

- Caching
- Temporary state

**Future usage**

- Job queues
- Rate limiting
- Sessions

### Kafka

**Current usage**

- Optional async backbone

**Future usage**

- Streaming ingestion
- Event-driven ML
- Audit pipelines

### MinIO (S3)

**Current usage**

- Datasets
- Model artifacts
- Pipeline outputs

**Future usage**

- Data lake
- Backups
- Feature storage

---

## 8. MLOps & Human-in-the-Loop

### Label Studio

**Current usage**

- Human annotation of narrators & relations

**Future usage**

- Any labeling task (text, image, audio)

### MLflow

**Current usage**

- Experiment tracking
- Model registry

**Future usage**

- Governance
- Model promotion
- Auditing

### Argo Workflows

**Current usage**

- ETL & training pipelines

**Future usage**

- Batch inference
- Scheduled automation
- Data engineering

---

## 9. Authentication & Security

### Keycloak

**Current usage**

- SSO for Admin UI, MLflow, Label Studio

**Future usage**

- API authentication
- Multi-tenant access
- Organization-wide IAM

---

## 10. Observability Stack (LGTM)

### Components

- Grafana
- Prometheus
- Loki
- Tempo
- Grafana Alloy
- kube-state-metrics
- node-exporter

### Capabilities

- Metrics, logs, traces
- Automatic correlation
- OTLP-native
- Local SSD persistence

---

## 11. Design Rules for All Custom Services

All services must:

- be stateless
- use env vars & Kubernetes Secrets
- authenticate via Keycloak
- emit:
  - Prometheus metrics
  - OTLP traces
  - structured JSON logs
- be deployable via kubectl & Argo CD

---

## 12. Future Use Cases (Beyond Hadith)

This platform can support:

- General Knowledge Graph AI
- Legal / scholarly document analysis
- Enterprise RAG systems
- Research data platforms
- Explainable AI systems
- Internal search engines
- Agent-based systems
- Provenance & trust scoring engines
- Digital humanities projects
- Offline sovereign AI deployments
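The "structured JSON logs" rule in section 11 is simple to satisfy in any custom service. A minimal sketch using only the Python standard library — the field names and the service name are illustrative, not a platform-mandated schema:

```python
# Minimal structured-JSON-logs sketch: a stdlib logging formatter that emits
# one JSON object per line, which Loki/Alloy can ingest and parse.
# Field names and the logger name are illustrative assumptions.
import json
import logging


class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("hadith-orchestrator")  # hypothetical service name
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("extraction completed")
```

One JSON object per line keeps logs both human-readable and machine-parseable, so the same stream feeds Grafana dashboards and manual debugging without a second pipeline.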