# 🧠 BetelgeuseBytes — Full AI Infrastructure Stack

## Authoritative README, Architecture & Onboarding Guide

This repository documents the **entire self-hosted AI infrastructure stack** running on a Kubernetes cluster hosted on **Hetzner dedicated servers**.

The stack currently powers an **Islamic Hadith Scholar AI**, but it is intentionally designed as a **general-purpose, sovereign AI, MLOps, and data platform** that can support many future projects.

This document is the **single source of truth** for:

- architecture (logical & physical)
- infrastructure configuration
- networking & DNS
- every deployed component
- why each component exists
- how to build new systems on top of the platform

---

## 1. Mission & Design Philosophy

### Current Mission

Build an AI system that can:

- Parse classical Islamic texts
- Extract **Sanad** (chains of narrators) and **Matn** (hadith text)
- Identify narrators and their relationships:
  - teacher / student
  - familial lineage
- Construct a **verifiable knowledge graph**
- Support **human scholarly review**
- Provide **transparent and explainable reasoning**
- Operate **fully on-prem**, CPU-first, without SaaS or GPU dependency

### Core Principles

- **Sovereignty** — no external cloud lock-in
- **Explainability** — graph + provenance, not black boxes
- **Human-in-the-loop** — scholars remain in control
- **Observability-first** — everything is measurable and traceable
- **Composable** — every part can be reused or replaced

---

## 2. Physical Infrastructure (Hetzner)

### Nodes

- **Provider:** Hetzner
- **Type:** Dedicated servers
- **Architecture:** x86_64
- **GPU:** None (CPU-only by design)
- **Storage:** Local NVMe / SSD

### Node Roles (Logical)

| Node Type | Responsibilities |
|---------|------------------|
| Control / Worker | Kubernetes control plane + workloads |
| Storage-heavy | Databases, MinIO, observability data |
| Compute-heavy | LLM inference, embeddings, pipelines |

> The cluster is intentionally **single-region and on-prem-like**, optimized for predictability and data locality.

---

## 3. Kubernetes Infrastructure Configuration

### Kubernetes

- Runtime for **all services**
- Namespaced isolation
- Explicit PersistentVolumeClaims
- Declarative configuration (GitOps)

### Namespaces (Conceptual)

| Namespace | Purpose |
|--------|--------|
| `ai` | LLMs, embeddings, labeling |
| `vec` | Vector database |
| `graph` | Knowledge graph |
| `db` | Relational databases |
| `storage` | Object storage |
| `mlops` | MLflow |
| `ml` | Argo Workflows |
| `auth` | Keycloak |
| `observability` | LGTM stack |
| `hadith` | Custom apps (orchestrator, UI) |

---

## 4. Networking & DNS

### Ingress

- **NGINX Ingress Controller**
- HTTPS termination at ingress
- Internal services communicate via ClusterIP

### TLS

- **cert-manager**
- Let’s Encrypt
- Automatic renewal

### Public Endpoints

| URL | Service |
|----|--------|
| https://llm.betelgeusebytes.io | LLM inference (Ollama / llama.cpp) |
| https://embeddings.betelgeusebytes.io | Text Embeddings Inference |
| https://vector.betelgeusebytes.io | Qdrant + UI |
| https://neo4j.betelgeusebytes.io | Neo4j Browser |
| https://hadith-api.betelgeusebytes.io | FastAPI Orchestrator |
| https://hadith-admin.betelgeusebytes.io | Admin / Curation UI |
| https://label.betelgeusebytes.io | Label Studio |
| https://mlflow.betelgeusebytes.io | MLflow |
| https://minio.betelgeusebytes.io | MinIO Console |
| https://argo.betelgeusebytes.io | Argo Workflows |
| https://auth.betelgeusebytes.io | Keycloak |
| https://grafana.betelgeusebytes.io | Grafana |

---

## 5. Full Logical Architecture

```mermaid
flowchart LR
    User --> AdminUI --> Orchestrator

    Orchestrator --> LLM
    Orchestrator --> TEI --> Qdrant
    Orchestrator --> Neo4j
    Orchestrator --> PostgreSQL
    Orchestrator --> Redis

    LabelStudio --> MinIO
    MinIO --> ArgoWF --> MLflow
    MLflow --> Models --> Orchestrator

    Kafka --> ArgoWF

    Alloy --> Prometheus --> Grafana
    Alloy --> Loki --> Grafana
    Alloy --> Tempo --> Grafana
```

---

## 6. AI & Reasoning Layer

### Ollama / llama.cpp (CPU LLM)

**Current usage**

- JSON-structured extraction
- Sanad / matn reasoning
- Deterministic outputs
- No GPU dependency

**Future usage**

- Offline assistants
- Document intelligence
- Agent frameworks
- Replaceable by vLLM when GPUs are added
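
For concreteness, a minimal sketch of a JSON-structured extraction call as the orchestrator might issue it. It assumes the ingress forwards Ollama's native `/api/generate` route and that a model tag such as `llama3.1:8b` has been pulled; both are placeholders rather than guaranteed details of this deployment.

```python
# Minimal sketch: JSON-structured sanad/matn extraction via the LLM endpoint.
# The route and model tag below are assumptions, not deployment guarantees.
import json
import requests

PROMPT = (
    "Extract the sanad (chain of narrators) and the matn from the hadith text "
    "below. Return JSON with keys 'sanad' (list of narrator names, in order) "
    "and 'matn' (string).\n\nText: ..."
)

resp = requests.post(
    "https://llm.betelgeusebytes.io/api/generate",
    json={
        "model": "llama3.1:8b",          # placeholder model tag
        "prompt": PROMPT,
        "format": "json",                # constrain Ollama to valid JSON output
        "stream": False,
        "options": {"temperature": 0},   # deterministic outputs
    },
    timeout=300,
)
resp.raise_for_status()
extraction = json.loads(resp.json()["response"])
print(extraction.get("sanad"), extraction.get("matn"))
```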

### Text Embeddings Inference (TEI)

**Current usage**

- Embeddings for hadith texts and biographies

**Future usage**

- RAG systems
- Semantic search
- Deduplication
- Similarity clustering
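
A minimal sketch of requesting embeddings, assuming the ingress exposes TEI's standard `/embed` route; the sample texts are illustrative.

```python
# Minimal sketch: embed a batch of texts via TEI's /embed route.
# TEI returns one float vector per input string.
import requests

texts = [
    "Narrated Abdullah ibn Umar ...",
    "Biography of Nafi, the client of Ibn Umar ...",
]

resp = requests.post(
    "https://embeddings.betelgeusebytes.io/embed",
    json={"inputs": texts},
    timeout=60,
)
resp.raise_for_status()
vectors = resp.json()   # e.g. [[0.012, -0.231, ...], [...]]
print(len(vectors), len(vectors[0]))
```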

### Qdrant (Vector Database)

**Current usage**

- Stores embeddings
- Similarity search

**Future usage**

- Recommendation systems
- Agent memory
- Multimodal retrieval

Includes a Web UI.
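
A minimal sketch of pushing vectors into Qdrant and running a similarity search with the official Python client; the collection name `hadith_matn` and the 768-dimension vector size are assumptions, not the deployment's actual settings.

```python
# Minimal sketch: upsert embeddings into Qdrant and query for nearest neighbours.
# Collection name and vector size are placeholders.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(url="https://vector.betelgeusebytes.io")

client.create_collection(
    collection_name="hadith_matn",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)

client.upsert(
    collection_name="hadith_matn",
    points=[PointStruct(id=1, vector=[0.0] * 768, payload={"source": "bukhari"})],
)

hits = client.search(
    collection_name="hadith_matn",
    query_vector=[0.0] * 768,   # in practice, a TEI embedding of the query text
    limit=5,
)
print([(hit.id, hit.score) for hit in hits])
```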

---

## 7. Knowledge & Data Layer

### Neo4j (Graph Database)

**Current usage**

- Isnād chains
- Narrator relationships

**Future usage**

- Knowledge graphs
- Trust networks
- Provenance systems
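
A minimal sketch of recording a narration link with the official Python driver; the Bolt URI, node labels, and relationship type are illustrative assumptions rather than the project's actual graph schema.

```python
# Minimal sketch: merge two narrators and a NARRATED_FROM edge into Neo4j.
# The Bolt URI (typically port 7687), labels, and relationship type are assumptions.
import os
from neo4j import GraphDatabase

driver = GraphDatabase.driver(
    "neo4j+s://neo4j.betelgeusebytes.io:7687",
    auth=("neo4j", os.environ["NEO4J_PASSWORD"]),
)

CYPHER = """
MERGE (t:Narrator {name: $teacher})
MERGE (s:Narrator {name: $student})
MERGE (s)-[:NARRATED_FROM {source: $source}]->(t)
RETURN t.name AS teacher, s.name AS student
"""

with driver.session() as session:
    record = session.run(
        CYPHER, teacher="Nafi", student="Malik ibn Anas", source="Muwatta"
    ).single()
    print(record["student"], "narrated from", record["teacher"])

driver.close()
```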

### PostgreSQL

**Current usage**

- App data
- MLflow backend
- Label Studio DB

**Future usage**

- Feature stores
- Metadata catalogs
- Transactional apps

### Redis

**Current usage**

- Caching
- Temporary state

**Future usage**

- Job queues
- Rate limiting
- Sessions
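
As a sketch of the caching role, the snippet below memoises an expensive extraction result in Redis with a TTL; the in-cluster hostname, key scheme, and expiry are assumptions.

```python
# Minimal sketch: cache LLM extraction results in Redis keyed by a text hash.
# The Redis hostname, key prefix, and one-hour TTL are placeholders.
import hashlib
import json
import redis

r = redis.Redis(host="redis.db.svc.cluster.local", port=6379, decode_responses=True)

def cached_extraction(text: str, extract_fn) -> dict:
    key = "extract:" + hashlib.sha256(text.encode("utf-8")).hexdigest()
    hit = r.get(key)
    if hit is not None:
        return json.loads(hit)
    result = extract_fn(text)                # e.g. the LLM call from section 6
    r.setex(key, 3600, json.dumps(result))   # keep the result for one hour
    return result
```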

### Kafka

**Current usage**

- Optional async backbone

**Future usage**

- Streaming ingestion
- Event-driven ML
- Audit pipelines
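
If the async backbone is enabled, an ingestion event might be published roughly as sketched below; the topic name, broker address, and the choice of the `kafka-python` client are all assumptions.

```python
# Minimal sketch: publish a document-ingested event for downstream pipelines.
# Broker address and topic name are placeholders.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="kafka.db.svc.cluster.local:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

producer.send("hadith.ingested", {"doc_id": "bukhari-0001", "status": "parsed"})
producer.flush()
```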

### MinIO (S3)

**Current usage**

- Datasets
- Model artifacts
- Pipeline outputs

**Future usage**

- Data lake
- Backups
- Feature storage
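
A minimal sketch of writing a dataset snapshot through MinIO's S3-compatible API; the bucket name, object key, and the assumption that the S3 API is reachable at the console hostname are illustrative only.

```python
# Minimal sketch: upload a dataset file to MinIO.
# Endpoint, credential source, bucket, and object key are placeholders.
import os
from minio import Minio

client = Minio(
    "minio.betelgeusebytes.io",
    access_key=os.environ["MINIO_ACCESS_KEY"],
    secret_key=os.environ["MINIO_SECRET_KEY"],
    secure=True,
)

bucket = "hadith-datasets"
if not client.bucket_exists(bucket):
    client.make_bucket(bucket)

client.fput_object(bucket, "raw/bukhari-v1.jsonl", "/tmp/bukhari-v1.jsonl")
```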

---

## 8. MLOps & Human-in-the-Loop

### Label Studio

**Current usage**

- Human annotation of narrators & relations

**Future usage**

- Any labeling task (text, image, audio)

### MLflow

**Current usage**

- Experiment tracking
- Model registry

**Future usage**

- Governance
- Model promotion
- Auditing
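
A minimal sketch of logging an extraction-quality run against the tracking server; the experiment name, parameters, and metric values are placeholders.

```python
# Minimal sketch: track an extraction experiment in MLflow.
# Experiment name, params, and metric values are illustrative only.
import mlflow

mlflow.set_tracking_uri("https://mlflow.betelgeusebytes.io")
mlflow.set_experiment("sanad-extraction")

with mlflow.start_run(run_name="llama3.1-json-v2"):
    mlflow.log_param("model", "llama3.1:8b")
    mlflow.log_param("temperature", 0)
    mlflow.log_metric("narrator_f1", 0.91)
```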

### Argo Workflows

**Current usage**

- ETL & training pipelines

**Future usage**

- Batch inference
- Scheduled automation
- Data engineering

---

## 9. Authentication & Security

### Keycloak

**Current usage**

- SSO for Admin UI, MLflow, Label Studio

**Future usage**

- API authentication
- Multi-tenant access
- Organization-wide IAM
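
A minimal sketch of service-to-service authentication using the client-credentials grant; the realm, client ID, and the orchestrator's `/health` route are assumptions, and Keycloak versions before 17 prefix the token path with `/auth`.

```python
# Minimal sketch: fetch a service-account token from Keycloak and call the API.
# Realm, client ID, secret source, and the /health route are placeholders.
import os
import requests

token_resp = requests.post(
    "https://auth.betelgeusebytes.io/realms/betelgeusebytes/protocol/openid-connect/token",
    data={
        "grant_type": "client_credentials",
        "client_id": "hadith-orchestrator",
        "client_secret": os.environ["KEYCLOAK_CLIENT_SECRET"],
    },
    timeout=30,
)
token_resp.raise_for_status()
access_token = token_resp.json()["access_token"]

api_resp = requests.get(
    "https://hadith-api.betelgeusebytes.io/health",
    headers={"Authorization": f"Bearer {access_token}"},
    timeout=30,
)
print(api_resp.status_code)
```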

---

## 10. Observability Stack (LGTM)

### Components

- Grafana
- Prometheus
- Loki
- Tempo
- Grafana Alloy
- kube-state-metrics
- node-exporter

### Capabilities

- Metrics, logs, traces
- Automatic correlation
- OTLP-native
- Local SSD persistence

---

## 11. Design Rules for All Custom Services

All services must:

- be stateless
- use env vars & Kubernetes Secrets
- authenticate via Keycloak
- emit (see the sketch below):
  - Prometheus metrics
  - OTLP traces
  - structured JSON logs
- be deployable via kubectl & Argo CD
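
A minimal sketch of what these three signals can look like in a Python service; the library choices (`prometheus_client`, `opentelemetry-sdk`) and the Alloy OTLP endpoint address are assumptions, not mandated by the platform.

```python
# Minimal sketch: expose Prometheus metrics, export OTLP traces, and write
# structured JSON logs from one service. Endpoint addresses are placeholders.
import json
import logging

from prometheus_client import Counter, start_http_server
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# 1. Prometheus metrics served on :8000/metrics for Alloy to scrape.
EXTRACTIONS = Counter("hadith_extractions", "Extraction requests processed")
start_http_server(8000)

# 2. OTLP traces exported to the collector (Grafana Alloy).
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="alloy.observability:4317", insecure=True))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("hadith-orchestrator")

# 3. Structured JSON logs on stdout, shipped to Loki by Alloy.
class JsonFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({"level": record.levelname, "msg": record.getMessage()})

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])

with tracer.start_as_current_span("extract_sanad"):
    EXTRACTIONS.inc()
    logging.info("processed one extraction request")
```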

---

## 12. Future Use Cases (Beyond Hadith)

This platform can support:

- General Knowledge Graph AI
- Legal / scholarly document analysis
- Enterprise RAG systems
- Research data platforms
- Explainable AI systems
- Internal search engines
- Agent-based systems
- Provenance & trust scoring engines
- Digital humanities projects
- Offline sovereign AI deployments