# 🧠 BetelgeuseBytes — Full AI Infrastructure Stack

Authoritative README, architecture & onboarding guide.
This repository documents the entire self-hosted AI infrastructure stack running on a Kubernetes cluster hosted on Hetzner dedicated servers.
The stack currently powers an Islamic Hadith Scholar AI, but it is intentionally designed as a general-purpose, sovereign platform for AI, MLOps, and data that can support many future projects.
This document is the single source of truth for:
- architecture (logical & physical)
- infrastructure configuration
- networking & DNS
- every deployed component
- why each component exists
- how to build new systems on top of the platform
## 1. Mission & Design Philosophy

### Current Mission
Build an AI system that can:
- Parse classical Islamic texts
- Extract Sanad (chains of narrators) and Matn (hadith text)
- Identify narrators and their relationships:
  - teacher / student
  - familial lineage
- Construct a verifiable knowledge graph
- Support human scholarly review
- Provide transparent and explainable reasoning
- Operate fully on-prem, CPU-first, without SaaS or GPU dependency
### Core Principles
- Sovereignty — no external cloud lock-in
- Explainability — graph + provenance, not black boxes
- Human-in-the-loop — scholars remain in control
- Observability-first — everything is measurable and traceable
- Composable — every part can be reused or replaced
## 2. Physical Infrastructure (Hetzner)

### Nodes
- Provider: Hetzner
- Type: Dedicated servers
- Architecture: x86_64
- GPU: None (CPU-only by design)
- Storage: Local NVMe / SSD
### Node Roles (Logical)
| Node Type | Responsibilities |
|---|---|
| Control / Worker | Kubernetes control plane + workloads |
| Storage-heavy | Databases, MinIO, observability data |
| Compute-heavy | LLM inference, embeddings, pipelines |
The cluster is intentionally single-region and on-prem-like, optimized for predictability and data locality.
## 3. Kubernetes Infrastructure Configuration

### Kubernetes
- Runtime for all services
- Namespaced isolation
- Explicit PersistentVolumeClaims
- Declarative configuration (GitOps)
### Namespaces (Conceptual)

| Namespace | Purpose |
|---|---|
| ai | LLMs, embeddings, labeling |
| vec | Vector database |
| graph | Knowledge graph |
| db | Relational databases |
| storage | Object storage |
| mlops | MLflow |
| ml | Argo Workflows |
| auth | Keycloak |
| observability | LGTM stack |
| hadith | Custom apps (orchestrator, UI) |
## 4. Networking & DNS

### Ingress
- NGINX Ingress Controller
- HTTPS termination at ingress
- Internal services communicate via ClusterIP
### TLS
- cert-manager
- Let’s Encrypt
- Automatic renewal
### Public Endpoints
| URL | Service |
|---|---|
| https://llm.betelgeusebytes.io | LLM inference (Ollama / llama.cpp) |
| https://embeddings.betelgeusebytes.io | Text Embeddings Inference |
| https://vector.betelgeusebytes.io | Qdrant + UI |
| https://neo4j.betelgeusebytes.io | Neo4j Browser |
| https://hadith-api.betelgeusebytes.io | FastAPI Orchestrator |
| https://hadith-admin.betelgeusebytes.io | Admin / Curation UI |
| https://label.betelgeusebytes.io | Label Studio |
| https://mlflow.betelgeusebytes.io | MLflow |
| https://minio.betelgeusebytes.io | MinIO Console |
| https://argo.betelgeusebytes.io | Argo Workflows |
| https://auth.betelgeusebytes.io | Keycloak |
| https://grafana.betelgeusebytes.io | Grafana |
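A quick way to verify the endpoints in the table above is a liveness sweep. The sketch below is illustrative, not part of the deployed tooling; it only assumes that each service answers an HTTPS GET at its root (some, behind Keycloak, may return a redirect or 4xx instead of 200).

```python
import urllib.request

# A subset of the public endpoints listed above.
ENDPOINTS = {
    "llm": "https://llm.betelgeusebytes.io",
    "vector": "https://vector.betelgeusebytes.io",
    "mlflow": "https://mlflow.betelgeusebytes.io",
    "grafana": "https://grafana.betelgeusebytes.io",
}

def check(url, timeout=5):
    """Return the HTTP status of a GET against the endpoint root,
    or a DOWN marker on DNS/TLS/connection failures."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except Exception as exc:
        return f"DOWN ({exc})"
```

Usage: `for name, url in ENDPOINTS.items(): print(name, check(url))`.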
## 5. Full Logical Architecture

```mermaid
flowchart LR
User --> AdminUI --> Orchestrator
Orchestrator --> LLM
Orchestrator --> TEI --> Qdrant
Orchestrator --> Neo4j
Orchestrator --> PostgreSQL
Orchestrator --> Redis
LabelStudio --> MinIO
MinIO --> ArgoWF --> MLflow
MLflow --> Models --> Orchestrator
Kafka --> ArgoWF
Alloy --> Prometheus --> Grafana
Alloy --> Loki --> Grafana
Alloy --> Tempo --> Grafana
```
### AI & Reasoning Layer

#### Ollama / llama.cpp (CPU LLM)

Current usage:
- JSON-structured extraction
- Sanad / matn reasoning
- Deterministic outputs
- No GPU dependency

Future usage:
- Offline assistants
- Document intelligence
- Agent frameworks
- Replaceable by vLLM when GPUs are added
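A minimal sketch of JSON-structured extraction against the LLM endpoint, using Ollama's `/api/generate` route with its `format: "json"` constraint. The model name and the prompt wording are assumptions, not the deployed configuration.

```python
import json
import urllib.request

LLM_URL = "https://llm.betelgeusebytes.io"  # public LLM endpoint

def build_payload(hadith_text, model="llama3"):
    """Request body for Ollama's /api/generate.

    The model name is an assumption; substitute whatever model is
    actually pulled into the Ollama instance.
    """
    return {
        "model": model,
        "prompt": (
            "Return a JSON object with keys 'sanad' (list of narrator "
            "names, in order) and 'matn' (the hadith text) for:\n"
            + hadith_text
        ),
        "format": "json",  # constrains Ollama to emit valid JSON
        "stream": False,   # one complete response, not a token stream
    }

def extract(hadith_text):
    """POST the payload and parse the model's JSON answer."""
    req = urllib.request.Request(
        f"{LLM_URL}/api/generate",
        data=json.dumps(build_payload(hadith_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return json.loads(body["response"])
```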
#### Text Embeddings Inference (TEI)

Current usage:
- Embeddings for hadith texts and biographies

Future usage:
- RAG systems
- Semantic search
- Deduplication
- Similarity clustering
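Embedding a batch of texts is a single call to TEI's `/embed` route, which takes an `inputs` array and returns one vector per input. A minimal stdlib-only sketch:

```python
import json
import urllib.request

TEI_URL = "https://embeddings.betelgeusebytes.io"

def build_request(texts):
    """TEI /embed request body: a JSON object with an 'inputs' batch."""
    return {"inputs": list(texts)}

def embed(texts):
    """Return one embedding vector (list of floats) per input string."""
    req = urllib.request.Request(
        f"{TEI_URL}/embed",
        data=json.dumps(build_request(texts)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```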
#### Qdrant (Vector Database)

Current usage:
- Stores embeddings
- Similarity search

Future usage:
- Recommendation systems
- Agent memory
- Multimodal retrieval

Includes a web UI.
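Similarity search against Qdrant can go through its REST API (`POST /collections/{name}/points/search`). The collection name used in the usage note is illustrative only, not the deployed schema:

```python
import json
import urllib.request

QDRANT_URL = "https://vector.betelgeusebytes.io"

def build_search(vector, limit=5):
    """Body for Qdrant's REST search: nearest neighbours of `vector`."""
    return {"vector": vector, "limit": limit, "with_payload": True}

def search(collection, vector, limit=5):
    """Run a similarity search and return the result list."""
    req = urllib.request.Request(
        f"{QDRANT_URL}/collections/{collection}/points/search",
        data=json.dumps(build_search(vector, limit)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["result"]
```

Usage (hypothetical collection name): `search("hadith_texts", embed(["query"])[0])`.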
### Knowledge & Data Layer

#### Neo4j (Graph Database)

Current usage:
- Isnād chains
- Narrator relationships

Future usage:
- Knowledge graphs
- Trust networks
- Provenance systems
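Transmission chains map naturally onto variable-length path queries in Cypher. The sketch below builds such a query; the `Narrator` label and `TAUGHT` relationship type are assumptions about the graph model, not the deployed schema, and the query would be executed via the official `neo4j` Python driver against the Neo4j endpoint.

```python
def transmission_chain_query(max_hops=4):
    """Cypher sketch: teacher-to-student paths between two narrators.

    Label `Narrator` and relationship `TAUGHT` are hypothetical names
    standing in for the real schema.
    """
    return (
        f"MATCH p = (a:Narrator {{name: $start}})"
        f"-[:TAUGHT*1..{max_hops}]->"
        f"(b:Narrator {{name: $end}}) "
        f"RETURN [n IN nodes(p) | n.name] AS chain"
    )
```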
#### PostgreSQL

Current usage:
- App data
- MLflow backend
- Label Studio DB

Future usage:
- Feature stores
- Metadata catalogs
- Transactional apps

#### Redis

Current usage:
- Caching
- Temporary state

Future usage:
- Job queues
- Rate limiting
- Sessions

#### Kafka

Current usage:
- Optional async backbone

Future usage:
- Streaming ingestion
- Event-driven ML
- Audit pipelines

#### MinIO (S3)

Current usage:
- Datasets
- Model artifacts
- Pipeline outputs

Future usage:
- Data lake
- Backups
- Feature storage
### MLOps & Human-in-the-Loop

#### Label Studio

Current usage:
- Human annotation of narrators & relations

Future usage:
- Any labeling task (text, image, audio)

#### MLflow

Current usage:
- Experiment tracking
- Model registry

Future usage:
- Governance
- Model promotion
- Auditing

#### Argo Workflows

Current usage:
- ETL & training pipelines

Future usage:
- Batch inference
- Scheduled automation
- Data engineering
### Authentication & Security

#### Keycloak

Current usage:
- SSO for Admin UI, MLflow, Label Studio

Future usage:
- API authentication
- Multi-tenant access
- Organization-wide IAM
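For service-to-service API authentication, a client can obtain a token from Keycloak's OIDC token endpoint (`/realms/{realm}/protocol/openid-connect/token`) using the client-credentials grant. The realm and client names below are placeholders for whatever is configured in the deployed Keycloak:

```python
import json
import urllib.parse
import urllib.request

AUTH_URL = "https://auth.betelgeusebytes.io"

def build_token_body(client_id, client_secret):
    """Form fields for the OIDC client-credentials grant."""
    return {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }

def fetch_token(realm, client_id, client_secret):
    """POST the form-encoded grant and return the access token."""
    body = urllib.parse.urlencode(
        build_token_body(client_id, client_secret)
    ).encode("utf-8")
    req = urllib.request.Request(
        f"{AUTH_URL}/realms/{realm}/protocol/openid-connect/token",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["access_token"]
```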
### Observability Stack (LGTM)

Components:
- Grafana
- Prometheus
- Loki
- Tempo
- Grafana Alloy
- kube-state-metrics
- node-exporter

Capabilities:
- Metrics, logs, traces
- Automatic correlation
- OTLP-native
- Local SSD persistence
## Design Rules for All Custom Services

All services must:
- be stateless
- use env vars & Kubernetes Secrets
- authenticate via Keycloak
- emit:
  - Prometheus metrics
  - OTLP traces
  - structured JSON logs
- be deployable via kubectl & Argo CD
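The structured-JSON-logs rule is easy to satisfy with the standard library alone; a minimal sketch (field names are a suggestion, not a mandated schema) is:

```python
import json
import logging
import sys
import time

class JsonFormatter(logging.Formatter):
    """Render each log record as one structured JSON line,
    which Alloy can ship to Loki without extra parsing rules."""
    def format(self, record):
        return json.dumps({
            "ts": time.strftime(
                "%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)
            ),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        })

def make_logger(name="hadith-orchestrator"):
    """Logger that writes JSON lines to stdout (the Kubernetes-native sink)."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    return logger
```

Usage: `make_logger().info("pipeline started")` emits one JSON object per line on stdout.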
## Future Use Cases (Beyond Hadith)

This platform can support:
- General knowledge-graph AI
- Legal / scholarly document analysis
- Enterprise RAG systems
- Research data platforms
- Explainable AI systems
- Internal search engines
- Agent-based systems
- Provenance & trust-scoring engines
- Digital humanities projects
- Offline sovereign AI deployments