betelgeusebytes/ARCHITECTURE.md

# BetelgeuseBytes – Architecture Overview

## High-Level Architecture

This platform is a **self-hosted, production-grade Kubernetes stack** designed for:

* AI / ML experimentation and serving
* Data engineering & observability
* Knowledge graphs & vector search
* Automation, workflows, and research tooling

The architecture follows a **hub-and-spoke model**:

* **Core Infrastructure**: Kubernetes + networking + storage
* **Platform Services**: databases, messaging, auth, observability
* **ML / AI Services**: labeling, embeddings, LLM serving, notebooks
* **Automation & Workflows**: Argo Workflows, n8n
* **Access Layer**: DNS, Ingress, TLS

---

## Logical Architecture Diagram (Textual)

```
Internet
   │
   ▼
DNS (betelgeusebytes.io)
   │
   ▼
Ingress-NGINX (TLS via cert-manager)
   │
   ├── Platform UIs (Grafana, Kibana, Gitea, Neo4j, MinIO, etc.)
   ├── ML UIs (Jupyter, Label Studio, MLflow)
   ├── Automation (n8n, Argo)
   └── APIs (Postgres TCP, Neo4j Bolt, Kafka)

Kubernetes Cluster
   ├── Control Plane
   ├── Worker Nodes
   ├── Stateful Workloads (local SSD)
   └── Observability Stack
```

---

## Key Design Principles

* **Bare-metal friendly** (Hetzner dedicated servers)
* **Local SSD storage** for stateful workloads
* **Everything observable** (logs, metrics, traces)
* **CPU-first ML** with optional GPU expansion
* **Single-tenant but multi-project ready**
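One common way to back stateful workloads with local SSDs is a statically provisioned `local` PersistentVolume pinned to a node. The sketch below assumes that approach; this document does not state which provisioner the cluster actually uses. The storage class name, volume name, node name, path, and size are all hypothetical.

```python
import json


def local_storage_class(name: str) -> dict:
    """StorageClass for statically provisioned local volumes."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        # Local volumes are created by hand, so there is no dynamic provisioner.
        "provisioner": "kubernetes.io/no-provisioner",
        # Bind only once a pod is scheduled, so the claim lands on that pod's node.
        "volumeBindingMode": "WaitForFirstConsumer",
    }


def local_persistent_volume(name: str, node: str, path: str,
                            size: str, storage_class: str) -> dict:
    """PersistentVolume backed by a directory or disk on one specific node."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            "persistentVolumeReclaimPolicy": "Retain",
            "storageClassName": storage_class,
            "local": {"path": path},
            # A local volume must declare which node physically holds it.
            "nodeAffinity": {
                "required": {
                    "nodeSelectorTerms": [{
                        "matchExpressions": [{
                            "key": "kubernetes.io/hostname",
                            "operator": "In",
                            "values": [node],
                        }]
                    }]
                }
            },
        },
    }


# Hypothetical values, for illustration only.
manifests = [
    local_storage_class("local-ssd"),
    local_persistent_volume("pg-data-0", "worker-1", "/mnt/ssd/pg-data",
                            "100Gi", "local-ssd"),
]
print(json.dumps(manifests, indent=2))
```

The trade-off with this pattern is that a pod using the volume can only run on the node that holds the disk, so data durability depends on application-level replication or backups.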

---

## Networking

* Cilium CNI (eBPF-based networking)
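* Ingress-NGINX controller for HTTP(S) traffic
* TCP services (Postgres, Neo4j Bolt) exposed through a patch to the Ingress-NGINX controller, since standard Ingress resources only route HTTP(S)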
* WireGuard mesh between nodes
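Ingress-NGINX forwards raw TCP through a ConfigMap that maps an external port to `namespace/service:port`, which the controller reads when started with `--tcp-services-configmap`. The sketch below generates such a ConfigMap as JSON. The `db` and `graph` namespaces appear elsewhere in this document, but the service names, the `ingress-nginx` namespace, and the ConfigMap name are assumptions; the ports are the Postgres and Bolt defaults.

```python
import json


def tcp_services_configmap(routes: dict, name: str = "tcp-services",
                           namespace: str = "ingress-nginx") -> dict:
    """Build the ConfigMap Ingress-NGINX reads for TCP passthrough.

    `routes` maps an external port (int) to "namespace/service:port".
    """
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        # ConfigMap keys and values must be strings.
        "data": {str(port): target for port, target in sorted(routes.items())},
    }


# Hypothetical service names; 5432 and 7687 are the default Postgres and Bolt ports.
routes = {
    5432: "db/postgres:5432",
    7687: "graph/neo4j:7687",
}

configmap = tcp_services_configmap(routes)
print(json.dumps(configmap, indent=2))
```

The ConfigMap alone is not sufficient: the controller's own Service (or host ports) must also expose each listed port, which is presumably what the patch mentioned above covers.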

---

## Security Model

* TLS everywhere (cert-manager + Let’s Encrypt)
* Namespace isolation per domain (db, ml, graph, observability…)
* Credentials stored as Kubernetes Secrets
* Optional Basic Auth on sensitive UIs
* Keycloak available for future SSO
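As a rough illustration of how TLS and optional Basic Auth can combine on a single UI, the sketch below builds an Ingress manifest with a cert-manager annotation and the Ingress-NGINX basic-auth annotations. The hostname, service name and port, issuer name, and secret names are hypothetical; only the `observability` namespace and the `betelgeusebytes.io` domain come from this document.

```python
import json
from typing import Optional


def tls_ingress(name: str, namespace: str, host: str, service: str, port: int,
                cluster_issuer: str, basic_auth_secret: Optional[str] = None) -> dict:
    """Ingress with a cert-manager certificate and optional Basic Auth."""
    # cert-manager watches this annotation and issues a certificate
    # into the secret named under spec.tls.
    annotations = {"cert-manager.io/cluster-issuer": cluster_issuer}
    if basic_auth_secret:
        # Ingress-NGINX annotations; the secret must hold an htpasswd file under key "auth".
        annotations.update({
            "nginx.ingress.kubernetes.io/auth-type": "basic",
            "nginx.ingress.kubernetes.io/auth-secret": basic_auth_secret,
            "nginx.ingress.kubernetes.io/auth-realm": "Authentication Required",
        })
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name, "namespace": namespace, "annotations": annotations},
        "spec": {
            "ingressClassName": "nginx",
            "tls": [{"hosts": [host], "secretName": f"{name}-tls"}],
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service, "port": {"number": port}}},
                }]},
            }],
        },
    }


# Hypothetical values, for illustration only.
ingress = tls_ingress(
    name="grafana",
    namespace="observability",
    host="grafana.betelgeusebytes.io",
    service="grafana",
    port=3000,
    cluster_issuer="letsencrypt-prod",
    basic_auth_secret="grafana-basic-auth",
)
print(json.dumps(ingress, indent=2))
```

Since `kubectl apply -f -` accepts JSON as well as YAML, the printed manifest can be piped to it directly.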

---

## Scalability Notes

* Currently a single control-plane node plus worker nodes
* Designed to grow by adding:
  * More workers
  * Dedicated control-plane VPS nodes
  * GPU nodes (for vLLM / training)
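If GPU nodes are added later, workloads would typically request them through the `nvidia.com/gpu` extended resource, which is available once the NVIDIA device plugin runs on those nodes. The sketch below shows the shape of such a pod; the image reference, node label, and pod name are placeholders, and nothing here reflects an existing deployment.

```python
import json


def gpu_pod(name: str, namespace: str, image: str, gpus: int = 1,
            node_label: tuple = ("accelerator", "nvidia")) -> dict:
    """Pod that requests GPUs and is steered to labelled GPU nodes."""
    key, value = node_label
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Hypothetical label; GPU nodes would need to carry it.
            "nodeSelector": {key: value},
            "containers": [{
                "name": name,
                "image": image,
                # Extended resources such as GPUs are requested under limits.
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
            }],
        },
    }


# Placeholder image reference, not a real registry path.
pod = gpu_pod("llm-server", "ml", "registry.example.com/llm-server:placeholder")
print(json.dumps(pod, indent=2))
```

CPU-only workloads need no changes under this scheme, because pods that do not request `nvidia.com/gpu` are scheduled as before.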

---

## What This Enables

* Research platforms
* Knowledge graph + LLM pipelines
* End-to-end ML lifecycle
* Automated data pipelines
* Observability-first production apps