94 lines
2.2 KiB
Markdown
94 lines
2.2 KiB
Markdown
# BetelgeuseBytes – Architecture Overview
|
||
|
||
## High-Level Architecture
|
||
|
||
This platform is a **self-hosted, production-grade Kubernetes stack** designed for:
|
||
|
||
* AI / ML experimentation and serving
|
||
* Data engineering & observability
|
||
* Knowledge graphs & vector search
|
||
* Automation, workflows, and research tooling
|
||
|
||
The architecture follows a **hub-and-spoke model**:
|
||
|
||
* **Core Infrastructure**: Kubernetes + networking + storage
|
||
* **Platform Services**: databases, messaging, auth, observability
|
||
* **ML / AI Services**: labeling, embeddings, LLM serving, notebooks
|
||
* **Automation & Workflows**: Argo Workflows, n8n
|
||
* **Access Layer**: DNS, Ingress, TLS
|
||
|
||
---
|
||
|
||
## Logical Architecture Diagram (Textual)
|
||
|
||
```
|
||
Internet
|
||
│
|
||
▼
|
||
DNS (betelgeusebytes.io)
|
||
│
|
||
▼
|
||
Ingress-NGINX (TLS via cert-manager)
|
||
│
|
||
├── Platform UIs (Grafana, Kibana, Gitea, Neo4j, MinIO, etc.)
|
||
├── ML UIs (Jupyter, Label Studio, MLflow)
|
||
├── Automation (n8n, Argo)
|
||
└── APIs (Postgres TCP, Neo4j Bolt, Kafka)
|
||
|
||
Kubernetes Cluster
|
||
├── Control Plane
|
||
├── Worker Nodes
|
||
├── Stateful Workloads (local SSD)
|
||
└── Observability Stack
|
||
```
|
||
|
||
---
|
||
|
||
## Key Design Principles
|
||
|
||
* **Bare‑metal friendly** (Hetzner dedicated servers)
|
||
* **Local SSD storage** for stateful workloads
|
||
* **Everything observable** (logs, metrics, traces)
|
||
* **CPU-first ML** with optional GPU expansion
|
||
* **Single-tenant but multi-project ready**
|
||
|
||
---
|
||
|
||
## Networking
|
||
|
||
* Cilium CNI (eBPF-based networking)
|
||
* NGINX Ingress Controller
|
||
* TCP services exposed via Ingress patch (Postgres, Neo4j Bolt)
|
||
* WireGuard mesh between nodes
|
||
|
||
---
|
||
|
||
## Security Model
|
||
|
||
* TLS everywhere (cert-manager + Let’s Encrypt)
|
||
* Namespace isolation per domain (db, ml, graph, observability…)
|
||
* Secrets stored in Kubernetes Secrets
|
||
* Optional Basic Auth on sensitive UIs
|
||
* Keycloak available for future SSO
|
||
|
||
---
|
||
|
||
## Scalability Notes
|
||
|
||
* Currently single control-plane + workers
|
||
* Designed to add:
|
||
|
||
* More workers
|
||
* Dedicated control-plane VPS nodes
|
||
* GPU nodes (for vLLM / training)
|
||
|
||
---
|
||
|
||
## What This Enables
|
||
|
||
* Research platforms
|
||
* Knowledge graph + LLM pipelines
|
||
* End-to-end ML lifecycle
|
||
* Automated data pipelines
|
||
* Production observability-first apps
|