# BetelgeuseBytes – Architecture Overview

## High-Level Architecture

This platform is a **self-hosted, production-grade Kubernetes stack** designed for:

* AI / ML experimentation and serving
* Data engineering & observability
* Knowledge graphs & vector search
* Automation, workflows, and research tooling

The architecture follows a **hub-and-spoke model**:

* **Core Infrastructure**: Kubernetes + networking + storage
* **Platform Services**: databases, messaging, auth, observability
* **ML / AI Services**: labeling, embeddings, LLM serving, notebooks
* **Automation & Workflows**: Argo Workflows, n8n
* **Access Layer**: DNS, Ingress, TLS

---

## Logical Architecture Diagram (Textual)

```
Internet
   │
   ▼
DNS (betelgeusebytes.io)
   │
   ▼
Ingress-NGINX (TLS via cert-manager)
   │
   ├── Platform UIs (Grafana, Kibana, Gitea, Neo4j, MinIO, etc.)
   ├── ML UIs (Jupyter, Label Studio, MLflow)
   ├── Automation (n8n, Argo)
   └── APIs (Postgres TCP, Neo4j Bolt, Kafka)

Kubernetes Cluster
   ├── Control Plane
   ├── Worker Nodes
   ├── Stateful Workloads (local SSD)
   └── Observability Stack
```

---

## Key Design Principles

* **Bare-metal friendly** (Hetzner dedicated servers)
* **Local SSD storage** for stateful workloads
* **Everything observable** (logs, metrics, traces)
* **CPU-first ML** with optional GPU expansion
* **Single-tenant but multi-project ready**
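
The local SSD principle above can be sketched as a statically provisioned local volume. This is a minimal illustration, not the platform's actual manifests: the storage class name `local-ssd`, the node name `worker-1`, the path `/mnt/ssd/postgres`, and the size are all assumptions. The script prints JSON, which `kubectl apply -f -` accepts.

```python
import json


def local_volume_manifests(node: str, path: str, size: str) -> list[dict]:
    """Build a no-provisioner StorageClass and one node-pinned local PersistentVolume."""
    storage_class = {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": "local-ssd"},  # hypothetical class name
        "provisioner": "kubernetes.io/no-provisioner",  # volumes are created by hand
        "volumeBindingMode": "WaitForFirstConsumer",  # bind only once a pod is scheduled
    }
    volume = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": f"local-ssd-{node}"},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            "persistentVolumeReclaimPolicy": "Retain",
            "storageClassName": "local-ssd",
            "local": {"path": path},  # directory on the node's SSD (assumed path)
            # Local volumes must declare which node physically holds the data.
            "nodeAffinity": {
                "required": {
                    "nodeSelectorTerms": [{
                        "matchExpressions": [{
                            "key": "kubernetes.io/hostname",
                            "operator": "In",
                            "values": [node],
                        }]
                    }]
                }
            },
        },
    }
    return [storage_class, volume]


if __name__ == "__main__":
    for manifest in local_volume_manifests("worker-1", "/mnt/ssd/postgres", "100Gi"):
        print(json.dumps(manifest, indent=2))
```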

---

## Networking

* Cilium CNI (eBPF-based networking)
* NGINX Ingress Controller
* TCP services exposed via Ingress patch (Postgres, Neo4j Bolt)
* WireGuard mesh between nodes
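
The TCP exposure mentioned above is typically done in ingress-nginx through a ConfigMap that maps an external port to `<namespace>/<service>:<port>`. The sketch below assumes the conventional ConfigMap name `tcp-services` in the `ingress-nginx` namespace, hypothetical Service names `postgres` and `neo4j`, and the default ports 5432 and 7687; only the `db` and `graph` namespaces come from this document. The controller must also be started with `--tcp-services-configmap`, and the ports opened on its Service.

```python
import json


def tcp_services_configmap(routes: dict[int, tuple[str, str, int]]) -> dict:
    """Render external port -> '<namespace>/<service>:<port>' entries for ingress-nginx."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "tcp-services", "namespace": "ingress-nginx"},
        "data": {
            str(external): f"{namespace}/{service}:{port}"
            for external, (namespace, service, port) in sorted(routes.items())
        },
    }


# Service names are assumptions; adjust them to the real Services in each namespace.
ROUTES = {
    5432: ("db", "postgres", 5432),   # Postgres wire protocol
    7687: ("graph", "neo4j", 7687),   # Neo4j Bolt
}

if __name__ == "__main__":
    print(json.dumps(tcp_services_configmap(ROUTES), indent=2))
```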

---

## Security Model

* TLS everywhere (cert-manager + Let’s Encrypt)
* Namespace isolation per domain (db, ml, graph, observability…)
* Secrets stored in Kubernetes Secrets
* Optional Basic Auth on sensitive UIs
* Keycloak available for future SSO
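
As a rough illustration of how TLS and optional Basic Auth combine on a single UI, the sketch below builds an Ingress that asks cert-manager for a certificate and enables ingress-nginx basic authentication. The issuer name `letsencrypt-prod`, the auth Secret name `basic-auth`, and the backend Service name and port are assumptions; the annotation keys are the standard cert-manager and ingress-nginx ones.

```python
import json


def protected_ingress(host: str, namespace: str, service: str, port: int,
                      issuer: str = "letsencrypt-prod",
                      auth_secret: str = "basic-auth") -> dict:
    """Build an Ingress with a cert-manager certificate and nginx basic auth."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": service,
            "namespace": namespace,
            "annotations": {
                # cert-manager issues and renews the certificate for spec.tls
                "cert-manager.io/cluster-issuer": issuer,
                # ingress-nginx checks credentials against an htpasswd Secret
                "nginx.ingress.kubernetes.io/auth-type": "basic",
                "nginx.ingress.kubernetes.io/auth-secret": auth_secret,
                "nginx.ingress.kubernetes.io/auth-realm": "Authentication Required",
            },
        },
        "spec": {
            "ingressClassName": "nginx",
            "tls": [{"hosts": [host], "secretName": f"{service}-tls"}],
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service, "port": {"number": port}}},
                }]},
            }],
        },
    }


if __name__ == "__main__":
    # Service name and port 3000 are assumed values for the Grafana UI.
    ingress = protected_ingress("grafana.betelgeusebytes.io", "observability", "grafana", 3000)
    print(json.dumps(ingress, indent=2))
```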

---

## Scalability Notes

* Currently a single control-plane node plus workers
* Designed to add:
  * More workers
  * Dedicated control-plane VPS nodes
  * GPU nodes (for vLLM / training)
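
If GPU nodes are added later, workloads could be steered onto them roughly as sketched below. Everything here is an assumption rather than part of the current platform: the node label `betelgeusebytes.io/gpu`, the taint key, and the image name are hypothetical, and the `nvidia.com/gpu` resource only exists once the NVIDIA device plugin runs on the node.

```python
import json


def gpu_pod_spec(image: str, gpus: int = 1) -> dict:
    """Build a pod spec fragment that lands only on GPU nodes and requests GPUs."""
    return {
        # Hypothetical label applied to GPU nodes when they join the cluster.
        "nodeSelector": {"betelgeusebytes.io/gpu": "true"},
        # Assumes GPU nodes are tainted so CPU-only pods stay off them.
        "tolerations": [{
            "key": "nvidia.com/gpu",
            "operator": "Exists",
            "effect": "NoSchedule",
        }],
        "containers": [{
            "name": "llm",
            "image": image,
            # Extended resources such as GPUs are requested through limits.
            "resources": {"limits": {"nvidia.com/gpu": gpus}},
        }],
    }


if __name__ == "__main__":
    # Placeholder image name; substitute the real serving image.
    print(json.dumps(gpu_pod_spec("example.invalid/vllm-server:latest"), indent=2))
```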

---

## What This Enables

* Research platforms
* Knowledge graph + LLM pipelines
* End-to-end ML lifecycle
* Automated data pipelines
* Production observability-first apps

```mermaid
flowchart TB
  %% =========================
  %% BetelgeuseBytes AI Platform – Full Architecture (CPU-first, K8s)
  %% =========================

  %% ---- External / Users ----
  subgraph EXT["External Users"]
    U1["Scholar / Admin User"]
    U2["API Client\n(curl / SDK / Bots)"]
    U3["Annotator\n(Labeling UI)"]
  end

  %% ---- DNS + TLS + Ingress ----
  subgraph EDGE["Edge: DNS → TLS → Ingress"]
    DNS["DNS: betelgeusebytes.io\nA/AAAA records → Ingress IP"]
    CM["cert-manager\nLet's Encrypt TLS"]
    INGRESS["NGINX Ingress Controller\nHTTP(S) + SNI routing"]
    TCPMAP["Ingress TCP Services\n(Postgres, Neo4j Bolt)"]
  end

  %% ---- Kubernetes Cluster ----
  subgraph K8S["K8S Cluster"]
    direction TB

    subgraph NET["Networking"]
      CILIUM["Cilium CNI\n(eBPF dataplane / policies)"]
      WG["WireGuard\n(node mesh / private networking)"]
    end

    subgraph DEVOPS["Dev/GitOps"]
      GITEA["Gitea\nGit repos"]
      ARGOCD["Argo CD\nGitOps deployments"]
    end

    subgraph OBS["Observability"]
      ALLOY["Grafana Alloy\n(collect logs+traces)"]
      PROM["Prometheus\n(metrics)"]
      LOKI["Loki\n(logs)"]
      TEMPO["Tempo\n(traces)"]
      GRAF["Grafana\n(dashboards)"]
      KSM["kube-state-metrics"]
      NODEX["node-exporter"]
    end

    subgraph DATA["Core Data Layer"]
      PG["PostgreSQL\n(app DB / MLflow / Label Studio)\nNamespace: db"]
      REDIS["Redis\n(cache)\nNamespace: db"]
      ES["Elasticsearch\n(search/log store)\nNamespace: elastic"]
      KIB["Kibana\nUI\nNamespace: elastic"]
      KAFKA["Kafka\n(event bus)\nNamespace: broker"]
      KAFKAUI["Kafka UI\nUI\nNamespace: broker"]
      MINIO["MinIO (S3)\n(datasets & artifacts)\nNamespace: storage"]
    end

    subgraph KG["Knowledge & Retrieval"]
      NEO4J["Neo4j\n(knowledge graph)\nNamespace: graph"]
      QDRANT["Qdrant\n(vector DB + UI)\nNamespace: vec"]
      TEI["Text Embeddings Inference\n(embeddings API)\nNamespace: ai"]
    end

    subgraph AI["AI / ML Services"]
      LLM["LLM Server (CPU)\nOllama / llama.cpp\nNamespace: ai"]
      JUP["Jupyter\n(research notebooks)\nNamespace: ml"]
      LABEL["Label Studio\n(annotation UI)\nNamespace: ai"]
      MLFLOW["MLflow\n(tracking + registry)\nNamespace: mlops/ml"]
    end

    subgraph PIPE["Automation / Pipelines"]
      ARGO_WF["Argo Workflows\n(pipelines)\nNamespace: ml/argo"]
      N8N["n8n\n(automation)\nNamespace: automation"]
    end

    subgraph AUTH["Authentication"]
      KEYCLOAK["Keycloak\n(OIDC/SSO)\nNamespace: auth"]
    end

    subgraph APPS["Custom Applications (to build)"]
      ORCH["Hadith Orchestrator API\nNamespace: hadith"]
      ADMIN["Hadith Admin UI\nNamespace: hadith"]
      NER["NER Service\nNamespace: hadith"]
      RE["Relation Extraction Service\nNamespace: hadith"]
    end
  end

  %% ---- Edge wiring ----
  U1 --> DNS
  U2 --> DNS
  U3 --> DNS
  DNS --> INGRESS
  CM --> INGRESS

  %% ---- Public HTTP(S) routes ----
  INGRESS -->|hadith-admin.betelgeusebytes.io| ADMIN
  INGRESS -->|hadith-api.betelgeusebytes.io| ORCH
  INGRESS -->|llm.betelgeusebytes.io| LLM
  INGRESS -->|embeddings.betelgeusebytes.io| TEI
  INGRESS -->|vector.betelgeusebytes.io| QDRANT
  INGRESS -->|neo4j.betelgeusebytes.io| NEO4J
  INGRESS -->|label.betelgeusebytes.io| LABEL
  INGRESS -->|mlflow.betelgeusebytes.io| MLFLOW
  INGRESS -->|minio.betelgeusebytes.io| MINIO
  INGRESS -->|argo.betelgeusebytes.io| ARGO_WF
  INGRESS -->|auth.betelgeusebytes.io| KEYCLOAK
  INGRESS -->|grafana.betelgeusebytes.io| GRAF
  INGRESS -->|kibana.betelgeusebytes.io| KIB
  INGRESS -->|broker.betelgeusebytes.io| KAFKAUI

  %% ---- TCP routes (optional/external) ----
  TCPMAP -.-> PG
  TCPMAP -.-> NEO4J

  %% ---- GitOps flow ----
  GITEA -->|manifests + app code| ARGOCD
  ARGOCD -->|sync/apply| K8S

  %% ---- Auth flows ----
  ADMIN -->|OIDC login| KEYCLOAK
  ORCH -->|validate JWT / introspect| KEYCLOAK
  LABEL -->|optional OIDC| KEYCLOAK
  MLFLOW -->|OIDC| KEYCLOAK

  %% ---- Orchestrator runtime data flows ----
  ORCH -->|reasoning / JSON extraction| LLM
  ORCH -->|embed queries/docs| TEI
  ORCH -->|vector search| QDRANT
  ORCH -->|graph read/write| NEO4J
  ORCH -->|metadata/users/jobs| PG
  ORCH -->|cache| REDIS
  ORCH -->|full-text search| ES

  %% ---- NER/RE services (future) ----
  ORCH --> NER
  ORCH --> RE
  NER -->|entities| NEO4J
  RE -->|relations| NEO4J

  %% ---- Data curation loop ----
  LABEL -->|labeled datasets| MINIO
  ARGO_WF -->|training data| MINIO
  ARGO_WF -->|log metrics| MLFLOW
  ARGO_WF -->|publish artifacts| MINIO
  MLFLOW -->|model versions| MINIO
  ARGO_WF -->|deploy/update services| ARGOCD

  %% ---- Event-driven (optional) ----
  ORCH -->|events| KAFKA
  ARGO_WF -->|consume triggers| KAFKA
  N8N -->|integrations/alerts| KAFKA

  %% ---- Observability wiring ----
  ALLOY --> LOKI
  ALLOY --> TEMPO
  PROM --> GRAF
  LOKI --> GRAF
  TEMPO --> GRAF
  KSM --> PROM
  NODEX --> PROM

  %% ---- Internal networking ----
  CILIUM --- INGRESS
  WG --- CILIUM