# BetelgeuseBytes Infrastructure & Cluster Configuration
## Hosting Provider
* **Provider**: Hetzner
* **Server Type**: Dedicated servers
* **Region**: EU
* **Network**: Private LAN + WireGuard
---
## Nodes
### Current Nodes
| Node | Role | Notes |
| --------- | ---------------------- | ------------------- |
| hetzner-1 | control-plane + worker | runs core workloads |
| hetzner-2 | worker + storage | hosts local SSD PVs |
---
## Kubernetes Setup
* Kubernetes installed via kubeadm
* Single cluster
* Control plane is also schedulable
### CNI
* **Cilium**
  * eBPF dataplane
  * kube-proxy replacement
  * Network policy support
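
A kube-proxy-free Cilium install is usually configured through Helm values. The fragment below is a minimal sketch, not the values actually used on this cluster: the API server address is a placeholder, and older Cilium charts expect `kubeProxyReplacement: strict` instead of `true`.

```yaml
# values.yaml for the Cilium Helm chart (illustrative sketch)
# Let Cilium's eBPF dataplane handle Service load balancing instead of kube-proxy.
kubeProxyReplacement: true
# Without kube-proxy, Cilium must reach the API server directly.
# 10.0.0.2 is a hypothetical private-LAN address for the control-plane node.
k8sServiceHost: 10.0.0.2
k8sServicePort: 6443
```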
---
## Storage
### Persistent Volumes
* Backed by **local NVMe / SSD**
* Manually provisioned PVs
* Bound via PVCs
### Storage Layout
```
/mnt/local-ssd/
├── postgres/
├── neo4j/
├── elasticsearch/
├── prometheus/
├── loki/
├── tempo/
├── grafana/
├── minio/
└── qdrant/
```
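Manually provisioned local volumes generally need three objects: a no-provisioner StorageClass, a PersistentVolume pinned to the node that owns the disk, and a PersistentVolumeClaim in the consuming namespace. The manifests below sketch this for the `neo4j/` directory. The object names, the `local-ssd` class name, and the 50Gi size are assumptions for illustration; the node name and path come from the tables and layout above.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-ssd                      # assumed class name
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer # delay binding until a pod is scheduled
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: neo4j-pv                       # assumed name
spec:
  capacity:
    storage: 50Gi                      # assumed size
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-ssd
  local:
    path: /mnt/local-ssd/neo4j
  nodeAffinity:                        # required for local volumes
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - hetzner-2
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: neo4j-data                     # assumed name
  namespace: graph
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-ssd
  resources:
    requests:
      storage: 50Gi
```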
---
## Networking
* Ingress Controller: nginx
* External DNS records → ingress IP
* TCP mappings for:
  * PostgreSQL
  * Neo4j Bolt
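
With ingress-nginx, raw TCP services are exposed through a ConfigMap that the controller reads via its `--tcp-services-configmap` flag. The sketch below assumes the controller runs in the `ingress-nginx` namespace and that the backing Services are named `postgres` and `neo4j`; those names are hypothetical. The listed ports must also be opened on the controller's own Service or host ports.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: ingress-nginx           # assumed controller namespace
data:
  # "<external port>": "<namespace>/<service>:<service port>"
  "5432": "db/postgres:5432"         # PostgreSQL (Service name assumed)
  "7687": "graph/neo4j:7687"         # Neo4j Bolt (Service name assumed)
```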
---
## TLS & Certificates
* cert-manager
* ClusterIssuer: Let's Encrypt
* Automatic renewal
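
A Let's Encrypt ClusterIssuer for cert-manager typically looks like the sketch below. The issuer name, contact email, and the choice of an HTTP-01 solver through the nginx ingress class are assumptions; the cluster may use a different solver (for example DNS-01).

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod               # assumed issuer name
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com           # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key  # Secret holding the ACME account key
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx    # solve challenges through ingress-nginx
```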
---
## Namespaces
| Namespace | Purpose |
| ------------- | ---------------------------------- |
| db | Databases (Postgres, Redis) |
| graph | Neo4j |
| broker | Kafka |
| ml | ML tooling (Jupyter, Argo, MLflow) |
| observability | Grafana, Prometheus, Loki, Tempo |
| automation | n8n |
| devops | Gitea, Argo CD |
---
## What This Infra Enables
* Full on-prem AI platform
* Predictable performance
* Low-latency data access
* Independence from cloud providers
```mermaid
flowchart TB
subgraph NET[Internet]
W[Web/Clients]
end
subgraph EDGE[Edge]
DNS[DNS: betelgeusebytes.io\nA/AAAA -> Ingress IP]
LE[cert-manager\nLet's Encrypt]
ING[Ingress-NGINX]
DNS --> ING
LE --> ING
W --> DNS
end
subgraph K8S
direction TB
subgraph N1[Node 1]
CP[control-plane + worker]
PV1[(local SSD PVs)]
end
subgraph N2[Node 2]
WK[worker + storage-heavy]
PV2[(local SSD PVs)]
end
subgraph NS
AI[ai: LLM, TEI, Label Studio]
VEC[vec: Qdrant]
GRAPH[graph: Neo4j]
DB[db: Postgres, Redis]
BROKER[broker: Kafka]
STORE[storage: MinIO]
MLOPS[ml/mlops: MLflow, Argo WF, Jupyter]
OBS[observability: Grafana/Prom/Loki/Tempo/Alloy]
DEV[devops: ArgoCD, Gitea]
HAD
end
CP --- WK
PV1 --- DB
PV2 --- STORE
PV2 --- OBS
PV2 --- GRAPH
PV2 --- VEC
end
ING -->| host routing| NS
ING -.TCP (optional).- DB
ING -.Bolt (optional).- GRAPH