# Hadith Scholar API — حَدِيثٌ

Production-grade REST API for analyzing Islamic hadith literature across 8+ major collections.

Built with **FastAPI** · **PostgreSQL** · **Neo4j** · **Qdrant** · **Elasticsearch**

---

## Overview

The Hadith Scholar API provides structured access to ~41,000 hadiths from the major canonical collections, enriched with:

- **LLM-extracted narrator chains** — structured isnad parsing with entity typing
- **Narrator knowledge graph** — biographies, teacher/student networks, places, tribes (Neo4j)
- **Multilingual semantic search** — find hadiths by meaning in Arabic, English, or Urdu (BGE-M3 + Qdrant)
- **Full-text Arabic search** — morphological analysis with stemming and root extraction (Elasticsearch)
- **Interactive API docs** — Swagger UI with Arabic examples on every endpoint

### Collections

| Collection | Arabic | Hadiths |
|------------|--------|---------|
| Sahih Bukhari | صحيح البخاري | 6,986 |
| Sahih Muslim | صحيح مسلم | 15,034 |
| Sunan Abu Dawood | سنن أبي داود | 5,274 |
| Jami` at-Tirmidhi | جامع الترمذي | — |
| Sunan an-Nasa'i | سنن النسائي | 5,758 |
| Sunan Ibn Majah | سنن ابن ماجه | 4,341 |
| Musnad Ahmad | مسند أحمد | — |
| Muwatta Malik | موطأ مالك | — |

---

## API Endpoints

### Hadiths (`/hadiths`)

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/hadiths/{hadith_id}` | Full hadith details with narrator chain and topics |
| `GET` | `/hadiths/collection/{name}` | Paginated listing by collection |
| `GET` | `/hadiths/number/{collection}/{number}` | Lookup by collection name + hadith number |
| `GET` | `/hadiths/search/keyword?q=صلاة` | Arabic keyword search with filters |
| `GET` | `/hadiths/search/topic/{topic}` | Search by topic tag |
| `GET` | `/hadiths/search/narrator/{name}` | Find hadiths by narrator |

### Narrators (`/narrators`)

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/narrators/search?q=أبو هريرة` | Search by name (Arabic or transliterated) |
| `GET` | `/narrators/profile/{name_arabic}` | Full biography, hadiths, teachers, students, places |
| `GET` | `/narrators/by-generation/{gen}` | List narrators by generation (طبقة), e.g. صحابي (Companion), تابعي (Successor) |
| `GET` | `/narrators/by-place/{place}` | Narrators associated with a place |
| `GET` | `/narrators/interactions/{name}` | All relationships for a narrator |
| `GET` | `/narrators/who-met-who?narrator_a=X&narrator_b=Y` | Shortest path between two narrators in the knowledge graph |

### Isnad Chains (`/chains`)

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/chains/hadith/{hadith_id}` | Chain as graph (nodes + links) for visualization |
| `GET` | `/chains/narrator/{name}` | All chains containing a narrator |
| `GET` | `/chains/common-chains?narrator_a=X&narrator_b=Y` | Hadiths where both narrators appear |
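
The nodes-and-links shape returned by `/chains/hadith/{hadith_id}` can be illustrated with a small sketch. It assumes the chain arrives as an ordered list of `(narrator, transmission_verb)` pairs, read from the narrator closest to the compiler back toward the original source, and emits one `NARRATED_FROM` link per hop, mirroring the Knowledge Graph Model further down. The function name and the output field names (`id`, `label`, `chain_order`, `source`, `target`, `type`, `verb`) are hypothetical and may not match the real response schema.

```python
from typing import Any


def chain_to_graph(chain: list[tuple[str, str]]) -> dict[str, list[dict[str, Any]]]:
    """Turn an ordered isnad into {"nodes": [...], "links": [...]} (hypothetical schema).

    Entry i is (narrator_name, verb), where verb is how narrator i received the
    report from narrator i + 1; the last entry's verb is unused.
    """
    nodes: list[dict[str, Any]] = []
    seen: set[str] = set()
    for order, (name, _verb) in enumerate(chain):
        if name not in seen:  # a name repeated within one chain becomes a single node
            seen.add(name)
            nodes.append({"id": name, "label": name, "chain_order": order})
    links = [
        {"source": chain[i][0], "target": chain[i + 1][0],
         "type": "NARRATED_FROM", "verb": chain[i][1]}
        for i in range(len(chain) - 1)
    ]
    return {"nodes": nodes, "links": links}


if __name__ == "__main__":
    # Illustrative input only.
    isnad = [("مالك", "عن"), ("الزهري", "عن"), ("أنس بن مالك", "")]
    graph = chain_to_graph(isnad)
    print(len(graph["nodes"]), len(graph["links"]))  # 3 2
```

Keying nodes by name keeps the sketch short; a real implementation would more likely key them by a stable narrator ID, since different narrators can share a name.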

### Search (`/search`)

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/search/semantic?q=what did the prophet say about fasting` | Semantic search (any language) |
| `GET` | `/search/fulltext?q=الصلاة` | Arabic full-text with morphological analysis |
| `GET` | `/search/combined?q=صيام رمضان` | Both semantic + full-text in parallel |
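
`/search/combined` runs semantic and full-text search in parallel. That fan-out can be sketched with `asyncio.gather`; here `semantic` and `fulltext` are hypothetical async callables standing in for the Qdrant and Elasticsearch queries, and the response keeps the two result lists side by side rather than merging them, since this README does not say how (or whether) the results are fused.

```python
import asyncio
from typing import Awaitable, Callable

# A search backend: (query, limit) -> ranked hadith IDs. Hypothetical signature.
SearchFn = Callable[[str, int], Awaitable[list[str]]]


async def combined_search(q: str, semantic: SearchFn, fulltext: SearchFn, limit: int = 10) -> dict:
    """Run both searches concurrently; one failing backend does not sink the other."""
    sem_res, ft_res = await asyncio.gather(
        semantic(q, limit), fulltext(q, limit), return_exceptions=True
    )
    out: dict = {}
    for key, res in (("semantic", sem_res), ("fulltext", ft_res)):
        if isinstance(res, BaseException):
            out[key] = {"results": [], "error": type(res).__name__}
        else:
            out[key] = {"results": res, "error": None}
    return out


# Stub backends for illustration only.
async def fake_semantic(q: str, limit: int) -> list[str]:
    await asyncio.sleep(0.01)
    return ["h1", "h2"][:limit]


async def fake_fulltext(q: str, limit: int) -> list[str]:
    raise TimeoutError("elasticsearch timed out")


if __name__ == "__main__":
    print(asyncio.run(combined_search("صيام رمضان", fake_semantic, fake_fulltext)))
```

With `return_exceptions=True`, a slow or failing full-text call surfaces as an `error` entry while the semantic results are still returned.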

### System

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/` | API info and endpoint listing |
| `GET` | `/health` | Health check (verifies all 4 backends) |
| `GET` | `/stats` | Database statistics |
| `GET` | `/docs` | Swagger UI |
| `GET` | `/redoc` | ReDoc documentation |
| `GET` | `/openapi.json` | OpenAPI 3.1 spec |
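
`/health` verifies all 4 backends. A common way to structure such a check is to probe each backend concurrently with a per-probe timeout and report an overall status. In the sketch below the probe callables, the backend names, the 2-second default timeout, and the `ok`/`degraded` labels are all assumptions, not the endpoint's documented behavior.

```python
import asyncio
from typing import Awaitable, Callable

# A probe is a zero-argument coroutine function that raises if the backend is unhealthy.
Probe = Callable[[], Awaitable[None]]


async def check_backend(name: str, probe: Probe, timeout: float) -> tuple[str, str]:
    """Run one probe with a timeout and report 'up' or 'down (<ExceptionName>)'."""
    try:
        await asyncio.wait_for(probe(), timeout=timeout)
        return name, "up"
    except Exception as exc:  # a timeout arrives here as TimeoutError
        return name, f"down ({type(exc).__name__})"


async def health(probes: dict[str, Probe], timeout: float = 2.0) -> dict:
    """Probe all backends concurrently and summarize."""
    results = await asyncio.gather(
        *(check_backend(name, probe, timeout) for name, probe in probes.items())
    )
    backends = dict(results)
    status = "ok" if all(v == "up" for v in backends.values()) else "degraded"
    return {"status": status, "backends": backends}


# Stub probes for illustration; real ones might run `SELECT 1` or a ping request.
async def ok_probe() -> None:
    return None


async def slow_probe() -> None:
    await asyncio.sleep(10)


if __name__ == "__main__":
    probes = {"postgres": ok_probe, "neo4j": ok_probe, "qdrant": ok_probe, "elasticsearch": slow_probe}
    print(asyncio.run(health(probes, timeout=0.1)))
```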

---

## Example Requests

### Search for hadiths about prayer

```bash
curl -G "https://hadith-api.betelgeusebytes.io/hadiths/search/keyword" \
  --data-urlencode "q=صلاة" --data-urlencode "collection=Sahih Bukhari" --data-urlencode "grade=Sahih"
```

### Get narrator profile

```bash
curl "https://hadith-api.betelgeusebytes.io/narrators/profile/أبو%20هريرة"
```

### Semantic search (English → Arabic results)

```bash
curl "https://hadith-api.betelgeusebytes.io/search/semantic?q=what%20is%20the%20reward%20of%20prayer"
```

### Check if two narrators are connected

```bash
curl -G "https://hadith-api.betelgeusebytes.io/narrators/who-met-who" \
  --data-urlencode "narrator_a=الزهري" --data-urlencode "narrator_b=أنس بن مالك"
```

### Get isnad chain for a hadith

```bash
curl "https://hadith-api.betelgeusebytes.io/chains/hadith/{hadith_uuid}"
```
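
The same requests can be made from Python with only the standard library. In this sketch `build_url` and `get_json` are hypothetical helpers, and the base URL is the one used in the curl examples. `urllib.parse.quote` percent-encodes Arabic path segments as UTF-8 and `urlencode` does the same for query parameters, so no raw non-ASCII bytes end up in the request line.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://hadith-api.betelgeusebytes.io"


def build_url(*segments: str, **params: str) -> str:
    """Join path segments (percent-encoding each one) and append a query string."""
    path = "/".join(urllib.parse.quote(s, safe="") for s in segments)
    url = f"{BASE_URL}/{path}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def get_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the JSON body (raises urllib.error.HTTPError on 4xx/5xx)."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Narrator profile: the Arabic name is a path segment, so it is quoted.
    print(build_url("narrators", "profile", "أبو هريرة"))
    # Connection check: Arabic names travel as query parameters.
    print(build_url("narrators", "who-met-who", narrator_a="الزهري", narrator_b="أنس بن مالك"))
```

Passing either URL to `get_json` performs the request; building URLs separately keeps the encoding logic testable without network access.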

---

## Architecture

```
          ┌─────────────────────────────────┐
          │       FastAPI Application       │
          │  hadith-api.betelgeusebytes.io  │
          └────────────────┬────────────────┘
                           │
        ┌──────────────────┼──────────────────────┐
        │                  │                      │
┌───────▼──────┐   ┌───────▼──────┐   ┌───────────▼──────────┐
│  PostgreSQL  │   │    Neo4j     │   │     Qdrant + TEI     │
│ 41k hadiths  │   │  Knowledge   │   │   Semantic search    │
│  full text   │   │    Graph     │   │   1024-dim BGE-M3    │
└──────────────┘   │ - Narrators  │   └──────────────────────┘
                   │ - Chains     │
                   │ - Places     │   ┌──────────────────────┐
                   │ - Tribes     │   │    Elasticsearch     │
                   │ - Topics     │   │   Arabic full-text   │
                   └──────────────┘   │    morphological     │
                                      └──────────────────────┘
```

### Backend Responsibilities

| Backend | What it stores / provides | Used by |
|---------|---------------------------|---------|
| **PostgreSQL** | Raw hadith text (Arabic/English/Urdu), metadata, grades | `/hadiths/*` keyword search, collection listing |
| **Neo4j** | Narrator graph, isnad chains, topics, places, tribes | `/narrators/*`, `/chains/*`, topic search |
| **Qdrant** | 1024-dim BGE-M3 embeddings for all 41k hadiths | `/search/semantic` |
| **Elasticsearch** | Arabic-analyzed hadith text index | `/search/fulltext` |
| **TEI** | BGE-M3 embedding inference (query → vector) | `/search/semantic` (query encoding) |
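
The `/search/semantic` path touches two services: TEI turns the query into a BGE-M3 vector, and Qdrant returns the nearest stored hadith vectors. Below is a minimal stdlib sketch of those two HTTP calls, using TEI's `/embed` route and Qdrant's REST `points/search` route with the in-cluster defaults from the Environment Variables table. Whether the API uses these exact routes, the Qdrant client library, or extra filters is an assumption; the helper names are hypothetical.

```python
import json
import urllib.request

TEI_URL = "http://tei.ml.svc.cluster.local:80"              # HADITH_TEI_URL default
QDRANT_URL = "http://qdrant.vector.svc.cluster.local:6333"  # HADITH_QDRANT_HOST/PORT defaults
COLLECTION = "hadiths"                                      # HADITH_QDRANT_COLLECTION default


def _post(url: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed_request(query: str) -> urllib.request.Request:
    """TEI /embed: the response is a list with one vector per input (1024 floats for BGE-M3)."""
    return _post(f"{TEI_URL}/embed", {"inputs": query})


def qdrant_search_request(vector: list[float], limit: int = 10) -> urllib.request.Request:
    """Qdrant nearest-neighbour search over the hadith collection, returning payloads."""
    return _post(
        f"{QDRANT_URL}/collections/{COLLECTION}/points/search",
        {"vector": vector, "limit": limit, "with_payload": True},
    )


def semantic_search(query: str, limit: int = 10) -> list[dict]:
    """Query -> TEI embedding -> Qdrant top-k hits (needs network access to both services)."""
    with urllib.request.urlopen(embed_request(query), timeout=10) as resp:
        vector = json.load(resp)[0]
    with urllib.request.urlopen(qdrant_search_request(vector, limit), timeout=10) as resp:
        return json.load(resp)["result"]
```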

---

## Knowledge Graph Model

```
(:Narrator)-[:APPEARS_IN {chain_order, transmission_verb}]->(:Hadith)
(:Narrator)-[:NARRATED_FROM {hadith_ids}]->(:Narrator)
(:Narrator)-[:TEACHER_OF]->(:Narrator)
(:Narrator)-[:BORN_IN|LIVED_IN|DIED_IN|TRAVELED_TO]->(:Place)
(:Narrator)-[:BELONGS_TO_TRIBE]->(:Tribe)
(:Hadith)-[:HAS_TOPIC]->(:Topic)
```
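
`/narrators/who-met-who` returns the shortest path between two narrators. Against the model above, such a lookup could be written in Cypher roughly as follows. The traversed relationship types, the 6-hop cap, and the `who_met_who_query` helper are assumptions for illustration; the real endpoint may use a different query. Names are bound as query parameters rather than interpolated into the string, so Arabic input needs no escaping.

```python
from typing import Any

# Hypothetical query: the real endpoint may traverse other relationship types or hop limits.
WHO_MET_WHO = """
MATCH (a:Narrator {name_arabic: $narrator_a}), (b:Narrator {name_arabic: $narrator_b})
MATCH p = shortestPath((a)-[:NARRATED_FROM|TEACHER_OF*..6]-(b))
RETURN [n IN nodes(p) | n.name_arabic] AS narrators,
       [r IN relationships(p) | type(r)] AS relations
"""


def who_met_who_query(narrator_a: str, narrator_b: str) -> tuple[str, dict[str, Any]]:
    """Return (cypher, parameters); names are bound as parameters, never string-formatted."""
    return WHO_MET_WHO, {"narrator_a": narrator_a, "narrator_b": narrator_b}


# With the official Neo4j Python driver (not executed here):
# from neo4j import GraphDatabase
# with GraphDatabase.driver(uri, auth=(user, password)) as driver:
#     records, _, _ = driver.execute_query(*who_met_who_query("الزهري", "أنس بن مالك"))
```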

### Narrator Properties

- `name_arabic` / `name_transliterated` — primary identifiers
- `full_nasab` — complete lineage (فلان بن فلان بن فلان, "so-and-so son of so-and-so son of so-and-so")
- `kunya` — teknonym formed with أبو/أم ("father of" / "mother of")
- `nisba` — attributive name with the -i suffix, e.g. البخاري (al-Bukhari), المدني (al-Madani)
- `generation` — طبقة (generation): صحابي (Companion), تابعي (Successor), تابع التابعين (Successor of the Successors)
- `reliability_grade` — جرح وتعديل (narrator criticism and accreditation): ثقة (trustworthy), صدوق (truthful), ضعيف (weak)
- `biography_summary_arabic` / `biography_summary_english` — bilingual bios
- `birth_year_hijri` / `death_year_hijri` — dates in the Hijri calendar

---

## Setup

### Prerequisites

- Python 3.12+
- Docker
- Access to PostgreSQL, Neo4j, Qdrant, Elasticsearch, TEI

### Local Development

```bash
# Clone
git clone <repo_url>
cd hadith-api

# Configure
cp .env.example .env
# Edit .env with your credentials

# Install
pip install -r requirements.txt

# Run
uvicorn app.main:app --reload --port 8000

# Open docs (macOS; use xdg-open on Linux)
open http://localhost:8000/docs
```

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `HADITH_PG_HOST` | PostgreSQL host | `pg.betelgeusebytes.io` |
| `HADITH_PG_PORT` | PostgreSQL port | `5432` |
| `HADITH_PG_DBNAME` | Database name | — |
| `HADITH_PG_USER` | Database user | — |
| `HADITH_PG_PASSWORD` | Database password | — |
| `HADITH_PG_SSLMODE` | SSL mode | `require` |
| `HADITH_NEO4J_URI` | Neo4j bolt URI | `neo4j+ssc://neo4j.betelgeusebytes.io:7687` |
| `HADITH_NEO4J_USER` | Neo4j user | `neo4j` |
| `HADITH_NEO4J_PASSWORD` | Neo4j password | — |
| `HADITH_QDRANT_HOST` | Qdrant host | `qdrant.vector.svc.cluster.local` |
| `HADITH_QDRANT_PORT` | Qdrant port | `6333` |
| `HADITH_QDRANT_COLLECTION` | Qdrant collection name | `hadiths` |
| `HADITH_ES_HOST` | Elasticsearch URL | `http://elasticsearch.elastic.svc.cluster.local:9200` |
| `HADITH_ES_INDEX` | Elasticsearch index | `hadiths` |
| `HADITH_TEI_URL` | TEI embedding service | `http://tei.ml.svc.cluster.local:80` |

---

## Deployment (Kubernetes)

### Build & Push

```bash
docker build -t axxs/hadith-api:latest .
docker push axxs/hadith-api:latest
```

### Deploy

```bash
# Edit secrets in k8s/deployment.yaml first
kubectl apply -f k8s/deployment.yaml

# Watch rollout
kubectl rollout status deployment/hadith-api -n api

# Verify
kubectl get pods -n api -l app=hadith-api
curl https://hadith-api.betelgeusebytes.io/health
```

### What gets created

- **Namespace**: `api`
- **Secret**: `hadith-api-secrets` (PG + Neo4j credentials)
- **Deployment**: 2 replicas with health checks
- **Service**: ClusterIP on port 80 → container 8000
- **Ingress**: TLS via cert-manager at `hadith-api.betelgeusebytes.io`

### Resource Limits

- Requests: 250m CPU, 256Mi RAM per pod
- Limits: 1 CPU, 512Mi RAM per pod

---

## Project Structure

```
hadith-api/
├── app/
│   ├── main.py              # FastAPI app, lifespan, health, stats
│   ├── config.py            # Pydantic settings (env vars)
│   ├── models/
│   │   └── schemas.py       # Response models with examples
│   ├── routers/
│   │   ├── hadiths.py       # /hadiths/* — details, search, listing
│   │   ├── narrators.py     # /narrators/* — profiles, relationships
│   │   ├── chains.py        # /chains/* — isnad visualization
│   │   └── search.py        # /search/* — semantic + full-text
│   └── services/
│       └── database.py      # PG, Neo4j, Qdrant, ES connections
├── k8s/
│   └── deployment.yaml      # K8s namespace + secret + deploy + svc + ingress
├── Dockerfile
├── .dockerignore
├── .env.example
├── requirements.txt
├── deploy.sh
└── README.md
```

---

## Data Pipeline

The API consumes data produced by the hadith extraction pipeline:

```
HadithAPI.com ──► PostgreSQL (41k hadiths, raw text)
                    │
                    ├──► TEI (BGE-M3) ──► Qdrant (embeddings)
                    ├──► Elasticsearch (full-text index)
                    └──► LLM Extraction (OpenAI/Gemini)
                           │
                           ├──► Phase A: sanad/matn split, narrator chains, entities, topics
                           └──► Phase B: narrator biographies from classical scholarship
                                │
                                └──► MinIO (JSON) ──► Neo4j (knowledge graph)
```
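
The final hop loads the extracted JSON into Neo4j. Below is a minimal sketch of flattening one extracted chain into rows for a batched `UNWIND ... MERGE` write that produces the `APPEARS_IN` and `NARRATED_FROM` relationships of the Knowledge Graph Model. The input JSON shape (`hadith_id`, `chain`, `name_arabic`, `transmission_verb`), the `Hadith.id` key, and the query itself are assumptions, not the pipeline's actual loader.

```python
from typing import Any

# Hypothetical loader query: one row per narrator occurrence in a chain.
LOAD_CHAIN = """
UNWIND $rows AS row
MERGE (n:Narrator {name_arabic: row.name})
MERGE (h:Hadith {id: row.hadith_id})
MERGE (n)-[a:APPEARS_IN]->(h)
SET a.chain_order = row.chain_order, a.transmission_verb = row.verb
WITH n, row
WHERE row.narrated_from IS NOT NULL
MERGE (m:Narrator {name_arabic: row.narrated_from})
MERGE (n)-[r:NARRATED_FROM]->(m)
ON CREATE SET r.hadith_ids = [row.hadith_id]
ON MATCH SET r.hadith_ids = CASE
    WHEN row.hadith_id IN r.hadith_ids THEN r.hadith_ids
    ELSE r.hadith_ids + row.hadith_id
END
"""


def chain_rows(record: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten one extracted hadith record into UNWIND rows.

    Assumes record["chain"] is ordered from the narrator closest to the compiler
    back toward the original source, so entry i narrated from entry i + 1.
    """
    chain = record["chain"]
    rows = []
    for i, link in enumerate(chain):
        rows.append({
            "hadith_id": record["hadith_id"],
            "name": link["name_arabic"],
            "chain_order": i,
            "verb": link.get("transmission_verb"),
            "narrated_from": chain[i + 1]["name_arabic"] if i + 1 < len(chain) else None,
        })
    return rows
```

Batching rows into a single `UNWIND` keeps round trips low; with the official driver the write would be `driver.execute_query(LOAD_CHAIN, {"rows": rows})`.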

---

## License

MIT