Sources
OpenQalam searches across 35,779 hadith from 7 classical collections. Each hadith is stored verbatim with its full Arabic and English text and is individually searchable.
Hadith Collections
Sahih al-Bukhari
7,276 hadith · Imam Muhammad ibn Ismail al-Bukhari (810-870 CE)
Regarded as the most authentic collection of hadith and universally accepted by Sunni scholars. Every hadith in it meets the strictest criteria for the authenticity of its chain of narration.
Sahih Muslim
7,458 hadith · Imam Muslim ibn al-Hajjaj (821-875 CE)
Regarded as the second most authentic collection. Together with Sahih al-Bukhari, it forms "the two Sahihs" (al-Sahihayn).
Sunan Abi Dawud
5,272 hadith · Imam Abu Dawud al-Sijistani (817-889 CE)
Focuses on hadith related to legal rulings (fiqh). It contains hadith of varying grades, and each is labeled with its authentication status.
Jami at-Tirmidhi
3,926 hadith · Imam Abu Isa al-Tirmidhi (824-892 CE)
Notable for including grading commentary by the compiler himself. Often cites legal opinions of early scholars alongside the hadith.
Sunan an-Nasa'i
5,679 hadith · Imam Ahmad ibn Shu'ayb an-Nasa'i (829-915 CE)
Known for its strict criteria; some scholars consider its authenticity standards second only to those of the two Sahihs.
Sunan Ibn Majah
4,340 hadith · Imam Muhammad ibn Yazid ibn Majah (824-887 CE)
Completes the "Six Books" (al-Kutub al-Sitta). Contains some unique hadith not found in the other five collections.
Muwatta Malik
1,828 hadith · Imam Malik ibn Anas (711-795 CE)
One of the earliest compiled collections of hadith and legal opinions. Represents the practice of the people of Madinah.
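As a quick consistency check, the per-collection counts listed above should add up to the total quoted at the top of this page. The dictionary below simply restates those counts; the variable name is illustrative.

```python
# Per-collection hadith counts, as listed on this page.
COLLECTION_COUNTS = {
    "Sahih al-Bukhari": 7276,
    "Sahih Muslim": 7458,
    "Sunan Abi Dawud": 5272,
    "Jami at-Tirmidhi": 3926,
    "Sunan an-Nasa'i": 5679,
    "Sunan Ibn Majah": 4340,
    "Muwatta Malik": 1828,
}

# Sum the individual counts and format with a thousands separator.
total = sum(COLLECTION_COUNTS.values())
print(f"{len(COLLECTION_COUNTS)} collections, {total:,} hadith")
# prints "7 collections, 35,779 hadith"
```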
How Data is Ingested
Hadith texts are sourced from open-source GitHub repositories that compile data from sunnah.com. Each hadith goes through this pipeline:
- Raw data parsed from JSON source files (Arabic + English + metadata)
- Arabic text enriched from secondary sources where the primary source lacked it
- Each hadith embedded as a 1024-dimensional vector using Voyage AI
- Stored in PostgreSQL with a full-text search index for keyword matching
- Post-ingestion validation: count verification, embedding checks, grade coverage
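The parsing, Arabic enrichment and validation steps above might be sketched as follows. This is a minimal illustration under stated assumptions, not OpenQalam's actual pipeline: the JSON field names, the `Hadith` record and the function names are hypothetical. The Voyage AI embedding call and the PostgreSQL write are left out, so the validation step receives precomputed vectors instead.

```python
import json
from dataclasses import dataclass
from typing import Optional

EMBEDDING_DIM = 1024  # vector size stated for the Voyage AI embeddings


@dataclass
class Hadith:
    collection: str
    number: int
    arabic: str
    english: str
    grade: Optional[str]  # None when the source record carries no grade


def parse_records(raw_json: str) -> list:
    """Parse a JSON array of records (field names are hypothetical)."""
    return [
        Hadith(
            collection=r["collection"],
            number=int(r["number"]),
            arabic=r.get("arabic") or "",
            english=r.get("english") or "",
            grade=r.get("grade"),
        )
        for r in json.loads(raw_json)
    ]


def enrich_arabic(hadiths: list, secondary: dict) -> list:
    """Fill empty Arabic text from a secondary source keyed by (collection, number)."""
    for h in hadiths:
        if not h.arabic:
            h.arabic = secondary.get((h.collection, h.number), "")
    return hadiths


def validate(hadiths: list, embeddings: dict, expected_count: int) -> dict:
    """Post-ingestion checks: count, embedding dimension, grade coverage."""
    bad_embeddings = [
        (h.collection, h.number)
        for h in hadiths
        if len(embeddings.get((h.collection, h.number), [])) != EMBEDDING_DIM
    ]
    graded = sum(1 for h in hadiths if h.grade)
    return {
        "count_ok": len(hadiths) == expected_count,
        "bad_embeddings": bad_embeddings,
        "grade_coverage": graded / len(hadiths) if hadiths else 0.0,
        "missing_arabic": [(h.collection, h.number) for h in hadiths if not h.arabic],
    }


# Usage with placeholder records: record 1 lacks Arabic, record 2 has a wrong-sized vector.
raw = json.dumps([
    {"collection": "abudawud", "number": 1, "arabic": "", "english": "English text", "grade": "Sahih"},
    {"collection": "abudawud", "number": 2, "arabic": "Arabic text", "english": "English text", "grade": None},
])
hadiths = enrich_arabic(parse_records(raw), {("abudawud", 1): "Arabic text from secondary source"})
vectors = {("abudawud", 1): [0.0] * EMBEDDING_DIM, ("abudawud", 2): [0.0] * 512}
report = validate(hadiths, vectors, expected_count=2)
print(report)
# {'count_ok': True, 'bad_embeddings': [('abudawud', 2)], 'grade_coverage': 0.5, 'missing_arabic': []}
```

Returning a report rather than raising on the first failure lets every problem in a batch surface at once. Grade coverage is reported as a fraction rather than treated as a failure, since not every source record necessarily carries a grade.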
Scholar Insights
Peer-reviewed articles from Yaqeen Institute are live. Each article is chunked by heading and embedded for semantic search. Source attribution and original URLs are shown with every result.
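Chunking an article by heading, while keeping attribution attached to every chunk, might look like the sketch below. It assumes headings are lines starting with `#` (as in Markdown), which is a guess about the article format; `Chunk` and `chunk_by_heading` are hypothetical names, and the embedding step is omitted.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    heading: str
    text: str
    source: str  # publisher name shown with each result
    url: str     # original article URL shown with each result


def chunk_by_heading(article: str, source: str, url: str,
                     intro_heading: str = "Introduction") -> list:
    """Split an article into one chunk per heading.

    Assumes headings are lines starting with '#'; text before the first
    heading is filed under `intro_heading`. Empty sections are dropped.
    """
    chunks = []
    heading, body = intro_heading, []

    def flush() -> None:
        # Emit the section collected so far, if it has any text.
        text = "\n".join(body).strip()
        if text:
            chunks.append(Chunk(heading, text, source, url))

    for line in article.splitlines():
        if line.startswith("#"):
            flush()
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    flush()
    return chunks


article = "Opening paragraph.\n# First Section\nBody one.\n\n# Second Section\nBody two."
chunks = chunk_by_heading(article, "Yaqeen Institute", "https://example.org/article")
for c in chunks:
    print(f"{c.heading}: {c.text} ({c.source}, {c.url})")
```

Keeping the heading with each chunk also gives a search hit a readable label, and carrying the source and URL on the chunk itself means attribution cannot get lost between retrieval and display.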
Contemporary Voices
Timestamped lecture clips from YouTube scholars are live. Transcripts are auto-generated and used only as a search index to surface relevant clips; the video itself remains the authoritative source. Every clip is labeled: “Auto-generated transcript — watch video for exact wording.”
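The idea of using transcript text only to locate a moment in the video can be sketched as follows. A naive keyword match stands in for the real search index, and `Segment` and `find_clips` are hypothetical names. The sketch relies on YouTube watch URLs accepting a `t` start-time parameter, so each result points to the video at that moment rather than presenting the transcript as the text.

```python
from dataclasses import dataclass
from urllib.parse import urlencode

# Label text as quoted on this page; attached to every clip.
TRANSCRIPT_LABEL = "Auto-generated transcript — watch video for exact wording."


@dataclass
class Segment:
    video_id: str
    start_seconds: float
    text: str  # auto-generated caption text, used only for matching


def find_clips(segments: list, query: str) -> list:
    """Return timestamped video links for segments containing every query word."""
    words = query.lower().split()
    clips = []
    for seg in segments:
        text = seg.text.lower()
        if words and all(w in text for w in words):
            params = urlencode({"v": seg.video_id, "t": f"{int(seg.start_seconds)}s"})
            clips.append({
                "url": f"https://www.youtube.com/watch?{params}",
                "label": TRANSCRIPT_LABEL,
            })
    return clips


segments = [
    Segment("VIDEO_ID_1", 95.4, "today we discuss patience in hardship"),
    Segment("VIDEO_ID_1", 310.0, "the virtues of charity"),
]
clips = find_clips(segments, "patience hardship")
for clip in clips:
    print(clip["url"], "|", clip["label"])
```

Only the link and the fixed label are returned; the transcript snippet is never shown as a quotation, which matches the page's point that the video is the source of truth.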
Data Quality
We take data quality seriously. Every ingestion pipeline includes validation checks, and we cross-reference our data against sunnah.com, which we treat as the authoritative reference. If you notice an error in any hadith text, grading, or attribution, please report it on GitHub or email [email protected].