I am an AI Researcher and Senior AI Engineer focused on building intelligent systems that connect rigorous research with production-grade software engineering.
My work combines LLM systems, retrieval-augmented generation, agentic workflows, computer vision, OCR, speech AI, and model optimization with secure backend engineering, cloud deployment, and measurable business impact.
I care about systems that are not only impressive in demos, but also:
- grounded in evidence and evaluation,
- reliable under real-world usage,
- optimized for latency, cost, and scalability,
- secure enough for enterprise environments,
- maintainable by engineering teams,
- and useful to end users.
const ahmad = {
  identity: "AI Researcher + Senior AI Engineer",
  location: "Riyadh, Saudi Arabia",
  currentFocus: [
    "LLM systems",
    "Retrieval-Augmented Generation",
    "Agentic AI workflows",
    "AI evaluation and reliability",
    "Arabic-first AI systems",
    "Production ML deployment"
  ],
  researchInterests: [
    "GraphRAG and scientific reasoning",
    "Paper-to-code generation",
    "LLM evaluation and hallucination detection",
    "Arabic NLP and dialect-aware AI",
    "OCR and document intelligence",
    "Efficient inference and model optimization"
  ],
  engineeringStrengths: [
    "Python ML systems",
    "ASP.NET Core backends",
    "React front ends",
    "Cloud deployment",
    "CI/CD and observability",
    "Secure AI APIs"
  ]
};
Building retrieval-augmented systems, tool-calling agents, structured-output pipelines, GraphRAG workflows, multi-agent reasoning, and human-in-the-loop automation.
Translating papers into experiments, pipelines, evaluation frameworks, benchmarks, model adaptations, and reproducible AI prototypes.
Shipping secure APIs, inference microservices, real-time AI interfaces, cloud deployments, CI/CD workflows, monitoring, and optimization.
Arabic NLP, dialect classification, RTL-aware interfaces, multilingual assistants, speech pipelines, and semantic response matching. |
Handwriting recognition, prescription OCR, document parsing, image captioning, verification pipelines, and multimodal AI workflows. |
Quantization, ONNX export, model serving, latency tuning, cost reduction, Dockerized services, observability, and deployment automation. |
Areas: LLMs, RAG, GraphRAG, embeddings, reranking, structured outputs, prompt evaluation, guardrails, OCR, speech AI, computer vision, fine-tuning, PEFT/LoRA, RLHF-style workflows, quantization, ONNX.
Areas: ASP.NET Core, Entity Framework, REST APIs, WebSockets, React, FastAPI, Flask, Docker, IIS, AWS, GCP, CI/CD, OpenTelemetry, Prometheus, Grafana, MLflow, Weights & Biases.
A research intelligence system for AI paper discovery, forensic analysis, claim verification, and paper-to-code generation.
- Built a production-grade research assistant for analyzing AI papers with GraphRAG, document parsing, OCR, and multi-model reasoning.
- Developed a Scientific Code Forge that converts paper sections into modular Python/PyTorch codebases.
- Implemented paper claim verification by comparing extracted claims with tables, metrics, and reported results.
- Designed a scientific auditing workflow with hype scoring, adversarial debate, and consensus reporting.
- Added a repository mapping workflow that links paper concepts to relevant GitHub implementations.
Stack: Python, Asyncio, Streamlit, ChromaDB, GraphRAG, Docling, PyMuPDF, OCR, OpenAI, Anthropic, Gemini, Ollama
Repository: AI Research Intelligence Agent
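
The claim-verification idea in this project (comparing extracted claims with reported results) can be illustrated with a minimal sketch. This is not the project's actual code: the `Claim` dataclass, the `verify_claim` function, the table layout (a dict from metric name to reported value), the tolerance, and all numbers are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """A numeric claim extracted from paper text (hypothetical structure)."""
    metric: str
    value: float


def verify_claim(claim: Claim, reported: dict[str, float], tol: float = 0.05) -> str:
    """Compare a claimed value against the value found in a results table.

    Returns "supported", "contradicted", or "unverifiable".
    """
    if claim.metric not in reported:
        return "unverifiable"  # nothing in the table to check against
    if abs(claim.value - reported[claim.metric]) <= tol:
        return "supported"
    return "contradicted"


# Hypothetical values, for illustration only.
table = {"accuracy": 91.2, "f1": 88.7}
claims = [Claim("accuracy", 91.2), Claim("f1", 90.5), Claim("bleu", 30.1)]
verdicts = [verify_claim(c, table) for c in claims]
```

Keeping "unverifiable" separate from "contradicted" matters in practice: a claim with no matching table entry is a gap in evidence, not proof of an error.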
Enterprise AI support automation platform combining intent classification, RAG, multilingual UX, and live operations.
- Built an AI support system that automates 60–80% of customer inquiries through intent classification and retrieval-augmented generation.
- Delivered a multilingual English/Arabic chatbot with automatic RTL handling.
- Built admin workflows for agent takeover, case management, order handling, staff administration, and operational monitoring.
- Implemented secure authentication using JWT HTTP-only cookies and role-based access control.
- Integrated analytics for latency, error rate, top intents, traffic trends, and operational quality.
Stack: Next.js, React, TypeScript, Node.js, GCP Cloud Run, Vertex AI Gemini, Firestore, BigQuery, Docker
Live Demo: NexusAI Demo
Repository: nexus-ai
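
Automatic RTL handling for a mixed English/Arabic chat can be approximated by checking the first strongly directional character of each message. The sketch below is written in Python for illustration only (the project itself lists a TypeScript stack), and `text_direction` is a hypothetical helper, not a function from the repository.

```python
import unicodedata


def text_direction(text: str) -> str:
    """Return "rtl" or "ltr" based on the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # right-to-left letter, or Arabic letter
            return "rtl"
        if bidi == "L":  # left-to-right letter
            return "ltr"
    return "ltr"  # digits, spaces, and punctuation alone default to left-to-right


direction_ar = text_direction("مرحبا، كيف يمكنني مساعدتك؟")
direction_en = text_direction("Hello, how can I help?")
```

Digits and whitespace are skipped because they carry no strong direction, so a message such as "123 مرحبا" is still classified as right-to-left.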
Multilingual voice assistant with Arabic dialect classification and low-latency AI interaction.
- Architected a voice AI system combining Whisper v3, a fine-tuned MARBERTv2 dialect classifier, and downstream LLM reasoning.
- Designed a modular inference pipeline capable of sub-500ms response times for short utterances.
- Improved Arabic dialect recognition F1 score by 12% over a Whisper-only baseline.
- Built a real-time application using Flask, React, REST APIs, and WebSockets.
- Added orchestration flows with n8n and Elsa Workflows for retries, queues, events, and human-in-the-loop review.
Stack: Whisper, MARBERTv2, Flask, React, WebSockets, REST APIs, n8n, Elsa Workflows
Repository: AIVoiceAsisstent
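
One way to keep a modular speech pipeline inside a latency target is to time every stage and compare the total against a budget. The following is a minimal sketch under stated assumptions: `run_pipeline` is a hypothetical helper, and the three lambda stages are stand-ins for the transcription, dialect classification, and LLM steps, which are not reproduced here.

```python
import time
from typing import Callable


def run_pipeline(stages: list[tuple[str, Callable]], payload, budget_ms: float = 500.0):
    """Run stages in order and record per-stage latency in milliseconds."""
    timings = {}
    for name, stage in stages:
        start = time.perf_counter()
        payload = stage(payload)  # each stage consumes the previous stage's output
        timings[name] = (time.perf_counter() - start) * 1000.0
    total_ms = sum(timings.values())
    return payload, timings, total_ms <= budget_ms


# Stand-in stages; a real system would call the speech, classifier, and LLM models.
stages = [
    ("transcribe", lambda audio: "placeholder transcript"),
    ("classify_dialect", lambda text: {"text": text, "dialect": "unknown"}),
    ("respond", lambda ctx: f"reply to: {ctx['text']}"),
]
result, timings, within_budget = run_pipeline(stages, b"raw-audio-bytes")
```

Per-stage timings make it clear which component to optimize first when the total exceeds the budget.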
Optimized Arabic character recognition system for deployment.
- Developed a CNN-based Arabic handwriting recognition model trained on 120,000 handwritten samples across 28 classes.
- Achieved 85% top-1 accuracy through preprocessing, data augmentation, and dropout regularization.
- Applied quantization-aware training and ONNX conversion.
- Reduced model size by 59% and achieved 2.3x CPU inference speedup.
- Deployed the system through a Flask API and served public requests on PythonAnywhere.
Stack: TensorFlow, Flask, ONNX, PythonAnywhere
Demo: LearnWithUs
Repository: LearnWithUs
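
The arithmetic behind int8 quantization can be shown in a few lines. This sketch is not the quantization-aware training workflow used in the project; it only illustrates the symmetric quantize/dequantize round trip that such training simulates. The function names and sample weights are assumptions for the example.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization of floats to the int8 range [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Map int8 values back to approximate floats."""
    return [q * scale for q in quantized]


# Hypothetical weights, for illustration only.
weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each quantized weight fits in one byte instead of the four bytes of a float32 value, and the rounding error per weight is bounded by about half of `scale`.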
OCR and verification pipeline for complex handwritten prescriptions.
- Designed a multitask recognition system combining GANs, CRNN, and CTC loss.
- Processed prescriptions with mixed Arabic/Latin scripts, dosage units, physician signatures, and hospital seals.
- Integrated a verification pipeline for hospital seal authenticity and dosage standardization.
- Reduced interpretation errors by 35%.
- Achieved approximately 92% character-level accuracy on a 50,000-sample dataset.
Stack: GANs, CRNN, CTC Loss, TensorFlow, OCR, Multitask Learning
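
A CRNN trained with CTC loss emits one score vector per time step, which must be decoded into text. The sketch below shows greedy CTC decoding (argmax per frame, collapse repeats, drop blanks) in plain Python. It is an illustration rather than the project's decoder: the blank index of 0, the two-letter alphabet, and the toy scores are assumptions.

```python
from itertools import groupby

BLANK = 0  # index reserved for the CTC blank symbol (assumed convention)


def ctc_greedy_decode(frame_scores: list[list[float]], alphabet: str) -> str:
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    collapsed = [index for index, _ in groupby(best_path)]
    # Class index i (for i >= 1) maps to alphabet[i - 1] because index 0 is the blank.
    return "".join(alphabet[index - 1] for index in collapsed if index != BLANK)


# Toy scores over [blank, 'm', 'g']; each row is one time step.
scores = [
    [0.1, 0.8, 0.1],    # m
    [0.1, 0.7, 0.2],    # m (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.1, 0.7],    # g
    [0.6, 0.2, 0.2],    # blank
    [0.1, 0.2, 0.7],    # g (kept: a blank separates it from the previous g)
]
decoded = ctc_greedy_decode(scores, alphabet="mg")
```

The blank symbol is what allows genuinely doubled characters, common in dosage strings such as "mgg" here, to survive the repeat-collapsing step.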
Retrieval-grounded natural-language-to-SQL assistant for safer business analytics.
- Built a schema-aware NL-to-SQL assistant with retrieval grounding.
- Added validation checks to reduce hallucinated tables and columns.
- Enforced query safety constraints that block unsafe or overly broad SQL patterns.
- Designed for reliable business analytics workflows over MS SQL Server.
Stack: Python, Streamlit, MS SQL Server, Vanna, ChromaDB
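
The validation and safety checks for generated SQL can be sketched as a small pre-execution filter. This is a simplified illustration, not the assistant's actual validator: `KNOWN_TABLES`, the keyword list, and `validate_sql` are hypothetical, and the regex-based table extraction is deliberately coarse. Checking column names reliably would need a real SQL parser.

```python
import re

# Hypothetical schema, used only for this example.
KNOWN_TABLES = {"orders", "customers"}

FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|exec|merge)\b", re.IGNORECASE
)
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)


def validate_sql(sql: str) -> list[str]:
    """Return a list of problems; an empty list means the query passed these checks."""
    problems = []
    if not sql.lstrip().lower().startswith("select"):
        problems.append("only SELECT statements are allowed")
    if FORBIDDEN.search(sql):
        problems.append("statement contains a write or DDL keyword")
    if re.search(r"select\s+\*", sql, re.IGNORECASE):
        problems.append("SELECT * is too broad; name the columns")
    for table in TABLE_REF.findall(sql):
        if table.lower() not in KNOWN_TABLES:
            problems.append(f"unknown table: {table}")
    return problems


ok = validate_sql("SELECT id, total FROM orders WHERE total > 100")
broad = validate_sql("SELECT * FROM invoices")
unsafe = validate_sql("DROP TABLE orders")
```

Returning a list of problems, rather than a single boolean, lets the assistant feed the specific failure back to the model for a corrected query.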
Riyadh, Saudi Arabia · Dec 2024 – Present
- Designed and implemented ASP.NET Core applications with secure REST API layers using Entity Framework and DbContext.
- Built Python-backed inference microservices for AI model serving.
- Configured IIS and production environments for availability and throughput.
- Collaborated on CI/CD workflows with automated builds, tests, and multi-stage deployments.
- Built React front ends for real-time AI interactions.
- Orchestrated AI agent workflows with n8n and Elsa Workflows, including webhooks, scheduled jobs, tool chaining, retries, and human approvals.
- Applied SOLID principles, clean architecture, documentation, and onboarding runbooks.
Remote · Dec 2025 – Present
- Evaluate LLM outputs using structured rubrics for helpfulness, correctness, safety, and style.
- Support RLHF/RLAIF-style workflows by refining prompts, identifying failure modes, and improving guidelines.
- Perform quality checks and calibration to reduce evaluator variance.
Remote · Dec 2024 – Dec 2025
- Built Python training and fine-tuning pipelines for generative image model workflows.
- Developed tools for data augmentation, feature extraction, and custom evaluation metrics.
- Supported stability, scalability, and research-to-engineering implementation work.
Remote · Sep 2024 – Feb 2025
- Built predictive models for customer behavior and market trends.
- Created SQL analytics pipelines and Tableau dashboards.
- Applied preprocessing, feature engineering, dimensionality reduction, and anomaly detection.
Remote · Sep 2022 – Dec 2024
- Delivered NLP systems for classification, summarization, and sentiment analysis.
- Built Arabic handwriting recognition and model deployment workflows.
- Applied quantization and ONNX export for production optimization.
- Built agentic NLP workflows with LangChain and adapted models using PEFT/RLHF-style methods.
Sep 2025 – Present
Current focus areas include computer vision, deep learning, and LLM applications.
Aug 2018 – Feb 2024
- Graduated with distinction.
- Ranked 2nd in class.
- Relevant coursework: Artificial Intelligence, Machine Learning, Neural Networks, Data Mining, Image and Pattern Recognition, Deep Learning, Natural Language Processing, Algorithms, Database Systems, and Cybersecurity.
| Certification | Issuer | Date |
|---|---|---|
| Develop AI-Powered Prototypes in Google AI Studio | | Feb 2026 |
| AI Software Engineer | micro1 | Oct 2025 |
| Generative AI with Large Language Models | Coursera / AWS | Oct 2024 |
| Introduction to Retrieval Augmented Generation | Duke University / Coursera | Dec 2024 |
| Intermediate Machine Learning | Kaggle | Feb 2025 |
| Feature Engineering | Kaggle | Nov 2024 |
| AWS EMEA Innovate: Migrate. Modernize. Build. | AWS | Oct 2024 |
| Azure DevOps: Intro to CI/CD | United Latino Students Association | Mar 2024 |
class ResearchDrivenAIEngineer:
    """
    Building AI systems that are scientifically grounded,
    production-ready, and useful in real-world workflows.
    """

    def __init__(self):
        self.principles = [
            "Start from the problem, not the model",
            "Read deeply, prototype carefully, evaluate honestly",
            "Design for reliability, latency, cost, and security",
            "Measure hallucinations, failure modes, and edge cases",
            "Keep humans in the loop when risk is high",
            "Ship systems that can be monitored, improved, and trusted"
        ]

    def build(self, research_idea):
        hypothesis = define_hypothesis(research_idea)
        prototype = implement_experiment(hypothesis)
        evaluation = measure_quality(prototype)
        system = harden_for_production(prototype, evaluation)
        monitor(system)
        iterate(system)
        return system

- Scientific AI agents and paper-to-code systems
- GraphRAG, long-context retrieval, and knowledge-grounded reasoning
- LLM evaluation, hallucination detection, and claim verification
- Arabic NLP, dialect-aware AI, and multilingual assistants
- OCR, document intelligence, and verification pipelines
- Efficient inference, quantization, ONNX, and production deployment
- AI systems that combine research depth with enterprise reliability
Research-backed AI. Production-grade engineering. Systems that can be trusted.
