ai-scraping

Here are 29 public repositories matching this topic...

firecrawl / firecrawl

The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data 🔥

markdown crawler data scraper ai html-to-markdown web-crawler scraping webscraping rag llm ai-scraping

Updated Sep 8, 2025
TypeScript

ScrapeGraphAI / Scrapegraph-ai

Python scraper based on AI

markdown crawler ai html-to-markdown web-crawler scraping web-scraping rag automated-scraper scraping-python web-crawlers llm ai-scraping

Updated Aug 13, 2025
Python

D4Vinci / Scrapling

🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!

Updated Sep 8, 2025
Python

any4ai / AnyCrawl

AnyCrawl 🚀: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts structured SERP results from Google/Bing/Baidu/etc. Native multi-threading for bulk processing.

data html-to-markdown scraping webscraper crawl scrape serp rag aitools ai-scraping

Updated Sep 7, 2025
TypeScript

itsOwen / CyberScraper-2077

A powerful web scraper powered by LLMs | OpenAI, Gemini & Ollama

scraper web-scraper openai webscraping gemini-api llm llm-scraper ai-scraping

Updated Aug 11, 2025
Python

raznem / parsera

Lightweight library for scraping websites with LLMs

python opensource ai scraping data-extraction webscraping playwright llm ai-scraping

Updated Aug 25, 2025
Python

firecrawl / firecrawl-app-examples

🔥 This repository contains complete application examples, including websites and other projects, developed using Firecrawl.

markdown data ai examples html-to-markdown templates web-crawler scrapers rag llm ai-scraping

Updated Jun 2, 2025
Jupyter Notebook

devflowinc / firecrawl-simple

➖ Stripped-down, stable version of firecrawl optimized for self-hosting and ease of contribution. Billing logic and AI features are completely removed. Crawl and convert any website into LLM-ready markdown.

search markdown crawler data scraper ai html-to-markdown web-crawler scraping embeddings webscraping rag llm ai-scraping

Updated May 23, 2025
TypeScript

ArchiveBox / abx-dl

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/subtitles, PDFs, screenshots, article text, git repos, and more...

cli chrome downloader curl headless scraping crawling http-client youtube-dl wget cli-tool puppeteer internet-archiving playwright archivebox yt-dlp gallery-dl ai-scraping

Updated Aug 20, 2025
JavaScript

WeebDataHoarder / go-away

[Mirror] Self-hosted abuse detection and rule enforcement against low-effort mass AI scraping and bots.

security mirror http-proxy ai-scraping

Updated Sep 4, 2025
Go

kaymen99 / ai-web-scraper

AI web scraper built with Crawl4AI for extracting structured lead data from websites.

scraper web-scraper web-scraping ai-agents lead-generation data-scraper llms ai-scraping crawl4ai

Updated Feb 13, 2025
Python

spider-rs / web-crawling-guides

How-to guides on web crawling and scraping

crawler scraper html-to-markdown web-scraping agents ai-agents ai-scraping llm-webcrawler clean-markdown fast-webcrawler

Updated Apr 26, 2025

spider-rs / spider-clients

Python, JavaScript, and Rust libraries for the Spider Cloud API.

crawler scraper ai spider html-to-markdown web-scraping ai-agents supabase ai-scraping llm-webcrawler

Updated Aug 29, 2025
Python

Chakszzz / NB-Scraper

All Scrapers Resource Available Here! Give Us Stars🌟

open-source scraper youtube-downloader ytdl facebook-scraper scrape-websites ai-scraping nb-scraper nb-script

Updated Jul 22, 2025
TypeScript

L1shed / Turbo

Fastest and cheapest distributed residential proxy network.

iaas web-scraping payment-gateway collaborate passive-income depin distributed-network node-vpn bandwidth-sharing ai-scraping node-infrastructure

Updated Aug 31, 2025
TypeScript

kaymen99 / google-maps-lead-generator

Extract Google Maps business leads and enrich contact details using AI & web scraping

google-maps web-scraping google-maps-api data-scraping ai-agents lead-generation ai-scraping

Updated Jun 24, 2025
Python

oxylabs / oxylabs-ai-studio-py

Oxylabs AI Studio Python SDK

web-scraping ai-search proxy-scraper python-ai ai-tools web-scraping-python ai-crawler ai-web-scraper ai-scraping ai-scraper web-scraping-ai

Updated Aug 14, 2025
Python

drisskhattabi6 / AI-Scraper

AI Scraper: scrape and extract data from websites in any format (CSV, JSON, HTML...) using Selenium or Crawl4AI, with Ollama or the SambaNova API as the model backend and Streamlit for a chatbot UI

Updated May 22, 2025
Python

nathabonfim59 / md-fetch

Sponsor

Star

A CLI tool and REST API that converts web content to clean Markdown, bypassing anti-scraping measures using headless browsers. Perfect for AI/LLM applications.

golang scraper htmltomarkdown ai-scraping

Updated Feb 2, 2025
Go

luminati-io / llama-3-web-scraping

Use LLaMA 3 and Python to extract structured data from websites like Amazon, leveraging LLM-powered parsing for resilient, AI-driven web scraping.

python web-scraping data-collection python-scraper llama-3 ai-scraping llm-scraping

Updated Apr 28, 2025
