
Installation
Install the package using pip.
Features
- AI-Powered Extraction: Advanced web scraping using artificial intelligence
- Flexible Clients: Both synchronous and asynchronous support
- Type Safety: Structured output with Pydantic schemas
- Production Ready: Detailed logging and automatic retries
- Developer Friendly: Comprehensive error handling
Quick Start
Initialize the client with your API key.
You can also set the SGAI_API_KEY environment variable and initialize the client without parameters: client = Client()
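The lookup order described above (explicit key first, then the SGAI_API_KEY environment variable) can be sketched with the standard library alone. This is a minimal illustration, not the SDK's actual logic: resolve_api_key is a hypothetical helper, and the api_key keyword mentioned in the closing comment is an assumption.

```python
import os

ENV_VAR = "SGAI_API_KEY"  # variable name taken from the text above


def resolve_api_key(explicit_key=None, environ=None):
    """Return the explicit key if given, otherwise fall back to SGAI_API_KEY.

    Hypothetical helper for illustration; `environ` can be replaced by a
    plain dict so the lookup is easy to test.
    """
    env = os.environ if environ is None else environ
    key = explicit_key or env.get(ENV_VAR)
    if not key:
        raise ValueError(f"No API key: pass one explicitly or set {ENV_VAR}")
    return key


# With the SDK installed, the resolved key would presumably be passed as
# Client(api_key=resolve_api_key()); the keyword name is an assumption.
```

Failing early with a clear message when neither source provides a key tends to be easier to debug than an authentication error from the first request.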
Services
SmartScraper
Extract specific information from any webpage using AI.
Parameters
Parameter | Type | Required | Description |
---|---|---|---|
website_url | string | Yes | The URL of the webpage that needs to be scraped. |
user_prompt | string | Yes | A textual description of what you want to achieve. |
output_schema | object | No | The Pydantic object that describes the structure and format of the response. |
render_heavy_js | boolean | No | Enable enhanced JavaScript rendering for heavy JS websites (React, Vue, Angular, etc.). Default: False |
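To make the table concrete, the sketch below collects the four parameters into keyword arguments. build_smartscraper_args is a hypothetical helper, the URL-scheme and empty-prompt checks are assumptions rather than documented rules, and the commented-out call assumes the client exposes a smartscraper method that accepts these names.

```python
def build_smartscraper_args(website_url, user_prompt,
                            output_schema=None, render_heavy_js=False):
    """Collect the parameters from the table above into keyword arguments.

    Hypothetical helper: the two checks below are illustrative assumptions.
    """
    if not website_url.startswith(("http://", "https://")):
        raise ValueError("website_url must be an absolute http(s) URL")
    if not user_prompt.strip():
        raise ValueError("user_prompt must not be empty")

    # The two required parameters are always present.
    args = {"website_url": website_url, "user_prompt": user_prompt}
    # Optional parameters are added only when they differ from the defaults.
    if output_schema is not None:
        args["output_schema"] = output_schema
    if render_heavy_js:
        args["render_heavy_js"] = True
    return args


args = build_smartscraper_args(
    "https://example.com/products",
    "Extract the product names and prices",
)
# Hypothetical call, assuming a client object with a smartscraper method:
# response = client.smartscraper(**args)
```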
Basic Schema Example
Define a simple schema for basic data extraction:
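A minimal sketch of what such a schema might look like, assuming Pydantic is installed. The ArticleSummary class and its fields are hypothetical and chosen only for illustration; a class like this would be the value passed as output_schema.

```python
from pydantic import BaseModel, Field


class ArticleSummary(BaseModel):
    """Hypothetical flat schema: one level of simple, required fields."""

    title: str = Field(..., description="Headline of the article")
    author: str = Field(..., description="Name of the author")
    summary: str = Field(..., description="Two or three sentence summary")


# Building an instance by hand shows the shape a structured result would take.
example = ArticleSummary(
    title="Hello",
    author="A. Writer",
    summary="Short text.",
)
```

Field descriptions are optional in Pydantic, but they document the intent of each field alongside the prompt.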
Advanced Schema Example
Define a complex schema for nested data structures:
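As a sketch under the same assumptions (Pydantic installed, all class and field names hypothetical), nested structures are expressed by using one model as the field type of another, with lists and optional fields where the page may or may not provide a value.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class Price(BaseModel):
    amount: float = Field(..., description="Numeric price")
    currency: str = Field(..., description="Currency code, e.g. USD")


class Product(BaseModel):
    name: str = Field(..., description="Product name")
    price: Price  # nested model
    tags: List[str] = Field(default_factory=list, description="Free-form labels")
    rating: Optional[float] = Field(None, description="Average rating, if shown")


class Catalog(BaseModel):
    store: str = Field(..., description="Name of the store")
    products: List[Product]  # list of nested models


# Nested dictionaries are validated and converted into the nested models.
catalog = Catalog(
    store="Demo Shop",
    products=[{"name": "Lamp", "price": {"amount": 19.5, "currency": "EUR"}}],
)
```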
Enhanced JavaScript Rendering Example
Enable enhanced rendering for modern web applications built with React, Vue, Angular, or other JavaScript frameworks.
When to use render_heavy_js:
- React, Vue, or Angular applications
- Single Page Applications (SPAs)
- Sites with heavy client-side rendering
- Dynamic content loaded via JavaScript
- Interactive elements that depend on JavaScript execution
SearchScraper
Search and extract information from multiple web sources using AI.
Parameters
Parameter | Type | Required | Description |
---|---|---|---|
user_prompt | string | Yes | A textual description of what you want to achieve. |
num_results | number | No | Number of websites to search (3-20). Default: 3. |
extraction_mode | boolean | No | True = AI extraction mode (10 credits/page), False = markdown mode (2 credits/page). Default: True. |
output_schema | object | No | The Pydantic object that describes the structure and format of the response (AI extraction mode only). |
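The constraints in the table (3 to 20 results, output_schema only in AI extraction mode) can be checked before a request is sent. The sketch below uses a hypothetical helper, build_searchscraper_args. Rejecting a schema in markdown mode is a choice made here for illustration, and the service itself may behave differently.

```python
def build_searchscraper_args(user_prompt, num_results=3,
                             extraction_mode=True, output_schema=None):
    """Collect SearchScraper parameters, enforcing the limits in the table.

    Hypothetical helper for illustration only.
    """
    if not 3 <= num_results <= 20:
        raise ValueError("num_results must be between 3 and 20")
    if output_schema is not None and not extraction_mode:
        # Assumption: a schema is meaningless when raw markdown is returned.
        raise ValueError("output_schema applies to AI extraction mode only")

    args = {
        "user_prompt": user_prompt,
        "num_results": num_results,
        "extraction_mode": extraction_mode,
    }
    if output_schema is not None:
        args["output_schema"] = output_schema
    return args


search_args = build_searchscraper_args(
    "What are the latest developments in battery recycling?",
    num_results=5,
)
# Hypothetical call, assuming a client object with a searchscraper method:
# response = client.searchscraper(**search_args)
```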
Basic Schema Example
Define a simple schema for structured search results:
Advanced Schema Example
Define a complex schema for comprehensive market research:
Markdown Mode Example
Use markdown mode for cost-effective content gathering.
Markdown Mode Benefits:
- Cost-effective: Only 2 credits per page (vs 10 credits for AI extraction)
- Full content: Get complete page content in markdown format
- Faster: No AI processing overhead
- Perfect for: Content analysis, bulk data collection, building datasets
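The cost difference is simple arithmetic on the per-page prices quoted above. The sketch below assumes each search result is billed as one page; estimate_credits is a hypothetical helper, not part of the SDK.

```python
AI_EXTRACTION_CREDITS_PER_PAGE = 10  # from the text above
MARKDOWN_CREDITS_PER_PAGE = 2        # from the text above


def estimate_credits(num_results, extraction_mode=True):
    """Estimate credits for one search, assuming one page per result."""
    per_page = (AI_EXTRACTION_CREDITS_PER_PAGE if extraction_mode
                else MARKDOWN_CREDITS_PER_PAGE)
    return num_results * per_page


for pages in (3, 10, 20):
    ai = estimate_credits(pages, extraction_mode=True)
    md = estimate_credits(pages, extraction_mode=False)
    print(f"{pages} pages: AI extraction {ai} credits, markdown {md} credits")
```

Under that assumption, a 20-result search costs 200 credits in AI extraction mode and 40 in markdown mode.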
Markdownify
Convert any webpage into clean, formatted markdown.
Async Support
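An asynchronous client is mainly useful for running several requests concurrently. The sketch below shows that pattern with the standard library only: fake_scrape is a stand-in coroutine that sleeps and echoes its input, not an SDK call, and the shape of the returned dictionary is an assumption.

```python
import asyncio


async def fake_scrape(url, delay=0.01):
    """Stand-in for an awaitable SDK call; it only sleeps and echoes the URL."""
    await asyncio.sleep(delay)
    return {"website_url": url, "status": "completed"}


async def scrape_all(urls):
    # gather runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fake_scrape(u) for u in urls))


results = asyncio.run(
    scrape_all(["https://example.com/a", "https://example.com/b"])
)
```

With a real asynchronous client, the stand-in would be replaced by the awaited endpoint call, and the gather structure would stay the same.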
All endpoints support asynchronous operations.
Feedback
Help us improve by submitting feedback programmatically.
Support
License
This project is licensed under the MIT License. See the LICENSE file for details.