Pull SimilarWeb traffic stats, rankings, traffic sources, similar sites, WHOIS records, and keyword density for any domain — in Python, in under 60 seconds, without a SimilarWeb subscription.
Ready-to-run Python example for extracting website analytics from SimilarWeb at scale. The official SimilarWeb Digital Research Intelligence API reportedly starts at $150+/month with enterprise contracts and rate limits; this example uses the SimilarWeb Fast Scraper Apify actor to pull the same data for $1 per 1,000 results with no monthly commitment. Analyze up to 50 domains in parallel per run, choose among three data modes, and export to JSON, CSV, or Google Sheets.
Most third-party tools that scrape SimilarWeb either get blocked within an hour or charge as much as the official API. This example wires up a managed scraper that handles residential IP rotation, bot-detection fingerprinting, and retries server-side — so you focus on the data, not the scraping infrastructure. Pass a list of domains, get a structured JSON response back from the Apify dataset. Works on any OS with Python 3.10 or newer.
- Competitor traffic benchmarking: pull monthly visits, bounce rate, and traffic-source split for ten competitors and feed the result into a quarterly report.
- Investment due-diligence on private SaaS: verify a target company's traffic claims against three months of SimilarWeb visit history before signing a term sheet.
- Lead enrichment for B2B outbound: score prospects in your CRM by website traffic so SDRs prioritize accounts that match your ICP.
- Discover competitors and partnership targets: use the `similar_sites` mode to find 20+ alternatives to a seed domain ranked by category.
- On-page SEO audit: run the `aitdk` mode to extract 1-to-5-word keyword n-gram densities and spot keyword stuffing on a page.
- Domain due-diligence before acquisition: check WHOIS registration date, expiration, registrar, and nameservers for a domain before buying it.
- Python 3.10 or newer
- A free Apify account (gives you $5/month of free credits — enough to fetch ~5,000 SimilarWeb records before paying a cent)
- A SimilarWeb account is not required
```bash
git clone https://github.com/pro100chok/similarweb-traffic-data-python.git
cd similarweb-traffic-data-python
pip install -r requirements.txt
cp .env.example .env
# Open .env and paste your APIFY_API_TOKEN (from console.apify.com/settings/integrations)
python main.py
```

You'll get an `output.json` and `output.csv` in the project root with traffic data for five project-management SaaS competitors. Edit the `COMPETITORS` list in `main.py` to point at your own niche.

- `main.py` reads `APIFY_API_TOKEN` from your `.env` file.
- It calls the `pro100chok/similarweb-scraper` actor via the official `apify-client` Python SDK.
- The actor runs on Apify's infrastructure: it rotates residential proxies, hits the SimilarWeb endpoints in parallel for each domain in your list, parses the response, and writes records to a dataset.
- The Python script iterates the dataset and saves results locally as JSON + a flat CSV.
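
The last step, turning nested records into a flat CSV, might look roughly like the sketch below. It is not the repo's actual `main.py`: the helper names `flatten_record` and `to_csv` and the choice of columns are assumptions made for illustration, and the sample record is trimmed from the example output in this README.

```python
import csv
import io


def flatten_record(item: dict) -> dict:
    """Flatten one nested record into a single-level row (hypothetical column set)."""
    engagement = item.get("Engagments") or {}  # spelled this way in the actor output
    sources = item.get("TrafficSources") or {}
    return {
        "site": item.get("SiteName"),
        "global_rank": (item.get("GlobalRank") or {}).get("Rank"),
        "visits": engagement.get("Visits"),
        "bounce_rate": engagement.get("BounceRate"),
        "direct_pct": sources.get("Direct"),
        "search_pct": sources.get("Search"),
    }


def to_csv(items: list) -> str:
    """Render flattened records as CSV text; returns an empty string for no items."""
    rows = [flatten_record(item) for item in items]
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Sample record trimmed from the example output in this README.
sample = {
    "SiteName": "asana.com",
    "GlobalRank": {"Rank": 1942},
    "Engagments": {"Visits": 13241765, "BounceRate": 41.6},
    "TrafficSources": {"Direct": 53.1, "Search": 32.7},
}
print(to_csv([sample]))
```

Missing keys come through as empty cells instead of raising, which keeps one incomplete record from aborting a bulk export.
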
You can also call this actor directly from any language that speaks HTTP — see the actor's REST API documentation on Apify.
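
For reference, here is a sketch of the same call over plain HTTP using only the standard library. It assumes Apify's generic `run-sync-get-dataset-items` endpoint, the `username~actor-name` form of the actor ID in the URL, and Bearer-token authentication; check the actor's API tab for the authoritative URL. `build_request` and `fetch_items` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed endpoint shape; confirm it on the actor's API documentation page.
ACTOR_ID = "pro100chok~similarweb-scraper"
ENDPOINT = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items"


def build_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build a POST request that starts a run and returns its dataset items."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_items(token: str, run_input: dict, timeout: float = 300.0) -> list:
    """Send the request and decode the JSON array of result items."""
    request = build_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the request does not touch the network; fetch_items() does.
request = build_request("YOUR_APIFY_API_TOKEN", {
    "searchType": "similarweb",
    "domains": ["asana.com"],
})
print(request.get_method(), request.full_url)
```

The synchronous endpoint suits short runs; for larger batches the SDK's start-then-poll flow is likely the safer choice.
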
```python
import os

from apify_client import ApifyClient

# Authenticate with the token from the APIFY_API_TOKEN environment variable.
client = ApifyClient(os.environ["APIFY_API_TOKEN"])

# Start the actor and block until the run finishes.
run = client.actor("pro100chok/similarweb-scraper").call(run_input={
    "searchType": "similarweb",
    "domains": ["asana.com", "monday.com", "trello.com",
                "clickup.com", "notion.so"],
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
    },
})

# Each dataset item is one domain's record ("Engagments" is the field's actual spelling).
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["SiteName"], "→", item["Engagments"]["VisitsFormatted"])
```

```json
{
  "SiteName": "asana.com",
  "Title": "Asana",
  "Category": "business_and_consumer_services/business_services",
  "GlobalRank": { "Rank": 1942 },
  "CountryRank": { "CountryCode": "US", "Rank": 1108 },
  "Engagments": {
    "Visits": 13241765,
    "VisitsFormatted": "13.24M",
    "BounceRate": 41.6,
    "PagePerVisit": 4.8,
    "TimeOnSite": 312
  },
  "EstimatedMonthlyVisits": {
    "2026-01-01": 12800000,
    "2026-02-01": 13050000,
    "2026-03-01": 13241765
  },
  "TrafficSources": {
    "Direct": 53.1, "Search": 32.7, "Social": 4.1,
    "Referrals": 8.9, "Paid Referrals": 0.9, "Mail": 0.3
  },
  "TopCountryShares": [
    { "CountryCode": "US", "Value": 32.4 },
    { "CountryCode": "IN", "Value": 7.9 },
    { "CountryCode": "GB", "Value": 5.3 }
  ]
}
```

| Parameter | Type | Required | Description |
|---|---|---|---|
| `searchType` | string | yes | One of `similarweb` (traffic + rankings), `similar_sites` (competitor discovery), or `aitdk` (WHOIS + keyword density). |
| `domains` | string[] | yes | Up to 50 domains to analyze per run. Subdomains supported (e.g. `translate.google.com`). |
| `proxyConfiguration` | object | no | Apify Proxy groups (`RESIDENTIAL` recommended, `DATACENTER` faster but blocked more often) or custom proxy URLs. |
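
If your domain list is longer than the 50-per-run limit, one option is to validate the mode and split the list into batches before calling the actor. The sketch below is an illustration, not part of the repo: `build_run_inputs` is a hypothetical helper, and lowercasing plus de-duplicating the domains is an assumption about what you would want.

```python
VALID_SEARCH_TYPES = {"similarweb", "similar_sites", "aitdk"}
MAX_DOMAINS_PER_RUN = 50  # limit stated in the parameter table


def build_run_inputs(domains, search_type="similarweb"):
    """Return one run_input dict per batch of at most 50 domains."""
    if search_type not in VALID_SEARCH_TYPES:
        raise ValueError(f"searchType must be one of {sorted(VALID_SEARCH_TYPES)}")

    # Normalize, drop blanks, and de-duplicate while keeping the original order.
    cleaned = list(dict.fromkeys(d.strip().lower() for d in domains if d.strip()))

    batches = [
        cleaned[start:start + MAX_DOMAINS_PER_RUN]
        for start in range(0, len(cleaned), MAX_DOMAINS_PER_RUN)
    ]
    return [
        {
            "searchType": search_type,
            "domains": batch,
            "proxyConfiguration": {
                "useApifyProxy": True,
                "apifyProxyGroups": ["RESIDENTIAL"],
            },
        }
        for batch in batches
    ]


# 120 placeholder domains become three runs of 50, 50, and 20.
run_inputs = build_run_inputs([f"site{i}.com" for i in range(120)])
print([len(r["domains"]) for r in run_inputs])
```

Each returned dict can be passed as `run_input` to `client.actor(...).call(...)`.
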

| Field | Description |
|---|---|
| `SiteName` | Input domain. |
| `Title` | Website title from SimilarWeb. |
| `Category` | Industry category (e.g. `e_commerce_and_shopping/marketplace`). |
| `GlobalRank.Rank` | Worldwide traffic rank. |
| `CountryRank.Rank` | Rank within the dominant traffic country. |
| `Engagments.Visits` | Monthly visit count (numeric). |
| `Engagments.VisitsFormatted` | Pre-formatted version (e.g. "85.76B"). |
| `Engagments.BounceRate` | Bounce rate as a percentage. |
| `Engagments.PagePerVisit` | Average pages viewed per session. |
| `Engagments.TimeOnSite` | Average session length in seconds. |
| `EstimatedMonthlyVisits` | Last three months of visit counts keyed by month. |
| `TrafficSources` | Percent split across Direct, Search, Social, Referrals, Paid Referrals, Mail. |
| `TopCountryShares` | Top 5 countries with their share of traffic. |
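
As an example of working with these fields, the sketch below derives month-over-month growth from `EstimatedMonthlyVisits`. The helper name `month_over_month_growth` is hypothetical, and the code assumes the keys are ISO dates as in the sample record, so they sort chronologically as plain strings.

```python
def month_over_month_growth(monthly_visits: dict) -> dict:
    """Map each month (after the first) to its percent change from the prior month."""
    months = sorted(monthly_visits)  # ISO date strings sort chronologically
    growth = {}
    for previous, current in zip(months, months[1:]):
        base = monthly_visits[previous]
        if base:  # skip months with zero visits to avoid division by zero
            growth[current] = (monthly_visits[current] - base) / base * 100
    return growth


# Values copied from the sample record in this README.
estimated_monthly_visits = {
    "2026-01-01": 12800000,
    "2026-02-01": 13050000,
    "2026-03-01": 13241765,
}

for month, pct in month_over_month_growth(estimated_monthly_visits).items():
    print(f"{month}: {pct:+.2f}%")
```
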

For full schemas of the `similar_sites` and `aitdk` modes, see the `examples/` directory or the actor page.

| File | What it demonstrates |
|---|---|
| `examples/01_basic_usage.py` | Single-domain lookup in 12 lines. |
| `examples/02_competitor_discovery.py` | Find the top 20 alternatives to a seed domain. |
| `examples/03_keyword_density_audit.py` | Pull 1-to-5-word keyword n-gram density for any URL. |
| `examples/04_export_to_csv.py` | Bulk lookup with pandas filtering and sorting. |
| `examples/05_export_to_google_sheets.py` | Append rows to a shared Google Sheet via a service account. |

How much does this actually cost? The actor charges $0.001 per result item. A run that analyzes 50 domains costs $0.05. Apify's free tier gives you $5 of monthly credits, so the first ~5,000 lookups per month are free.
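
The arithmetic behind those numbers, as a small sketch. It assumes one result item per domain and that the whole free credit goes to this actor; other platform usage would reduce the headroom. `Decimal` is used so the cents come out exact.

```python
from decimal import Decimal

PRICE_PER_RESULT = Decimal("0.001")   # $1 per 1,000 results
FREE_MONTHLY_CREDIT = Decimal("5")    # Apify free-tier credit in USD


def run_cost(result_count: int) -> Decimal:
    """Cost in USD of a run that returns result_count items."""
    return PRICE_PER_RESULT * result_count


def billable_cost(results_per_month: int) -> Decimal:
    """Monthly cost in USD after the free credit is applied (never negative)."""
    return max(Decimal("0"), run_cost(results_per_month) - FREE_MONTHLY_CREDIT)


free_lookups = int(FREE_MONTHLY_CREDIT / PRICE_PER_RESULT)

print(f"50-domain run: ${run_cost(50)}")
print(f"Free lookups per month: {free_lookups}")
print(f"20,000 lookups in a month: ${billable_cost(20000)}")
```
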
Is there a free tier? Yes — Apify gives you $5 per month of platform credits forever. No credit card required to sign up.
Do I need a SimilarWeb account? No. The actor accesses the public SimilarWeb endpoints directly via residential proxies.
How is this different from the official SimilarWeb API?
SimilarWeb's own API is enterprise-priced (entry tier reportedly $150+/month with quotas and contracts). This scraper costs $1 per 1,000 records, has no monthly minimum, returns the same traffic stats and rankings, and adds two extra modes (competitor discovery via similar_sites and WHOIS + keyword density via aitdk) that the official API does not surface in one place.
How fast is one run? Typical response time is 1–10 seconds per domain. A 50-domain run usually finishes in 30–90 seconds depending on which domains are in the batch.
Can I use my own proxies instead of Apify Proxy?
Yes. Pass `"proxyConfiguration": {"useApifyProxy": false, "proxyUrls": ["http://user:pass@host:port"]}` in the run input. Make sure your proxies rotate per request; sticky sessions get blocked.
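
In Python the same setting is a nested dict in `run_input`. The helper below is a hypothetical convenience that switches between Apify's residential group and your own proxy URLs; the placeholder URL is not a working proxy.

```python
def proxy_configuration(proxy_urls=None):
    """Build the proxyConfiguration value for the actor's run input."""
    if proxy_urls:
        # Custom proxies: turn Apify Proxy off and pass your own URLs.
        return {"useApifyProxy": False, "proxyUrls": list(proxy_urls)}
    # Default: Apify Proxy with the residential group recommended above.
    return {"useApifyProxy": True, "apifyProxyGroups": ["RESIDENTIAL"]}


run_input = {
    "searchType": "similarweb",
    "domains": ["asana.com"],
    "proxyConfiguration": proxy_configuration(["http://user:pass@host:port"]),
}
print(run_input["proxyConfiguration"])
```
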
Can I use this commercially? The example code in this repo is MIT-licensed — use it however you like. The actor itself is governed by Apify's terms. Most SimilarWeb data is publicly available, but check your local jurisdiction's rules around web scraping before redistributing the data.
What if a domain fails partway through a batch?
The actor keeps processing the rest of the domains. Failed items show up in the dataset with an error field describing the failure (e.g. WHOIS rate-limit, 403 from SimilarWeb).
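
A sketch of separating successful records from failed ones after a run, relying only on the `error` field described above. The sample items, including the error strings and the presence of `SiteName` on failed items, are made up for illustration; inspect a real dataset before depending on them.

```python
from collections import Counter


def split_results(items):
    """Partition dataset items into (succeeded, failed) using the error field."""
    succeeded, failed = [], []
    for item in items:
        (failed if item.get("error") else succeeded).append(item)
    return succeeded, failed


# Made-up items; in practice these come from client.dataset(...).iterate_items().
items = [
    {"SiteName": "asana.com", "Engagments": {"Visits": 13241765}},
    {"SiteName": "example.invalid", "error": "403 from SimilarWeb"},
    {"SiteName": "monday.com", "Engagments": {"Visits": 1}},
    {"SiteName": "another.invalid", "error": "403 from SimilarWeb"},
]

succeeded, failed = split_results(items)
error_counts = Counter(item["error"] for item in failed)

print(f"{len(succeeded)} succeeded, {len(failed)} failed")
for message, count in error_counts.most_common():
    print(f"  {count} x {message}")
```

Grouping by error message makes it easier to tell a transient block (worth retrying) from a bad input such as a misspelled domain.
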
Can I keep traffic data fresh on a schedule?
Yes — use Apify's built-in scheduler to run the actor daily/weekly/monthly. Combine with examples/05_export_to_google_sheets.py to maintain a rolling traffic history.
If you're building a competitive intelligence stack, check the rest of my actor catalog:
- Ahrefs All-in-One SEO Scraper — DR, Backlinks, Keywords — domain rating, backlink counts, keyword data.
- Semrush All-in-One Scraper — Traffic, Authority, Backlinks — Semrush data points for the same domain list.
- Website Contact Scraper — Bulk Emails, Phones & Socials — enrich the same domains with contact info.
See all my actors at apify.com/pro100chok.

| Symptom | Likely cause | Fix |
|---|---|---|
| `Actor call failed: User does not have access` | Token is invalid or revoked. | Generate a fresh token at console.apify.com/settings/integrations. |
| Run succeeds but returns no items | All domains failed (rare with RESIDENTIAL proxies). | Inspect the run's log in the Apify console; the cause is usually a typo in a domain name. |
| Most domains return 403/429 | Datacenter IPs are being used. | Switch `proxyConfiguration` to `apifyProxyGroups: ["RESIDENTIAL"]`. |
| `ImportError: dotenv` | Dependencies not installed. | Run `pip install -r requirements.txt`. |

MIT — see LICENSE.
Built on top of the SimilarWeb Fast Scraper Apify actor.