SimilarWeb Traffic Data Scraper — Python Example

Pull SimilarWeb traffic stats, rankings, traffic sources, similar sites, WHOIS records, and keyword density for any domain — in Python, in under 60 seconds, without a SimilarWeb subscription.

Ready-to-run Python example for extracting website analytics from SimilarWeb at scale. The official SimilarWeb Digital Research Intelligence API reportedly starts at $150+/month, with enterprise contracts and rate limits; this example uses the SimilarWeb Fast Scraper Apify actor to pull the same data for $1 per 1,000 results with no monthly commitment. Each run analyzes up to 50 domains in parallel in one of three data modes, and results can be exported to JSON, CSV, or Google Sheets.

What this does

Most third-party tools that scrape SimilarWeb either get blocked within an hour or charge as much as the official API. This example wires up a managed scraper that handles residential IP rotation, bot-detection fingerprinting, and retries server-side — so you focus on the data, not the scraping infrastructure. Pass a list of domains, get a structured JSON response back from the Apify dataset. Works on any OS with Python 3.10 or newer.

Use cases

Competitor traffic benchmarking — pull monthly visits, bounce rate, and traffic-source split for ten competitors and feed the result into a quarterly report.
Investment due-diligence on private SaaS — verify a target company's traffic claims against three months of SimilarWeb visit history before signing a term sheet.
Lead enrichment for B2B outbound — score prospects in your CRM by website traffic so SDRs prioritize accounts that match your ICP.
Discover competitors and partnership targets — use the similar_sites mode to find 20+ alternatives to a seed domain ranked by category.
On-page SEO audit — run the aitdk mode to extract 1-to-5-word keyword n-gram densities and spot keyword stuffing on a page.
Domain due-diligence before acquisition — check WHOIS registration date, expiration, registrar, and nameservers for a domain before buying it.

Requirements

Python 3.10 or newer
A free Apify account (gives you $5/month of free credits — enough to fetch ~5,000 SimilarWeb records before paying a cent)
A SimilarWeb account is not required

Quick start

git clone https://github.com/pro100chok/similarweb-traffic-data-python.git
cd similarweb-traffic-data-python
pip install -r requirements.txt
cp .env.example .env
# Open .env and paste your APIFY_API_TOKEN (from console.apify.com/settings/integrations)
python main.py

You'll get an output.json and output.csv in the project root with traffic data for five project-management SaaS competitors. Edit the COMPETITORS list in main.py to point at your own niche.

How it works

main.py reads APIFY_API_TOKEN from your .env file.
It calls the pro100chok/similarweb-scraper actor via the official apify-client Python SDK.
The actor runs on Apify's infrastructure: it rotates residential proxies, queries the SimilarWeb endpoints in parallel for each domain in your list, parses the responses, and writes the records to a dataset.
The Python script iterates the dataset and saves results locally as JSON + a flat CSV.
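The final step above (dataset items saved as JSON plus a flat CSV) can be sketched roughly as follows. This is an illustration under stated assumptions, not the repo's actual main.py: the helper names `flatten` and `save_results` and the dotted column naming are hypothetical.

```python
import csv
import json
from pathlib import Path


def flatten(record, prefix=""):
    """Flatten nested dicts into one level with dotted keys (hypothetical helper)."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            # Lists (e.g. TopCountryShares) are stored as a JSON string in one cell.
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def save_results(items, json_path="output.json", csv_path="output.csv"):
    """Write the raw items as JSON and a flattened copy as CSV; return the CSV columns."""
    items = list(items)
    Path(json_path).write_text(json.dumps(items, indent=2), encoding="utf-8")

    rows = [flatten(item) for item in items]
    # Union of all columns in first-seen order, so sparse records still fit.
    columns = list(dict.fromkeys(col for row in rows for col in row))
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
    return columns
```

With a finished run, something like `save_results(client.dataset(run["defaultDatasetId"]).iterate_items())` would produce both files in the current directory.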

You can also call this actor directly from any language that speaks HTTP — see the actor's REST API documentation on Apify.
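As a rough illustration of the HTTP route, the sketch below builds the request with only the standard library. It assumes the Apify API v2 `run-sync-get-dataset-items` endpoint and Bearer-token authentication; the helper names `build_run_request` and `fetch_items` are hypothetical, so check the actor's API page for the exact URL before relying on it.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(token, run_input, actor_id="pro100chok~similarweb-scraper"):
    """Build (but do not send) a POST request that runs the actor synchronously."""
    # In REST paths the "/" in the actor name is written as "~".
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def fetch_items(token, run_input, timeout=300):
    """Send the request and return the dataset items as a list of dicts."""
    request = build_run_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

The same URL, headers, and JSON body carry over to curl, Node, Go, or any other HTTP client.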

Example: bulk competitor traffic for a SaaS niche

import os
from apify_client import ApifyClient

client = ApifyClient(os.environ["APIFY_API_TOKEN"])

run = client.actor("pro100chok/similarweb-scraper").call(run_input={
    "searchType": "similarweb",
    "domains": ["asana.com", "monday.com", "trello.com",
                "clickup.com", "notion.so"],
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
    },
})

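for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    # "Engagments" is spelled exactly as the actor returns it.
    # Failed domains come back without engagement data, so skip them.
    if "Engagments" not in item:
        continue
    print(item["SiteName"], "→", item["Engagments"]["VisitsFormatted"])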

Example output

{
  "SiteName": "asana.com",
  "Title": "Asana",
  "Category": "business_and_consumer_services/business_services",
  "GlobalRank": { "Rank": 1942 },
  "CountryRank": { "CountryCode": "US", "Rank": 1108 },
  "Engagments": {
    "Visits": 13241765,
    "VisitsFormatted": "13.24M",
    "BounceRate": 41.6,
    "PagePerVisit": 4.8,
    "TimeOnSite": 312
  },
  "EstimatedMonthlyVisits": {
    "2026-01-01": 12800000,
    "2026-02-01": 13050000,
    "2026-03-01": 13241765
  },
  "TrafficSources": {
    "Direct": 53.1, "Search": 32.7, "Social": 4.1,
    "Referrals": 8.9, "Paid Referrals": 0.9, "Mail": 0.3
  },
  "TopCountryShares": [
    { "CountryCode": "US", "Value": 32.4 },
    { "CountryCode": "IN", "Value": 7.9 },
    { "CountryCode": "GB", "Value": 5.3 }
  ]
}
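One thing a record like this supports is a month-over-month growth check, which is handy for the due-diligence use case. The sketch below is a minimal example; the function name `monthly_growth` is hypothetical, and it assumes `EstimatedMonthlyVisits` is keyed by ISO dates as in the sample above.

```python
def monthly_growth(record):
    """Return [(month, percent_change_vs_previous_month), ...] in chronological order."""
    visits = record.get("EstimatedMonthlyVisits", {})
    months = sorted(visits)  # ISO dates sort chronologically as plain strings
    growth = []
    for prev, cur in zip(months, months[1:]):
        if visits[prev]:  # avoid dividing by zero for months with no traffic
            change = (visits[cur] - visits[prev]) / visits[prev] * 100
            growth.append((cur, round(change, 2)))
    return growth


sample = {
    "EstimatedMonthlyVisits": {
        "2026-01-01": 12800000,
        "2026-02-01": 13050000,
        "2026-03-01": 13241765,
    }
}
print(monthly_growth(sample))
# [('2026-02-01', 1.95), ('2026-03-01', 1.47)]
```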

Input parameters

Parameter	Type	Required	Description
`searchType`	string	yes	One of `similarweb` (traffic + rankings), `similar_sites` (competitor discovery), or `aitdk` (WHOIS + keyword density).
`domains`	string[]	yes	Up to 50 domains to analyze per run. Subdomains supported (e.g. `translate.google.com`).
`proxyConfiguration`	object	no	Apify Proxy groups (`RESIDENTIAL` recommended, `DATACENTER` faster but blocked more often) or custom proxy URLs.
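Because each run accepts at most 50 domains, longer lists need to be split into several runs. A minimal sketch of that batching, using the parameter names from the table above; the helper `build_run_inputs` and the clean-up rules (trim, lowercase, de-duplicate) are assumptions, not part of the actor.

```python
VALID_SEARCH_TYPES = {"similarweb", "similar_sites", "aitdk"}
MAX_DOMAINS_PER_RUN = 50


def build_run_inputs(domains, search_type="similarweb"):
    """Split a domain list into run_input dicts of at most 50 domains each."""
    if search_type not in VALID_SEARCH_TYPES:
        raise ValueError(f"unknown searchType: {search_type!r}")
    # Normalise and de-duplicate while keeping the original order.
    cleaned = list(dict.fromkeys(d.strip().lower() for d in domains if d.strip()))
    return [
        {
            "searchType": search_type,
            "domains": cleaned[i:i + MAX_DOMAINS_PER_RUN],
            "proxyConfiguration": {
                "useApifyProxy": True,
                "apifyProxyGroups": ["RESIDENTIAL"],
            },
        }
        for i in range(0, len(cleaned), MAX_DOMAINS_PER_RUN)
    ]
```

Each returned dict can be passed as `run_input` to a separate `.call(...)` on the actor.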

Output fields (SimilarWeb mode)

Field	Description
`SiteName`	Input domain.
`Title`	Website title from SimilarWeb.
`Category`	Industry category (e.g. `e_commerce_and_shopping/marketplace`).
`GlobalRank.Rank`	Worldwide traffic rank.
`CountryRank.Rank`	Rank within the dominant traffic country.
`Engagments.Visits`	Monthly visit count (numeric).
`Engagments.VisitsFormatted`	Pre-formatted version (e.g. `"85.76B"`).
`Engagments.BounceRate`	Bounce rate as a percentage.
`Engagments.PagePerVisit`	Average pages viewed per session.
`Engagments.TimeOnSite`	Average session length in seconds.
`EstimatedMonthlyVisits`	Last three months of visit counts keyed by month.
`TrafficSources`	Percent split across `Direct`, `Search`, `Social`, `Referrals`, `Paid Referrals`, `Mail`.
`TopCountryShares`	Top 5 countries with their share of traffic.
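For competitor benchmarking, these fields can be reduced to a small leaderboard. The sketch below assumes the field names listed above; the function `traffic_leaderboard` and its output keys are hypothetical.

```python
def traffic_leaderboard(items):
    """Rank records that have engagement data by monthly visits, highest first."""
    rows = []
    for item in items:
        engagement = item.get("Engagments")  # key spelled as in the actor output
        if not engagement:
            continue  # skip failed or incomplete records
        sources = item.get("TrafficSources") or {}
        top_source = max(sources, key=sources.get) if sources else None
        rows.append({
            "site": item.get("SiteName"),
            "visits": engagement.get("Visits", 0),
            "global_rank": (item.get("GlobalRank") or {}).get("Rank"),
            "top_source": top_source,
        })
    return sorted(rows, key=lambda row: row["visits"], reverse=True)
```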

For full schemas of similar_sites and aitdk modes, see the examples/ directory or the actor page.

More examples

File	What it demonstrates
`examples/01_basic_usage.py`	Single-domain lookup in 12 lines.
`examples/02_competitor_discovery.py`	Find the top 20 alternatives to a seed domain.
`examples/03_keyword_density_audit.py`	Pull 1-to-5-word keyword n-gram density for any URL.
`examples/04_export_to_csv.py`	Bulk lookup with pandas filtering and sorting.
`examples/05_export_to_google_sheets.py`	Append rows to a shared Google Sheet via a service account.

FAQ

How much does this actually cost? The actor charges $0.001 per result item. A run that analyzes 50 domains costs $0.05. Apify's free tier gives you $5 of monthly credits, so the first ~5,000 lookups per month are free.
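The arithmetic above can be wrapped in a small estimator. This sketch uses only the prices quoted in this README and assumes the free credits are spent on nothing else; the function name `estimate_monthly_cost` is hypothetical.

```python
PRICE_PER_1000_RESULTS = 1.00  # USD, as quoted in this README
FREE_MONTHLY_CREDITS = 5.00    # USD, Apify free tier as described in this README


def estimate_monthly_cost(results_per_month):
    """Return (gross_cost, amount_due_after_free_credits) in USD."""
    gross = results_per_month * PRICE_PER_1000_RESULTS / 1000
    due = max(0.0, gross - FREE_MONTHLY_CREDITS)
    return round(gross, 2), round(due, 2)


print(estimate_monthly_cost(50))     # (0.05, 0.0)
print(estimate_monthly_cost(20000))  # (20.0, 15.0)
```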

Is there a free tier? Yes — Apify gives you $5 per month of platform credits forever. No credit card required to sign up.

Do I need a SimilarWeb account? No. The actor accesses the public SimilarWeb endpoints directly via residential proxies.

How is this different from the official SimilarWeb API? SimilarWeb's own API is enterprise-priced (entry tier reportedly $150+/month with quotas and contracts). This scraper costs $1 per 1,000 records, has no monthly minimum, returns the same traffic stats and rankings, and adds two extra modes (competitor discovery via similar_sites and WHOIS + keyword density via aitdk) that the official API does not surface in one place.

How fast is one run? Typical response time is 1–10 seconds per domain. A 50-domain run usually finishes in 30–90 seconds depending on which domains are in the batch.

Can I use my own proxies instead of Apify Proxy? Yes. Set "proxyConfiguration": {"useApifyProxy": False, "proxyUrls": ["http://user:pass@host:port"]} in run_input (that is the Python boolean False; in raw JSON it is false). Make sure your proxies support per-request rotation; sticky sessions get blocked.
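A small helper keeps the two proxy setups in one place. The keys come from this README; the helper name `proxy_configuration` is hypothetical.

```python
def proxy_configuration(proxy_urls=None):
    """Return a proxyConfiguration dict for run_input."""
    if proxy_urls:
        # Custom proxies: Python booleans are capitalised (False, not false).
        return {"useApifyProxy": False, "proxyUrls": list(proxy_urls)}
    # Default: Apify Proxy with the residential group recommended above.
    return {"useApifyProxy": True, "apifyProxyGroups": ["RESIDENTIAL"]}
```

For example, `run_input["proxyConfiguration"] = proxy_configuration(["http://user:pass@host:port"])` switches a run to your own proxies.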

Can I use this commercially? The example code in this repo is MIT-licensed — use it however you like. The actor itself is governed by Apify's terms. Most SimilarWeb data is publicly available, but check your local jurisdiction's rules around web scraping before redistributing the data.

What if a domain fails partway through a batch? The actor keeps processing the rest of the domains. Failed items show up in the dataset with an error field describing the failure (e.g. WHOIS rate-limit, 403 from SimilarWeb).
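A sketch of how a script might separate failures and queue them for a retry. It assumes the failure marker is a key literally named `error` and that failed items still carry `SiteName`; neither is confirmed here, so inspect a real failed item first. The function name `split_results` is hypothetical.

```python
def split_results(items):
    """Separate successful records from failed ones and list domains to retry."""
    ok, failed = [], []
    for item in items:
        # Assumption: failed items carry a non-empty "error" field.
        (failed if item.get("error") else ok).append(item)
    # Assumption: failed items still include the input domain as SiteName.
    retry = [item["SiteName"] for item in failed if item.get("SiteName")]
    return ok, failed, retry
```

The `retry` list can be passed back as `domains` in a follow-up run.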

Can I keep traffic data fresh on a schedule? Yes — use Apify's built-in scheduler to run the actor daily/weekly/monthly. Combine with examples/05_export_to_google_sheets.py to maintain a rolling traffic history.

Related actors

If you're building a competitive intelligence stack, check the rest of my actor catalog:

Ahrefs All-in-One SEO Scraper — DR, Backlinks, Keywords — domain rating, backlink counts, keyword data.
Semrush All-in-One Scraper — Traffic, Authority, Backlinks — Semrush data points for the same domain list.
Website Contact Scraper — Bulk Emails, Phones & Socials — enrich the same domains with contact info.

See all my actors at apify.com/pro100chok.

Troubleshooting

Symptom	Likely cause	Fix
`Actor call failed: User does not have access`	Token is invalid or revoked.	Generate a fresh token at console.apify.com/settings/integrations.
Run succeeds but returns no items	All domains failed (rare with `RESIDENTIAL` proxies).	Inspect the run's log in the Apify console — usually a typo in a domain name.
Most domains return 403/429	Datacenter IPs are being used.	Switch `proxyConfiguration` to `apifyProxyGroups: ["RESIDENTIAL"]`.
`ModuleNotFoundError: No module named 'dotenv'`	Dependencies not installed.	Run `pip install -r requirements.txt`.

License

MIT — see LICENSE.

Built on top of the SimilarWeb Fast Scraper Apify actor.
