PRO100CHOK/similarweb-traffic-data-python

SimilarWeb Traffic Data Scraper — Python Example

Pull SimilarWeb traffic stats, rankings, traffic sources, similar sites, WHOIS records, and keyword density for any domain — in Python, in under 60 seconds, without a SimilarWeb subscription.


Ready-to-run Python example for extracting website analytics from SimilarWeb at scale. The official SimilarWeb Digital Research Intelligence API reportedly starts at $150+/month, with enterprise contracts and rate limits; this example uses the SimilarWeb Fast Scraper Apify actor to pull the same traffic stats and rankings for $1 per 1,000 results with no monthly commitment. Each run analyzes up to 50 domains in parallel, supports three data modes, and exports to JSON, CSV, or Google Sheets.

What this does

Most third-party tools that scrape SimilarWeb either get blocked within an hour or charge as much as the official API. This example wires up a managed scraper that handles residential IP rotation, bot-detection fingerprinting, and retries server-side — so you focus on the data, not the scraping infrastructure. Pass a list of domains, get a structured JSON response back from the Apify dataset. Works on any OS with Python 3.10 or newer.

Use cases

  • Competitor traffic benchmarking — pull monthly visits, bounce rate, and traffic-source split for ten competitors and feed the result into a quarterly report.
  • Investment due-diligence on private SaaS — verify a target company's traffic claims against three months of SimilarWeb visit history before signing a term sheet.
  • Lead enrichment for B2B outbound — score prospects in your CRM by website traffic so SDRs prioritize accounts that match your ICP.
  • Discover competitors and partnership targets — use the similar_sites mode to find 20+ alternatives to a seed domain ranked by category.
  • On-page SEO audit — run the aitdk mode to extract 1-to-5-word keyword n-gram densities and spot keyword stuffing on a page.
  • Domain due-diligence before acquisition — check WHOIS registration date, expiration, registrar, and nameservers for a domain before buying it.

Requirements

  • Python 3.10 or newer
  • A free Apify account (gives you $5/month of free credits — enough to fetch ~5,000 SimilarWeb records before paying a cent)
  • A SimilarWeb account is not required

Quick start

git clone https://github.com/pro100chok/similarweb-traffic-data-python.git
cd similarweb-traffic-data-python
pip install -r requirements.txt
cp .env.example .env
# Open .env and paste your APIFY_API_TOKEN (from console.apify.com/settings/integrations)
python main.py

You'll get an output.json and output.csv in the project root with traffic data for five project-management SaaS competitors. Edit the COMPETITORS list in main.py to point at your own niche.

How it works

  1. main.py reads APIFY_API_TOKEN from your .env file.
  2. It calls the pro100chok/similarweb-scraper actor via the official apify-client Python SDK.
  3. The actor runs on Apify's infrastructure: rotates residential proxies, hits the SimilarWeb endpoints in parallel for each domain in your list, parses the response, and writes records to a dataset.
  4. The Python script iterates the dataset and saves results locally as JSON + a flat CSV.
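
Step 4 needs nothing beyond the standard library. The sketch below shows one way to flatten a nested record into a CSV row; the helper names flatten_record and save_results are hypothetical and are not taken from main.py, and the field names follow the example output in this README.

```python
import csv
import json
from pathlib import Path


def flatten_record(item: dict) -> dict:
    """Collapse one nested dataset item into a single-level row for CSV."""
    engagement = item.get("Engagments") or {}  # key spelled as in the actor output
    sources = item.get("TrafficSources") or {}
    row = {
        "SiteName": item.get("SiteName"),
        "Category": item.get("Category"),
        "GlobalRank": (item.get("GlobalRank") or {}).get("Rank"),
        "CountryRank": (item.get("CountryRank") or {}).get("Rank"),
        "Visits": engagement.get("Visits"),
        "BounceRate": engagement.get("BounceRate"),
        "PagePerVisit": engagement.get("PagePerVisit"),
        "TimeOnSite": engagement.get("TimeOnSite"),
    }
    # One column per traffic source, e.g. "Source_Direct".
    for name, share in sources.items():
        row[f"Source_{name.replace(' ', '')}"] = share
    return row


def save_results(items: list[dict], json_path: Path, csv_path: Path) -> None:
    """Write the raw items as JSON and the flattened rows as CSV."""
    json_path.write_text(json.dumps(items, indent=2), encoding="utf-8")
    rows = [flatten_record(item) for item in items]
    # Union of keys in first-seen order, so sparse rows still fit the header.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Nested lists such as TopCountryShares are left out of the CSV on purpose; they stay available in the JSON file.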

You can also call this actor directly from any language that speaks HTTP — see the actor's REST API documentation on Apify.
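
For reference, a plain-HTTP call might look like the sketch below. It assumes Apify's v2 "run-sync-get-dataset-items" endpoint and the "~" form of the actor ID; confirm both on the actor's API tab before relying on them. The function names are illustrative only.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "pro100chok~similarweb-scraper"  # assumption: the REST API writes "/" as "~"


def build_sync_run_request(token: str, domains: list[str]) -> urllib.request.Request:
    """Build (but do not send) a POST that runs the actor and returns its items."""
    payload = {"searchType": "similarweb", "domains": domains}
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_items(token: str, domains: list[str], timeout: float = 300.0) -> list[dict]:
    """Send the request and decode the JSON array of dataset items."""
    request = build_sync_run_request(token, domains)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

The synchronous endpoint holds the connection open until the run finishes, so it suits small batches; for long runs, starting the run and polling its dataset is likely the safer pattern.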

Example: bulk competitor traffic for a SaaS niche

import os
from apify_client import ApifyClient

client = ApifyClient(os.environ["APIFY_API_TOKEN"])

run = client.actor("pro100chok/similarweb-scraper").call(run_input={
    "searchType": "similarweb",
    "domains": ["asana.com", "monday.com", "trello.com",
                "clickup.com", "notion.so"],
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
    },
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["SiteName"], "→", item["Engagments"]["VisitsFormatted"])

Example output

{
  "SiteName": "asana.com",
  "Title": "Asana",
  "Category": "business_and_consumer_services/business_services",
  "GlobalRank": { "Rank": 1942 },
  "CountryRank": { "CountryCode": "US", "Rank": 1108 },
  "Engagments": {
    "Visits": 13241765,
    "VisitsFormatted": "13.24M",
    "BounceRate": 41.6,
    "PagePerVisit": 4.8,
    "TimeOnSite": 312
  },
  "EstimatedMonthlyVisits": {
    "2026-01-01": 12800000,
    "2026-02-01": 13050000,
    "2026-03-01": 13241765
  },
  "TrafficSources": {
    "Direct": 53.1, "Search": 32.7, "Social": 4.1,
    "Referrals": 8.9, "Paid Referrals": 0.9, "Mail": 0.3
  },
  "TopCountryShares": [
    { "CountryCode": "US", "Value": 32.4 },
    { "CountryCode": "IN", "Value": 7.9 },
    { "CountryCode": "GB", "Value": 5.3 }
  ]
}
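
As a quick illustration of working with the nested fields, here is a small sketch that turns EstimatedMonthlyVisits into month-over-month growth. The helper month_over_month is hypothetical, and the sample values are copied from the record above.

```python
def month_over_month(monthly_visits: dict[str, int]) -> dict[str, float]:
    """Return percent change versus the previous month, keyed by month."""
    months = sorted(monthly_visits)  # ISO dates sort chronologically as strings
    changes = {}
    for previous, current in zip(months, months[1:]):
        base = monthly_visits[previous]
        if base:  # skip a zero baseline to avoid dividing by zero
            changes[current] = (monthly_visits[current] - base) / base * 100
    return changes


sample = {"2026-01-01": 12800000, "2026-02-01": 13050000, "2026-03-01": 13241765}
for month, pct in month_over_month(sample).items():
    print(f"{month}: {pct:+.2f}%")
```

For the sample record this prints roughly +1.95% for February and +1.47% for March.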

Input parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| searchType | string | yes | One of similarweb (traffic + rankings), similar_sites (competitor discovery), or aitdk (WHOIS + keyword density). |
| domains | string[] | yes | Up to 50 domains to analyze per run. Subdomains supported (e.g. translate.google.com). |
| proxyConfiguration | object | no | Apify Proxy groups (RESIDENTIAL recommended, DATACENTER faster but blocked more often) or custom proxy URLs. |
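
If your list is longer than 50 domains, you can split it into several runs. The sketch below validates searchType and chunks the list client-side; build_run_inputs is a hypothetical helper, and the limits are the ones stated in the table above.

```python
VALID_SEARCH_TYPES = {"similarweb", "similar_sites", "aitdk"}
MAX_DOMAINS_PER_RUN = 50


def build_run_inputs(domains: list[str], search_type: str = "similarweb") -> list[dict]:
    """Return one run_input dict per batch of at most 50 domains."""
    if search_type not in VALID_SEARCH_TYPES:
        raise ValueError(f"searchType must be one of {sorted(VALID_SEARCH_TYPES)}")
    # Normalise and de-duplicate while keeping the caller's order.
    cleaned = list(dict.fromkeys(d.strip().lower() for d in domains if d.strip()))
    if not cleaned:
        raise ValueError("domains must contain at least one non-empty entry")
    return [
        {"searchType": search_type, "domains": cleaned[i:i + MAX_DOMAINS_PER_RUN]}
        for i in range(0, len(cleaned), MAX_DOMAINS_PER_RUN)
    ]
```

Each dict can then be passed as run_input to a separate actor call, with proxyConfiguration added if you need it.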

Output fields (SimilarWeb mode)

| Field | Description |
|---|---|
| SiteName | Input domain. |
| Title | Website title from SimilarWeb. |
| Category | Industry category (e.g. e_commerce_and_shopping/marketplace). |
| GlobalRank.Rank | Worldwide traffic rank. |
| CountryRank.Rank | Rank within the dominant traffic country. |
| Engagments.Visits | Monthly visit count (numeric). |
| Engagments.VisitsFormatted | Pre-formatted version (e.g. "85.76B"). |
| Engagments.BounceRate | Bounce rate as a percentage. |
| Engagments.PagePerVisit | Average pages viewed per session. |
| Engagments.TimeOnSite | Average session length in seconds. |
| EstimatedMonthlyVisits | Last three months of visit counts keyed by month. |
| TrafficSources | Percent split across Direct, Search, Social, Referrals, Paid Referrals, Mail. |
| TopCountryShares | Top 5 countries with their share of traffic. |

For full schemas of similar_sites and aitdk modes, see the examples/ directory or the actor page.

More examples

| File | What it demonstrates |
|---|---|
| examples/01_basic_usage.py | Single-domain lookup in 12 lines. |
| examples/02_competitor_discovery.py | Find the top 20 alternatives to a seed domain. |
| examples/03_keyword_density_audit.py | Pull 1-to-5-word keyword n-gram density for any URL. |
| examples/04_export_to_csv.py | Bulk lookup with pandas filtering and sorting. |
| examples/05_export_to_google_sheets.py | Append rows to a shared Google Sheet via a service account. |

FAQ

How much does this actually cost? The actor charges $0.001 per result item. A run that analyzes 50 domains costs $0.05. Apify's free tier gives you $5 of monthly credits, so the first ~5,000 lookups per month are free.
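
The arithmetic is easy to check yourself. This sketch assumes one result item per domain and uses only the prices quoted in this answer; run_cost and free_results_per_month are illustrative names.

```python
from decimal import Decimal

PRICE_PER_RESULT = Decimal("0.001")  # USD per result item, as quoted above
FREE_MONTHLY_CREDIT = Decimal("5")   # USD of Apify free-tier credit, as quoted above


def run_cost(result_count: int) -> Decimal:
    """Cost in USD of a run that produces result_count items."""
    return PRICE_PER_RESULT * result_count


def free_results_per_month() -> int:
    """How many result items the free monthly credit covers."""
    return int(FREE_MONTHLY_CREDIT / PRICE_PER_RESULT)


print(run_cost(50))              # 0.050
print(free_results_per_month())  # 5000
```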

Is there a free tier? Yes — Apify gives you $5 per month of platform credits forever. No credit card required to sign up.

Do I need a SimilarWeb account? No. The actor accesses the public SimilarWeb endpoints directly via residential proxies.

How is this different from the official SimilarWeb API? SimilarWeb's own API is enterprise-priced (entry tier reportedly $150+/month with quotas and contracts). This scraper costs $1 per 1,000 records, has no monthly minimum, returns the same traffic stats and rankings, and adds two extra modes (competitor discovery via similar_sites and WHOIS + keyword density via aitdk) that the official API does not surface in one place.

How fast is one run? Typical response time is 1–10 seconds per domain. A 50-domain run usually finishes in 30–90 seconds depending on which domains are in the batch.

Can I use my own proxies instead of Apify Proxy? Yes. Pass "proxyConfiguration": {"useApifyProxy": false, "proxyUrls": ["http://user:pass@host:port"]}. Make sure your proxies support per-request rotation; sticky sessions get blocked.
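
The snippet above is written as JSON. When you build the input in Python, the boolean is False (capitalized), as in this minimal sketch; custom_proxy_config is a hypothetical helper and the proxy URL is the placeholder from the answer above.

```python
def custom_proxy_config(proxy_urls: list[str]) -> dict:
    """Build a proxyConfiguration block that bypasses Apify Proxy."""
    if not proxy_urls:
        raise ValueError("provide at least one proxy URL")
    return {"useApifyProxy": False, "proxyUrls": list(proxy_urls)}


run_input = {
    "searchType": "similarweb",
    "domains": ["asana.com"],
    # Placeholder URL: replace with your own rotating proxy endpoint.
    "proxyConfiguration": custom_proxy_config(["http://user:pass@host:port"]),
}
```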

Can I use this commercially? The example code in this repo is MIT-licensed — use it however you like. The actor itself is governed by Apify's terms. Most SimilarWeb data is publicly available, but check your local jurisdiction's rules around web scraping before redistributing the data.

What if a domain fails partway through a batch? The actor keeps processing the rest of the domains. Failed items show up in the dataset with an error field describing the failure (e.g. WHOIS rate-limit, 403 from SimilarWeb).
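
One way to handle partial failures on your side is to separate the two kinds of item and re-queue what is missing. The sketch below assumes that a failed item carries a non-empty error field and that SiteName echoes the input domain on success; both helper names are hypothetical.

```python
def split_results(items: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate successful items from items that carry an error field."""
    succeeded, failed = [], []
    for item in items:
        (failed if item.get("error") else succeeded).append(item)
    return succeeded, failed


def domains_to_retry(requested: list[str], items: list[dict]) -> list[str]:
    """Return requested domains that have no successful item yet."""
    done = {item.get("SiteName") for item in items if not item.get("error")}
    return [domain for domain in requested if domain not in done]
```

The retry list is computed from the domains you asked for rather than from the failed items, so it does not depend on how a failed item names its domain.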

Can I keep traffic data fresh on a schedule? Yes — use Apify's built-in scheduler to run the actor daily/weekly/monthly. Combine with examples/05_export_to_google_sheets.py to maintain a rolling traffic history.

Related actors

If you're building a competitive intelligence stack, see the rest of my actor catalog at apify.com/pro100chok.

Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| Actor call failed: User does not have access | Token is invalid or revoked. | Generate a fresh token at console.apify.com/settings/integrations. |
| Run succeeds but returns no items | All domains failed (rare with RESIDENTIAL proxies). | Inspect the run's log in the Apify console (usually a typo in a domain name). |
| Most domains return 403/429 | Datacenter IPs are being used. | Switch proxyConfiguration to apifyProxyGroups: ["RESIDENTIAL"]. |
| ImportError: dotenv | Dependencies not installed. | Run pip install -r requirements.txt. |

License

MIT — see LICENSE.


Built on top of the SimilarWeb Fast Scraper Apify actor.
