-3 votes
0 answers
88 views

How to download product images by SKU from a website using Python? [closed]

I have an Excel (or CSV) file with a list of product SKUs (like 1806B, 1911B, HR2470, etc.), and I would like to write a Python script that does the following: Use each SKU to search for the product ...
Bence Szabó
-6 votes
0 answers
212 views

How can I extract the caption/description from a single Instagram post URL in Python? [closed]

I’m trying to write a simple program that can take one Instagram post link and extract the caption/description of that particular post. For example, given a link like: https://www.instagram.com/p/...
ajay srinivas
3 votes
1 answer
45 views

Beautiful Soup: splitting a paragraph only by <br> where stripped_strings is not working

I'm rather new to Beautiful Soup and I'm having trouble splitting some HTML correctly by looking only at HTML line breaks and ignoring other HTML elements, such as changes in font color. The ...
James Brian
1 vote
1 answer
137 views

Trouble scraping dynamic lottery results table – inconsistent parsing

I’ve been trying to scrape lottery results from a website that shows draws. The data is presented in a results table, but I keep running into strange issues where sometimes the numbers are captured ...
Zuryab
  • 11
-8 votes
0 answers
85 views

Python 3.9: get all physical addresses from https://www.sappi.com/en-gb/about-us/locations into MS Excel [closed]

Get all physical addresses from this URL [https://www.sappi.com/en-gb/about-us/locations] into MS Excel. The code produces no output. import requests from bs4 import BeautifulSoup import pandas as pd url = ...
James
  • 39
0 votes
2 answers
57 views

Get an attribute's value by another attribute with BeautifulSoup

I want to parse HTML like the following with Beautiful Soup: <meta property="og:image" content="https://test.com/test.jp" /> <meta property="og:description" ...
whitebear
  • 12.6k
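A minimal sketch of one way to do what the question above asks: find the <meta> tag by its `property` attribute, then read its `content` attribute. The helper name `meta_content` and the `og:description` value are placeholders, since the excerpt truncates the original markup.

```python
from bs4 import BeautifulSoup

# The og:image line comes from the question; the og:description value is a
# placeholder because the excerpt cuts off before it.
SAMPLE_HTML = """
<head>
<meta property="og:image" content="https://test.com/test.jp" />
<meta property="og:description" content="placeholder description" />
</head>
"""


def meta_content(soup, prop):
    """Return the content attribute of the <meta> tag whose property equals prop."""
    tag = soup.find("meta", attrs={"property": prop})
    # Return None when the tag (or its content attribute) is missing.
    return tag.get("content") if tag else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(meta_content(soup, "og:image"))
```

Returning None for a missing tag avoids the TypeError that subscripting the result of a failed `find()` would raise.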
-2 votes
1 answer
58 views

How to use Beautiful Soup to find partial links [closed]

I have an eBay page in which I would like to formulate a list of all the item numbers on that page. I have executed and parsed the HTML content using requests and Beautiful Soup, but I can't figure ...
Travis Ward
0 votes
2 answers
107 views

How to use index to find position of JSON record [closed]

Is there a better way than iteration using a for loop to find the index of the record? My problem is that to use index I seem to need the index of the record I'm seeking. import json from bs4 import ...
Peter Hill
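For the question above, a sketch of two common alternatives to an explicit for loop, assuming the records are a list of dicts with a unique key. The sample data, the `id` field, and the helper name `index_of` are hypothetical; the question's actual JSON is not shown in the excerpt.

```python
import json

# Hypothetical sample data; the question's actual JSON is not shown in the excerpt.
RAW = '[{"id": "a1", "name": "first"}, {"id": "b2", "name": "second"}, {"id": "c3", "name": "third"}]'
records = json.loads(RAW)


def index_of(records, key, value):
    """Return the position of the first record whose key equals value, or None."""
    return next((i for i, rec in enumerate(records) if rec.get(key) == value), None)


# For repeated lookups, build a key-to-position map once.
positions = {rec["id"]: i for i, rec in enumerate(records)}

print(index_of(records, "id", "b2"))
print(positions["c3"])
```

Both forms still iterate over the list; the generator stops at the first match, while the dictionary pays the cost once and then answers each lookup in constant time.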
4 votes
2 answers
273 views

How to reliably download 1969 “Gazzetta Ufficiale” PDFs (Italian Official Gazette) with Python?

I’m trying to programmatically download the full “pubblicazione completa non certificata” PDFs of the Italian Gazzetta Ufficiale – Serie Generale for 1969 (for an academic article). The site has a ...
Mark
  • 1,801
-1 votes
0 answers
13 views

Requests does not return the Google search result page HTML properly [closed]

I am trying to extract HTML to use for BeautifulSoup from a search result page. This is my code: import requests from bs4 import BeautifulSoup page_url = 'https://www.google.com/search?q=how+to+...
Aadvik
  • 1,304
-1 votes
2 answers
50 views

Puppeteer can't access var doc in JavaScript

I am trying to scrape a web page using Puppeteer, but I can't access var doc with it, although I can see it in the page source in my web browser: var rows = []; var i = 1; /* while(i <= ...
M.M. CAN
0 votes
1 answer
158 views

How can I speed up my Selenium scraper using multiprocessing in Python? [closed]

I'm scraping a large list of URLs (1.2 million) using Selenium + BeautifulSoup with Python's multiprocessing.Pool. I want to scale it up to scrape faster, ideally without hitting system resource ...
SolidOpt
  • 113
-3 votes
2 answers
63 views

BeautifulSoup - reading multiple pages breaks after random valid reads

I'm reading some book data (titles, etc.) from a number of pages. Python 3.10.13, BeautifulSoup 4.12.3. Code: def scrapSite(URL): headers = {"User-Agent": "Mozilla/5.0 (Windows NT ...
Error Replicator
1 vote
1 answer
61 views

How to use BeautifulSoup find_all() to get an element with multiple classes?

I am not sure if the terminology "class with multiple classes" is correct, but that is the best I can describe it. import requests from bs4 import BeautifulSoup url="https://curiosa.io/...
user30854646
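A minimal sketch relevant to the question above, showing how Beautiful Soup treats an element that carries several CSS classes. The markup and the class names `card` and `featured` are hypothetical; the classes on the real page are not shown in the excerpt.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the class names on the real page are not shown in the excerpt.
SAMPLE_HTML = (
    '<div class="card featured">A</div>'
    '<div class="featured card">B</div>'
    '<div class="card">C</div>'
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A CSS selector matches elements that have both classes, in any order.
both = [tag.get_text() for tag in soup.select("div.card.featured")]

# A multi-word class_ string matches only that exact class attribute value.
exact = [tag.get_text() for tag in soup.find_all("div", class_="card featured")]

# A single class name matches every element that has it among its classes.
single = [tag.get_text() for tag in soup.find_all("div", class_="card")]

print(both, exact, single)
```

The CSS selector form is usually the more robust choice when the order of class names in the markup can vary.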
2 votes
2 answers
105 views

Using Beautiful Soup (or maybe some other library) to scrape data from an .aspx webpage containing checkboxes

I was wondering if someone could help me out with a web scraping problem. I am new to both Python and web scraping. I am trying to use the Python program below to scrape data from the following ...
new_coder
