32,838 questions
-3
votes
0
answers
88
views
How to download product images by SKU from a website using Python? [closed]
I have an Excel (or CSV) file with a list of product SKUs (like 1806B, 1911B, HR2470, etc.), and I would like to write a Python script that does the following:
Use each SKU to search for the product ...
-6
votes
0
answers
212
views
How can I extract the caption/description from a single Instagram post URL in Python? [closed]
I'm trying to write a simple program that can take one Instagram post link and extract the caption/description of that particular post.
For example, given a link like:
https://www.instagram.com/p/...
3
votes
1
answer
45
views
Beautiful Soup; splitting a paragraph only by <br> where stripped_strings is not working
I'm rather new to using Beautiful Soup and I'm having some issues splitting some HTML correctly by looking only at HTML line breaks and ignoring other HTML elements, such as changes in font color.
The ...
1
vote
1
answer
137
views
Trouble scraping dynamic lottery results table - inconsistent parsing
I've been trying to scrape lottery results from a website that shows draws. The data is presented in a results table, but I keep running into strange issues where sometimes the numbers are captured ...
-8
votes
0
answers
85
views
Python 3.9: export all physical addresses from https://www.sappi.com/en-gb/about-us/locations to MS Excel [closed]
I want to export all physical addresses from this URL [https://www.sappi.com/en-gb/about-us/locations] to MS Excel, but my code produces no output.
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = ...
0
votes
2
answers
57
views
Get an attribute's value by another attribute with BeautifulSoup
I want to parse HTML like the following with Beautiful Soup:
.
.
<meta property="og:image" content="https://test.com/test.jp" />
<meta property="og:description" ...
-2
votes
1
answer
58
views
How to use Beautiful Soup to find partial links [closed]
I have an eBay page in which I would like to formulate a list of all the item numbers on that page. I have executed and parsed the HTML content using requests and Beautiful Soup, but I can't figure ...
0
votes
2
answers
107
views
How to use index to find position of JSON record [closed]
Is there a better way than iteration using a for loop to find the index of the record?
My problem is that to use index I seem to need the index of the record I'm seeking.
import json
from bs4 import ...
4
votes
2
answers
273
views
How to reliably download 1969 "Gazzetta Ufficiale" PDFs (Italian Official Gazette) with Python?
I'm trying to programmatically download the full "pubblicazione completa non certificata" PDFs of the Italian Gazzetta Ufficiale - Serie Generale for 1969 (for an academic article). The site has a ...
-1
votes
0
answers
13
views
Requests does not return the Google search results page HTML properly [closed]
I am trying to extract HTML to use for BeautifulSoup from a search result page. This is my code:
import requests
from bs4 import BeautifulSoup
page_url = 'https://www.google.com/search?q=how+to+...
-1
votes
2
answers
50
views
Puppeteer can't access var doc in javascript
I am trying to scrape a web page using Puppeteer, but I can't access var doc, even though I can see it in the page source in my web browser.
var rows = [];
var i = 1;
/* while(i <= ...
0
votes
1
answer
158
views
How can I speed up my Selenium scraper using multiprocessing in Python? [closed]
I'm scraping a large list of URLs (1.2 million) using Selenium + BeautifulSoup with Python's multiprocessing.Pool. I want to scale it up to scrape faster, ideally without hitting system resource ...
-3
votes
2
answers
63
views
BeautifulSoup - reading multiple pages breaks after a random number of valid reads
I'm reading some data (book titles, etc.) from a number of pages.
Python 3.10.13
BeautifulSoup 4.12.3
Code:
def scrapSite(URL):
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT ...
1
vote
1
answer
61
views
How to use BeautifulSoup find_all() to get a class with multiple classes?
I am not sure if the terminology "class with multiple classes" is correct but that is the best I can describe it.
import requests
from bs4 import BeautifulSoup
url="https://curiosa.io/...
2
votes
2
answers
105
views
Using beautiful soup (or maybe some other library) to scrape data from an .aspx webpage containing checkboxes
I was wondering if someone could help me out with a web scraping problem. I am new to both Python and web scraping.
I am trying to use the below python program to scrape data from the following ...