Newest 'beautifulsoup' Questions

-3 votes

0 answers

88 views

How to download product images by SKU from a website using Python? [closed]

I have an Excel (or CSV) file with a list of product SKUs (like 1806B, 1911B, HR2470, etc.), and I would like to write a Python script that does the following: Use each SKU to search for the product ...

Bence Szabó

1

asked 17 hours ago

-6 votes

0 answers

212 views

How can I extract the caption/description from a single Instagram post URL in Python? [closed]

I’m trying to write a simple program that can take one Instagram post link and extract the caption/description of that particular post. For example, given a link like: https://www.instagram.com/p/...

ajay srinivas

11

asked yesterday

3 votes

1 answer

45 views

Beautiful Soup; splitting a paragraph only by <br> where stripped_strings is not working

I'm rather new to using Beautiful Soup and I'm having some issues splitting some HTML correctly by looking only at HTML line breaks and ignoring other HTML elements such as changes in font color etc. The ...

James Brian

33

asked Aug 30 at 17:29

1 vote

1 answer

137 views

Trouble scraping dynamic lottery results table – inconsistent parsing

I’ve been trying to scrape lottery results from a website that shows draws. The data is presented in a results table, but I keep running into strange issues where sometimes the numbers are captured ...

Zuryab

11

asked Aug 27 at 10:50

-8 votes

0 answers

85 views

Python 3.9: get all physical addresses from the URL https://www.sappi.com/en-gb/about-us/locations into MS Excel [closed]

Get all physical addresses from this URL [https://www.sappi.com/en-gb/about-us/locations] into MS Excel. The code produces no output. import requests from bs4 import BeautifulSoup import pandas as pd url = ...

James

39

asked Aug 26 at 10:42

0 votes

2 answers

57 views

Get an attribute's value by another attribute with BeautifulSoup

I want to parse HTML like the below with Beautiful Soup: <meta property="og:image" content="https://test.com/test.jp" /> <meta property="og:description" ...

whitebear

12.6k

asked Aug 25 at 4:44
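
A lookup like the one this question asks about can be sketched by filtering on one attribute and reading another. This is a minimal sketch under stated assumptions: the og:image tag is copied from the excerpt, the og:description content value is invented because the excerpt is truncated, and meta_content is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# The og:image tag comes from the question; the og:description content
# is a placeholder assumption, since the excerpt cuts off before it.
html = """
<meta property="og:image" content="https://test.com/test.jp" />
<meta property="og:description" content="placeholder description" />
"""

soup = BeautifulSoup(html, "html.parser")


def meta_content(soup, prop):
    """Return the content attribute of the <meta> tag whose property equals prop."""
    tag = soup.find("meta", attrs={"property": prop})
    # find() returns None when nothing matches, so guard before reading attributes.
    return tag.get("content") if tag is not None else None


image_url = meta_content(soup, "og:image")
missing = meta_content(soup, "og:title")
print(image_url, missing)
```

Passing the filter through attrs avoids clashes with find() keyword arguments and works for any attribute name.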

-2 votes

1 answer

58 views

How to use Beautiful Soup to find partial links [closed]

I have an eBay page in which I would like to formulate a list of all the item numbers on that page. I have executed and parsed the HTML content using requests and Beautiful Soup, but I can't figure ...

Travis Ward

1

asked Aug 5 at 13:54

0 votes

2 answers

107 views

How to use index to find position of JSON record [closed]

Is there a better way than iteration using a for loop to find the index of the record? My problem is that to use index I seem to need the index of the record I'm seeking. import json from bs4 import ...

Peter Hill

53

asked Aug 2 at 17:50

4 votes

2 answers

273 views

How to reliably download 1969 “Gazzetta Ufficiale” PDFs (Italian Official Gazette) with Python?

I’m trying to programmatically download the full “pubblicazione completa non certificata” PDFs of the Italian Gazzetta Ufficiale – Serie Generale for 1969 (for an academic article). The site has a ...

Mark

1,801

asked Aug 1 at 13:13

-1 votes

0 answers

13 views

Requests cannot give google search result page HTML properly [closed]

I am trying to extract HTML to use for BeautifulSoup from a search result page. This is my code: import requests from bs4 import BeautifulSoup page_url = 'https://www.google.com/search?q=how+to+...

Aadvik

1,304

asked Jul 29 at 18:00

-1 votes

2 answers

50 views

Puppeteer can't access var doc in javascript

I am trying to scrape a web page using Puppeteer; however, I can't access var doc with Puppeteer, although I can see it in the page source in my web browser: var rows = []; var i = 1; /* while(i <= ...

M.M. CAN

1

asked Jul 15 at 16:17

0 votes

1 answer

158 views

How can I speed up my Selenium scraper using multiprocessing in Python? [closed]

I'm scraping a large list of URLs (1.2 million) using Selenium + BeautifulSoup with Python's multiprocessing.Pool. I want to scale it up to scrape faster, ideally without hitting system resource ...

SolidOpt

113

asked Jul 10 at 6:52

-3 votes

2 answers

63 views

BeautifulSoup - reading multiple pages breaks after random valid reads

I'm reading some data about books (title etc.) from a number of pages. Python 3.10.13 BeautifulSoup 4.12.3 Code: def scrapSite(URL): headers = {"User-Agent": "Mozilla/5.0 (Windows NT ...

Error Replicator

288

asked Jun 20 at 21:13

1 vote

1 answer

61 views

How to use BeautifulSoup find_all() to get a class with multiple classes?

I am not sure if the terminology "class with multiple classes" is correct but that is the best I can describe it. import requests from bs4 import BeautifulSoup url="https://curiosa.io/...

user30854646

39

asked Jun 20 at 20:57
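
To illustrate the distinction this question is circling, here is a minimal sketch comparing a CSS selector with find_all() on a multi-class element. The class names card and featured and the sample markup are assumptions; the page behind the truncated URL is not used.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with made-up class names.
html = """
<div class="card featured">A</div>
<div class="featured card">B</div>
<div class="card">C</div>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS selector: matches elements that carry both classes, in any order.
both_text = [tag.get_text() for tag in soup.select("div.card.featured")]

# find_all with a space-separated string: matches only that exact class string.
exact_text = [tag.get_text() for tag in soup.find_all("div", class_="card featured")]

# find_all with a single class: matches every element that has that class.
any_text = [tag.get_text() for tag in soup.find_all("div", class_="card")]

print(both_text, exact_text, any_text)
```

When the order of class names in the markup may vary, select() is usually the safer choice of the three.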

2 votes

2 answers

105 views

Using beautiful soup (or maybe some other library) to scrape data from an .aspx webpage containing checkboxes

I was wondering if someone could help me out with a web scraping problem. I am new to both Python and web scraping. I am trying to use the below Python program to scrape data from the following ...

new_coder

43

asked Jun 17 at 18:01
