Newest 'web-scraping' Questions

-3 votes

0 answers

46 views

How to download product images by SKU from a website using Python?

I have an Excel (or CSV) file with a list of product SKUs (like 1806B, 1911B, HR2470, etc.), and I would like to write a Python script that does the following: Use each SKU to search for the product ...

Bence Szabó

1

asked 6 hours ago
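
A minimal sketch of the first step described in the question above: reading the SKUs from a CSV file and turning each one into a search URL. It makes several assumptions that are not in the question. The column name `sku`, the helpers `read_skus` and `build_search_urls`, and the `example.com` URL pattern are all hypothetical. The real site's search URL is not given, and downloading the images is left out.

```python
import csv
import io
from urllib.parse import quote_plus

# Hypothetical search URL pattern; the real site's pattern is not given in the question.
SEARCH_URL_TEMPLATE = "https://example.com/search?q={sku}"


def read_skus(csv_file, column="sku"):
    """Return the non-empty SKUs from a CSV file object, in order, without duplicates.

    The column name "sku" is an assumption about the file layout.
    """
    reader = csv.DictReader(csv_file)
    seen = set()
    skus = []
    for row in reader:
        sku = (row.get(column) or "").strip()
        if sku and sku not in seen:
            seen.add(sku)
            skus.append(sku)
    return skus


def build_search_urls(skus, template=SEARCH_URL_TEMPLATE):
    """Map each SKU to a URL-encoded search URL built from the template."""
    return {sku: template.format(sku=quote_plus(sku)) for sku in skus}


# In-memory sample standing in for the real CSV file; SKUs taken from the question.
sample = io.StringIO("sku\n1806B\n1911B\nHR2470\n1806B\n")
urls = build_search_urls(read_skus(sample))
for sku, url in urls.items():
    print(sku, url)
```

For an actual file, `open("skus.csv", newline="")` could be passed in place of the in-memory sample. An Excel file would need a separate reader.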

-6 votes

0 answers

183 views

How can I extract the caption/description from a single Instagram post URL in Python? [closed]

I’m trying to write a simple program that can take one Instagram post link and extract the caption/description of that particular post. For example, given a link like: https://www.instagram.com/p/...

ajay srinivas

11

asked yesterday

-4 votes

0 answers

61 views

How to decode utf-8 text from newspaper3k library

class ArticleScraper:
    def __init__(self):
        pass

    def articleScraper(self, article_links):
        article_content = []
        for url in article_links:
            url_i = ...

NYT_ SKY

21

asked 2 days ago

-8 votes

0 answers

84 views

Beginner in Python and Web Scraping – Looking for Feedback on My Script [closed]

I’m a software engineering student currently doing an internship in the Business Intelligence area at a university. As part of a project, I decided to create a script that scrapes job postings from a ...

Dillan Real

1

asked Aug 28 at 22:36

1 vote

1 answer

137 views

Trouble scraping dynamic lottery results table – inconsistent parsing

I’ve been trying to scrape lottery results from a website that shows draws. The data is presented in a results table, but I keep running into strange issues where sometimes the numbers are captured ...

Zuryab

11

asked Aug 27 at 10:50

-8 votes

0 answers

85 views

Python 3.9: get all physical addresses from the URL https://www.sappi.com/en-gb/about-us/locations into MS Excel [closed]

Get all physical addresses from this URL [https://www.sappi.com/en-gb/about-us/locations] into MS Excel. The code produces no output. import requests from bs4 import BeautifulSoup import pandas as pd url = ...

James

39

asked Aug 26 at 10:42

2 votes

1 answer

152 views

Extract tables from website with dynamic content with R

I'm trying to extract tables from this site: https://www.dnb.com/business-directory/company-information.beverage_manufacturing.br.html As you can see, the complete table has 14,387 rows and each page ...

Alejandro Carrera

581

asked Aug 25 at 1:22

0 votes

0 answers

52 views

Disable assignment of window.location in Selenium

I'm trying to extract data from a website using Selenium. On random occasions, the page will do a client-side redirect with window.location. How can I disable this? I've tried redefining the property ...

anon

699

asked Aug 23 at 21:02

-4 votes

0 answers

81 views

How to fetch real-time updates from an API without CDN-induced delays? [closed]

I’m building a service that monitors announcements from Upbit. The main announcements page is here: https://upbit.com/service_center/notice That page fetches its data from this API endpoint: https://...

akshdyxjsbsgzhbssh

1

asked Aug 23 at 12:42

1 vote

1 answer

51 views

Firecrawl self-hosted crawler throws Connection violated security rules error

I set up a self-hosted Firecrawl instance and I want to crawl my internal intranet site (e.g. https://intranet.xxx.gov.tr/). I can access the site directly both from the host machine and from inside ...

birdalugur

305

asked Aug 22 at 13:47

0 votes

1 answer

92 views

Python Selenium find nested element [closed]

On this page I want to parse a few elements. I would like to get the text in the circles and sometimes use an attribute value to click. That code does not return anything. With this code I want to get all attribute ...

Rok Golob

19

asked Aug 22 at 6:57

2 votes

1 answer

79 views

How to disable selenium logs AND run the browser in headless mode

This is my code as of now:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.service import Service

options = webdriver....

Ahmad

139

asked Aug 19 at 19:16

1 vote

3 answers

132 views

How to scrape a website that has <span class="ellipsis">…</span> in between numbers on a dynamic table with Selenium in Python

I am trying to scrape dividend data for the stock "Vale" on the site https://investidor10.com.br/acoes/vale3/. The dividend table has 8 buttons (1, 2, 3, ..., 8) and "Next" and &...

user30126350

35

asked Aug 14 at 10:04

0 votes

1 answer

108 views

Pytube consistently fails with HTTP Error 400: Bad Request also on latest version

I am trying to use pytube (v15.0.0) to fetch the titles of YouTube videos. However, for every video I try, my script fails with the same error: HTTP Error 400: Bad Request. I have already updated ...

Rohit Hake

1

asked Aug 14 at 9:42

0 votes

0 answers

92 views

m3u8 HLS URL video not playing with hls.js and Art Player

I have a Node scraper that scrapes the HLS streaming URL using a Playwright browser, which gives the master playlist like: https://example.com/master.m3u8 Then that master playlist does have a CORS ...

Alsiro Mira

23

asked Aug 7 at 14:56
