
I am trying to scrape dividend data for the stock "Vale" from https://investidor10.com.br/acoes/vale3/. The dividend table is paginated with 8 numbered buttons (1, 2, 3, ..., 8) plus "Next" and "Previous" buttons. My script scrapes the first 5 pages fine, but when it clicks the button with idx="5" it jumps straight to idx="8", so the data from the 6th, 7th, and 8th pages is missed.

Despite trying everything I found on YouTube, Reddit, and Google, I still cannot fix the issue.

Here is the code I am using:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from time import sleep
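# driver creation and navigation to the page are done elsewhere (not shown in this excerpt)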

def iterar_botao():
    botoes = driver.find_elements(By.CSS_SELECTOR, "a[data-dt-idx]")
    qtd_botoes = len(botoes)
    
    for i in range(qtd_botoes):
        clicar_botao(str(i+1))

def clicar_botao(idx):
    try:
        localizador = (By.CSS_SELECTOR, f'a[data-dt-idx="{idx}"]')
        botao = WebDriverWait(driver, 10).until(EC.presence_of_element_located(localizador))
        
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        sleep(1)
        driver.execute_script("arguments[0].scrollIntoView({behavior:'instant', block:'center' });", botao)
        driver.execute_script("arguments[0].click();", botao)
        
        WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "table-dividends-history")))
        pegar_tabelas()  # Function to scrape the tables (not shown here)
    except Exception as e:
        print(f"Failed to execute function: {e}")
The error output:

Failed to execute function: Message: RemoteError@chrome://remote/content/shared/RemoteError.sys.mjs:8:8
WebDriverError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:199:5
NoSuchElementError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:552:5

I tried adding waits and sleeps to ensure elements are properly loaded.

I've tried debugging by printing the button idx values before clicking.

I checked if the NoSuchElementError was caused by a wrong element locator, but the button exists on the page.
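
For illustration, a small helper along these lines (hypothetical; it assumes driver is already on the page) shows which data-dt-idx values and button labels are actually rendered before each click:

def listar_idx():
    # Hypothetical debugging helper: print the data-dt-idx value and label of
    # every pagination button currently rendered.
    botoes = driver.find_elements(By.CSS_SELECTOR, "a[data-dt-idx]")
    print([(b.get_attribute("data-dt-idx"), b.text) for b in botoes])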

3 Answers


Here's how you can get the desired results, following these steps:

  1. Navigate to the target page, https://investidor10.com.br/acoes/vale3/

  2. Wait for the dividends section to be present

    • Use an explicit wait for #dividends-section to ensure the dividends area is rendered before proceeding.

  3. Locate the table wrapper and bring it into view

  4. Capture the table element and its headers

  5. Extract the Page 1 data

    • Call processing(table) to scrape all visible rows on the first page into row_list.

  6. Paginate through the table. Inside the loop:

      • Try to locate the next pagination button with the CSS selector #table-dividends-history_paginate > a.paginate_button.next

      • Attempt to click it. If an ElementClickInterceptedException occurs (e.g., an overlay or animation), silently retry on the next iteration.

      • After a successful click, wait 1 second for the next page to load, then call processing(table) again to append the new rows.

      • If the next button is not found (NoSuchElementException), stop the loop: the last page has been processed.

  7. Assemble the final table

    • Build a pandas.DataFrame from row_list using table_header_list as the column names, and print the resulting table, which now contains the concatenated rows from all pages.

And here's the implementation code:

import time
import pandas as pd
from selenium.webdriver import Chrome, ChromeOptions
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from selenium.common.exceptions import NoSuchElementException, ElementClickInterceptedException

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

row_list = []


def processing(tbl):
    # Collect the text of every cell on the currently visible page of the table.
    table_rows = tbl.find_elements(By.CSS_SELECTOR, "div.dataTables_scrollBody>table>tbody>tr")
    for row in table_rows:
        row_list.append([d.text for d in row.find_elements(By.TAG_NAME, 'td')])


options = ChromeOptions()
options.add_argument("--start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)

driver = Chrome(options=options)
wait = WebDriverWait(driver, 10)

url = "https://investidor10.com.br/acoes/vale3/"
driver.get(url)

wait.until(EC.visibility_of_element_located((By.ID, "dividends-section")))

dividend_table_container = driver.find_element(By.ID, "table-dividends-history_wrapper")
driver.execute_script("arguments[0].scrollIntoView(true);", dividend_table_container)
table = dividend_table_container.find_element(By.CSS_SELECTOR, "div.dataTables_scroll")
table_header_list = table.find_element(By.CSS_SELECTOR, "div.dataTables_scrollHead").text.split('\n')
print(f"Table Header {table_header_list}")
print(f"Extracting Page 1...")

processing(table)
page_num = 2
NEXT_PAGE_AVAILABLE = True
while NEXT_PAGE_AVAILABLE:
    try:
        # On the last page the button's class changes (it is no longer exactly
        # "paginate_button next"), so this exact-match lookup raises
        # NoSuchElementException and ends the loop.
        next_page = dividend_table_container.find_element(By.CSS_SELECTOR, '#table-dividends-history_paginate>a[class="paginate_button next"]')
        try:
            next_page.click()
            time.sleep(1)  # give DataTables a moment to render the next page
            print(f"Extracting Page {page_num}...")
            processing(table)
            page_num += 1
        except ElementClickInterceptedException:
            # Something (an overlay or animation) blocked the click; retry on the next iteration.
            pass
    except NoSuchElementException:
        print("Reached End Page")
        NEXT_PAGE_AVAILABLE = False


# show the table
df = pd.DataFrame(row_list, columns=table_header_list)
print(df)

Output:

Table Header ['TIPO', 'DATA COM', 'PAGAMENTO', 'VALOR']
Extracting Page 1...
Extracting Page 2...
Extracting Page 3...
Extracting Page 4...
Extracting Page 5...
Extracting Page 6...
Extracting Page 7...
Extracting Page 8...
Reached End Page

           TIPO    DATA COM   PAGAMENTO       VALOR
0          JSCP  12/08/2025  03/09/2025  1,89538700
1    Dividendos  07/03/2025  14/03/2025  2,14184748
2          JSCP  11/12/2024  14/03/2025  0,52053100
3          JSCP  02/08/2024  04/09/2024  2,09379814
4    Dividendos  11/03/2024  19/03/2024  2,73854837
5    Dividendos  21/11/2023  01/12/2023  1,56589100
6          JSCP  21/11/2023  01/12/2023  0,76577076
7          JSCP  11/08/2023  01/09/2023  1,91847180
8    Dividendos  13/03/2023  22/03/2023  1,82764600
9          JSCP  12/12/2022  22/03/2023  0,29201200
10         JSCP  11/08/2022  01/09/2022  1,53937600
11   Dividendos  11/08/2022  01/09/2022  2,03268000
12   Dividendos  08/03/2022  16/03/2022  3,71925600
13   Dividendos  22/09/2021  30/09/2021  8,19723900
14   Dividendos  23/06/2021  30/06/2021  1,47340202
15   Dividendos  23/06/2021  30/06/2021  0,71626805
16         JSCP  04/03/2021  15/03/2021  0,83573600
17   Dividendos  04/03/2021  15/03/2021  3,42591000
18         JSCP  21/09/2020  30/09/2020  0,99734400
19   Dividendos  21/09/2020  30/09/2020  1,41016600
20         JSCP  26/12/2019  26/12/2019  1,41436400
21         JSCP  02/08/2018  20/09/2018  1,30861400
22   Dividendos  02/08/2018  20/09/2018  0,17174700
23         JSCP  06/03/2018  15/03/2018  0,48851100
24         JSCP  21/12/2017  15/03/2018  0,41991200
25         JSCP  20/04/2017  28/04/2017  0,90557100
26         JSCP  01/12/2016  16/12/2016  0,16630000
27   Dividendos  15/10/2015  30/10/2015  0,37360000
28         JSCP  14/04/2015  30/04/2015  0,60180000
29         JSCP  16/10/2014  31/10/2014  0,65080000
30   Dividendos  16/10/2014  31/10/2014  0,34000000
31         JSCP  14/04/2014  30/04/2014  0,89890000
32   Dividendos  17/10/2013  31/10/2013  0,12060000
33         JSCP  17/10/2013  31/10/2013  0,82370000
34   Dividendos  16/04/2013  30/04/2013  0,15360000
35         JSCP  16/04/2013  30/04/2013  0,71040000
36         JSCP  16/10/2012  31/10/2012  0,52590000
37   Dividendos  16/10/2012  31/10/2012  0,66070000
38         JSCP  13/04/2012  30/04/2012  1,07530000
39         JSCP  14/10/2011  31/10/2011  0,63430000
40   Dividendos  14/10/2011  31/10/2011  0,38930000
41   Dividendos  11/08/2011  26/08/2011  0,93340000
42         JSCP  13/04/2011  29/04/2011  0,60820000
43         JSCP  14/01/2011  31/01/2011  0,32000000
44         JSCP  14/10/2010  29/10/2010  0,55520000
45         JSCP  14/04/2010  30/04/2010  0,42170000
46         JSCP  15/10/2009  30/10/2009  0,49200000
47   Dividendos  15/04/2009  30/04/2009  0,52460000
48   Dividendos  16/10/2008  31/10/2008  0,13850000
49         JSCP  16/10/2008  31/10/2008  0,51470000
50         JSCP  10/04/2008  30/04/2008  0,23810000
51   Dividendos  10/04/2008  30/04/2008  0,19850000
52         JSCP  18/10/2007  31/10/2007  0,38190000
53   Dividendos  18/10/2007  31/10/2007  0,01220000
54         JSCP  17/04/2007  30/04/2007  0,25730000
...
71         JSCP  28/12/1999  01/03/2000  1,17000000
72         JSCP  06/08/1999  20/08/1999  1,11000000
73  Bonificação  18/04/1997  18/04/1997  1,00000000
  • "Wait for 1 second"? Unreliable. How can you know that the revised data will be available within that timeframe? That's a rhetorical question, because you simply cannot know. You need a different, more reliable, more efficient approach. Commented Aug 15 at 6:18

If you plan to use the requests module to get it done, it would be as simple as this.

import requests
import pandas as pd
from io import StringIO

link = 'https://investidor10.com.br/acoes/vale3/'

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36'
    res = s.get(link)
    df = pd.read_html(StringIO(res.text), attrs={"id": "table-dividends-history"})[0]              
    df = df.loc[:, ["Tipo", "Data COM", "Pagamento", "Valor"]]
    df.to_csv("investidor.csv", index=False, encoding="utf-8-sig")
    print(df)
  • How does this handle the issue in the OP regarding paging? Commented Aug 16 at 7:13
  • With all this strenuous effort, the OP ultimately just ends up scraping data from that site; my two-liner achieves the same goal with a much simpler approach, if he is willing to use it. Commented Aug 16 at 16:23
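
For the record, a quick way to confirm that the single request really covers every page (assuming, as this answer implies, that the site embeds the full dividend history in the initial HTML and DataTables only paginates it client-side) is to compare the row count against the Selenium run:

import requests
import pandas as pd
from io import StringIO

link = 'https://investidor10.com.br/acoes/vale3/'
ua = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
      '(KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36')

res = requests.get(link, headers={'User-Agent': ua})
df = pd.read_html(StringIO(res.text), attrs={"id": "table-dividends-history"})[0]

# If the full history is embedded in the page, this should match the total
# number of rows collected by paginating with Selenium (74 in the run above).
print(len(df))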

You can look for the "next page" element (labelled 'Próximo') by its ID, namely 'table-dividends-history_next'.

You need to check whether the element has been disabled. If it has, you've already processed the last page.

The challenge here is to know when the table content has been refreshed after clicking 'Próximo'.

You can do this by keeping the previous page's table content: if the freshly read content doesn't differ from it, the table hasn't updated yet, so just keep re-reading it until it changes.

In this example, all 4 columns of the dividends table are printed.

from collections.abc import Iterator
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver import Chrome
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.remote.webelement import WebElement


URL = "https://investidor10.com.br/acoes/vale3/"
DISABLED = "disabled"

# pylint: disable=possibly-used-before-assignment


def next_page() -> bool:
    """
    Find the "next page" element (labelled 'Prรณximo')
    If it's disabled, return False
    Otherwise, click it and return True
    """
    wait = WebDriverWait(DRIVER, 5)
    ec = EC.element_to_be_clickable
    loc = (By.ID, "table-dividends-history_next")
    nb = wait.until(ec(loc))
    # A missing class attribute is treated the same as "disabled".
    if DISABLED in (nb.get_attribute("class") or DISABLED):
        return False
    ActionChains(DRIVER).move_to_element(nb).click().perform()
    return True


def fourwise(tds: list[WebElement]) -> Iterator[list[str]]:
    """
    Yield the text of the td elements in groups of four (one group per table row)
    """
    assert len(tds) % 4 == 0
    for i in range(0, len(tds), 4):
        yield [td.text for td in tds[i : i + 4]]


def get_table_data() -> list[list[str]]:
    """
    Parse the dividends history table and return all of its content
    """
    wait = WebDriverWait(DRIVER, 10)
    ec = EC.visibility_of_all_elements_located
    loc = (By.CSS_SELECTOR, "#table-dividends-history tbody tr > td")
    tds = list(wait.until(ec(loc)))
    return list(fourwise(tds))


def process() -> None:
    """
    Main processing loop
    """
    DRIVER.get(URL)
    previous: list[list[str]] = []
    while True:
        # do this repeatedly while the content remains the same as on the previous iteration
        tdata = get_table_data()
        if tdata != previous:
            for row in tdata:
                print(*row, sep=", ")
            previous = tdata
            if not next_page():
                break


if __name__ == "__main__":
    with Chrome() as DRIVER:
        process()

Output (partial):

JSCP 12/08/2025 03/09/2025 1,89538700
Dividendos 07/03/2025 14/03/2025 2,14184748
JSCP 11/12/2024 14/03/2025 0,52053100
JSCP 02/08/2024 04/09/2024 2,09379814
Dividendos 11/03/2024 19/03/2024 2,73854837
Dividendos 21/11/2023 01/12/2023 1,56589100
JSCP 21/11/2023 01/12/2023 0,76577076
JSCP 11/08/2023 01/09/2023 1,91847180
Dividendos 13/03/2023 22/03/2023 1,82764600
JSCP 12/12/2022 22/03/2023 0,29201200
JSCP 11/08/2022 01/09/2022 1,53937600
Dividendos 11/08/2022 01/09/2022 2,03268000
...
Dividendos 14/10/2005 31/10/2005 0,89290000
JSCP 14/04/2005 29/04/2005 1,11000000
JSCP 13/10/2004 29/10/2004 1,03000000
Dividendos 13/10/2004 29/10/2004 0,24000000
JSCP 14/04/2004 30/04/2004 2,06000000
JSCP 15/10/2003 31/10/2003 1,48000000
JSCP 27/08/2003 31/10/2003 1,94000000
JSCP 16/04/2003 30/04/2003 1,62000000
JSCP 13/11/2002 10/12/2002 2,68000000
JSCP 28/12/2000 20/02/2001 3,33000000
JSCP 28/12/1999 01/03/2000 1,17000000
JSCP 06/08/1999 20/08/1999 1,11000000
Bonificação 18/04/1997 18/04/1997 1,00000000
