import newspaper
import urllib.parse

class ArticleScraper:
    def __init__(self):
        pass

    def articleScraper(self, article_links):
        article_content = []
        for url in article_links:
            # Download and parse each article with newspaper3k
            article = newspaper.Article(url=url, language='en')
            article.download()
            article.parse()
            content = f"TITLE: {article.title} ARTICLES: {article.text}"
            print(urllib.parse.unquote(content))
            article_content.append(content)

        return "\n".join(article_content)

sol = ArticleScraper()
print(sol.articleScraper(list_of_urls))  # list_of_urls is defined elsewhere

This is my current code, and the problem I'm having is that when it outputs the content, the UTF-8 text doesn't come through correctly.

like this: (screenshot of the garbled output)

I've tried using urllib3 and bs4 as well, with no luck on urllib3; with bs4 the encoding and decoding works, but I wanted to use newspaper3k because it's more efficient for scraping.
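A quick way to tell whether the text was scraped correctly and only the display is broken is to write the result to a file opened explicitly as UTF-8 and inspect it in an editor. This is a minimal sketch assuming the ArticleScraper above; the file name scraped.txt is just an example:

    # Write the scraped text to a UTF-8 file instead of printing it.
    # If the characters look right in the file, the scraping itself is fine
    # and only the console is misrendering them.
    text = sol.articleScraper(list_of_urls)
    with open("scraped.txt", "w", encoding="utf-8") as f:
        f.write(text)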

3 Comments
  • Maybe it is not in UTF-8 (some pages still use other encodings), or maybe the problem is only your terminal, which may not use UTF-8. Better to show the URL of the page and create a minimal working example so we can check the problem. Commented Aug 31 at 12:25
  • You're right, it was a problem in my terminal: it couldn't process UTF-8. I tried it in a different interpreter and all the special characters showed. Commented Aug 31 at 12:53
  • Change default code page of Windows console to UTF-8 - Super User. Commented Aug 31 at 18:17
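Since the comments point to the console encoding rather than the scraping, one way to work around it from the Python side is to force the interpreter's standard output to UTF-8. A minimal sketch, assuming Python 3.7+ where sys.stdout.reconfigure() is available (setting the PYTHONIOENCODING=utf-8 environment variable, or running chcp 65001 in the Windows console, are alternatives):

    import sys

    # Force stdout to UTF-8 regardless of the console's default code page.
    # Put this before any print() calls that emit non-ASCII text.
    sys.stdout.reconfigure(encoding="utf-8", errors="replace")

    print("Café, naïve, 東京")  # special characters should now print intact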

