Issues in htmlparse Metro and Telegraaf

Issues to be fixed in htmlparse in the Telegraaf and Metro RSS scrapers:
- The Metro htmlparser also picks up some 'invisible' HTML that is not part of the main article text (possibly because CSS `display: none` is applied to it?).
- The Telegraaf htmlparser cannot parse some texts, because they are not included in the HTML but only load after a script runs on the website. Possible solution: `htmlsource` is a string that contains the text inside that script, e.g. `"articleBody": "HERE IS THE TEXT.","author":`, so the text could be extracted from there when the regular parse comes back empty:
```python
if text.strip() == "":
    logger.warning("Trying alternative method....")
    # parse the text from htmlsource
```
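A possible sketch of the Telegraaf fallback. It assumes the script embeds `articleBody` as a JSON-encoded string (as in the `"articleBody": "HERE IS THE TEXT.","author":` example), so a regex grabs the raw string and `json.loads` decodes escapes like `\"` and `\n`. The names `extract_article_body` and `get_text` are hypothetical, not existing functions in the scraper. If the script turns out to be a full JSON-LD block, parsing the whole object with `json.loads` would probably be more robust than a regex.

```python
import json
import logging
import re

logger = logging.getLogger(__name__)

# Captures the contents of "articleBody": "<JSON string>", including escaped
# characters such as \" or \n (assumption: the value is a JSON string literal).
ARTICLE_BODY_RE = re.compile(r'"articleBody"\s*:\s*"((?:[^"\\]|\\.)*)"', re.DOTALL)


def extract_article_body(htmlsource: str) -> str:
    """Hypothetical helper: return the decoded articleBody, or "" if not found."""
    match = ARTICLE_BODY_RE.search(htmlsource)
    if match is None:
        return ""
    # Re-wrap the captured contents in quotes so json.loads decodes the escapes;
    # strict=False tolerates raw control characters such as literal newlines.
    return json.loads('"' + match.group(1) + '"', strict=False)


def get_text(text: str, htmlsource: str) -> str:
    """Hypothetical wrapper: keep the parsed text, else fall back to articleBody."""
    if text.strip() == "":
        logger.warning("Trying alternative method....")
        text = extract_article_body(htmlsource)
    return text
```

For example, `extract_article_body('"articleBody": "HERE IS THE TEXT.","author":')` returns `HERE IS THE TEXT.`.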

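For the Metro issue, one way to test the `display: none` hypothesis is to skip text inside hidden elements while parsing. The sketch below uses the standard-library `html.parser` (the issue does not say which parser the scraper uses) and only detects inline `style="display: none"`, the `hidden` attribute, and `<script>`/`<style>` contents. Elements hidden through CSS classes in a stylesheet would not be caught, and it assumes hidden elements have matching end tags. `VisibleTextParser` and `visible_text` are hypothetical names.

```python
from html.parser import HTMLParser

# Elements that never get an end tag, so they must not change the depth counter.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


def _is_hidden(tag, attrs):
    """True for script/style, the `hidden` attribute, or inline display:none."""
    if tag in ("script", "style"):
        return True
    attr_map = dict(attrs)
    if "hidden" in attr_map:
        return True
    style = (attr_map.get("style") or "").replace(" ", "").lower()
    return "display:none" in style


class VisibleTextParser(HTMLParser):
    """Collects text that is not inside a hidden element."""

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0  # > 0 while inside a hidden subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        if self.hidden_depth > 0:
            self.hidden_depth += 1  # nested element inside a hidden subtree
        elif _is_hidden(tag, attrs):
            self.hidden_depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self.hidden_depth > 0:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if self.hidden_depth == 0:
            self.chunks.append(data)


def visible_text(html: str) -> str:
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left over from removed elements.
    return " ".join(" ".join(parser.chunks).split())
```

For example, `visible_text('<p>Hello <span style="display: none">ad</span>world</p>')` gives `Hello world`. If the Metro pages hide the extra HTML with a CSS class instead, the check in `_is_hidden` would need to match that class name.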
Issues in htmlparse Metro and Telegraaf #486