Closing tag </span> not found 209:76

Hello  author @dcollien , how can I solve the following problem: When I try to filter HTML content, I always get an error: tag </span>/</a>.... not found. The code is as follows:

import pandas as pd
import FilterHTML
import requests
from bs4 import BeautifulSoup

# 读取CSV文件
df = pd.read_csv('xxxx.csv')


attributes = df.columns.tolist()
html_text = df.iloc[0]['crawl_html']
URL = df.iloc[0]['url']

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1'
}

response = requests.get(
                URL, 
                headers=headers,
                timeout=10,  
                # proxies={"http": None, "https": None}
            )
response.raise_for_status()  
html_content = response.text

soup = BeautifulSoup(html_content, 'html5lib') 
fixed_html = str(soup)

try:


    whitelist = {
                'a': {
                    'href': 'url',
                    'target': [
                        '_blank',
                        '_self'
                    ],
                    'class': [
                        'button'
                    ]
                },
                'img': {
                    'src': 'url',
                    'width': 'measurement',
                    'height': 'measurement'
                },
                'span': {
                    'style': {
                        'color': 'color',
                        'background-color': 'color'
                    }
                }
            }
    filtered_html = FilterHTML.filter_html(fixed_html, spec=whitelist)
    print(filtered_html)

except Exception as e:
    raise Exception(f"下载网页失败: {e}")

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Closing tag </span> not found 209:76 #19

读取CSV文件

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Closing tag </span> not found 209:76 #19

Description

读取CSV文件

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions