Description
In table_annotator.py on line 632, the original column names are cleaned so that they can be matched against the ontologies of DBpedia and Schema. The cleaning is done by the code below:
cleaned_table_columns = [
    re.sub(r"[_-]", " ", " ".join(
        re.findall("[0-9,a-z,.,\"#!$%\^&\*;:{}=\-_`~()\n\t\d]+|[A-Z](?:[A-Z]*(?![a-z])|[a-z]*)", col)
    )).lower() for col in table_columns.copy()
]
I wonder whether the first " " inside the re.sub() call, currently a space, should be "", an empty string. The character class inside findall already matches _ and -, so when one of these separators sits next to a capitalised word it comes back as its own token, and " ".join() then puts a space on either side of it. The separator itself is still in the string at that point, so re.sub(r"[_-]", " ", ...) replaces it with yet another space.
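To make the intermediate values visible, here is a minimal sketch that splits the expression into its three stages. The helper name trace_cleaning and the constant TOKEN_PATTERN are hypothetical and not part of table_annotator.py; the pattern is the one from the snippet, rewritten as a raw string, which I believe is equivalent.

```python
import re

# Pattern copied from the snippet, written as a raw string so that the
# backslashes reach the regex engine unchanged (assumed equivalent).
TOKEN_PATTERN = r'[0-9,a-z,.,"#!$%\^&\*;:{}=\-_`~()\n\t\d]+|[A-Z](?:[A-Z]*(?![a-z])|[a-z]*)'


def trace_cleaning(col):
    """Hypothetical helper: return each intermediate value of the cleaning step."""
    tokens = re.findall(TOKEN_PATTERN, col)      # stage 1: split into tokens
    joined = " ".join(tokens)                    # stage 2: join tokens with spaces
    cleaned = re.sub(r"[_-]", " ", joined).lower()  # stage 3: separators -> space
    return tokens, joined, cleaned


tokens, joined, cleaned = trace_cleaning("First_Name")
print(tokens)         # ['First', '_', 'Name']
print(repr(joined))   # 'First _ Name'
print(repr(cleaned))  # 'first   name'
```

Note that an all-lowercase name such as "first_name" behaves differently: the whole name is matched as a single token, so the join adds nothing and the substitution alone yields "first name".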
For example:
"Team-Name" would be converted into "team   name", with three spaces between 'team' and 'name': one on each side of the '-' from the join, plus one from the substitution. Is this the desired behaviour, or am I missing something? Or is this a bug?
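For comparison, here is a small sketch that runs the same expression with both replacement strings, plus one possible alternative that collapses runs of whitespace afterwards. The function clean_column and its parameters are hypothetical, not the project's API, and the whitespace collapse is only a suggestion. When I run the expression, the gap for "Team-Name" comes out three spaces wide with " " and two spaces wide with "", while "" glues an all-lowercase name such as "team_name" together.

```python
import re

# Pattern copied from the snippet, written as a raw string (assumed equivalent).
TOKEN_PATTERN = r'[0-9,a-z,.,"#!$%\^&\*;:{}=\-_`~()\n\t\d]+|[A-Z](?:[A-Z]*(?![a-z])|[a-z]*)'


def clean_column(col, replacement=" ", collapse=False):
    """Hypothetical variant of the cleaning step with a configurable replacement."""
    joined = " ".join(re.findall(TOKEN_PATTERN, col))
    cleaned = re.sub(r"[_-]", replacement, joined).lower()
    if collapse:
        # Squeeze any run of whitespace (including tabs/newlines) into one space.
        cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned


for name in ("Team-Name", "team_name"):
    print(
        repr(clean_column(name)),                 # current behaviour
        repr(clean_column(name, "")),             # proposed empty-string replacement
        repr(clean_column(name, collapse=True)),  # current behaviour + collapse
    )
# 'team   name' 'team  name' 'team name'
# 'team name' 'teamname' 'team name'
```

If single spaces are the goal, collapsing whitespace after the substitution seems safer than switching to "", since it handles both naming styles the same way.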