Description
Initial Checks
- I have searched existing issues for duplicates
- I have read the documentation
Summary
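Currently, the SetDiff operator is implemented as a left join of the two operands on their identifiers. When a combination of identifier values from the left operand is not present in the right operand, the right operand's measures and attributes appear as null in the joined result.
A filter is then applied that checks for null values in those columns, and the rows that contain them are treated as missing from the right operand.
The problem with this implementation is that null values already present in the right operand's data also pass the filter, so rows that exist in both operands are kept in the result when they should be removed.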
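The behaviour described above can be reproduced with a minimal pandas sketch. This is not the vtlengine source code: the function names `setdiff_null_filter` and `setdiff_indicator` are hypothetical, and the second function is only one possible way to avoid the problem, assuming a pandas-style merge is available.

```python
import pandas as pd

ids = ["Id_1"]
ds_1 = pd.DataFrame({"Id_1": [1, 2, 3], "Me_1": [1, 2, 3], "At_1": [1, 2, 3]})
# At_1 is entirely null in the right operand (pre-existing nulls)
ds_2 = pd.DataFrame({"Id_1": [3, 4, 5], "Me_1": [1, 2, 3], "At_1": [float("nan")] * 3})


def setdiff_null_filter(left, right, ids):
    """Hypothetical sketch of the described approach: left join, then keep rows with nulls."""
    merged = left.merge(right, on=ids, how="left", suffixes=("", "_right"))
    right_cols = [c for c in merged.columns if c.endswith("_right")]
    # A null in any right-hand column is taken to mean "no match in right"
    keep = merged[right_cols].isna().any(axis=1)
    return merged.loc[keep, list(left.columns)].reset_index(drop=True)


def setdiff_indicator(left, right, ids):
    """Hypothetical alternative: decide on identifier membership only, ignoring values."""
    right_ids = right[ids].drop_duplicates()
    merged = left.merge(right_ids, on=ids, how="left", indicator=True)
    keep = merged["_merge"] == "left_only"
    return merged.loc[keep, list(left.columns)].reset_index(drop=True)


print(setdiff_null_filter(ds_1, ds_2, ids)["Id_1"].tolist())  # [1, 2, 3]
print(setdiff_indicator(ds_1, ds_2, ids)["Id_1"].tolist())    # [1, 2]
```

In the first function, the row with `Id_1 = 3` is kept even though it exists in both operands, because its `At_1` value in the right operand is already null. The second function relies on the `_merge` indicator column, so null measures or attributes cannot affect the outcome.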
Reproducible Example
import pandas as pd
from vtlengine import run

script = "DS_r <- setdiff(DS_1, DS_2);"
data_structures = {
    "datasets": [
        {
            "name": "DS_1",
            "DataStructure": [
                {"name": "Id_1", "type": "Integer", "role": "Identifier", "nullable": False},
                {"name": "Me_1", "type": "Number", "role": "Measure", "nullable": True},
                {"name": "At_1", "type": "Number", "role": "Measure", "nullable": True},
            ],
        },
        {
            "name": "DS_2",
            "DataStructure": [
                {"name": "Id_1", "type": "Integer", "role": "Identifier", "nullable": False},
                {"name": "Me_1", "type": "Number", "role": "Measure", "nullable": True},
                {"name": "At_1", "type": "Number", "role": "Measure", "nullable": True},
            ],
        },
    ]
}
datapoints = {
    "DS_1": pd.DataFrame({"Id_1": [1, 2, 3], "Me_1": [1, 2, 3], "At_1": [1, 2, 3]}),
    # At_1 not defined, will be filled with nulls
    "DS_2": pd.DataFrame({"Id_1": [3, 4, 5], "Me_1": [1, 2, 3]}),
}
print(run(script=script, data_structures=data_structures, datapoints=datapoints))
vtlengine version
1.6.0
Python version
Any
OS
Any