I have been experimenting with the fraud_dataset_v2.csv dataset. Everything worked when I used the original dataset.
However, when I changed some of the values in the WAERS, BUKRS, KTOSL, PRCTR, BSCHL and HKONT fields to create a new dataset and then loaded the trained model, it threw an error about a different number of input data columns.
I think the issue is caused by one of the data fields having fewer categories than in the original data. After one-hot encoding, the new dataset ends up with fewer columns than the input data the model was trained on.
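Here is a minimal sketch of what I believe is happening. It assumes the encoding is done with something like pandas `get_dummies`; only the column name WAERS comes from the dataset, and the values are made up for illustration.

```python
import pandas as pd

# Hypothetical values; only the column name WAERS comes from the dataset.
train_df = pd.DataFrame({"WAERS": ["USD", "EUR", "GBP", "USD"]})
new_df = pd.DataFrame({"WAERS": ["USD", "EUR", "USD", "EUR"]})  # no "GBP" here

# One-hot encode each dataset independently, as happens when the
# new file is pre-processed on its own.
train_encoded = pd.get_dummies(train_df, columns=["WAERS"], dtype=int)
new_encoded = pd.get_dummies(new_df, columns=["WAERS"], dtype=int)

print(list(train_encoded.columns))  # three columns: EUR, GBP, USD
print(list(new_encoded.columns))    # two columns: EUR, USD
```

The new dataset produces one column fewer, so the input width no longer matches what the trained model expects.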
My questions:
If we want to evaluate abnormality on a new dataset with the same data fields / columns, is it necessary to re-train the model?
If not, do you have any pre-processing recommendations for fitting new input data to the trained model? For example, adjusting the one-hot encoding or using another pre-processing approach?
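To make the second question concrete, this is the kind of workaround I have in mind. It is only a sketch under my own assumptions (pandas `get_dummies` plus `reindex`, made-up values), not necessarily how the original pre-processing works: keep the encoded column list from training and force the new data into the same layout.

```python
import pandas as pd

# Hypothetical values; only the column name WAERS comes from the dataset.
train_df = pd.DataFrame({"WAERS": ["USD", "EUR", "GBP", "USD"]})
new_df = pd.DataFrame({"WAERS": ["USD", "EUR", "USD", "JPY"]})  # "JPY" unseen in training

# Column layout the trained model was fitted on (would need to be saved at training time).
train_columns = pd.get_dummies(train_df, columns=["WAERS"], dtype=int).columns

# Encode the new data, then align it to the training layout:
# missing categories become all-zero columns, unseen categories are dropped.
new_encoded = pd.get_dummies(new_df, columns=["WAERS"], dtype=int)
aligned = new_encoded.reindex(columns=train_columns, fill_value=0)

print(aligned.shape)  # same number of columns as the training input
```

With this approach a row with a category the model never saw (here "JPY") ends up as all zeros for that field, so I am not sure the resulting scores would be meaningful without re-training.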