Can I remove the BEV encoder to train and run inference directly on 2D images?

Hi, great work! I have a batch of 2D images in which I need to detect traffic elements such as lane markings and pedestrian crossings, so my dataset does not involve any BEV transformation. How should I modify your code to train it to detect map elements directly in 2D images? Is it enough to delete the BEV encoder related parts of the config?