RuntimeError: Given groups=1, weight of size [128, 2, 3, 3], expected input[1, 1, 1895, 66] to have 2 channels, but got 1 channels instead
This happens in the audio encoder's first Conv2d layer when preprocessing a video whose audio track is mono (a single channel): the layer's weights expect 2 input channels, but the input has only 1.
Possible fix: detect mono audio (channel dim == 1) and concatenate two copies of it along the channel dim to create fake stereo.
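A minimal sketch of that fix, assuming the audio features reach the encoder as a PyTorch tensor shaped [batch, channels, time, freq], as the input shape in the error suggests. The helper name `ensure_stereo` and its `channel_dim` parameter are hypothetical and not part of the existing code:

```python
import torch


def ensure_stereo(audio: torch.Tensor, channel_dim: int = 1) -> torch.Tensor:
    """Return a 2-channel tensor, duplicating the channel if the input is mono.

    Hypothetical helper: assumes the channel axis is `channel_dim`
    (1 for a [batch, channels, time, freq] layout).
    """
    if audio.shape[channel_dim] == 1:
        # Mono input: stack two copies along the channel dim (fake stereo).
        return torch.cat([audio, audio], dim=channel_dim)
    # Already multi-channel: leave untouched.
    return audio


# Same shape as the failing input in the error message.
mono = torch.randn(1, 1, 1895, 66)
stereo = ensure_stereo(mono)
print(stereo.shape)  # torch.Size([1, 2, 1895, 66])
```

Both channels carry identical data, so this only satisfies the shape the conv layer expects. Calling it just before the audio encoder (or at the end of preprocessing) should be enough, but the right place depends on where the channel dim is first fixed in the pipeline.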