There are at least two cases were this applies:
Potential solutions:
- Divide into subclassses and map benchmarks accordingly (e.g., for question answering: 'extractive question answering' and 'abstractive question' answering)
- Add question answering as a separate NLP category next to the existing hierarchy (natural language generation, natural language anaylsis...)