Currently we tokenize the issue body as a single free-text input. To see whether it improves the metrics, we could instead parse the distinct parts that make up a webcompat report and extract each one as a separate feature. Potential features:
- URL
- Browser
- Category
- Description
- Steps to reproduce
- Tested another browser
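
A rough sketch of what the parsing step could look like, assuming the report body marks each part with a bold `**Label**:` prefix. The label strings, the mapping of "Problem type" to the Category feature, and the names `FIELD_LABELS` and `parse_report` are all assumptions made for illustration and would need to be checked against real issue bodies:

```python
import re

# Hypothetical mapping from labels in the report body to feature names.
# The label text is an assumption; adjust it to match the real template.
FIELD_LABELS = {
    "URL": "url",
    "Browser / Version": "browser",
    "Problem type": "category",
    "Description": "description",
    "Steps to Reproduce": "steps_to_reproduce",
    "Tested Another Browser": "tested_another_browser",
}

# A line that starts with "**Some label**:" opens a new field.
LABEL_RE = re.compile(r"^\*\*(?P<label>[^*\n]+)\*\*:[ \t]*", re.MULTILINE)


def parse_report(body):
    """Split an issue body into one string per known field.

    Fields that are missing from the body come back as empty strings,
    and labels that are not in FIELD_LABELS are ignored.
    """
    fields = {name: "" for name in FIELD_LABELS.values()}
    matches = list(LABEL_RE.finditer(body))
    for i, match in enumerate(matches):
        # A field's value runs until the next label or the end of the body.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(body)
        key = FIELD_LABELS.get(match.group("label").strip())
        if key is not None:
            fields[key] = body[match.end():end].strip()
    return fields
```

Each value could then be tokenized on its own (or encoded as a categorical feature for Browser, Category and Tested another browser) instead of being concatenated into one text input. Reports that do not follow the template would produce empty fields, so falling back to the full body is probably still needed.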