I am just judging from the code, so I could be wrong:
ATM vbench doesn't distinguish between a run that failed because the build failed and one where the benchmarks themselves failed. Those cases are different: if a revision fails to build, we can blacklist that revision right away. With failing benchmarks it is less clear-cut: maybe all the currently provided benchmarks fail, but some benchmark added in the future would succeed (e.g. one specifically targeting an older API). So IMHO a revision should be blacklisted as a whole only if its build fails, and any additional blacklisting should be done per (rev, benchmark) pair.
Related: if I am reading it correctly, build failures are currently simply ignored. I am about to submit a PR with a small refactoring that doesn't even bother running benchmarks if the build failed, and then blacklists that revision.
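
To make the idea concrete, here is a rough sketch of the two-level blacklisting I have in mind. All names here (`Blacklist`, `run_revision`, the `build` and `run_benchmark` callables) are made up for illustration and are not vbench's actual API; I am also assuming that a failed build is reported as a false return value and a failed benchmark as an exception.

```python
class Blacklist:
    """Hypothetical two-level blacklist (not vbench's actual API)."""

    def __init__(self):
        self.revisions = set()  # revisions whose build failed
        self.pairs = set()      # (rev, benchmark) pairs that failed to run

    def skip(self, rev, benchmark):
        # Skip if the whole revision is unusable or this specific pair failed.
        return rev in self.revisions or (rev, benchmark) in self.pairs


def run_revision(rev, benchmarks, build, run_benchmark, blacklist):
    """Build `rev` once, then run every benchmark that is not blacklisted.

    `build(rev)` is assumed to return True on success; `run_benchmark(rev, name)`
    is assumed to return a timing or raise on failure.
    """
    if rev in blacklist.revisions:
        return {}
    if not build(rev):
        # Build failure: nothing can ever run on this revision.
        blacklist.revisions.add(rev)
        return {}

    results = {}
    for name in benchmarks:
        if blacklist.skip(rev, name):
            continue
        try:
            results[name] = run_benchmark(rev, name)
        except Exception:
            # Only this benchmark is blacklisted for this revision;
            # benchmarks added later still get a chance.
            blacklist.pairs.add((rev, name))
    return results
```

This way a revision that builds fine but fails every current benchmark is not written off entirely, so a benchmark added later still gets run against it.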