-
Notifications
You must be signed in to change notification settings - Fork 22
QuickCheck Strategies
Okay, what's a QuickCheck strategy?
Internally, qc uses several strategies to find counterexamples. There are a number of them built into the tool, which you can see with -help:
> qc -help
Usage: quickcheck [-?|-help][-q|-v|-n][-t <msecs>]
[-i <status>]* [-s <strategy>]* [-<strategy:option>]* [<PO numbers/ranges/patterns>]
-?|-help - show command help
-q|-v|-n - run with minimal or verbose output
-t <msecs> - timeout in milliseseconds
-i <status> - only show this result status
-s <strategy> - enable this strategy (below)
-<strategy:option> - pass option to strategy
PO# numbers - only process these POs
PO# - PO# - process a range of POs
<pattern> - process PO names or modules matching
Enabled strategies:
fixed [-fixed:file <file> | -fixed:create <file>][-fixed:size <size>]
search (no options)
finite [-finite:size <size>]
trivial (no options)
direct (no options)
reasons (no options)
constant (no options)
undefined (no options)
Disabled strategies (add with -s <name>):
random [-random:size <size>][-random:seed <seed>]
>
So that shows there are several enabled strategies and one disabled one. More strategies can be added as plugins.
Why would you ever disable a strategy?
You might choose to disable strategies that are very specialized or expensive to run, or if they more or less duplicate an enabled one. For example, the "random" strategy chooses moderately random numbers for a nat1; the "fixed" strategy chooses 1, 2, 3, ... Those are very similar, so the random one is disabled by default. If they were both enabled, it would still work but it would not be as quick.
So strategies generate values of particular types?
Yes, the strategies look at the forall and exists quantifiers in the PO, and try to produce a set of values for each binding, in the hope of finding value combinations that cause the PO to be false -- a counterexample.
And the "fixed" and "random" strategies do that by...?
By choosing fixed or random values for the basic types, like numbers and booleans and characters, and then composing those together to make larger types, like records, sets, sequences, maps and so on.
And that actually finds counterexamples? Just trying fixed values or random ones? I'm surprised, to be honest!
Well, our experience is that it does find counterexamples quite well. It may be that it tends to find very simple ones and perhaps more complex situations, where a counterexample requires particular values for many variables, would not be found. On the other hand, there is a large class of stupid mistakes that everyone makes, and these often involve simple things, like forgetting that a value can be zero or negative or empty etc. These things get caught by very simple strategies.
But it does catch "real" problems too, right? This isn't just a toy?
Yes, it has caught several subtle errors in specifications -- so subtle that we have to work quite hard to figure out what is actually wrong. Some of the strategies take a much more creative approach to finding counterexamples, so perhaps that helps.
Okay, so what does the "search" strategy do differently?
It takes a very different approach. Instead of generating lots of values for each forall binding, it searches the body of the PO expression, looking for boolean checks that it might be able to violate. So if the PO has x <> 0 somewhere, and x is a binding that it can give values for, it will try setting x to zero, naively. There are a dozen or more of these cases that it can violate easily, without complex analysis.
And those values just get added to some collection?
Yes. Each strategy adds its contributions to a pot and then all the values are used to test whether the PO produces a false value. If that happens, a counterexample has been found and the checking process stops.
Okay. What about "finite"?
That one works for finite types -- like set of bool or <A>|<B>|<C> that have a finite number of values. In these cases, if the type is small enough, we can enumerate all of the possible values, rather than selecting just a few as the other strategies do.
So that obviously sounds good, but you're saying it helps find counterexamples?
More than that. If the input variables are all finite types, and they are not too large, the finite strategy can generate every combination of every possible value. If none of them produce counterexamples, the PO is probably provable. That's very useful to know! So a atrategy can indicate this when it has finished its processing.
Okay. Is the "trivial" strategy as trivial as it sounds?
Pretty much, yes. This strategy takes advantage of the fact that some POs have a simple structure like <something> => <something>. You get this because specs often guard against <something> before doing something that requires that to be true. So the POs reflect that, but these are "trivially true".
And "direct"?
This strategy is slightly different in that it is not trying to discharge obligations themselves. Rather, it works out what the obligation is trying to verify, and tries to achieve the same effect "directly" -- by a different analysis of the specification itself.
Umm... you'd better give me an example.
Well, for example, total function obligations are trying to verify that functions declared to be total genuinely produce a result of the right type for every possible argument value. So the PO says, "for all arguments, applying the function with those argument produces a value of the right type". It is sometimes possible to prove that obligation, but it is also possible to directly show that the body of the function is a total function -- because it is comprised entirely of total operators and returns a well defined type, so it will always produce a result of the given type.
Okay, I see. So what do the extra strategy options do, like -fixed:size?
The size option typically sets the number of values for strategies to generate. Sizes usually default to a small number like 20, so that the tool is quick. But if you want to really chew on a particular PO, you could increase this to (say) 1000 values or more. For example:
functions
f: nat1 -> nat
f(i) == let s = [a | a in set {1,...,1000}] in s(i);
> qc -s fixed
PO #1, MAYBE in 0.074s
> qc -s fixed -fixed:size 2000
PO #1, FAILED in 1.289s
Counterexample:
f: sequence apply obligation in 'DEFAULT' (test.vdm) at line 3:52
(forall i:nat1 &
--> i = 1001
(let s : seq of nat1 = [a | a in set {1, ..., 1000}] in
i in set inds s))
--> returns false
>
So you see here that the 2000 value run took nearly 50x longer than the default run, but it successfully found a counterexample, whereas the default wasn't sure. So if the default settings don't find a counterexample, it may be worth trying a larger size like this, if you have time.
Hmmm, so using a larger size is more likely to find a counterexample, but it will take longer?
Exactly.
Why did you use -s fixed for that example though?
It's because, by default, there are other strategies that would be able to find the 1001 counterexample. By limiting it to the just fixed strategy, I could show you the problem.
Okay. And what are the -fixed:create and -fixed:file options for?
Remember the fixed strategy uses fixed values for each type, hence the name. It will use sensible fixed values by default, but it is possible to give your own fixed values as VDM-SL set expressions. This may help with very specialized cases, especially with complex types. You can probably imagine that an auto-generated record value, with a dozen fields, would be nonsense (albeit legal). So for cases like this, you can create a file with your own sensible values.
Ah! So that's what the -fixed:create does?
Yes. You can either give it a filename argument or use the default, which is "ranges.qc" in the current directory.
Can we try that?
Sure. Let's go back to the simple example we used, that had f(s, i) == s(i).
> qc -fixed:create
Created 2 default ranges in 'ranges.qc'.
Check them! Then run 'qc -s fixed'
>
$ cat ranges.qc
-- in 'DEFAULT' (test.vdm) at line 3:16
s:seq of nat = 20;
-- in 'DEFAULT' (test.vdm) at line 3:16
i:nat1 = 20;
$
Okay wait, I can see you've listed the "ranges.qc" file, but I was expecting to see VDM values?
But if you look at the -fixed:create output, it says it created two default values in the ranges.qc file, which you should check. If you just use a number like "20", it will generate 20 values internally. But you can use explicit VDM set expressions instead. So for example, I can change the file to be...
-- in 'DEFAULT' (test.vdm) at line 3:16
s:seq of nat = { [a, b, c] | a, b, c in set {1, ..., 10} };
-- in 'DEFAULT' (test.vdm) at line 3:16
i:nat1 = {1, ..., 100};
> qc
PO #1, FAILED in 0.002s
Counterexample:
f: sequence apply obligation in 'DEFAULT' (test.vdm) at line 3:16
(forall s:seq of nat, i:nat &
--> i = 0, s = [1, 1, 10]
i in set inds s)
--> returns false
>
Now you see the counterexample chosen comes from the three-valued sequences in ranges.qc. Note that the filename defaults to "ranges.qc". We could use -fixed:file to use a different file. So you could create a collection of specialized range files to focus on certain subsets of the POs.
Okay, so what about the "reasons" strategy?
This one is slightly different, again. It uses a heuristic to try to help you with MAYBE cases. It looks at the obligation, to see what variables it reasons about. If you haven't got earlier clauses (conditional tests for example) that check those variables, there's a good chance that something may be wrong. So it will point out that although (say) the obligation is that k in set dom m you haven't actually reasoned about k or m, which is suspicious.
But not definitely wrong.
That's right. It's a heuristic that shows you something, more like a "hint". As I said, it only does this when the result is MAYBE.
Okay and what about the "constant" strategy?
This one uses the fact that, often, specifications have constants in them, like MAX_WIDGETS or 256 or {1, ..., 1024}. These are presumably important in the main specification, so there is a good chance that values that are "close" to these constants may cause problems. For example, if you pass a size value that is one greater than MAX_WIDGETS, that might cause an issue if your specification doesn't expect that.
So the constants bias the strategy values somehow?
That's right. The strategy finds the constants used in the obligation and then checks whether any bindings are of the same type. If so, it "nudges" the constants up and down to see whether bindings with those values cause counterexamples.
And the "undefined" strategy?
The idea here is that VDM-SL specifications can define a state vector for variables, but if the definition of the state does not have an "init" clause, the values of the state variables will be undefined until they are given a value by an assignment in an operation.
So what happens if you use an undefined value?
Most of the time you will get an error. The handling of undefined values in VDM is quite complicated and the interpreter can only partially handle the semantics; producing an error is usually the safest way to handle them, though in the case of pure boolean logic, undefined values do produce predictable results.
So what does the strategy actually do?
If your spec is VDM-SL and there is a state definition in a module without an initializer, it will add a completely undefined vector to the set of values tested by QuickCheck. So this is trying to catch cases where you forgot to initialize some state, but then try to use it anyway.
Okay, that makes sense. Anything else I should know?
I think that's about it. There is a description of how to write your own QuickCheck strategy in the High Level Design, but that's an advanced topic :-)
Well, if I get any strategy ideas, I'll let you know!
- TL;DR
- VDM Method
- VDMJ Overview
- VDMJ Help for Specifiers
- VDMJ Help for Tool Developers
- VDMJ Help for LSP Developers