Sebastian Tyrrell

Having confidence in fit criteria

This is loosely based on the question in TMA02.

The approach described here works for any quality; I am using reliability as an example.

Let's suppose that we have a system (not yet a product) in which a customer submits a "form", which must be checked for completeness and correctness before being acted on. It is not important for this analysis whether the form is a handwritten form interpreted by a human or an HTML form interpreted by a computer system: the principles are the same.

Our client has told us that it is essential that the failure rate be low. Naturally we ask what she means by "low" and are told that no more than 1 faulty form in 10000 should get through.

We still have to ask for further clarification: what is an acceptable false positive rate, i.e. how many correct forms may be rejected? For simplicity, however, I am going to leave this out.

So we in fact have two requirements: one functional specifying that the form must be complete and correct and one non-functional specifying the reliability quality associated with the test.

For the functional requirement we need to specify exactly what constitutes a failure, so that this is clear and unambiguous.

Once we've done that we can move on to the reliability requirement, and how to specify a fit criterion for this. On the face of it we might say:

The system shall reject at least 9999 out of every 10000 forms that are faulty according to [associated functional requirement].

The first point to note is that we do not yet have enough information to test this, which means the above is not a clear fit criterion: the required maximum failure rate has been specified, but not the tolerance.

If we state that the product be tested on 10000 incorrect forms and that none should be accepted, there is still a substantial chance that the true failure rate is higher than 1:10000, in the same way that you can roll a die 6 times without getting a 6.

Sticking with the die in order to keep the numbers small, the chances of 10 random throws not including a 6 are:

P(0 sixes) = (5/6)^10 ≈ 16%.

Now we can introduce the concept of a "confidence level" or tolerance. If the die were loaded we might theorise that this particular die had a higher than usual chance of throwing a 6: if we make 10 throws and see no sixes, we can be 84% confident that the probability of a six is 1/6 or less.
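
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the die example (purely illustrative; the variable names are my own):

    # Chance of seeing no sixes in 10 fair throws, and the confidence that gives us
    p_no_six = (5 / 6) ** 10
    print(f"P(0 sixes in 10 throws) = {p_no_six:.0%}")            # about 16%
    print(f"Confidence that P(six) <= 1/6 = {1 - p_no_six:.0%}")  # about 84%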

Similarly, the chance of 0 failures in 10000 attempts when the actual failure rate is 1:10000 is given by:

P(0 failures) = (9999/10000)^10000 ≈ 37%

So we can only state with 63% confidence that the failure rate is 1:10000 or less: it could well be a little higher. For example, for a failure rate of 11:100000:

P(0 failures) = (99989/100000)^10000 ≈ 33%

we have almost the same chance of seeing no failures, so you can see why we need to think carefully about our fit criterion and the associated tests.
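
Again, a minimal Python sketch of the two calculations above (illustrative only):

    # Chance of a completely clean run of 10000 tests under two true failure rates
    p_clean_at_spec  = (1 - 1 / 10000) ** 10000     # about 37%
    p_clean_at_worse = (1 - 11 / 100000) ** 10000   # about 33%
    print(f"P(0 failures | rate 1:10000)   = {p_clean_at_spec:.0%}")
    print(f"P(0 failures | rate 11:100000) = {p_clean_at_worse:.0%}")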

Fortunately we don't need to become expert statisticians for this, as there are two factors that simplify our calculations. Firstly, we are dealing with a maximum acceptable failure rate and are in fact anticipating that we will see no failures at all; secondly, we are dealing with sufficiently small probabilities that the chance of two or more failures is negligible (I will demonstrate this in a subsequent post).

Our fit criterion for a statistical test should include not only the maximum acceptable failure rate but also the confidence level. So we might have:

The system shall reject at least 9999 out of every 10000 forms that are faulty according to [associated functional requirement] and this figure shall be demonstrated to a 99% confidence level.

There are mathematical ways of defining the number of tests required (I will deal with them in a subsequent post) but initially, and in most cases, a heuristic approach will be acceptable. Let's look, for example, at a test with 50000 runs and no failures:

P(0 failures) = (9999/10000)^50000 ≈ 0.7%

Meaning that if the true failure rate were 1:10000 or greater, the chance of 50000 tests producing 0 failures would be only 0.7%. Or, put another way:

we can be 99% confident that the failure rate is 1:10000 or less.
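
If you want a rough formula for the number of test runs, the following Python sketch simply rearranges the same calculation (the variable names are mine, and it only covers the simple zero-failures case discussed here):

    import math

    # Clean (zero-failure) runs needed to claim a given confidence that the
    # failure rate is at or below the specified maximum
    max_failure_rate = 1 / 10000
    confidence = 0.99
    runs_needed = math.ceil(math.log(1 - confidence) / math.log(1 - max_failure_rate))
    print(f"Clean runs needed: {runs_needed}")  # about 46050, so 50000 is a comfortable round figure

    # Sanity check: chance of a clean run of 50000 tests at the specified maximum rate
    print(f"P(0 failures in 50000 runs) = {(1 - max_failure_rate) ** 50000:.1%}")  # about 0.7%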

We are nearly there now, but there is still one important point to be taken into account: what is the source of the faulty forms we are using? Ideally these will be a random sample of actual invalid customer forms, not test forms produced by the development team (use the comments to explain why!). So our fit criterion becomes:

The system shall reject at least 9999 out of every 10000 forms that are faulty according to [associated functional requirement] and this figure shall be demonstrated to a 99% confidence level using a random selection of faulty forms from actual customer records.

Collecting such a sample is not necessarily easy, and customers may demur at the perceived costs (in this case, for example, of collecting and transcribing old faulty forms that they might simply have thrown away). This makes it doubly important that you, as the requirements engineer, can explain the theoretical background in simple and straightforward terms: they can cut corners (it's their money), but this will reduce the confidence in the reliability figures.
