Monday, October 22, 2012

Programming the Evaluation Robot

Stating where you want to go is acknowledging that you are not yet there. Seen from this perspective, the CCMC's recently published vision statement is an interesting list of its own non-accomplishments.
This is especially true of terms like “objectivity” and “repeatability,” which the visionaries of the new Common Criteria direction repeat like a Tibetan prayer wheel. I take it that the CCMC has utterly failed to establish a common and sufficiently high standard for the evaluations underlying CC certificates. Apparently, some nations have concluded that they cannot trust other nations' certificates up to the assurance levels that were accepted in the past.
Don't get me wrong; I, too, want evaluations to be as objective and repeatable as possible. However, from my 30 years of experience working in IT security, I know for a fact that an evaluation that reduces itself to a checklist activity has no value at all. It is the ultimate victory of form over content. In IT security, not looking at the content is not looking at security at all.
This really makes me nervous. From my worm's-eye view, I get the impression that some CCMC members, rather than bringing their labs up to a higher standard, believe they can make their collaborative Protection Profiles so specific that evaluation robots can be programmed from the specification, and that these robots will then, without further ado, deliver objective and repeatable results at a sufficiently meaningful level.
It won't work. It cannot work. The “simple” reason: complexity! Even if products fall into the same category, they are not standardized to a point where the evaluation robot could deal with them in a meaningful way.

Every product is different, therefore every evaluation is different, too!

Do you remember the hype around artificial intelligence and expert systems in the '80s? If you don't, ask yourself why the topic disappeared so quietly. My explanation is that rules and checklists cannot possibly reach down to a single individual or product; they must always stop at some higher level and live with uncertainty below it. That is what the CCMC needs to understand.

No checklist will be detailed enough!

Think of medicine or the law. Did you ever wonder why you cannot go to court, state your case before a machine, turn a crank and receive a verdict? No, you did not. You know that even with a vast body of laws and regulations, you cannot possibly cram life's complexity into a set of rules covering every aspect and combination. Instead, you rely on a judge to reach a verdict; he is expected to take all the relevant details of your case into account, even those that have not been spelled out explicitly but can only be deduced from other cases. You want judges with sufficient experience and enough common sense to reach a fair verdict (o.k., you don't insist on “fair” as long as you win ;-) ). You also want the verdict to come with a rationale that allows you, your lawyer or other judges to follow the chain of arguments that led to it, and to challenge it if it is not sound.
Medicine provides similar examples. You don't want somebody grabbed off the street, handed a checklist and asked to diagnose you. Nor do you want to be diagnosed by a robot that cannot look left or right of its pre-programmed algorithm. Again, what you expect is expertise, experience and common sense.
Evaluations are very similar to these scenarios:
  • I expect an ITSEF to diagnose a product under evaluation and reach a verdict based on expertise, experience and common sense. I don't actually mind if a doctor has me fill out some checklists as an efficient start to the anamnesis, or uses one just to make sure no important step was forgotten. I would, however, leave the moment he told me he was only allowed, and only able, to diagnose the diseases on his list.
  • I expect ITSEFs (and CBs) to be qualified for their job. However, I don't expect all of them to be at the same level of expertise and to specialize in all product types. I'm fine with visiting my doctor if I have a cold, but I would not have him do brain surgery on me.
  • I expect ITSEFs to be accredited by the CBs much as a doctor is licensed to practice, i.e., based on proven expertise and experience. As a patient, I expect doctors to be supervised and charlatans to be banned from practice.
  • I expect ITSEFs to document their evaluation work in a way that lets me understand what they did, how they did it, and which arguments led to their verdict. I accept that two judges may reach different verdicts on the same case, although I would prefer that this did not happen. If it does, however, it is crucial that each verdict comes with a rationale that can be followed and that allows a higher court to assess whether all aspects were considered with due care.
Summarizing: I strongly warn the CC community not to fall into the trap opened by the seemingly innocent terms “objectivity” and “repeatability.” In sufficiently complex systems, you must accept that these objectives cannot be fully achieved, and you must not reduce your evaluation work to formalisms that merely appear to fulfill them! Higher assurance is necessary and can be achieved. If you cannot achieve it, that is no proof that it cannot be achieved at all!

by Gerald Krummeck
Head of the German ITSEF


  1. Reading this, I think that objectivity should be a goal -- and should already be achieved. The opposite of objectivity is bias towards a vendor or other organization, and there should be no trace of that in evaluations.

    I agree that repeatability is harder to achieve, and that a checklist approach is not the way to achieve it. Repeatability, however, can be achieved in a number of ways. It can be achieved by dictating the specifics of how something is to be tested. I'll argue that's more than repeatability -- that's standardized testing (akin to what we see in US public schools), which doesn't always have the intended effect of creating quality.

    But repeatability can also be achieved in other ways, such as through documentation of test plans and procedures, in such a way that another organization could come in, perform the exact same tests and, presuming the same product, get the same results. That also is repeatability... and that is something achievable without degradation of quality.

  2. Dear Dan,

    I fully agree with your comments. My current concern is that assurance is lowered with the argument that evaluation activities are not objective and repeatable, using a very narrow definition of these terms.

    We shall strive for objectivity and repeatability, but we also shall accept that there will be differences in the details, and that such individual variations are not bad as long as they are documented and well-founded, rather than arbitrary, as you have said. I know that in many areas, when the evaluator's experience comes into play, this adds tremendous value to the evaluation, and I don't want this value to be thrown away with pseudo-formal arguments.


  3. Gerald,
    Your blog makes some sense, but you don't discuss the reason for this change in the first place: the time, effort and cost of the current process, which leads to evaluations of products that are no longer sold or supported. The problem NIAP is trying to solve is getting COTS into the hands of acquirers while the product is actually still sold by the vendor. By the way, the one area that is not being compromised in the new approach is VAN, and I would argue that is the one area where some subjectivity is acceptable.

  4. Thanks for sharing.
