Monday, November 19, 2012

7 Shades of Grey

Recently, there has been much discussion about moving away from EAL4 evaluations for complex software, such as operating systems. EAL4 requires a detailed design description of the implementation as well as a review of parts of the implementation. Instead of applying these requirements, the newly discussed approach skips the detailed design and the source code review and replaces them with more elaborate testing.

This new approach effectively turns the security review from a white-box analysis with white-box testing into (almost pure) black-box testing.

We have to face the uncomfortable truth that modern operating systems are so complex that it is hard to understand what is going on in their guts simply by looking at their interfaces. Just to provide some numbers to illustrate the problem: the entire source code tree of the Linux kernel is about 120 MB in size. Now, let us subtract the roughly three quarters of it that consist of device drivers and architecture-specific code for architectures we are not interested in. We are then still left with about 30 MB of code covering the higher-level functions that are typically assessed in a security evaluation (I am not saying that we skip device drivers in an assessment, but let us keep it simple for the sake of argument). In addition, consider that there are about 350 system calls, three important virtual file systems, and a few other interfaces to the kernel. Now, do you really think you can test this limited number of interfaces to identify what 30 MB of code does with respect to its security behavior?

Well, I have to admit that I cannot.

During my work as a lead evaluator in several operating system evaluations, and more recently while supporting the preparation of detailed design documents for Linux, I learned that access to source code and detailed design makes it easy to determine the areas to focus on during a security assessment. Even when the new evaluation approach is enacted, I will surely use source code to answer questions during the assessment, simply because it is there and it gives the most precise answer.

For closed source products, the proposed evaluation methodology takes away the evaluator's ability to consult the source code. Therefore, one can argue that the new evaluation methodology is (intentionally?) hurting vendors of closed source products, because their evaluation results may not be as thorough as those of open source products.

Apart from using design and source code to aid security evaluations, we need to ask whether black-box testing is really helpful for security assessments at all. Black-box testing focuses on verifying that certain mechanisms function properly. This is also one goal of security assessments, so black-box testing and security assessments have this in common. However, there is another question security assessments have to answer: is it possible to bypass or tamper with the security functions? Do you think that any documentation of external interfaces explains, or that testing those interfaces shows, whether (let alone how) the security functions can be bypassed or tampered with? Maybe you can use fuzz testing to find a "smoking gun" hinting at such problems, but you will not be able to reach a conclusive answer to the bypassing and tampering question.

There is another aspect worth mentioning: it is very hard for developers (and even harder for evaluators) to identify all interfaces to, say, the kernel. The prime example is the IOCTL interface of a Unix kernel, where a single system call multiplexes an open-ended set of driver-specific commands. Without the ability to look into the code, how can an evaluator be sure that all interfaces have been identified and tested? Any black-box approach will seriously fall short of identifying and covering all security-sensitive interfaces.
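To make the IOCTL and fuzzing arguments more concrete, here is a minimal Python sketch of a hypothetical multiplexed interface. Everything in it is invented for illustration: the command numbers, the handlers, and the undocumented debug command that skips the privilege check. It only mimics the shape of ioctl(), one entry point dispatching on a command number, and is not a model of any real driver. A test suite derived from the documented commands passes, a look at the dispatch table (standing in for source code review) exposes the extra command at once, and the last function estimates how unlikely a fuzzer is to reach that command, assuming it draws 32-bit command numbers uniformly at random (real fuzzers are usually smarter than that).

```python
# Hypothetical model of an ioctl()-style multiplexed interface.
# All command numbers and handlers are invented for illustration only.
import errno
from typing import Callable, Dict, Set

Handler = Callable[[bool], str]


def get_info(is_admin: bool) -> str:
    return "info"


def set_config(is_admin: bool) -> str:
    # Documented command: enforces the privilege check.
    if not is_admin:
        raise PermissionError("admin required")
    return "config updated"


def debug_set_config(is_admin: bool) -> str:
    # Hypothetical undocumented debug command: same effect, no privilege
    # check, i.e. a bypass of the security function.
    return "config updated"


# What the interface documentation lists ...
DOCUMENTED: Dict[int, Handler] = {0x5401: get_info, 0x5402: set_config}
# ... and what the implementation actually dispatches on.
DISPATCH: Dict[int, Handler] = {**DOCUMENTED, 0x7F3A9C01: debug_set_config}


def ioctl(cmd: int, is_admin: bool) -> str:
    """Single entry point that multiplexes many commands by number."""
    handler = DISPATCH.get(cmd)
    if handler is None:
        raise OSError(errno.EINVAL, "unknown command")
    return handler(is_admin)


def black_box_suite() -> bool:
    """Tests derived only from the documented interface."""
    if ioctl(0x5401, is_admin=False) != "info":
        return False
    try:
        ioctl(0x5402, is_admin=False)
        return False  # an unprivileged caller must be rejected
    except PermissionError:
        pass
    return ioctl(0x5402, is_admin=True) == "config updated"


def undocumented_commands() -> Set[int]:
    """'Source review': compare the real dispatch table with the documentation."""
    return set(DISPATCH) - set(DOCUMENTED)


def fuzz_hit_probability(trials: int, hidden: int = 1, space: int = 2**32) -> float:
    """Chance that `trials` uniformly random command numbers hit at least one
    of `hidden` specific values in a command space of size `space`."""
    return 1.0 - (1.0 - hidden / space) ** trials


if __name__ == "__main__":
    print(black_box_suite())                                 # True
    print(sorted(hex(c) for c in undocumented_commands()))  # ['0x7f3a9c01']
    print(ioctl(0x7F3A9C01, is_admin=False))                # config updated
    print(f"{fuzz_hit_probability(10**6):.2e}")              # 2.33e-04
```

Under these assumptions, a million random calls reach the hidden command with a probability of only about 2 in 10,000, whereas reading the dispatch table finds it with certainty. Testing confirms the documented behavior, but it cannot conclusively rule out a bypass.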

Of course, if you do not have access to design and source code, you can take the binaries, disassemble them, and apply black-hat style assessments to find and exploit deficiencies. But that approach is not covered by the newly suggested evaluation approach either.

My current verdict on the newly suggested approach can be summarized as follows:
Trying to perform a security assessment using a black-box approach will fail to achieve the intended goal. To put it differently, you do not expect airplane inspectors to merely check whether a plane looks nice and has wings, flaps, and everything else necessary to call it a plane, and then just do a few test flights. The inspectors look at the internals of the plane in detail, and they do so for a reason. Is that reason so different from ours?

Stephan Müller
Lead Evaluator, atsec CC laboratories


  1. Dirk-Jan Out (Brightsight) November 19, 2012 at 10:06 AM

    I disagree that "using a black-box approach will result in a failure to achieve the intended goal". EAL1/EAL2-style black-box evaluations *can* be useful in moving products from the "easily hacked" stage to the "script-kiddies and random malware will go somewhere else" stage.

    But if your product gets widely deployed, and has a vital role (like an OS), you will have to deal with more than script-kiddies and random malware. And for that, black box testing is not cutting it.

    Then again, for many evaluations, EAL 4 wasn't cutting it either. I am still amazed by the whole idea of certified products having a 1 or 2 week patch cycle with many security patches included.

    1. I agree with you that black-box testing does help, even with
      security-related aspects. As mentioned in the blog, testing is one aspect of a security assessment. However, the second aspect, the
      analysis of tampering and bypassing, cannot be covered by black-box
      testing. Especially for more complex products, the latter aspect is the key.

      As all our evaluations revolve around complex software products, the
      blog entry applies to all of them.

      Being amazed at the number of security flaws and reports is typical
      when looking from a high-level perspective. The evaluation covers a
      specific set of configurations and usually a limited set of software.
      For example, a Linux evaluation does not cover, say, all software on all DVDs a vendor releases. It only covers the base OS. And for this base OS, with its base functionality, relatively few security flaws are identified. Of course, the Linux kernel receives updates here and there. But most of them are due to availability problems (read: crashes) rather than security failures. As availability is not a security claim (and can never be one with current OSes), a patch in this area does not imply that our evaluation goals are not reached.

  2. Actually, I haven't heard a lot about what will happen with O/S evaluations. I could actually support EAL4+ evaluations for operating systems. One factor that matters less with O/S's is that they generally don't have major releases as often as other COTS products. So when you complete an EAL4+ evaluation of, say, Windows 8, it will have some staying power. I can't wait to see the Protection Profiles that come out for O/S's and will be as interested as you are in understanding the tailored assurance that will be required.

    1. I would not quite sign off on the notion that OSes have a long release
      cycle. Maybe when you look at the major Windows or Solaris versions,
      then yes. But all OSes have one thing in common: patches are released
      at least on a monthly basis. So, I would even say that OSes have more
      problems in terms of maintaining their assurance.

      But rather than reduce the assurance assessment (read: go from EAL 4 to
      something lower), wouldn't it be more important that the schemes now
      finally agree on some assurance activities?

  3. As mentioned, certification of airworthy software is indeed a white-box approach (DO-178). In that field, one attempt to address the OS complexity challenge is to use hypervisors of lower complexity that are amenable to such a white-box inspection (e.g. MILS: ).

  4. I think we can learn a lot from the world of reliability and dependability.

