7 Shades of Grey
Recently, there has been much discussion about moving away from EAL4 evaluations for complex software such as operating systems. Instead of applying the EAL4 requirements, which include a detailed design description of the implementation as well as a review of parts of the implementation, the newly discussed approach is to skip detailed designs and source code review and replace them with more elaborate testing.
This new approach effectively turns the security review from a white-box analysis with white-box testing to (almost pure) black-box testing.
We have to face the uncomfortable truth that modern operating systems are so complex that, simply by looking at the interfaces, it is hard to understand what is going on in their guts. Just to provide some numbers to illustrate the problem: the entire source code tree of the Linux kernel is about 120 MB in size. Now, let us subtract the roughly three quarters of it that consists of device drivers and architecture-specific code for architectures we are not interested in. That still leaves about 30 MB of code covering the higher-level functions that are typically assessed in a security evaluation (I am not saying that we skip device drivers in an assessment, but let us keep it simple for the sake of argument). In addition, consider that there are about 350 system calls, three important virtual file systems, and some other interfaces to the kernel. Now, do you really think you can test this limited number of interfaces to identify what 30 MB of code does with respect to security behavior?
Well, I have to admit that I cannot.
During my work as a lead evaluator in different operating system evaluations, and more recently supporting the preparation of detailed design documents around Linux, I learned that access to source code and detailed design makes it easy to determine the areas to focus on during security assessments. Even if the new evaluation approach is enacted, I will surely use source code for the assessment to answer questions -- simply because it is there and it gives you the most precise answer.
For closed source products, the proposed evaluation methodology deprives the evaluator of the possibility to consult the source code. Therefore, one can argue that the new evaluation methodology (intentionally?) hurts vendors of closed source products, because the evaluation result may not be as thorough as that of an open source evaluation.
Apart from using design and source code to aid security evaluations, we need to ask whether black-box testing is really helpful for security assessments at all. Black-box testing focuses on verifying the proper functioning of certain mechanisms. This is also one goal of security assessments, so black-box testing and security assessment have that much in common. However, there is another question security assessments have to answer: is it possible to bypass or tamper with the security functions? Do you think that any documentation of external interfaces explains, or that testing those interfaces shows, whether (let alone how) security functions can be bypassed or tampered with? Maybe you can use fuzz testing to find a "smoking gun" hinting at such problems, but you will not arrive at a conclusive result on the bypassing and tampering question.
There is another aspect worth mentioning: it is very hard for developers (and even harder for evaluators) to identify all interfaces to, say, the kernel. The prime example is the ioctls in a Unix kernel. Without the ability to look into the code, how can an evaluator be sure that all interfaces have been identified and tested? Any black-box approach will seriously fall short of identifying and covering all security-sensitive interfaces.
Of course, if you do not have access to design and source code, you can take the binaries, disassemble them, and apply black-hat style assessments to find and exploit deficiencies. But that approach is not covered by the newly suggested evaluation methodology either.
My current verdict on the newly suggested approach can be summarized as follows:
Trying to perform a security assessment using a black-box approach will fail to achieve the intended goal. To put it differently: you do not expect airplane inspectors to merely check that a plane looks nice, confirm that it has wings, flaps, and the other parts necessary to call it a plane, and then perform a few test flights. The inspectors examine the internals of the plane in detail, and they do so for a reason -- is that reason so different from ours?