To report, or not to report…


Jun 6, 2011 • by Gwyn Fisher


Creating a source code analysis (SCA) engine is a balancing act, a decision about where you believe the most value can be found along the spectrum that is the signal-to-noise ratio of the detection process. At one end lies the realm of massive noise and hopefully complete coverage, whilst at the other is the quiet calm of a theoretically attractive but ultimately useless realm: no noise, but no signal either.

That may sound counter-intuitive. Shouldn’t a zero noise point on the spectrum be accompanied by an infinitely strong signal? Perhaps in the world of DSP this is true, but in the world of SCA reducing noise comes right along with a reduction in detection capability – it’s unfortunately almost a straight-line correlation.

So if we assume that we’re trying to balance a couple of dials on our theoretical tuner, we might start by reducing or dampening noise – it’s the most obvious place to start, after all. Nobody likes to listen to their favorite FM station through the curtain of hissing and popping that accompanies the act of driving through a major city. Likewise, no developer likes sifting through a long list of bogus error reports in order to find the hidden gems. But to drag out the analogy, assume that the only way of reducing hiss on your FM signal is to turn down the volume… now you’ve got less hiss, but also less Bruce Springsteen goodness to accompany it.

Balance is what we need here, obviously. Enough Boss to make us ignore the hiss, or to put it in a more SCA-like context, enough interesting bugs to make us ignore the incorrect, or the irrelevant (correct detections on the part of the engine that the developer just doesn’t care about, e.g. low memory conditions in a memory-insensitive environment).

Consider the following simple example that clearly lies “on the line”:

 void foo(char* s, int a) {
     char* s1 = s;
     if (a > 0)
         *s1 = 'a';    // dereferences 's1', which aliases the possibly-uninitialized 's'
 }

 void bar(int m) {
     char* s;
     foo(s, m);        // 's' is not initialized prior to calling 'foo'
 }

So… to report, or not to report?

Lacking any other information, it is obvious that function ‘foo’ dereferences parameter ‘s’ (aliased as local variable ‘s1’) under certain circumstances (when parameter ‘a’ is positive). As we have no knowledge of the provenance of parameter ‘s’ when analyzing ‘foo’, however, there’s nothing here to cause a report, and so we squirrel away the knowledge of what ‘foo’ does for later use.

When analyzing ‘bar’ we know what ‘foo’ does, and we know we’ve got an uninitialized local pointer, ‘s’. But again we lack knowledge of the valid values, or ranges, that parameter ‘m’ may take. There is definitely a set of circumstances in which we know a problem will occur (if parameter ‘m’ is positive), and a set in which we know a problem will not occur (if parameter ‘m’ is zero or negative) – this much is encoded in the functional behavior of ‘foo’. But is it a defect, or should we filter out the report in favor of reporting only those situations in which we can be “sure” the bug not only exists, but can be proven to be exercised?
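The "squirreled away" knowledge about ‘foo’ can be pictured as a tiny function summary consulted at each call site. The sketch below is purely illustrative – the names (FnSummary, decide, Verdict) are my invention, not any real engine's internals – but it captures the three-way decision the text describes:

```c
#include <assert.h>

/* Possible analysis verdicts for a call site. */
typedef enum { SAFE, MAYBE, DEFECT } Verdict;

/* Summary recorded while analyzing 'foo': it dereferences one
 * pointer parameter, but only when another parameter is positive. */
typedef struct {
    int deref_param;   /* index of the pointer parameter dereferenced */
    int guard_param;   /* index of the parameter guarding the deref   */
} FnSummary;

/* At a call site, combine the callee's summary with what we know
 * about the arguments to decide what (if anything) to report. */
Verdict decide(FnSummary fs, int ptr_initialized, int guard_known_nonpositive) {
    (void)fs;
    if (ptr_initialized)
        return SAFE;   /* the dereference target is valid            */
    if (guard_known_nonpositive)
        return SAFE;   /* the dereference can never execute          */
    return MAYBE;      /* an uninitialized dereference is reachable  */
}
```

In ‘bar’, the pointer is uninitialized and ‘m’ is unknown, so the verdict is MAYBE – and the whole report-or-not question is precisely the question of how to treat MAYBE.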

There’s the art of balance in a nutshell, and it revolves around the phrase “lacking any other information.” In the ideal world, lacking any restrictions in terms of time, memory or computing power (or indeed actual from-the-wall power, as we have to worry about now), we might defer all such decisions until we categorically know that a particular data value is passed down the call graph far enough to get to ‘foo’. But in the real world of multi-million LOC projects, that approach simply can’t scale.

And so, calling on balance as our friend, we can bias a localized decision to report or not, given that we know, to at least one order of approximation, that bad things could happen here. Different engines express that bias differently, and that difference is one of the greatest divides between the prevalent solutions.

Now ask yourself, as the developer, is it a worthy report if you know that 10 levels up the call graph there’s a check on what eventually becomes parameter ‘m’ to ensure that it’s never positive? Perhaps you’d automatically classify this as a false positive and, annoyed at the tool, move on to the next report. Or perhaps, seeing the size of the gap in the call graph, you might just choose to code defensively, initializing ‘s’ to NULL in ‘bar’ and adding guard code to ‘foo’ because, hey, you never know.
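That defensive rewrite might look something like the following sketch – one reasonable guard style among several, not the only possible fix:

```c
#include <stddef.h>

void foo(char* s, int a) {
    char* s1 = s;
    if (s1 == NULL)    /* guard code: refuse to write through a null pointer */
        return;
    if (a > 0)
        *s1 = 'a';
}

void bar(int m) {
    char* s = NULL;    /* initialized defensively, so foo's guard can catch it */
    foo(s, m);         /* now safe, whatever value 'm' takes */
}
```

The cost is one extra branch; the benefit is that the “you never know” path now fails safely instead of scribbling through whatever garbage happened to be on the stack.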

And as we’ve all seen so many times over the years, “you never know” might just as well be written “and so it came to pass…”
