There’s been some recent discussion around using source code analysis (SCA) technology for build clean-up and optimization. I thought it might be useful to separate the spin from reality and outline where and how static source code analysis can be used for build optimization.
First, every SCA tool worth its salt does build analysis. Automated discovery of a customer’s build system is a required capability for deep static code analysis. Most users of SCA attempt to discover bugs, security vulnerabilities, and other maintainability problems. Some customers will also leverage the build analysis itself to conduct targeted clean-up of the build. Three common ways that this can be done are:
- Trace file analysis – provides visibility of the entire build process to help find issues and inefficiencies in the build that can impact build times, and ultimately developer productivity.
- Header file analysis – identifies inefficient and overly complex include structures that can lead to long build times and bloated system size.
- Interface analysis – finds low-level issues that can cause build failures due to improper API usage.
Trace file analysis is the process of analyzing, understanding and mapping every process executed during your build, including all compiler and linker invocations. This kind of analysis is mandatory for a good SCA tool, since the tool must understand exactly how you compile and link your code before it can generate detailed models of your system. The benefit from a build optimization standpoint is that it maps the entire build process, not just compiling and linking, which gives development leads and build managers visibility into the build, its inefficiencies, and the places where it may simply be broken.
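To make the idea concrete, here is a minimal sketch of what summarizing a build trace might look like. The trace format (elapsed seconds, a tab, then the command line) and the sample commands are assumptions made up for illustration; real SCA tools record traces in their own formats.

```python
from collections import defaultdict

# Hypothetical trace format (an assumption for this sketch):
#   <elapsed seconds> <TAB> <command line>
SAMPLE_TRACE = """\
0.42\tgcc -c src/a.c -o obj/a.o
0.40\tgcc -c src/a.c -o obj/a.o
1.10\tgcc -c src/b.c -o obj/b.o
0.30\tld obj/a.o obj/b.o -o app
2.50\tpython gen_docs.py
"""

def summarize_trace(trace_text):
    """Return (seconds spent per tool, command lines that ran more than once)."""
    time_by_tool = defaultdict(float)
    runs = defaultdict(int)
    for line in trace_text.splitlines():
        if not line.strip():
            continue
        seconds, command = line.split("\t", 1)
        tool = command.split()[0]          # first word names the process
        time_by_tool[tool] += float(seconds)
        runs[command] += 1
    repeated = [cmd for cmd, count in runs.items() if count > 1]
    return dict(time_by_tool), repeated

time_by_tool, repeated = summarize_trace(SAMPLE_TRACE)
for tool, secs in sorted(time_by_tool.items(), key=lambda kv: -kv[1]):
    print(f"{tool:8s} {secs:6.2f}s")
print("repeated:", repeated)
```

Even a summary this crude surfaces the two things a build manager cares about: which non-compiler steps dominate the build, and which commands are being run redundantly.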
The other two types of analysis, header file and interface analysis, are focused on the source code directly but are also important to build improvement. Header file analysis looks at the header files themselves and the optimizations you can perform on them. A simple example is an include file that is never used. Why include it? It adds to the build time and the size of the system, not to mention its complexity and maintenance burden. Finding extra includes is not groundbreaking technology, but there are various types of header file issues to look for. More complex issues that require deep analysis include transitive include issues and context-dependent issues. For example, a missing include with a transitive dependency is a relationship between three or more files. Suppose the first file, File1.c, includes the second file, header1.h, which in turn includes the third file, header2.h. File1.c uses some symbols from header2.h but does not include it directly.
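The three-file relationship can be sketched as a small check over an include graph. This is a toy illustration, not how any particular SCA product works: the file contents, the function names h1_value and h2_value, and the regex-based "parser" are all assumptions, and a real tool would work from the compiler's own view of the code.

```python
import re

# In-memory stand-ins for the three files; the contents are illustrative only.
SOURCES = {
    "File1.c": '#include "header1.h"\nint main(void) { return h2_value(); }\n',
    "header1.h": '#include "header2.h"\nint h1_value(void);\n',
    "header2.h": 'int h2_value(void);\n',
}

INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.MULTILINE)
DECL_RE = re.compile(r'^\s*int\s+(\w+)\s*\(', re.MULTILINE)  # toy declaration matcher

def direct_includes(name):
    return INCLUDE_RE.findall(SOURCES[name])

def reachable_headers(name):
    """All headers reached from `name`, directly or transitively."""
    seen, stack = set(), list(direct_includes(name))
    while stack:
        header = stack.pop()
        if header in seen or header not in SOURCES:
            continue
        seen.add(header)
        stack.extend(direct_includes(header))
    return seen

def uses_header(source, header):
    """True if `source` mentions any symbol declared in `header`."""
    symbols = DECL_RE.findall(SOURCES[header])
    return any(re.search(rf"\b{re.escape(s)}\b", SOURCES[source]) for s in symbols)

def analyze(source):
    direct = direct_includes(source)
    # Headers whose symbols are used but that are only reached transitively.
    missing = sorted(h for h in reachable_headers(source)
                     if h not in direct and uses_header(source, h))
    # Headers included directly whose symbols are never used.
    unused = sorted(h for h in direct if not uses_header(source, h))
    return missing, unused

missing, unused = analyze("File1.c")
print("should include directly:", missing)
print("direct but unused:", unused)
```

For the sample contents, the check reports header2.h as a missing direct include and header1.h as a direct include that File1.c never uses.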
Good practice would have you include header2.h directly, but developers often include header1.h simply to get the build to work. By eliminating such instances (in this example, File1.c doesn’t even use anything in header1.h), real reductions in build time, and potentially system size, can be realized. Interface analysis also focuses on header files, looking for cyclical header files, duplicate header files, and a whole slew of other in-code issues that can be a nightmare, such as multiple definitions or declarations.
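Cyclical header files are the easiest of these to illustrate. The sketch below runs a depth-first search over a hypothetical include graph (the header names a.h through d.h are invented for the example) and reports the first cycle it finds. It is only a picture of the idea, under the assumption that the include graph has already been extracted.

```python
# Hypothetical include graph: header -> headers it includes directly.
INCLUDES = {
    "a.h": ["b.h"],
    "b.h": ["c.h"],
    "c.h": ["a.h"],   # closes the cycle a.h -> b.h -> c.h -> a.h
    "d.h": ["c.h"],
}

def find_cycle(graph):
    """Return one include cycle as a list of files, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2           # unvisited, on current path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:              # back edge: nxt is already on the path
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None

cycle = find_cycle(INCLUDES)
print(" -> ".join(cycle) if cycle else "no include cycles")
```

Include guards usually keep a cycle like this from breaking the compile outright, which is exactly why it tends to go unnoticed until it shows up as a confusing declaration-order error.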
Frankly, there are limits to what can be done in this area with SCA. Vendors in the build space such as Electric Cloud or IBM Buildforge do this for a living; they specialize not only in setting up production build environments that scale, but also in tools that complement the kind of optimization described above.