With every release, the Klocwork R&D department at Rogue Wave is given the opportunity to participate in a Hackathon. The Hackathon is 4 days in length and is dedicated to working on anything you want – as long as its loosely related to Klocwork. In the same week, participants present their ‘hack’ to their peers. The open format of the hackathon gives an opportunity for R&D to showcase features that may otherwise not make the product roadmap.
I was the winner of Klocwork’s first Hackathon, and having owned the highly coveted Hacksaw trophy for a few sprints now, it was time to defend my title.
Inspired by PVS-Studio’s blog, “How the PVS-Studio team improved Unreal Engine’s code“, I decided to take on the challenge of running Klocwork static code analysis on an industry-leading game engine, Unreal Engine 4.
There were many questions my static analysis nerd brain wanted answered… Why did PVS-Studio require a team and 17 days to perform analysis? Can our static analyzer even analyze this? Why was the number of real PVS-Studio bugs so low for such a large project? Why were edits made “because of the need to please the analyzer”?
There was only 1 (big) problem… I only had 2 work days to do this. After realizing the Unreal Engine 4 code base is over 2 million lines of code, I immediately ran to the documentation department, stole their video camera, and started my Hackathon time lapse.
Some rambling about static analysis in the game development industry
Before we get into the deep technical stuff, I’d like to provide some context to this project. The Klocwork static analyzer is quite experienced. Not only in its age, but in its evolution. The analyzer evolved by consuming trillions lines of code from mission critical software. In my (relatively) short time here, I’ve witnessed static analysis of NASA code destined for moons, military code for
redacted, and banking software which ensures the rounding scam in Office Space doesn’t happen again. All of this software follows a common pattern: legacy code for legacy standards, bare to the metal compilers with concise feature sets, and source mostly written with static analysis compliance in mind. You know what they say, you are what you eat.
Unlike the mission critical software industries, one can make the argument that the game development industry is their polar opposite: There is incredibly little legacy code since game engines are rewritten from the ground up at least once a decade. The compilers (and developers) abuse every optimization trick available for their target platform. Code is written with no regard for static analysis compliance (as it should be). It is a “we ship with bugs” industry. A game industry developer’s mentality towards static analysis results are also polar opposite – a medical device manufacturer would gladly sort through a 99% false positive rate to find the 1% true issue while our friends at Nintendo and Capcom would simply uninstall our tool if we fed their developers noise all day.
The good news for the static analysis vendors is that the fundamentals of static analysis are common to all industries. Build some abstract syntax trees (AST) and run some xpath-like checkers. Build some axioms via path analysis and run some more source & sink checkers. What does differentiate the industries is the complexity and the abstraction of the code analyzed. Good luck trying to statically track memory allocation and deallocation in EA’s proprietary standard type libraries (EASTL) in Frostbite 3 out-of-the-box. I assure you this is a non-issue over at TD Bank and their C++98 compliant Point of Sale (POS).
What I’m trying to get at is analysis of software in the game development industry is really hard. This is the industry pushing the boundaries of compilers and their feature sets (clang, C++11/14/17). This is the industry where functional testing over unit testing is the standard. This is the industry that expects our Visual Studio 2015 integration to be available before Microsoft has even released the IDE to the public. I believe this is the industry through which static analysis should judge it’s success, because the rest is relatively easy.
Analyzing Unreal Engine 4
Get the thing to build. Fortunately for me, the Unreal Engine 4 “Getting Started” documentation is excellent. After signing up over at unrealengine.com and checking out their GitHub project I was good to go. Unfortunately, the internet speeds in Canadia are not the greatest and I burnt a good chunk of valuable time waiting for git checkout. After the checkout, while waiting for the “Development Editor” configuration to compile via MSBuild 2013, I took some time to poke around the Unreal Engine code.
The first thing I hunted for was evidence of conditional compilation and assert macros in the source code. There’s a good chance the “Development Editor” configuration in Visual Studio conditionally disabled a lot of asserts via the preprocessor, yet the disabled code may still contain valuable information for the static analyzer to consume. The compilation of Unreal Engine 4 took about 30 minutes and during that time I managed to find a few assert macros that I wanted our analyzer to retain, such as “check”, “verify”, etc. I notepad’d them into a text file for use later.
Now that I verified Unreal Engine 4 compiles without errors on my platform, I was ready to start the static analysis process.
The static analyzer needs to know what to analyze. Klocwork coins this term the “build specification”. The build specification is a human-readable text file providing the analyzer with information on how the project’s compiler interprets files in your project. Information such as
sizeof() standard types, predefined macros assumed by the preprocessor, ‘compiler features’ that are actually bugs (thanks Microsoft), etc.
Klocwork ships with a collection of out-of-the-box tools to generate the build specification, such as Visual Studio project parsers, compiler wrappers, etc. I decided to go with our Swiss army knife, a utility called kwinject. On the command line, instead of calling
msbuild ue4.sln /t:rebuild, I called
kwinject msbuild ue4.sln /t:rebuild to capture compilation information and populate the build specification.
I’m guessing this step is similar to the CLMonitor.exe utility that PVS-Studio had to write and, guessing again, this utility is only able to analyze Unreal Engine 4 build configurations that target the cl.exe compiler (Windows and XBox). Our build specification generation tools have the additional advantage of capturing C# and Java compilation from 200+ different compilers, such as Sony’s ORBIS Clang PS4 compiler or Nintendo’s wacky ARM compilers.
Step start the analysis!
With 75% of my time remaining (1 minute, 30 seconds into the 5 minute time lapse video above) I moved all the tuning sliders to “Please, destroy my machine” and started the static analysis engine. My 8-core CPU was pinned to 100% for the next few hours as the analysis baseline was built. As soon as the analysis results started streaming in, I began reviewing them.
The Unreal Engine 4 source contained significantly more C# code than I expected. Given the short time frame, I made a decision to defer reviewing C# analysis results for a later time and focus my efforts on C/C++ issues.
There were 1311 C/C++ issues detected and, by the end of the Hackathon, I managed to review 302 of them. This is my first exposure to Unreal Engine source and reviewing source of something you’re not the domain expert of is quite a challenge. Out of the 302 issues reviewed, 213 of them appeared to be legitimate findings of various severites. From the 213 legitimate findings, I believe only 39 of the findings were severe and conclusive enough to warrant immediate attention. Extrapolating from my findings, I suspected there were ~169 issues in the Unreal Engine 4 code base that were severe enough to warrant attention. Here are a few issues that I found interesting (and concise enough) for this blog post.
Oculus’ low persistence mode check in HeadMountedDisplayCommon has a NULL pointer dereference
Suspicious dereference of pointer ‘frame’ before NULL check at line 5
Here, the current ‘frame’ is grabbed on line 3 via
GetCurrentFrame() on the assumption that ‘frame’ is not NULL. This assumption is a bit weird as
GetCurrentFrame() has multiple ways to return nullptr, but we’ll go with it. On the following line (4) we dereference
frame for its
Settings pointer. The author attempts to guard for the NULL
Settings but mistypes the conditional.
The condition was likely intended to say
return FrameSettings && FrameSettings->Flags.bLowPersistenceMode;
If the original conditional was intended, and the developer believes
frame can be NULL, then we’d have a NULL pointer exception on line 4, before reaching the guard on the return statement. If you’ve run into low persistence mode issues in Unreal + Oculus, this check may be why.
Comparison of unsigned value against 0 is always true
These are harmless but popped up in quite a few places.
Unnecessary validation of LocalUserNum >= 0
Unnecessary validation of DeviceId >= 0
I’m not aware of any compiler that targets a gaming platform and treats uint as a signed int, and thus these are needless checks. I thought that these patterns may have resulted from the great 32bit -> 64bit port and argument types in the function signature were updated after the code was written… but then I realized the example is from Valve’s SteamVR, well into the 64bit era.
The processor will hit these branches a few times and have fun with its branch prediction, but after the branch optimizer realizes its always branching in the same direction, the impact would be insignificant. This is dead code at worst.
BlueprintNodeHelpers not so helpful
Pointer ‘Ob->GetClass() ->FindFunctionByName(FuncName)’ returned from call to function ‘FindFunctionByName’ at line 3 may be NULL and will be dereferenced.
HasBlueprintFunction() is intended to return true if the UObject
Ob contains function
FuncName, and false if it does not. Since the definition of
FindFunctionByName() returns NULL when the function is not found,
GetOuter() dereferences the NULL pointer unconditionally.
HasBlueprintFunction() presently returns true if the UObject
Ob contains function
FuncName, and throws a NULL pointer exception if it does not. This code is for the AIModule so the robots might be confused.
Suspicious NULL checks related to replicated movement in CharacterMovementComponent
On line 13, pointer CharacterOwner is dereferenced to check if bReplicateMovement is set. If it is set, the CharacterOwner is once again checked for NULL on line 16 and 17.
I was going to bucket this one with the “Comparison of unsigned value against 0 is always true” and “is harmless” category but couldn’t for the life of me figure out what the NULL checks were for. I hopped on GitHub, hit blame, and tried to figure out what the commit was about:
As you can see, the previous code validated the
CharacterOwner pointer before dereferencing it. The updated code dereferences
CharacterOwner in the if condition, and later verifies it is not NULL.
CharacterOwner can potentially be NULL, which the GitHub history indicates ‘yes’, there is a NULL pointer exception on line 13. If
CharacterOwner can’t be NULL, there is a lot of unnecessary conditionals, both historically and in the current revision.
Suspicious handling of streaming media types in WindowsMoviePlayer plugin
Null pointer ‘SinkActivate’ that comes from line 6 may be dereferenced at line 40.
On line 6 we create a NULL pointer
SinkActive. We hope that at least one conditional, on line 16 or 21, evaluates to true and initializes
SinkActive. On line 40 we unconditionally
Looking at definitions for MFMediaTypes on line 45+, there are certainly a lot more than
MFMediaType_Default seems like reasonable input, as well as
MFMediaType_Stream. Unfortunately anything outside of _Audio and _Video media types would result in
SinkActive never being initialized and thus
Release() dereferencing a null pointer on line 40.
I tried to give this issue the benefit of the doubt and look for guards outside of this function, something along the lines of “Call this function only on MFMediaType_Video and MFMediaType_Audio media types”, but there were none.
Suspicious NULL check in VelocityRender’s HasVelocity.
Suspicious dereference of pointer ‘PrimitiveSceneInfo->Proxy’ by function ‘IsMovable’ at line 6 before NULL check at line 21.
Out of the 302 issues reviewed, I found 31 issues of this type – the developer would dereference a pointer and shortly after check it for NULL. In the above example,
PrimitiveSceneInfo->Proxy is dereferenced on line 6 and then ‘check’ed for NULL on line 21. If
PrimitiveSceneInfo->Proxy is expected to be NULL, the static analyzer has found a valid problem. If
PrimitiveSceneInfo->Proxy cannot be NULL, the static analyzer is confused as to why the developer decided to check it for NULL on line 21. I would argue that the NULL assertion of pointers should be performed before the pointer is dereferenced.
Klocwork analyzed 2,292,918 lines of code from Unreal Engine 4 and detected 1311 C/C++ issues of varying severities. Out of 1311 detected issues, 302 were reviewed by a human. Of those reviewed, 213 appeared to be legitimate findings. From the 213 legitimate findings, only 39 were believe to be severe and conclusive enough to warrant immediate attention. Extrapolating from the human-reviewed sample, it is suspected that there are ~169 issues in the Unreal Engine 4 source code which are severe enough to warrant immediate attention.
By comparing issue density in Unreal Engine 4 to other codebases in the game development industry, Unreal Engine 4’s code quality is in a league of its own, achieving issue density nearing large mission critical software projects. I would guess that Unreal Engine 4’s code quality success is attributed to two factors: Static analysis by PVS-Studio and Unreal’s open source initiative.
Given that, I was not able to fully understand the analysis time invested by PVS-Studio. The 1,800+ issues PVS-Studio detected in their most recent analysis is an incredible number of issues to review, let alone commit fixes for in a span of 17 days. Their statement, “The number of real bugs detected in the code is very small for such a large project,” makes it safe to assume that the PVS-Studio analysis of Unreal Engine 4 was a battle of citing false positives, making edits “because of the need to please the analyzer”, and not coding bug fixes. Regardless of PVS-Studios’ false positive rate, about ~169 high severity issues were missed. There were hundreds of lower severity issues missed as well, but at this point Unreal has bigger fish to fry.
OK this blog is getting way too long. Never seen Unreal Engine 4. Found lots of critical issues in less than ~12 Hackathon hours. Try static analysis. Thanks for reading bye!