ue4logo-large

Hackathon 2: Time lapse analysis of Unreal Engine 4

on Nov 5, 15 • by Michail Greshishchev • with 2 Comments

A Klocwork developer is inspired to run an analysis of the popular Unreal Engine 4 game engine. Here's the entire story, from install to analysis to results...

Home » Static Analysis » Hackathon 2: Time lapse analysis of Unreal Engine 4

With every release, the Klocwork R&D department at Rogue Wave is given the opportunity to participate in a Hackathon. The Hackathon is 4 days in length and is dedicated to working on anything you want – as long as its loosely related to Klocwork. In the same week, participants present their ‘hack’ to their peers. The open format of the hackathon gives an opportunity for R&D to showcase features that may otherwise not make the product roadmap.

It still runs.. which might explain why the blade was replaced with paper prior to letting Engineers touch it.

I was the winner of Klocwork’s first Hackathon, and having owned the highly coveted Hacksaw trophy for a few sprints now, it was time to defend my title.

Questions

Inspired by PVS-Studio’s blog, “How the PVS-Studio team improved Unreal Engine’s code“, I decided to take on the challenge of running Klocwork static code analysis on an industry-leading game engine, Unreal Engine 4.

There were many questions my static analysis nerd brain wanted answered… Why did PVS-Studio require a team and 17 days to perform analysis? Can our static analyzer even analyze this? Why was the number of real PVS-Studio bugs so low for such a large project? Why were edits made “because of the need to please the analyzer”?

There was only 1 (big) problem… I only had 2 work days to do this. After realizing the Unreal Engine 4 code base is over 2 million lines of code, I immediately ran to the documentation department, stole their video camera, and started my Hackathon time lapse.

Since this is a long read, you can skip straight to the interesting findings or the conclusion.

Some rambling about static analysis in the game development industry

Before we get into the deep technical stuff, I’d like to provide some context to this project. The Klocwork static analyzer is quite experienced. Not only in its age, but in its evolution. The analyzer evolved by consuming trillions lines of code from mission critical software. In my (relatively) short time here, I’ve witnessed static analysis of NASA code destined for moons, military code for redacted, and banking software which ensures the rounding scam in Office Space doesn’t happen again. All of this software follows a common pattern: legacy code for legacy standards, bare to the metal compilers with concise feature sets, and source mostly written with static analysis compliance in mind. You know what they say, you are what you eat.

Unlike the mission critical software industries, one can make the argument that the game development industry is their polar opposite: There is incredibly little legacy code since game engines are rewritten from the ground up at least once a decade. The compilers (and developers) abuse every optimization trick available for their target platform. Code is written with no regard for static analysis compliance (as it should be). It is a “we ship with bugs” industry. A game industry developer’s mentality towards static analysis results are also polar opposite – a medical device manufacturer would gladly sort through a 99% false positive rate to find the 1% true issue while our friends at Nintendo and Capcom would simply uninstall our tool if we fed their developers noise all day.

The good news for the static analysis vendors is that the fundamentals of static analysis are common to all industries. Build some abstract syntax trees (AST) and run some xpath-like checkers. Build some axioms via path analysis and run some more source & sink checkers. What does differentiate the industries is the complexity and the abstraction of the code analyzed. Good luck trying to statically track memory allocation and deallocation in EA’s proprietary standard type libraries (EASTL) in Frostbite 3 out-of-the-box. I assure you this is a non-issue over at TD Bank and their C++98 compliant Point of Sale (POS).

What I’m trying to get at is analysis of software in the game development industry is really hard. This is the industry pushing the boundaries of compilers and their feature sets (clang, C++11/14/17). This is the industry where functional testing over unit testing is the standard. This is the industry that expects our Visual Studio 2015 integration to be available before Microsoft has even released the IDE to the public. I believe this is the industry through which static analysis should judge it’s success, because the rest is relatively easy.

Analyzing Unreal Engine 4

Time lapse video of the analysis

Step one!

Get the thing to build. Fortunately for me, the Unreal Engine 4 “Getting Started” documentation is excellent. After signing up over at unrealengine.com and checking out their GitHub project I was good to go. Unfortunately, the internet speeds in Canadia are not the greatest and I burnt a good chunk of valuable time waiting for git checkout. After the checkout, while waiting for the “Development Editor” configuration to compile via MSBuild 2013, I took some time to poke around the Unreal Engine code.

The first thing I hunted for was evidence of conditional compilation and assert macros in the source code. There’s a good chance the “Development Editor” configuration in Visual Studio conditionally disabled a lot of asserts via the preprocessor, yet the disabled code may still contain valuable information for the static analyzer to consume. The compilation of Unreal Engine 4 took about 30 minutes and during that time I managed to find a few assert macros that I wanted our analyzer to retain, such as “check”, “verify”, etc. I notepad’d them into a text file for use later.

Now that I verified Unreal Engine 4 compiles without errors on my platform, I was ready to start the static analysis process.

Step two-ish?

The static analyzer needs to know what to analyze. Klocwork coins this term the “build specification”. The build specification is a human-readable text file providing the analyzer with information on how the project’s compiler interprets files in your project. Information such as sizeof() standard types, predefined macros assumed by the preprocessor, ‘compiler features’ that are actually bugs (thanks Microsoft), etc.

Klocwork ships with a collection of out-of-the-box tools to generate the build specification, such as Visual Studio project parsers, compiler wrappers, etc. I decided to go with our Swiss army knife, a utility called kwinject. On the command line, instead of calling msbuild ue4.sln /t:rebuild, I called kwinject msbuild ue4.sln /t:rebuild to capture compilation information and populate the build specification.

I’m guessing this step is similar to the CLMonitor.exe utility that PVS-Studio had to write and, guessing again, this utility is only able to analyze Unreal Engine 4 build configurations that target the cl.exe compiler (Windows and XBox). Our build specification generation tools have the additional advantage of capturing C# and Java compilation from 200+ different compilers, such as Sony’s ORBIS Clang PS4 compiler or Nintendo’s wacky ARM compilers.

Step start the analysis!

With 75% of my time remaining (1 minute, 30 seconds into the 5 minute time lapse video above) I moved all the tuning sliders to “Please, destroy my machine” and started the static analysis engine. My 8-core CPU was pinned to 100% for the next few hours as the analysis baseline was built. As soon as the analysis results started streaming in, I began reviewing them.

Reviewing the Unreal Engine 4 results

The Unreal Engine 4 source contained significantly more C# code than I expected. Given the short time frame, I made a decision to defer reviewing C# analysis results for a later time and focus my efforts on C/C++ issues.

There were 1311 C/C++ issues detected and, by the end of the Hackathon, I managed to review 302 of them. This is my first exposure to Unreal Engine source and reviewing source of something you’re not the domain expert of is quite a challenge. Out of the 302 issues reviewed, 213 of them appeared to be legitimate findings of various severites. From the 213 legitimate findings, I believe only 39 of the findings were severe and conclusive enough to warrant immediate attention. Extrapolating from my findings, I suspected there were ~169 issues in the Unreal Engine 4 code base that were severe enough to warrant attention. Here are a few issues that I found interesting (and concise enough) for this blog post.

Oculus’ low persistence mode check in HeadMountedDisplayCommon has a NULL pointer dereference

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
bool FHeadMountedDisplay::IsInLowPersistenceMode() const
{
    const auto frame = GetCurrentFrame();
    const auto FrameSettings = frame->Settings;
    return frame && FrameSettings->Flags.bLowPersistenceMode;
}

FHMDGameFrame* FHeadMountedDisplay::GetCurrentFrame() const
{
    if (!Frame.IsValid())
    {
        return nullptr;
    }

    check(IsInGameThread());
    if (Frame->Flags.bOutOfFrame && RenderFrame.IsValid() && RenderFrame->Flags.bOutOfFrame) {...}
    if (Frame->FrameNumber == GFrameCounter || Frame->Flags.bOutOfFrame) {...}

    return nullptr;
}

Suspicious dereference of pointer ‘frame’ before NULL check at line 5

Here, the current ‘frame’ is grabbed on line 3 via GetCurrentFrame() on the assumption that ‘frame’ is not NULL. This assumption is a bit weird as GetCurrentFrame() has multiple ways to return nullptr, but we’ll go with it. On the following line (4) we dereference frame for its Settings pointer. The author attempts to guard for the NULL Settings but mistypes the conditional.

The condition was likely intended to say return FrameSettings && FrameSettings->Flags.bLowPersistenceMode;

If the original conditional was intended, and the developer believes frame can be NULL, then we’d have a NULL pointer exception on line 4, before reaching the guard on the return statement. If you’ve run into low persistence mode issues in Unreal + Oculus, this check may be why.

Comparison of unsigned value against 0 is always true

These are harmless but popped up in quite a few places.

1
2
3
4
5
6
void FOnlineVoiceSteam::StartNetworkedVoice(uint8 LocalUserNum)
{
    // Validate the range of the entry
    if (LocalUserNum >= 0 && LocalUserNum < MAX_LOCAL_PLAYERS) {...}
    else {...}
}

Unnecessary validation of LocalUserNum >= 0

1
2
3
4
5
6
7
bool FSteamVRHMD::GetTrackedObjectOrientationAndPosition(uint32 DeviceId, FQuat& CurrentOrientation, FVector& CurrentPosition)
{
    check(IsInGameThread());
    bool bHasValidPose = false;
    if (DeviceId >= 0 && DeviceId < vr::k_unMaxTrackedDeviceCount) {...}
    return bHasValidPose;
}

Unnecessary validation of DeviceId >= 0

I’m not aware of any compiler that targets a gaming platform and treats uint as a signed int, and thus these are needless checks. I thought that these patterns may have resulted from the great 32bit -> 64bit port and argument types in the function signature were updated after the code was written… but then I realized the example is from Valve’s SteamVR, well into the 64bit era.

The processor will hit these branches a few times and have fun with its branch prediction, but after the branch optimizer realizes its always branching in the same direction, the impact would be insignificant. This is dead code at worst.

BlueprintNodeHelpers not so helpful

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
FORCEINLINE bool HasBlueprintFunction(FName FuncName, const UObject* Ob, const UClass* StopAtClass)
{
    return (Ob->GetClass()->FindFunctionByName(FuncName)->GetOuter() != StopAtClass);
}

UFunction* UClass::FindFunctionByName(FName InName, EIncludeSuperFlag::Type IncludeSuper) const
{
    if (!IncludeSuper)
        return FuncMap.FindRef(InName);
    for (const UClass* SearchClass = this; SearchClass; SearchClass = SearchClass->GetSuperClass()) {...}
    return NULL;
}

FORCEINLINE UObject* GetOuter() const
{
    return Outer;
}

Pointer ‘Ob->GetClass() ->FindFunctionByName(FuncName)’ returned from call to function ‘FindFunctionByName’ at line 3 may be NULL and will be dereferenced.

HasBlueprintFunction() is intended to return true if the UObject Ob contains function FuncName, and false if it does not. Since the definition of FindFunctionByName() returns NULL when the function is not found, GetOuter() dereferences the NULL pointer unconditionally.

HasBlueprintFunction() presently returns true if the UObject Ob contains function FuncName, and throws a NULL pointer exception if it does not. This code is for the AIModule so the robots might be confused.

Suspicious NULL checks related to replicated movement in CharacterMovementComponent

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
void UCharacterMovementComponent::SimulatedTick(float DeltaSeconds)
{
    SCOPE_CYCLE_COUNTER(STAT_CharacterMovementSimulated);
    // If we are playing a RootMotion AnimMontage.
    if (CharacterOwner && CharacterOwner->IsPlayingNetworkedRootMotionMontage()) {...}
    // Not playing RootMotion AnimMontage
    else
    {
        // if we were simulating root motion, we've been ignoring regular ReplicatedMovement updates.
        // If we're not simulating root motion anymore, force us to sync our movement properties.
        // (Root Motion could leave Velocity out of sync w/ ReplicatedMovement)
        if( bWasSimulatingRootMotion ) {...}
        if (CharacterOwner->bReplicateMovement)
        {
            if ((UpdatedComponent->IsSimulatingPhysics()
                || (CharacterOwner && CharacterOwner->IsMatineeControlled())
                || (CharacterOwner && CharacterOwner->IsPlayingRootMotion()))) {...}
            else {...}
        }
    }
    if( GetNetMode() == NM_Client ) {...}
}

On line 13, pointer CharacterOwner is dereferenced to check if bReplicateMovement is set. If it is set, the CharacterOwner is once again checked for NULL on line 16 and 17.

I was going to bucket this one with the “Comparison of unsigned value against 0 is always true” and “is harmless” category but couldn’t for the life of me figure out what the NULL checks were for. I hopped on GitHub, hit blame, and tried to figure out what the commit was about:

-    if( UpdatedComponent->IsSimulatingPhysics() 
-        || (CharacterOwner && CharacterOwner->IsMatineeControlled()) 
-        || (CharacterOwner && CharacterOwner->IsPlayingRootMotion()))
+    if (CharacterOwner->bReplicateMovement)
     {
-        PerformMovement(DeltaSeconds); 
-    } 
-    else 
-    { 
-        SimulateMovement(DeltaSeconds);
+        if ((UpdatedComponent->IsSimulatingPhysics()

See the commit here (you need to register here to see it)

As you can see, the previous code validated the CharacterOwner pointer before dereferencing it. The updated code dereferences CharacterOwner in the if condition, and later verifies it is not NULL.

If CharacterOwner can potentially be NULL, which the GitHub history indicates ‘yes’, there is a NULL pointer exception on line 13. If CharacterOwner can’t be NULL, there is a lot of unnecessary conditionals, both historically and in the current revision.

Suspicious handling of streaming media types in WindowsMoviePlayer plugin

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
FIntPoint FVideoPlayer::AddStreamToTopology(IMFTopology* Topology, IMFPresentationDescriptor* PresentationDesc, IMFStreamDescriptor* StreamDesc, FSampleGrabberCallback* SampleGrabberCallback)
{
    FIntPoint OutDimensions = FIntPoint(ForceInit);

    HRESULT HResult = S_OK;
    IMFActivate* SinkActivate = NULL;
    {
        IMFMediaTypeHandler* Handler = NULL;
        HResult = StreamDesc->GetMediaTypeHandler(&Handler);
        check(SUCCEEDED(HResult));

        GUID MajorType;
        HResult = Handler->GetMajorType(&MajorType);
        check(SUCCEEDED(HResult));

        if (MajorType == MFMediaType_Audio)
        {
            HResult = MFCreateAudioRendererActivate(&SinkActivate);
            check(SUCCEEDED(HResult));
        }
        else if (MajorType == MFMediaType_Video)
        {
            //cut code for example clarity
            HResult = MFCreateSampleGrabberSinkActivate(InputType, SampleGrabberCallback, &SinkActivate);
            check(SUCCEEDED(HResult));
        }

        Handler->Release();
    }

    IMFTopologyNode* SourceNode = NULL; {...}

    IMFTopologyNode* OutputNode = NULL; {...}

    HResult = SourceNode->ConnectOutput(0, OutputNode, 0);
    check(SUCCEEDED(HResult));

    SourceNode->Release();
    OutputNode->Release();
    SinkActivate->Release();

    return OutDimensions;
}

DEFINE_GUID(MFMediaType_Default)
DEFINE_GUID(MFMediaType_Audio)
DEFINE_GUID(MFMediaType_Video)
DEFINE_GUID(MFMediaType_Protected)
DEFINE_GUID(MFMediaType_SAMI)
DEFINE_GUID(MFMediaType_Script)
DEFINE_GUID(MFMediaType_Image)
DEFINE_GUID(MFMediaType_HTML)
DEFINE_GUID(MFMediaType_Binary)
DEFINE_GUID(MFMediaType_FileTransfer)
DEFINE_GUID(MFMediaType_Stream)

Null pointer ‘SinkActivate’ that comes from line 6 may be dereferenced at line 40.

On line 6 we create a NULL pointer SinkActive. We hope that at least one conditional, on line 16 or 21, evaluates to true and initializes SinkActive. On line 40 we unconditionally Release() SinkActive.

Looking at definitions for MFMediaTypes on line 45+, there are certainly a lot more than MFMediaType_Audio and MFMediaType_Video. MFMediaType_Default seems like reasonable input, as well as MFMediaType_Stream. Unfortunately anything outside of _Audio and _Video media types would result in SinkActive never being initialized and thus Release() dereferencing a null pointer on line 40.

I tried to give this issue the benefit of the doubt and look for guards outside of this function, something along the lines of “Call this function only on MFMediaType_Video and MFMediaType_Audio media types”, but there were none.

Suspicious NULL check in VelocityRender’s HasVelocity.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
bool FVelocityDrawingPolicy::HasVelocity(const FViewInfo& View, const FPrimitiveSceneInfo* PrimitiveSceneInfo)
{
    checkSlow(IsInParallelRenderingThread());

    // No velocity if motionblur is off, or if it's a non-moving object (treat as background in that case)
    if (View.bCameraCut || !PrimitiveSceneInfo->Proxy->IsMovable()) {...}

    if (PrimitiveSceneInfo->Proxy->AlwaysHasVelocity()) {...}

    if (PrimitiveSceneInfo->bVelocityIsSupressed) {...}

    // check if the primitive has moved
    {
        FMatrix PreviousLocalToWorld;

        // hack
        FScene* Scene = PrimitiveSceneInfo->Scene;

        if (Scene->MotionBlurInfoData.GetPrimitiveMotionBlurInfo(PrimitiveSceneInfo, PreviousLocalToWorld))
        {
            check(PrimitiveSceneInfo->Proxy);

            const FMatrix& LocalToWorld = PrimitiveSceneInfo->Proxy->GetLocalToWorld();

            // Hasn't moved (treat as background by not rendering any special velocities)?
            if (LocalToWorld.Equals(PreviousLocalToWorld, 0.0001f)) {...}
        }
        else  {...}
    }

    return true;
}

Suspicious dereference of pointer ‘PrimitiveSceneInfo->Proxy’ by function ‘IsMovable’ at line 6 before NULL check at line 21.

Out of the 302 issues reviewed, I found 31 issues of this type – the developer would dereference a pointer and shortly after check it for NULL. In the above example, PrimitiveSceneInfo->Proxy is dereferenced on line 6 and then ‘check’ed for NULL on line 21. If PrimitiveSceneInfo->Proxy is expected to be NULL, the static analyzer has found a valid problem. If PrimitiveSceneInfo->Proxy cannot be NULL, the static analyzer is confused as to why the developer decided to check it for NULL on line 21. I would argue that the NULL assertion of pointers should be performed before the pointer is dereferenced.

Conclusion + TL;DR

Klocwork analyzed 2,292,918 lines of code from Unreal Engine 4 and detected 1311 C/C++ issues of varying severities. Out of 1311 detected issues, 302 were reviewed by a human. Of those reviewed, 213 appeared to be legitimate findings. From the 213 legitimate findings, only 39 were believe to be severe and conclusive enough to warrant immediate attention. Extrapolating from the human-reviewed sample, it is suspected that there are ~169 issues in the Unreal Engine 4 source code which are severe enough to warrant immediate attention.

By comparing issue density in Unreal Engine 4 to other codebases in the game development industry, Unreal Engine 4’s code quality is in a league of its own, achieving issue density nearing large mission critical software projects. I would guess that Unreal Engine 4’s code quality success is attributed to two factors: Static analysis by PVS-Studio and Unreal’s open source initiative.

Given that, I was not able to fully understand the analysis time invested by PVS-Studio. The 1,800+ issues PVS-Studio detected in their most recent analysis is an incredible number of issues to review, let alone commit fixes for in a span of 17 days. Their statement, “The number of real bugs detected in the code is very small for such a large project,” makes it safe to assume that the PVS-Studio analysis of Unreal Engine 4 was a battle of citing false positives, making edits “because of the need to please the analyzer”, and not coding bug fixes. Regardless of PVS-Studios’ false positive rate, about ~169 high severity issues were missed. There were hundreds of lower severity issues missed as well, but at this point Unreal has bigger fish to fry.

OK this blog is getting way too long. Never seen Unreal Engine 4. Found lots of critical issues in less than ~12 Hackathon hours. Try static analysis. Thanks for reading bye!

Related Posts

2 Responses to Hackathon 2: Time lapse analysis of Unreal Engine 4

  1. Recently there appeared an article “Hackathon 2: Time lapse analysis of Unreal Engine 4”, which describes how you can find a great deal of bugs in Unreal Engine 4 using Klockwork. I just can’t help commenting on this article. The thing is that once we fixed all the bugs that PVS-Studio analyzer found in this project. Of course, we haven’t worked on all bugs existing in the project, only on those that were detected by our analyzer. However, the article creates an impression that the PVS-Studio analyzer skipped too many bugs. Well, I guess now it’s my turn to say something. I have also rechecked Unreal Engine 4 and found plenty of another bugs. So I can claim that PVS-Studio can find new bugs in Unreal Engine 4. It’s a draw.

    Link: http://www.viva64.com/en/b/0356/

  2. Chrysaour says:

    This looks great. I’d love to see more static analysis on various game engines.

Leave a Reply

Your email address will not be published. Required fields are marked *

Scroll to top