Leigh Davies was working on a football game some years ago and the development team was struggling to optimise it. This was back in the days when on-screen players were made up of 300 polygons. The developers tried turning off the stadium and turning off the players, but they couldn’t work out what was taking all the time. And then somebody took a closer look at the ball: it had 5,000 polygons, more than the rest of the scene put together, and that was doubled because it was also used for the ball’s shadow. It was superbly detailed (it even included the stitching), but it was bringing the whole game’s performance down. Because it had been there since day one, the developers had just assumed that the game’s performance was all down to the code.
It’s exactly that kind of problem, which is equally likely to happen today, that Intel Graphics Performance Analyzers is designed to help solve. If you can optimise early enough, you can add in new features and enhance graphics where they will really be appreciated.
Intel Graphics Performance Analyzers (GPA) is a free suite of tools that takes a unique approach to optimising games. It grew out of a tool that Intel was using internally to help games studios optimise their games. When you have 50 artists working on a game, it can be hard to figure out where the performance goes. GPA makes it easy to tweak the game and then measure and analyse what happens if a particular feature isn’t drawn. As Davies put it, it’s about “how to get the best bang for your buck with the real estate and performance at your disposal.”
Although the tool provides access to all the hardware counters on Intel’s Core i3 and Core i5 chips, which feature integrated graphics, it also works with discrete video cards.
There are three main components to the suite. The first is the System Analyzer. This provides real-time metrics including frame rates, CPU utilisation, number of draw calls, state changes, and the number of times a vertex is locked. When used with Intel’s Core i3 and Core i5, it also shows GPU metrics such as the number of pixels drawn and the memory bandwidth on the card. You can make changes to the game and see the effects on the metrics as the game is running.
The second part of the suite is the Frame Analyzer. If you have a frame that is representative of the game or appears to be particularly slow, you can do a scene capture, and then analyse all the information about the draw calls. You can use the Frame Analyzer to experiment with the frames themselves and tweak things that are internal to your graphics engine.
The latest addition to the suite is the Platform View. This was launched at GDC earlier this year and shows a visual representation of all the tasks running on the system, broken down by thread. It reveals the core utilisation and whether a task is waiting on another. While the System Analyzer and Frame Analyzer work without any instrumentation, the Platform View does require some lightweight code additions to mark up the boundaries of tasks if you are to get the most out of the tool.
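As a rough illustration of what marking up task boundaries involves, the sketch below wraps each unit of work in a begin/end marker and records which thread ran it. This is not GPA's instrumentation API: `task_region`, `TaskRecord` and `records` are names invented for this example, and Python is used only to keep the idea short.

```python
import threading
import time
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class TaskRecord:
    name: str      # label given to the task
    thread: str    # name of the thread that ran it
    start: float   # seconds, from time.perf_counter()
    end: float


records = []                      # hypothetical in-memory trace
_records_lock = threading.Lock()  # protects `records` across threads


@contextmanager
def task_region(name):
    """Mark the start and end of one task on the current thread."""
    start = time.perf_counter()
    try:
        yield
    finally:
        end = time.perf_counter()
        with _records_lock:
            records.append(
                TaskRecord(name, threading.current_thread().name, start, end)
            )


def worker(label):
    with task_region(label):
        time.sleep(0.01)  # stand-in for real work such as physics or AI


threads = [
    threading.Thread(target=worker, args=(f"task-{i}",), name=f"worker-{i}")
    for i in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Print the trace in start order, one line per task.
for r in sorted(records, key=lambda rec: rec.start):
    print(f"{r.thread}: {r.name} took {(r.end - r.start) * 1000:.1f} ms")
```

A per-thread timeline of this kind is what lets a tool show which cores are busy and where one task is waiting on another.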
To keep the overhead of monitoring small, the suite uses a client-server model. A small GPA monitor program runs on the PC that runs the application, and you use it to launch the app. It then monitors what happens and sends its data to another machine, where you can analyse performance using System Analyzer or Frame Analyzer. The overhead of monitoring should be tiny if the software is run in this way: if the frame rate is affected by more than 5%, something strange is happening that isn’t down to the monitoring itself. Note that you can run the analysis programs on the same PC as the application you’re monitoring, but you still need to install the GPA monitor and use it to start the game you want to investigate.
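The 5% rule of thumb is easy to check by hand. The sketch below shows the arithmetic, assuming you have measured the frame rate once without the monitor and once with it. The function names and the sample frame rates are made up for illustration; only the 5% threshold comes from the talk.

```python
def monitoring_overhead(baseline_fps, monitored_fps):
    """Fractional drop in frame rate when the monitor is attached."""
    if baseline_fps <= 0:
        raise ValueError("baseline_fps must be positive")
    return (baseline_fps - monitored_fps) / baseline_fps


def overhead_is_suspicious(baseline_fps, monitored_fps, threshold=0.05):
    """True if the drop exceeds the threshold (5% by default)."""
    return monitoring_overhead(baseline_fps, monitored_fps) > threshold


# Hypothetical measurements: 60 fps unmonitored, then two monitored runs.
for monitored in (58.5, 54.0):
    drop = monitoring_overhead(60.0, monitored)
    flag = "investigate" if overhead_is_suspicious(60.0, monitored) else "ok"
    print(f"60.0 -> {monitored} fps: {drop:.1%} drop ({flag})")
```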
System Analyzer shows a wide range of metrics, divided into different areas. The CPU counters can help you to see whether the cores are saturated. The DirectX counters include the frame rate, state changes, and number of locks. There are 32 Intel hardware counters for the Core i3 and Core i5, including the primitives count, how busy the maths box is, how many vertices are being processed, and how much time is being spent on pixel shading.
System Analyzer also provides some simple overrides that you can use without making any changes to your engine, to see where the problem might be. You can turn off the D3D driver to see what happens. If the game doesn’t speed up, the problem is inherent in your game code. You can override all textures and run 2×2 textures. It might look odd, but you can recognise what’s happening in the game and if the game speeds up, it means your textures are causing the slowdown. You can clip the entire screen into a single pixel. That will disable pixel shaders but keep all the geometry processing, which can help identify possible bottlenecks. This all works on any hardware.
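The reasoning behind these overrides can be written down as a small decision procedure. The sketch below assumes you have noted the frame time with no override and with each override switched on. The override keys, the 10% "meaningful speedup" cut-off and the sample timings are all assumptions made for this example, not values or names from GPA.

```python
def gain(baseline_ms, override_ms):
    """Fraction of frame time saved when an override is switched on."""
    return (baseline_ms - override_ms) / baseline_ms


def interpret(baseline_ms, override_ms_by_name, min_gain=0.10):
    """Map override timings to likely bottlenecks, following the text above.

    Keys are hypothetical labels: "null_driver" (D3D driver turned off),
    "tiny_textures" (all textures replaced by 2x2 ones) and
    "one_pixel_clip" (screen clipped to a single pixel).
    """
    findings = []
    for name, override_ms in override_ms_by_name.items():
        speeds_up = gain(baseline_ms, override_ms) >= min_gain
        if name == "null_driver" and not speeds_up:
            findings.append("game code")        # no speedup without the driver
        elif name == "tiny_textures" and speeds_up:
            findings.append("textures")         # texture cost was significant
        elif name == "one_pixel_clip" and speeds_up:
            findings.append("pixel shading")    # geometry kept, pixels removed
    return findings


# Hypothetical frame times in milliseconds.
print(interpret(20.0, {"null_driver": 19.5, "tiny_textures": 19.8,
                       "one_pixel_clip": 19.9}))
print(interpret(20.0, {"null_driver": 8.0, "tiny_textures": 14.0,
                       "one_pixel_clip": 19.0}))
```

Treat the output as a pointer towards where to look next, not as a diagnosis: a real frame can be limited by more than one thing at once.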
Davies gave an example of a scene from Ghostbusters which took place in a dark room. The developers were surprised that they didn’t see any speedup when they were just looking at the wall. When the override in System Analyzer was turned on, it was possible to see that the game was rendering the next three rooms behind the wall. Problems like this can easily happen when you have a large team and they don’t all know what everyone else is doing.
Frame Analyzer is designed to provide deep diagnosis of what is taking the time in a frame. This doesn’t run in real time. Instead, you click a button in System Analyzer and it dumps a frame to the Frame Analyzer. A graph at the top of the screen shows how much time the draw calls are taking. There is a separate panel called the erg list that lists any item that renders a pixel, draws, clears, or does a surface copy. You can analyse groups of ergs or individual ones, including colouring them pink on screen so you can easily see which pixels are being affected.
One of the analysis views shows you a greyscale image with dots representing how many times something touched a particular pixel. You can trace the history of any given pixel, so that you can see all the draw calls that affected it. A particular draw call can be disabled, so that you can see the effect that has, or the DirectX render states can be changed and the code rerun with those changes. A wide range of changes can be made without having to change the game at all, with Frame Analyzer modelling those changes on the frame.
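To make the overdraw view and pixel history concrete, here is a toy model of both ideas. It treats each draw call as a rectangle of pixels, which is a big simplification of real rasterisation, and every name in it (`DrawCall`, `overdraw_map`, `pixel_history`, the sample calls) is invented for the example and has nothing to do with how Frame Analyzer is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrawCall:
    name: str
    x0: int  # left edge, inclusive
    y0: int  # top edge, inclusive
    x1: int  # right edge, exclusive
    y1: int  # bottom edge, exclusive

    def covers(self, x, y):
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


def overdraw_map(calls, width, height, disabled=()):
    """Count how many enabled draw calls touch each pixel."""
    grid = [[0] * width for _ in range(height)]
    for call in calls:
        if call.name in disabled:
            continue  # model switching a draw call off
        for y in range(max(call.y0, 0), min(call.y1, height)):
            for x in range(max(call.x0, 0), min(call.x1, width)):
                grid[y][x] += 1
    return grid


def pixel_history(calls, x, y):
    """Names of the draw calls that touched pixel (x, y), in submit order."""
    return [call.name for call in calls if call.covers(x, y)]


# A hypothetical 4x3 frame with three draw calls.
frame = [
    DrawCall("sky", 0, 0, 4, 3),
    DrawCall("wall", 0, 1, 4, 3),
    DrawCall("ball", 1, 1, 3, 2),
]

for row in overdraw_map(frame, 4, 3):
    print(row)
print(pixel_history(frame, 1, 1))
print(overdraw_map(frame, 4, 3, disabled={"wall"})[1][1])
```

Pixels with high counts are the bright spots in a greyscale overdraw image, and re-running the count with a call disabled mirrors the "switch it off and see what changes" experiment described above.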
Quite impressive speedups are possible with relatively quick fixes, once you have worked out what to fix. Creative Assembly found that its most expensive draw call in Empire Total War on its top level map was the outline of the countries. It was using 40,000 draw calls and 100 instructions. A slight change to the shader and breaking down the mesh into blocks of a couple of hundred miles each resulted in a speedup of 20-30%. They also found that distant trees were consuming a lot of shader time. When it was reduced to almost none, the overall speedup was 15-20%.
Davies stressed that GPA has developed through user feedback, and that the shortlist of new features is being drawn up now for a version intended to be released around the time of GDC next year. The software is free to download, so why not try it out and see how it can help you optimise your code, and share your ideas for how it could be better?
Filed under: Game development. Tagged: develop 2010





