C# performance tuning, part 7. Summary.

May 25, 2014 Leave a comment

Previous parts of this series:
Part 1: Introduction
Part 2: Collisions explained
Part 3: Obvious optimizations
Part 4: Quad Tree
Part 5: Quad Tree improvements and small fixes
Part 6: Minimize loop repeats

I’ve went through whole optimization process. Getting better and better results at each step, finally reaching the limit when getting 362 times faster than first version of code. That’s a lot. Is that a limit? I doubt. We could do few things more at this point. Replacing property calls with private field should give a little bit more performance. Investigating algorithms more deeply may uncover some more optimizations.

Is that needed? At that point to me – no. I’ve been testing it under pretty stressful conditions – small area packed with 1000 triangles means a lot of collisions happening. Changing area size to 10 000 x 10 000 gives serial collision check time at 2ms level. Leave area size 1000 x 1000 and limit triangle number to 500 and time is as low as 2ms as well. And it’s still probably way more collisions going on per frame than in usual 2D game where we either don’t check for collisions against objects close to each other (player vs. floor) or objects are usually not colliding (player vs. wall or vs. player).

How does the data look?

Change Collisions serial Time serial Collisions parallel Time parallel
initial 8058 1814 8058 562
line segments initialized 8058 1061 8058 308
bounding boxes 8058 25 8058 7
quad tree basic 7054 30 7054 10
_allChildren added 7054 25 7054 7
two segments check 6685 19 6685 7
rectangle optimization 6685 19 6685 7
line precalculations 6685 18 6685 7
triangle contains linq replacement 6685 13 6685 4
QuadTree collison check 3163 5 3163 3

As you can see collisions detected over time decreased as I’ve made the code more precise at the same time improving performance.

What’s to learn from this exercise? First – performance often comes with the price. You can do some things faster, but you will need to pay the price in memory usage (when precalculatig some values), code readability (when replacing LINQ with code duplications, however one could argue if resulted code was harder to read in this particular case) or time take to rewrite the code (changing simple list and replacing it with more complex Quad Tree).

Also – but that’s pretty common knowledge – there is no point in doing performance improvements without profiler set up. You may get better results, you may not, but you will only be guessing without something that will measure your code’s performance.

And final thought – many times there are complains that C# and Java are slow. Well, I’m pretty sure those collisions written in C or C++ would be faster. But would it make that much of a difference? I don’t think so, at least not in my case. For some applications it can be critical to get code running as fast as possible, for others 2ms is fast enough. And tool you know will serve you better at that point than tool you are not that experienced with.

Categories: Development Tags: , ,

C# performance tuning, part 6. Minimize loop repeats.

May 25, 2014 1 comment

All right. Some bigger and some smaller fixes are in place. 16ms threshold beaten. But there must be more to be done. And there is. Huge thing. Remember when I wrote that there are two check for each pair – A vs. B and B vs. A? Quad Tree helps with that a little bit, but there is much more to be done! After all at given tree level we are still doubling checks. That’s waste of time and energy.

There isn’t much to be done from the outside of a tree. It gets us collection of objects to check against and that’s it. But why not moving collision detection code inside of tree? It’s already taking care of holding data structured correctly – it can also traverse the data effectively so let’s do that. But that won’t help with double checks, right? Well – it won’t if we leave the code as it is. But if we will make a use of our knowledge of tree’s data structure and what we know about collisions, we may fix that pretty easily.

        public int CheckCollisions()
        {
            int collisions = 0;

            for (int i = 0; i < _objects.Count; i++)
            {
                var triangle = _objects[i];
                IReadOnlyList<Triangle> possibleColliding = _objects;
                for (int j = i + 1; j < possibleColliding.Count; j++)
                    if (triangle.Intersects(possibleColliding[j]))
                        ++collisions;

                possibleColliding = GetChildObjects(triangle);
                for (int j = 0; j < possibleColliding.Count; j++)
                    if (triangle.Intersects(possibleColliding[j]))
                        ++collisions;
            }

            if (_nodes[0] != null)
            {
                collisions += _nodes[0].CheckCollisions();
                collisions += _nodes[1].CheckCollisions();
                collisions += _nodes[2].CheckCollisions();
                collisions += _nodes[3].CheckCollisions();
            }

            return collisions;
        }

        public int CheckCollisionsParallel()
        {
            int collisions = 0;

            for (int i = 0; i < _objects.Count; i++)
            {
                var triangle = _objects[i];
                IReadOnlyList<Triangle> possibleColliding = _objects;
                for (int j = i + 1; j < possibleColliding.Count; j++)
                    if (triangle.Intersects(possibleColliding[j]))
                        ++collisions;

                possibleColliding = GetChildObjects(triangle);
                for (int j = 0; j < possibleColliding.Count; j++)
                    if (triangle.Intersects(possibleColliding[j]))
                        ++collisions;
            }

            if (_nodes[0] != null)
            {
                if (_level == 0)
                {
                    Parallel.For(0, 4,
                        i =>
                        {
                            Interlocked.Add(ref collisions, _nodes[i].CheckCollisions());
                        });
                }
                else
                {
                    collisions += _nodes[0].CheckCollisions();
                    collisions += _nodes[1].CheckCollisions();
                    collisions += _nodes[2].CheckCollisions();
                    collisions += _nodes[3].CheckCollisions();
                }
            }

            return collisions;
        }

I admit – those methods seem to be pretty messy at first look, but let me just explain what’s going on. We are first checking collisions with objects at current level (_objects collection). But instead of going through all of them, we are going only through those after the one we are checking. After all if we’ve checked first object with all others, there is no point in checking second object with first again – we can start from third one right away. Then of course we are checking collisions with all child objects that are possibly colliding – so taking only those in right quadrant. After that it’s simple recursive call for all child nodes – already expanded instead of using for loop. For parallel version we are creating parallel loop only when on first level – cost of additional parallelization would probably outweight multicore usage when applied at all lower levels.

Does it seem clear now? Was it worth making the code more complex? Oh it was, it was. 5 and 3 milliseconds for getting all collisions – that’s what I was hoping for! That’s 362 times faster than first implementation!

Categories: Development Tags: , ,

C# performance tuning, part 5. Quad Tree improvements and small fixes

May 25, 2014 1 comment

After introduction Quad Tree implementation performance degraded. Less collision checks is happening but tree implementation cost outweights those benefits. Profiler clearly shows that QuadTree.GetObjects() method is slowest one. No wonder – it needs to get all child objects, which means allocating at least few arrays, merging them etc. But that can be fixed. Again – paying the price in memory should give us benefit of faster calculations.

What can be done is introduction of some kind of store of all child objects. After all, when I’m requesting all objects, I don’t care which node they come from or in what order. So let’s do second collection, alongside _objects one.

        private List<Triangle> _objects;
        private List<Triangle> _allChildren;
		
        ...

        private QuadTree(Rectangle area, uint level)
        {
            _level = level;
            _objects = new List<Triangle>(4);
            _allChildren = new List<Triangle>(200);
            _nodes = new QuadTree[4];
            _area = area;
        }

There can be quite a few child objects involved so initial collection size is set to 200. Adjustments need to be made to insert and retrieval methods as well.

        public void Insert(Triangle obj)
        {
            if (_nodes[0] != null)
            {
                int index = GetIndex(obj);
                if (index != -1)
                {
                    _allChildren.Add(obj);
                    _nodes[index].Insert(obj);
                    return;
                }
            }

            _objects.Add(obj);

            if (_objects.Count > 4 && _level < MaxLevel)
            {
                if (_nodes[0] == null)
                    Split();

                int i = 0;
                while (i < _objects.Count)
                {
                    int index = GetIndex(_objects[i]);
                    if (index != -1)
                    {
                        Triangle objToMove = _objects[i];
                        _objects.Remove(objToMove);
                        _allChildren.Add(objToMove);
                        _nodes[index].Insert(objToMove);
                    }
                    else
                        i++;
                }
            }
        }
		
		...
		
		private IReadOnlyList<Triangle> GetChildObjects(Triangle obj)
        {
            if (_nodes[0] == null)
                return new Triangle[0];

            int index = GetIndex(obj);
            if (index != -1)
            {
                return _nodes[index].GetObjects(obj);
            }
            else
            {
                return _allChildren;
            }
        }

        public IReadOnlyList<Triangle> GetObjects()
        {
            var all = new List<Triangle>(_objects);
            all.AddRange(_allChildren);            
            return all;
        }

Each time we add object to child nodes, we are also storing it in additional collection which will allow us to get those objects without need of traversing the tree. Price in memory can be big if we will have a lot of objects though, since we are storing additional references at each tree level until the final one, where the object belongs.

But performance improved nicely: 25 and 7 milliseconds for serial/parallel versions gets me back where I was before.

What does profiler says? GetObjects is now much lower on dotTrace’s HotSpot list. We’re back with slowdown being in Triangle.Contains and Triangle.SegmentsAreCrossing methods. So testing whether one of three sides crosses any other side and if any single point is inside other triangle.

First segments – we are checking all three segments of first triangle against all three segments of second triangle. But we don’t need to. We only need to check two segments. Why? Well, triangles are either intersecting at two sides minimum or, if they are intersecting at only one side – that means one point of triangle must be inside the other, which is covered by the second check. And second think – we are using LINQ’s Any queries. LINQ is awesome querying tool, but it probably introduces some performance penalties. And I know exactly how many sides I have, I know what data structure I’m using, so I can take advantage of that. Avoiding loops can be pretty effective performance optimization.

        private bool SegmentsAreCrossing(Triangle b)
        {
            return
                _lines[0].Intersects(b._lines[0]) ||
                _lines[0].Intersects(b._lines[1]) ||
                _lines[0].Intersects(b._lines[2]) ||
                _lines[1].Intersects(b._lines[0]) ||
                _lines[1].Intersects(b._lines[1]) ||
                _lines[1].Intersects(b._lines[2]);
        }

Just this change saves another 5ms from sequential version. It is still pretty readable and easy to maintain, however it could potentially be easy to make some error in indexing those lines.

Another profiler check and Contains is high on HotSpot list, but there is another method, which takes a lot of time due to lots of calls – Rectangle.Intersects. After all in each collision check we are checking bounding boxes first, resulting in 664 184 rectangle intersection checks. But the code there is simple, what can be done about it?

Well, it is simple, but there four additions in those checks. And what do they calculate? Right and top position of rectangle. That clearly does not change throughout rectangle’s lifetime, so we can pre-calculate this.

        _rightTop = new Point(LeftBottom.X + Width, LeftBottom.Y + Height);

        public bool Intersects(Rectangle b)
        {
            return
                _rightTop.X >= b.LeftBottom.X &&
                LeftBottom.X <= b._rightTop.X &&
                _rightTop.Y >= b.LeftBottom.Y &&
                LeftBottom.Y <= b._rightTop.Y;
        }

It even seams easier to read. Another 1ms saved. Not much, but that’s still 20% saving compared to what it was.

Now there’s no running from Triangle.Contains method. But that method is pretty simple. Of course we can also rewrite it to not use LINQ loop, but that probably won’t save us significant amount of time alone. Especially when profiler says that we are doing close to 200 000 calls to Sign method. And that’s where I’ll be looking next.

There is one thing we can optimize there – we are always calculating start and end x and y variable difference. That’s a constant value for line segment so we can reuse it. I’ll call it reversed width and height respectively, since it take start – end. While we’re at it – there are similar things to be optimized in Intersects method – a lot of repeated calculations. I’ll fix that too, since I’m already messing with the class.

        public Line(Point start, Point end)
        {
            Start = start;
            End = end;
            _width  = End.X - Start.X;
            _height = End.Y - Start.Y;
            _revWidth  = Start.X - End.X;
            _revHeight = Start.Y - End.Y;
        }

        public static Line Create(int x1, int y1, int x2, int y2)
        {
            return new Line(new Point(x1, y1), new Point(x2, y2));
        }

        public bool Intersects(Line b)
        {
            double denominator = b._height * _width - b._width * _height;
            int startXDiff = Start.X - b.Start.X;
            int startYDiff = Start.Y - b.Start.Y;
            double aNumerator = b._width * startYDiff - b._height * startXDiff;
            double bNumerator = _width * startYDiff - _height * startXDiff;

            if (denominator.IsZero())
                return aNumerator.IsZero() && bNumerator.IsZero();

            double aParameter = aNumerator / denominator;
            double bParameter = bNumerator / denominator;

            return aParameter > 0 && aParameter <= 1 &&
                   bParameter > 0 && bParameter <= 1;
        }

        public double Sign(Point point)
        {
            return _revHeight * (point.X - End.X) - _revWidth * (point.Y - End.Y);
        }
    }

And it gives another 1-2ms saved. Add to that replacing LINQ in Contains method and we are finally below 16ms! 13 ms for serial and 4ms for parallel versions are good – almost 140 times faster than initial, naive implementation! But is it good enough? Not yet. But more on that in next post.

Categories: Development Tags: , ,

C# performance tuning, part 4. Quad Tree

May 25, 2014 1 comment

When profiling application with sampling mode, dotTrace does not show call count unfortunately. Switching to Tracing mode changes that. It has bigger performance impact on code – profiling will take more time, but it shows some interesting details. For example – Triangle.Contains method, which is slowest part of the code at the moment is being called 11 903 344 times! It should be halved since I’m running collision counting twice (serial and parallel) but that’s still close to 6 million calls. And it’s just 1000 triangles. Why is that? Well, I’m checking collisions between all triangles in the area, no matter where they are. One is in left top corner, the other in right bottom. Clearly far enough to not collide, but still check is done. And Contains check happens so often because it happens each time when triangle sides are not colliding.

So that needs to be fixed as soon as possible. One thing the code is missing is bounding box definition. Bounding box is pretty standard technique in collision detection – you surround your object with virtual rectangle to be able to quickly check if two objects are possibly colliding. If they don’t collide at bounding box level you don’t need to check anything else.

Rectangle definition looks like this:

    public class Rectangle
    {
        public Point LeftBottom { get; private set; }

        public int Width { get; private set; }
        public int Height { get; private set; }

        public Rectangle(Point leftBottom, int width, int height)
        {
            LeftBottom = leftBottom;
            Width = width;
            Height = height;
        }

        public bool Intersects(Rectangle b)
        {
            return
                (LeftBottom.X + Width) >= b.LeftBottom.X &&
                LeftBottom.X <= (b.LeftBottom.X + b.Width) &&
                (LeftBottom.Y + Height) >= b.LeftBottom.Y &&
                LeftBottom.Y <= (b.LeftBottom.Y + b.Height);
        }
    }

Rectangle intersection check is pretty simple to understand, try following the logic and it should be pretty obvious how it works.

Now adding bounding box to Triangle and quick check if boxes are intersecting should give some performance kick.

        private Line[] _lines;
        private Rectangle _boundingBox;
                
        public Triangle(Point a, Point b, Point c)
        {
            A = a;
            B = b;
            C = c;

            _lines = new[]
            {
                new Line(A, B),
                new Line(B, C),
                new Line(C, A)
            };
            
            var xCoordinates = new List<int> { A.X, B.X, C.X };
            var yCoordinates = new List<int> { A.Y, B.Y, C.Y };
            var leftBottom = new Point(xCoordinates.Min(), yCoordinates.Min());
            int width = xCoordinates.Max() - leftBottom.X;
            int height = yCoordinates.Max() - leftBottom.Y;

            _boundingBox = new Rectangle(leftBottom, width, height);
        }
		
		...
		
		public bool Intersects(Triangle b)
        {
            if (!_boundingBox.Intersects(b._boundingBox))
                return false;

            return SegmentsAreCrossing(b) || OneIsInsideTheOther(b);
        }

Simple and straightforward. And results? 25 and 7 milliseconds for serial and parallel respectively! Wow, that’s huge! To be honest I probably should have started with something as obvious as this. Now there are just 108 520 calls to slow Contains method.

Parallel version went below 16ms target. Serial version is still quite well above this limit. Am I satisfied with that? Of course not. There must be a lot that can be done to squeeze every bit of performance there is. Collision detection is not all.

So we have bounding boxes, but we still check each object with every other. Some are obviously in different areas so we could skip them. But how do we know? There is pretty well know structure called Quad Tree. It is similar to binary tree but each node can have up to four child nodes, not two. How is that useful? You can split screen area in 4 equal sections and just check collisions between objects in the this one section plus with objects that are placed across the borders. Again – code is bellow. It is pretty simple to follow, so go through with it.

    public class QuadTree
    {
        private List<Triangle> _objects;
        private QuadTree[] _nodes;
        private Rectangle _area;
        private uint _level;
        private readonly uint MaxLevel = 4;

        public QuadTree(Rectangle area) : this(area, 0)
        { }

        private QuadTree(Rectangle area, uint level)
        {
            _level = level;
            _objects = new List<Triangle>(4);
            _nodes = new QuadTree[4];
            _area = area;
        }

        public void Insert(Triangle obj)
        {
            if (_nodes[0] != null)
            {
                int index = GetIndex(obj);
                if (index != -1)
                {
                    _nodes[index].Insert(obj);
                    return;
                }
            }

            _objects.Add(obj);

            if (_objects.Count > 4 && _level < MaxLevel)
            {
                if (_nodes[0] == null)
                    Split();

                int i = 0;
                while (i < _objects.Count)
                {
                    int index = GetIndex(_objects[i]);
                    if (index != -1)
                    {
                        Triangle objToMove = _objects[i];
                        _objects.Remove(objToMove);
                        _nodes[index].Insert(objToMove);
                    }
                    else
                        i++;
                }
            }
        }

        public IReadOnlyList<Triangle> GetObjects(Triangle obj)
        {
            var all = new List<Triangle>(_objects);
            all.AddRange(GetChildObjects(obj));
            all.Remove(obj);
            return all;
        }

        private IReadOnlyList<Triangle> GetChildObjects(Triangle obj)
        {
            if (_nodes[0] == null)
                return new Triangle[0];

            int index = GetIndex(obj);
            if (index != -1)
            {
                return _nodes[index].GetObjects(obj);
            }
            else
            {
                var all = new List<Triangle>();
                for (int i = 0; i < _nodes.Length; i++)
                    all.AddRange(_nodes[i].GetObjects());

                return all;
            }
        }

        public IReadOnlyList<Triangle> GetObjects()
        {
            var all = new List<Triangle>(_objects);
            if (_nodes[0] != null)
                for (int i = 0; i < _nodes.Length; i++)
                    all.AddRange(_nodes[i].GetObjects());
            
            return all;
        }

        private void Split()
        {
            int subWidth = (int)(_area.Width / 2);
            int subHeight = (int)(_area.Height / 2);
            int x = _area.LeftBottom.X;
            int y = _area.LeftBottom.Y;

            // 1 | 0
            // -----
            // 2 | 3
            _nodes[0] = new QuadTree(new Rectangle(new Point(x + subWidth, y + subHeight), subWidth, subHeight), _level + 1);
            _nodes[1] = new QuadTree(new Rectangle(new Point(x, y + subHeight), subWidth, subHeight), _level + 1);
            _nodes[2] = new QuadTree(new Rectangle(new Point(x, y), subWidth, subHeight), _level + 1);
            _nodes[3] = new QuadTree(new Rectangle(new Point(x + subWidth, y), subWidth, subHeight), _level + 1);
        }

        private int GetIndex(Triangle obj)
        {
            int index = -1;
            int vMid = _area.LeftBottom.X + (_area.Width / 2);
            int hMid = _area.LeftBottom.Y + (_area.Height / 2);

            Rectangle box = obj.BoundingBox;

            bool topQuadrant = box.LeftBottom.Y > hMid;
            bool bottomQuadrant = box.LeftBottom.Y < hMid && box.LeftBottom.Y + box.Height < hMid;

            if (box.LeftBottom.X < vMid && box.LeftBottom.X + box.Width < vMid)
            {
                if (topQuadrant)
                    index = 1;
                else if (bottomQuadrant)
                    index = 2;
            }
            else if (box.LeftBottom.X > vMid)
            {
                if (topQuadrant)
                    index = 0;
                else if (bottomQuadrant)
                    index = 3;
            }

            return index;
        }
    }

It’s not the shortest code, but it is not that complex as well. And how do you use it? First – populate it with objects:

        static void Main(string[] args)
        {
            const int screenWidth = 1000;
            const int screenHeight = 1000;
            var rand = new Random(0);
            var triangles = new QuadTree(new Rectangle(new Point(0, 0), screenWidth, screenHeight));

            for (int i = 0; i < 1000; i++)
            {
                var a = new Point(rand.Next(0, screenWidth), rand.Next(0, screenHeight));
                var b = new Point(a.X + rand.Next(0, 100), a.Y + rand.Next(0, 100));
                var c = new Point(a.X + rand.Next(0, 100), a.Y + rand.Next(0, 100));
                triangles.Insert(new Triangle(a, b, c));
            }

            Collisions(triangles);
            CollisionsParallel(triangles);

            Console.ReadKey();
        }

And then slightly modified collision checkings:

       private static void Collisions(QuadTree triangles)
        {
            Stopwatch watch;
            int collisions;
            watch = new Stopwatch();

            watch.Start();
            collisions = 0;
            var trianglesList = triangles.GetObjects();
            for (int i = 0; i < trianglesList.Count; i++)
            {
                Triangle triangle = trianglesList[i];
                var pottentialColliding = triangles.GetObjects(triangle);
                collisions += CheckCollisions(pottentialColliding, triangle);
            }
            watch.Stop();
            Console.WriteLine("Collisions: {0}\t time: {1}", collisions, watch.ElapsedMilliseconds);
        }

        private static int CollisionsParallel(QuadTree triangles)
        {
            Stopwatch watch;
            int collisions;
            watch = new Stopwatch();

            watch.Start();
            collisions = 0;
            var trianglesList = triangles.GetObjects();
            Parallel.For(0, trianglesList.Count,
                i =>
                {
                    Triangle triangle = trianglesList[i];
                    var pottentiatlColliding = triangles.GetObjects(triangle);
                    int coll = CheckCollisions(pottentiatlColliding, triangle);
                    Interlocked.Add(ref collisions, coll);
                });
            watch.Stop();
            Console.WriteLine("Collisions: {0}\t time: {1}", collisions, watch.ElapsedMilliseconds);
            return collisions;
        }

        private static int CheckCollisions(IReadOnlyList<Triangle> triangles, Triangle triangle)
        {
            int collisions = 0;
            for (int j = 0; j < triangles.Count; j++)
                if (triangle.Intersects(triangles[j]))
                    ++collisions;

            return collisions;
        }
    }

So how did it go? Surprisingly not to well. 30 and 10 milliseconds is significant slowdown compared to previous version! And collisions detected fell down to 7054? What’s with that? Did I break the code?

First let’s handle performance downgrade. If you think about it a little bit and use profiler, it’s not all that unexpected. We are doing less collision checks – true. But we are paying the price of Quad Tree getting us the objects, calculating object position, traversing tree, allocating new lists for objects. That’s a lot of work. But I feel that Quad Tree is still a good choice. It just needs some performance tuning on its own and it may shine. And if not – I will admit defeat and go back to simple list.

But how can I even compare broken code’s performance? I’m close to 1000 collisions detected short! Well, truth be told – I’m not. You see – checking all objects against all others have one serious problem – you are doing double checks and double counting of collisions.
Imagine list of objects {A, B, C}. I’m checking A+B and A+C collisions first, then B+A, B+C and at the end C+A and C+B. See the problem? If A is colliding with B, then clearly B must be colliding with A and I’m still doing the check.

Quad Tree partially saves me from that. Objects at root level will check collisions with all sub-level objects, but objects at lower levels will not go back up to check those collisions again, so less checks are done and less collisions are counted. But that gets me closer to actual number of collisions.

Next I will look how Quad Tree can be optimized so that we will be doing step forward instead of backward.

Categories: Development Tags: , ,

C# performance tuning, part 3. Obvious optimizations

May 24, 2014 1 comment

There is one thing that makes performance optimization much more pleasant experience – profiler. Without one you will be guessing where’s your bottleneck. And there is good chance you will be wrong. So start up something nice, I’ve decided to use free trial version of dotTrace Performance by JetBrains available here: http://www.jetbrains.com/profiler/.

I didn’t try anything fancy yet – just let profiler start application by itself and check its performance. I’ve picked simple Sampling option as it has little overhead and should still give good idea about general bottlenecks in application. And it sure did.

Looking at plain list it is obvious that Triangle.GetSegments() method is slowest part of my code. I’m looking at Own time column to see how much time I’ve spent in this method alone, not counting children method calls. Well – of course it is slow. I’m returning new instance of list at least few times for each collision, see usage in SegmentsAreCrossing and Contains methods. That’s pretty lame. Triangle won’t change, so segments won’t change. Why keep recreating them when we can get them done once and reuse throughout Triangle’s life? Let’s do that.

I’ve added new private field called _lines and filled it in constructor of triangle and then adjusted methods to use this field instead of GetSegments method call.

        private Line[] _lines;
                
        public Triangle(Point a, Point b, Point c)
        {
            A = a;
            B = b;
            C = c;

            _lines = new[]
            {
                new Line(A, B),
                new Line(B, C),
                new Line(C, A)
            };
        }
		
		...
		
		private bool SegmentsAreCrossing(Triangle b)
        {
            return _lines.Any(sa => b._lines.Any(sb => sa.Intersects(sb)));
        }

What might be surprising to some (at least was surprising to me once) is how we can access private field from another object. Take closer look at SegmentsAreCrossing method. Inside lambda expression we are just calling b._lines but clearly b is not part of this class, it is parameter passed to the method! How can this work? It’s simple, really. Private fields are private not for object instance but for whole class. So as long as you are working with the same class (and that’s the case since b is Triangle’s class instance) you can access its private fields easily. Convenient and helps me here – less method calls must be better, right? How much better?

Code still found 8058 collisions but now it took just 1061ms for serial version and mere 308ms for parallel code. Wow, simple, obvious change and its already 753 and 254 milliseconds faster respectively. Of course nothing is free. What cost am I paying here? Memory usage. Before that change, lists were only created when needed. Now list will live as long as long Triangle class instance will be held in memory. Is that bad? It depends, but in this case not really. We often need those lines, performance benefit is much greater than memory price. And we are lowering Garbage Collection pressure since we are avoiding many short term allocations that could quickly fill up GC’s Generation 0 object space.

Easy win that was! Maybe there is something more we can get that easily? Starting profiling session again shows that the code that actually does something is now the biggest problem. Methods like Triangle.Contains and Line.Intersects are using most of available computing power. And those are calculations, I don’t think I want to touch them now.

So what we will look at next? At something you should always start with, when trying to optimize some code. Are you using right algorithm to solve the problem? Are you using the best available data structures? I am pretty sure I’m not, so we will see next where it will take me.

Categories: Development Tags: , ,

C# performance tuning, part 2. Collisions explained

May 24, 2014 1 comment

So how do we know if our triangle intersects with another one? It is pretty obvious – either their sides intersects or one triangle is inside the other. No other way around it. But how are we checking this? Let’s start from checking if two lines are intersecting.

We are trying to get point in which both lines will intersect. I’ll be honest here – I’ve just found this algorithm somewhere on the internet (to be exact – here: http://thirdpartyninjas.com/blog/2008/10/07/line-segment-intersection/), as I didn’t want to bother to much with maths here and get to code. Tests I wrote proved to me that it works. To get intersection point, we are calculating two nominators and one, common to both fractions, denominator. Logic is as follows: if denominator is zero, lines are parallel. If both numerators are also zero – lines are effectively the same, one is on top of the other if you will. In other cases, we calculate the fraction to get intersection point parameter value. If both those parameters are between 0 and 1 – we have intersections. If they are outside of this range – lines are colliding, but not within our segments of line, which is what we are interested in.

Now we can move onto triangles. As I said before – we need to have either triangle sides colliding, or one triangle needs to be inside of the other. Since we have line collision detection in place, checking sides is as simple as checking each side against sides of the other triangle. That gives us 9 checks at most. If we fail to detect collision here, we try to see if any point from triangle A is inside triangle B and vice versa.

To see if point is inside a triangle, we use power of internet again – and StackOverflow is always helpful. https://stackoverflow.com/questions/2049582/how-to-determine-a-point-in-a-triangle is great answer that gives us exactly what we need – and again, tests proves that this solution is correct. What do we do? Three points in triangle, checked against three sides of the other one (A inside B) and again (B inside A detection) will cost us 18 checks in worst case.

And how do we detect collisions between all those objects? Simply enough – we iterate over all objects in collection and check collision against every other object there. Or we do that in parallel to get some performance boost.

How do initial values look like? Well, not too good unfortunately. We got 8058 collisions detected but it took 1814ms for serial code and 562ms for parallel version. So in best case we can get not even two full checks in a second. That’s slow!

And final words before we get into optimization. Those values I give you are from Release compiled version. Debug if few times slower and is no good for us. Also – running it from Visual Studio will give you slower times, probably due to VS trying to debug code. So I run if from bin/Release folder directly.

And test machine: my few years old PC with Intel i5 2500k on board @3.3.GHz on each of its 4 cores. 8GB of RAM.

Categories: Development Tags: , ,

C# performance tuning, part 1

May 24, 2014 1 comment

I was wondering how fast can I make C# code. Well, I knew it can be pretty fast, probably not as fast as C or C++ since it is managed language, JIT compiled etc., but still – how much can I squeeze from it?

So why not have some fun and code something simple, yet performance sensitive at the same time? Like collision detection engine? Those definitely need to be fast if you want to have any decent frame rate in your games. To get 60fps you have 16ms to prepare each frame. And that’s not just collision detection, it is also drawing graphics, updating objects and so on. So each small part must be as fast as possible.

I’m limiting myself to triangle collision detection. All game shapes can be approximated by number of triangles. More you have – better shape representation you will have, but more complex collision detection you may end up with.

So here we are, few classes down the road, just the basics. Point, Line, Triangle. Minimal set of methods to detect collision (plus tests of course, but those I will skip here). Check the code below.

Point.cs

    public struct Point
    {
        public int X { get; private set; }
        public int Y { get; private set; }

        public Point(int x, int y) : this()
        {
            X = x;
            Y = y;
        }
    }

Line.cs

    public class Line
    {
        public Point Start { get; private set; }
        public Point End   { get; private set; }

        public Line(Point start, Point end)
        {
            Start = start;
            End = end;
        }

        public static Line Create(int x1, int y1, int x2, int y2)
        {
            return new Line(new Point(x1, y1), new Point(x2, y2));
        }

        public bool Intersects(Line b)
        {
            double denominator = (b.End.Y - b.Start.Y) * (End.X - Start.X) - (b.End.X - b.Start.X) * (End.Y - Start.Y);
            double aNumerator = (b.End.X - b.Start.X) * (Start.Y - b.Start.Y) - (b.End.Y - b.Start.Y) * (Start.X - b.Start.X);
            double bNumerator = (End.X - Start.X) * (Start.Y - b.Start.Y) - (End.Y - Start.Y) * (Start.X - b.Start.X);

            if (denominator.IsZero())
                return aNumerator.IsZero() && bNumerator.IsZero();

            double aParameter = aNumerator / denominator;
            double bParameter = bNumerator / denominator;

            return aParameter > 0 && aParameter <= 1 &&
                   bParameter > 0 && bParameter <= 1;
        }

        public double Sign(Point point)
        {
            return (Start.Y - End.Y) * (point.X - End.X) - (Start.X - End.X) * (point.Y - End.Y);
        }
    }

    public static class DoubleExtension
    {
        internal static bool IsZero(this double denominator)
        {
            const double epsilon = 0.000001;
            return Math.Abs(denominator) < epsilon;
        }
    }

Triangle.cs

    public class Triangle
    {
        public Point A { get; private set; }
        public Point B { get; private set; }
        public Point C { get; private set; }
                
        public Triangle(Point a, Point b, Point c)
        {
            A = a;
            B = b;
            C = c;
        }

        public bool Intersects(Triangle b)
        {
            return SegmentsAreCrossing(b) || OneIsInsideTheOther(b);
        }

        private bool SegmentsAreCrossing(Triangle b)
        {
            return GetSegments().Any(sa => b.GetSegments().Any(sb => sa.Intersects(sb)));
        }

        private bool OneIsInsideTheOther(Triangle b)
        {
            bool oneInsideOther =
                Contains(b.A) || Contains(b.B) || Contains(b.C) ||
                b.Contains(A) || b.Contains(B) || b.Contains(C);
            return oneInsideOther;
        }

        private bool Contains(Point point)
        {
            return GetSegments().All(s => s.Sign(point) < 0);
        }

        private List<Line> GetSegments()
        {
            return new List<Line>
            {
                new Line(A, B),
                new Line(B, C),
                new Line(C, A)
            };
        }
    }

And simple program to trigger those collision detections. I’ve decided to work with 1000 x 1000 area and put there 1000 triangles. It gives tightly packed set of triangles, lots of collisions going on there for sure. There is also parallel implementation of this collision detection, just to see how much we can squeeze out of our multi-core processors. As you can see I’m randomizing triangles, trying not to make them to big, but starting with the same seed always – 0 that is. This way each run will generate exactly the same set of triangles so our micro benchmarks will be consistent.

Program.cs

    class Program
    {
        static void Main(string[] args)
        {
            const int screenWidth = 1000;
            const int screenHeight = 1000;
            var rand = new Random(0);
            var triangles = new List<Triangle>();

            for (int i = 0; i < 1000; i++)
            {
                var a = new Point(rand.Next(0, screenWidth), rand.Next(0, screenHeight));
                var b = new Point(a.X + rand.Next(0, 100), a.Y + rand.Next(0, 100));
                var c = new Point(a.X + rand.Next(0, 100), a.Y + rand.Next(0, 100));
                triangles.Add(new Triangle(a, b, c));
            }

            Collisions(triangles);
            CollisionsParallel(triangles);

            Console.ReadKey();
        }

        private static void Collisions(List<Triangle> triangles)
        {
            Stopwatch watch;
            int collisions;
            watch = new Stopwatch();

            watch.Start();
            collisions = 0;
            for (int i = 0; i < triangles.Count; i++)
            {
                Triangle triangle = triangles[i];
                collisions += CheckCollisions(triangles, triangle);
            }
            watch.Stop();
            Console.WriteLine("Collisions: {0}\t time: {1}", collisions, watch.ElapsedMilliseconds);
        }

        private static int CollisionsParallel(List<Triangle> triangles)
        {
            Stopwatch watch;
            int collisions;
            watch = new Stopwatch();

            watch.Start();
            collisions = 0;
            Parallel.For(0, triangles.Count,
                i =>
                {
                    Triangle triangle = triangles[i];
                    int coll = CheckCollisions(triangles, triangle);
                    Interlocked.Add(ref collisions, coll);
                });
            watch.Stop();
            Console.WriteLine("Collisions: {0}\t time: {1}", collisions, watch.ElapsedMilliseconds);
            return collisions;
        }

        private static int CheckCollisions(List<Triangle> triangles, Triangle triangle)
        {
            int collisions = 0;
            for (int j = 0; j < triangles.Count; j++)
                if (triangle.Intersects(triangles[j]))
                    ++collisions;

            return collisions;
        }
    }
	

In next post I will cover how collisions are detected in the first place before we will get to any optimization.

Categories: Development Tags: , ,
Follow

Get every new post delivered to your Inbox.