C# performance tuning, part 6. Minimize loop repeats.

All right. Some bigger and some smaller fixes are in place. 16ms threshold beaten. But there must be more to be done. And there is. Huge thing. Remember when I wrote that there are two check for each pair – A vs. B and B vs. A? Quad Tree helps with that a little bit, but there is much more to be done! After all at given tree level we are still doubling checks. That’s waste of time and energy.

There isn’t much to be done from the outside of a tree. It gets us collection of objects to check against and that’s it. But why not moving collision detection code inside of tree? It’s already taking care of holding data structured correctly – it can also traverse the data effectively so let’s do that. But that won’t help with double checks, right? Well – it won’t if we leave the code as it is. But if we will make a use of our knowledge of tree’s data structure and what we know about collisions, we may fix that pretty easily.

        public int CheckCollisions()
        {
            int collisions = 0;

            for (int i = 0; i < _objects.Count; i++)
            {
                var triangle = _objects[i];
                IReadOnlyList<Triangle> possibleColliding = _objects;
                for (int j = i + 1; j < possibleColliding.Count; j++)
                    if (triangle.Intersects(possibleColliding[j]))
                        ++collisions;

                possibleColliding = GetChildObjects(triangle);
                for (int j = 0; j < possibleColliding.Count; j++)
                    if (triangle.Intersects(possibleColliding[j]))
                        ++collisions;
            }

            if (_nodes[0] != null)
            {
                collisions += _nodes[0].CheckCollisions();
                collisions += _nodes[1].CheckCollisions();
                collisions += _nodes[2].CheckCollisions();
                collisions += _nodes[3].CheckCollisions();
            }

            return collisions;
        }

        public int CheckCollisionsParallel()
        {
            int collisions = 0;

            for (int i = 0; i < _objects.Count; i++)
            {
                var triangle = _objects[i];
                IReadOnlyList<Triangle> possibleColliding = _objects;
                for (int j = i + 1; j < possibleColliding.Count; j++)
                    if (triangle.Intersects(possibleColliding[j]))
                        ++collisions;

                possibleColliding = GetChildObjects(triangle);
                for (int j = 0; j < possibleColliding.Count; j++)
                    if (triangle.Intersects(possibleColliding[j]))
                        ++collisions;
            }

            if (_nodes[0] != null)
            {
                if (_level == 0)
                {
                    Parallel.For(0, 4,
                        i =>
                        {
                            Interlocked.Add(ref collisions, _nodes[i].CheckCollisions());
                        });
                }
                else
                {
                    collisions += _nodes[0].CheckCollisions();
                    collisions += _nodes[1].CheckCollisions();
                    collisions += _nodes[2].CheckCollisions();
                    collisions += _nodes[3].CheckCollisions();
                }
            }

            return collisions;
        }

I admit – those methods seem to be pretty messy at first look, but let me just explain what’s going on. We are first checking collisions with objects at current level (_objects collection). But instead of going through all of them, we are going only through those after the one we are checking. After all if we’ve checked first object with all others, there is no point in checking second object with first again – we can start from third one right away. Then of course we are checking collisions with all child objects that are possibly colliding – so taking only those in right quadrant. After that it’s simple recursive call for all child nodes – already expanded instead of using for loop. For parallel version we are creating parallel loop only when on first level – cost of additional parallelization would probably outweight multicore usage when applied at all lower levels.

Does it seem clear now? Was it worth making the code more complex? Oh it was, it was. 5 and 3 milliseconds for getting all collisions – that’s what I was hoping for! That’s 362 times faster than first implementation!

Advertisements

One thought on “C# performance tuning, part 6. Minimize loop repeats.

  1. Pingback: C# performance tuning, part 7. Summary. | Why u no code?!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s