Hello, as the team is slowly getting bigger and we still don't have a dedicated project manager, we had to start looking for tools to help us manage the team. We are testing software that allows team members to track the time spent on individual tasks, so right now my timer on "Friday facts related work" is running. I hope it will give me better insight into what kinds of tasks our time goes into, where we lose most of it, and what people were doing while I was not here. People tend not to like these kinds of changes, but we just have to admit that we are no longer the 4-person punk development team working from our living room, and we need to invest more time into working efficiently.
Prefetching (Technical)
Kovarex already presented a concise summary of the prefetching patch; here is some more background and the dirty technical details.
I started looking into Factorio performance improvements a while back, more specifically UPS (updates per second) improvements for large bases. It is widely recognized that UPS is mostly limited by memory performance. That is normal: even highly optimized scientific simulation codes are rarely limited by arithmetic instructions.
At first, I looked into ways to reduce the size of entities. Common entity sizes, like Inserter (536 bytes) or AssemblingMachine (648 bytes), seem surprisingly large at first. I tried some changes, e.g. moving less frequently accessed data out of the entity into a separate object in memory. These changes touched code in many files, but saving a few bytes didn't have a measurable impact on performance.
Back to a bit of theory: there are two different ways in which memory can become a bottleneck, bandwidth (the amount of data supplied over time, e.g. 50 GB/s) and latency (the time until a requested piece of data is available, e.g. 50 ns). Comparing the results for different RAM timing settings (CAS latency) shows that latency has a significant impact. It is important to note that Factorio is not a homogeneous workload: some parts are still limited by memory bandwidth, others by the CPU.
Modern CPUs are extremely good at mitigating memory bottlenecks by using caches, speculative execution and prefetchers. However, all active entities are read on every tick of the game, and in large factories this is too much data for the caches. Also, a virtual function call, such as the update of an entity, cannot be executed speculatively. Prefetchers are the part of the CPU that predicts which memory is going to be accessed soon and transfers it before it is explicitly loaded. But since the entity update loop iterates over a linked list, where the address of the next entity is stored within the entity itself, the access pattern is difficult (though not impossible) to predict.
This is where software prefetching comes in: the programmer gives the CPU a hint about which memory will be accessed soon. That is what we now do in Factorio: before an entity is updated, the next entity is already requested, so that it can be loaded in the background. The same principle also applies to a few other loops over linked lists. The nice thing about this is that it is an extremely simple and isolated change in the code. The downside is that you are entering the realm of architecture-specific micro-optimization; if you aren't careful, it can even hurt performance.
A good rule is to never guess about performance; always verify. So I did some tests with different maps, and the results were promising. However, entities are larger than a single cache line, and due to multiple inheritance the pointers point into the middle of the object, so a single prefetch at the pointer address would not cover the whole object. Many experiments later, the optimal range turned out to be from -128 bytes to +384 bytes relative to the pointer (8 cache lines). This coincides with the sizes of typical entities. The prefetch instruction has another parameter that determines which cache level is used, which again was determined experimentally.
To get a bit more diversity, the measurements for this chart were done on a different CPU (i7-6700K instead of the i7-4790K used previously) and include some more maps. They showed that the new belt-heavy map got less speedup (+5%) from software prefetching than the others; to make up for it, this map already got a huge boost from the earlier belt optimization. The other saves got a nice 9-13% speedup. All measurements are average update times over 3600 ticks, and the boxplots show 20 repeated runs.
Overall, software prefetching is a nice, effective micro-optimization that needs very few code changes, but many measurements to find the right configuration and to verify it.
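The change described above is commonly called hot/cold splitting. The sketch below only illustrates the technique on a hypothetical inserter-like type; the field names and sizes are made up and are not Factorio's real layout. Fields read on every update stay inline, while rarely used data moves behind a pointer that is allocated on first use.

```cpp
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical rarely used ("cold") data, not needed by the per-tick update.
struct InserterColdData {
  std::string customName;
  std::uint32_t filterFlags = 0;
  double statistics[8] = {};
};

// Before: hot and cold fields share one object.
struct InserterFat {
  double position[2] = {};
  float rotation = 0.0f;
  std::uint32_t state = 0;
  std::string customName;
  std::uint32_t filterFlags = 0;
  double statistics[8] = {};
};

// After: only the hot fields stay inline, so the object touched every
// tick is smaller; cold fields live in a separate heap allocation.
struct InserterSplit {
  double position[2] = {};
  float rotation = 0.0f;
  std::uint32_t state = 0;
  std::unique_ptr<InserterColdData> cold;  // allocated on first use

  InserterColdData& coldData() {
    if (!cold) cold = std::make_unique<InserterColdData>();
    return *cold;
  }
};
```

As the paragraph notes, shrinking entities by a few bytes this way did not produce a measurable speedup in Factorio, which is what led to looking at latency instead.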
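A back-of-the-envelope calculation with the example figures above shows why latency hurts so much when the next address is only known after the current load completes. It assumes a 4 GHz clock and 64-byte cache lines (both chosen for illustration) and one outstanding cache miss at a time; the last line applies Little's law to estimate how many requests would have to be in flight to use the full bandwidth:

$$
\begin{aligned}
50\ \text{ns} \times 4\ \text{GHz} &= 200\ \text{cycles of waiting per miss} \\
\frac{64\ \text{B}}{50\ \text{ns}} &= 1.28\ \text{GB/s} \ll 50\ \text{GB/s} \\
50\ \text{GB/s} \times 50\ \text{ns} &= 2500\ \text{B} \approx 39\ \text{cache lines in flight}
\end{aligned}
$$

A dependent chain of loads therefore uses only a small fraction of the available bandwidth, which is the gap prefetching tries to close.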
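A minimal sketch of the idea, assuming GCC or Clang (for the `__builtin_prefetch` builtin) and a hypothetical `Entity` type; this is not Factorio's code. The link to the next entity is read and prefetched before the current entity is updated, so the memory request overlaps with the update work.

```cpp
// Hypothetical entity in an intrusive linked list: the pointer to the
// next entity is stored inside the entity itself.
struct Entity {
  Entity* next = nullptr;
  long ticks = 0;
  virtual ~Entity() = default;
  virtual void update() { ++ticks; }
};

// Updates every entity in the list, prefetching the next one first.
inline void updateAll(Entity* first) {
  for (Entity* e = first; e != nullptr;) {
    Entity* next = e->next;       // read the link before doing the work
    if (next != nullptr)
      __builtin_prefetch(next);   // hint only: start loading the next entity
    e->update();                  // update the current entity meanwhile
    e = next;
  }
}
```

`__builtin_prefetch` is only a hint: it does not change program behaviour and, per the GCC documentation, does not fault on an invalid address, which is why it is such an isolated change.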
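A sketch of prefetching such a window, again assuming GCC/Clang and 64-byte cache lines. The -128/+384 constants come from the text; the function name is hypothetical, and the locality value 3 is only a placeholder, since the cache level actually chosen is not stated here. On x86 the same hints are also available through the `_mm_prefetch` intrinsic (`_MM_HINT_T0`, `_MM_HINT_T1`, `_MM_HINT_T2`, `_MM_HINT_NTA`).

```cpp
#include <cstdint>

constexpr std::uintptr_t kCacheLineSize = 64;  // typical x86 cache-line size
constexpr std::uintptr_t kBytesBefore = 128;   // window starts at p - 128
constexpr std::uintptr_t kBytesAfter = 384;    // window ends at p + 384
constexpr std::uintptr_t kLinesToPrefetch = (kBytesBefore + kBytesAfter) / kCacheLineSize;
static_assert(kLinesToPrefetch == 8, "-128..+384 bytes is 8 cache lines");

// Third argument of __builtin_prefetch (0..3, must be a constant):
// 3 = keep in all cache levels, 0 = no temporal locality. Placeholder value.
constexpr int kLocality = 3;

// Issues one read prefetch per cache line, for p-128, p-64, ..., p+320.
// p may point into the middle of an object (e.g. a base-class subobject).
// Integer arithmetic avoids forming out-of-object pointers in C++ terms.
inline void prefetchEntityWindow(const void* p) {
  const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(p) - kBytesBefore;
  for (std::uintptr_t i = 0; i < kLinesToPrefetch; ++i)
    __builtin_prefetch(reinterpret_cast<const void*>(start + i * kCacheLineSize), 0, kLocality);
}
```

If `p - 128` is not line-aligned, the 512-byte window straddles a ninth line whose tail is not prefetched; the measured range simply tells you which eight lines pay off.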
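To make the reported numbers concrete, here is a small sketch of how such runs could be summarised: each run is reduced to its mean tick time, the runs are summarised by their median (the centre line of a boxplot), and two configurations are compared. The function names are hypothetical, and defining speedup as `old / new - 1` is an assumption; the text does not say which formula was used.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Mean update time of one run, e.g. over 3600 tick times (in ms).
double meanTickTime(const std::vector<double>& tickTimes) {
  return std::accumulate(tickTimes.begin(), tickTimes.end(), 0.0) /
         static_cast<double>(tickTimes.size());
}

// Median of per-run means (taken by value because it sorts).
double median(std::vector<double> values) {
  std::sort(values.begin(), values.end());
  const std::size_t n = values.size();
  return n % 2 == 1 ? values[n / 2] : 0.5 * (values[n / 2 - 1] + values[n / 2]);
}

// Relative speedup, assumed here to be old_time / new_time - 1,
// so 0.05 reads as "+5%".
double speedup(const std::vector<double>& baselineRuns,
               const std::vector<double>& optimizedRuns) {
  return median(baselineRuns) / median(optimizedRuns) - 1.0;
}
```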
Crafting machine animation optimisation
The issue is that crafting machines can have an arbitrary number of secondary animations tied to them (rotating fan, liquid in the chemical plant, etc.). As each animation can have a different speed and frame count, we kept the positions of all of them in a dynamically allocated vector and updated each one independently whenever the crafting machine was producing. Now we have just one number representing the overall offset of the animations. We advance it according to the speed of the crafting machine, and each animation calculates its cyclic position from this value (modulo its own length) only when we actually need to draw the machine.
This means that we no longer need the following complicated code, which advanced each animation separately:
```cpp
void CraftingMachine::setupWorkingVisualisationFrames(double performance)
{
  const CraftingMachinePrototype& prototype = *this->getPrototype();
  // Advance the main animation.
  this->frame.move(performance, prototype.animation.getAnimation(this->direction));
  // Lazily allocate one frame position per working visualisation, starting at a random frame.
  if (this->workingVisualisationFrames.empty())
  {
    this->workingVisualisationFrames.resize(prototype.workingVisualisations.size());
    for (size_t i = 0; i < this->workingVisualisationFrames.size(); ++i)
      this->workingVisualisationFrames[i].randomize(prototype.workingVisualisations[i].getAnimation(this->direction),
                                                    this->getMap().getRandomGenerator());
  }
  // Advance every secondary animation independently.
  for (size_t i = 0; i < this->workingVisualisationFrames.size(); ++i)
    this->workingVisualisationFrames[i].move(prototype.workingVisualisations[i].getAnimation(this->direction));
}
```
The memory size of the crafting machine decreased, and the overall performance of the game improved by an additional 2%. Another day, another optimisation :)
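For comparison, a minimal sketch of the single-offset approach described above. The types and names are hypothetical, and the per-animation frame is assumed to be `floor(offset * speed) mod frameCount`; the real formula may differ.

```cpp
#include <cmath>

// Hypothetical description of one secondary animation (fan, liquid, ...).
struct AnimationInfo {
  double speed;    // frames advanced per unit of offset
  int frameCount;  // frames in one cycle
};

// Only one number is stored per machine, however many animations it has.
struct CraftingMachineSketch {
  double animationOffset = 0.0;

  // Called on update while the machine is producing.
  void advance(double craftingSpeed) { animationOffset += craftingSpeed; }

  // Called only when the machine is drawn: each animation derives its
  // current frame from the shared offset, modulo its own frame count.
  int frameFor(const AnimationInfo& animation) const {
    const double frames = std::floor(animationOffset * animation.speed);
    return static_cast<int>(std::fmod(frames, static_cast<double>(animation.frameCount)));
  }
};
```

After 13 updates at speed 1.0, a fan with 10 frames at speed 1.0 shows frame 3, and a liquid animation with 4 frames at speed 0.5 shows frame 2, without storing any per-animation state.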
HR Lab
The weekly dose of updated high resolution graphics:
Related to HR entities, it turned out that our zooming system never showed an exact zoom of 2.0, which would be the 'pixel perfect' zoom level for the HR entities. By changing the zoom rate from 1.1 to the 7th root of 2 (1.104089...), the zoom now increments perfectly from 1.0 to 2.0 in 7 steps.
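The arithmetic behind the new zoom rate can be checked with a few lines; `zoomLevels`, `kOldRate` and `kNewRate` are hypothetical names used only for this sketch, which assumes zooming multiplies the current zoom by a fixed rate per step.

```cpp
#include <cmath>
#include <vector>

// Zoom levels reached from 1.0 by repeatedly multiplying by a fixed rate.
std::vector<double> zoomLevels(double rate, int steps) {
  std::vector<double> levels{1.0};
  for (int i = 0; i < steps; ++i) levels.push_back(levels.back() * rate);
  return levels;
}

const double kOldRate = 1.1;
const double kNewRate = std::pow(2.0, 1.0 / 7.0);  // 7th root of 2, about 1.104089
```

With the old rate, seven steps reach about 1.949 and eight steps overshoot to about 2.144, so 2.0 is never hit; with the new rate, seven steps land on 2.0 up to floating-point rounding.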
As always, let us know any thoughts or feedback over on our forum.
I finished the item stack optimisations mentioned in FFF-198 and was able to do some performance tests. First, I tested how many stacks on a big map actually need an externally allocated object (Item), and how many of them are plain. On the huge map I tested, only 36K out of 1M stacks turned out to need the Item object. These were mainly science packs, as they need it to track how used-up they are (and now that I think about it, this could also be avoided by only using the objects for science packs that are already partially used up). Overall factory performance increased by approximately 2%. It is nothing huge, but every bit matters.
One of the programmers who has read access to the code (Zulan) came up with a pull request that improves Factorio's performance by prefetching memory ahead of time in the update loops.
The problem with updating objects the normal way is that the CPU has to ask for the memory representing each object. Memory is slow, at least compared to the CPU cache or the CPU speed. The transfer speed itself is not that slow, but the waiting time (latency) between ordering the data and receiving it is. As a result, what very often happens is that the CPU orders the data of the next entity from memory, waits quite a long time to get it, and only then does its logic. Memory prefetching partially solves this by doing the following:
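A minimal sketch of the kind of layout this measures, assuming a stack that stores only its item type and count inline and allocates an external object only when an item carries per-instance state. All names here are hypothetical; the actual FFF-198 design is not reproduced in this post.

```cpp
#include <cstdint>
#include <memory>

// Hypothetical per-instance state that only some items need,
// e.g. how used-up a science pack is.
struct ItemInstanceData {
  double durability = 1.0;
};

// A plain stack is just a type and a count; the external object is
// allocated only when an item actually needs extra state.
struct ItemStackSketch {
  std::uint16_t prototypeId = 0;
  std::uint32_t count = 0;
  std::unique_ptr<ItemInstanceData> item;  // null for plain stacks

  bool needsExternalItem() const { return item != nullptr; }

  void setDurability(double value) {
    if (!item) item = std::make_unique<ItemInstanceData>();
    item->durability = value;
  }
};
```

On the map described above, only about 3.6% of stacks (36K of 1M) would take the allocating path.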
Order data of the next entity from memory (prefetch)
Do the logic of the current entity in the meantime
Go back to start
The overall measured performance improvements range from 6% to 10%, which is certainly a nice addition.
Logistic buffer chest
As flexible and powerful as the logistic system is, we have always felt there was one piece missing from the puzzle. The main issue is that requester chests cannot provide their items to any other member of the logistic system. Trying to work around this by using an inserter to move items from a requester into a passive provider just leads to the robots moving the items in a loop. This is also a nuisance when trying to supply construction robots with materials, as they can only collect them from storage or provider chests, which are typically located only in the main base areas.
It is easy to design a system that resupplies far-out areas by using trains to put items directly into provider chests, but if those chests are in the same logistic network, we run into the same loop as before. We were also concerned about people segregating their logistic networks for more control; it seemed to us a workaround for a problem we should fix. The solution is the buffer chest, which functions as both a requester and a passive provider chest.
You can see that the buffer can act as an 'in-between' for storage/provider chests and the requesters. This solves the main annoyances we identified.
Typically, when you set up all the provider chests, they are spread out across your whole factory. When you return to base for a resupply, you end up waiting a long time while the bots travel from all over the factory with items. By using buffer chests, you can set up a dedicated 'supply area', where the buffer chests already contain all the typical items, and the bots can quickly top up your inventory.
Another problem is when you have a large perimeter defence and you want it to be maintained by construction robots. When the main base is far away, it can take a long time for robots to arrive with repair packs, so the biters might be able to break through. With the buffer chest, it will be easy to set up nearby supplies to quickly repair the walls when needed.
The last, but not least, use case is when you want to dedicate part of your storage to specific things. The reason can be just OCD, or making sure that, for example, too much coal won't make it impossible to store enough iron ore in your storage system.
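One way to picture the 'in-between' role is as a rule about which chest may supply which request. The sketch below is a hypothetical model, not the game's actual logic: in particular, the rule that a buffer never supplies another buffer is an assumption, included to show how a design can avoid the robot loop described above.

```cpp
enum class ChestKind { PassiveProvider, ActiveProvider, Storage, Requester, Buffer };

// Hypothetical rule set: may robots take items from `source`
// to fill a request made by a chest of kind `requester`?
bool canSupply(ChestKind source, ChestKind requester) {
  switch (source) {
    case ChestKind::PassiveProvider:
    case ChestKind::ActiveProvider:
    case ChestKind::Storage:
      return requester == ChestKind::Requester || requester == ChestKind::Buffer;
    case ChestKind::Buffer:
      // Assumed: a buffer feeds requesters but never another buffer,
      // so buffers cannot pass the same items back and forth.
      return requester == ChestKind::Requester;
    case ChestKind::Requester:
      return false;  // requester chests never provide their items
  }
  return false;
}
```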
High resolution robots
Here is the regular dose of entities updated to high resolution for the (hopefully) fully high-res friendly 0.16 release.
As always, leave us any feedback or comments on our forum.
I decided to write about the results of the item stack optimisations explained in FFF-198, so today I rushed to finish the implementation, only to find out that the task affects an even bigger part of the code than I expected. Items are related to many things in Factorio :)
After many hours of rewriting and fixing, I can compile it and even start a game, but most things are broken. It is quite funny to see some of the basic item interactions broken. Now I'm making commits like "Now I can split stacks", "Now I can merge stacks", etc. It reminds me of the old days. In conclusion, the details of the optimization will have to wait until next week, and since it is after 10pm, this Friday Facts will be somewhat shorter :)
High res and improved circuit connectors
I can at least present the continued work on the updated high resolution graphics. The updated circuit connectors are not only high resolution; since they can now be seen in more detail, the graphics can also show more accurately what the connector does. Specifically, whether it is only reading the state of the machine (blue LED), controlling its behaviour (red/green LED), or both.
You can also clearly see how weird it looks when we combine a low res entity (roboport, chests, liquid tank) with a high res connector, but this is just a temporary state and most of these entities should be high-res compatible when the release is ready.
You may also notice (on the chests, for example) that the green and red connectors are not vertically aligned. This might look slightly weird, but it is on purpose. In the current version (0.15), when two entities are in one column and connected with both the green and the red wire, the wires overlap perfectly, so it is impossible to see both of them at the same time, as shown below.
Additionally, the green LED in this example doesn't make any sense, as a chest only ever has a read mode; this is also addressed in the new graphics.
As always, leave us any feedback or comments over on our forum