How to make the Full System Scan 6x faster in 10 days
During the last few weeks, we have been tweaking the avast! 5 engine; and while doing this, we found out that there were some hidden reserves with respect to its performance (namely, the duration of the on-demand scans).
One of the great new features of avast 5 is the persistent cache, a mechanism which allows us to skip rescanning of certain files. In particular, this applies to files which are on our internal whitelists, as well as files which are digitally signed by trusted publishers (we maintain a relatively short list of software publishers that we trust, and we consider any files produced and digitally signed by these publishers as safe).
Previously, we were using the crypto services provided by the operating system (called “wintrust”) to do the actual verification of the digital signatures. We knew this wasn’t ideal though – especially because we realized that in case the underlying system was somehow compromised, any such system API could already be redirected/hijacked by malware and so trusting it was not 100% bulletproof. For this reason, we have been working on our own implementation of the signature verifier. What seemed like an easy task in the beginning actually turned out to be a fairly large project with tens of thousands of lines of code, and many months of work.
The works on this were finished about a month ago, and after some additional reliability testing, we finally released it to the public as part of the April 19th definition update (last Monday). What’s interesting that this change brought us not only increased reliability (the reason why we decided to implement it in the first place), but also significant performance gain. On our test system (a Dell workstation with an Intel Core i7 CPU, 4GB RAM and Windows 7) the duration of the Full System Scan time suddenly went from 39:35 to 16:03 – meaning almost 2.5x speedup!
We haven’t really done a full analysis of what’s actually causing this, but our current hypothesis is that the performance gain is related to checking of the signature catalogs. It is possible that the Wintrust APIs reopen/reread the catalogs every time a file is checked, whereas our implementation only reads them once and keeps them cached in memory for the whole duration of the scan.
Now, this by itself raised a lot of interest in exploring if things could be improved even more. So we revisited the verification code once more, and found out that the code spends most of the time in a function that is responsible for the calculation of SHA-1 hashes. This is no surprise, as pretty much all signing certificates are currently based on the SHA-1 algorithm, and the actual hashing is the most expensive part of the verification process.
So the next logical step was to optimize our implementation of she SHA-1 algorithm. Interestingly enough, one of the engineers on the Intel performance team recently published a nice article describing the possibilities to speed up SHA-1 by means of the SSE2 instructions added in the Pentium 4 processor. Using these ideas, we were able to further optimize the code so that it ran about 30% faster (especially on the latest Core 2 and i7 CPUs).
While doing all these tests, we also noticed one strange thing: the Full System Scan ran pretty much the same time during the first and all subsequent runs. It was not supposed to be like this though – the persistent cache was supposed to let the 2nd and all subsequent scans run faster. Not so dramatically as the Quick Scan (as the Full System Scan is set up so that it does not trust the persistent cache by default), but still quite significantly as we weren’t supposed to be verifying the digital signatures of files during these repeated scans. So we reviewed the relevant code, and were quite surprised to find out that the verification task was indeed performed every time, not just in the first pass. Fixing this (in the yesterday’s engine update, April 24th), we were able to cut down the scan time on that reference machine down to mere 6 minutes 54 seconds – which translates to almost 6x speedup (with no effect on dection rates, of course)!
For us, this was a great exercise which showed the beauty of software engineering. Sometimes, if you try really hard, you can make a heck of a difference.
By the way, I encourage you to run a Full System Scan and report your findings here in the Comments section below. Of course, your mileage may vary (it all depends on your hardware configuration, but generally the higher-end hardware, the more significant speedups you should expect) but we expect that at least 2-3x speedup should be measurable on pretty much all systems. Also, please keep in mind that the first scan is supposed to take significantly longer so if you have never ran a Full System Scan yet, it’s good to run it twice and compare the results.
Tip: to make the Full System Scan even faster, configure it to actually take advantage of the persistent cache. To do this, open the Full System Scan details, click the Settings button and check the box “Speed up scanning by using the persistent cache” on the Performance page.