One of the major disadvantages of VBA (and VB6) UDFs is that they cannot be multi-threaded. And since everyone now has PCs with multiple cores and Excel 2007 or Excel 2010 (well except for my better half Jane who is currently struggling with a rather ancient laptop) this is starting to be a problem for us Excel speed freaks/geeks.
Thats one of the reasons why I decided to rewrite my next generation of fast UDFs as C++ XLLs.
I am using XLL+ Version 7 and VS 2010: using these tools to make a UDF multithreaded is easy: just tick the multi-threaded option and away you go.
I am now about 6 months into learning how to write these C++ UDFs things and its mostly going OK. Of course there are minor hiccups along the way: like when it took me half a day to figure out why this numeric parameter always arrived in the UDF as 1 regardless of whatever value was passed in! (If you really want to know its because Excel and the Oper data structure don’t really know about integers: they insist on thinking they are primitive booleans. I knew that happened on output – always use doubles – but stupidly had not considered that it would also happen on input …).
Anyway I have now written most of my simple functions: they all work OK and calculate fast, so obviously its now time to get overconfident and go for a real challenge.
Multi-threaded UDFs sharing a global resource.
One of the tricks involved in writing faster Lookup and Matches is to be able to store the row number in the input range where the answer was found and use it the next time the Lookup or Match is executed. This technique is fast with unsorted data.
For unthreaded UDFs this is conceptually straightforward: just store the row number in a global container of some kind. But for multithreaded UDFs its not so simple because you can have multiple instances of the UDF being executed on different threads and all trying to update or read the container at the same time – a definite no-no for multi-threading. The answer is to be able to set a lock on the global container.
After a lot of Googling I decided to use a shared lock from the Boost library and either a Map container from the Standard Template Library (STL), or an unordered Map from Boost. (Both of these libraries contain an enormous framework of containers and algorithms and other excellent stuff). The shared lock allows only 1 thread to update but multiple threads to read.
There were a few glitches getting VS2010 to acknowledge the Boost libraries, but once I had figured that out it all works and seems quite efficient (half a million updates and half a million reads in about 1 second using 8 cores). The unordered Map using integer keys is fastest.
The next thing to get my head around is how to handle multiple global containers at the same time. Simples! I thought, you just have a separate Lock for each container. But I suspect this could lead to the dreaded
“race condition” “deadlock” where one thread holds one lock and is waiting for another lock, and another thread holds that lock and is waiting for the first thread: the corresponding image is 2 meerkats chasing each other’s tails.
(Mike Woodhouse points out that a “race condition” is having 2 threads trying to update the same thing at the same time.)
Time for a cold towel wrapped round my head, swiftly followed by a large glass of Central Otago Pinot Noir (fantastic stuff).
Having slept on it I think I need a timeout on the locks as well (I wondered why there was a timeout parameter on the shared lock). The problem is that even if all MY udfs depend on a single lock the user could be using someone elses UDFs at the same time that could have their own lock:
- MyXLL creates MyLockUDF and OtherXLL creates OtherLockUDF
- Formula1 =MyLockUDF()+OtherLockUDF()
- Formula2 =OtherLockUDF()+MyLockUDF()
- Formula1 and Formula2 are on separate threads being calculated at the same time
then there could be a deadlock.