Sometimes you stumble upon an exceptional bug. Most bugs are stupidly boring,
but some are interesting, especially the ones that are perfectly deterministic
but whose causes are really mysterious at first.
My weirdest bug ever was in a UI involving layered windows on Windows. If you’re
unfamiliar with win32 programming, that is the name of the technology that
allows transparent windows under Windows. That UI was a set of very fluid
transparent buttons floating over the desktop.
The mystery machine
This bug showed up as a slow display rate about one time out of two. Once
the UI was loaded, it was either slow or fast, but we could never know
beforehand. I didn’t notice that distribution before knowing what the problem
really was, because of course the slow and fast runs didn’t alternate neatly.
It only made sense at the end.
When facing a performance issue you know nothing about, your first tool is
kernrate, a statistical profiler. It polls the instruction pointer at periodic
intervals and summarizes where the time goes. It’s a very fast, easy and
non-invasive way of profiling. Nyanaeve has a good introduction about it.
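To make the idea of statistical profiling concrete, here is a small sketch in Python. It is not how kernrate is implemented: instead of a real instruction pointer it samples a made-up timeline, and the function names and durations are purely hypothetical.

```python
from collections import Counter

# Hypothetical timeline: (function name, duration in microseconds).
TIMELINE = [
    ("user_draw", 200),
    ("driver_blend", 700),
    ("user_layout", 100),
]

def sample(timeline, interval_us):
    """Count how many periodic samples land in each function."""
    hits = Counter()
    total = sum(duration for _, duration in timeline)
    t = 0
    while t < total:
        # Find which function is "running" at time t.
        elapsed = 0
        for name, duration in timeline:
            elapsed += duration
            if t < elapsed:
                hits[name] += 1
                break
        t += interval_us
    return hits

def summarize(hits):
    """Turn raw sample counts into a share of the total time."""
    total = sum(hits.values())
    return {name: count / total for name, count in hits.items()}

hits = sample(TIMELINE, interval_us=50)
shares = summarize(hits)  # driver_blend gets 14 of the 20 samples, i.e. 0.7
```

The profiler never measures a function directly. It only counts where the samples fall, which is why it is cheap and does not require touching the profiled program.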
With kernrate, we learned that the time was spent at kernel level, not at user
level. Kernel level mostly means drivers. I also got a position:
The only issue is that it meant nothing.
Windows source code is obviously unavailable to most of us, but Microsoft
knows that sometimes you need a little more insight into the platform and
provides debug symbol files, which can tell you more precisely where you are.
If you’re unfamiliar with them, they mainly provide the correspondence between
memory addresses and function names.
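As a rough illustration of that correspondence, here is a sketch of an address lookup in a symbol table. The addresses and the `module!Name` entries are invented for the example; real symbol files carry much more information than this.

```python
import bisect

# Hypothetical symbol table: (start address, function name), sorted by address.
SYMBOLS = [
    (0x1000, "module!InitSurface"),
    (0x1400, "module!BlendSpan"),
    (0x1A00, "module!CopyBits"),
]
STARTS = [address for address, _ in SYMBOLS]

def symbolize(address):
    """Return 'name+0xoffset' for the nearest symbol at or below address."""
    i = bisect.bisect_right(STARTS, address) - 1
    if i < 0:
        return hex(address)  # below the first known symbol: leave it raw
    start, name = SYMBOLS[i]
    return f"{name}+{address - start:#x}"

position = symbolize(0x1432)  # "module!BlendSpan+0x32"
```

Without the table, all you have is the raw number. With it, the same sample points at a function you can reason about.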
With the debug symbols installed, we got a much more interesting position:
MMX & memory allocations
It took me a while to understand exactly what it meant. What happened was that
the video card didn’t have a hardware implementation for layered windows, so
its driver allocated memory and handed it to the default implementation.
Writing GUI drivers for Windows is not an easy task: you have to handle pixel
writing in many pixel formats like 2/2/2, 2/2/4, etc., and conversions between
all those formats as well. So MS was nice enough to write default
implementations for all those cases. When a driver doesn’t know how to handle
something, it allocates memory and gives it to the default Windows
implementation. So you write a miniui driver and let Windows handle all the
special cases.
This default implementation is potentially slower, but not that slow, and
above all it wouldn’t be random. So that alone wasn’t the reason. But this was
all we had, and we needed to know.
This specific default implementation of the operation was MMX optimised,
because it’s faster of course, and was probably enabled as soon as an
appropriate processor was available. MMX instructions work on 8 bytes at a
time, and this code path could only operate on 8-byte aligned memory.
The driver didn’t care about that and provided a 4-byte aligned memory block.
Sometimes, by sheer luck, the allocation happened to be 8-byte aligned; other
times it was only 4-byte aligned.
The Windows implementation couldn’t operate on memory that was only 4-byte
aligned, so it copied it to another memory block, processed it there and copied
it back again.
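The shape of that fallback can be sketched as follows. This is only a model of the behaviour described here, not the Windows code: the address is passed as a plain integer, and `process` and `invert` are hypothetical names standing in for the real blending routine.

```python
ALIGNMENT = 8  # bytes; what the MMX path wanted in this story

def is_aligned(address, alignment=ALIGNMENT):
    """True when address is a multiple of alignment."""
    return address % alignment == 0

def invert(pixels):
    """Stand-in for the real pixel operation: flip every bit in place."""
    for i in range(len(pixels)):
        pixels[i] ^= 0xFF

def process(address, pixels, operation):
    """Apply operation to pixels, copying out and back if misaligned."""
    if is_aligned(address):
        operation(pixels)        # fast path: work in place
        return "fast"
    scratch = bytearray(pixels)  # copy to another block
    operation(scratch)           # process it there
    pixels[:] = scratch          # copy it back again
    return "slow"

buffer = bytearray(b"\x00\x0f\xf0\xff")
path = process(0x1004, buffer, invert)  # 0x1004 is 4-byte but not 8-byte aligned
```

Both paths produce the same pixels, so nothing looks wrong on screen. The only visible difference is the cost of the two extra copies on every operation.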
As the driver allocated the memory only once, at the creation of the window,
a given window was fast about one time out of two and never changed after that.
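The one-in-two figure falls out of simple counting, assuming the allocator is equally likely to return any 4-byte aligned address (an assumption for this sketch, real allocators are not that uniform):

```python
# Every 4-byte aligned address in a hypothetical 4 KiB range.
addresses = range(0, 4096, 4)

# Those that also happen to be 8-byte aligned.
lucky = [address for address in addresses if address % 8 == 0]

share = len(lucky) / len(addresses)  # 0.5
```

Every other 4-byte aligned address is also a multiple of 8, so half of the windows land on the fast path and keep it for their whole life.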
That’s a leaky abstraction at its fullest: a hardware requirement caused a very visible user experience problem, and sadly an unfixable one.
A fix would require the source code of the driver, rebuilding it and redeploying
it to everyone. A workaround would be to keep creating windows until one got
created with the correct memory alignment, but there is no way to detect it.
Dead end, won’t fix! But the trip was still awesome.
And you, what was your weirdest bug?