{"id":1882,"date":"2019-03-19T00:00:00","date_gmt":"2019-03-18T23:00:00","guid":{"rendered":"https:\/\/wwwneu.strehle.de\/tim\/weblog\/archives\/2019\/03\/19\/1641-2\/"},"modified":"2019-03-19T00:00:00","modified_gmt":"2019-03-18T23:00:00","slug":"1641-2","status":"publish","type":"post","link":"https:\/\/www.strehle.de\/tim\/weblog\/archives\/2019\/03\/19\/1641-2\/","title":{"rendered":"The story of my favorite bug"},"content":{"rendered":"<p>In the fifteenth year of my <a href=\"\/tim\/weblog\/archives\/2017\/07\/11\/1622\">software developer career<\/a>, I encountered a remarkable bug that would \u201centertain\u201d me for weeks.<\/p>\n<h3>\u201cGarbage in, garbage out\u201d?<\/h3>\n<p>It started off rather innocuously: At a recently-installed customer site, I noticed that a few XML files sent from the editorial system weren\u2019t imported correctly into our DAM system. The files were imported, but parts of the data looked bad.<\/p>\n<p>My first assumption was that the editorial system had generated faulty files \u2013 \u201cgarbage in, garbage out.\u201d But they seemed no different from successfully-imported files, and when I reimported the files that originally failed, the problem was gone.<\/p>\n<p>I had no idea where to look and decided to live with the problem for a while. Each day, I checked for failed imports and manually triggered a reimport. But sooner or later I would have to find and fix the cause.<\/p>\n<h3>Reproduce it<\/h3>\n<p>I tried to reproduce the problem on a development server and on the customer\u2019s test server, but was unable to. I could reproduce it only when doing test imports of large batches of files on the production server.<\/p>\n<p>I found out that the bad data after import was due to an XPath expression in an XSL transformation not returning what I was expecting \u2013 some of the time. <a href=\"https:\/\/twitter.com\/tistre\/status\/190030462733926400\">I tweeted<\/a>:<\/p>\n<blockquote>\n<p>\u201cWeird #XSLT bug keeps haunting me: Each day, one or two of some 100 transformations w\/ #XPath a\/* return a\/b, a\/b\/c instead of just a\/b.\u201d<\/p>\n<\/blockquote>\n<p>It took me many hours to boil the pretty complex XML and XSLT files down to a tiny test case that let me still reproduce the bug. I ran that test case on as many Linux servers as I could get my hands on, learning that only the SuSE Linux Enterprise 10 (SLES 10) operating system was affected. (The test server had openSUSE instead of SLES installed, that\u2019s why I didn\u2019t run into the problem there.)<\/p>\n<p>Still, I had no idea where this was coming from. <a href=\"https:\/\/twitter.com\/tistre\/status\/194758092192354304\">My next tweet<\/a>, slightly desperate, linked to the test:<\/p>\n<blockquote>\n<p>\u201cI\u2019m running the same #XSLT transform 1,000 times, and get a wrong result 29 times: <a href=\"\/tim\/check-for-xslt-bug.phps\">http:\/\/www.strehle.de\/tim\/check-for-xslt-bug.phps<\/a> \u2026 (SLES 10 only) #PHP #XML #heisenbug\u201d<\/p>\n<\/blockquote>\n<h3>Root cause<\/h3>\n<p>By now I was pretty sure that this weird behaviour was not my fault. Finally, I discovered a report on the Web that linked strange <a href=\"http:\/\/xmlsoft.org\">Libxml2<\/a> symptoms to a bug in the low-level glibc library. That broken glibc version was shipped with the SLES 10 release we had on the production servers.<\/p>\n<h3>The fix<\/h3>\n<p>It was great to know the root cause, but updating the operating system on the production servers was not going to happen for several months.<\/p>\n<p>Many parts of the DAM workflow relied on complex XSL transformations. Rewriting XSLT logic in PHP would be a ton of work. But how could I keep using XSLT when I couldn\u2019t trust the XSL processor? It worked most of the time, but I never knew when it would fail.<\/p>\n<p>Suddenly, I had a revelation \u2013 the previous sentence already contained the solution: \u201cIt worked most of the time\u201d! I simply had to run all XSL transformations in a loop and use the result that was returned most often.<\/p>\n<p>This approach fixed the problem once and for all, and the workaround was active for years. Two months after noticing the first symptoms, <a href=\"https:\/\/twitter.com\/tistre\/status\/205281813042696192\">my final tweet<\/a> said:<\/p>\n<blockquote>\n<p>\u201cWeird hack: The heisenbug happens only sometimes, so I run each XSL transform 20 times and use the most common result\u2026\u201d<\/p>\n<\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>In the fifteenth year of my software developer career, I encountered a remarkable bug that would \u201centertain\u201d me for weeks. \u201cGarbage in, garbage out\u201d? It started off rather innocuously: At a recently-installed customer site, I noticed that a few XML files sent from the editorial system weren\u2019t imported correctly into our DAM system. The files [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","_share_on_mastodon":"0"},"categories":[1],"tags":[],"class_list":["post-1882","post","type-post","status-publish","format-standard","hentry","category-weblog"],"share_on_mastodon":{"url":"","error":""},"_links":{"self":[{"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/posts\/1882","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/comments?post=1882"}],"version-history":[{"count":0,"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/posts\/1882\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/media?parent=1882"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/categories?post=1882"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/tags?post=1882"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}