
How the XsltListViewWebPart in SharePoint 2010 can be a real performance killer


This article describes how the out-of-the-box XsltListViewWebPart, XLV in short, can cause performance problems, sometimes to the point where pages served to users hit the ASP.NET execution timeout (110 seconds by default).

The XLV is a Web Part that was introduced in SharePoint 2010 as the replacement for the ListViewWebPart. Its most important feature is the ability to use a custom XSLT style sheet to customise the rendering of the list data surfaced through the Web Part. On the project I am currently working on, we make extensive use of this technique. It turns out that this has been the main cause of the – seemingly random – performance problems we have been experiencing on our production environment for the last couple of months. One day it got completely out of hand: for the majority of the working day, our 3000 users were getting request timeouts and were simply unable to load most of the pages in our portal. As a result, with support from a few colleagues, I spent about two weeks trying to get to the bottom of the problem, which involved extensive analysis of the ULS logs, IIS logs, Content Database, performance counters and a lot of reflected SharePoint code. By the end, I had come up with a set of steps to reproduce the problem on a vanilla SharePoint environment, and I had gained a good enough understanding of the internal workings of the XLV to explain why it does what it does. Keep on reading if you are interested in finding out more. Warning: this article contains code.

So – what are the ingredients?

  1. a SharePoint 2010 Publishing Site
  2. One or more XsltListViewWebParts with the XslLink property set to point to an XSLT file
  3. Multiple Web Front End servers

What are the symptoms?

  • A variable increase in response times for pages with XLVs using XslLink; sometimes hardly noticeable, sometimes so bad that the request will time out.

Why? What? How?

In short, the problem is related to this thing called XSLT compilation which happens on the Web Front Ends at runtime. There seems to be a bug in the XLV that’s causing it to discard the compiled XSLT from its cache when it should be keeping it.

Compilation of XSLT is quite an expensive operation, which is why the XLV is meant to cache the result so that the XSLT doesn’t have to be recompiled for every subsequent request. The time it takes to compile an XSLT style sheet seems to depend on its contents: style sheets with more content seem to take longer to compile than small ones. On our environment we have fairly large XSLT style sheets, and it’s not uncommon for a single compilation to take 2 seconds.
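To make the cost concrete, here is a minimal sketch – not SharePoint’s actual code – of what compiling and caching an XSLT style sheet typically looks like in .NET. The expensive call is XslCompiledTransform.Load(); the cache key and caching policy here are illustrative.

using System.Web;
using System.Xml.Xsl;

public static class XslCacheSketch
{
    // Minimal sketch: compile once, reuse via HttpRuntime.Cache afterwards.
    public static XslCompiledTransform GetTransform(string xslPath)
    {
        string key = "CompiledXsl_" + xslPath;
        var cached = HttpRuntime.Cache[key] as XslCompiledTransform;
        if (cached != null)
            return cached; // cheap: reuse the compiled style sheet

        var transform = new XslCompiledTransform();
        transform.Load(xslPath); // expensive: parses and compiles the XSLT

        HttpRuntime.Cache.Insert(key, transform);
        return transform;
    }
}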

Why is it a problem if the Web Front End servers are busy compiling XSLT style sheets all the time? Have a look at the following reflected code (obtained using ILSpy, an open-source alternative to .NET Reflector).

[Screenshot: reflected code of BaseXsltListWebPart.GenerateCustomizedXsl(), with the call to AppDomainHelper.DoSerialize() wrapped in Monitor.Enter()/Monitor.Exit()]

The developers amongst you will understand that the call to AppDomainHelper.DoSerialize() is wrapped in a lock, meaning that XSLT compilation is something that can only be done by one thread at a time.

This funnel becomes a massive bottleneck when compilation cannot keep up with the requests coming in, as they form a growing queue of threads blocked at Monitor.Enter(). As the queue of blocked threads keeps growing, the amount of time each individual thread has to wait goes up. Imagine for a moment what this does to page response times.
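To illustrate the funnel, here is a self-contained snippet – the names are mine, not the reflected ones – showing how a single shared lock around a slow operation serializes every caller:

using System.Threading;

public static class CompilationGateSketch
{
    private static readonly object _gate = new object();

    public static void Compile(string xslPath)
    {
        // Every thread queues here. With 2-second compilations and dozens of
        // concurrent requests, each thread's wait grows with the queue length.
        Monitor.Enter(_gate);
        try
        {
            ExpensiveCompile(xslPath); // only one thread at a time gets here
        }
        finally
        {
            Monitor.Exit(_gate);
        }
    }

    private static void ExpensiveCompile(string xslPath)
    {
        // stand-in for the real XSLT compilation work
    }
}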

Not to mention that the compilation of XSLT is actually performed in a separate AppDomain, whose lifetime is managed by SharePoint. For some reason, once this AppDomain has served a total of 100 compilation requests, SharePoint decides to unload it and create a new one. This is also an expensive operation and gives the queue of blocked threads the opportunity to grow even more.

[Screenshot: reflected code of the AppDomain recycling logic]
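As a hypothetical sketch of that recycling policy – the threshold of 100 reflects the observed behaviour, while the names and structure are made up for illustration:

using System;

public class CompilationDomainSketch
{
    // Hypothetical illustration: after 100 served requests, tear the worker
    // AppDomain down and build a fresh one (both are expensive operations).
    private AppDomain _domain = AppDomain.CreateDomain("XslCompilation");
    private int _served;

    public void ServeCompilationRequest()
    {
        if (_served >= 100)
        {
            AppDomain.Unload(_domain);                          // expensive
            _domain = AppDomain.CreateDomain("XslCompilation"); // expensive
            _served = 0;
        }
        // ... marshal the actual compilation call into _domain ...
        _served++;
    }
}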

To back up this theory, have a look at the following SCOM chart, which presents the “Total appdomains unloaded” performance counter for each of our Web Front End servers over a period of 24 hours. The counter gets reset after an Application Pool recycle, hence the drops after midnight.

[Chart: SCOM graph of the “Total appdomains unloaded” performance counter for each Web Front End server over 24 hours]

It shows that it’s quite likely that our Web Front End servers are busy compiling XSLT all the time, causing an increase in response times for all requests to pages containing an XLV with XslLink set.

Let’s go back to our scenario where we experienced those severe performance issues for a whole day. The queue of threads waiting for the lock to be released had grown so much that threads were waiting long enough for ASP.NET to decide it was time to throw in the towel for those requests. When the ASP.NET execution timeout is hit, ASP.NET throws a ThreadAbortException on the thread that is handling the request.

The picture below shows a filtered set of entries from a ULS log file containing 12 minutes’ worth of Verbose-level logs from a single Web Front End, captured during the period when we were experiencing our severe performance problems. The filter applied is Where [Message] Contains “Monitor.Enter”.

[Screenshot: filtered ULS log entries containing “Monitor.Enter” – 476 matching entries in 12 minutes]

476 timeouts in 12 minutes, that’s about 40 per minute.

The nice thing about these entries in the ULS logs is that they contain a Stack Trace, so we can tell which method each thread was executing when the ThreadAbortException was thrown by ASP.NET. As you can see in the screenshot above, these threads were all “executing” Monitor.Enter(), i.e. they were blocked, waiting for the lock to be released.

Why are the XLVs compiling the XSLT all the time? Isn’t the XLV supposed to cache the result and reuse it? You’d think so. Let’s have a good look at the Stack Trace leading up to the call to Monitor.Enter(), as I am pretty sure that somewhere in the call stack a decision was made not to use a cached version of the compiled XSLT.

Error while executing web part: System.Threading.ThreadAbortException: Thread was being aborted.
at System.Threading.Monitor.Enter(Object obj)
at Microsoft.SharePoint.WebPartPages.BaseXsltListWebPart.GenerateCustomizedXsl(BaseXsltInfo baseXsltInfo)
at Microsoft.SharePoint.WebPartPages.BaseXsltListWebPart.get_CustomizedXsl()
at Microsoft.SharePoint.WebPartPages.BaseXsltListWebPart.LoadXslCompiledTransform(WSSXmlUrlResolver someXmlResolver)
at Microsoft.SharePoint.WebPartPages.DataFormWebPart.GetXslCompiledTransform()
at Microsoft.SharePoint.WebPartPages.DataFormWebPart.PrepareAndPerformTransform(Boolean bDeferExecuteTransform)

Note that the XsltListViewWebPart derives from BaseXsltListWebPart which in turn derives from DataFormWebPart.

DataFormWebPart.GetXslCompiledTransform() contains logic which decides whether BaseXsltListWebPart.LoadXslCompiledTransform() should be called or not. It only calls this method if it was unable to retrieve the compiled XSLT from its cache. Note that the DataFormWebPart uses HttpContext.Current.Cache as its “first level” of caching for the compiled XSLT (I will explain later why I call it “first level”).
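Paraphrased in C# – the member names below are illustrative, not the reflected ones – the decision looks roughly like this:

using System.Web;
using System.Xml.Xsl;

public class DataFormWebPartSketch
{
    public XslCompiledTransform GetXslCompiledTransformSketch()
    {
        string cacheKey = BuildCompiledXslCacheKey();
        var cached = HttpContext.Current.Cache[cacheKey] as XslCompiledTransform;
        if (cached != null)
            return cached; // fast path: "first level" cache hit

        // Slow path: recompile. This is the call chain that ends up blocked
        // on Monitor.Enter() in BaseXsltListWebPart.GenerateCustomizedXsl().
        XslCompiledTransform compiled = CompileXsl();
        HttpContext.Current.Cache.Insert(cacheKey, compiled);
        return compiled;
    }

    private string BuildCompiledXslCacheKey() { return "illustrative-key"; }
    private XslCompiledTransform CompileXsl() { return new XslCompiledTransform(); }
}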

So by the looks of it, the XLV was consistently unable to retrieve any compiled XSLT from HttpContext.Current.Cache. Did it fail to add it to the cache the last time it compiled the XSLT, or has it perhaps been removed from the cache since it was added?

Let’s have another look at the same ULS file we looked at earlier. We’ve got the following suspicious entries. The filter applied is Where [Message] Contains “InvalidateAll”.

[Screenshot: filtered ULS log entries containing “InvalidateAll”]

Not a very descriptive message, you’d say, but descriptive enough to suspect it might be clearing some kind of cache for whatever reason.

ILSpy agrees.

[Screenshot: reflected code of MemoryPartCache.InvalidateAll()]

What is the MemoryPartCache class? And what is it removing from the cache exactly?

The Microsoft.SharePoint.WebPartPages.WebPart class exposes the protected methods PartCacheRead() and PartCacheWrite(), which internally use an object that derives from the abstract Microsoft.SharePoint.WebPartPages.PartCache class. Depending on what’s configured in the web.config, either an instance of MemoryPartCache or DatabasePartCache will be used. With the default web.config for SharePoint, an instance of MemoryPartCache is used.

[Screenshot: reflected code showing how the PartCache implementation (MemoryPartCache or DatabasePartCache) is selected]

Let’s drill down into the constructor of MemoryPartCache.

[Screenshot: reflected code of the MemoryPartCache constructor]

Two things are interesting here. First of all, the call to BuildDependsKeys(), which builds the value for this.dependStringKey – the key of the object that is being removed in the InvalidateAll() method we looked at earlier. Let’s have a look at that first, before we look at what the EnsureDependsKeyValid() method does.

[Screenshot: reflected code of MemoryPartCache.BuildDependsKeys()]

Assuming the Web Part for which this MemoryPartCache object is being created has a StorageKey (Web Part ID), it builds a string with two “dynamic” elements: the Storage Key and the id of the SPSite. This implies there is a one-to-one relationship between an instance of Microsoft.SharePoint.WebPartPages.WebPart and such a key.
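As an illustration of the shape of such a key – the real string layout inside BuildDependsKeys() is internal to SharePoint, so this is an assumption about form, not an exact reproduction:

using System;

public static class DependsKeySketch
{
    // Illustrative only: one key per Web Part instance (StorageKey)
    // within one site collection (SPSite ID).
    public static string BuildDependsKey(Guid storageKey, Guid siteId)
    {
        return "WebPartCacheDepends:" + siteId.ToString("N") + ":" + storageKey.ToString("N");
    }
}

Let’s remember this and find out where else this key is being used.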

[Screenshot: the places where this.dependStringKey is used]

Before we have a look at EnsureDependsKeyValid(), let’s look at the Add() method. MemoryPartCache.Add() is called when PartCacheWrite() is called on Microsoft.SharePoint.WebPartPages.WebPart. Whatever object needs to be added to the cache is passed into MemoryPartCache.Add() as a method parameter named “data”. On line 59, at the very bottom of the bit of code below, you can see that the “data” object is being added to the cache. But hold on, it is being added with a CacheDependency! The CacheDependency that is being passed in is based on a string array, which means that it establishes a dependency on other object(s) in the cache, with the array items representing the key(s) of the “master” item(s). In this case we are only dependent on a single “master” item. This “master” item itself is added to the cache on line 30, if it didn’t already exist (which is tested on line 24). The interesting bit here is that the “master” item’s key is in fact this.dependStringKey, which means that whenever MemoryPartCache.InvalidateAll() is called for a Web Part, its “master” item will be removed from the cache, along with any other “child” items that have been added through MemoryPartCache.Add().

[Screenshot: reflected code of MemoryPartCache.Add()]
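The pattern itself is standard ASP.NET caching. Here is a minimal, self-contained sketch of the same master/child arrangement – the method and key names are mine:

using System.Web;
using System.Web.Caching;

public static class MasterChildCacheSketch
{
    public static void Add(string dependsKey, string childKey, object data)
    {
        Cache cache = HttpRuntime.Cache;

        // Ensure the "master" item exists (SharePoint stores the dependency
        // key hash here; a plain marker is enough for this demo).
        if (cache[dependsKey] == null)
            cache.Insert(dependsKey, "master");

        // Add the "child" with a dependency on the master's cache key:
        // removing the master automatically evicts the child as well.
        var dependency = new CacheDependency(null, new[] { dependsKey });
        cache.Insert(childKey, data, dependency);
    }

    public static void InvalidateAll(string dependsKey)
    {
        // Removing the master cascades to every dependent child item.
        HttpRuntime.Cache.Remove(dependsKey);
    }
}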

We had already proven that InvalidateAll() is getting called quite a bit, so it’s possible that other cached objects are being removed from the cache as a result of that.

Let’s have a look at MemoryPartCache.EnsureDependsKeyValid(), which is also being called from within the MemoryPartCache constructor.

[Screenshot: reflected code of MemoryPartCache.EnsureDependsKeyValid()]

We are getting closer. So in certain circumstances, InvalidateAll() will be called, which does pretty much what it says it does: remove the master item from the cache, along with any other cached objects related to this particular Web Part. What exactly is it testing in the if statement on line 6? It compares the result of a call to this.GetDepencyKeyHash() to the value of the cached “master” item, which, at the time it was added, also held the return value of a call to this.GetDepencyKeyHash(), as can be seen on line 30 in MemoryPartCache.Add(), above.
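Paraphrased – again, the member names approximate the reflected code rather than reproduce it:

using System.Web;

public class MemoryPartCacheSketch
{
    private readonly string dependStringKey;

    public MemoryPartCacheSketch(string dependStringKey)
    {
        this.dependStringKey = dependStringKey;
    }

    public void EnsureDependsKeyValid()
    {
        object cachedHash = HttpRuntime.Cache[this.dependStringKey];
        if (cachedHash != null && !cachedHash.Equals(this.GetDepencyKeyHash()))
        {
            // The stored hash was computed from the Web Part version when the
            // "master" item was cached; a version bump makes this comparison
            // fail and everything cached for this Web Part is thrown away.
            this.InvalidateAll();
        }
    }

    private object GetDepencyKeyHash()
    {
        // stub: derived from the Web Part version, as shown further below
        return null;
    }

    private void InvalidateAll()
    {
        HttpRuntime.Cache.Remove(this.dependStringKey);
    }
}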

Let’s have a look at what GetDepencyKeyHash() does.

[Screenshot: reflected code of GetDepencyKeyHash()]

Aha! It looks at the Web Part version.

In a nutshell, this means that whenever a Web Part’s version changes, any cache stored against other (previous) versions of that Web Part will be cleared.

The version of a Web Part is not exposed through a public API, so if you want to see it you need to either write code which uses reflection to read the internal Version property of Microsoft.SharePoint.WebPartPages.WebPart, or you can have a quick peek in the Content Database. Note that you should never touch the Content Database on a production environment, which is why you should back up the database and restore it elsewhere before you start running queries against it.
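If you go down the reflection route, something along these lines should do it. Note the assumption: the non-public property is called Version on Microsoft.SharePoint.WebPartPages.WebPart, as the reflected code suggests – verify against your SharePoint build.

using System.Reflection;
using Microsoft.SharePoint.WebPartPages;

public static class WebPartVersionReader
{
    // Reads the non-public Version property via reflection. Assumes the
    // property name and its declaring type; check both with ILSpy first.
    public static object GetVersion(WebPart part)
    {
        PropertyInfo prop = typeof(WebPart).GetProperty(
            "Version", BindingFlags.Instance | BindingFlags.NonPublic);
        return prop == null ? null : prop.GetValue(part, null);
    }
}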

As I was exploring the Content Database, I spotted the tp_Version column on the AllWebParts table. When I selected all rows from the AllWebParts table, ordered by tp_Version in descending order, I found that some of our web parts had versions greater than 100,000, which is pretty suspicious I must say!

Let’s summarise what information we’ve got so far.

  1. Web Part versions are incrementing A LOT for some unknown reason
  2. Objects stored in Web Part Cache are being cleared every time a Web Part version changes
  3. XLVs seem to be unable to retrieve compiled XSLT from the cache, with recompilation of the XSLT as a result, causing a variable increase in response times

Even though the compiled XSLT itself is not stored using the Web Part Cache, another object, of type CloneableHashtable, is. This object is added to and retrieved from the Web Part Cache to facilitate the retrieval of the compiled XSLT, by taking part in the process of building the cache key required to retrieve the compiled XSLT. This means that when InvalidateAll() is called and the Web Part’s “master” object is removed from the cache, so is this CloneableHashtable, as it is dependent on the Web Part’s “master” object. As a result, the XLV cannot retrieve the CloneableHashtable from the cache (as it no longer exists), and consequently it cannot build the cache key required to retrieve the compiled XSLT from the cache. This logic is defined in Microsoft.SharePoint.WebPartPages.DataFormWebPart.GetXslCompiledTransform().
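Paraphrased, the lookup chain looks roughly like this – the names and the hashtable key are illustrative:

using System.Collections;
using System.Web;
using System.Xml.Xsl;

public class CompiledXslLookupSketch
{
    public XslCompiledTransform TryGetCompiledXsl()
    {
        // Step 1: the CloneableHashtable lives in the Web Part Cache, so it
        // disappears whenever InvalidateAll() removes the "master" item.
        var keyTable = PartCacheRead() as Hashtable;
        if (keyTable == null)
            return null; // cannot build the cache key -> forced recompilation

        // Step 2: the hashtable contributes to the key under which the
        // compiled transform was stored in HttpContext.Current.Cache.
        string xslCacheKey = (string)keyTable["compiledXslKey"];
        return HttpContext.Current.Cache[xslCacheKey] as XslCompiledTransform;
    }

    private object PartCacheRead()
    {
        // stand-in for WebPart.PartCacheRead(), backed by MemoryPartCache
        return null;
    }
}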

Why are the Web Part versions being incremented automagically?

Remember I mentioned a “first level” of caching earlier? Well, there is a second level of caching of the compiled XSLT: in the content database, in the tp_Cache column of the same AllWebParts table. As I understand it, this is how it is SUPPOSED to work: if you have multiple Web Front End servers, the first server to serve a page with a new XLV with XslLink on it does the compilation, adds the compilation result to its own cache as described above, and then also saves it to the database (line 32, below), so that all the other Web Front End servers in the farm can reuse it.

Pretty clever.

Too bad it doesn’t work.

[Screenshot: reflected code calling SaveChanges() (line 31) and CacheWriteInternal() (line 32)]

The call to CacheWriteInternal on line 32 results in a call to a Stored Procedure called proc_UpdateWebPartCache. However, the value passed in to the VarBinary @Cache parameter of the Stored Procedure does not contain any data, as can be observed in the Developer Dashboard.

[Screenshot: Developer Dashboard showing the call to proc_UpdateWebPartCache with an empty @Cache parameter]

Also, if you run a trace with SQL Server Profiler whilst CacheWriteInternal() is being called, you can see the call to proc_UpdateWebPartCache, and you can see that a NULL value is passed as the value for the @Cache parameter.

To confirm the database really didn’t receive the compilation result, we can run a select query on the AllWebParts table. It shows only NULL values in the tp_Cache column. Essentially, the compiled XSLT is not being saved to the database properly.

What it did do, just before it called CacheWriteInternal(), was make a call to SaveChanges() on line 31. And guess what that does: it calls a Stored Procedure called proc_UpdateWebPart, and one of the many things this Stored Procedure does is increment the Web Part version!

[Screenshot: the part of proc_UpdateWebPart that increments the Web Part version]

What would have happened if the compiled XSLT had been saved to the database successfully?

Let’s have a look at the definition of PartCacheUpdate(), which is overridden in the Microsoft.SharePoint.WebPartPages.BaseXsltListWebPart class, the base class for the XLV.

[Screenshot: reflected code of BaseXsltListWebPart.PartCacheUpdate()]

Now the name of this method is a bit confusing, but what it really provides is a mechanism that allows the WebPartManager to “inject” the value of the tp_Cache column from the AllWebParts table into the Web Part that owns the cache, at the time the Web Part is instantiated as a Control on the page, fairly early in the page lifecycle. So IF the tp_Cache column had held a value other than null, we would have captured it in the this._customizedXsl field on line 7.

Now let’s go back to a bit of code we looked at just a moment ago.

[Screenshot: reflected code with the null test of this._customizedXsl on line 9 and the BaseXsltHashKey comparison on line 18]

Notice the if statement on line 9, where it tests whether this._customizedXsl is null. Assuming we had received it from the database via the WebPartManager, there would have been no call to this.GenerateCustomizedXsl() and we would have successfully reused the compilation result from another Web Front End server.

There is, however, a caveat. Even if we did receive the compiled XSLT from the database, there is a “second chance” to recompile when the if statement on line 18 is satisfied. The BaseXsltHashKey property on the XLV holds a string containing a list of file names of .xsl files that live in the TEMPLATE\LAYOUTS\XSL folder, together with their last modified times. The if statement on line 18 tests whether the value stored in the Web Part is the same as the one it has just calculated by checking the files on the file system of the Web Front End serving the current request. If they are different, SharePoint thinks you just deployed a new .xsl file and decides it is time to recompile.
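A sketch of how such a key can be computed from the folder contents – only the ingredients (file names and last modified times) are known from the reflected code; the concatenation format here is an assumption. Running it on each Web Front End and comparing the output is a quick way to spot a mismatch:

using System;
using System.IO;
using System.Linq;

public static class XslFolderKeySketch
{
    // Builds a string from the .xsl files in TEMPLATE\LAYOUTS\XSL and their
    // last modified times. If two WFEs produce different output, the ping
    // pong game described below will start.
    public static string Compute(string xslFolder)
    {
        return string.Join("|",
            Directory.GetFiles(xslFolder, "*.xsl")
                .OrderBy(f => f, StringComparer.OrdinalIgnoreCase)
                .Select(f => Path.GetFileName(f) + ";" +
                             File.GetLastWriteTimeUtc(f).ToString("o"))
                .ToArray());
    }
}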

What this means is: if, for some reason, the list of .xsl files contained in TEMPLATE\LAYOUTS\XSL and their last modified times are not exactly the same across all Web Front End servers, you have a problem. SharePoint will keep detecting a difference between the value stored in the Web Part property and the value it calculates, because the Web Part property keeps getting updated with the calculated value from whichever Web Front End server last detected the difference, and the never ending ping pong game has started.

We have seen scenarios where standard WSP deployment of .xsl files into the TEMPLATE\LAYOUTS\XSL folder caused files to be deployed or updated with timestamps that were not exactly the same across all Web Front End servers in the farm, i.e. a difference of a single second. This is enough for SharePoint to start playing the never ending ping pong game. I must say that I have not seen this kind of odd behaviour with WSP deployments before, so I am not sure whether this is something environmental or something that affects everyone.

Overview

For a quick overview of the overall flow described in this article, please refer to the following diagram.

[Diagram: xsl_compilation_01 – overview of the XSLT compilation and caching flow described in this article]

The main question

Why does the call to CacheWriteInternal() result in a Stored Procedure call with a NULL value for the @Cache parameter? I would love to know.

What’s next?

I am planning to write a follow-up blog post on this topic soon, in which I will describe a workaround that has solved the problem for us. Some of our worst hit pages went from average response times of 8 seconds to less than a second.

Also, I have been in touch with Microsoft, and they have been able to reproduce the issue on their end. I am waiting to hear back from them.

Thanks

I would like to thank my colleagues and mates Nick Mulder, Glyn Clough, Steve Kay, Andres Ramon, and Kristof Kowalski for their help and great ideas.

How to reproduce

For those wanting to reproduce the issue on their own environment, you can do so by following these steps.

  1. Create a hosts file entry to point to 127.0.0.1 for the hostname “xlv” on each Web Front End.
  2. Create a new Web Application using mostly default settings:
    1. Claims Mode
    2. New IIS website
      1. Port 80
      2. Hostname “xlv”
    3. New Content Database
  3. Create a new root Site Collection under new Web Application
    1. Template: Publishing Portal
  4. Edit /Pages/default.aspx
    1. Remove all existing web parts
    2. Add new Web Part to page
      1. Lists and Libraries > Pages
    3. Edit newly added Web Part
      1. Property: Miscellaneous > XSL Link
      2. Value: /_layouts/XSL/main.xsl
    4. Check in and publish /Pages/default.aspx
  5. Turn on Developer Dashboard with TraceEnabled=True and click “Show or hide additional tracing information . . .” at the bottom of the page to show additional tracing information.
  6. Open an RDP session to two of the Web Front Ends (let’s call them WFE1 and WFE2)
  7. On WFE1, Request http://xlv/Pages/default.aspx
  8. On WFE2, Request http://xlv/Pages/default.aspx
  9. On WFE1, Request http://xlv/Pages/default.aspx
    1. Confirm the text “InvalidateAll” appears on the page in the additional tracing information
    2. Confirm that in the list of Database Queries on the Developer Dashboard the following entries exist:
      1. DECLARE @DN nvarchar(256), @LN
      2. proc_UpdateWebPartCache
        1. Confirm that the “Size” of the @Cache parameter is 0
  10. On WFE1, refresh the page a number of times
    1. Confirm the text “InvalidateAll” does NOT appear on the page in the additional tracing information
    2. Confirm that in the list of Database Queries on the Developer Dashboard the following entries do NOT exist:
      1. DECLARE @DN nvarchar(256), @LN
      2. proc_UpdateWebPartCache
  11. On WFE2, Request http://xlv/Pages/default.aspx
    1. Confirm the text “InvalidateAll” appears on the page in the additional tracing information
    2. Confirm that in the list of Database Queries on the Developer Dashboard the following entries exist:
      1. DECLARE @DN nvarchar(256), @LN
      2. proc_UpdateWebPartCache
        1. Confirm that the “Size” of the @Cache parameter is 0
  12. On WFE2, refresh the page a number of times
    1. Confirm the text “InvalidateAll” does NOT appear on the page in the additional tracing information
    2. Confirm that in the list of Database Queries on the Developer Dashboard the following entries do NOT exist:
      1. DECLARE @DN nvarchar(256), @LN
      2. proc_UpdateWebPartCache
  13. Run the following SQL query on the Content Database for the xlv Web Application
    1. SELECT wp.* FROM AllWebParts wp INNER JOIN AllDocs doc ON wp.tp_PageUrlID = doc.Id WHERE doc.DirName = 'Pages' AND doc.LeafName = 'default.aspx' AND wp.tp_IsCurrentVersion = 1
    2. Confirm the value for the tp_Cache column of the only row returned is NULL
  14. Repeat step 9 to step 12 a number of times
    1. Rerun the query in step 13.1 and confirm the value for tp_Version is incremented by one every time you reload the page and “InvalidateAll” appears on the page in the additional tracing information.

Written by jvossers

January 28, 2012 at 12:53 pm

Posted in SharePoint 2010
