Wednesday, February 27, 2013

Tuning Indexers, Crawlers & Query servers in SharePoint 2007 & 2010 to achieve Redundancy, Fault Tolerance and Maximize Search Performance

Here are some key concepts for indexers, crawlers and query servers in SharePoint 2007 and SharePoint 2010, and how to configure them to achieve redundancy, fault tolerance and maximum search performance.

SharePoint 2007 Index & Query Servers: 
Only one dedicated index server can be configured per Shared Services Provider (SSP) associated with a SharePoint web application. That server has the role of building and storing the index. Hence, index servers cannot be made redundant; you can only scale them by having one index server per SSP.

The query role does not have to be on your index server. It's good to have the Web Front Ends (WFEs) play the role of query servers, both so that searches are fast (each WFE queries itself locally) and for some redundancy, since index servers cannot be made redundant. This setting tells the index server to propagate its index to the WFEs that are set as query servers, so that each has a local copy of the index. Then, when someone runs a search (which happens on a WFE), that WFE searches its local copy instead of going across the network to query the index server. This increases speed at query time, but it introduces additional overhead: multiple full copies of the index exist on the network, and propagating those copies creates continuous network demand.

If the index server goes down for some reason, the WFEs still have a local copy of the index and can keep serving searches over the content indexed so far. Their copies simply are not refreshed until the index server comes back online.
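The propagation behaviour described above can be illustrated with a small simulation. This is only a sketch of the idea, not SharePoint code: the class names (QueryServer, IndexServer), the in-memory dictionaries standing in for the index, and the sample URLs are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class QueryServer:
    """Hypothetical WFE acting as a query server; holds its own index copy."""
    name: str
    local_index: dict = field(default_factory=dict)

    def search(self, term: str) -> list:
        # Queries are answered from the local copy, never across the network.
        return sorted(url for url, text in self.local_index.items() if term in text)


@dataclass
class IndexServer:
    """Hypothetical single index server of an SSP: builds and propagates the index."""
    query_servers: list
    index: dict = field(default_factory=dict)
    online: bool = True

    def crawl(self, url: str, text: str) -> None:
        if not self.online:
            return  # nothing is indexed while the index server is down
        self.index[url] = text
        for qs in self.query_servers:
            qs.local_index = dict(self.index)  # propagate a full copy


wfe1, wfe2 = QueryServer("WFE1"), QueryServer("WFE2")
indexer = IndexServer(query_servers=[wfe1, wfe2])
indexer.crawl("/docs/a", "quarterly budget report")

indexer.online = False                        # the index server goes down
indexer.crawl("/docs/b", "new budget draft")  # this content is never indexed

print(wfe1.search("budget"))  # ['/docs/a']
```

Both WFEs keep answering from the last propagated copy, but the document added during the outage does not show up until crawling resumes.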

The crawl server (or servers) is the WFE that the indexer uses for crawling content. You can choose to make your index server a WFE that isn't part of your load balancing and then set it as the dedicated crawl server. This lets the indexer crawl itself, which does two things: it avoids the network traffic of building the index across the network, and it eliminates the crawling load on the WFEs. Since your index server becomes an out-of-rotation WFE for regular browsing, you can also use it to host your Central Admin and SSP web apps, which again reduces load and overhead on the content WFEs.

But if you put the query role on the index server, queries have to go all the way from the WFE to the index server and back, which can cause a performance hit. The query role will also compete with the very intensive indexing process if both are on the same box.

Reference: The above are excerpts from the Social TechNet forum.

SharePoint 2010 Enhancements:
The search architecture in SharePoint 2010 is flexible. You can configure multiple crawlers, indexers and query components.

Crawl Component – Commonly referred to as the crawling component or indexer. The crawl component is hosted on an index server, and its primary responsibility is to build indexes. Unlike in the previous version of SharePoint, the crawl component is stateless, meaning that the index it creates is not stored in the crawl component. The index is propagated to the appropriate query server. The crawl component runs within the MSSearch.exe process, which is the Windows service “SharePoint Server Search 14”.

Crawl Database – As noted above, the crawl component itself is stateless. State is managed in the crawl database, which tracks what needs to be crawled and what has been crawled. When a crawl component is provisioned, it must be mapped to a SQL crawl database. Both can be created by using either Central Administration or PowerShell.

A crawl component can only map to one SQL crawl database, but multiple crawl components can map to the same crawl database. Mapping multiple crawl components to the same crawl database achieves fault tolerance: if the index server hosting crawl component 1 crashes, crawl component 2 picks up the additional load while 1 is down. Performance also improves in this setup, because you effectively have two indexers crawling the content instead of one. If you’re not satisfied with crawl times, simply add an additional crawl component mapped to the same crawl DB. The load is distributed across both index servers.
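The relationship between stateless crawl components and a shared crawl database can be sketched as follows. This is a toy model under stated assumptions, not how MSSearch.exe schedules work: the class names, the simple queue, the round-robin loop and the sample URLs are all hypothetical.

```python
from collections import deque


class CrawlDatabase:
    """Hypothetical crawl DB: tracks what is queued and what has been crawled."""

    def __init__(self, urls):
        self.queue = deque(urls)
        self.crawled = {}  # url -> name of the component that crawled it

    def next_url(self):
        return self.queue.popleft() if self.queue else None


class CrawlComponent:
    """Stateless worker: all state lives in the crawl database it maps to."""

    def __init__(self, name, crawl_db):
        self.name = name
        self.crawl_db = crawl_db  # a component maps to exactly one crawl DB
        self.online = True

    def crawl_one(self):
        if not self.online:
            return False
        url = self.crawl_db.next_url()
        if url is None:
            return False
        self.crawl_db.crawled[url] = self.name
        return True


def run_crawl(components):
    # Round-robin over the components until none of them makes progress.
    progress = True
    while progress:
        progress = False
        for component in components:
            progress = component.crawl_one() or progress


db = CrawlDatabase([f"http://intranet/page{i}" for i in range(6)])
c1, c2 = CrawlComponent("crawl-1", db), CrawlComponent("crawl-2", db)
c1.online = False  # the server hosting crawl component 1 crashes
run_crawl([c1, c2])
print(db.crawled)  # every URL was crawled by crawl-2
```

Because the queue lives in the crawl database rather than in either component, the surviving component simply keeps pulling work. With both components online, the same loop splits the URLs between them.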

Indexers – Indexers are the servers hosting one or more crawl components. The crawl components associated with a crawl database are responsible for crawling the hosts or content sources assigned to that database for the Search Service Application. When multiple crawl databases exist, an attempt is made to distribute these host entries or content sources evenly across them. The index is no longer a single point of failure, because it is stored on the query servers. A query component holds either the entire index or a partition of it.
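One simple way to picture an even distribution of hosts across crawl databases is a least-loaded assignment. The source does not say which strategy SharePoint actually uses, so the function below, including its name and the sample host and database names, is an assumption made purely for illustration.

```python
def assign_hosts(hosts, crawl_dbs):
    """Assign each host to the crawl database that currently has the fewest hosts."""
    assignment = {db: [] for db in crawl_dbs}
    for host in hosts:
        # min() returns the first database among those tied for the lowest count.
        target = min(crawl_dbs, key=lambda db: len(assignment[db]))
        assignment[target].append(host)
    return assignment


hosts = ["intranet", "teams", "mysites", "hr", "finance"]
assignment = assign_hosts(hosts, ["CrawlDB1", "CrawlDB2"])
for db, assigned in assignment.items():
    print(db, assigned)
```

With five hosts and two databases, the counts can differ by at most one, which is the sense of "evenly" used here.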

Query Component – This is the component that performs a search against an index created by the crawl component. It is also commonly referred to as the query server. A query server is a server that runs one or more query components. These servers hold all or part of the search index, and they are now the only place where the index is stored on the file system. As stated above, the indexer crawls content and builds a temporary index. The indexer then propagates portions of the temporary index to the query servers to be indexed. The full or partial copy of the index held by a query server is referred to as an index partition.

In previous builds of SharePoint, every query server stored the entire index. While this achieved fault tolerance it didn’t help with performance. There is a direct correlation between the size of an index and query latency. The size of an index can easily become a bottleneck for query performance.

Index Partition – A new feature of SharePoint 2010 that is directly tied to the query component. We now have the ability to break the index into multiple partitions to reduce the time it takes the query component to perform a search. Each query component queries a single index partition. Put another way, every time an additional (non-mirrored) query component is created, a new index partition is created that owns a portion of the index.

Partitioning a large index reduces query times and removes this type of bottleneck. Partitioning an index is as simple as provisioning new query components from the Search Application Topology section in Central Administration. The crawler distributes crawled content evenly across index partitions using a hash algorithm based on document IDs.
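Hash-based distribution of documents across partitions can be sketched like this. The actual hash algorithm SharePoint uses is not given in the source; MD5 is used here only as a stand-in stable hash, and the function names are hypothetical.

```python
import hashlib
from collections import Counter


def partition_for(doc_id: int, partition_count: int) -> int:
    """Map a document ID to an index partition using a stable hash (MD5 stand-in)."""
    digest = hashlib.md5(str(doc_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def distribute(doc_ids, partition_count):
    """Group document IDs by the partition each one hashes to."""
    partitions = {p: [] for p in range(partition_count)}
    for doc_id in doc_ids:
        partitions[partition_for(doc_id, partition_count)].append(doc_id)
    return partitions


partitions = distribute(range(10_000), 4)
sizes = Counter({p: len(ids) for p, ids in partitions.items()})
print(dict(sizes))  # four partitions of roughly equal size
```

The same document ID always lands in the same partition, and no partition ends up with a disproportionate share. Note that with a plain modulo scheme like this one, changing the partition count reassigns most documents.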

Index Partition Mirror – A new capability to create mirrors of index partitions. Mirrors provide fault tolerance for the index, and creating them is highly recommended. A mirror is created by adding a mirror query component, for the same partition, on a different server. Under Search Application Topology, select the query component and choose Add mirror.

Property Database – Stores metadata and security information for items in the index. The property database is associated with one or more query components and is used as part of the query process. Its properties are populated as part of the crawling process that creates the index.

Just like query components, the Property Store DB can be scaled out so that the metadata load is shared. If the Property Store DB becomes a bottleneck because of its size and/or strains the disk subsystem with high I/O latency on the back end, a new Property Store DB can be provisioned to share the load. Just like the Crawl DB, the Property Store DB is useless unless it’s mapped to something. In this case, a Property Store DB must be mapped to a query component. If a decision is made to provision an additional Property Store DB to boost performance, an additional non-mirrored query component must be provisioned and mapped to it.

Query Processor – Scaling out the Property Store DB and query components is only half of the battle. The Query Processor still plays a vital role in Search 2010. It is responsible for processing a query and runs under the w3wp.exe process. It retrieves results from the Property Store DB and from the index held by the query components. Once results are retrieved, they are packaged, security trimmed and delivered back to the requester, which is the WFE that initiated the request. The Query Processor load balances requests if more than one (mirrored) query component exists within the same index partition. The exception to this rule is a query component that is marked as failover-only.
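The load-balancing rule for mirrored query components can be sketched as below. This is a simplified model, not the Query Processor's real logic: the round-robin policy, the class and attribute names, and the assumption that a failover-only component serves requests only when no regular mirror in its partition is available are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class QueryComponent:
    name: str
    partition: int
    failover_only: bool = False
    online: bool = True


class QueryProcessor:
    """Hypothetical picker that spreads requests over mirrors of one partition."""

    def __init__(self, components):
        self.components = components
        self._requests = {}  # partition -> number of requests routed so far

    def pick(self, partition: int) -> QueryComponent:
        members = [c for c in self.components
                   if c.partition == partition and c.online]
        regular = [c for c in members if not c.failover_only]
        # Assumption: failover-only components are used only as a last resort.
        candidates = regular or members
        if not candidates:
            raise RuntimeError(f"no query component available for partition {partition}")
        n = self._requests.get(partition, 0)
        self._requests[partition] = n + 1
        return candidates[n % len(candidates)]  # simple round-robin


qc_a = QueryComponent("QC-1a", partition=0)
qc_b = QueryComponent("QC-1b", partition=0)
qc_c = QueryComponent("QC-1c", partition=0, failover_only=True)
processor = QueryProcessor([qc_a, qc_b, qc_c])

print([processor.pick(0).name for _ in range(4)])  # ['QC-1a', 'QC-1b', 'QC-1a', 'QC-1b']

qc_a.online = qc_b.online = False  # both regular mirrors go down
print(processor.pick(0).name)      # QC-1c
```

The failover-only mirror receives no traffic while a regular mirror is healthy, which keeps it free for other work on its host until it is actually needed.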

Just like the Query Component and Property Store DB, the Query Processor role can be scaled out to multiple servers.