How we deployed SharePoint WSP Solutions without downtime

 

Introduction

As your SharePoint environment grows and more solutions are built on the platform, it becomes critical to the business. Often it gets to the point where the business is very reluctant to allow SharePoint to be unavailable. This makes it difficult to plan application deployments, which generally cause downtime.

Recently, I have been working on an environment which was in use globally, and therefore the window for taking down the SharePoint farm was very small.

With that in mind, I thought I’d share the process that I now use to deploy SharePoint WSP solutions without any downtime to the SharePoint farm. There are going to be times when you cannot avoid downtime: certain SharePoint object model calls cause all the servers to recycle their application pools.

In order for deployments to be executed with no downtime there are a few requirements: some relate to infrastructure, and others to following a process.

 

Farm Configuration Requirements

In order to deploy SharePoint solutions without downtime there must be at least two SharePoint servers configured as Web Front Ends (WFEs), that is, servers running the Web Application service instance.

The next step is to ensure that these servers are load balanced, either using software load balancing such as Windows Network Load Balancing (NLB) or a hardware load balancer such as F5 Networks’ BIG-IP.

The F5 Networks BIG-IP configuration guide can be found here.

Without the load balancer and at least two web front ends, you are not going to be able to prevent downtime.

The load balancers must be configured so that when a server is down, no traffic is directed to that server.

At one client this required a bit more thought. The client used F5 Networks BIG-IP load balancers, but unfortunately the load was not being balanced effectively because Round Robin balancing was in use.

The method used to detect whether a server was down did not work either. The load balancers tested that a server was available by making a low-level call into IIS and looking for an HTTP status of 200. This meant that as long as IIS was up, the server was considered up and would receive requests, even if SharePoint was failing for some reason.

This was fixed by using a cURL script, which allows Windows Authentication to be used when calling into SharePoint. The script then looks for a particular text string in the page; as long as this is found, the server is considered up and available.
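As a sketch, such a health check could look like the following cURL call. The server name, credentials and marker text are all assumptions for illustration; the monitor should mark the server up only when the marker string is found.

   # Call SharePoint with Windows (NTLM) authentication and check that a
   # known marker string appears in the rendered page. The URL, account
   # and marker text below are placeholders, not real values.
   curl --silent --ntlm --user "DOMAIN\monitor:password" \
        "http://wfe01.example.com/Pages/Default.aspx" \
     | grep -q "Site is healthy" && echo UP || echo DOWN

The key point is that the probe exercises SharePoint itself (authentication plus page rendering) rather than just checking that IIS returns an HTTP 200.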

The last tweak was to configure the load balancers to use Observed dynamic balancing. The following article by Don MacVittie gives a great overview of the different types of load balancing.

Observed load balancing distributes the load using a number of metrics; this snippet from the article explains in more detail:

Observed: The Observed method uses a combination of the logic used in the Least Connections and Fastest algorithms to load balance connections to servers being load-balanced. With this method, servers are ranked based on a combination of the number of current connections and the response time. Servers that have a better balance of fewest connections and fastest response time receive a greater proportion of the connections. This Application Delivery Controller method is rarely available in a simple load balancer.

 

Deployment Process

[Diagram: SharePoint deployment-without-downtime process]

With these infrastructure requirements in place it is possible to deploy solutions without downtime.

For SharePoint 2010 deployments, PowerShell is the tool of choice when deploying SharePoint Solutions.

Once you have the farm configured correctly, you will need to deploy the solutions through the SharePoint 2010 Management Shell.

So generally I log on to the server through Remote Desktop and copy the solution file(s) into a folder such as c:\install\[solutionname]\[solutionversion].

Once the files are copied over:

  • Start the SharePoint 2010 Management Shell
  • Change directory to where the SharePoint solution files (.wsp) are found:
    • cd c:\install\[solutionname]\1.0\

    The approach that I take is to use PowerShell variables where I can. This helps reduce the amount of typing, and fewer mistakes are made.

  • Create a variable for the current folder:
    • $curdir = gl;
  • Create a variable for the solution’s filename:
    • $solutionname = "MySolution.wsp"
  • Add the solution:
    • Add-SPSolution -LiteralPath $curdir\$solutionname

Next is the important part: the solution has been added to the configuration database, and the next step is to deploy it to the servers. To ensure that there is no downtime, we will make sure that the deployment only occurs on one machine at a time.

Depending on the type of solution being deployed, there are a few additional parameters that you may have to pass to the Install-SPSolution command.

However, the most important one to remember is the -Local switch; this ensures that the solution deployment occurs only on the local server.

Other parameters include:

  • WebApplication – deploys the solution to a specific SharePoint web application.
  • GACDeployment – allows a solution which contains .NET assemblies to install them into the GAC.
  • CASPolicies – allows a solution which contains code access security policies for its assemblies / web parts to deploy those policies.
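Putting the parameters together, the local deployment step might look like the following sketch. Which switches you need depends on your solution’s contents, and the web application URL shown is a hypothetical placeholder.

   # Deploy only on this server; -GACDeployment / -CASPolicies are needed
   # only if the solution contains assemblies / CAS policies.
   Install-SPSolution -Identity $solutionname -Local -GACDeployment -CASPolicies
   # For a web-application-scoped solution, add for example:
   #   -WebApplication http://intranet.example.com   (hypothetical URL)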

Once the Install-SPSolution command has been run then the solution deployment success needs to be checked. This can also be achieved using PowerShell.

   $checkSolution = Get-SPSolution $solutionname;
   $checkSolution.LastOperationDetails;

When this has been executed, output such as the following will be displayed for the local server.

[Screenshot: output of SPSolution.LastOperationDetails]

The process now has to wait until the SPSolution.LastOperationDetails call reports that the deployment was successful.
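A simple polling loop can automate this wait. The sketch below is illustrative: the success and failure strings it matches on, and the ten-minute timeout, are assumptions you should adjust for your environment.

   # Poll LastOperationDetails until it reports a result or we time out.
   # The matched strings and the timeout below are illustrative assumptions.
   $timeout = [DateTime]::Now.AddMinutes(10)
   do {
       Start-Sleep -Seconds 15
       $details = (Get-SPSolution $solutionname).LastOperationDetails
       Write-Host $details
   } until ($details -match "successfully" -or
            $details -match "fail" -or
            [DateTime]::Now -gt $timeout)

Checking for a failure message matters too: if the deployment fails (an assembly locked by another process, for example), the message appears here rather than as an error from Install-SPSolution.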

Once the deployment has completed, I restart IIS and the SharePoint 2010 Timer service using the following PowerShell.

   Restart-Service SPTimerV4;
   iisreset;

The installation of the SharePoint solution will have caused IIS to restart, and therefore the server will not have responded to the load balancer. Provided that the load balancer has been configured correctly, the server should no longer be receiving requests from clients, and therefore there is no loss of service.

Depending on the configuration of the load balancers, the time it takes for a server to start responding to requests again will be based on a setting called slow ramp time.

The slow ramp time is how long the load balancer waits after a server has come back online before it starts sending the server its full share of requests. This gives the server time to get back on its feet before processing those requests.

Once the server is back online, the deployment process can be repeated on each of the other SharePoint servers.

 

Deployment Gotchas

Although this approach will work in most cases, there are a few gotchas that you need to watch out for. In some of these situations all SharePoint WFEs will have their application pools restarted.

Currently these are:

  • SPWebConfigModification – it’s good practice to apply changes to an IIS web application’s web.config files using the SharePoint SPWebConfigModification object. This ensures that any settings required in the web.config for your solution will be applied, reducing the likelihood of an issue due to missing configuration. Also, if new servers are added to the farm, their web.config files will be set up correctly, which will save you lots of time if you ever have to do a disaster recovery exercise! The gotcha is that applying the modifications runs a farm-wide job that updates web.config on every server, recycling the application pools on all WFEs at once.
  • SharePoint field controls – if your solution includes a new custom SharePoint field, it will not be available until all SharePoint servers have been updated. This can be more of a problem when, for example, you are using something like Telerik’s RadEditor control and are performing an upgrade. When these types of deployments are being made, it’s best to inform the users that the application will not be available. At least the rest of the SharePoint farm is up and running!
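For reference, registering a web.config change through SPWebConfigModification can be sketched as follows in PowerShell; the web application URL, setting name and value are hypothetical placeholders.

   # Ensure an appSettings entry exists via SPWebConfigModification
   # (URL, key and value below are placeholders).
   $webApp = Get-SPWebApplication "http://intranet.example.com"
   $mod = New-Object Microsoft.SharePoint.Administration.SPWebConfigModification
   $mod.Path = "configuration/appSettings"
   $mod.Name = "add[@key='MySetting']"   # XPath that identifies the node
   $mod.Owner = "MySolution"             # used to find and remove it later
   $mod.Sequence = 0
   $mod.Type = [Microsoft.SharePoint.Administration.SPWebConfigModification+SPWebConfigModificationType]::EnsureChildNode
   $mod.Value = "<add key='MySetting' value='MyValue' />"
   $webApp.WebConfigModifications.Add($mod)
   $webApp.Update()
   # Applying the modifications runs a farm-wide job that updates web.config
   # on every server and recycles the application pools on all WFEs.
   $webApp.Parent.ApplyWebConfigModifications()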

Note: As more issues are found then this section will be updated.

Deployment Script

The final deployment script is as follows:

  cd c:\install\[solutionname]\[versionnumber];
  $curdir = gl;
  $solutionname = "SolutionName.wsp";
  Add-SPSolution -LiteralPath $curdir\$solutionname;
  Install-SPSolution -Local -Identity $solutionname -GACDeployment -CASPolicies;
  $checksolution = Get-SPSolution $solutionname;
  $checksolution.LastOperationDetails;

 

Finally

I hope that you find this useful and would love to hear about your experience and approach to deployment of SharePoint solutions.

20 Comments

  1. Thanks Simon – looks interesting – I found the page while looking for something else, but have bookmarked it for future reading as I think it could be helpful. I appreciate your taking the time to blog about this!


  2. Hi, I tried this approach, but when I execute $checksolution.LastOperationalDetails; it returns immediately with an empty value. Should I put it in a loop, waiting for a non-empty string?


    1. Hi Andres,
      Yes, you will need to put this into a loop to check again for a successful entry. I would suggest that you test the string value, looking for a success message, since if the deployment fails for whatever reason (assembly locked by another process, for example) you will get a message in there too.
      You can use other properties of the Solution object such as $checkSolution.IsDeployed but I found them unreliable.

      Please take a look at another blog post, http://blog.ithinksharepoint.com/2012/01/08/powershell-to-detect-web-configuration-modification-jobs, that I did. This shows you an approach and also includes a timeout in case the job doesn’t run within a particular amount of time, which can happen if the SharePoint Administration or SharePoint Timer services aren’t running.


  3. This is only for the deployment of new Solutions. When you need to retract a SharePoint solution, won’t it cause an IIS Reset on all WFE Servers?


    1. Hi, is there a reason why you are retracting the solution? If it’s because you are removing it, then fine, but if you are upgrading then I suggest using Update-SPSolution instead.
      That said, if you were to retract a solution, you can keep the retraction to just one server at a time by using Uninstall-SPSolution -Local, repeating for each server, and then using Remove-SPSolution to delete the solution from the farm. Remove-SPSolution doesn’t perform an iisreset.
      Regards
      Simon


    2. Hi. Same problem on our end. The main reason why one might want to retract the solution is to remove resources that are no longer part of the new version of the solution. Retracting locally works fine. The problem is retracting once again on the second server after the new installation on the first one. There is nothing to retract anymore; even when retracted locally, the state is “not deployed” and the config is gone.

      The show-stopper is not being able to retract the same solution, for whatever reason, twice on two servers!

      Any ideas?

      Walter


      1. Hi Jean-Walter,
        Thanks for taking the time to post, an interesting problem.

        When you say resources do you mean language resource files or other assets like images, css or something like a webpart?
        If they are images, css or javascript files are these resources stored in the site collection or layouts folders?

        Are there any custom event receivers for feature deactivation events which remove resources?

        Regards
        Simon

      2. Hi Simon,

        Thanks for the reply.

        By resource I meant any file deployed by the WSP on the server’s file system. The obvious use case we are facing today is a DLL in the GAC, but it really could be any file.

        As for its location – site collections or layouts – it is both, but it’s not really relevant. The first local retraction, on the first WFE, was done cleanly and resources that were embodied in the site collection were cleaned up. The problem is whatever is left on the other WFE; stale stuff.

        A listener could work, thank you. But that would imply solution design planning ahead for any possible deletion. It makes for heavy governance for a retraction process that should have all the info available in the manifest file of WSPs.

        Speaking of manifest, that’s actually our backup strategy. After the first local retraction, the solution cannot be retracted again on the other servers. However, the solution’s manifest still exists. It’s easy to traverse it and clean up the solution manually (PS). That is what’s frustrating! It’s impossible to retract the solution again on the WFE, but the info is all there to do it; just not officially and in a manner supported by Microsoft.

        Regards,
        Walter

  4. Jean-Walter Guillery :

    Hi Simon,

    Thanks for the reply.

    By resource I meant any file deployed by the WSP on the server’s file system. The obvious use case we are facing today is a DLL in the GAC, but it really could be any file.

    As for its location – site collections or layouts – it is both, but it’s not really relevant. The first local retraction, on the first WFE, was done cleanly and resources that were embodied in the site collection were cleaned up. The problem is whatever is left on the other WFE; stale stuff.

    A listener could work, thank you. But that would imply solution design planning ahead for any possible deletion. It makes for heavy governance for a retraction process that should have all the info available in the manifest file of WSPs.

    Speaking of manifest, that’s actually our backup strategy. After the first local retraction, the solution cannot be retracted again on the other servers. However, the solution’s manifest still exists. It’s easy to traverse it and clean up the solution manually (PS). That is what’s frustrating! It’s impossible to retract the solution again on the WFE, but the info is all there to do it; just not officially and in a manner supported by Microsoft.

    Regards,
    Walter

    hi Walter,
    Sorry, my last comment wasn’t suggesting any approach; I was instead trying to understand your solution.
    The approach that I use is to update solutions rather than retract and then deploy.

    Agreed it is difficult to tidy up resources, particularly anything that has been deployed into the site/site collection.
    However, depending on the type of resource there are different ways to upgrade them.
    Are you up-issuing assembly version numbers? I have seen that assemblies aren’t removed if the assembly versions change, and the old versions are left in the GAC.

    I am going to do some testing on retracting across multiple servers; I am surprised that you can’t retract locally without causing the solution to be globally retracted.

    cheers
    Simon


  5. Hi Simon,

    Don’t be sorry. Those are excellent points and it’s good to revisit architecture and design.

    For versions, it’s a good observation. It is however not a problem. We don’t actually use version numbers. Considering the rate at which our solution changes throughout our agile methodology, experience showed it’s easier to manage versions with fresh installs on multiple environments.

    As you said it, it is painfully weird to not be able to retract locally without affecting the global configuration… and if it is possible, it sure isn’t broadly documented.

    Walter


  6. This is a great article! Thanks! Would you please help me with the following two issues?
    a) One other scenario where Update-SPSolution is not enough is when there are new features in the solution. In this case, in order for the new features to appear in the features list, we need to first use Remove-SPSolution and then Add-SPSolution and Install-SPSolution. Would this work without downtime?
    b) Currently my solution is deployed by running a deployment script only once and not once for each front end. I understand that the operations team will need to change this and from now on deploy the solution separately for each front end. Right?


    1. Hi Papadi,
      a) So you don’t need to remove the solution and add it back in again. Instead use Update-SPSolution and then use Install-SPFeature, providing the name of the feature you wish to install. You can also use Install-SPFeature -ScanForFeatures (think that is right), which will tell you which features are available but not installed.

      b)
      Yes, that’s right, you will need to ensure that the deployment script is run on each server. You may have to make some changes if you are doing more than just Update-SPSolution. Anyway, create a test environment and prove the process there before trying it in production.

      Good luck

      Regards
      Simon


      1. a) Thanks, this is not a bad idea. But for the moment I would like to stick with the remove+add solution, since it has been tested and works well in my environment. It seems that it will not affect our goal in this case, which is to install without downtime. Right?

      2. Papadi,
        Well, if that’s the approach you wish to take, then go for it. I will just say that it’s not the approach I take unless there are changes to the solution design that require it.
        Like anything make sure you test your deployment to ensure your load balancers are configured correctly and your approach works as you require.


  7. While this is amazing and a great article, why do SharePoint guides always leave out the configuration of a dedicated, separate SQL Server environment in the 100 percent uptime configuration? What are your suggestions for the SQL configuration in this same scenario, so that you can reboot SQL in a cluster with no loss of connectivity? Any SQL configuration hints for large-scale environments that would prevent loss of connection even on a failover?


    1. Hi John
      Unfortunately you are never going to get 100% uptime. I am looking at implementing SQL Always On at a current client and will post my findings once we have been through the experience. Sorry that I can’t give you any more info at this time.

      Cheers
      Simon


    2. My one tip is to make sure you use SQL aliases, and also have separate aliases for Admin, Content and Search, as these are the areas that you will most likely scale out later on in your deployment.

      Cheers
      Simon


  8. Everyone, please read Papadi’s and Simon’s threads up above before you decide to pursue this “zero downtime” approach. Understand that if you are not 101% sure that a solution’s structure hasn’t changed, you cannot use the Update-SPSolution method. Also, if you do happen to get the solutions to upgrade, you may find yourself later spending a lot more time trying to track down a nebulous issue caused by a change in structure to your solutions that you were unaware of. Also understand that this approach is pretty well abandoning any sort of formal change control process. If you leave users in the system while you perform an upgrade, then find later that something happened and the only way back is to restore a database or a VM snapshot, you (and/or your users) are going to be screwed.

    Speaking from years of experience I would advise you to do a proper, formal maintenance window if at all possible and allow yourself the time to take snapshots, install solutions, test, and then move forward or roll back. There are few shortcuts in life – and SharePoint – that are genuinely worth the risk.


    1. Dear Sir Poon,
      Thanks for taking the time to comment but sorry I disagree with your views.

      Firstly, this approach needs to be wrapped up as part of a robust development cycle; that means it needs to be tested on system integration / QA and UAT environments before going anywhere near production. If this isn’t happening, it does not matter how you deploy your code, you are going to hit unforeseen issues.

      You make a good point that this needs to be wrapped up within a change control process, I totally agree and would not have it any other way. The change request process will require evidence of testing and also measurements of impact so that the risk can be understood.
      There will be times when this approach will not work and I have covered the ones that I have encountered in the article.

      The reason for this article was to show how you can deploy solutions without taking the SharePoint service down. This is something that is important if you are trying to follow Continuous Delivery approaches. I recommend you read Continuous Delivery by Jez Humble and David Farley. http://www.amazon.co.uk/Continuous-Delivery-Deployment-Automation-Addison-Wesley/dp/0321601912 When it comes to it the solution deployment framework is just installing binaries and assets.

      I would like to understand when you cannot use the Update-SPSolution approach. I can only think of one case, and that’s when you change the scope of the features and include something like a web part which needs to be deployed against one or all of the web applications in the farm.
      Other changes to structure like features being added can be resolved using the Install-SPFeature command. Now, if developers are moving files around then that is going to cause problems regardless of whether you retract / install solutions or perform an upgrade. It really comes back to making sure you test the deployments.

      Whether you roll back or roll forward is a philosophical argument. Personally I don’t like to rollback VMs as the whole farm state has moved forward from when that backup was taken and so you could end up introducing more problems than you are fixing. Rather I would build a new server using my deployment scripts and retire the broken server.

      Finally, in a lot of the businesses I have worked in the size of the environment means that you cannot take the service down for long periods while you do your deployments as the actions to upgrade content take days and taking down the environment for days is out of the question.

      Therefore, and maybe I should add this to the article, I split deployment into two parts.
      1. Get the binaries and assets installed – the point of the article
      2. Perform the post solution framework steps to upgrade content, switch on features etc

      Kind Regards
      Simon

