How to: SharePoint Search PDF Document Preview using Acrobat

June 12, 2014

Introduction

This is what we are looking to build: a SharePoint 2013 search experience where PDFs can be previewed within the search results.

pdf_preview_demo

 

On a recent project I built a solution which made extensive use of the SharePoint 2013 search features. Microsoft have been pushing search-driven solutions with SharePoint 2013, and I have to say search does give a great experience in terms of performance and functionality.

One of the main reasons for selecting a search-based solution is that it scales, and this solution needed to provide quick access to documents. The complexity was that, due to the nature of the documents, each had unique security permissions. The solution stored a few hundred thousand documents; this would have been a killer for performance using traditional data querying methods.

95% of the documents were PDFs and I thought it would be cool if we could provide a preview of the PDF in the search results. Unfortunately we didn’t have Office Web Apps and so I started to think about how I could build something similar. The preview would work in a similar way to the search visualisation for Office when Office Web Apps is available.

At the time Office Web Apps wasn’t available to us, so we couldn’t use its ability to view PDFs.

One of the important goals was to ensure that the solution would show the PDF preview window in the hover panel but would also fail gracefully when a user didn’t have the necessary components installed on their machine.

 

The solution relies on the following:-

  • SharePoint 2013
  • SharePoint Search Centre
  • Custom Display Template
  • Acrobat Reader / Acrobat
  • Acrobat Reader / Acrobat’s Web Browser Plugin
  • the PDF Object JavaScript library

 

Adobe Acrobat Reader Pre-requisite Configuration

As mentioned we need a few pre-requisites for this solution to work.

The user must have Acrobat or Acrobat Reader installed and configured to use the PDF Web Browser plugin. You can see my previous post, which explains how to enable the PDF Web Browser plugin.

http://blog.ithinksharepoint.com/2011/02/21/open-pdf-files-from-a-document-library-in-a-new-window/

Adobe Acrobat Reader can be downloaded from: https://get.adobe.com/uk/reader/
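Since the preview depends on the browser plugin being present, it can be handy to check for it up front. The following is only a minimal sketch, not part of the original solution: hasPdfMimeHandler is a hypothetical helper name, and it assumes a browser that populates navigator.mimeTypes. As far as I know, older versions of Internet Explorer do not, so they would need a different check.

```javascript
// Hypothetical helper (not part of the original solution): reports whether a
// navigator-like object advertises an enabled handler for the PDF MIME type.
function hasPdfMimeHandler(nav) {
    if (!nav || !nav.mimeTypes) {
        return false;
    }
    var entry = nav.mimeTypes["application/pdf"];
    // Treat a missing entry or a missing enabledPlugin as "not available".
    return !!(entry && entry.enabledPlugin);
}

// Stand-in objects so the sketch runs outside a browser.
var withPlugin = { mimeTypes: { "application/pdf": { enabledPlugin: { name: "Adobe Acrobat" } } } };
var withoutPlugin = { mimeTypes: {} };

console.log(hasPdfMimeHandler(withPlugin));    // true
console.log(hasPdfMimeHandler(withoutPlugin)); // false
```

In a real page you would pass window.navigator instead of the stand-in objects.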

 

SharePoint 2013 Setup

So the SharePoint 2013 Search configuration requires the following:-

  • custom Search Item Display Template
  • custom Search Hover Panel Display Template
  • configured Search Result Type

The Search Result type is required to actually configure search to use our custom display templates to render PDF results with our preview hover panel.

We are expecting that you are using a search centre for the sake of this blog post but the solution could be deployed into any site collection.

 

Setting up display templates

Let’s set this thing up. We will need a SharePoint 2013 Visual Studio project so that we can put the display templates into source control, plus we need to be able to deploy them too!

  • Start Visual Studio 2012 / Visual Studio 2013
  • Create a new project using the SharePoint 2013 Empty Project
  • Give it a suitable name and location

pdf_vs_newproject

  • Provide the URL to the SharePoint search centre site collection that will be used for development.
  • Choose a Sandbox Solution
  • Click Create

Once we have a Visual Studio project set up we need to create our display templates. The best way to learn is to take an existing display template and hover panel and make a new set. Display Templates are found in the master page document library. This can be found in a site collection’s root web, for example: http://sharepoint/search/_catalogs/masterpage.

Within the Master Pages document library is a sub folder called Display Templates. This is where the Display Templates for the site collection are stored. I suggest that you create your own folder within this so that you can differentiate your changes from the out of the box display templates.

The deployment of display templates is a little tricky because they are made up of two parts. A display template is made up of an html file which SharePoint parses and creates a JavaScript representation of automatically. This occurs each time the .html file is saved.

Unfortunately if you deploy a Display Template via a feature the JavaScript representation is not created automatically. To get around this, I used the following process:-

As we mentioned earlier, we will create our display templates by copying an existing display template, renaming and modifying. This is achieved by the following:-

  • Browse to your SharePoint site.
  • Click on the cog icon in the top right hand section of the page. Click on Site Settings.
  • Click on the Master pages and page layouts link under “Look and Feel”.
  • Browse to the Display Templates folder
  • Download the following html display templates
  • Item_PDF.html and Item_PDF_HoverPanel.html

Now that we have a copy of the display templates, we need to add them to our Visual Studio project. Create a new feature in Visual Studio by right-clicking on the Features node in the Solution Explorer window and choosing new feature.

  • Rename the feature folder to an appropriate name such as:-
    • PDFPreviewTemplateWithAcrobatimport
  • Open the new feature and fill in the feature’s information.

pdf_vs_featuredef

Next we will add a SharePoint Project Item to deploy the Display Templates.

  • Right-Click on the project and choose new item
  • From within the SharePoint project items choose the Module project item and call it something appropriate such as “PDFDisplayTemplates”
  • Drag the files into the newly created Project Item

If you cannot drag them then you are likely running Visual Studio as Administrator and you will need to copy the files into the folder created for PDFDisplayTemplates project item and then use the View All Files icon (found at the top of solution explorer) and add them by right clicking on each item and choosing “Include in Project” from the menu.

  • Next, rename the files to something appropriate, like this
    • item_pdf.html –> item_pdf_acrobat.html
    • item_pdf_hoverpanel.html –> item_pdf_acrobat_hoverpanel.html

Once you have added your files to Visual Studio we need to edit them and configure Visual Studio to deploy them as part of the solution. To do this, select each file in Solution Explorer and change its deployment type in the file properties from “No Deployment” to “Element File”.

We also need to update the display templates so that the name and description allow a user to identify the template. These descriptions are modified using the metadata for the display template, which is actually provided by a set of headers in the .html file.

Please see the screenshots below, we have modified the name and description to:-

  • Name: PDF Item with Acrobat
  • Description: Displays a result tailored for a Portable Document Format (PDF) document and displays a preview using Adobe Acrobat.

 

Before:-

pdfitem_properties_before

After:-

pdfitem_properties_after

The last step is to add the PDFObject.js file to Visual Studio so that it’s deployed with the solution. We could do this in a number of ways; I am going to include it within the PDF Display Template feature so that we only have to activate the one feature.

 

Finally I am going to clean up the feature’s Elements.xml file so that the files are deployed into the correct location within SharePoint. If you remember, they should be deployed into the master page document library.

So here is the Elements.xml before:-

pdffeature_elementsbefore

and here is the Elements.xml after:-

pdffeature_elements_after

Once you have updated your display templates’ names they are ready to be deployed to SharePoint via the Visual Studio solution. Install your solution and activate the feature that was created.

 

Modifying the display templates to show PDFs

Hopefully you now have the display templates uploaded into SharePoint. At this point they are just copies of the existing display templates, so we’ll need to update them with the code to show PDFs.

I have already created updated versions which you can download, but let’s go through an overview of what I have changed.

 

Item_Pdf_Acrobat.html

This has been modified so that the line reading

var hoverUrl = "~sitecollection/_catalogs/masterpage/Display Templates/Search/Item_PDF_HoverPanel.js";

is updated to use our version of the hover panel:

var hoverUrl = "~sitecollection/_catalogs/masterpage/Display Templates/Search/item_pdf_acrobat_hoverpanel.js";
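Note that the hoverUrl points at the .js representation that SharePoint generates, not at the .html file you edit. As a small illustration of that naming relationship, here is a sketch; displayTemplateJsUrl and SEARCH_TEMPLATE_FOLDER are hypothetical names, and the folder is assumed to be the Search folder used in the hoverUrl above.

```javascript
// Assumption: templates live in the Search folder, as in the hoverUrl above.
var SEARCH_TEMPLATE_FOLDER = "~sitecollection/_catalogs/masterpage/Display Templates/Search/";

// Hypothetical helper: maps a display template's .html file name to the URL of
// the .js file that sits next to it.
function displayTemplateJsUrl(htmlFileName) {
    if (!/\.html$/i.test(htmlFileName)) {
        throw new Error("Expected an .html display template, got: " + htmlFileName);
    }
    return SEARCH_TEMPLATE_FOLDER + htmlFileName.replace(/\.html$/i, ".js");
}

console.log(displayTemplateJsUrl("item_pdf_acrobat_hoverpanel.html"));
// ~sitecollection/_catalogs/masterpage/Display Templates/Search/item_pdf_acrobat_hoverpanel.js
```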

Item_Pdf_Acrobat_HoverPanel.html

This has also been modified to update the name and description metadata elements. The entire html has been replaced and rather than explain all the changes, download the solution zip file and take a look at the file within Visual Studio.

The standout piece is the following section, which is registered using the AddPostRenderCallback() function and so runs after the display template has rendered. It checks whether the file is a PDF and, if it is, tries to create the Acrobat Web Viewer window. If that fails for whatever reason, the preview div is hidden.

<!--#_
AddPostRenderCallback(ctx, function()
{
    var csrId = ctx.CurrentItem.csr_id;

    // Only attempt a preview for PDF results.
    if(ctx.CurrentItem.SecondaryFileExtension=="pdf")
    {
        try
        {
            // Embed the PDF into the hover panel's viewer div with Acrobat's panes and toolbars switched off.
            var employeePDFDocument = new PDFObject({
                url: $urlHtmlEncode(ctx.CurrentItem.OriginalPath),
                pdfOpenParams: {
                    navpanes: 0,
                    toolbar: 0,
                    statusbar: 0,
                    view: "FitV"
                }
            }).embed($htmlEncode(id + HP.ids.viewer));
        }
        catch(e)
        {
            // Embedding failed, so hide the preview div rather than show an empty viewer.
            var viewDivId = $htmlEncode(id + HP.ids.viewer);
            var viewDiv = document.getElementById(viewDivId);
            if(viewDiv!=null)
            {
                viewDiv.style.display = 'none';
            }
        }
    }
});
_#-->
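One thing to watch with the fail-gracefully logic: I believe some versions of PDFObject report a failed embed by returning a falsy value rather than throwing, which is an assumption worth checking against the version you deploy. The sketch below treats both outcomes as a failure. embedOrHide and fakeDiv are hypothetical names used only for illustration, and the stand-in objects let it run outside SharePoint.

```javascript
// Hypothetical helper: runs an embed function and hides the viewer element if
// the embed throws or returns a falsy value.
function embedOrHide(embedFn, viewDiv) {
    var embedded = false;
    try {
        embedded = !!embedFn();
    } catch (e) {
        embedded = false;
    }
    if (!embedded && viewDiv) {
        viewDiv.style.display = "none";
    }
    return embedded;
}

// Stand-in for the hover panel's viewer div.
function fakeDiv() {
    return { style: { display: "block" } };
}

var okDiv = fakeDiv();
var falseDiv = fakeDiv();
var throwDiv = fakeDiv();

embedOrHide(function () { return {}; }, okDiv);                          // embed succeeded
embedOrHide(function () { return false; }, falseDiv);                    // embed reported failure
embedOrHide(function () { throw new Error("no plugin"); }, throwDiv);    // embed threw

console.log(okDiv.style.display, falseDiv.style.display, throwDiv.style.display);
// block none none
```

In the display template, the first argument would wrap the new PDFObject({...}).embed(...) call and the second would be the element returned by document.getElementById for the viewer div.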

Next the display templates need to be updated in SharePoint. I will show you the following approach, which is the way that Microsoft recommend when developing display templates.

  • browse to the master page catalogue using http://sharepoint/search/_layouts/15/masterpages
  • click on the library tab in the ribbon toolbar
  • click on open in windows explorer
    • if you get an error make sure that IE is setup to include the web site in the Local Intranet zone and also if you are developing on Windows Server 2008/2012 make sure you have the Desktop Experience feature installed.
  • Once you have the windows explorer folder open you can now copy and paste the files from Visual Studio into Display Templates folder. SharePoint will take care of updating the JavaScript representation of your html file

Now we have our updated display templates we need to configure them to be used!

 

Configure search result: Tell SharePoint about the display templates

We configure search to use our display templates by setting up a result type. Search Result types are the new Search Scopes, but with SharePoint 2013 they also allow you to specify how each search result is displayed. This is really powerful and one of my favourite features of SharePoint 2013!

For example, if a search result has a custom content type with custom metadata, that result can be displayed differently from the rest of the list to show the additional information.

Anyway, on to setting up the custom search result type:-

  • browse to your SharePoint search centre site collection
  • click on the cog icon and then choose site settings
  • Click Search Result Types (under the Site Collection Administration heading)
  • Click new search result type
    • Name: PDF using Acrobat
    • Sources: All Sources
    • Type of Content: PDF
    • Display Template: PDF Item with Acrobat
    • Tick optimize for frequent use

Let’s try it out!

Deploy the solution to your SharePoint environment and add some PDF content to your SharePoint sites. Perform an incremental crawl or wait for your continuous crawl to pick up the content.

If you perform a search for pdf, you should get some search results with the content uploaded. As you hover over a PDF search result you should get a preview of the PDF!

pdf_preview_demo

 

Troubleshooting the solution

When the solution was being built there were a couple of problems that I had. To be honest most of the issues were down to the Adobe Acrobat configuration.

Please take a look at my post Opening PDFs in a New Window and the section

 

Solution Files and resources

The Visual Studio Project, PowerShell script and Display Templates can be found here:

ITSP.SP.PDFSearchPreviewWithAcrobat.zip

 

 

PowerShell: Deleting SharePoint List Items

June 5, 2014

Introduction

Whilst I love SharePoint Workflows and how versatile they can be, they can generate quite a bit of data. Well mine do as I like to log plenty of information so that the support / admin teams can find out what’s going on with the workflow.

Unfortunately when you log plenty of information this means that the workflow history list can get quite large.

One of the workflows that we built over a ten month period has processed a couple of hundred thousand list items and has created about 3 million list items in the workflow history list.

We wanted to clear down this list and so PowerShell came to the rescue.

Solution

We built the following PowerShell script, to which you provide the following parameters:-

  • Url – URL of the web hosting the workflow history list
  • AgeOfItemsToKeepInDays – days of logs that you wish to keep
  • ListName – the display name of the workflow history list
  • NumberOfItemsToDeleteInBatch – the number of items that should be returned in each query.
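To make the AgeOfItemsToKeepInDays parameter concrete, here is a rough sketch of how a cutoff date and the CAML filter can be derived from it. It is written in JavaScript purely for illustration and is not part of the script; cutoffIso and buildDeleteQuery are hypothetical names, and the sketch assumes UTC timestamps, which may not match how your farm formats dates.

```javascript
var MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: the cutoff as an ISO 8601 string without milliseconds (UTC assumed).
function cutoffIso(now, ageOfItemsToKeepInDays) {
    var cutoff = new Date(now.getTime() - ageOfItemsToKeepInDays * MS_PER_DAY);
    return cutoff.toISOString().replace(/\.\d{3}Z$/, "Z");
}

// Hypothetical helper: a CAML filter matching items last modified on or before the cutoff.
function buildDeleteQuery(isoCutoff) {
    return "<Where><Leq><FieldRef Name='Modified' />" +
        "<Value IncludeTimeValue='TRUE' Type='DateTime'>" + isoCutoff + "</Value>" +
        "</Leq></Where>";
}

var now = new Date(Date.UTC(2014, 5, 5, 12, 0, 0)); // 5 June 2014, 12:00 UTC
var iso = cutoffIso(now, 365);

console.log(iso); // 2013-06-05T12:00:00Z
console.log(buildDeleteQuery(iso));
```

Anything modified on or before the cutoff matches the filter, so with the default of 365 days roughly the last year of history is kept.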

The original script looked like this:-

param
(
	[Parameter(Mandatory=$false, HelpMessage='System Host Url')]
	[string]$Url = "http://sharepoint",
	[Parameter(Mandatory=$false, HelpMessage='List Name')]
	[string]$ListName = "Workflow Tasks",
	[Parameter(Mandatory=$false, HelpMessage='Age of items in list to keep (number of days).')]
	[int]$AgeOfItemsToKeepInDays = 365,
	[Parameter(Mandatory=$false, HelpMessage='What size batch should we delete the items in?')]
	[int]$NumberOfItemsToDeleteInBatch = 1000
	
)

$assignmentCollection = Start-SPAssignment -Global;

$rootWeb=Get-SPWeb $Url -AssignmentCollection $assignmentCollection;

$listToProcess = $rootWeb.Lists.TryGetList($ListName);
if($listToProcess -ne $null)
{
	$startTime = [DateTime]::Now;
	$numberOfDaysToDelete = [TimeSpan]::FromDays($AgeOfItemsToKeepInDays);
	$deleteItemsOlderThanDate = [DateTime]::Now.Subtract($numberOfDaysToDelete);
	$isoDeleteItemsOlderThanDate = [Microsoft.SharePoint.Utilities.SPUtility]::CreateISO8601DateTimeFromSystemDateTime($deleteItemsOlderThanDate);
	$numberOfItemsToRetrieve = $NumberOfItemsToDeleteInBatch;
	
	$camlQueryString = [String]::Format("<Where><Leq><FieldRef Name='Modified' /><Value IncludeTimeValue='TRUE' Type='DateTime'>{0}</Value></Leq></Where>", $isoDeleteItemsOlderThanDate);
	$camlQuery = New-Object -TypeName "Microsoft.SharePoint.SPQuery" -ArgumentList @($listToProcess.DefaultView);
	$camlQuery.Query=$camlQueryString;
	$camlQuery.RowLimit=$numberOfItemsToRetrieve;
	
	$deletedItemCount=0;
	
	do
	{
		$camlResults = [Microsoft.SharePoint.SPListItemCollection] $listToProcess.GetItems($camlQuery);
		$itemsCountReturnedByQuery = $camlResults.Count;
		Write-Host "Executed Query and found " $camlResults.Count " Items";
		
		$listItemDataTable = [System.Data.DataTable]$camlResults.GetDataTable();
		foreach($listItemRow in $listItemDataTable.Rows)
		{
			$listItemIdToDelete = $listItemRow["ID"];
			$listItemModifiedDate = $listItemRow["Modified"];
			Write-Host "Deleting Item $listItemIdToDelete - Modified $listItemModifiedDate";
			$listToProcess.Items.DeleteItemById($listItemIdToDelete);
			$deletedItemCount++;
		}
	}
	while($itemsCountReturnedByQuery -gt 0)
	
	$totalSecondsTaken = [DateTime]::Now.Subtract($startTime).TotalSeconds;
	Write-Host -ForegroundColor Green "Processing took $totalSecondsTaken seconds to delete $deletedItemCount Item(s).";
}
else
{
	Write-Host "Cannot find list: " $ListName;
}

Stop-SPAssignment -Global -AssignmentCollection $assignmentCollection;

Write-Host "Finished";

However, whilst this worked OK for a list that was quite small, when we went to use it on the Production environment it performed like a dog. Fortunately the script was run out of hours so it didn’t impact the environment too much, though the memory that it consumed was quite large (4GB) after deleting only the second item.

There was something seriously wrong with the approach being taken, and after a bit of investigation it was obvious what was going on.

Look at the script again, there is a line of code that is:-

$listToProcess.Items.DeleteItemById($listItemIdToDelete);

Well, it turns out that this call goes through the list’s full Items collection, and the collection is updated after the DeleteItemById function is called. So we made a small modification and the offending line became:-

$listItemToDelete = $listToProcess.GetItemById($listItemIdToDelete);
$listItemToDelete.Delete();

This change meant that the PowerShell session now only consumed 270MB (I say only!) and memory usage did not rise. The deletion of the items was much quicker too, probably by a few 1000%!
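To illustrate why the two approaches behave so differently, here is a toy model in JavaScript. It is not SharePoint code: ToyList, its items property and the itemsLoaded counter are all made up for the illustration, on the assumption (as I understand it) that reading the Items property pulls back the whole collection each time, whereas GetItemById fetches a single item.

```javascript
// Toy model (not SharePoint code): a list whose `items` property rebuilds the
// whole collection on every access, plus a cheaper single-item lookup.
function ToyList(count) {
    this.store = {};
    for (var i = 1; i <= count; i++) {
        this.store[i] = { id: i };
    }
    this.itemsLoaded = 0; // how many items have been materialised in total
}

Object.defineProperty(ToyList.prototype, "items", {
    get: function () {
        var self = this;
        self.itemsLoaded += Object.keys(self.store).length; // pays for every item, every time
        return { deleteItemById: function (id) { delete self.store[id]; } };
    }
});

ToyList.prototype.getItemById = function (id) {
    var self = this;
    self.itemsLoaded += 1; // pays for one item only
    return { remove: function () { delete self.store[id]; } };
};

var slow = new ToyList(1000);
for (var a = 1; a <= 10; a++) {
    slow.items.deleteItemById(a);
}

var fast = new ToyList(1000);
for (var b = 1; b <= 10; b++) {
    fast.getItemById(b).remove();
}

console.log(slow.itemsLoaded); // 9955
console.log(fast.itemsLoaded); // 10
```

Ten deletes against a 1,000 item toy list touch 9,955 items one way and 10 the other; scale that up to a few million items in a workflow history list and the difference we saw makes sense.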

Here is the final script for completeness.

param
(
	[Parameter(Mandatory=$false, HelpMessage='System Host Url')]
	[string]$Url = "http://sharepoint",
	[Parameter(Mandatory=$false, HelpMessage='List Name')]
	[string]$ListName = "Workflow Tasks",
	[Parameter(Mandatory=$false, HelpMessage='Age of items in list to keep (number of days).')]
	[int]$AgeOfItemsToKeepInDays = 365,
	[Parameter(Mandatory=$false, HelpMessage='What size batch should we delete the items in?')]
	[int]$NumberOfItemsToDeleteInBatch = 1000

)

$assignmentCollection = Start-SPAssignment -Global;

$rootWeb=Get-SPWeb $Url -AssignmentCollection $assignmentCollection;

$listToProcess = $rootWeb.Lists.TryGetList($ListName);
if($listToProcess -ne $null)
{
	$startTime = [DateTime]::Now;
	$numberOfDaysToDelete = [TimeSpan]::FromDays($AgeOfItemsToKeepInDays);
	$deleteItemsOlderThanDate = [DateTime]::Now.Subtract($numberOfDaysToDelete);
	$isoDeleteItemsOlderThanDate = [Microsoft.SharePoint.Utilities.SPUtility]::CreateISO8601DateTimeFromSystemDateTime($deleteItemsOlderThanDate);
	$numberOfItemsToRetrieve = $NumberOfItemsToDeleteInBatch;

	$camlQueryString = [String]::Format("<Where><Leq><FieldRef Name='Modified' /><Value IncludeTimeValue='TRUE' Type='DateTime'>{0}</Value></Leq></Where>", $isoDeleteItemsOlderThanDate);
	$camlQuery = New-Object -TypeName "Microsoft.SharePoint.SPQuery" -ArgumentList @($listToProcess.DefaultView);
	$camlQuery.Query=$camlQueryString;
	$camlQuery.RowLimit=$numberOfItemsToRetrieve;

	$deletedItemCount=0;

	do
	{
		$camlResults = [Microsoft.SharePoint.SPListItemCollection] $listToProcess.GetItems($camlQuery);
		$itemsCountReturnedByQuery = $camlResults.Count;
		Write-Host "Executed Query and found " $camlResults.Count " Items";

		$listItemDataTable = [System.Data.DataTable]$camlResults.GetDataTable();
		foreach($listItemRow in $listItemDataTable.Rows)
		{
			$listItemIdToDelete = $listItemRow["ID"];
			$listItemModifiedDate = $listItemRow["Modified"];
			Write-Host "Deleting Item $listItemIdToDelete - Modified $listItemModifiedDate";
			$listItemToDelete = $listToProcess.GetItemById($listItemIdToDelete);
			$listItemToDelete.Delete();
			$deletedItemCount++;
		}
	}
	while($itemsCountReturnedByQuery -gt 0)

	$totalSecondsTaken = [DateTime]::Now.Subtract($startTime).TotalSeconds;
	Write-Host -ForegroundColor Green "Processing took $totalSecondsTaken seconds to delete $deletedItemCount Item(s).";
}
else
{
	Write-Host "Cannot find list: " $ListName;
}

Stop-SPAssignment -Global -AssignmentCollection $assignmentCollection;

Write-Host "Finished";

Hope that helps someone who has the same problem. Please let me know if you have an alternative solution!

Links to the scripts:-

Delete-ListItemsOlderThan-Slow.txt

Delete-ListItemsOlderThanV2.txt
