Archive for April, 2008

InfoPath and BOMs (Byte Order Mark)

April 30, 2008

Today I had an issue with an application that I wrote. The application makes heavy use of InfoPath forms and routes them around to people via Windows Workflow Foundation. The whole application is written on MOSS of course. The application has been extensively tested and I have never seen the error that occurred before.

Part of the solution goes through an approval stage, and once that stage has been completed the form is moved over to another document library in a separate site. For some reason this failed. The error was pretty cryptic: “Unknown Server Error Number : D:” showed up in my exception handling.

I have some theories on what happened, but first of all I wanted to get the solution working again and on to the next stage so that people could start using it again. So I thought, no worries, I’ll just copy the InfoPath form from the one library to the other. So I set to work and used the built-in move content functionality.

However, when I looked at the file in its new destination it didn’t look like an InfoPath form, just a normal XML file. The workflow had automatically started but displayed that horrible “Error Occurred” status. I looked at the logs and could see that when the workflow was trying to parse the form it threw an error: unable to find BOM (Byte Order Mark).

So the SharePoint move/copy function seems to convert the XML file to another encoding, which is not the one used by InfoPath.

I thought, well, I’d better write a little utility which uploads this file with the right encoding. So, taking another utility that I wrote, I set to work. Now at the moment I don’t have the code to hand and will update this post when I do. I might even upload the utility, though it should have quite a few caveats of use attached to it.

Basically the code does this:-
Start the utility with the site URL (including the sub-web URL), give it a list name and a file name, and tell it what the source encoding and the target encoding are.

The code snippet for uploading and encoding is the following:-

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Text;
using Microsoft.SharePoint;

class InfoPathFileUploader
{
    private string m_sSPPath = "";
    private string m_sListName = "";
    private string m_sFilePath = "";
    private Encoding m_encodeSource = Encoding.UTF8;
    private Encoding m_encodeTarget = Encoding.Unicode;

    internal InfoPathFileUploader(string sSite, string sList, string sFilePath)
    {
        m_sSPPath = sSite;
        m_sListName = sList;
        m_sFilePath = sFilePath;
    }

    internal InfoPathFileUploader(string sSite, string sList, string sFilePath, Encoding source, Encoding target)
        : this(sSite, sList, sFilePath)
    {
        m_encodeSource = source;
        m_encodeTarget = target;
    }

    public bool DoUpload(bool bOverwrite)
    {
        bool bSuccess = false;
        Trace.TraceInformation("Starting Upload (Overwrite:" + bOverwrite.ToString() + ")");
        List<string> listErrors = new List<string>();
        Uri uriSite = new Uri(m_sSPPath);
        using (SPSite site = new SPSite(uriSite.OriginalString))
        using (SPWeb web = site.OpenWeb(uriSite.LocalPath))
        {
            if (web.Exists)
            {
                SPList listInfoPath = web.Lists[m_sListName];
                if (listInfoPath != null)
                {
                    string sFileUrl = "";
                    string sFileName = Path.GetFileName(m_sFilePath);
                    try
                    {
                        Trace.TraceInformation("Reading File: " + m_sFilePath);
                        byte[] filecontents;
                        using (FileStream fileInfoPath = File.OpenRead(m_sFilePath))
                        {
                            filecontents = new byte[fileInfoPath.Length];
                            int read = fileInfoPath.Read(filecontents, 0, filecontents.Length);
                            if (read != filecontents.Length)
                                throw new FileLoadException("Failed to read file contents into buffer");
                        }

                        // Convert the raw bytes from the source encoding to the target encoding.
                        Trace.TraceInformation("Converting File: " + m_sFilePath);
                        byte[] fileunicodecontents = Encoding.Convert(m_encodeSource, m_encodeTarget, filecontents);

                        // check to see if destination folder url has a leading /
                        if (!listInfoPath.RootFolder.ServerRelativeUrl.StartsWith("/"))
                            sFileUrl = "/";
                        sFileUrl = sFileUrl + listInfoPath.RootFolder.ServerRelativeUrl + "/" + sFileName;

                        Trace.TraceInformation("Uploading File to Sharepoint: " + sFileUrl);
                        SPFile spFileInfoPath = listInfoPath.RootFolder.Files.Add(sFileUrl, fileunicodecontents, bOverwrite);

                        if (!spFileInfoPath.Exists)
                            throw new FileNotFoundException("File failed to load into SharePoint", sFileName);

                        // Added after the 02 May update: Title is only persisted with SystemUpdate().
                        spFileInfoPath.Item["Title"] = sFileName;
                        spFileInfoPath.Item.SystemUpdate();
                        bSuccess = true;
                        Trace.TraceInformation("Successful Upload: " + sFileUrl);
                    }
                    catch (FileNotFoundException fileex)
                    {
                        Trace.TraceError("FileException during upload: " + fileex.Message + " " + fileex.StackTrace);
                        listErrors.Add("Upload Failed (FileNotFound): " + fileex.FileName);
                    }
                    catch (DirectoryNotFoundException direex)
                    {
                        Trace.TraceError("DirectoryException during upload: " + direex.Message + " " + direex.StackTrace);
                        listErrors.Add("Upload Failed (DirectoryNotFound): " + m_sFilePath);
                    }
                    catch (SPException spex)
                    {
                        Trace.TraceError("SPException during upload: " + spex.Message + " " + spex.StackTrace);
                        listErrors.Add("SPException: " + spex.Message + " " + sFileName);
                    }
                    // Any other exception propagates to the caller with its stack trace intact.

                    // display if any errors occurred
                    foreach (string error in listErrors)
                        Trace.TraceError("Errors whilst uploading: " + error);
                }
            }
        }
        return bSuccess;
    }
}


The utility uses System.IO.File.OpenRead() to open the file and reads it into a byte[]. Then comes the clever bit: the code uses System.Text.Encoding.Convert(Encoding.UTF8, Encoding.Unicode, byte[]) and outputs the converted byte[].

This converted byte[] is then uploaded into SharePoint. The first time I ran it the file uploaded, and after checking the properties and adding in the URL to the InfoPath form template and the file type (this is from memory), the form library recognised the InfoPath form.

I restarted the workflow and it started working.

After the re-encoding the file has the Byte Order Mark at the front again, which seems to be stripped by SharePoint when moving the file around. The moved file is also converted into UTF-8, whereas InfoPath uses UTF-16.
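One caveat worth checking here: Encoding.Convert only transcodes the bytes it is given and does not write a preamble of its own, so the output starts with a BOM only when the source bytes did. The following is a minimal sketch, not the utility's actual code, of a conversion that always emits exactly one UTF-16 little-endian BOM. The class and method names (BomConverter, ToUtf16WithBom) are hypothetical and are introduced only for illustration.

```csharp
using System;
using System.Text;

static class BomConverter
{
    // Hypothetical helper: converts UTF-8 bytes (with or without a BOM)
    // to UTF-16 little-endian bytes that start with exactly one BOM.
    public static byte[] ToUtf16WithBom(byte[] utf8Bytes)
    {
        string text = Encoding.UTF8.GetString(utf8Bytes);

        // GetString keeps a leading BOM as the character U+FEFF; drop it
        // so the preamble added below is not duplicated.
        if (text.Length > 0 && text[0] == '\uFEFF')
        {
            text = text.Substring(1);
        }

        byte[] preamble = Encoding.Unicode.GetPreamble(); // FF FE
        byte[] body = Encoding.Unicode.GetBytes(text);

        byte[] result = new byte[preamble.Length + body.Length];
        Buffer.BlockCopy(preamble, 0, result, 0, preamble.Length);
        Buffer.BlockCopy(body, 0, result, preamble.Length, body.Length);
        return result;
    }
}
```

Whether InfoPath or the workflow strictly needs the BOM is something the log message above suggests but does not prove, so treat this as one way to make the output predictable rather than a confirmed requirement.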

Updated (02 May 2008)
A little update to the utility. When I originally ran this yesterday to fix the error it all seemed to be working well. However, my solution uses another InfoPath form which references these InfoPath forms and reads the Title property out of them, and as first written the utility did not update the Title property. The only way that I could update the Title property was by setting SPFile.Item["Title"] = "Title" and then calling SPFile.Item.SystemUpdate(); a normal SPFile.Item.Update() does not update the Title property.

Further to this, if the utility is run against the file again once it’s been converted from UTF-8 to Unicode then it will corrupt the file. I will be looking at updating the code to try to detect the Byte Order Mark and make an informed decision on the encoding conversion.
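A rough sketch of what that BOM detection might look like is shown here. It is an assumption about one possible approach, not the code the utility ended up with; the names BomSniffer, DetectByBom and ConvertIfNeeded are hypothetical. It only recognises the UTF-8 and UTF-16 BOMs (a UTF-32 little-endian file would be misread as UTF-16) and falls back to a caller-supplied source encoding when no BOM is present.

```csharp
using System;
using System.Text;

static class BomSniffer
{
    // Returns the encoding indicated by a leading BOM, or null if none is found.
    // UTF-32 is deliberately not handled in this sketch.
    public static Encoding DetectByBom(byte[] data)
    {
        if (data == null) return null;
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
            return Encoding.UTF8;
        if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
            return Encoding.Unicode;           // UTF-16 little-endian
        if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
            return Encoding.BigEndianUnicode;  // UTF-16 big-endian
        return null;
    }

    // Converts only when the detected (or fallback) source encoding differs
    // from the target, so a second run leaves converted bytes untouched.
    public static byte[] ConvertIfNeeded(byte[] data, Encoding fallbackSource, Encoding target)
    {
        Encoding source = DetectByBom(data) ?? fallbackSource;
        if (source.CodePage == target.CodePage) return data;
        return Encoding.Convert(source, target, data);
    }
}
```

The fallback matters for files that SharePoint has already stripped of their BOM: with no BOM to sniff, the caller still has to say what the source encoding is.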

Like I say I will post the utility but hopefully this will help someone else if they run into the problem.


Categories: Uncategorized

A word breaker was not found for the given language error when crawling content.

April 20, 2008

The problem:
When a search crawl occurs an error is put in the crawl log.

“A word breaker was not found for the given language. Check your current language settings and ensure that search supports the current language. If the problem persists, reinstall search.”

The fix can be found here:-

Ah well, I found the fix and will document it here in case anyone else has the problem.

The word breaker DLL paths are all registered in the registry here:-

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\LanguageResources\Default

Under the registry key for each of the languages there are two values called

StemmerDLLPath and WBDLLPathOverride

For the English (United Kingdom) and English (United States) languages these values were set to the following 8.3 path names:


So at the command prompt I typed in the following:-

dir /x C:\PROGRA~1\MICROS~2\12.0\Bin

Guess what, the following line appeared:-

07/04/2008 13:27 3,976,520 NATURA~2.DLL naturallanguage6.dll

So I updated the registry entries, changing NATURA~1.dll to NATURA~2.dll, and restarted the Office and Windows search services. I then performed a full crawl, and finally a number of successful crawls entered the crawl logs.


This DLL seems to handle a number of languages including French, German, Dutch, Hebrew, Spanish and Italian.

Basically this issue happened on our QA environment when we were doing some testing. We racked our brains as to what changes had been made and couldn’t think of a reason. Then, after the problem was fixed, we realised that the machine had been restored using TrueImage, and this must have been the cause of the change in the 8.3 filename!
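The manual check above (does each registered word breaker path still point at a real file?) could be automated along these lines. This is only a sketch under assumptions: it assumes each language is a direct subkey of the Default key shown earlier, that it runs on the search server itself with the Microsoft.Win32 registry API available, and that the process sees the same registry view as the search service. WordBreakerCheck, FindMissing and ReadRegisteredPaths are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.Win32;

static class WordBreakerCheck
{
    const string BaseKey =
        @"SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\LanguageResources\Default";
    static readonly string[] ValueNames = { "StemmerDLLPath", "WBDLLPathOverride" };

    // Pure helper: returns "name -> path" for every path that fails the exists check.
    public static List<string> FindMissing(IDictionary<string, string> paths, Func<string, bool> exists)
    {
        var missing = new List<string>();
        foreach (KeyValuePair<string, string> entry in paths)
        {
            if (!exists(entry.Value)) missing.Add(entry.Key + " -> " + entry.Value);
        }
        return missing;
    }

    // Reads "<language>\<value name>" -> path from the registry (Windows only).
    // Assumes the language keys sit directly under BaseKey.
    public static Dictionary<string, string> ReadRegisteredPaths()
    {
        var result = new Dictionary<string, string>();
        using (RegistryKey root = Registry.LocalMachine.OpenSubKey(BaseKey))
        {
            if (root == null) return result;
            foreach (string language in root.GetSubKeyNames())
            {
                using (RegistryKey langKey = root.OpenSubKey(language))
                {
                    if (langKey == null) continue;
                    foreach (string name in ValueNames)
                    {
                        string path = langKey.GetValue(name) as string;
                        if (!string.IsNullOrEmpty(path)) result[language + "\\" + name] = path;
                    }
                }
            }
        }
        return result;
    }
}
```

Calling WordBreakerCheck.FindMissing(WordBreakerCheck.ReadRegisteredPaths(), File.Exists) would then list any entries whose 8.3 path no longer resolves, which is the situation described above after the restore.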

Categories: Uncategorized