A better way to search

Anyone who has implemented Search knows all about the fun of setting up your Document Library content types as searchable meta-data. That step alone can be mind numbing, and once you have it done it always seems like one or two columns never show up for the mapping to be made until a few days later. No matter how many times you run full and incremental scans. It's like it has a mind of it own.

And then begins the fun of customizing the search page to make the information easily searchable. The default methods leave alot to be desired in the way of user-friendly.

So....we did something a bit different.

Background:


We have a group that scans information into scanners on a daily basis. This information is stored in PDF format and sent to a specific document library in Sharepoint. Once the information has been placed into the library a workflow item kicks off informing the group that they need to fill out the fields (meta-data) for the document and move it to the completed folder.


Issue:


The whole point of the meta-data (fields they fill out) was to make content easy to find. That's all well and good using Search but with 15+ columns ranging from date/time to drop-down lists to radio buttons it just never really looked right on the search page. Not to mention you have to wait for incremental crawls to populate the data so the search was never updated very efficiently (we have a 1 TB DB in total, it takes a bit of time). They needed to search this data as soon as it was made available, so we had to improvise.


Solution:


We decided to leverage our integrated SQL Reporting Server and build a custom RDL report that had paramter form data they could fill in and then search directly off the content database library meta data.


We were able to provide links to the physical document and build a dynamic query in such a manner as fit their requirements. To boot, the entire thing worked in real-time mode. This meant that as a document was scanned, the meta data completed, and dropped into the completed folder a search could be conducted at that very moment against the data.


Where to begin:

The trick was the Content database table and knowing where to look. This was the first thing we had to find out. As you create content types and associate them with a library (or columns in a Document Library) the field names you use don't really map well into the database. In fact, they don't map at all to any direct table column name due to the dynamic nature of using them in the first place.

** NOTE: Even though we did directly access the tables for our content database we created a specific read-only account for this purpose to avoid any issues. I do not recommend doing this without safeguarding the data with read-only accounts and backups. **

So we begin by going to the Content database on the Sharepoint server. In our case lets call it WSS_Content_Production. Once we navigated to the tables for our content database we expanded them and looked in AllUserData. Essentially this is the table that contains the guts to all of your documents, etc (site column name data included) in your site collection. You will have to identify the specific List and Site ID for the library in question (or list I suppose). These fields are called tp_ListID and tp_SiteID. As all of your sites and lists for sites will be in this table it is important that you grab the correct one.

Next you need to identify the site columns that hold your meta-data. This part wasn't fun. Essentially this table has a ton of custom rows that contain this data. How it stores the data isn't all that obvious either. For us it was trial and error to identify the approriate columns from our library to exactly what columns the actual data was stored in at the table.

The field names start out as nvarchar, ntext, int, float, datetime, bit, etc. followed by a number (1-64 or less). I have yet to determine the exact sequence of storing data in these fields and how it exactly maps to the columns in the library/list but there is some pattern to it. The fun part will be the multiple option drop down columns and so forth. Anytime you use a drop down or such column in a library the data is stored in a single field with a delimiter of <;#>. You will need to keep that in mind when you build your queries.



So , once we had all the data tables identified we proceede to write our query.
We tried inline SQL but it simply wouldn't give us the freedom we wanted (because, of course, they may only fill out portions of data or even incorrect data). We needed to be able to search for partial matches, date ranges and a wide variety of patterns. We opted for a Stored Procedure and a dynamic query.

Essentially we identified all the column names as parameters to a Stored Procedure. When called, the parameters would be passed, the query would then be built based off of logic in the procedure according to incoming data.

Something like this partial code below:

SELECT @SQLv = 'SELECT tp_DirName AS Directory, tp_leafName AS Filename, nvarchar10 AS FleetType, ntext2 AS Other, nvarchar15 AS ''LogPage Number'', nvarchar13 AS ''Tail Number'', nvarchar14 AS ''Page Version'' , nvarchar11 AS ''Stat Arr'' , nvarchar12 AS ''Stat Dept'' , datetime1 AS ''Start Date'', datetime3 AS ''End Date'' ,bit2 AS ServiceCheck ,bit3 AS SecurityCheck ,bit4 AS PartTag ,bit5 AS WeeklyCheck, ''http://astarintraweb/'' + tp_DirName + ''/'' + tp_LeafName AS URLName FROM AllUserData WHERE 1=1 AND tp_DirName LIKE ''%Completed%'''
IF @Tail <> ' '
SELECT @SQLv = @SQLv + 'AND nvarchar13 = ''' + @Tail + ''''

IF @PageNum <> ' '
SELECT @SQLv = @SQLv + 'AND nvarchar15 LIKE ''%' + @PageNum + '%'''
...
SELECT @SQLv = @SQLv + ' ORDER BY nvarchar10, datetime1, datetime3'
EXEC (@SQLv)
END

Once we had the SP tested we wimply wrapped it into an integrated SQL Server Report which we published to the site and provided a quick link to. So now when anyone wanted to search data in the library they simply fill out the report form data , hit Apply, and they have results right away. To boot they can use all of the features of integrated reporting to print to PDF, Excel, etc as well as scheduling of automated report runs.
All in all it works perfectly and the business loves it....

Comments

Popular posts from this blog

SharePoint Timesheet Solution - Part 1

Well Hello...

Missing Download as a Snapshot, so am I!