In simple terms, Full-Text Search relies on the indexing of character data, making it a lot quicker to perform matches against the index than the scans required by the SQL statements that use the LIKE predicate. Full-Text searching is also a lot more flexible than using the LIKE predicate, which only supports the wildcard operator.
Creating a Full-Text Index
-
The first step towards enabling Full-Text Search is to create a virtual storage location or catalog for the full-text index. This is achieved by expanding the Storage node under the database entry in SQL Server Management Studio:
Right click and choose New Full-Text Catalog from the context menu.
Give the catalog a unique name not exceeding 120 characters.
I chose to specify that searches are accent-insensitive, which means that the full text search will see "ole" and "olé" as the same thing.
Having created the catalog, refresh the storage node if needed, then right click to access the context menu and choose Properties:
-
Select the table and columns that you would like to include in the index. Only tables that have a unique index can be included for indexing. Eligible columns include nvarchar, varchar and varbinary data types. In other words, you can index the content of files as well as text:
I have left Track Changes at its default setting of "automatic". This means that the index will be refreshed whenever the content of the selected columns changes. In my case that's fine. I don't post new or updated content very often. You may need to implement dfferent change-tracking strategies based on the nature of your application. If your indexable content is constantly being updated, you may choose to defer the rebuilding of the index to a quiet time such as overnight. If you choose "manual", you must start or schedule this process manually to apply changes to the full-text index.
Once you click OK, the process will start building the full text index.The process may take a while, depending on the nature and quantity of data you want to index.
Full Text Searching
Now that the index has been created, it can be used for searching. The two basic search operators are CONTAINS
and FREETEXT
. The second of these, FREETEXT
, expands search terms to include a variety of noun and verb variations of the search term. So a FREETEXT
search for "search" will also match "searching, searches, searched, searcher" and so on. This may not be what you want and it can return a lot of non-relevant results. The CONTAINS
predicate returns only those results that include exact matches for the search terms. The complementary functions, FREETEXTTABLE
and CONTAINSTABLE
return a KEY
column and a RANK
column, which is particularly useful when attempting to establish relevance of results.
Full Text Searches can be enormously complex, consisting optionally of phrases, wildcards and boolean operators. SQL Server Full Text Search includes a list of "stop words", such as "and", "or", "in" etc in a number of languages. These are ignored in searches. Full Text Search also includes the NEAR
operator, which when used with CONTAINSTABLE
, helps to further define relevance. For example, a CONTAINSTABLE
search for "cat" NEAR "dog" will return a RANK value based on how close "cat" is to "dog" in the searched content. The closer the two words, the higher the value of RANK. The further away, the lower the value. If the two terms are more than 50 words apart, the RANK value will be 0.
Requirement
The search facility offered by this site permits visitors to enter any number of words as a search phrase. The phrase is sanitised so that any punctuation or numbers are removed, leaving just words and spaces. Then the stop words are removed from the phase and the remaining words are used to construct a predicate dynamically, inserting the NEAR
operator between them. Results are returned to the browser in order of relevance (RANK
) in pages of 20 results each.
Entity Framework
As regular visitors might be aware, I use Entity Framework for data access on this site. However, EF doesn't provide any kind of support directly for Full Text Search. There are (at least) a couple of ways around this. One is to use command interception, introduced in EF 6, which enables you to modify the SQL generated by EF before it is executed against the database. Another, more straighforward option is to use stored procedures. That is the option I chose to use.
To start with, I needed a class to represent the data that the stored procedure returns so that Entity Framework can map the data directly to a C# object.
public class SearchResult { public int ArticleId { get; set; } public string Headline { get; set; } public string Abstract { get; set; } public DateTime DateCreated { get; set; } public int Rank { get; set; } public string CategoryName { get; set; } public string ArticleTypeName { get; set; } public string Categories { get; set; } public int TotalRecords { get; set; } }
The properties will be mapped automatically to the return values from the stored procedure which follows:
CREATE PROCEDURE Search @SearchTerm varchar(8000), @CurrentPage int = 1, @PageSize int = 20 AS BEGIN DECLARE @NearPredicate varchar(8000), @AndPredicate varchar(8000), @TotalRecords int SELECT @NearPredicate = COALESCE(@NearPredicate + ' NEAR ', '') + items FROM Split(@SearchTerm, ' ') LEFT JOIN sys.fulltext_system_stopwords ON items = stopword WHERE stopword IS NULL SET @AndPredicate = REPLACE(@NearPredicate, 'NEAR', 'AND') SET @NearPredicate = '(' + @NearPredicate + ')' SET @TotalRecords = ( SELECT COUNT(*) FROM Articles WHERE CONTAINS(*, @AndPredicate ) ) SELECT a.ArticleId, a.Headline, a.Abstract, a.DateCreated, ct.Rank, STUFF ( ( SELECT ',' + c1.CategoryName FROM Articles a1 INNER JOIN ArticleCategories ac1 ON ac1.ArticleID = a1.ArticleID INNER JOIN Categories c1 ON ac1.CategoryID = c1.CategoryID WHERE a.ArticleID = a1.ArticleID ORDER BY c1.CategoryName FOR XML PATH('') ), 1, 1, '' ) AS Categories, at.ArticleTypeName, @TotalRecords AS TotalRecords FROM Articles a INNER JOIN CONTAINSTABLE (Articles, *, @NearPredicate ) AS ct ON a.ArticleId = ct.Key INNER JOIN ArticleTypes at ON at.ArticleTypeID = a.ArticleTypeID ORDER BY ct.RANK DESC OFFSET (@CurrentPage - 1) * @PageSize ROWS FETCH NEXT @PageSize ROWS only END
The procedure requires a bit of explanation. The parameters represent the search phrase, the current page of results (defaulted to 1) and the number of results to return (page size). In the body of the procedure, a couple of local variables are declared. The first represents the predicate that will be passed to the CONTAINSTABLE
function, and the second represents the predicate to be passed to a simpler CONTAINS
statement. The last variable captures the total number of records matching the search criteria.
The first section of code generates the predicate dynamically from the search term. The search term is passed to a user defined function named Split
, which takes the content of the search term and splits it on the character specified in the second parameter (an empty space in this case), returning the resulting individual words as a temporary table. The code for the split function is below:
CREATE FUNCTION Split(@string varchar(8000), @delimiter char(1)) RETURNS @temptable TABLE(nameIndex int identity(1,1), items varchar(8000)) AS BEGIN DECLARE @index int, @slice varchar(8000) SELECT @index = 1 IF LEN(@string) < 1 or @string IS NULL RETURN WHILE @index != 0 BEGIN SET @index = CHARINDEX(@delimiter, @string) IF @index != 0 SET @slice = LEFT(@string, @index-1) ELSE SET @slice = @string IF(LEN(@slice) > 0) INSERT INTO @temptable (items) VALUES (@slice) SET @string = RIGHT(@string, LEN(@string) - @index) IF LEN(@string) = 0 BREAK END RETURN END
This temporary table is left-joined on to the system table containing the stop words, and only those that don't exist in the stop words table are selected. They are then subjected to a COALESCE
function that joins the resulting words using ' NEAR '
to generate a valid predicate for the CONTAINSTABLE
function. If, for example, the search term was "entity framework in mvc", the resulting value of the @NearPredicate
variable will be "(entity NEAR framework NEAR mvc)". The word "in" will have been removed as it appears in the stop words table. The @AndPredicate
variable is generated from the @NearPredicate
variable, with the NEAR's being replaced with ANDs, resulting (using the previous example) in a value of "entity AND framework AND mvc".
The @AndPredicate
variable is used in a relatively simple SELECT COUNT
query, which returns the total number of items that contain "entity" and "framework" and "mvc". Finally, the actual results are returned in the last block - which is not as complicated as it might first appear.
A visual depiction of the schema can be found in a previous article - ASP.NET MVC, Entity Framework, Modifying One-to-Many and Many-to-Many Relationships. The first few fields selected are from the Articles table. Then the Rank column is selected from the table returned by the CONTAINSTABLE
function (which is aliased as "ct"). There is a many-to-many relationship between Articles and Categories in my database, which is managed by a union table called ArticleCategories. I wanted to return all the categories that an article is related to as a comma separated value. I have use the FOR XML
clause to return the data and join it using a comma. Then I have used the STUFF
function to replace the leading comma with an empty space. The TotalRecords value is also included in the result set, and the results are ordered by rank, with the most relevant (or highest) first. Finally, OFFSET
and FETCH
(introduced in SQL Server 2012) are used to grab the correct "page" of data.
Client Code
The service layer that calls the stored procedure illustrates how to use the Database.SqlQuery<T> method to execute a stored procedure with Entity Framework and pass the result to a strongly typed collection:
public class SearchService : ISearchService { public List<SearchResult> Search(string searchTerm, int? page) { using (DotnettingContext _context = new DotnettingContext()) { var param1 = new SqlParameter("@SearchTerm", searchTerm); var param2 = new SqlParameter("@CurrentPage", page); var result = _context.Database.SqlQuery<SearchResult>("Search @SearchTerm, @CurrentPage", param1, param2).ToList(); return result; } } }
This is called from within the controller to help populate a ViewModel:
public ActionResult Index(string searchTerm, int? page) { page = page ?? 1; searchTerm = Regex.Replace(searchTerm, @"[^\sa-zA-Z]", string.Empty).Trim(); var data = _service.Search(searchTerm, page); var model = new SearchResultViewModel(); model.PageTitle = "Search"; model.KeyWords = searchTerm; model.Results = data; model.CurrentPage = page.Value; return View(model); }
Summary
This article provied a very basic introduction to Full Text Search, and showed how to create a Full Text Search index from witin SQL Server Management Studio. It then discussed some basic queries and showed how to create a stored procedure to generate paged results from a Full Text Search. Finally it showed how to use the Database.SqlQuery
method of the Entity Framework DbContext
to execute the stored procedure and to pass the result back into C# code.