How To Do Hit-highlighting Of Results From A SQL Server Full-text Query
We have a web application that uses SQL Server 2008 as the database. Our users are able to do full-text searches on particular columns in the database. SQL Server's full-text functionality does not seem to provide support for hit highlighting. Do we need to build this ourselves or is there perhaps some library or knowledge around on how to do this?
BTW the application is written in C# so a .Net solution would be ideal but not necessary as we could translate.
Expanding on Ishmael's idea, it's not the final solution, but I think it's a good way to start.
Firstly we need to get the list of words that have been retrieved with the full-text engine:
declare @SearchPattern nvarchar(1000) = 'FORMSOF (INFLECTIONAL, " ' + @SearchString + ' ")' declare @SearchWords table (Word varchar(100), Expansion_type int) insert into @SearchWords select distinct display_term, expansion_type from sys.dm_fts_parser(@SearchPattern, 1033, 0, 0) where special_term = 'Exact Match'
There is already quite a lot one can expand on, for example the search pattern is quite basic; also there are probably better ways to filter out the words you don't need, but it least it gives you a list of stem words etc. that would be matched by full-text search.
After you get the results you need, you can use RegEx to parse through the result set (or preferably only a subset to speed it up, although I haven't yet figured out a good way to do so). For this I simply use two while loops and a bunch of temporary table and variables:
declare @FinalResults table while (select COUNT(*) from @PrelimResults) > 0 begin select top 1 @CurrID = [UID], @Text = Text from @PrelimResults declare @TextLength int = LEN(@Text ) declare @IndexOfDot int = CHARINDEX('.', REVERSE(@Text ), @TextLength - dbo.RegExIndexOf(@Text, '\b' + @FirstSearchWord + '\b') + 1) set @Text = SUBSTRING(@Text, case @IndexOfDot when 0 then 0 else @TextLength - @IndexOfDot + 3 end, 300) while (select COUNT(*) from @TempSearchWords) > 0 begin select top 1 @CurrWord = Word from @TempSearchWords set @Text = dbo.RegExReplace(@Text, '\b' + @CurrWord + '\b', '<b>' + SUBSTRING(@Text, dbo.RegExIndexOf(@Text, '\b' + @CurrWord + '\b'), LEN(@CurrWord) + 1) + '</b>') delete from @TempSearchWords where Word = @CurrWord end insert into @FinalResults select * from @PrelimResults where [UID] = @CurrID delete from @PrelimResults where [UID] = @CurrID end
1. Nested while loops probably aren't the most efficient way of doing it, however nothing else comes to mind. If I were to use cursors, it would essentially be the same thing?
@FirstSearchWord here to refers to the first instance in the text of one of the original search words, so essentially the text you are replacing is only going to be in the summary. Again, it's quite a basic method, some sort of text cluster finding algorithm would probably be handy.
3. To get RegEx in the first place, you need CLR user-defined functions.
- → How to make Laravel use my class instead of native PDO?
- → SQL: simple custom column in select doesn't work?
- → How to execute Stored Procedure from Laravel
- → Which database engine type should be specified for Microsoft SQL Database in Laravel?
- → How to troubleshoot PDOException?
- → laravel sql server stored procedure output
- → Issue with converting a date using Carbon
- → SQL microsoft query to Laravel 4.2
- → General error 20018 Cannot Continue the file execution because the session is in the Kill state
- → List names of all available MS SQL databases on server using python
- → Variable which replace DB of name in SSMS
- → Java: database connection. Where is my mistake?
- → How Can I use "Date" Datatype in sql server?