Sunday, April 5, 2009

Creating your own site's search engine

Setting up a search page for your web application with Windows Indexing Service is about as easy as retrieving data from a database table with a simple query. In this post I will walk through the configuration steps needed, and I will write another post with more detail on building a custom search page. So let's start.


Setting up your Search Index

  1. Click Start > Run, type mmc, and press Enter.
  2. In the new console window, choose File > Add/Remove Snap-in, or press Ctrl+M as a shortcut.
  3. Press Add, and choose Indexing Service from the list box as shown below.
    [Screenshot: selecting Indexing Service from the snap-in list]
    Press the Add button.
  4. A dialog box appears asking which computer the Indexing Service should be managed on; choose the local computer for this demo.
    [Screenshot: choosing the local computer]
  5. Close the remaining dialogs until you are back at the console window, which now shows the Indexing Service node.
  6. Right-click the Indexing Service node under Console Root and choose New > Catalog.
    Before doing this step, make sure that the Start command in that menu is enabled, that is, the service is stopped; otherwise, stop the service first.
    [Screenshot: New > Catalog on the Indexing Service node]
  7. Give the new catalog any name; I will call it PP4.
    Browse to the directory where the catalog files will be stored. This is not the directory you want to index.
    [Screenshot: naming the catalog and choosing its location]
  8. A new node called PP4 now appears under the Indexing Service node.
    Now tell the new catalog to index your site: right-click the PP4 catalog and choose New > Directory.
    [Screenshot: New > Directory on the PP4 catalog]
  9. Now fill in the data as shown below.
    [Screenshot: the new directory's settings]
    The Path is the physical path of the website you'd like to index. Finally, press OK.
  10. Right-click the PP4 catalog and choose Properties. Go to the Tracking tab and set WWW Server to your Default Web Site.
    [Screenshot: the Tracking tab]
  11. Go to the Generation tab and uncheck the "Inherit above settings from Service" checkbox.
    [Screenshot: the Generation tab]
    Unchecking it enables the "Generate abstracts" checkbox; check it, because by default no abstracts are generated. An abstract is a short piece of text taken from each indexed page, like the sample text Google.com shows under each item in its search results. By default it is the first 320 characters of the page, but you can customize it by adding text to the "description" meta tag in the page's head section.
  12. Finally, start the service.
    [Screenshot: starting the service]

That completes the search index configuration. Next, you need to create a page that queries this catalog, but before doing that, let's check whether everything is working.

 

Querying your Search Index through the built-in page

Under the PP4 catalog we have just created, you will find a node called "Query the Catalog". Click that node and a page loads in the right pane, as shown below. Try typing "Partners", for instance, and look at the query results. As you can see, the results include .cs, .vb, .css and many other unwanted file types. You can control that by writing your own search page with your own code.

[Screenshot: results from the built-in "Query the Catalog" page]
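One way to keep .cs, .vb, .css and similar files out of your own results is to filter them in code before binding, for example on the filename column returned by the custom query in the next section. The sketch below is only an illustration: the `ResultFilter` class is hypothetical, and the whitelist of page extensions (.aspx, .htm, .html) is an assumption you should adjust to the page types your site actually serves.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: keeps only search hits that look like web pages.
public static class ResultFilter
{
    // ASSUMPTION: the page types users should be able to find; adjust per site.
    private static readonly HashSet<string> AllowedExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".aspx", ".htm", ".html" };

    // True when the file name (e.g. the "filename" column) has an allowed extension.
    public static bool IsPage(string fileName)
    {
        if (string.IsNullOrEmpty(fileName)) return false;
        return AllowedExtensions.Contains(Path.GetExtension(fileName));
    }

    // Keeps only page-like file names, preserving the incoming (rank) order.
    public static List<string> KeepPages(IEnumerable<string> fileNames)
    {
        return fileNames.Where(IsPage).ToList();
    }
}
```

Filtering after the query keeps the query text simple; the trade-off is that unwanted files still count toward whatever the catalog returns, so on a large site you may prefer to restrict the query itself instead.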

 

Querying your Search Index through a custom page using Query Language

I will go through the main parts of the query and leave the rest to you. It works just like querying a simple SQL database and binding the results to a Repeater.

Connection:

    OleDbConnection odbSearch = new OleDbConnection( "Provider=\"MSIDXS\";Data Source=\"PP4\";");

    Note that Data Source is the catalog name you specified when configuring the search index.
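Since the only variable part of this connection string is the catalog name, it can help to build it in one place. The following is a minimal sketch: the `SearchConfig` class and `BuildConnectionString` method are hypothetical names, and the helper only assembles the same string used above; it does not open a connection.

```csharp
using System;

// Hypothetical helper for the Indexing Service (MSIDXS) connection string.
public static class SearchConfig
{
    // Builds the connection string for a catalog name such as "PP4".
    public static string BuildConnectionString(string catalogName)
    {
        if (string.IsNullOrWhiteSpace(catalogName))
            throw new ArgumentException("Catalog name is required.", nameof(catalogName));

        // Data Source is the catalog name, not a server name or a file path.
        return "Provider=\"MSIDXS\";Data Source=\"" + catalogName + "\";";
    }
}
```

You would then pass the result to the OleDbConnection constructor exactly as in the snippet above, e.g. `new OleDbConnection(SearchConfig.BuildConnectionString("PP4"))`.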

Command:

    OleDbCommand cmdSearch = new OleDbCommand(
        "select doctitle, filename, vpath, rank, characterization from scope() where FREETEXT(Contents, '" + searchText + "') order by rank desc", odbSearch);

Here, searchText is the text the user typed into the search textbox.
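Concatenating raw user input into the FREETEXT literal is what prompted the injection question in the comments below. The sketch here follows the author's own suggestion from that thread, doubling every single quote, and also trims and rejects empty input. It assumes that the Indexing Service SQL parser treats '' inside a quoted literal as one escaped quote, as standard SQL does; verify that against your own catalog. `SearchQuery.Build` is a hypothetical name.

```csharp
using System;

// Hypothetical helper that builds the FREETEXT query used in this post.
public static class SearchQuery
{
    // Columns returned by the query shown above.
    private const string Columns = "doctitle, filename, vpath, rank, characterization";

    // ASSUMPTION: MSIDXS treats '' inside a quoted literal as an escaped quote.
    public static string Build(string searchText)
    {
        if (searchText == null) throw new ArgumentNullException(nameof(searchText));

        string trimmed = searchText.Trim();
        if (trimmed.Length == 0)
            throw new ArgumentException("Search text is empty.", nameof(searchText));

        // Double every single quote so the input cannot close the literal early.
        string escaped = trimmed.Replace("'", "''");

        return "select " + Columns + " from scope() where FREETEXT(Contents, '" +
               escaped + "') order by rank desc";
    }
}
```

For example, `SearchQuery.Build("it's")` produces a query containing `FREETEXT(Contents, 'it''s')`, and the resulting string can be passed to the OleDbCommand constructor in place of the inline concatenation.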

The rest is just a matter of executing the query and binding the results to a Repeater. I built one myself; here is a snapshot of it running:

[Screenshot: the custom search page running]

7 comments:

  1. Just curious: looking at your command text, it looks susceptible to injection. Am I wrong in assuming injection is still a potential threat in this scenario, or did you choose to ignore it for a reason?

  2. Most of the time the website data is stored in a database, and in that case you can use DBSight to create a search engine easily, with many more features.

    http://www.dbsight.net

  3. @blackhawksq
    I don't think this would be possible, because the catalog is a read-only service; you can't even modify anything through the MMC dialog unless you stop the whole service on your machine. We could also add more validation to the query before submitting it, for instance replacing each single quote with two single quotes. What do you think?

  4. @Chris
    This would be OK for you and me, but not at all for companies like the one I work for :)
    We don't use any third-party code or software unless we have a license for it, so we have to implement everything ourselves.

  5. Have you checked the "Lucene.NET" indexing library with the "Seek A File Server" file indexer? It's really nice in terms of performance, has a very small index size, and offers plenty of other features: it also integrates with any IFilter for indexing the contents of different file types, and it makes integration with a DB really easy (for scenarios where you want to search both the DB and the file system and combine file-system results with other records in the DB; I was in that situation 2 years ago).
    Actually, I had to make this choice before, between Windows Search, Google Desktop, and Lucene.NET with Seek A File Server, and even with Windows Search 4.0, I'd go for Lucene/SeekAFile.

    @Mohamed Meligy But is it free? Some companies argue that they would never use any non-free service.
