Sunday, April 5, 2009

Creating Your Own Site's Search Engine

Setting up a search page for your web application is as easy as retrieving data from a DB table with a simple query. In this post I will walk through the configuration steps needed, and I will write another post covering the details of a custom search page. So let's start.


Setting up your Search Index

  1. Go to Start > Run and type mmc.
  2. In the new window, choose File > Add/Remove Snap-in, or press CTRL + M as a shortcut.
  3. Press Add, and choose Indexing Service from the list box as shown below.
    clip_image001
    Press the Add button.
  4. A dialog box will appear asking which computer you want to apply indexing to. Choose the local computer for this demo.
    clip_image002
  5. Close all dialogs you have on your screen until you reach your Console Window, with Indexing Service running.
  6. Right-click the Indexing Service node under the Console Root, and choose New > Catalog.
    Make sure the Start option in the menu is enabled (meaning the service is stopped) before doing this step; otherwise, stop the service first.
    clip_image003
  7. Give the new catalog any name. I will call it PP4.
    Browse to the directory where the catalog files will be stored. This is not the directory that you want to index.
    clip_image004
  8. A new node called PP4 will appear under the Indexing Service node.
    Now we want to tell the new catalog to index your site. Right-click the PP4 catalog, and choose New > Directory.
    clip_image005
  9. Now fill in the Data as below
    clip_image006
    The Path is the physical path of the website you'd like to index. Finally, press OK.
  10. Now right-click the PP4 catalog, and choose Properties. Go to the Tracking tab, and change the WWW Server to your Default Web Site.
    clip_image007
  11. Now go to the Generation tab, and uncheck the checkbox "Inherit above settings from Service".
    clip_image008
    This makes the "Generate abstracts" checkbox available; make sure it is checked. It tells the search index to store a short text excerpt for each page, much like the sample text Google.com shows under each item in its search results. Without it, no abstracts are generated. By default the abstract is the first 320 characters of the page. You can customize it by adding some text to the "description" meta tag in the HTML head section.
  12. Now start the Service.
    clip_image009

Now we are done setting up the search index configuration. Next, you have to create a page that queries this catalog. But before doing that, let's check whether everything is working.

 

Querying your Search Index through the built in page

Under the PP4 catalog we have just created, you will find a node called "Query the Catalog". Click that node and a page will load in the right part of the window as shown below. Try typing "Partners", for instance, and have a look at the query result. As you can see, the result includes .cs, .vb, and .css files and many other unwanted file types. You can control that by creating your own page, using your own code.

clip_image010
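One way a custom page could hide such files is to filter the returned file names by extension before displaying them. The following is only a minimal sketch: the `ResultFilter` class, its method names, and the list of blocked extensions are hypothetical and not part of the original setup.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ResultFilter
{
    // Hypothetical block list; adjust it to whatever your site should hide.
    private static readonly string[] UnwantedExtensions = { ".cs", ".vb", ".css" };

    // Returns true when the file should appear in the search results.
    public static bool IsWanted(string fileName)
    {
        string extension = Path.GetExtension(fileName);
        foreach (string unwanted in UnwantedExtensions)
        {
            if (string.Equals(extension, unwanted, StringComparison.OrdinalIgnoreCase))
            {
                return false;
            }
        }
        return true;
    }

    // Keeps only the wanted file names, preserving their original order.
    public static List<string> KeepWanted(IEnumerable<string> fileNames)
    {
        List<string> kept = new List<string>();
        foreach (string fileName in fileNames)
        {
            if (IsWanted(fileName))
            {
                kept.Add(fileName);
            }
        }
        return kept;
    }
}
```

With this sketch, a code-behind file such as Default.aspx.cs would be dropped while Partners.aspx would be kept.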

 

Querying your Search Index through a custom page using Query Language

I will go through the main parts of the query, and you can figure out the rest. It's as if you are querying a simple SQL Database and binding the results to a Repeater.

Connection:

    OleDbConnection odbSearch = new OleDbConnection( "Provider=\"MSIDXS\";Data Source=\"PP4\";");

Please note that the Data Source is the catalog name you specified in the search index configuration.

Command:

    cmdSearch.CommandText = "select doctitle, filename, vpath, rank, characterization from scope() where FREETEXT(Contents, '" + searchText + "') order by rank desc";

Here, "searchText" is the text typed into the search textbox.
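Because the typed text is concatenated straight into the query, a search term containing a single quote (O'Brien, for example) would end the string literal early. A minimal sketch of a helper that guards against this is shown here; the `SearchQueryBuilder` name is hypothetical, and doubling the quote is the usual SQL convention, which I am assuming the Indexing Service query language follows as well.

```csharp
using System;

public static class SearchQueryBuilder
{
    // Builds the same query as above, with the user's text made safe to embed.
    public static string Build(string searchText)
    {
        if (searchText == null || searchText.Trim().Length == 0)
        {
            throw new ArgumentException("Search text must not be empty.", "searchText");
        }

        // Assumption: a doubled single quote is read as a literal quote.
        string safeText = searchText.Replace("'", "''");

        return "select doctitle, filename, vpath, rank, characterization " +
               "from scope() " +
               "where FREETEXT(Contents, '" + safeText + "') " +
               "order by rank desc";
    }
}
```

The result can then be assigned to the command, for example `cmdSearch.CommandText = SearchQueryBuilder.Build(searchText);`.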

The rest is as easy as executing the query and binding the results to a Repeater. I built one myself and took a screenshot of the running program, shown below:

clip_image011
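For readers who want to see the execute-and-bind step spelled out, here is a rough sketch of how it might look. The `CatalogSearch` class and its method names are hypothetical, and the code assumes the .NET Framework on a Windows machine where the MSIDXS provider and the catalog are available.

```csharp
using System;
using System.Data;
using System.Data.OleDb;

public static class CatalogSearch
{
    // Builds a connection string in the same shape as the Connection example.
    public static string BuildConnectionString(string catalogName)
    {
        return "Provider=\"MSIDXS\";Data Source=\"" + catalogName + "\";";
    }

    // Runs the query against the catalog and returns the rows as a DataTable.
    public static DataTable Run(string catalogName, string commandText)
    {
        using (OleDbConnection connection = new OleDbConnection(BuildConnectionString(catalogName)))
        using (OleDbDataAdapter adapter = new OleDbDataAdapter(commandText, connection))
        {
            DataTable results = new DataTable();
            // Fill opens the connection if needed and closes it again afterwards.
            adapter.Fill(results);
            return results;
        }
    }
}
```

In the page's code-behind, the table could then be bound to a Repeater (here a hypothetical control named rptResults) with `rptResults.DataSource = CatalogSearch.Run("PP4", cmdSearch.CommandText);` followed by `rptResults.DataBind();`. Returning a DataTable rather than an open data reader keeps the connection lifetime inside one method, at the cost of loading all rows into memory.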