Editing a Noise Word File

A noise word is a word such as "the" or "an" that is not useful for searches. A list of noise words for a particular language is stored in the noise word file for that language. Microsoft Office SharePoint Portal Server 2003 provides noise word files for the following languages:

  • Chinese-Simplified (noisechs.txt)
  • Chinese-Traditional (noisecht.txt)
  • Czech (noisecsv.txt)
  • Dutch (noisenld.txt)
  • English-International (noiseeng.txt)
  • English-US (noiseenu.txt)
  • Finnish (noisefin.txt)
  • French (noisefra.txt)
  • German (noisedeu.txt)
  • Hungarian (noisehun.txt)
  • Italian (noiseita.txt)
  • Japanese (noisejpn.txt)
  • Korean (noisekor.txt)
  • Polish (noiseplk.txt)
  • Portuguese (Brazil) (noiseptb.txt)
  • Russian (noiserus.txt)
  • Spanish (noiseesn.txt)
  • Swedish (noisesve.txt)
  • Thai (noisetha.txt)
  • Turkish (noisetrk.txt)

If there is no noise word list for a particular language, the neutral noise word file (noiseneu.txt) is used. The word breaker of the corresponding language parses noise words.
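The fallback rule above can be pictured as a simple lookup. The following is a minimal sketch, not the server's actual logic: the file names come from the list above, while the language labels used as keys and the function name noise_file_for are hypothetical.

```python
# File names are taken from the list above. The dictionary keys are
# hypothetical labels chosen for this illustration only.
NOISE_FILES = {
    "Chinese-Simplified": "noisechs.txt",
    "Chinese-Traditional": "noisecht.txt",
    "Czech": "noisecsv.txt",
    "Dutch": "noisenld.txt",
    "English-International": "noiseeng.txt",
    "English-US": "noiseenu.txt",
    "Finnish": "noisefin.txt",
    "French": "noisefra.txt",
    "German": "noisedeu.txt",
    "Hungarian": "noisehun.txt",
    "Italian": "noiseita.txt",
    "Japanese": "noisejpn.txt",
    "Korean": "noisekor.txt",
    "Polish": "noiseplk.txt",
    "Portuguese (Brazil)": "noiseptb.txt",
    "Russian": "noiserus.txt",
    "Spanish": "noiseesn.txt",
    "Swedish": "noisesve.txt",
    "Thai": "noisetha.txt",
    "Turkish": "noisetrk.txt",
}

NEUTRAL_NOISE_FILE = "noiseneu.txt"


def noise_file_for(language: str) -> str:
    """Return the noise word file for a language, or the neutral file."""
    return NOISE_FILES.get(language, NEUTRAL_NOISE_FILE)
```

For a language that is not in the list, such as Greek, this sketch returns noiseneu.txt.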

By default, SharePoint Portal Server stores noise word files in the following directory on the server: \Program Files\SharePoint Portal Server\DATA\Config. The data directory is located elsewhere if you chose a different location for the data files during server installation. Changes to the files in this directory affect only search applications that are installed afterward. To change the behavior of an existing installation, edit the files in the application-level directory whose full path is given later in this section.

You can edit the noise word file. If you add noise words, the accuracy of your searches may decrease. However, the size of the content index also decreases, which helps performance. You can delete noise words if you want searches to return those words.

If you remove words from the noise word file, you will not see the effect until you reset the content indexes and perform a full update. Words that are in the noise word file are removed from documents before the documents are included in an index, so documents that were indexed before the change contain no occurrences of the words you removed. You must update the index after you modify the noise word list. Otherwise, documents that contain the removed noise words will not be returned in queries for those terms.

You should never delete the noise word file. If you do not want noise words removed during update or query time, remove all entries from the file. If you delete the file, all single characters will be removed as noise words.

Noise word files are copied to \Program Files\SharePoint Portal Server\DATA\Applications\Application UID\Config. You can specify noise words at the application level instead of at the server or server farm level. For example, if SharePoint Portal Server and Microsoft SQL Server are installed on the same server, each can have different noise word lists.
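To make the two locations concrete, here is a small sketch that builds the server-level and application-level paths from the default data directory. The helper names and the app_uid parameter are hypothetical. "Application UID" in the path above is a placeholder, and this sketch does not attempt to discover real values.

```python
from pathlib import PureWindowsPath

# Default data directory as given in this topic. It differs if the data
# files were installed elsewhere during server installation.
DEFAULT_DATA_DIR = PureWindowsPath(r"\Program Files\SharePoint Portal Server\DATA")


def server_noise_file(file_name, data_dir=DEFAULT_DATA_DIR):
    """Path of a noise word file in the server-level Config directory."""
    return data_dir / "Config" / file_name


def application_noise_file(app_uid, file_name, data_dir=DEFAULT_DATA_DIR):
    """Path of a noise word file in one application's Config directory.

    app_uid stands in for the "Application UID" placeholder.
    """
    return data_dir / "Applications" / app_uid / "Config" / file_name
```

PureWindowsPath is used so that the paths can be composed on any operating system without touching the file system.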

Tip    If you want to include all words in the content index, even noise words, you should delete all entries from the noise word file for the language you are using. Leave the empty file in the data directory. If you delete the file, the neutral noise word file will prevent noise words from being included in the index.

Recommendation    If you remove words from the noise word file, it is highly recommended that you reset the content index (for more information about resetting the index, see Resetting a Content Index). After you remove a noise word from the list, queries treat the term as valid, but documents that were included in the index before the change contain no occurrences of the term, because it was removed as a noise word when they were indexed. After you add a noise word, the term is removed from any query that is run afterward, so a search for that term returns no results.
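The interaction between the noise word list and an existing index can be illustrated with a toy inverted index. This is only a sketch of the general idea and makes no claim about how SharePoint Portal Server stores its index. The functions build_index and search and the sample documents are hypothetical.

```python
def build_index(docs, noise_words):
    """Map each non-noise term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            if term not in noise_words:
                index.setdefault(term, set()).add(doc_id)
    return index


def search(index, term, noise_words):
    """Look up one term. Noise words are dropped from the query first."""
    term = term.lower()
    if term in noise_words:
        return set()
    return index.get(term, set())


docs = {1: "the quick fox", 2: "over the hill"}

# The index is built while "the" is still a noise word.
index = build_index(docs, noise_words={"the"})

# "the" is removed from the list, but the old index is still in use:
# the query term is now valid, yet the index has no entry for it.
stale = search(index, "the", noise_words=set())

# After a reset and full update, the term is found.
rebuilt = build_index(docs, noise_words=set())
fresh = search(rebuilt, "the", noise_words=set())

# Adding a noise word takes effect on queries at once: the term is
# dropped from the query even though the index still contains it.
blocked = search(rebuilt, "fox", noise_words={"fox"})
```

In this sketch, stale and blocked are empty, and only fresh contains both documents.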

Edit a noise word file

  1. Open the file in Notepad. (The noise word files are in Unicode and require a text editor capable of editing Unicode files.)
  2. Add or delete a noise word.
  3. Save the file and close Notepad.
  4. Restart the Microsoft SharePoint Portal Server Search service (SharePointPSSearch). To do this:
    1. Click Start, point to Administrative Tools, and then click Services.
    2. Right-click Microsoft SharePointPS Search, and then click Restart.
  5. Perform a full update of the index. For more information, see Starting a Full Update of a Content Index.
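Steps 1 through 3 can also be scripted. The following is a minimal sketch under two assumptions that this topic does not confirm: that "Unicode" means UTF-16 (as it does in the Notepad Save As dialog box) and that the words in the file are separated by white space. The function name edit_noise_words is hypothetical. Back up the file before you try anything like this.

```python
from pathlib import Path


def edit_noise_words(path, add=(), remove=(), encoding="utf-16"):
    """Rewrite a noise word file with some words added or removed.

    Assumptions (not confirmed by the product documentation): the file
    is UTF-16 text and its words are separated by white space. The
    file is rewritten with one word per line.
    """
    path = Path(path)
    words = path.read_text(encoding=encoding).split()

    # Drop the words to remove, comparing case-insensitively.
    drop = {w.lower() for w in remove}
    kept = [w for w in words if w.lower() not in drop]

    # Append new words that are not already present.
    present = {w.lower() for w in kept}
    for word in add:
        if word.lower() not in present:
            kept.append(word)
            present.add(word.lower())

    path.write_text("\n".join(kept) + ("\n" if kept else ""), encoding=encoding)
    return kept
```

A script like this replaces only the editing steps. You still need to restart the search service and perform a full update of the index, as described in steps 4 and 5.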

When a user searches from the portal, SharePoint Portal Server discards some query terms as noise words even if the term itself is not in the noise word file. This occurs when the term is an inflectional form of a noise word. For example, if "be" is in the noise word file and you search for "am," "am" is treated as a noise word because it is a form of "be." If a user searches for a noise word, the portal returns no results.
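The inflectional behavior can be sketched as a lookup from each query term to its base form. The table below is a tiny, hypothetical stand-in for language-specific processing, and the names LEMMAS, is_discarded, and effective_query are illustrative only.

```python
# Hypothetical, deliberately tiny table of inflected forms and their
# base forms. Real inflection handling is language specific.
LEMMAS = {"am": "be", "is": "be", "are": "be", "was": "be", "were": "be"}


def is_discarded(term, noise_words):
    """True if the term, or its base form, is in the noise word list."""
    term = term.lower()
    return term in noise_words or LEMMAS.get(term) in noise_words


def effective_query(query, noise_words):
    """Return the query terms that remain after noise words are discarded."""
    return [t for t in query.split() if not is_discarded(t, noise_words)]
```

With "be" as the only noise word, is_discarded("am", {"be"}) is true, and a query that consists only of "am" is left with no terms to search for.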

 
 
Applies to:
Deployment Center 2003, SPS Admin 2003