By Erik Heino, Microsoft Corporation
Introduction
In Microsoft Office SharePoint Portal Server 2003, content from
sites other than the portal site is crawled into the content
index for non-portal site content. This content index, which
also covers non-portal site content that is not listed in the site
directory, is managed separately from the content index for portal
site content.
All sites in the site directory are added to a content source
for the site directory that is provided in SharePoint Portal Server
2003 by default. The content of sites in the site directory is
crawled according to site inclusion and exclusion rules and is
included in the content index for non-portal site content
along with all other non-portal site content. Searches for
content on sites other than the portal site are then performed
using the information in the content index for non-portal site
content.
By default, all sites in the site directory are included when
the content source for the site directory is crawled, or when
updates are performed on the entire content index for
non-portal site content. If sites have not been added to the
site directory, the content of those sites is not included when the
content source for the site directory is crawled.
In large organizations with many sites in the site directory,
crawling and searching through the single site directory content
source can be complex and time-consuming. To simplify the management
of sites in the site directory, you can create new content sources
organized by content source groups. You can use these content source
groups to crawl only a subset of the sites covered by the content
index for non-portal site content, or to create search scopes so
that users can search only the sites in a particular group.
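The grouping idea can be illustrated with a small Python sketch. It is a conceptual model only and does not use any SharePoint Portal Server API. The site URLs, the group names, and the helper functions group_sites and sites_to_crawl are all hypothetical.

```python
from collections import defaultdict

# Hypothetical site directory entries: (site URL, content source group).
# These values are illustrative and are not taken from the product.
SITE_DIRECTORY = [
    ("http://teams/hr-benefits", "HR"),
    ("http://teams/hr-recruiting", "HR"),
    ("http://teams/eng-build", "Engineering"),
    ("http://teams/eng-test", "Engineering"),
    ("http://teams/finance-ap", "Finance"),
]


def group_sites(entries):
    """Return a mapping of group name -> list of site URLs, in input order."""
    groups = defaultdict(list)
    for url, group in entries:
        groups[group].append(url)
    return dict(groups)


def sites_to_crawl(groups, selected_groups):
    """Return only the URLs that belong to the selected groups."""
    urls = []
    for name in selected_groups:
        urls.extend(groups.get(name, []))
    return urls


groups = group_sites(SITE_DIRECTORY)

# Crawl (or scope a search to) the HR group only, not the whole directory.
hr_only = sites_to_crawl(groups, ["HR"])
```

In this model, five sites are covered by three groups, and a crawl or a search scope is expressed in terms of group names, not individual sites.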
By carefully managing the content source groups that you use for
your site directory, you can crawl thousands of sites using far
fewer content sources. You can also aggregate content sources
across servers in a shared services configuration to simplify
crawling even further. Aggregating content sources in this way is
a good practice for large organizations that manage search across
many portal sites.
The process for scoping and configuring the content index for
the site directory is as follows:
- Add sites to the site directory.
- Approve or reject sites for crawling.
- Configure a content source group.
- Assign content source groups to sites.
- Create search scopes for sites in the site directory.
- Crawl the site directory.
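The steps above can be sketched as a simple data flow. This is a minimal conceptual model in Python, not the product's administration interface. The Site class, its fields, the crawl_list helper, and the sample URLs, group name, and scope name are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Site:
    """Hypothetical model of one site directory entry."""
    url: str
    approved: Optional[bool] = None      # None means no decision yet
    source_group: Optional[str] = None   # content source group, if assigned


def crawl_list(sites, group):
    """Return URLs of sites that are approved and assigned to the group."""
    return [s.url for s in sites if s.approved and s.source_group == group]


# 1. Add sites to the (modeled) site directory.
directory = [
    Site("http://teams/a"),
    Site("http://teams/b"),
    Site("http://teams/c"),
]

# 2. Approve or reject sites for crawling.
directory[0].approved = True
directory[1].approved = False
directory[2].approved = True

# 3 and 4. Configure a content source group and assign it to sites.
for site in directory:
    site.source_group = "Projects"

# 5. Model a search scope as a named set of content source groups.
scopes = {"Project sites": {"Projects"}}

# 6. "Crawl": here, only compute which sites would be visited.
to_crawl = crawl_list(directory, "Projects")
```

In this model the rejected site stays in the directory but is left out of the crawl list, which reflects the separation between adding a site and approving it for crawling.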
This topic is part of an eight-topic series.