
Best Practices for Configuring the Site Directory in a Shared Services Configuration
 

Microsoft Corporation uses SharePoint Portal Server 2003 in a shared services configuration for its internal portal site, after historically developing separate search solutions for numerous internal sites. This configuration is used by over 250 portal sites in four regions, including more than 16 Microsoft intranet portal sites that are not part of the central collaboration platform.

The lessons learned from this deployment are helpful for any large organization that is planning to use search in a shared services configuration.

Using search in a shared services configuration with SharePoint Portal Server 2003 is more efficient than configuring search separately for each site in the enterprise and improves the search experience for employees.

The benefits of using shared services include:

  • The costs and resources used during deployment are lower, because administrators do not have to configure and deploy content indexes for each portal site.
  • Ongoing resources in terms of time, money, network bandwidth, storage, and computer capacity are lower, because multiple servers are not crawling and searching the same content source at the same time.
  • Deployment, management, and upgrading of computers are easier because those resources are centralized in one place.
  • Users can perform searches across the entire organization or scope searches to specific sites. They don't have to guess which site they need or browse through a complicated set of interlinked sites to find the right place to search for what they're looking for.

Before implementing a shared services configuration, and when configuring site directories for the first time, portal site administrators should consider several factors at each stage of administration that will simplify later management of site directories. These considerations affect the configuration of shared services, the creation and addition of sites to the site directory, the approval of sites for crawling, the management of the site directory and content sources in general, and the creation of search scopes.

For more information about the best practices identified by this scenario, see Deploying SharePoint Portal Server 2003 Shared Services at Microsoft. Other issues concerning that deployment are discussed in Microsoft Web Enterprise Portal.

Shared Services Configuration Considerations

Before setting up an organization to use shared services, administrators must consider the following points related to search functionality in that configuration:

  • An account must exist that can access all of the content on the portal site. This account will be used as the default content access account for the entire portal site.
  • If you are going to change the name or description of the central portal site, it is a good idea to change it before configuring shared services on the parent portal site.
  • Administrators must work with contacts throughout the larger organization to decide upon the best taxonomy to use when creating search scopes, as part of a larger effort of organizing a taxonomy for all of the portal sites in the organization.

Site Creation Considerations

The creation of new sites in your organization influences all of the steps that you must take as an administrator when you configure search in a shared services configuration using SharePoint Portal Server 2003. Several decisions must be made before people in your organization begin creating sites. These include:

  • Who can create sites, and how is the right to create sites enforced?
  • How is creation of those sites managed?
  • What kind of content is going to be available on those sites?
  • Who is going to review sites and decide whether to include them in the content index?
  • Which search scopes are necessary for users to find content on each site?

The answers to these questions will vary by organization. By default, all users in the Contributor site group or with the Create Sites right in SharePoint Portal Server 2003 can create sites. Portal site administrators can modify membership in site groups or the rights for each site group to limit or expand the ability to create sites.

Managing the creation of those sites is another potential problem. In large organizations, dozens of sites can be created each day. At Microsoft, it's not unusual to have as many as 50 sites added to the site directory in a typical day. To properly approve those sites, search managers or portal site administrators would have to visit the newly added sites individually, which can take a long time.

To make management of site creation easier, it may be necessary to modify the process for creating sites. For the Microsoft portal site, the site creation page was modified with two updated site creation forms that ask for more information about the sites that are being added.

The first site creation form is for sites that are being added using SharePoint Portal Server 2003 and Microsoft Windows SharePoint Services, the approved platform for the portal site. The second site creation form is for sites that have been created on different platforms that the user wants to register in the site directory so the contents are included in search results.

Both forms ask for metadata that provides additional information to portal site administrators when they are reviewing the sites that are added each day. The forms also submit each new site to a Microsoft Excel spreadsheet that is used by administrators to track all existing sites that have been added, approved, or rejected for easier management of the site directory.

Site Approval Considerations

After sites are created, they are added to the site directory but must be approved by a reviewer with the Manage Search right before their content is crawled and appears in search results. Existing sites that are added to the site directory must also be approved. If users creating a site have the Manage Search right, their sites are automatically approved.

To ensure the quality of its search results, Microsoft uses the following criteria when reviewing the list of sites included in search:

  • The business purpose and relevance of a site.
  • Timeliness of content.

Ideally, the content of a site is less than one year old. Sites with older content are considered on a case-by-case basis with inclusion depending on the subject area, product, or initiative for which the site was originally targeted.

In your organization, you may add or choose other criteria, or be more specific in the criteria you use. Whatever criteria you choose, consider ways to eliminate sites with less relevant, out-of-date, or inaccurate information.
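The timeliness criterion can be expressed as a simple date check. The following is a minimal sketch, assuming you track a last-modified date for each site; the function name, the parameter names, and the 365-day default are illustrative and are not part of SharePoint Portal Server 2003.

```python
from datetime import date, timedelta


def needs_case_by_case_review(last_modified, today, max_age_days=365):
    """Return True if the site's newest content is older than max_age_days.

    last_modified and today are datetime.date values. The one-year default
    mirrors the guideline in the text; adjust it to your own criteria.
    """
    return (today - last_modified) > timedelta(days=max_age_days)


# Hypothetical example dates.
review_day = date(2006, 6, 1)
print(needs_case_by_case_review(date(2005, 1, 1), review_day))  # older than a year
print(needs_case_by_case_review(date(2006, 3, 1), review_day))  # recent content
```

A site flagged by this check is not rejected automatically; it is only marked for the case-by-case review described above.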

During approval, the search manager or portal site administrator must also decide whether to include sites in a content source group, so that the content can be crawled and searched without crawling and searching all content. A decision must also be made about which content source group to use.

Each site can be included in only one content source group, although it is possible to create a second listing for the site with a different title and add the two listings to different content source groups. Because a duplicate listing causes the same site to be crawled twice for the same content index and uses additional resources, it is not recommended that you do this often.
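If you track listings outside the product, for example in a spreadsheet export, duplicate listings for the same site can be found with a short script. This is a sketch under stated assumptions: the (title, URL, content source group) tuple layout is hypothetical, and treating URLs as equal when they differ only by case or a trailing slash is an assumption that may not match your environment.

```python
from collections import defaultdict


def find_duplicate_listings(listings):
    """Group listings by URL and return only URLs listed more than once.

    listings is an iterable of (title, url, content_source_group) tuples.
    URLs are compared after lowercasing and stripping a trailing slash,
    which is an assumed normalization rule.
    """
    by_url = defaultdict(list)
    for title, url, group in listings:
        by_url[url.rstrip("/").lower()].append((title, group))
    return {url: entries for url, entries in by_url.items() if len(entries) > 1}


# Hypothetical sample listings.
sample = [
    ("HR Home", "http://portal.example/sites/hr", "Corporate"),
    ("HR Benefits", "http://portal.example/sites/HR/", "Benefits"),
    ("Legal", "http://portal.example/sites/legal", "Corporate"),
]
for url, entries in find_duplicate_listings(sample).items():
    print(url, entries)
```

Each URL in the result is crawled once per listing, so the output is a list of candidates to review rather than a list of errors.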

Site Directory Management Considerations

After sites are approved for crawling, portal site administrators must continue to manage their inclusion in the site directory. In a large organization, this can be a formidable task.

The site directory for the Microsoft Web site, for example, contains over 240 pages of site listings. The page used to manage crawls of the site directory in SharePoint Portal Server 2003 can display only 40 site listings at a time when the listings are organized alphabetically by URL.

While there is search on the site directory itself, there is no search for the page used to manage crawls of the site directory. To find a site that you want to edit, reject, approve, or delete, you must guess roughly which page it will be found on.

The number of sites that must be administered each day can compound these difficulties. The search managers for the Microsoft Web site regularly see 50 or more new sites added to the site directory each day.

To make this management task a little easier, for the Microsoft Web site, the sites are tracked in a spreadsheet that lists their location on the page used to manage crawls of the site directory, along with other information about the site. Your organization may use any tracking system that helps you to organize, find, and manage a large number of sites.
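The page location that such a tracking system records can also be computed. The following is a minimal sketch, assuming the listings are sorted alphabetically by URL and shown 40 per page as described above; the function name and sample URLs are hypothetical, and plain string ordering is an assumption that may differ from the ordering the product uses.

```python
import bisect

LISTINGS_PER_PAGE = 40  # page size described in the text


def page_for_site(sorted_urls, url, per_page=LISTINGS_PER_PAGE):
    """Return the 1-based page on which url appears, or None if it is absent.

    sorted_urls must already be sorted in the same order as the listing.
    """
    index = bisect.bisect_left(sorted_urls, url)
    if index == len(sorted_urls) or sorted_urls[index] != url:
        return None
    return index // per_page + 1


# Hypothetical sample data: 100 made-up site URLs.
urls = sorted(f"http://portal.example/sites/site{n:03d}" for n in range(100))
print(page_for_site(urls, "http://portal.example/sites/site045"))  # prints 2
```

The result is only as current as the list of URLs; every added or deleted site can shift later sites to a different page, so the list should be refreshed whenever the site directory changes.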

Management tasks that you may consider include:

  • Watching for sites that have been created but are not in use. When a SharePoint site is created, it has a small starting size of about 512 KB. By tracking the size of sites, you can determine which sites are not growing beyond this initial size. A site that stays at its initial size is probably defunct, and you may decide to delete it from the site directory.
  • Watching for sites with content that is no longer relevant. This often includes sites that were useful when they were first added, but that are no longer used and contain no recent information.
  • Approving sites that were previously rejected but now contain useful information. Often, sites that are added by users are still under construction and are not currently useful, but may be useful later. Search managers might decide to keep these sites in the rejected pile but not delete them, waiting for a time when they can be approved.
  • Deleting sites that have been rejected for a long time.
  • Deleting duplicate sites.
  • Deleting child sites that are already crawled as part of the parent site.
  • Changing sites that were crawled as part of the full crawl of the site directory content index so that they are crawled as part of a content source group. Every time you create a new content source group, it is a good idea to review your sites to determine which of them belong in the new content source group.
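The first task in this list, finding sites that never grew beyond their starting size, can be sketched as a filter over tracked site sizes. This is an illustration only: how you obtain the sizes is outside its scope, and the function name, the URL-to-size mapping, and the 16 KB tolerance are assumptions rather than product behavior.

```python
INITIAL_SITE_SIZE_KB = 512  # approximate starting size cited in the text


def find_possibly_unused_sites(site_sizes_kb, tolerance_kb=16):
    """Return the URLs of sites that have not grown past the initial size.

    site_sizes_kb maps a site URL to its current size in KB.
    tolerance_kb is a hypothetical margin above the initial size.
    """
    threshold = INITIAL_SITE_SIZE_KB + tolerance_kb
    return sorted(url for url, size in site_sizes_kb.items() if size <= threshold)


# Hypothetical sample sizes.
sizes = {
    "http://portal.example/sites/alpha": 512,
    "http://portal.example/sites/beta": 20480,
    "http://portal.example/sites/gamma": 520,
}
print(find_possibly_unused_sites(sizes))
```

Sites returned by this check are candidates for review, not for automatic deletion, because a recently created site also starts at the initial size.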

Content Source Management Considerations

In a large organization with many sites in the site directory, portal site administrators can use content source groups to organize the many content sources that exist for the portal site. Rather than create separate content sources for every external site, the site directory can be used to crawl all sites in one site directory content source. Additional content sources are created that are scoped to specific content source groups.

By carefully organizing these content source groups, the total number of content sources can be kept to a manageable size. This is important because, while the object model for SharePoint Portal Server 2003 allows for more content sources, the user interface for the page used to manage crawls of the site directory only allows 250 content sources to be displayed at once. From a management point of view, even this may be too many.

When possible, it is a good idea to aggregate content sources. In a shared services scenario, you can associate each portal site with the parent portal site, so that all content for all approved sites is crawled as part of the single site directory for the parent portal site.

When you associate a portal site with your parent portal site, those sites are added to its site directory automatically. After you delete duplicate sites, you have all the sites in one content index. You can then crawl the content for each site from that parent portal site by creating a content source group for each site. The result is a site directory with one content source per content source group, with content source groups organized by portal site.

If necessary, for important subjects that apply across content source groups, you can create additional content sources that crawl content for those subjects. The overall goal is to reduce content sources to a manageable size, which can then be used to create a manageable number of search scopes. For the Microsoft Web site, dozens of content sources and source groups over the entire enterprise were reduced to a manageable list of 60 or so content sources.

To simplify management of search across a large organization or enterprise, it is recommended that you use one account to crawl all portal sites. If necessary, you can create other accounts for specific exceptions, but, when possible, using one account is simpler and easier to manage. This account is the default content access account for the parent portal site in the shared services configuration.

Search Scope Considerations

Every organization is going to develop an informal or formal system of categorization of the content available within that organization. To avoid confusing use of terminology and the grouping of content in confusing, poorly structured, or overlapping categories, many organizations develop a formal taxonomy to organize how content is discussed and presented. Developing a good taxonomy is a particularly good idea when you are implementing a portal site using SharePoint Portal Server 2003.

Search scopes can help you expose your underlying taxonomy to the users of your portal sites by providing a way for them to search for information using the organizational categories that you provide.

You can create search scopes using any kind of categorization or structure that makes sense for your organization, keeping the overall taxonomy of your organization in mind. Ideally, before you configure your portal site, the people in your organization will decide upon how major sites in the organization will be managed so that the content of those sites fits within the larger structure and purpose of the organization.

If your sites are created with this taxonomy in mind, they will reflect the categories and structure of your taxonomy. This allows portal site administrators to create search scopes for the parent portal site for each portal site that has been associated with the parent portal site's site directory.

For the Microsoft Web site, one search scope is created per site. Because sites are created according to an agreed-upon taxonomy, these search scopes are more than just pointers to content on generic sites; they also make sense as categories of information. Additional search scopes may be created for important information that is found across many portal sites, but, by properly fitting sites into the taxonomy, the number of these additional search scopes can be limited to a manageable number.

Search scopes are all visible on the parent portal site. Other search scopes may be created for each portal site to further scope searches within each site. But the content for the entire enterprise is still crawled and added to the content index for the main portal site.

This topic is part of an eight-topic series.
