Microsoft Corporation uses SharePoint Portal Server 2003 in a
shared services configuration for its internal portal site, after
historically developing separate search solutions for numerous
internal sites. This configuration is used by over 250 portal sites
in four regions, including more than 16 Microsoft intranet portal
sites that are not part of the central collaboration platform.
The lessons learned from this deployment are helpful for any
large organization that is planning to use search in a shared
services configuration.
Using search in a shared services configuration with SharePoint
Portal Server 2003 is more efficient than configuring search
separately for each site in the enterprise and improves the search
experience for employees.
The benefits of using shared services include:
- The costs and resources used during deployment are lower,
because administrators do not have to configure and deploy content
indexes for each portal site.
- Ongoing resources in terms of time, money, network bandwidth,
storage, and computer capacity are lower, because multiple servers
are not crawling and searching the same content source at the same
time.
- Deployment, management, and upgrading of computers are easier
because those resources are centralized in one place.
- Users can perform searches across the entire organization or
scope searches to specific sites. They don't have to guess
which site they need or browse through a complicated set of
interlinked sites to find the right place to search for what
they're looking for.
Before implementing a shared services configuration, and when
configuring site directories for the first time, portal site
administrators can simplify later management of site directories
by considering several factors at each stage of administration.
These considerations affect the configuration of
shared services, the creation and addition of sites to the site
directory, the approval of sites for crawling, the management of
the site directory and content sources in general, and the creation
of search scopes.
For more information about the best practices identified by this
scenario, see Deploying SharePoint Portal Server 2003 Shared
Services at Microsoft. Other issues concerning that deployment are
discussed in Microsoft Web Enterprise Portal.
Shared Services Configuration Considerations
Before setting up an organization to use shared services,
administrators must consider the following points related to search
functionality in that configuration:
- An account must exist that can access all of the content on the
portal site. This account will be used as the default content
access account for the entire portal site.
- If you are going to change the name or description for the
central portal site, it is a good idea to change it before
configuring shared services on the parent portal site.
- Administrators must work with contacts throughout the larger
organization to decide upon the best taxonomy to use when creating
search scopes, as part of a larger effort of organizing a taxonomy
for all of the portal sites in the organization.
Site Creation Considerations
The creation of new sites in your organization influences all of
the steps that you must take as an administrator when you configure
search in a shared services configuration using SharePoint Portal
Server 2003. Several decisions must be made before people in your
organization begin creating sites. These include:
- Who can create sites, and how is the right to create sites
enforced?
- How is creation of those sites managed?
- What kind of content is going to be available on those
sites?
- Who is going to review sites and decide whether to include them
in the content index?
- Which search scopes are necessary for users to find content on
each site?
The answers to these questions will vary by organization. By
default, all users in the Contributor site group or with the Create
Sites right in SharePoint Portal Server 2003 can create sites.
Portal site administrators can modify membership in site groups or
the rights for each site group to limit or expand the ability to
create sites.
Managing the creation of those sites is another potential
problem. In large organizations, dozens of sites can be created
each day. At Microsoft, it's not unusual to have as many as
50 sites added to the site directory in a typical day. To properly
approve those sites, search managers or portal site administrators
would have to visit the newly added sites individually, which can
take a long time.
To make management of site creation easier, it may be necessary
to modify the process for creating sites. For the Microsoft portal
site, the site creation page was modified with two updated site
creation forms that ask for more information about the sites that
are being added.
The first site creation form is for sites that are being added
using SharePoint Portal Server 2003 and Microsoft Windows
SharePoint Services, the approved platform for the portal site. The
second site creation form is for sites that have been created on
different platforms that the user wants to register in the site
directory so the contents are included in search results.
Both forms ask for metadata that provides additional information
to portal site administrators when they are reviewing the sites
that are added each day. The forms also submit each new site to a
Microsoft Excel spreadsheet that is used by administrators to track
all existing sites that have been added, approved, or rejected for
easier management of the site directory.
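The tracking sheet described above can be sketched as a simple
record structure. This is only an illustration: the source does not
list the actual form fields or spreadsheet columns, so every field
name below (url, title, owner, platform, business_purpose,
submitted, status) is hypothetical, and the sketch writes CSV text
rather than calling any SharePoint Portal Server or Excel interface.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from datetime import date


# Hypothetical record layout; the real form fields are not given in the source.
@dataclass
class SiteRecord:
    url: str
    title: str
    owner: str
    platform: str            # for example "SharePoint" or "Other"
    business_purpose: str
    submitted: date
    status: str = "pending"  # "pending", "approved", or "rejected"


def to_csv(records):
    """Serialize records to CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(SiteRecord)])
    writer.writeheader()
    for record in records:
        row = asdict(record)
        row["submitted"] = record.submitted.isoformat()
        writer.writerow(row)
    return buffer.getvalue()


def pending(records):
    """Return the records that still need a reviewer's decision."""
    return [r for r in records if r.status == "pending"]
```

A reviewer could call pending(records) at the start of each day to
get the list of sites that have been added but not yet approved or
rejected.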
Site Approval Considerations
After sites are created, they are added to the site directory
but must be approved by a reviewer with the Manage Search right
before their content is crawled and appears in search results.
Existing sites that are added to the site directory must also be
approved. If users creating a site have the Manage Search right,
their sites are automatically approved.
To ensure the quality of its search results, Microsoft uses the
following criteria when reviewing the list of sites included in
search:
- The business purpose and relevance of a site.
- Timeliness of content.
Ideally, the content of a site is less than one year old. Sites
with older content are considered on a case-by-case basis with
inclusion depending on the subject area, product, or initiative for
which the site was originally targeted.
In your organization, you may add or choose other criteria, or
be more specific in the criteria you use. Whatever criteria you
choose, consider ways to eliminate sites with less relevant,
out-of-date, or inaccurate information.
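As a rough illustration, the two criteria above could be expressed
as a small helper that produces a recommendation for the reviewer.
The one-year threshold comes from the text; the function name, its
parameters, and the choice to recommend rejection when no business
purpose is recorded are assumptions made for this sketch, not
Microsoft's actual review procedure.

```python
from datetime import date, timedelta

# "Less than one year old", per the timeliness criterion above.
MAX_CONTENT_AGE = timedelta(days=365)


def review_recommendation(last_modified, has_business_purpose, today):
    """Suggest 'reject', 'approve', or 'review case by case' for a site.

    The outcome is advisory only; a reviewer makes the final decision.
    """
    if not has_business_purpose:
        # Assumption: a site with no stated business purpose is not included.
        return "reject"
    if today - last_modified < MAX_CONTENT_AGE:
        return "approve"
    # Older content is considered individually rather than rejected outright.
    return "review case by case"
```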
During approval, the search manager or portal site administrator
must also decide whether to include sites in a content source
group, so that the content can be crawled and searched without
crawling and searching all content. A decision must also be made
about which content source group to use.
Each site can be included in only one content source group,
although it is possible to create a second listing for the site
with a different title and add the two listings to different
content source groups. Because duplicate listings cause the same
site to be crawled twice for the same content index and use
additional resources, it is not recommended that you do this often.
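To keep duplicate listings under control, an administrator might
periodically check a tracking list for URLs that appear more than
once. The following is a minimal sketch under stated assumptions:
the (title, url, group) tuple layout is hypothetical, and treating
URLs as equal after lowercasing and removing a trailing slash is a
simplification that may not match how the product compares
addresses.

```python
from collections import defaultdict


def duplicate_listings(listings):
    """Find URLs that are listed more than once.

    listings: iterable of (title, url, content_source_group) tuples.
    Returns {normalized_url: [(title, group), ...]} for repeated URLs only.
    """
    by_url = defaultdict(list)
    for title, url, group in listings:
        # Assumed normalization: ignore case and a trailing slash.
        key = url.rstrip("/").lower()
        by_url[key].append((title, group))
    return {url: entries for url, entries in by_url.items() if len(entries) > 1}
```

Each URL in the result is crawled once per listing, so the output
is also a list of candidates for deletion.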
Site Directory Management Considerations
After sites are approved for crawling, portal site
administrators must continue to manage their inclusion in the site
directory. In a large organization, this can be a formidable
task.
The site directory for the Microsoft Web site, for example,
contains over 240 pages of site listings. In SharePoint Portal
Server 2003, the page used to manage crawls of the site directory
can display only 40 site listings per page when the listings are
organized alphabetically by URL.
While there is search on the site directory itself, there is no
search for the page used to manage crawls of the site directory. To
find a site that you want to edit, reject, approve, or delete, you
must guess roughly which page it will be found on.
The number of sites that must be administered each day can
compound these difficulties. The search managers for the Microsoft
Web site regularly see 50 or more new sites added to the site
directory each day.
To make this management task a little easier, for the Microsoft
Web site, the sites are tracked in a spreadsheet that lists their
location on the page used to manage crawls of the site directory,
along with other information about the site. Your organization may
use any tracking system that helps you to organize, find, and
manage a large number of sites.
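If the tracking system holds the full list of site URLs, the page
on which a site appears can be estimated instead of guessed. The
sketch below uses the figure of 40 listings per page from the text.
It assumes that a plain string sort of the tracked URLs matches the
alphabetical order used by the management page, which may not hold
exactly; the function name is hypothetical.

```python
import bisect

LISTINGS_PER_PAGE = 40  # listings shown per page, as described above


def page_for_site(sorted_urls, url):
    """Return the 1-based page number on which url should appear.

    sorted_urls must already be sorted; raises KeyError if url is not tracked.
    """
    index = bisect.bisect_left(sorted_urls, url)
    if index == len(sorted_urls) or sorted_urls[index] != url:
        raise KeyError(url)
    return index // LISTINGS_PER_PAGE + 1
```

For example, with 100 tracked sites, the site at position 41 in
sorted order falls on page 2.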
Management tasks that you may consider include:
- Watching for sites that have been created but are not in use.
When a SharePoint site is created, it has a small starting size of
about 512 KB. By tracking the size of sites, you can determine
which sites are not getting larger than this initial size. This is
a strong sign of a defunct site, which you may decide to delete
from the site directory.
- Watching for sites with content that is no longer relevant.
This often includes sites that were useful when they were first
added, but that are no longer used and contain no recent
information.
- Approving sites that were previously rejected but now contain
useful information. Often, sites that are added by users are still
under construction and are not currently useful, but may be useful
later. Search managers might decide to keep these sites in the
rejected pile but not delete them, waiting for a time when they can
be approved.
- Deleting sites that have been rejected for a long time.
- Deleting duplicate sites.
- Deleting child sites that are already crawled as part of the
parent site.
- Changing sites that were crawled as part of the full crawl of
the site directory content index so that they are crawled as part
of a content source group. Every time you create a new content
source group, it is a good idea to review your sites to determine
which of them belong in the new content source group.
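The first task in the list above, watching for sites that were
created but never used, lends itself to a simple size check. The
sketch below assumes that site sizes in kilobytes are already
available from whatever tracking system you use; the 512 KB
starting size comes from the text, while the 16 KB tolerance and
the function name are assumptions chosen for illustration.

```python
INITIAL_SITE_SIZE_KB = 512  # approximate starting size cited above
TOLERANCE_KB = 16           # assumed allowance for minor variation


def possibly_unused(site_sizes_kb):
    """Return the URLs of sites that have not grown beyond their initial size.

    site_sizes_kb: mapping of site URL to current size in kilobytes.
    """
    limit = INITIAL_SITE_SIZE_KB + TOLERANCE_KB
    return sorted(url for url, size in site_sizes_kb.items() if size <= limit)
```

Sites returned by this check are only candidates. A recently
created site has the same size as an abandoned one, so the creation
date should be considered before deleting anything.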
Content Source Management Considerations
In a large organization with many sites in the site directory,
portal site administrators can use content source groups to
organize the many content sources that exist for the portal site.
Rather than create separate content sources for every external
site, the site directory can be used to crawl all sites in one site
directory content source. Additional content sources are created
that are scoped to specific content source groups.
By carefully organizing these content source groups, the total
number of content sources can be kept to a manageable size. This is
important because, while the object model for SharePoint Portal
Server 2003 allows for more content sources, the user interface for
the page used to manage crawls of the site directory only allows
250 content sources to be displayed at once. From a management
point of view, even this may be too many.
When possible, it is a good idea to aggregate content sources.
In a shared services scenario, you can associate each portal site
with the parent portal site, so that all content for all approved
sites is crawled as part of the single site directory for the
parent portal site.
When you associate a portal site with your parent portal site,
those sites are added to its site directory automatically. After
you delete duplicate sites, you have all the sites in one content
index. You can then crawl the content for each site from that
parent portal site by creating a content source group for each
site. The result is a site directory with one content source per
content source group, with content source groups organized by
portal site.
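The arrangement described above, one content source group per
associated portal site, can be planned before any configuration is
done. The following is a minimal planning sketch: the mapping of
site URLs to portal site names is a hypothetical input, and the 250
figure is the display limit mentioned earlier in this section. It
does not create content sources in SharePoint Portal Server 2003.

```python
from collections import defaultdict

DISPLAY_LIMIT = 250  # content sources the management page can display at once


def plan_content_source_groups(site_to_portal):
    """Group site URLs by the portal site each one is associated with."""
    groups = defaultdict(list)
    for site_url, portal in site_to_portal.items():
        groups[portal].append(site_url)
    return {portal: sorted(urls) for portal, urls in groups.items()}


def fits_display_limit(groups, extra_sources=0):
    """Check that one content source per group, plus extras, can be displayed."""
    return len(groups) + extra_sources <= DISPLAY_LIMIT
```

The extra_sources argument leaves room for content sources that are
not tied to a single group, such as sources for subjects that span
several portal sites.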
If necessary, for important subjects that apply across content
source groups, you can create additional content sources that crawl
content for those subjects. The overall goal is to reduce content
sources to a manageable size, which can then be used to create a
manageable number of search scopes. For the Microsoft Web site,
dozens of content sources and source groups over the entire
enterprise were reduced to a manageable list of 60 or so content
sources.
To simplify management of search across a large organization or
enterprise, it is recommended that you use one account to crawl all
portal sites. If necessary, you can create other accounts for
specific exceptions, but, when possible, using one account is
simpler and easier to manage. This account is the default content
access account for the parent portal site in the shared services
configuration.
Search Scope Considerations
Every organization is going to develop an informal or formal
system of categorization of the content available within that
organization. To avoid confusing use of terminology and the
grouping of content in confusing, poorly structured, or overlapping
categories, many organizations develop a formal taxonomy to
organize how content is discussed and presented. Developing a good
taxonomy is a particularly good idea when you are implementing a
portal site using SharePoint Portal Server 2003.
Search scopes can help you expose your underlying taxonomy to
the users of your portal sites by providing a way for them to
search for information using the organizational categories that you
provide.
You can create search scopes using any kind of categorization or
structure that makes sense for your organization, keeping the
overall taxonomy of your organization in mind. Ideally, before you
configure your portal site, the people in your organization will
decide upon how major sites in the organization will be managed so
that the content of those sites fits within the larger structure
and purpose of the organization.
If your sites are created with this taxonomy in mind, they will
reflect the categories and structure of your taxonomy. This allows
portal site administrators to create, on the parent portal site, a
search scope for each portal site that has been associated with the
parent portal site's site directory.
For the Microsoft Web site, one search scope is created per
site. Because sites are created according to an agreed-upon
taxonomy, these search scopes are more than just pointers to
content on generic sites; they also make sense as categories of
information. Additional search scopes may be created for important
information that is found across many portal sites, but, by
properly fitting sites into the taxonomy, the number of these
additional search scopes can be limited to a manageable number.
Search scopes are all visible on the parent portal site. Other
search scopes may be created for each portal site to further scope
searches within each site. But the content for the entire
enterprise is still crawled and added to the content index for the
main portal site.
This topic is part of an eight-topic series.