
|
Converting a UNIX .COM Site to Windows
Microsoft Internal Distribution

This white paper discusses
the approach used to convert the Hotmail web server farm from UNIX
to Windows 2000, and the reasons the features and techniques were
chosen. It will focus primarily on the planners, developers, and system
administrators. The purpose of the paper is to provide insight for
similar deployments using Windows 2000. We will discuss the techniques
from the viewpoint of human engineering as well as software engineering.

Early results from
the conversion, which was limited to the front-end web servers, are:
• Windows 2000 provides much better throughput than UNIX.
• Windows 2000 provides slightly better performance than UNIX.
• There is potential, not yet realized, for stability of individual
systems to be equal to that of UNIX. The load-balancing technology
ensures that, from the user’s point of view, the service is as stable
as it was before the conversion.
• As this paper will show, while the core features of Windows 2000 are
able to run the service, its administrative model is not well suited
to the conversion.

The observations related
here are derived from experience gained at a single site. More work
would be needed to establish whether they are representative.
Critical Features of Hotmail as a .COM Site
Hotmail Architectural Decisions
Installation Methodology Conserved
System Creation, Mastering and Installation
OS installation and configuration
Tuning and hardening the system
Application Installation and Update
Distribution mechanism and format
Converting the UNIX Administrator

Microsoft acquired
Hotmail at the end of 1997 as a going concern. The service’s creators
had defined a two-layer architecture built around various UNIX systems:
• Front end web servers, built with dual Pentium systems on racked motherboards,
running Apache on FreeBSD (a configuration with no need to install
licensed software)
• Back end file stores, built with Sun Enterprise 4500 servers, running
Solaris 2.6 (Sun’s UNIX) and with all user data stored on RAID arrays,
accessed using very simple filing semantics
• Incoming mail listeners, built on Sun Sparc 5 processors, and interacting
directly with the back end
• Name/password verification engines, built on Enterprise 4500 servers
• Member Directory, built on PCs with NT and SQL

The conversion of
the Hotmail web servers to Windows is an ongoing project with several
rationales. The team was hoping for better utilization of the existing
hardware resources. The superior development and internationalization
tools are important. A Microsoft property should eat its own dogfood.
Finally, we wished to use the conversion experience as a model for
other UNIX conversions that we hope to carry out in the future.

The first phase of
the conversion, described here, was limited to the web servers. Appropriate
hardware was already in place, and the planning and development staff
were confident that they already understood how to perform the conversion
successfully.

There were several
constraints on the conversion process, which are probably typical
of the average Internet site:
• Hotmail has established an 8-week cycle of version upgrades, and there
was a desire (and some partner pressure) to keep that cycle going.
• It is essential to keep the service running continuously.
• The staff is small, and there was not an opportunity to add staff.

We believe Hotmail
is instructive as an example of the large Internet server site. It
is one of the largest such sites on the planet, so we should be judicious
in applying its principles to sites that are comparatively very
small, and don’t have the issues deriving from multiplication of resources.

As stated above, we
are concentrating on the front-end web servers. Although some of the
following comments are also applicable to the database machines, we
will not address them specifically in this paper.
1)
Restricted, well-controlled application. The application under UNIX
was a collection of CGI programs, serving about 100 distinct URLs,
which have been converted to an ISAPI module. The programs are written
in C++. The entire application is under the control of one team, and
its architecture is well understood by all of the teams (dev, test
and operations). Updates are only due to scheduled code releases,
or hotfixes. This contrasts with a site like microsoft.com, which
has many different authors and continuous updates.
2)
Lights-out administration. All the servers are in a controlled facility
that may be staffed by contractors, and it should not be necessary
for skilled staff to visit the individual machines for any reason.
Machines should be self-monitoring, and Operations staff should be
able to maintain them remotely using minimal interaction.
3)
Multiple identically configured machines. This leads to a need to
have all regular system administration functions, including OS and
application update, be scripted, rapid, reliable, and non-interactive.
There is simply not time for an administrator to interact personally
with all machines. A load-balancing mechanism routes customer requests
from a virtual address to one of the real servers.
4)
System costs suffer multiplicative effects. Adding a VGA monitor or
a second NIC to a server, or running a serial cable to it, may be
pocket change when applied to a single machine. Purchasing several
thousand such devices, however, becomes a significant investment and
has to be thoroughly justified.
5)
100% availability. A large Internet site must provide service 24x7.
Furthermore, the full capacity should be available all the time. Hotmail’s
load fluctuates daily according to the time across the US, but not
by much; the international usage is high.
6)
Simultaneous upgrade. The previous two points mean that the servers
must be upgraded essentially simultaneously, unless some kind of server
affinity mechanism can be implemented per user session. Since a typical
user interaction involves several clicks, it would not be good for
a user to jump backwards and forwards between code releases; the problems
would range from inconsistency in style to (apparently) half-implemented
new features.
7)
No personal machines or accounts. All machines are assumed to be secure
because of physical location and electrical isolation. Generally speaking,
when an administrator is operating on the server or a scheduled task
runs, full administrative privileges are given. This increases the
danger, but reduces the load required to maintain and synchronize
accounts.
8)
Remote monitoring. All performance monitoring is done by querying
the server or by automated reports, and monitoring uses the single
NIC. In Hotmail’s case, there is plenty of spare network headroom
on each server for this monitoring not to penalize the primary operation.
9)
No architectural limits on growth. An Internet site expects to keep
growing, and built-in limits that seem unreasonably high in the early
days will one day loom up and need to be fixed, using resources that
should be enhancing the site. Hotmail has grown from 9 million accounts
when it was acquired by Microsoft, to 100 million in July 2000, without
significant changes in the hardware or software architecture.

The final four items
are more closely related to Hotmail’s architectural choices, but we
believe they are representative of the market.
10)
Scale-out. The Hotmail website is built from several modules, with
each module present in different multiples and able to be scaled out
almost indefinitely. In this phase of the project, we are considering
the front-end, the web servers that house all of the user interface
logic and some parts of the business logic. Among the servers, the
majority (“front doors”) runs some code in response to each click,
and these were the primary targets for conversion. The machines are
single-board x86 PCs, moderately powerful, using Apache, running on
FreeBSD version 3.0, to deliver content. Fortunately, these servers
are good Windows 2000 hosts.

There
are also some servers that serve static content and will be almost
trivial to convert once the front doors have been converted. Administration
of these servers will use the same methodology as the front doors.
They also run on FreeBSD, using the server “boa”, which is optimized
for serving static content rapidly.
11)
Configuration conservatism. There are more than 3,000 front door machines,
all identically configured. Having the servers essentially identical
is important to the operators’ ability to administer the site. The
approach to the hardware is very conservative: once a hardware configuration
is established, it is easy to keep rolling out copies rather than
try to qualify a newer model.

This conservatism also applies to the software design. The need to run
the project on Internet Time [1] has an impact on this project in several ways:
in this case, designers always need to be improving the application
and there is little resource left over for redesigning the basic architecture.
Furthermore, the various modules of the site are developed independently,
creating a force for stability in the internal protocols.
12)
Design for stability. Virtually continuous uptime and a consistent
response time are crucial. This is achieved by some overcapacity,
and highly reliable load-balancing hardware (Cisco Local Director).
Local Director is just another module in the scale-out solution.
13)
Controlled and understood systems. A fact about UNIX is that it is
easy for an administrator to ensure that there are no irrelevant services
running. As well as giving the potential for maximizing performance,
it is useful to be sure that there are no random TCP/IP or UDP ports
open that could be used as a basis for an attack. To some, this transparency
is intrinsic to UNIX, but it also comes from a greater familiarity
among system administrators with its internal workings.

The headless nature
of the systems, and their remote location, have a profound influence
on the way the systems are administered. Headless operation means
that any direct interaction will be through a remote session (telnet
or Terminal Server); nobody will be able to detect an important dialog
on the console [2], and even a bluescreen is not apparent. Remote
operation means that there is a specific cost associated with walking
up to the machine. The site is serviced by contractors whose job is
mainly limited to replacing failed servers and rebooting on demand;
it is possible to attach a monitor and keyboard to a running system,
but that is operationally an exception.

Commonly, although
not strictly correctly, the generic term UNIX describes a family of
operating systems that are deployed on a variety of systems. Although
their internal design may be different, the variants appear to their
end-users as the same system, with minor (and annoying) differences
in usage. There are two variants in use at Hotmail: FreeBSD, which
can be used without license cost and is available in source form,
and Solaris, which is bundled with Sun hardware. Linux, which is just
another UNIX variant, was not used at Hotmail. The following sections
will examine facts about UNIX (specifically FreeBSD) as they relate
to the conversion problem. We also consider Apache as an intrinsic
part of the UNIX-based solution, in the same way that IIS is an intrinsic
part of Windows 2000 Server.
1)
Familiarity. Entrepreneurs in the startup world are generally familiar
with one version of UNIX (usually through college education), and
training in one easily converts to another. When setting up a new
enterprise, it’s easier to work with what you know than to take time
investigating the alternatives.
2)
Reputation for stability. Both the UNIX kernel, and the design techniques
it encourages, are renowned for stability. A system of several thousand
servers must run reliably and without intervention to restart failed
systems. For Windows 2000, we must first prove the stability in the
same environment, and we must then convince the rest of the world.

Apache
is also designed for stability and correctness, rather than breadth
of features or high performance demands.
3)
FreeBSD is free. Although there are collateral costs (it’s not particularly
easy to set up) the freedom from license costs is a major consideration,
especially for a startup. The free availability of source also means
that it can be fairly simple (or it can be very difficult) to make
local changes [3].
4)
Easy to minimize. The typical UNIX server is taking care of one task,
not acting as a desktop and development platform for a user. It is
particularly easy to cut down the load on the system so that only
the minimum number of services is running. This reduced complexity
aids stability and transparency.
5)
Transparent. It’s easy to look at a UNIX system and know what is running
and why. Although its configuration files may have arcane (and sometimes
too-simple) syntax, they are easy to find and change.
6)
Preference for text files. Most configuration setups, log files, and
so on, are plain text files with reasonably short line lengths. Although
this may be marginally detrimental to performance (usually in circumstances
where it doesn’t matter) it is a powerful approach because a small,
familiar set of tools, adapted to working with short text lines, can
be used by the administrators for most of their daily tasks. In particular,
favorite tools can be used to analyze all the system’s log files and
error reports.
7)
Powerful but simple scripting languages and tools. Again, familiarity
and consistency among UNIX implementations is the key. Over the years,
UNIX versions have evolved a good set of single-function commands
and shell scripting languages that work well for ad-hoc and automated
administration. The shell scripting languages fall just short of being
a programming language (they have less power than VBScript or JScript).
This may seem to be a disadvantage, but we must remember that operators
are not programmers; having to learn a block-structured programming
language is a resistance point. Scripts that combine executables into
pipelines are simple to build incrementally and experimentally, and
even the experienced Hotmail administrators seem to be taking that
approach for special purpose scripts (using CMD) rather than authoring
with one of the object-oriented scripts.

On
the other hand, PERL (another language that has grown organically
with a lot of community feedback) is more of a programming than scripting
language. It is popular for repeated, automated tasks that can be
developed and optimized by senior administrative staff who do have
the higher level of programming expertise required.

Consider the above
list of UNIX strengths to be also a list of Windows weaknesses. However,
there are some specific issues that need to be called out.
1)
A GUI bias. Windows 2000 server products continue to be designed with
the desktop in mind. There are too many functions that are either
too difficult or impossible to perform using a text-based interface.

Why is this important? There are several reasons:
• GUI operations are essentially impossible to script. With large numbers
of servers, it is impractical to use the GUI to carry out installation
tasks or regular maintenance tasks.
• Text-based operations are more versatile; an administrator can usually
do more to a system (good and bad) than is provided by the restricted,
planned methods using the GUI.
• There is in place at Hotmail an established secure channel into the
production system, using a text-based secure shell interface.
• Using a GUI amounts to hiding the true system modifications from the
system administrators and operators. UNIX operators like the sense
of control that comes from their ability to modify system tables and
configuration files more directly.
• Operating a GUI through a slow network connection can be too slow
to be useful. Although this is less important, it can still be a consideration
when there is a need to administer or diagnose a system through a
dialup connection.

There
are, indeed, many non-GUI administrative programs provided in the
core Windows 2000 product and in the Resource Kit. The problem is
that the collection is somewhat arbitrary, incoherent and inconsistent.
Programs seem to have been written to fill an immediate need and there
is stylistic inconsistency and poor feature coverage.
2)
Complexity. A Windows server out of the box is an elaborate system.
Although it performs specific tasks well (such as being a web server)
there are many services that have a complex set of dependencies, and
it is never clear which ones are necessary and which can be removed
to improve the system’s efficiency.
3)
Obscurity. Some parameters that control the system’s operation are
hidden and difficult to fully assess. The metabase is an obvious example.
The problem here is that it makes the administrator nervous; in a
single-function system he wants to be able to understand all of the
configuration-related choices that the system is making on his behalf.
4)
Resource utilization. It’s true that Windows requires a more powerful
computer than Linux or FreeBSD. In practice, this is a less important
constraint. When you are building a large operation, you will use
smaller numbers of relatively powerful systems. The PC systems in
use at Hotmail are perfectly capable of running Windows, and the machine’s
basic power is the same whether it is run with UNIX or Windows. For
most of the time, it is only executing application code and most of
the extra elaboration is not apparent.
5)
Image size. The team was unable to reduce the size of the image below
900MB; Windows contains many complex relationships between pieces,
and the team was not able to determine with safety how much could
be left out of the image. Although disk space on each server was not
an issue, the time taken to image thousands of servers across the
internal network was significant. By comparison, the equivalent FreeBSD
image size is a few tens of MB.
6)
Reboot as an expectation. Windows operation still involves too many
reboots. Sometimes they are unnecessary, but operators reboot a system
rather than take the time to debug it. For example, a service may
be hung, and rather than take the time to find and fix the problem,
it is often more convenient to reboot. By contrast, UNIX administrators
are conditioned to quickly identify the failing service and simply
restart it; they are helped in this by the greater transparency of
UNIX and the small number of interdependencies. Some reboots are demanded
by an application installation, and are not strictly necessary.
7)
License costs. As we will see when discussing load balancing, the
license cost of Windows software is a major consideration when converting
from the unencumbered UNIX implementations. Although there were no
costs to the Hotmail project, as a Microsoft department, the team
did consider the software costs in order to make the conversion a
useful model for future customers.
• They used Server in preference to Advanced Server (no features of
Advanced Server were necessary).
• They reluctantly used Services for UNIX and Interix, to get access
to features that were not adequately provided in Windows. Future releases
of Windows will have the features that would make it unnecessary to
add those subsystems and avoid their notional cost.
• No business analysis was undertaken to determine whether the benefit
of the conversion would outweigh the notional cost of the Windows
licenses.
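
As a rough illustration of the image-size point (item 5 above), the arithmetic below compares the total data moved when every front door is imaged once. The 900 MB image and the count of 3,000 machines come from this paper; the 30 MB FreeBSD image (standing in for "a few tens of MB") and the 1 Gbit/s sustained aggregate throughput are assumptions chosen only for illustration, not measurements.

```python
# Back-of-envelope comparison of imaging traffic for a server farm.
# Figures marked "assumption" are illustrative, not measured values.

WINDOWS_IMAGE_MB = 900      # from the paper: smallest image the team achieved
FREEBSD_IMAGE_MB = 30       # assumption: stands in for "a few tens of MB"
SERVERS = 3000              # from the paper: more than 3,000 front doors
AGGREGATE_GBIT_PER_S = 1.0  # assumption: sustained aggregate throughput


def total_transfer_gb(image_mb: float, servers: int) -> float:
    """Total data moved, in GB (1 GB = 1000 MB), to image every server once."""
    return image_mb * servers / 1000


def transfer_hours(total_gb: float, gbit_per_s: float) -> float:
    """Hours needed to move total_gb at a sustained rate of gbit_per_s."""
    seconds = total_gb * 8 / gbit_per_s
    return seconds / 3600


if __name__ == "__main__":
    for name, size in (("Windows 2000", WINDOWS_IMAGE_MB),
                       ("FreeBSD", FREEBSD_IMAGE_MB)):
        gb = total_transfer_gb(size, SERVERS)
        hours = transfer_hours(gb, AGGREGATE_GBIT_PER_S)
        print(f"{name}: {gb:.0f} GB total, about {hours:.1f} h")
```

Under these assumptions the Windows image moves 30 times as much data as the FreeBSD image (2,700 GB against 90 GB), which is why imaging time, rather than disk space, was the concern.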
1)
Windows has more resources behind its development. It does have greater
complexity than the free UNIX distributions, and used wisely (and
with knowledge) that can lead to a more effective solution. For example,
IIS is more self-tuning than Apache.

IIS
and Windows have many more tuning parameters than Apache and FreeBSD.
The problem here is to make them comprehensible to new administrators.
2)
The development platform, specifically Visual Studio, is a major advantage.
Even before the conversion to Windows was contemplated, Hotmail developers
used Visual Studio on NT4 to develop and debug their code. The code
was eventually recompiled for UNIX when the first level of testing
was complete. There is nothing approaching the power of Visual Studio
on any UNIX, let alone the free ones, with the possible exception
of the Java development tools.

The
superior development platform has also had a positive operational
impact in the live site. In the first days of deployment, some server
threads went into a CPU-consuming loop. Using Visual Studio, Hotmail
developers were able to find the application-level problem in a few
minutes. That would have been impossible using UNIX tools.
3)
Vastly better monitoring infrastructure. UNIX has some rudimentary
event reporting and performance monitoring tools, but nothing to approach
the integrated power of the Windows event logging and performance monitoring
features. Again, it is necessary to use them wisely; event logging
in particular has a human and system overhead that we’ll talk about
later.
4)
Better hardware detection. Setting up UNIX on a new PC is difficult,
requiring a more intimate knowledge of how the hardware is built.
That’s an up-front cost; given the existence of multiple identically
configured systems, cloning an established system doesn’t present
the same problems.
5)
Internationalization. The software tools available in Windows to provide
multiple localized solutions are far ahead of most UNIX systems.

Project constraints
The constraints called
out earlier (the 8-week upgrade cycle, the need to keep the service
running, and the small number of staff) produced enough pressure on
the development and administrative staff that the team agreed to devote
one cycle to the platform conversion and not change the application
during that time. This allowed the developers and testers to focus
on the specific conversion issues. During the conversion, the application
itself was the same on both platforms. This means that a user may
have successive pages served by either platform, and not notice the
difference.

The same constraints
led to a desire not to change operational practices without good reason,
because of the investment in training staff at all skill levels, and
the feeling that the fewer things were changed, the fewer were the
potential blocking problems.

Finally, the economic
necessity of not adding technical staff to the conversion means that
there was no consideration given to major re-architecture of the application.

Installation Methodology Conserved
There is in place
a method of remotely bootstrapping a server to a new OS and application
suite, and converting one rack (21 machines) in about 20 minutes.
Replicating the installation capability was a goal of the project,
and conserving as much as possible of the infrastructure to do it
was strongly desired.

Conversion to ISAPI
The web server application
suite consists of about 90 different transactions, each corresponding
to a click on a web page. Using Apache, each one is implemented as
an executable program using the CGI interface, and run in a separate
process managed and owned by the web server. Processes are the natural
way of encapsulating a single stateless transaction using UNIX.

Converting to Windows,
the development team decided not to use the CGI interface to IIS.
Creating a new Windows process is more expensive than creating a UNIX
process. Instead, the team converted the CGI code to run as an ISAPI
application, in which the transactions are processed by code that
(in the most basic implementation) runs within the IIS process.

Running in process
will be more efficient than running as a CGI, because the process
creation overhead is avoided. We could have brought that advantage
to UNIX. Apache supports the same concept; the equivalent to an ISAPI
filter is called a module. Naturally, we did not waste time building
the module implementation just to throw it away.

Conversion from CGI
to ISAPI was essentially automated by using a filter that effectively
presents the standard CGI interface (using data streams and environment
variables) to the user code. Because the application code was well
written and did not make assumptions about its environment, the major
part of the conversion went very smoothly and did not require significant
unexpected engineering [4]. There were some intentional pieces of re-engineering:
• The spell, dictionary, and thesaurus functions were rewritten to use
Microsoft technology from Office and Encarta. The UNIX versions use
binaries from Merriam Webster. The spellcheck feature is much improved;
there are coverage problems with the dictionary data that need to
be addressed.
• The SMTP service of IIS was used to handle outgoing mail, replacing
a UNIX standard mail service.
• Virus scanning of attachments used an external UNIX utility from McAfee;
this was replaced by its NT equivalent.

The most challenging,
and anticipated, problem with converting from CGI to ISAPI derives
from the forgiving nature of the CGI architecture. Memory leaks, unclosed
files and similar problems can be tolerated, because they are automatically
cleaned up when the CGI process terminates. Even an occasional abort
is tolerated; it results in an invalid page to one customer, but does
not usually affect any other part of the system. By contrast, ISAPI
modules share a process with the web server, as do Apache modules.
Resource leaks will accumulate, and crashes have the potential to
bring down the server (although not the entire service, thanks to
load balancing). There are process isolation techniques available
in IIS to minimize these problems, but the team decided to use the
in-process model for full efficiency. Among the actions taken:
• Use a private heap that is cleared at the end of each web transaction.
• In testing, monitor for resource leaks and fix them.
• Implement an IIS heartbeat monitor that will quickly notice and restart
any failed IIS service.

Converting to ASP
was not considered. That would have been a complete rewrite of the
application, with no great advantage (Hotmail does not use a WinDNA
infrastructure, for example). In fact, the implementation uses some
ASP ideas and terms, as much of the user content is determined by
template files that look like ASP files, but the interpretation engine
is completely homegrown. One motivation for borrowing ASP syntax was
to use Microsoft development tools (for example, to aid internationalization).

Load Balancing Technology
Hotmail has a large investment in Cisco Local Director;
every web access goes to an LD, which redistributes the load among
real servers. Hotmail chose to continue with LD, rather than use the
Windows load balancing technology, because the infrastructure was
in place and did not need to be reconfigured (reducing the learning
curve). Also, LD fits the Hotmail model well; it is possible to place
up to 400 servers behind the virtual address, and each Hotmail cluster
can have over 300 identically configured servers. Another major issue
is the potential cost. Although Hotmail uses Microsoft software without
license fees, we must consider this project as a model for real customers.
Use of WLBS requires Advanced Server, but Server provides all the
other features used by Hotmail. Using list prices, the cost comparison
for a farm of 3500 servers is:
• Using WLBS (hence Advanced Server): $15M+
• Using LD and Server: $6M+

This does not take
into account any extra PCs necessary to handle WLBS overhead (administrative,
as well as the cycles needed to redirect the load) or the plans by
Cisco to further reduce the cost of LD by building it into their network
switches.

When considered in
the context of a large web farm, WLBS has a serious economic disadvantage
that can only be justified by the value of its administrative and
monitoring tools. There is considerable competition in the IP load
balancing market, which drives costs down; the numbers quoted above
are based on the price we paid in mid-1999, around $17,000 per unit.
An existing system that has load balancing in place will presumably
have adequate tools, so the added value of WLBS, in terms of operational
flexibility and superior monitoring, must be considerable if it is
to be economically justified.

OS installation and configuration
Each of several thousand
systems must be converted to the new operating system and application
suite, and this process must be carried out while the service is operating,
and within a short timespan. Required are a mechanism for packaging
the image and a method for delivering it. Among the special requirements:
• Each server already has a name and static IP address; to fit in with
existing operating practices and configurations, they should retain
the same name and IP address. Using a static address, compared with
DHCP, makes system administration simpler and more transparent. A
machine’s name relates to its physical position within a cluster.
• It should be possible to convert a machine without physical access.
• It should be possible to revert systems quickly to FreeBSD in case
of serious problems with the Windows conversion.
• Downtime for reboots and service restarts should be minimized.

Several technologies
were investigated and rejected. In most cases, there were blocking
issues that were seemingly small, but without guarantee of resolution
the team had to adopt a method that they could control. Some of the
issues were:
• RIS can be used for automatically installing an image from a server
when a machine is initially booted. Drawbacks include: physical access
is required to the machine (to force a network boot), and the system
requires that an IP address be supplied with DHCP (DHCP is not used
at Hotmail, because of the requirement for static IP addresses). It
was impossible to control the name of the new server as required.
In addition RIS was not supported for installing Server, although
it was known to work.
• AppCenter is intended for this kind of application. However, the initial
release of AppCenter is targeted for small installations. It also
lacks some features needed by application installation and update.
• Unattended setup performs a standard installation across the network;
because of all the file copying and calculation involved, it is too
slow.

The team opted to
extend an existing technology, “kickstart”. This uses the OS existing
on the machine to bootstrap an image, prepared using sysprep, and
then run scripts to perform the remaining configuration tasks that
need to be carried out after the install. The image copy is sufficiently
fast, and the post-install steps are minimal.

IIS configuration
It proves to be difficult
to configure IIS in a precisely controlled way. The metabase is obscure
and poorly documented, and produced too many surprises. Furthermore,
a system created using sysprep does not produce a ready-to-run metabase.

Consequently, it was
necessary to construct the metabase by using scripts. The scripts
were a mixture of command files that repeatedly call the mdutil
utility, and some special-purpose pieces of scripting code (VBScript
in this case, although any language that supports COM would work).
The scripts are run as part of the mini-setup step that follows construction
of the operating system on the target computer. |
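
The idea of building the configuration from scripts can be sketched as a small generator that turns a table of settings into a command file, one call per setting. This is a minimal sketch and not the team's actual scripts: the settings, the `COMMAND_TEMPLATE` string, and the `settool` name are hypothetical placeholders, and the real mdutil command-line syntax would have to be taken from its own documentation.

```python
# Sketch: generate a command file that applies configuration settings one
# call at a time. All names and values below are hypothetical placeholders.
from typing import Iterable, Tuple

Setting = Tuple[str, str, str]  # (configuration path, property name, value)

# Placeholder only: "settool" is NOT a real utility and this is NOT mdutil syntax.
COMMAND_TEMPLATE = "settool SET {path} {prop} {value}"

# Illustrative settings, not Hotmail's real configuration.
SETTINGS = [
    ("site/1", "ServerComment", "front-door"),
    ("site/1/root", "Path", "D:\\www"),
]


def render_command_file(settings: Iterable[Setting],
                        template: str = COMMAND_TEMPLATE) -> str:
    """Return the text of a CMD file with one configuration call per setting."""
    lines = ["@echo off"]
    for path, prop, value in settings:
        # Quote values containing spaces so CMD passes them as one argument.
        if " " in value:
            value = f'"{value}"'
        lines.append(template.format(path=path, prop=prop, value=value))
    # CMD files conventionally use CRLF line endings.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(render_command_file(SETTINGS), end="")
```

Keeping the settings in a single table means the intended configuration can be reviewed and compared as plain text, which suits the preference for transparent, text-based administration described earlier.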