KNOW THE ODDS
Before we hit the
“Top 10” list below, we should remember the real "No. 1
Rule" of troubleshooting: Most problems are
user-originated.
If the network is
up and the computer boots, 90% of all remaining computer
errors start on the space bar side of the keyboard.
In the words of
MVP (Microsoft Most Valuable Professional) Jim Eshelman:
"This isn’t just techy hubris. We computer support
professionals count ourselves among the
'users' mentioned in this rule. I am certain that I have
been personally responsible for at least 90% of the
problems that have cropped up on my own computers. Not
the hardware, not the software, not Windows, and not
Bill Gates’ insidious plan to take over the world — just
li’l ol’ me."
Realizing this up
front (and dispensing with blame and excuses) gives that
extra bit of mental clarity which will make it easier to
find the real cause, and to learn something useful!


1. “HIT & RUN
GREMLINS”
Some computer
problems appear once and then never appear again.
Stuff like this just happens when you’re using a
computer. Don’t worry about these “momentary quirks” or
“hit-and-run gremlins.” If a problem doesn’t occur
more than once, then it is no longer a problem.
A second cousin to
these “transient type gremlins” are matters that never
were problems from the beginning. These are situations
where computer users are relying on some measurement or
informational tool on the computer that seems
to be saying that something is wrong, but there never
really was a problem. One good example in Windows 95,
98, and ME is the observation that System Resources are
low, say, around 60%, and the concern that this shows a
problem, yet on questioning, we usually find that, no,
there hasn’t been a problem with the computer (which is
no surprise). Or, in Windows 2000 and XP, a user may
decide to dig through the Event Viewer, find all sorts
of things listed in the logs, and worry about what the
various events mean. I recommend only consulting the
Event logs when there is an actual problem you’re trying
to track.
“A problem
isn’t a problem unless it’s a problem.”
Don’t make a mountain out of a molehill. And don’t let
the hit-and-run gremlins mow you down!


2. THE ANSWER IS
USUALLY IN THE QUESTION
Define the problem
carefully. Describe the actual behavior rather than your
technical speculation about its cause. “While I was doing
such-and-such, the following happened” is usually the
best way to zero in on the problem. If you are helping
someone else fix a computer problem, having them
describe to you the exact replication steps is the
fastest way to catch user error!


3. CAN YOU REPLICATE
THE PROBLEM?
Good problem
definition includes details of whether the problem can
be reliably replicated on the original computer, and, if
so, whether it can also be replicated on other
computers. Walk through the process step-by-step, and,
if possible, discover every keystroke and mouse click
that is necessary to cause the problem to recur.
In one sense, this
is just a restatement of the two previous
recommendations. Nevertheless, it is a step that is all
too frequently skipped. When I am working with an
end-user by telephone or email in particular, if the
problem isn’t clear in my head from the original
description, I will often ask them to walk me through
it, click-by-click and keystroke-by-keystroke. Much of
the time this will even let me discover a solution in a
program I’ve never seen and know nothing about! Much of
the rest of the time, it opens the door to further
productive questions. Even if you end up having to pass
the problem on to someone else, you have an
exceptionally useful description.


4. RESTART THE
COMPUTER
If a problem
suddenly appears (with Windows, the network connection,
or an application), restart the computer. This one step
takes care of so many things that it’s silly not to try
it.
This reboot trick
is so helpful, in fact, that it has become a standard
joke in the IT world, that a company’s "IT helpdesk" is
the department with the job of advising end-users to
restart their computers.


5. CUT THE PROBLEM IN
HALF
Reboot Windows in
Safe Mode. If the problem persists, you’ve ruled out a
dozen or so possible causes; and if Safe Mode resolves
it, you have a clear path to further troubleshooting
using clean boot techniques. The “clean boot” method
works through a particular list of items by process of
elimination.
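The process of elimination can itself be done by halves.
The following is only a sketch in Python, not a tool that
touches Windows: it assumes a single startup item is to
blame and that the problem shows up whenever that item is
enabled. The item names and the `problem_occurs` callback
are hypothetical; in practice the "callback" is you,
enabling only the listed items, rebooting, and checking.

```python
from typing import Callable, List, Sequence


def find_culprit(
    items: Sequence[str],
    problem_occurs: Callable[[List[str]], bool],
) -> str:
    """Halve the candidate list until one startup item remains.

    Assumes exactly one item causes the problem, and that the
    problem appears whenever that item is enabled.
    """
    if not items:
        raise ValueError("need at least one startup item to test")
    candidates = list(items)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # Enable only this half (everything else disabled), then test.
        if problem_occurs(half):
            candidates = half
        else:
            candidates = candidates[len(half):]
    return candidates[0]


# Hypothetical startup items; "ToolbarHelper" plays the culprit.
startup_items = [
    "AntiVirus", "PrinterMonitor", "ToolbarHelper", "SyncAgent", "TrayClock",
]


def simulated_test(enabled: List[str]) -> bool:
    # Stand-in for "reboot with only these items enabled and look".
    return "ToolbarHelper" in enabled


culprit = find_culprit(startup_items, simulated_test)
print(culprit)
```

With five items this takes two test boots instead of up
to five, and the saving grows with the length of the list.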
An old rule of
thumb stated that if a problem persists in Safe Mode,
it’s a hardware problem; otherwise, it’s a software
problem. That rule is too simplistic, but often can
point you in the right direction. It is however more
reliable to say that if a problem persists in Safe Mode,
it is more likely to be a hardware problem or underlying
damage to Windows itself; and if it does not occur in
Safe Mode then it is probably a software or driver
problem. There are still exceptions, but this is a major
clue.
If Safe Mode is
used, startup programs are not launched and a standard
VGA video driver is used. (No protected mode drivers are
launched.)
TO REBOOT IN SAFE MODE you
need to restart the computer and bring up the Boot Menu.
How to do this varies slightly with different versions
of Windows.
- Windows 95: Wait until the moment the first
  (memory check) screen blinks away and the notice of
  loading Windows appears. At that moment, press and
  hold F8 until the Boot Menu appears. (A blank
  diskette in the floppy drive will halt the computer
  at this point if you have difficulty finding the
  right moment.)
- Windows 98 or ME: Press and hold the Ctrl
  key anytime during the memory check or other
  preliminary self-test. (F8 still works as before on
  most systems, but not on all; and the Ctrl key
  system is much simpler.)
- Windows 2000 or XP: It’s back to F8. For
  Windows 2000, you must press and hold this just as
  the row of dancing white rectangles appears at the
  bottom of the screen.
  - For Windows XP: (The
    white rectangles don’t appear except when
    bringing the computer back up from hibernation.)
    Press and hold F8 during the initial memory
    test, and wait for the boot menu. Another
    convenient way to get to Safe Mode in Windows XP
    is to launch MSCONFIG from a Run box, select the
    BOOT.INI tab, and check the /SAFEBOOT box, then
    reboot.
    (Remember to change this back later when you
    want to return to Normal Mode startups!)


6. BEWARE OF VIRUSES
& PARASITES
By now, most
computer users know that they have to protect against
viruses. You need a good antivirus
program running on your computer, monitoring and
checking as files are accessed, as well as running
periodic scans on all files. You need to use an
up-to-date virus definition file with this antivirus
program (these are often updated every day, so
automated updating of definitions is preferable). The
virus protection on your computer should be so stable
that there is rarely any doubt that you are virus-free;
the only room for doubt is whether a new virus snuck
in before your antivirus software’s manufacturer had a
definition file that would catch it. If suspicious, run
your antivirus program to check the system as part of
zeroing in on a problem that suddenly develops on your
computer. You can also try one or more of these free
online virus scanners:
Kaspersky Online Scanner,
TrendMicro HouseCall,
Panda ActiveScan,
Symantec Security Check,
WindowSecurity.com TrojanScan.
There are also
non-viral invaders that have become as big a problem as
viruses. In fact, because people are less aware of these
and less mindful of protecting themselves, these
parasites may be an even greater risk to their
computer’s health. Adware, spyware, browser hijackers,
automatic diallers, and other forms of non-viral
malware are often intentionally, if misguidedly,
installed by the user; others are foisted on you without
your knowledge. A large proportion of these are
extremely destructive.
And, since the
majority of them are badly written, they frequently
announce themselves unintentionally by breaking some
functionality on the computer. Therefore, checking for
these is an important early step in troubleshooting
computer problems, especially if the problem appears
suddenly. If there is a
serious browser or Windows Explorer problem not related
to a bad or damaged browser install, failing hardware,
or user error, then at least 90% of the time the problem
will be the result of one of these parasites.
Because Internet
Explorer is tightly integrated into Windows itself in all
versions after Win95, these “browser problems” can
manifest as general performance degradation or error
conditions in the Windows shell. If you’ve ruled out the
obvious in troubleshooting browser failures, the
eruption of many error messages, inability to launch
programs, or sudden
serious slowing of your computer, checking for parasites
should be one of your first diagnostic steps.
Several of these
parasites are intentionally added to the computer by the
user because the program looks like a cool toy. For
example, Hotbar is a popular browser add-on that causes
big problems on most computers. Many people install
Gator (now renamed Claria) to manage online
passwords. People install the insidious and pernicious
IEPlugin to get “faster, smarter web browsing,” and live
to regret it. And so forth. Other parasites are snuck
onto your computer often without your knowledge. An
important early step in all troubleshooting of Windows
problems, therefore, is the isolation and removal of
such parasites.


7. GET THE HISTORY
In one sense, this
step belongs at the beginning, with defining the
problem. I list it here instead because the previous
steps have done all the easy work and, if they haven’t
solved the problem, it’s time to roll up shirt sleeves.
For that, we need to be sure that we have a good
history!
When did the
problem begin? Does it occur consistently or only
sometimes? Is there a pattern? Does it occur if you try
the same task in an alternate way? (Windows usually has
a number of ways to do the same task.)
What changes
(additions, removals, new configurations) were made to
the computer hardware,
software, or operating system prior to the
problem beginning? (Consider all user-installed
utilities and other malware among these changes.)
Has this problem
occurred in the past? If so, what solution was
found?
If the problem can
be traced to a specific point in time, and to a
particular event (say, the installation of a patch,
program, or driver), reverse the change and test to see
if the problem resolves. If you are using Windows ME or
Windows XP, System Restore is a
powerful tool to take you back to a time just before the
problem started, reversing changes to the Registry and
many other kinds of changes on your system.
(Some other Windows versions have other native tools for
recovering an earlier version of the Registry, and
various third-party utilities exist.)
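As a rough illustration of putting a history to work,
here is a small Python sketch that takes a list of known
changes and the date the problem was first seen, and
returns the changes made on or before that date, newest
first, as the order in which to try reversing them. The
`Change` record, the dates, and the descriptions are all
made up for the example.

```python
from datetime import date
from typing import List, NamedTuple


class Change(NamedTuple):
    when: date
    description: str


def suspects(changes: List[Change], problem_started: date) -> List[Change]:
    """Return changes made on or before the problem date, newest first."""
    earlier = [c for c in changes if c.when <= problem_started]
    return sorted(earlier, key=lambda c: c.when, reverse=True)


# Hypothetical change history for one computer.
change_log = [
    Change(date(2004, 3, 2), "Installed printer driver"),
    Change(date(2004, 3, 9), "Installed browser toolbar"),
    Change(date(2004, 3, 15), "Applied Windows patch"),
]

ordered = suspects(change_log, problem_started=date(2004, 3, 10))
for change in ordered:
    print(change.when.isoformat(), change.description)
```

The patch applied after the problem began drops off the
list, and the most recent earlier change comes first,
since that is usually the first one worth reversing.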


8. ERROR MESSAGES ARE
YOUR FRIENDS
Nobody wants pain;
but pain serves a very useful function most of the time
of letting you know that something is wrong! The same is
true of error messages. Don’t curse them. Praise them!
Don’t say you hate error messages. Say you love ’em. You
don’t want computer errors but, when errors occur, error
messages are your “new best friends.”
Often an error message
is the only thing that can tell you what is going on.
Therefore, you want to get the most information from them
that you can. Windows doesn’t provide this by default. (I
suppose Microsoft understands how much computer users
dislike seeing error messages, so more attention has been
put on having the computer correct itself than on providing
the user with diagnostic information.) You have to make a
couple of changes in Windows to get the best information on
your errors.
First, run Dr.
Watson. Launch it from your Startup folder on every
Windows startup. I recommend this for all Windows versions
that have a Dr. Watson, but it will be especially helpful in
Windows 98 and ME. In those two versions,
Dr. Watson is mature enough to be a very helpful diagnostic
program, and you won’t get all of your error message data
without it. Click the Details button on an error message to
get more information. Record the error message verbatim —
exactly and completely.
In Windows XP,
the default is for Windows to restart itself when a
sufficiently serious problem occurs (or restart a component,
such as the Explorer shell, which fails). It doesn’t display
the error message — it just reboots. Disabling the “restart
on system failure” feature may permit the exact cause of
your problem to be isolated: Right-click on My Computer,
click Properties, click the Advanced tab. Under “Startup &
Recovery,” click Settings. Under “System Failure,” uncheck
the box in front of “Automatically restart.”
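For those who would rather script this setting, the
checkbox corresponds to the AutoReboot value under the
CrashControl registry key. The Python sketch below only
builds the text of a .reg file; it does not touch the
registry, and the function name is invented for this
example. Treat it as an illustration, and back up the
registry before importing any .reg file.

```python
def auto_restart_reg_text(enabled: bool) -> str:
    """Build .reg file text that turns automatic restart on or off."""
    value = 1 if enabled else 0
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        r"[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl]",
        # dword values are written as eight hexadecimal digits.
        f'"AutoReboot"=dword:{value:08x}',
        "",
    ]
    return "\r\n".join(lines)


reg_text = auto_restart_reg_text(enabled=False)
print(reg_text)
```

Saving the printed text as a .reg file and importing it
should have the same effect as unchecking the box by
hand; calling the function with `enabled=True` produces
the text that turns automatic restart back on.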
Know and use any other
error diagnostic information available to you. This
varies between Windows versions. For example, in
Windows 2000 and XP, use the Event Viewer to view
event diagnostics. (Logged errors are marked by a red circle
with a white X.) The fastest way to the Event Viewer is to
launch EventVwr.msc from a Run
box. You can also get there by right-clicking on My
Computer, selecting Manage,
then picking System Tools | Event Viewer.
Another example of useful logged data is the bootlog,
which can be created by all Windows versions and which
often proves helpful in startup (and some shutdown) issues.


9. SCAN YOUR HARD
DRIVE
Especially if the
issue seems to be related to the Windows file system,
the integrity of one or more files, or your ability to
access information on the hard drive, run
ScanDisk in Windows 95/98/ME, or ChkDsk
in Windows 2000/XP.


10. IS IT REALLY
RANDOM?
Computer hangs,
error messages, and similar failures that occur
randomly and unpredictably are usually
hardware failures. The test for randomness is whether
the problem can be reliably
replicated: whether you can identify the
circumstances where the error occurred and replicate
them more or less at will. For example, the following
presenting problem is probably a software problem: “If
I’m running Program X while online and try to paste
text, the program throws me out and I eventually have to
reboot to get it to work right.” However, it is probably
a hardware issue if the problem is, “Sometimes when I
try to paste text from the clipboard, Windows hangs or
throws an error message, but it only happens sometimes
and doesn’t seem to be with any particular program or
any specific circumstances I can see.”
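The replication test can be made concrete with a simple
log of attempts. The Python sketch below is only an
illustration: it treats a problem as replicable if some
set of circumstances was tried several times and failed
every time. The threshold of three tries, the function
name, and the sample logs are all assumptions made for
the example.

```python
from collections import defaultdict
from typing import List, Tuple

# Each attempt: (circumstances as the user described them, did it fail?)
Attempt = Tuple[str, bool]


def looks_replicable(attempts: List[Attempt], min_tries: int = 3) -> bool:
    """True if some circumstance was tried min_tries+ times and always failed."""
    tries = defaultdict(int)
    failures = defaultdict(int)
    for circumstances, failed in attempts:
        tries[circumstances] += 1
        if failed:
            failures[circumstances] += 1
    return any(
        tries[c] >= min_tries and failures[c] == tries[c] for c in tries
    )


# Fails every time under one set of circumstances: probably software.
software_like = [
    ("Program X, online, paste", True),
    ("Program X, online, paste", True),
    ("Program X, online, paste", True),
    ("Program Y, online, paste", False),
    ("Program Y, online, paste", False),
]

# Fails only sometimes, with no circumstance that always fails.
hardware_like = [
    ("Program X, paste", True),
    ("Program X, paste", False),
    ("Program Y, paste", False),
    ("Program Y, paste", True),
    ("Program X, paste", False),
]

print(looks_replicable(software_like))
print(looks_replicable(hardware_like))
```

A result of True points toward software and False toward
hardware, but only as a first clue that still needs
confirming.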
Of course, some
hardware problems are replicable and
nonrandom. However, random problems are almost always
hardware issues. If you are in a support position, try
to identify patterns that the user may have missed, and
to rule out possible contributing factors to narrow the
focus. Windows troubleshooting is often like basic
police work: Build the biggest list of suspects you can,
then eliminate as many as possible and see what remains
on the list!

