ZA200601937B

ZA200601937B - System and methods for automated computer support

Info

Publication number: ZA200601937B
Application number: ZA200601937A
Authority: ZA
Inventors: Hooks David Eugene
Original assignee: Chorus Systems Inc
Priority date: 2003-08-11
Filing date: 2004-08-11
Publication date: 2007-05-30
Also published as: CN1860476A; CN1860476B; CN1856781A; ZA200601938B

Description

SYSTEMS AND METHODS FOR AUTOMATED COMPUTER SUPPORT

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 60/494,225, filed August 11, 2003, and U.S. Application Serial No. __/___, Attorney Docket No. 52270/30-2840, filed herewith, entitled “Systems and Methods for Creation and Use of an Adaptive Reference Model,” the entirety of both of which are hereby incorporated by reference.

FIELD OF THE INVENTION

The present invention relates generally to systems and methods for automated computer support.

BACKGROUND

As information technology continues to increase in complexity, problem management costs will escalate as the frequency of support incidents rises and the skill set requirements for human analysts become more demanding. Conventional problem management tools are designed to reduce costs by increasing the efficiency of the humans performing these support tasks. This is typically accomplished by at least partially automating the capture of trouble ticket information and by facilitating access to knowledge bases. While useful, this type of automation has reached the point of diminishing returns as it fails to address the fundamental weakness in the support model itself: its dependence on humans.

Table 1 illustrates the distribution of labor costs associated with incident resolution in the conventional, human-based support model. The data shown is provided by Motive Communications, Inc. of Austin, Texas (www.motive.com), a major supplier of help desk software. The highest cost items are those associated with tasks that require human analysis and/or interaction (e.g., Diagnosis, Investigation, Resolution).

Table 1: [Table contents not recoverable from the source. Surviving row labels: Simple and Repeated Problems; Networking and Connectivity (7%); Complex & Dynamic Problems.]

WO 2005/020001 PCT/US2004/026186

Conventional software solutions for automated problem management endeavor to decrease these costs and add value across a wide range of service levels. Forrester Research, Inc. of Cambridge, MA (www.forrester.com) provides a useful characterization of these service levels. Forrester Research divides conventional automated computer support solutions into five service levels, including (1) Mass-Healing, solving incidents before they occur; (2) Self-Healing, solving incidents when they occur; (3) Self-Service, solving incidents before a user calls; (4) Assisted Service, solving incidents when a user calls; and (5) Desk-side Visit, solving incidents when all else fails. According to Forrester, the cost per incident using a conventional self-healing service is less than one dollar. However, the cost quickly escalates, reaching more than three hundred dollars per incident if a desk-side visit is eventually required.

The objective of Mass Healing is to solve incidents before they occur. In conventional systems, this objective is achieved by making all PC configurations the same, or at a minimum, ensuring that a problem found on one PC cannot be replicated on any other PCs. Conventional products typically associated with this service level consist of software distribution tools and configuration management tools. Security products such as anti-virus scanners, intrusion detection systems, and data integrity checkers are also considered part of this level since they focus on preventing incidents from occurring.

The conventional products that attempt to address this service level operate by constraining the managed population to a small number of known good configurations and by detecting and eliminating a relatively small number of known bad configurations (e.g., virus signatures). The problem with this approach is that it assumes that: (1) all good and bad configurations can be known ahead of time; and (2) once they are known, they remain relatively stable. As the complexity of computer and networking systems increases, the stability of any particular node in the network tends to decrease. Both the hardware and software on any particular node are likely to change frequently. For example, many software products are capable of automatically updating themselves using software patches accessed over an internal network or the Internet. Since there are an infinite number of good and bad configurations and since they change constantly, these conventional self-healing products can never be more than partially effective.

Further, virus authors continue to develop more and more clever viruses. Conventional virus detection and eradication software depends on the ability to identify a known pattern to detect and eradicate a virus. However, as the number and complexity of viruses increase, the resources required to maintain a database of known viruses and fixes for those viruses, combined with the resources required to distribute the fixes to the population of nodes on a network, become overwhelming. In addition, a conventional PC utilizing a Microsoft Windows operating system includes over 7,000 system files and over 100,000 registry keys, all of which are multi-valued. Accordingly, for all practical purposes, an infinite number of good states and an infinite number of bad states may exist, making the task of identifying the bad states more complicated.

The objective of the Self-Healing level is to sense and automatically correct problems before they result in a call to the help desk, ideally before the user is even aware that a problem exists. Conventional Self-Healing tools and utilities have existed since the late 80s when Peter Norton introduced a suite of PC diagnostics and repair tools (www.Symantec.com). These tools also include tools that allow a user to restore a PC to a restore point set prior to installation of a new product. However, none of the conventional tools work well under real world conditions.

One fundamental problem of these conventional tools is the difficulty in creating a reference model with sufficient scope, granularity, and flexibility to allow “normal” to be reliably distinguished from “abnormal”. Compounding the problem is the fact that the definition of “normal” must constantly change as new software updates and applications are deployed. This is a formidable technical challenge and one that has yet to be conquered by any of the conventional tools.

The objective of the Self-Service level is to reduce the volume of help desk calls by providing a collection of automated tools and knowledge bases that enable end users to help themselves. Conventional Self-Service products consist of “how to” knowledge bases and collections of software solutions that automate low risk, repetitive support functions such as resetting forgotten passwords. These conventional solutions have a significant downside in that they increase the likelihood of self-inflicted damage. For this reason they are limited to specific types of problems and applications.

The objective of the Assisted Service level is to enhance human efficiency by providing an automated infrastructure for managing a service request and by providing capabilities to remotely control a personal computer and to interact with end users. Conventional Assisted Service products include help desk software, online reference materials, and remote control software.

While the products at this service level are perhaps the most mature of the conventional products and solutions described herein, they still fail to fully meet the requirements of users and organizations. Specifically, the ability of these products to automatically diagnose problems is severely limited both in terms of the types of problems that can be correctly identified and in terms of the accuracy of the diagnosis (often multiple choice).

A Desk-Side Visit becomes necessary when all else fails. This service level includes any “hands-on” activities that may be necessary to restore a computer that cannot be diagnosed/repaired remotely. It also includes tracking and managing these activities to ensure timely resolution. Of all the service levels, this level is most likely to require significant time from highly trained, and therefore expensive, human resources. Conventional products at this level consist of specialized diagnostic tools and software products that track and resolve customer problems over time and potentially across multiple customer service representatives.

Thus, what is needed to significantly reduce support costs is a paradigm shift. This shift will be characterized by the emergence of a new support model in which machines will serve as the primary agents for making decisions and initiating actions.

SUMMARY

Embodiments of the present invention provide systems and methods for automated computer support. One method according to one embodiment of the present invention comprises receiving a plurality of snapshots from a plurality of computers, storing the plurality of snapshots in a data store, and creating an adaptive reference model based at least in part on the plurality of snapshots. The method further comprises comparing at least one of the plurality of snapshots to the adaptive reference model, and identifying at least one anomaly based on the comparison. In another embodiment, a computer-readable medium (such as, for example, random access memory or a computer disk) comprises code for carrying out such a method.

These embodiments are mentioned not to limit or define the invention, but to provide examples of embodiments of the invention to aid understanding thereof. Illustrative embodiments are discussed in the Detailed Description, and further description of the invention is provided there. Advantages offered by the various embodiments of the present invention may be further understood by examining this specification.

BRIEF DESCRIPTION OF THE FIGURES

These and other features, aspects, and advantages of the present invention are better understood when the following Detailed Description is read with reference to the accompanying drawings, wherein:

Figure 1 illustrates an exemplary environment for implementation of one embodiment of the present invention;

Figure 2 is a block diagram illustrating a flow of information and actions in one embodiment of the present invention;

Figure 3 is a flow chart illustrating an overall process of anomaly detection in one embodiment of the present invention;

Figure 4 is a block diagram illustrating components of an adaptive reference model in one embodiment of the present invention;

Figure 5 is a flow chart illustrating a process of normalizing registry information on an agent in one embodiment of the present invention;

Figure 6 is a flow chart illustrating a method for identifying and responding to an anomaly in one embodiment of the present invention;

Figure 7 is a flow chart illustrating a process for identifying certain types of anomalies in one embodiment of the present invention;

Figure 8 is a flow chart illustrating a process for generating an adaptive reference model in one embodiment of the present invention;

Figure 9 is a flow chart illustrating a process for proactive anomaly detection in one embodiment of the present invention;

Figure 10 is a flow chart illustrating a reactive process for anomaly detection in one embodiment of the present invention;

Figure 11 is a screen shot of a user interface for creating an adaptive reference model in one embodiment of the present invention;

Figure 12 is a screen shot of a user interface for managing an adaptive reference model in one embodiment of the present invention;

Figure 13 is a screen shot of a user interface for selecting a snapshot to use for creation of a recognition filter in one embodiment of the present invention;

Figure 14 is a screen shot of a user interface for managing a recognition filter in one embodiment of the present invention;

Figure 15 is a screen shot illustrating a user interface for selecting a “golden system” for use in a policy template in one embodiment of the present invention; and

Figure 16 is a screen shot of a user interface for selecting policy template assets in one embodiment of the present invention.

DETAILED DESCRIPTION

Embodiments of the present invention provide systems and methods for automated computer support. Referring now to the drawings, in which like numerals indicate like elements throughout the several figures, Figure 1 is a block diagram illustrating an exemplary environment for implementation of one embodiment of the present invention. The embodiment shown includes an automated support facility 102. Although the automated support facility 102 is shown as a single facility in Figure 1, it may comprise multiple facilities or be incorporated into the site where the managed population resides. The automated support facility includes a firewall 104 in communication with a network 106 for providing security to data stored within the automated support facility 102. The automated support facility 102 also includes a Collector component 108. The Collector component 108 provides, among other features, a mechanism for transferring data in and out of the automated support facility 102. The transfer routine may use a standard protocol such as file transfer protocol (FTP) or hypertext transfer protocol (HTTP) or may use a proprietary protocol. The Collector component also provides the processing logic necessary to download, decompress, and parse incoming snapshots.

The automated support facility 102 shown also includes an Analytic component 110 in communication with the Collector component 108. The Analytic component 110 includes hardware and software for implementing the adaptive reference model described herein and storing the adaptive reference model in a Database component 112. The Analytic component 110 extracts adaptive reference models and snapshots from a Database component 112, analyzes the snapshot in the context of the reference model, identifies and filters any anomalies, and transmits response agent(s) when appropriate. The Analytic component 110 also provides the user interface for the system.

The embodiment shown also includes a Database component 112 in communication with the Collector component 108 and the Analytic component 110. The Database component 112 provides a means for storing data from the agents and for the processes performed by an embodiment of the present invention. A primary function of the Database component may be to store snapshots and adaptive reference models. It includes a set of database tables as well as the processing logic necessary to automatically manage those tables. The embodiment shown includes only one Database component 112 and one Analytic component 110. Other embodiments include many Database and/or Analytic components 112, 110. One embodiment includes one Database component and multiple Analytic components, allowing multiple support personnel to share a single database while performing parallel analytical tasks.

An embodiment of the present invention provides automated support to a managed population 114 that may comprise a plurality of client computers 116a,b. The managed population provides data to the automated support facility 102 via the network 106.

In the embodiment shown in Figure 1, an Agent component 202 is deployed within each monitored machine 116a,b. The Agent component 202 gathers data from the client 116. At scheduled intervals (e.g., once per day) or in response to a command from the Analytic component 110, the Agent component 202 takes a detailed snapshot of the state of the machine in which it resides. This snapshot includes a detailed examination of all system files, designated application files, the registry, performance counters, processes, services, communication ports, hardware configuration, and log files. The results of each scan are then compressed and transmitted in the form of a Snapshot to a Collector component 108.

Each of the servers, computers, and network components shown in Figure 1 comprises processors and computer-readable media. As is well known to those skilled in the art, an embodiment of the present invention may be configured in numerous ways by combining multiple functions into a single computer or, alternatively, by utilizing multiple computers to perform a single task.

The processors utilized by an embodiment of the present invention may include, for example, digital logic processors capable of processing input, executing algorithms, and generating output as necessary in support of processes according to the present invention.

Such processors may include a microprocessor, an ASIC, and state machines. Such processors include, or may be in communication with, media, for example computer-readable media, which stores instructions that, when executed by the processor, cause the processor to perform the steps described herein.

Embodiments of computer-readable media include, but are not limited to, an electronic, optical, magnetic, or other storage or transmission device capable of providing a processor, such as the processor in communication with a touch-sensitive input device, with computer-readable instructions. Other examples of suitable media include, but are not limited to, a floppy disk, CD-ROM, magnetic disk, memory chip, ROM, RAM, an ASIC, a configured processor, all optical media, all magnetic tape or other magnetic media, or any other medium from which a computer processor can read instructions. Also, various other forms of computer-readable media may transmit or carry instructions to a computer, including a router, private or public network, or other transmission device or channel, both wired and wireless. The instructions may comprise code from any computer-programming language, including, for example, C, C#, C++, Visual Basic, Java, and JavaScript.

Figure 2 is a block diagram illustrating a flow of information and actions in one embodiment of the present invention. The embodiment shown comprises an Agent component 202. The Agent component 202 is the part of the system that is deployed within each monitored machine. It may perform three major functions. First, it may be responsible for gathering data. The Agent component 202 may perform an extensive scan of the client machine 116a,b at scheduled intervals, in response to a command from the Analytic component 110, or in response to events of interest detected by the Agent component 202. This scan may include a detailed examination of all system files, designated application files, the registry, performance counters, hardware configuration, logs, running tasks, services, network connections, and other relevant data. The results of each scan are compressed and transmitted over network 106 in the form of a “snapshot” to the Collector component 108.
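As an illustration only, the compress-and-transmit step on the Agent side and the decompress-and-parse step on the Collector side might be sketched as follows. The JSON encoding, the gzip compression, and the function names `pack_snapshot` and `unpack_snapshot` are assumptions made for this sketch; the specification does not name a snapshot format.

```python
import gzip
import json


def pack_snapshot(scan_results: dict) -> bytes:
    """Serialize scan results to JSON and gzip-compress them for transfer.

    The JSON-plus-gzip format is an assumption, not the patented format.
    """
    encoded = json.dumps(scan_results, sort_keys=True).encode("utf-8")
    return gzip.compress(encoded)


def unpack_snapshot(payload: bytes) -> dict:
    """Reverse pack_snapshot: decompress and parse, as a collector might."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical scan results for one client machine.
    scan = {"files": {"C:/app/tool.exe": "ab12"}, "services": ["spooler"]}
    payload = pack_snapshot(scan)
    print(unpack_snapshot(payload) == scan)
```

The transport itself (FTP, HTTP, or a proprietary protocol, per the description of the Collector component) is left out of the sketch.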

In one embodiment, the Agent component 202 reads every byte of files to be examined and creates a digital signature or hash for each file. The digital signature identifies the exact contents of each file rather than simply providing metadata, such as the size and the creation date. Some conventional viruses change the file header information in an attempt to fool systems that rely on metadata for detection. Such an embodiment is able to successfully detect such viruses.
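A minimal sketch of content-based file signatures follows. SHA-256 is chosen here as an assumption, since the specification does not name a hash algorithm, and the helper names `file_signature` and `scan_files` are hypothetical.

```python
import hashlib


def file_signature(path: str, chunk_size: int = 65536) -> str:
    """Hash every byte of the file and return the hex digest.

    SHA-256 is an assumed choice; any content hash would illustrate the idea.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_files(paths):
    """Map each path to its content signature."""
    return {path: file_signature(path) for path in paths}
```

Because the signature depends only on file contents, two files with identical size and timestamps but different bytes produce different signatures, which is the property the paragraph above relies on.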

The scan of the client by the Agent component 202 may be resource intensive. In one embodiment, a full scan is performed periodically, e.g., daily, during a time when the user is not using the client machine. In another embodiment, the Agent component 202 performs a delta-scan of the client machine, logging only the changes from the last scan. In another embodiment, scans by the Agent component 202 are executed on demand, providing a valuable tool for a technician or support person attempting to remedy an anomaly on the client machine.
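The delta-scan idea can be sketched as a comparison of two scans, each represented as a mapping from asset name to signature. This representation and the function name `delta_scan` are assumptions for illustration.

```python
def delta_scan(previous: dict, current: dict) -> dict:
    """Return only the changes between two {asset: signature} scans."""
    added = {k: current[k] for k in current.keys() - previous.keys()}
    removed = sorted(previous.keys() - current.keys())
    modified = {
        k: current[k]
        for k in current.keys() & previous.keys()
        if current[k] != previous[k]
    }
    return {"added": added, "removed": removed, "modified": modified}
```

Logging only this difference keeps the transmitted data small when few assets change between scans.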

The second major function performed by the agent 202 is that of behavior blocking. The agent 202 constantly (or substantially constantly) monitors access to key system resources such as system files and the registry. It is able to selectively block access to these resources in real time to prevent damage from malicious software. While behavior monitoring occurs on an ongoing basis, behavior blocking is enabled as part of a repair action. For example, if the Analytic component 110 suspects the presence of a virus, it can download a repair action to cause the client to block the virus from accessing key information resources within the managed system. The client component 202 provides information from the monitoring process as part of the snapshot.

The third major function performed by the Agent component 202 is to provide an execution environment for response agents. Response agents are mobile software components that implement automated procedures to address various types of trouble conditions. For example, if the Analytic component 110 suspects the presence of a virus, it can download a response agent to cause the Agent component 202 to remove the suspicious assets from the managed system. The Agent component 202 may run as a service or other background process on the computer being monitored. Because of the scope and granularity of information provided by an embodiment of the present invention, repair can be performed more accurately than with conventional systems. Although described in terms of a client, the managed population 114 may comprise PCs, workstations, servers, or any other type of computer.

The embodiment shown also includes an adaptive reference model component 206. One difficult technical challenge in building an automated support product is the creation of a reference model that can be used to distinguish between normal and abnormal system states.

The system state of a modern computer is determined by many multi-valued variables, and consequently there is a near-infinite number of normal and abnormal states. To make matters worse, these variables change frequently as new software updates are deployed and as end users communicate. The adaptive reference model 206 in the embodiment shown analyzes the snapshots from many computers and identifies statistically significant patterns using a generic data mining algorithm or a proprietary data mining algorithm designed specifically for this purpose. The resulting rule set is extremely rich (hundreds of thousands of rules) and is customized to the unique characteristics of the managed population. In the embodiment shown, the process of building a new reference model is completely automatic and can be executed periodically to allow the model to adapt to desirable changes such as the planned deployment of a software update.

Since the adaptive reference model 206 is used for the analysis of statistically significant patterns from a population of machines, in one embodiment, a minimum number of machines are analyzed to ensure the accuracy of the statistical measures. In one embodiment, a minimum population of approximately 50 machines is tested to achieve systemically relevant patterns for analysis of the machines. Once a reference is established, samples can be used to determine if anything abnormal is occurring within the entire population or any member of the population.
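As a rough sketch of building a reference from a population, the following uses a simple frequency threshold as a stand-in for the data mining algorithm, which the specification does not detail. The snapshot representation (a set of asset names per machine), the `min_support` threshold, and the function name are all assumptions; only the minimum population of about 50 machines comes from the text.

```python
from collections import Counter


def build_reference_model(snapshots, min_population=50, min_support=0.95):
    """Return the set of assets present on at least min_support of machines.

    A frequency threshold is an assumed stand-in for the data mining step.
    """
    if len(snapshots) < min_population:
        raise ValueError("population too small for reliable statistics")
    counts = Counter()
    for assets in snapshots:
        counts.update(set(assets))  # count each asset once per machine
    total = len(snapshots)
    return {asset for asset, seen in counts.items() if seen / total >= min_support}
```

A real rule set of the richness described above would capture relationships between assets rather than only per-asset frequency.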

In another embodiment, the Analytic component 110 calculates a set of maturity metrics that enable the user to determine when a sufficient number of samples have been accumulated to provide accurate analysis. These maturity metrics indicate the percentage of available relationships at each level of the model that have met predefined criteria corresponding to various levels of confidence (e.g., High, Medium, and Low). In one such embodiment, the user monitors the metrics and ensures that enough snapshots have been assimilated to create a mature model. In another such embodiment, the Analytic component 110 assimilates samples until it reaches a predefined maturity goal set by the user. In either such embodiment, it is not necessary to assimilate a certain number of samples (e.g., 50).
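One way to picture a maturity metric is as the percentage of relationships whose supporting sample count has reached a per-confidence-level threshold. The thresholds, the use of sample counts as the "predefined criteria", and the function name `maturity_metrics` are assumptions for this sketch; the specification does not define the criteria.

```python
def maturity_metrics(support_counts, thresholds):
    """Percentage of relationships meeting each confidence threshold.

    support_counts: number of supporting samples per relationship (assumed).
    thresholds: {confidence level name: minimum samples} (hypothetical values).
    """
    total = len(support_counts)
    if total == 0:
        return {level: 0.0 for level in thresholds}
    return {
        level: 100.0 * sum(1 for count in support_counts if count >= minimum) / total
        for level, minimum in thresholds.items()
    }
```

For example, with support counts `[5, 20, 60, 100]` and thresholds `{"low": 10, "medium": 50, "high": 100}`, the function reports 75.0, 50.0, and 25.0 percent respectively, which a user or the Analytic component could compare against a maturity goal.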

The embodiment shown in Figure 2 also comprises a Policy Template component 208. The Policy Template component 208 allows the service provider to manually insert rules in the form of “policies” into the adaptive reference model. Policies are combinations of attributes (files, registry keys, etc.) and values that, when applied to a model, override a portion of the statistically generated information in the model. This mechanism can be used to automate a variety of common maintenance activities such as verifying compliance to security policies and checking to ensure that the appropriate software updates have been installed.
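The override behavior of policies can be sketched with plain dictionaries, where both the model and the policy map an attribute to its expected value. That representation and the names `apply_policies` and `noncompliant_assets` are assumptions for illustration.

```python
def apply_policies(model: dict, policies: dict) -> dict:
    """Return a copy of the model in which policy values take precedence."""
    merged = dict(model)      # leave the statistically generated model intact
    merged.update(policies)   # manually inserted rules override it
    return merged


def noncompliant_assets(snapshot: dict, policies: dict) -> list:
    """List policy attributes whose snapshot value differs from the policy."""
    return sorted(
        attribute
        for attribute, required in policies.items()
        if snapshot.get(attribute) != required
    )
```

A compliance check of this kind is one way the mechanism could verify a security setting or confirm that a required update is installed.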

When something goes wrong with a computer, it often impacts a number of different information assets (files, registry keys, etc.). For example, a “Trojan” might install malicious files, add certain registry keys to ensure that those files are executed, and open ports for communication. The embodiment shown in Figure 2 detects these undesirable changes as anomalies by comparing the snapshot from the infected machine with the norm embodied in the adaptive reference model. An anomaly is defined as an unexpectedly present asset, an unexpectedly absent asset, or an asset that has an unknown value. Anomalies are matched against a library of Recognition Filters 216. A Recognition Filter 216 comprises a particular pattern of anomalies that indicates the presence of a particular root cause condition or a generic class of conditions. Recognition Filters 216 also associate conditions with a severity indication, a textual description, and a link to a response agent. In another embodiment, a Recognition Filter 216 can be used to identify and interpret benign anomalies. For example, if a user adds a new application that the administrator is confident will not cause any problems, the system according to the present invention will still report the new application as a set of anomalies. If the application is new, then reporting the assets that it adds as anomalies is correct. However, the administrator can use a Recognition Filter 216 to interpret the anomalies produced by adding the application as benign.
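The three kinds of anomaly named above can be illustrated with a small comparison routine. The model representation (each asset mapped to its set of known values), the simplification that every modeled asset is expected on every machine, and all names in the code are assumptions for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anomaly:
    asset: str
    kind: str  # "unexpectedly_present", "unexpectedly_absent", or "unknown_value"


def detect_anomalies(snapshot: dict, model: dict) -> list:
    """Compare one snapshot with a model of {asset: set of known values}."""
    anomalies = []
    for asset, value in snapshot.items():
        if asset not in model:
            anomalies.append(Anomaly(asset, "unexpectedly_present"))
        elif value not in model[asset]:
            anomalies.append(Anomaly(asset, "unknown_value"))
    for asset in model:
        # Simplification: every modeled asset is treated as required.
        if asset not in snapshot:
            anomalies.append(Anomaly(asset, "unexpectedly_absent"))
    return anomalies
```

A Trojan that adds a file, changes a system file, and deletes a registry key would surface here as one anomaly of each kind.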

In an embodiment of the present invention, certain attributes relate to continuous processes. For example, the performance data are comprised of various counters. These counters measure the occurrence of various events over a particular time period. To determine if the value of such a counter is normal across a population, one embodiment of the present invention computes a mean and standard deviation. An anomaly is declared if the value of the counter falls more than a certain number of standard deviations away from the mean.
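The counter test described above can be sketched directly. The default of three standard deviations, the use of the population standard deviation, and the function name are assumptions; the specification says only "a certain number of standard deviations".

```python
import statistics


def counter_is_anomalous(value, population_values, max_deviations=3.0):
    """Flag a counter value far from the population mean.

    max_deviations=3.0 is an assumed default, not a value from the text.
    """
    mean = statistics.mean(population_values)
    stdev = statistics.pstdev(population_values)
    if stdev == 0:
        # All machines agree, so any different value stands out.
        return value != mean
    return abs(value - mean) > max_deviations * stdev
```

In symbols, with population mean $\mu$ and standard deviation $\sigma$, a counter value $x$ is declared anomalous when $|x - \mu| > k\sigma$ for the chosen $k$.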

In another embodiment, a mechanism handles the case in which the adaptive reference model 206 assimilates a snapshot containing an anomaly. Once a model achieves the desired maturity level, it undergoes a process that removes anomalies that may have been assimilated. These anomalies are visible in a mature model as isolated exceptions to strong relationships. For example, if file A appears in conjunction with file B in 999 machines but in 1 machine file A is present but file B is missing, the process will assume that the latter relationship is anomalous and it will be removed from the model. When the model is subsequently used for checking, any machine containing file A, but not file B, will be flagged as anomalous.
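The file A and file B example can be sketched as a confidence check on the rule "A implies B". The 0.99 threshold, the per-machine set representation, and the function names are assumptions for illustration.

```python
def rule_confidence(machines, antecedent, consequent):
    """Fraction of machines containing antecedent that also contain consequent."""
    with_antecedent = [assets for assets in machines if antecedent in assets]
    if not with_antecedent:
        return 0.0
    hits = sum(1 for assets in with_antecedent if consequent in assets)
    return hits / len(with_antecedent)


def flag_exceptions(machines, antecedent, consequent, min_confidence=0.99):
    """Indexes of machines that break a strong 'antecedent implies consequent' rule."""
    if rule_confidence(machines, antecedent, consequent) < min_confidence:
        return []  # relationship too weak to treat exceptions as anomalies
    return [
        index
        for index, assets in enumerate(machines)
        if antecedent in assets and consequent not in assets
    ]
```

With 999 machines holding both files and one holding only file A, the rule has confidence 0.999, so the single exception is flagged rather than learned as normal.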

The embodiment of the invention shown in Figure 2 also includes a response agent library 212. The response agent library 212 allows the service provider to author and store automated responses for specific trouble conditions. These automated responses are constructed from a collection of scripts that can be dispatched to a managed machine to perform actions like replacing a file or changing a registry value. Once a trouble condition has been analyzed and a response agent has been defined, any subsequent occurrence of the same trouble condition should be corrected automatically.

Figure 3 is a flow chart illustrating an overall process of anomaly detection in one embodiment of the present invention. In the embodiment shown, the Agent component (202) performs a snapshot on a periodic basis, e.g., once per day 302. This snapshot involves collecting a massive amount of data and can take anywhere from a few minutes to hours to execute, depending on the configuration of the client. When the scan is complete, the results are compressed, formatted, and transmitted in the form of a snapshot to a secure server known as the Collector component 304. The Collector component acts as a central repository for all of the snapshots being submitted from the managed population. Each snapshot is then decompressed, parsed, and stored in various tables in the database by the Collector component.

The detection function (218) uses the data stored in the adaptive reference model component (206) to check the contents of the snapshot against hundreds of thousands of statistically relevant relationships that are known to be normal for that managed population 308. If no anomaly is found 310, the process ends 324.

If an anomaly is found 310, the Recognition Filters (210) are consulted to determine if the anomaly matches any known conditions 312. If the answer is yes, then the anomaly is reported according to the condition that has been diagnosed 314. Otherwise, the anomaly is reported as an unrecognized anomaly 316. The Recognition Filter (216) also indicates whether or not an automated response has been authorized for that particular type of condition 318.

In one embodiment, the Recognition Filters (216) can recognize and consolidate multiple anomalies. The process of matching Recognition Filters to anomalies is performed after the entire snapshot has been analyzed and all anomalies associated with that snapshot have been detected. If a match is found between a subset of anomalies and a Recognition Filter, the name of the Recognition Filter will be associated with the subset of anomalies in the output stream. For example, the presence of a virus might generate a set of file anomalies, process anomalies, and registry anomalies. A Recognition Filter could be used to consolidate these anomalies so that the user would simply see a descriptive name relating all the anomalies to a likely common cause, i.e., a virus.
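Consolidation by Recognition Filter can be sketched as subset matching: a filter matches when every anomaly in its pattern was detected in the snapshot. Exact subset matching, the `(asset, kind)` tuple representation, and the class and function names are assumptions; the specification does not describe the matching algorithm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecognitionFilter:
    name: str
    pattern: frozenset                    # anomalies as (asset, kind) tuples
    severity: str = "unknown"
    response_agent: Optional[str] = None  # hypothetical link to a response agent


def match_filters(anomalies, filters):
    """Group detected anomalies under the filters whose whole pattern is present."""
    detected = set(anomalies)
    unrecognized = set(detected)
    recognized = {}
    for candidate in filters:
        if candidate.pattern <= detected:  # every anomaly in the pattern was seen
            recognized[candidate.name] = set(candidate.pattern)
            unrecognized -= candidate.pattern
    return recognized, unrecognized
```

The caller would run this once per snapshot, after all anomalies have been collected, and report each matched filter name in place of its individual anomalies.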

If automated response has been authorized, then the response agent library (212) downloads the appropriate response agents to the affected machine 320. The Agent component 202 in the affected machine then executes the sequence of scripts needed to correct the trouble condition 322. The process shown then ends 324.
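The decision flow of Figure 3, from detection through response, might be outlined as below. The data shapes (a model as a set of expected assets, filters as dictionaries), the `dispatch` callback, and the function name are assumptions made to keep the sketch self-contained; the step numbers in the comments refer to the flow chart.

```python
def process_snapshot(snapshot, model, filters, dispatch):
    """Walk one snapshot through detect, recognize, report, and respond.

    snapshot, model: sets of asset names (assumed simplification).
    filters: dicts with keys "name", "pattern", "authorized", "agent" (hypothetical).
    dispatch: callable that sends a response agent to the affected machine.
    """
    anomalies = set(snapshot) ^ set(model)      # 308: unexpected or missing assets
    if not anomalies:                           # 310: no anomaly found
        return []                               # 324: process ends
    report = []
    unmatched = set(anomalies)
    for entry in filters:                       # 312: consult recognition filters
        if entry["pattern"] <= anomalies:
            report.append(("recognized", entry["name"]))   # 314
            unmatched -= entry["pattern"]
            if entry["authorized"]:             # 318: automated response allowed?
                dispatch(entry["agent"])        # 320: download response agent
    for asset in sorted(unmatched):
        report.append(("unrecognized", asset))  # 316
    return report
```

Script execution on the affected machine (step 322) is left to whatever `dispatch` invokes.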

Embodiments of the present invention substantially reduce the cost of maintaining a population of personal computers and servers. One embodiment accomplishes this objective by automatically detecting and correcting trouble conditions before they escalate to the help desk and by providing diagnostic information to shorten the time required for a support analyst to resolve any problems not addressed automatically.

Anything that reduces the frequency at which incidents occur has a significant positive impact on the cost of computer support. One embodiment of the present invention monitors and adjusts the state of a managed machine so that it is more resistant to threats. Using Policy Templates, service providers can routinely monitor the security posture of every managed system, automatically adjusting security settings and installing software updates to eliminate known vulnerabilities.

In a human-based support model, trouble conditions are detected by end users, reported to a help desk, and diagnosed by human experts. This process accrues costs in a number of ways. First, there is cost associated with lost productivity while the end user waits for resolution. Also, there is the cost of data collection, usually performed by help desk personnel. Additionally, there is the cost of diagnosis, which requires the services of a trained (expensive) support analyst. In contrast, a machine-based support model implemented according to the present invention senses, reports, and diagnoses many software related trouble conditions automatically. The adaptive reference model technology enables detection of anomalous conditions in the presence of extreme diversity and change with a sensitivity and accuracy not previously possible.

In one embodiment of the present invention, to prevent false positives, the system can be configured to operate at various confidence levels, and anomalies that are known to be benign can be filtered out using Recognition Filters. Recognition Filters can also be used to alert the service provider to the presence of specific types of undesirable or malicious software.

In conventional systems, computer incidents are usually resolved by humans through the application of a series of trial and error repair actions. These repair actions tend to be of the “sledge hammer” variety, i.e. solutions that affect far more than the trouble conditions they were intended to correct. Multiple choice repair procedures and sledgehammer solutions are a consequence of an inadequate understanding of the problem and a source of unnecessary cost. Because a system according to the present invention has the data to fully characterize the problem, it can reduce the cost of repair in two ways. First, it can automatically resolve the incident if a Recognition Filter has been defined that specifies the required automated response. Second, if automatic repair is not possible, the system’s diagnostic capabilities eliminate the guesswork inherent in the human-based repair process, reducing execution time and allowing greater precision.

Figure 4 is a block diagram illustrating components of an adaptive reference model in one embodiment of the present invention. Figure 4 is merely exemplary.

The embodiment shown in Figure 4 illustrates a multi-layer, single-silo adaptive reference model 402. In the embodiment shown, the silo 404 comprises three layers: the value layer 406, the cluster layer 408, and the profile layer 410.

The value layer 406 tracks the values of asset/value pairs provided by the Agent component (202) described herein across the managed population (114) of Figure 1. When a snapshot is compared to the adaptive reference model 402, the value layer 406 of the adaptive reference model 402 evaluates the value portion of each asset/value pair contained therein. This evaluation consists of determining whether any asset value in the snapshot violates a statistically significant pattern of asset values within the managed population as represented by the adaptive reference model 402.

For example, an Agent (116b) transfers a snapshot that includes a digital signature for a particular system file. During the assimilation process (when the adaptive reference model is being constructed) the model records the values that it encounters for each asset name and the number of times that that value is encountered. Thus, for every asset name, the model knows the “legal” values that it has seen in the population. When the model is used for checking, the value layer 406 determines if the value of each attribute in the snapshot matches one of the “legal” values in the model. For example, in the case of a file, a number of “legal” values are possible because various versions of the file might exist in the managed population. An anomaly would be declared if the model contained one or more file values that were statistically consistent and the snapshot contained a file value that did not match any of the file values in the model. The model can also detect situations where there is no “legal” value for an attribute. For example, log files don’t have a legal value since they change frequently. If no “legal” value exists, then the attribute value in the snapshot will be ignored during checking.
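By way of non-limiting illustration, the assimilation and checking behavior of the value layer described above might be sketched as follows. The class name `ValueLayer` and the rule that a value is “legal” once it has been seen at least `min_count` times are assumptions made for this sketch; the embodiment does not specify the statistical test.

```python
from collections import Counter, defaultdict


class ValueLayer:
    """Hypothetical value layer: counts the values seen for each asset name."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # asset name -> Counter of values

    def assimilate(self, snapshot):
        """Record every asset/value pair of one snapshot (a dict)."""
        for name, value in snapshot.items():
            self.counts[name][value] += 1

    def legal_values(self, name, min_count=2):
        # Assumed rule: a value is "legal" if seen at least min_count times.
        seen = self.counts.get(name, Counter())
        return {v for v, n in seen.items() if n >= min_count}

    def check(self, snapshot, min_count=2):
        """Return (name, value) pairs that match no legal value."""
        anomalies = []
        for name, value in snapshot.items():
            legal = self.legal_values(name, min_count)
            if not legal:
                continue  # no legal value (e.g. a log file): ignored
            if value not in legal:
                anomalies.append((name, value))
        return anomalies
```

In this sketch a file with several repeated signatures has several legal values, while an asset whose value never repeats has none and is skipped during checking.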

In one embodiment, adaptive reference model 402 implements criteria to ensure that an anomaly is truly an anomaly and not just a new file variant. The criteria may include a confidence level. Confidence levels do not stop a unique file from being reported as an anomaly. Confidence levels constrain the relationships used in the model during the checking process to those relationships that meet certain criteria. The criteria associated with each level are designed to achieve a certain statistical probability. For example, in one embodiment, the criteria for the high confidence level are designed to achieve a statistical probability of greater than 90%. If a lower confidence level is specified, then additional relationships that are not as statistically reliable are included in the checking process. The process of considering viable, but less likely, relationships is similar to the human process of speculating when we need to make a decision without all the information that would allow us to be certain. In a continuously changing environment, the administrator may wish to filter out the anomalies associated with low confidence levels, i.e., the administrator may wish to eliminate as many false positives as possible.

In an embodiment that implements the confidence level, if a user reports that something is wrong with a machine, but the administrator is unable to see any anomalies at the default confidence level, the administrator can lower the confidence level, enabling the analysis process to consider relationships that have lower statistical significance and are ignored at higher confidence levels. By reducing the confidence level, the administrator allows the adaptive reference model 402 to include patterns that may not have enough samples to be statistically significant but might provide clues as to what the problem is. In other words, the administrator is allowing the machine to speculate.

In another embodiment, the value layer 406 automatically eliminates asset values from the adaptive reference model 402 if, after assimilating a specified number of snapshots, the asset values have failed to exhibit any stable pattern. For example, many applications generate log files. The values of log files constantly change and are rarely the same from machine to machine. In one embodiment, these file values are evaluated initially and then after a specified number of evaluations, they are eliminated from the adaptive reference model 402. By eliminating these types of file values from the model 402, the system eliminates unnecessary comparisons during the detection process 218 and reduces database storage requirements by pruning out low value information.
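By way of non-limiting illustration, the pruning of unstable asset values might be sketched as follows. The function name `prune_unstable`, the parameters `min_snapshots` and `max_ratio`, and the use of a ratio of distinct values to observations as the stability test are assumptions made for this sketch.

```python
def prune_unstable(value_counts, min_snapshots=5, max_ratio=0.8):
    """Drop asset names whose values show no stable pattern.

    value_counts: asset name -> {value: times seen}.
    An asset is pruned only after it has been observed `min_snapshots` times
    and the ratio of distinct values to observations exceeds `max_ratio`.
    Both thresholds are illustrative, not taken from the embodiment.
    """
    kept = {}
    for name, counts in value_counts.items():
        observed = sum(counts.values())
        if observed >= min_snapshots and len(counts) / observed > max_ratio:
            continue  # e.g. a log file whose hash differs on every machine
        kept[name] = counts
    return kept
```

An asset that has not yet been observed often enough is retained, which corresponds to evaluating the values initially and eliminating them only after a specified number of evaluations.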

An embodiment of the present invention is not limited to eliminating asset values from the adaptive reference model 402. In one embodiment, the process also applies to the asset names. Certain asset names are “unique by nature”, that is they are unique to a particular machine but they are a byproduct of normal operation. In one embodiment, a separate process handles unstable asset names. This process in such an embodiment identifies asset names that are unique by nature and allows them to stay in the model so that they are not reported as anomalies.

The second layer shown in Figure 4 is the cluster layer 408. The cluster layer 408 tracks relationships between asset names. An asset name can apply to a variety of entities including a file name, a registry key name, a port number, a process name, a service name, a performance counter name, or a hardware characteristic. When a particular set of asset names is generally present in tandem on the machines in a managed population (114), the cluster layer 408 is able to flag an anomaly when a member of the set of asset names is absent.

For example, many applications on a computer executing a Microsoft Windows operating system require a multitude of dynamic link libraries (DLL). Each DLL will often depend on one or more other DLL’s. If the first DLL is present, then the other DLL’s must be present as well. The cluster layer 408 tracks this dependency and if one of the DLL’s is missing or altered, the cluster layer 408 alerts the administrator that an anomaly has occurred.
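By way of non-limiting illustration, the detection of a missing cluster member might be sketched as follows. The function name `missing_cluster_members` and the representation of a cluster as a frozen set of asset names are assumptions made for this sketch; how the clusters are learned is not shown.

```python
def missing_cluster_members(snapshot_names, clusters):
    """Return asset names absent from a snapshot although their cluster
    is partially present.

    snapshot_names: set of asset names found in one snapshot.
    clusters: iterable of frozensets of asset names that normally appear together.
    """
    anomalies = []
    for cluster in clusters:
        present = cluster & snapshot_names
        # A cluster that is entirely absent is not flagged at this layer;
        # only a partially present cluster indicates a missing member.
        if present and present != cluster:
            anomalies.extend(sorted(cluster - present))
    return anomalies
```

In the DLL example above, a snapshot containing two of three interdependent libraries would yield the third as an anomaly.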

The third layer in the adaptive reference model 402 shown in Figure 4 is the profile layer 410. The profile layer 410 in the embodiment shown detects anomalies based on violations of cluster relationships. There are two types of relationships, associative (the clusters appear together) and exclusionary (the clusters never appear together). The profile layer 410 allows the adaptive reference model to detect missing assets not detected by the cluster layer as well as conflicts between assets. The profile layer 410 determines which clusters have strong associative and exclusionary relationships with one another. In such an embodiment, if a particular cluster is not detected in a snapshot where it would normally be expected by virtue of the presence of other clusters with which it has strong associative relationships, then the profile layer 410 detects the absence of that cluster as an anomaly.

Likewise, if a cluster is detected in a snapshot where it would not normally be expected because of the presence of other clusters with which it has strong exclusionary relationships, then the profile layer 410 detects the presence of the first cluster as an anomaly. The profile layer 410 allows the adaptive reference model 402 to detect anomalies that would not be detectable at the lower levels of the silo 404.
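By way of non-limiting illustration, the associative and exclusionary checks of the profile layer might be sketched as follows. The function name `profile_anomalies`, the cluster names used in the usage note, and the representation of relationships as ordered pairs are assumptions made for this sketch.

```python
def profile_anomalies(clusters_present, associative, exclusionary):
    """Check a snapshot's clusters against strong cluster relationships.

    clusters_present: set of cluster names detected in the snapshot.
    associative: pairs (a, b) read as "when a is present, b is expected".
    exclusionary: pairs (a, b) read as "when a is present, b is not expected".
    Returns a list of ("missing" | "conflict", cluster name) tuples.
    """
    anomalies = []
    for a, b in associative:
        if a in clusters_present and b not in clusters_present:
            anomalies.append(("missing", b))
    for a, b in exclusionary:
        if a in clusters_present and b in clusters_present:
            anomalies.append(("conflict", b))
    return anomalies
```

For instance, with a hypothetical associative pair `("office", "office_updater")` and a hypothetical exclusionary pair `("av_vendor_a", "av_vendor_b")`, a snapshot containing `office`, `av_vendor_a`, and `av_vendor_b` would produce one missing-cluster anomaly and one conflict.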

The adaptive reference model 402 shown in Figure 4 may be implemented in various ways that are well known to those skilled in the art. By optimizing the processing of the adaptive reference model 402 and by providing sufficient processing and storage resources, an embodiment of the present invention is able to support an unlimited number of managed populations and individual clients. Both the assimilation of a new model and the use of the model in checking involve the comparison of hundreds of thousands of attribute names and values. Performing these comparisons using the text strings for the names and values is a very demanding processing task. In one embodiment of the present invention, every unique string in an incoming snapshot is assigned an integer identifier. The comparisons are then performed using the integer identifiers rather than the strings. Because computers can compare integers much faster than the long strings associated with file names or registry key names, processing efficiency is greatly enhanced.
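By way of non-limiting illustration, the assignment of integer identifiers to unique strings might be sketched as follows. The class name `StringTable` and the function `encode_snapshot` are assumptions made for this sketch.

```python
class StringTable:
    """Hypothetical table mapping each unique string to a small integer."""

    def __init__(self):
        self._ids = {}       # string -> integer identifier
        self._strings = []   # integer identifier -> string

    def intern(self, text):
        """Return the identifier for `text`, assigning one on first sight."""
        ident = self._ids.get(text)
        if ident is None:
            ident = len(self._strings)
            self._ids[text] = ident
            self._strings.append(text)
        return ident

    def lookup(self, ident):
        """Recover the original string, e.g. for reporting an anomaly."""
        return self._strings[ident]


def encode_snapshot(snapshot, table):
    """Replace every asset name and value in a snapshot with its identifier."""
    return {table.intern(name): table.intern(value)
            for name, value in snapshot.items()}
```

Once snapshots and the model share one table, each name or value comparison in this sketch is a single integer comparison, and the strings are consulted only when results are presented.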

The adaptive reference model 402 relies on data from the Agent component (202). The functionality of the Agent component (202) is described above, which is a functional summary of the user interface and the Agent component (202) in one embodiment of the present invention.

An embodiment of the present invention is able to compare registry entries across the client machines in a managed population. One difficulty in comparing registry keys across different machines running a Microsoft Windows operating system derives from the use of a Global Unique Identifier (“GUID”). A GUID for a particular item on one machine may differ from the GUID for the same item on a second machine. Accordingly, an embodiment of the present system provides a mechanism for normalizing the GUID’s for comparison purposes.

Figure 5 is a flow chart illustrating a process of normalizing registry information on a client in one embodiment of the present invention. In the embodiment shown, the GUID’s are first grouped into two groups 502. The first group is for GUID’s that are non-unique (duplicated) across machines in the managed population. The second group includes GUID’s that are unique across machines, i.e., the same key has a different GUID on different machines within the managed population. The keys for the second group are next sorted 504. In this way, the relationship among two or more keys within the same machine can be identified. The intent is to normalize such relationships in a way that will allow them to be compared across multiple machines.

The embodiment shown next creates a hash for the values in the keys 506. This creates a unique signature for all the names, pathnames, and other values contained in the key. The hash is then substituted for the GUID 508. In this manner, uniqueness is maintained within the machine, but the same hash appears in every machine so that the relationship can be identified. The relationship allows the adaptive reference model to identify anomalies within the managed population.
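By way of non-limiting illustration, the substitution of a hash for a machine-specific GUID might be sketched as follows. The function name `normalize_key`, the choice of SHA-256, the truncation of the digest, and the sorting of the key's values before hashing are all assumptions made for this sketch; the embodiment does not name a hash function, and the grouping step 502 is not shown.

```python
import hashlib
import re

# Matches a GUID in the usual braced 8-4-4-4-12 hexadecimal form.
GUID_RE = re.compile(
    r"\{[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}\}"
)


def normalize_key(path, values):
    """Replace each GUID in a registry key path with a hash of the key's values.

    path: registry key path that may contain a machine-specific GUID.
    values: list of strings (names, pathnames, data) contained in the key.
    """
    joined = "\n".join(sorted(values)).encode("utf-8")
    digest = hashlib.sha256(joined).hexdigest()[:16]  # truncation is arbitrary
    return GUID_RE.sub("{" + digest + "}", path)
```

Two machines that hold the same values under different GUID’s yield the same normalized path in this sketch, so the key can be compared across the managed population.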

For example, conventional viruses often change registry keys so that the infected machine will run the executable that spreads the virus. An embodiment of the present invention is capable of identifying the changes to the registry in one or more machines of the population due to its ability to normalize registry keys.

Figure 6 is a flow chart illustrating a method for identifying and responding to an anomaly in one embodiment of the present invention. In the embodiment shown, a processor, such as the Collector component (108), receives a plurality of snapshots from a plurality of computers 602. Although the following discussion describes the process shown in Figure 6 as being performed by the Analytic component (110), any suitable processor may perform the process shown. The plurality of snapshots may comprise as few as two snapshots from two computers. Alternatively, the plurality of snapshots may comprise thousands of snapshots. The snapshots comprise data about computers in a population to be examined. For example, the plurality of snapshots may be received from each of the computers in communication with an organization’s local area network. Each snapshot comprises a collection of asset/value pairs that represent the state of a computer at a particular point in time.

As the Collector component (108) receives the snapshots, it stores them 604. Storing the snapshots may comprise storing them in a data store, such as in database (112) or in memory (not shown). The snapshots may be stored temporarily or permanently. Also, in one embodiment of the present invention, the entire snapshot is stored in a data store. In another embodiment, only the portions of the snapshot that have changed from a prior version are stored (i.e., a delta snapshot).

The Analytic component (110) utilizes the data in the plurality of snapshots to create an adaptive reference model 606. Each of the snapshots comprises a plurality of assets, which comprise a plurality of pairs of asset names and asset values. An asset is an attribute of a computer, such as a file name, a registry key name, a performance parameter, or a communication port. The assets reflect a state of a computer, actual or virtual, within the population of computers analyzed. An asset value is the state of an asset at a particular point in time. For example, for a file, the value may comprise an MD5 hash that represents the contents of the file; for a registry key, the value may comprise a text string that represents the data assigned to the key.

The adaptive reference model also comprises a plurality of assets. The assets of the adaptive reference model may be compared to the assets of a snapshot to identify anomalies and for other purposes. In one embodiment, the adaptive reference model comprises a collection of data about various relationships between assets that characterize one or more normal computers at a particular point in time.

In one embodiment, the Analytic component (110) identifies a cluster of asset names. A cluster comprises one or more non-overlapping groups of asset names that appear together. The Analytic component (110) may also attempt to identify relationships among the clusters. For example, the Analytic component (110) may compute a matrix of probabilities that predict, given the existence of a particular cluster in a snapshot, the likelihood of the existence of any other cluster in the snapshot. Probabilities that are based on a large number of snapshots and are either very high (e.g. greater than 95%) or very low (e.g. less than 5%) can be used by the model to detect anomalies. Probabilities that are based on a small number of snapshots (i.e. a number that is not statistically significant) or that are neither very high nor very low are not used to detect anomalies.
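By way of non-limiting illustration, the matrix of probabilities and the rule for deciding which entries may be used might be sketched as follows. The function names `cluster_cooccurrence` and `usable` and the `min_support` value of 20 snapshots are assumptions made for this sketch; only the 95% and 5% figures come from the description above.

```python
from itertools import permutations


def cluster_cooccurrence(snapshots):
    """Estimate P(b present | a present) for every ordered pair of clusters.

    snapshots: list of sets of cluster names, one set per snapshot.
    Returns {(a, b): (probability, number of snapshots containing a)}.
    """
    names = set().union(*snapshots) if snapshots else set()
    matrix = {}
    for a, b in permutations(sorted(names), 2):
        with_a = [s for s in snapshots if a in s]
        both = sum(1 for s in with_a if b in s)
        matrix[(a, b)] = (both / len(with_a), len(with_a))
    return matrix


def usable(probability, support, min_support=20, high=0.95, low=0.05):
    """A relationship is used only if well supported and very high or very low."""
    return support >= min_support and (probability > high or probability < low)
```

In this sketch a very high usable entry corresponds to an associative relationship and a very low usable entry to an exclusionary one; entries with few supporting snapshots or with middling probabilities are ignored.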

The adaptive reference model may comprise a confidence criterion for determining when a relationship can be used to test a snapshot. For example, the confidence criterion may comprise a minimum threshold for a number of snapshots contained in the adaptive reference model. If the threshold is not exceeded, the relationship will not be used. The adaptive reference model may also or instead comprise a minimum threshold for a number of snapshots contained in the adaptive reference model that include the relationship, utilizing the relationship only if the threshold is exceeded. In one embodiment, the adaptive reference model comprises a maximum threshold for a ratio of the number of different asset values to the number of snapshots containing the asset values. The adaptive reference model may comprise one or more minimum and maximum thresholds associated with numeric asset values.

Each of the plurality of assets in the adaptive reference model or in a snapshot may be associated with an asset type. The asset type may comprise, for example, a file, a registry key, a performance measure, a service, a hardware component, a running process, a log, and a communication port. Other asset types may also be utilized by embodiments of the present invention. In order to conserve space, the asset names and asset values may be compressed. For instance, in one embodiment of the present invention, the Collector component (108) identifies the first occurrence of an asset name or asset value in one of the plurality of snapshots and generates an identifier associated with that first occurrence. Subsequently, if the Collector component (108) identifies a second occurrence of the asset name or asset value, the Collector component (108) associates the identifier with the second asset name and asset value. The identifier and asset name or asset value can then be stored in an index, while only the identifier is stored with the data in the adaptive reference model or snapshot. In this way, space necessitated to store frequently repeated asset names or values is minimized.

The adaptive reference model may be automatically generated. In one embodiment, the adaptive reference model is generated automatically and then manually revised to account for knowledge of technical support personnel or others. Figure 11 is a screen shot of a user interface for creating an adaptive reference model in one embodiment of the present invention. In the embodiment shown, a user selects the snapshots to be included in the model by moving them from the Machine Selection Menu window 1102 to the Machines in Task window 1104. When the user completes the selection process and clicks the Finish button 1106 an automated task is created that causes the model to be generated. Once the model has been created, the user can use another interface screen to manage it. Figure 12 is a screen shot of a user interface for managing an adaptive reference model in one embodiment of the present invention.

Referring again to Figure 6, once the adaptive reference model has been created, the Analytic component (110) compares at least one of the plurality of snapshots to the adaptive reference model 608. For example, the Collector component (108) may receive and store in the Database component (112) one hundred snapshots. The Analytic component (110) uses the one hundred snapshots to create an adaptive reference model. The Analytic component (110) then begins comparing each of the snapshots in the plurality of snapshots to the adaptive reference model. At some time later the Collector component (108) may receive 100 new snapshots from the Agent components, which can then be used by the Analytic component to generate a revised version of the adaptive reference model and to identify anomalies.

In one embodiment of the present invention, the comparison of one or more snapshots to an adaptive reference model comprises examining relationships among asset names. For instance, the probability of existence for a first asset name may be high when a second asset name is present. In one embodiment, the comparison comprises determining whether all of the asset names in a snapshot exist within the adaptive reference model and are consistent with a plurality of high probability relationships among asset names.

Referring still to Figure 6, in one embodiment, the Analytic component (110) compares the snapshot to the adaptive reference model in order to identify any anomalies that may be present on a computer 610. An anomaly is an indication that some portion of a snapshot deviates from normal as defined by the adaptive reference model. For example, an asset name or value may deviate from the normal asset name and asset value expected in a particular situation as defined by an adaptive reference model. The anomaly may or may not signal that a known or new trouble or problem condition exists on or in relation to the computer with which the snapshot is associated. A condition is a group of anomalies that are related. For example, a group of anomalies may be related because they arise from a single root cause. For example, an anomaly may indicate the presence of a particular application on a computer when that application is not generally present on the other computers within a given population. Recognition of anomalies may also be used for functions such as capacity balancing. For instance, by evaluating performance measures of several servers, the Analytic component (110) is able to determine when to trigger the automatic deployment and configuration of a new server to address changing demands.

A condition comprises a group of related anomalies. For example, a group of anomalies may be related because they arise from a single root cause, such as installation of an application program or the presence of a “worm.” A condition may comprise a condition class. The condition class allows various conditions to be grouped with one another.

In the embodiment shown in Figure 6, if an anomaly is found, the Analytic component (110) attempts to match the anomaly to a recognition filter in order to diagnose a condition 612. The anomaly may be identified as a benign anomaly in order to eliminate noise during analysis, i.e., in order to avoid obscuring real trouble conditions because of the presence of anomalies that are the result of normal operating processes. A check is a comparison of a snapshot to an adaptive reference model. A check may be automatically performed. The output of a check may comprise a set of anomalies and conditions that have been detected. In one embodiment, the anomaly is matched to a plurality of recognition filters. A recognition filter comprises a signature of a condition or of a class of conditions.

For example, a recognition filter may comprise a collection of pairs of asset names and values that, when taken together, represent the signature of a condition that is desirable to recognize, such as the presence of a worm. A generic recognition filter may provide a template for creating more specific filters. For example, a recognition filter that is adapted to search for worms in general may be adapted to search for a specific worm.

In one embodiment of the present invention, a recognition filter comprises at least one of an asset name associated with the condition, an asset value associated with the condition, a combination of asset name and asset value associated with the condition, a maximum threshold associated with an asset value and with the condition, and a minimum threshold associated with an asset value and with the condition. Asset name/value pairs from a snapshot may be compared to the name/value pairs from the recognition filter to find a match and diagnose a condition. The name/value matching may be exact or the recognition filter may comprise a wildcard, allowing a partial value to be entered in the recognition filter and then matched with the snapshot. A particular asset name and/or value may be matched to a plurality of recognition filters in order to diagnose a condition.

A recognition filter may be created in various ways. For example, in one embodiment of the present invention, a user copies the anomalies from a machine where the condition of interest is present. The anomalies may be presented in an anomaly summary from which they can be selected and copied to the filter. In another embodiment, a user enters a wildcard character in a filter definition. For example, one piece of spyware called Gator generates thousands of registry keys that start with the string “hklm\software\gator\”. An embodiment of the present invention may provide a wildcard mechanism to efficiently deal with this situation. The wildcard character may be, for example, the percent sign (%), and may be used before a text string, after a text string, or in the middle of a text string. Continuing the Gator example, if the user enters the string “hklm\software\gator\%” in the filter body, then any key starting with “hklm\software\gator” will be recognized by the filter. The user may wish to construct a filter for a condition that has not yet been experienced in the managed population. For example, the user may wish to construct a filter for a virus based on publicly available information on the Internet rather than an actual instance of the virus within the managed population. To address this situation the user enters the relevant information directly into a filter.
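By way of non-limiting illustration, a percent-sign wildcard of the kind described above might be evaluated as follows. The function names `wildcard_to_regex` and `filter_matches`, the translation to a regular expression, and the case-insensitive comparison are assumptions made for this sketch.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a filter string using % as a wildcard into a compiled regex.

    The % may appear before, after, or in the middle of the text. All other
    characters, including backslashes, are matched literally.
    """
    literal_parts = (re.escape(part) for part in pattern.split("%"))
    # Case-insensitive matching is an assumption made for registry key names.
    return re.compile("^" + ".*".join(literal_parts) + "$",
                      re.IGNORECASE | re.DOTALL)


def filter_matches(pattern, asset_name):
    """Return True if the asset name satisfies the wildcard pattern."""
    return wildcard_to_regex(pattern).match(asset_name) is not None
```

With the Gator string from the example above as the pattern, any key beneath that prefix is recognized by a single filter entry rather than thousands of exact entries.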

Figure 13 is a screen shot of a user interface for selecting a snapshot to use for creation of a recognition filter in one embodiment of the present invention. A user accesses the screen shot shown to select snapshots to be used to create the recognition filter. Figure 14 is a screen shot of a user interface for creating or editing a recognition filter in one embodiment of the present invention. In the embodiment shown, assets from the snapshot selected in the interface illustrated in Figure 13 are displayed in the Data Source window 1402. The user selects these assets and copies them to the Source window 1404 to create the recognition filter.

In one embodiment, the match between a recognition filter and a set of anomalies is associated with a quality measure. For example, an exact match of all of the asset names and asset values in the recognition filter with asset names and asset values in the set of anomalies may be associated with a higher quality measure than a match of a subset of the asset names and asset values in the recognition filter with asset names and asset values in the set of anomalies.
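By way of non-limiting illustration, one possible quality measure is the fraction of the recognition filter’s name/value pairs that are found in the set of anomalies. The function name `match_quality` and this particular formula are assumptions made for this sketch; the embodiment requires only that an exact match rank higher than a partial match.

```python
def match_quality(filter_pairs, anomaly_pairs):
    """Fraction of the filter's (asset name, asset value) pairs found
    among the anomalies; 1.0 means every pair in the filter matched."""
    wanted = set(filter_pairs)
    if not wanted:
        return 0.0
    return len(wanted & set(anomaly_pairs)) / len(wanted)
```

A match of every pair in the filter scores higher in this sketch than a match of only a subset, so competing filters can be ranked when more than one applies to the same anomalies.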

The recognition filter may comprise other attributes as well. For example, in one embodiment, the recognition filter comprises a control flag for determining whether to include the asset name and the asset value in the adaptive reference model. In another embodiment, the recognition filter comprises one or more textual descriptions associated with one or more conditions. In yet another embodiment, the recognition filter comprises a severity indicator that indicates the severity of a condition in terms of, for example, how much damage it may cause, how difficult it may be to remove, or some other suitable measure.

The recognition filter may comprise fields that are administrative in nature. For example, in one embodiment, the recognition filter comprises a recognition filter identifier, a creator name, and an update date-time.

Still referring to Figure 6, the Analytic component (110) next responds to the condition 614. Responding to the condition may comprise, for example, generating a notification, such as an email to a support technician, submitting a trouble ticket to a problem management system, requesting permission to take an action, for instance, asking for confirmation from a support technician to install a patch, and removing the condition from at least one of the plurality of computers. Removing the condition may comprise, for example, causing a response agent to be executed in any of the plurality of computers affected by the condition. The condition may be associated with an automatic response. The steps of diagnosing 612 and responding to conditions 614 may be repeated for each condition. Also, the process of finding anomalies 610 may be repeated for each individual snapshot.

In the embodiment shown in Figure 6, the Analytic component (110) next determines whether additional snapshots are to be analyzed 616. If so, the steps of comparing the snapshot to the adaptive reference model 608, finding anomalies 610, matching the anomalies to a recognition filter to diagnose a condition 612, and responding to the condition 614 are repeated for each snapshot. Once all of the snapshots have been analyzed, the process ends 618.

In one embodiment of the present invention, once the Analytic component (110) has identified a condition, the Analytic component (110) attempts to determine which of the plurality of computers within a population are affected by the condition. For example, the Analytic component (110) may examine the snapshots to identify a particular set of anomalies. The Analytic component (110) may then cause a response to the condition to be executed on behalf of each of the affected computers. For example, in one embodiment, an Agent component (202) resides on each of the plurality of computers. The Agent component (202) generates the snapshot that is evaluated by the Analytic component (110). In one such embodiment, the Analytic component (110) utilizes the Agent component (202) to execute a response program if the Analytic component (110) identifies a condition on one of the computers. In diagnosing a condition, the Analytic component (110) may or may not be able to identify a root cause of a condition.

Figure 7 is a flow chart illustrating a process for identifying certain types of anomalies in one embodiment of the present invention. In the embodiment shown, the Analytic component (110) evaluates snapshots for a plurality of computers 702. These snapshots can be base snapshots that comprise the complete state of the computer or delta snapshots that comprise the changes in the state of the computer since the last base snapshot. The Analytic component (110) uses the snapshots to create an adaptive reference model 704. Note that when using delta snapshots for this purpose, the Analytic component must first reconstitute the equivalent of a base snapshot by applying the changes described in the delta snapshot to the most recent base snapshot. The Analytic component (110) subsequently receives a second snapshot (base or delta) for at least one of the plurality of computers 706. The snapshot may be created based on various events, such as the passage of a predetermined amount of time, the installation of a new program, or some other suitable event.
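By way of non-limiting illustration, the reconstitution of a base snapshot from a delta snapshot might be sketched as follows. The function name `apply_delta` and the layout of a delta as a dictionary with `"set"` and `"removed"` entries are assumptions made for this sketch; the embodiment does not define a delta format.

```python
def apply_delta(base, delta):
    """Rebuild a full snapshot from the most recent base snapshot and a delta.

    base: dict of asset name -> asset value (complete state).
    delta: {"set": {name: value, ...}, "removed": [name, ...]}
           (a hypothetical layout for the recorded changes).
    The base snapshot is left unmodified.
    """
    result = dict(base)
    for name in delta.get("removed", []):
        result.pop(name, None)            # asset no longer present
    result.update(delta.get("set", {}))   # new or changed assets
    return result
```

The reconstituted snapshot can then be assimilated or checked in the same manner as a base snapshot.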

The Analytic component (110) compares the second snapshot to the adaptive reference model to attempt to detect anomalies. Various types of anomalies may exist on a computer. In the embodiment shown, the Analytic component (110) first attempts to identify asset names that are unexpectedly absent 710. For example, all or substantially all of the computers within a population may include a particular file. The existence of the file is noted in the adaptive reference model by the presence of an asset name. If the file is unexpectedly absent from one of the computers within the population, i.e., the asset name is not found, some condition may be affecting the computer on which the file is missing. If the asset name is unexpectedly absent, the absence is identified as an anomaly 712. For example, an entry identifying the computer, date, and unexpectedly absent asset may be entered in a data store.

The Analytic component (110) next attempts to identify asset names that are unexpectedly present 714. The presence of an unexpected asset name, such as a file name or registry entry, may indicate the presence of a trouble condition, such as a computer worm. An asset name is unexpectedly present if it has never been seen before or if it has never been seen before in the context in which it is found. If the asset name is unexpectedly present, the presence is identified as an anomaly 720.

The Analytic component (110) next attempts to identify an unexpected asset value 718. For example, in one embodiment, the Analytic component (110) attempts to identify a string asset value that is unknown for the asset name associated with it. In another embodiment, the Analytic component (110) compares a numerical asset to minimum or maximum thresholds associated with the corresponding asset name. In embodiments of the present invention, the thresholds may be set automatically based upon the mean and standard deviation for asset values within a population. According to the embodiment shown, if an unexpected asset value is detected, it is identified as an anomaly 720. The process then ends 722.

Although the process in Figure 7 is shown as a serial process, the comparison of a snapshot to the adaptive reference model and the identification of anomalies may occur in parallel. Also, each of the steps depicted may be repeated numerous times. Further, either delta snapshots or base snapshots can be compared to the adaptive reference model during each cycle.
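The three checks described for Figure 7 can be sketched as follows. This is a simplified illustration rather than the implementation of the Analytic component (110): the `ReferenceModel` class, its fields, and the anomaly labels are hypothetical, and the "context" in which an asset name is seen is not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple, Union

AssetValue = Union[str, float]


@dataclass
class ReferenceModel:
    """Hypothetical, simplified stand-in for the adaptive reference model."""
    expected_names: Set[str] = field(default_factory=set)   # on (nearly) all machines
    known_names: Set[str] = field(default_factory=set)      # seen at least once
    known_strings: Dict[str, Set[str]] = field(default_factory=dict)
    thresholds: Dict[str, Tuple[float, float]] = field(default_factory=dict)


def find_anomalies(snapshot: Dict[str, AssetValue],
                   model: ReferenceModel) -> List[Tuple[str, str]]:
    anomalies: List[Tuple[str, str]] = []
    # Check 1: asset names that are unexpectedly absent.
    for name in sorted(model.expected_names - set(snapshot)):
        anomalies.append(("unexpectedly absent", name))
    for name in sorted(snapshot):
        value = snapshot[name]
        if name not in model.known_names:
            # Check 2: asset names that are unexpectedly present.
            anomalies.append(("unexpectedly present", name))
        elif isinstance(value, str):
            # Check 3a: string value unknown for this asset name.
            known = model.known_strings.get(name)
            if known is not None and value not in known:
                anomalies.append(("unknown string value", name))
        elif name in model.thresholds:
            # Check 3b: numerical value outside the min/max thresholds.
            low, high = model.thresholds[name]
            if not low <= value <= high:
                anomalies.append(("value outside thresholds", name))
    return anomalies


# Illustrative data only.
model = ReferenceModel(
    expected_names={"kernel.dll"},
    known_names={"kernel.dll", "svc.mode", "mem.paging"},
    known_strings={"svc.mode": {"auto", "manual"}},
    thresholds={"mem.paging": (0.0, 50.0)},
)
snapshot = {"svc.mode": "disabled", "mem.paging": 120.0, "worm.exe": "hash-x"}
anomalies = find_anomalies(snapshot, model)
```

Each check is independent of the others, which is consistent with the observation that the comparison may be performed in parallel rather than serially.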

Once an analysis has been completed, the Analytic component (110) may generate a result, such as an anomaly report. This report may further be provided to a user. For instance, the Analytic component (110) may generate a web page comprising the results of a comparison of a snapshot with an adaptive reference model. Embodiments of the present invention may provide a means for performing automated security audits, file and registry integrity checking, anomaly-based virus detection, and automated repair.

Figure 8 is a flow chart illustrating a process for generating an adaptive reference model in one embodiment of the present invention. In the embodiment shown, the Analytic component (110) accesses a plurality of snapshots from a plurality of computers via the Database component. Each of the snapshots comprises a plurality of pairs of asset names and asset values. The Analytic component (110) automatically creates an adaptive reference model that is based, at least in part, on the snapshots.

The adaptive reference model may comprise any of a number of attributes, relationships, and measures of the various asset names and values. In the embodiment shown in Figure 8, the Analytic component (110) first finds one or more unique asset names and then determines the number of times each unique asset name occurs within the plurality of snapshots 804. For example, a file for a basic operating system driver may occur on substantially all the computers within a population. The file name is a unique asset name; it will appear only once within a snapshot but will likely occur in substantially all of the snapshots.

In the embodiment shown, the Analytic component (110) next determines the unique asset values associated with each asset name 806. For example, the file name asset for the driver described in relation to step 804 will likely have the same value for every occurrence of the file name asset. In contrast, the file value for a log file will likely have as many different values as occurrences, i.e., a log file on any particular computer will contain a different number of entries from every other computer in a population.

Since the population may be very large, in the embodiment shown in Figure 8, if the number of unique values associated with an asset name exceeds a threshold 808, the determination is halted 810. In other words, in the example of the log file described above, whether or not the computer is in a normal state does not depend on a log file having a consistent value. The log file contents are expected to vary on each computer. Note, however, that the presence or absence of the log file may be stored in the adaptive reference model as an indication of normalcy or of an anomaly.

In the embodiment shown in Figure 8, the Analytic component (110) next determines the unique string asset values associated with each asset name 812. For example, in one embodiment, there are only two types of asset values, strings and numbers. File hashes and registry key values are examples of strings; a performance counter value is an example of a number.

The Analytic component (110) next determines a statistical measure associated with unique numerical values associated with an asset name 814. For example, in one embodiment, the Analytic component (110) captures a performance measure, such as memory paging. If one computer in a population often pages memory, it may be an indication that a rogue program is executing in the background and requiring substantial memory resources. However, if every or a sizeable number of computers in a population often page memory, it may indicate that the computers are generally lacking in memory resources. In one embodiment, the Analytic component (110) determines a mean and a standard deviation for numerical values associated with a unique asset name. In the memory example, if the measure of memory paging for one computer falls far outside the statistical mean for the population, an anomaly may be identified.
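The model-building steps of Figure 8 can be illustrated with a short sketch. It assumes that each snapshot is a mapping from asset name to a string or numeric value; the dictionary layout, the `max_unique` cutoff, and the multiplier `k` used to derive thresholds are hypothetical choices, not values taken from the specification. The population standard deviation is used here as one possible statistical measure.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple, Union

AssetValue = Union[str, float]


def build_model(snapshots: List[Dict[str, AssetValue]],
                max_unique: int = 100) -> Dict[str, dict]:
    """Count asset names, collect unique string values, summarise numbers."""
    model: Dict[str, dict] = {}
    for snap in snapshots:
        for name, value in snap.items():
            entry = model.setdefault(
                name, {"count": 0, "strings": set(), "numbers": [], "tracked": True})
            entry["count"] += 1                      # occurrences of the asset name
            if isinstance(value, str):
                if entry["tracked"]:
                    entry["strings"].add(value)
                    if len(entry["strings"]) > max_unique:
                        # Too many distinct values (e.g. a log file): stop
                        # tracking values but keep the presence count.
                        entry["strings"].clear()
                        entry["tracked"] = False
            else:
                entry["numbers"].append(float(value))
    for entry in model.values():
        numbers = entry.pop("numbers")
        if numbers:
            entry["mean"] = mean(numbers)
            entry["stdev"] = pstdev(numbers)
    return model


def thresholds(entry: dict, k: float = 3.0) -> Tuple[float, float]:
    """Hypothetical min/max thresholds: mean +/- k standard deviations."""
    return entry["mean"] - k * entry["stdev"], entry["mean"] + k * entry["stdev"]


# Illustrative data only.
snapshots = [
    {"disk.sys": "h1", "app.log": "l1", "mem.paging": 10.0},
    {"disk.sys": "h1", "app.log": "l2", "mem.paging": 12.0},
    {"disk.sys": "h1", "app.log": "l3", "mem.paging": 14.0},
]
model = build_model(snapshots, max_unique=2)
low, high = thresholds(model["mem.paging"], k=1.0)
```

In this example the log file exceeds the cutoff, so only its presence is retained, while the driver file keeps its single known value and the paging measure is summarised by its mean and standard deviation.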

In one embodiment of the present invention, the adaptive reference model may be modified by applying a policy template. A policy template is a collection of asset/value pairs that are identified and applied to an adaptive reference model to establish a norm that reflects a specific policy. For example, the policy template may comprise a plurality of pairs of asset names and asset values that are expected to be present in a normal computer. In one embodiment, applying the policy template comprises modifying the adaptive reference model so that the pairs of asset names and asset values present in the policy template appear to have been present in each of the plurality of snapshots, i.e., appear to be the normal state of a computer in the population.
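Applying a policy template can be sketched as follows, assuming a hypothetical model layout in which each asset name maps to an occurrence count and a set of known values. The function name `apply_policy_template` and the layout are illustrative assumptions only.

```python
from typing import Dict


def apply_policy_template(model: Dict[str, dict],
                          template: Dict[str, str],
                          snapshot_count: int) -> Dict[str, dict]:
    """Return a copy of the model in which every template pair looks as if
    it had been present in each of the `snapshot_count` snapshots."""
    shaped = {name: {"count": entry["count"], "values": set(entry["values"])}
              for name, entry in model.items()}
    for name, value in template.items():
        # Force the template pair into the model as the norm.
        shaped[name] = {"count": snapshot_count, "values": {value}}
    return shaped


# Illustrative data only: anti-virus service observed in two states.
model = {"av.service": {"count": 2, "values": {"stopped", "running"}}}
template = {"av.service": "running", "fw.enabled": "1"}
shaped = apply_policy_template(model, template, snapshot_count=10)
```

After the template is applied, a machine on which the service is stopped, or on which the second asset is missing, no longer matches the norm and would be flagged by a subsequent check.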

Figure 15 is a screen shot illustrating a user interface for selecting a “golden system” for use in a policy template in one embodiment of the present invention. As described above, the user first selects the golden system on which the policy template is to be based. Figure 16 is a screen shot of a user interface for selecting policy template assets in one embodiment of the present invention. As with the user interface for creating recognition filters, the user selects assets from a Data Source window 1602 and copies them to a contents window, the Template Contents window 1604.

Figure 9 is a flow chart illustrating a process for proactive anomaly detection in one embodiment of the present invention. In the embodiment shown, when analysis occurs, the Analytic component (110) establishes a connection to the database (112) that stores snapshots to be analyzed 902. In the embodiment shown, only one database is utilized. However, in other embodiments, data from multiple databases may be analyzed.

Before diagnostic checks are executed, one or more reference models are created 904. Reference models are updated periodically, e.g., once per week, to ensure that the information that they contain remains current. One embodiment of the present invention provides a task scheduler that allows model creation to be configured as a completely automated procedure.

Once a reference model has been created, it can be processed in various ways to enable different types of analysis. For example, it is possible to define a policy template 906 as described above. For instance, a policy template may require that all machines in a managed population have anti-virus software installed and operational. Once a policy template has been applied to a model, diagnostic checks against that model will include a test for policy compliance. Policy templates can be used in a variety of applications including automated security audits, performance threshold checking, and Windows update management. A policy template comprises the set of assets and values that will be forced into the model as the norm. In one embodiment, the template editing process is based on a "golden system" approach. A golden system is one that exhibits the assets and values that a user wishes to incorporate into the template. The user locates the snapshot that corresponds to the golden system and then selects each asset/value pair that the user wishes to include in the template.

In the process shown in Figure 9, the policy template is then applied to a model to modify its definition of normal 908. This allows the model to be shaped in ways that allow it to check for compliance against user-defined policies as described herein.

A model may also be converted 910. The conversion process alters a reference model. For example, in one embodiment, the conversion process removes from the model any information assets that are unique, i.e., any assets that occur in one and only one snapshot. When a check is executed against a converted model, all unique information assets will be reported as anomalies. This type of check is useful in surfacing previously unknown trouble conditions that exist at the time the Agent components are first installed. Converted models are useful in establishing an initial baseline since they expose unique characteristics. For this reason, converted models are sometimes called baseline models in embodiments of the present invention.
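The conversion step can be illustrated with a brief sketch. It assumes that a model can be reduced to a count of the snapshots in which each asset name occurs; the function names `convert_model` and `unique_assets` are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def convert_model(snapshots: List[Dict[str, object]]) -> Dict[str, int]:
    """Build a converted (baseline) model that omits assets occurring in
    one and only one snapshot."""
    counts = Counter(name for snap in snapshots for name in snap)
    return {name: n for name, n in counts.items() if n > 1}


def unique_assets(snapshot: Dict[str, object],
                  converted: Dict[str, int]) -> List[str]:
    """Assets of one machine that the converted model does not know,
    which a check would therefore report as anomalies."""
    return sorted(name for name in snapshot if name not in converted)


# Illustrative data only.
snapshots = [
    {"kernel.dll": "h1", "office.exe": "h2"},
    {"kernel.dll": "h1", "odd_tool.exe": "h9"},
    {"kernel.dll": "h1", "office.exe": "h2"},
]
baseline = convert_model(snapshots)
flagged = unique_assets(snapshots[1], baseline)
```

Because the unique asset is removed from the model, checking the machine that contributed it reports that asset as an anomaly, which is how a baseline check surfaces conditions that already existed at installation time.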

In another embodiment, the model building process removes from the model any information assets that match a recognition filter, ensuring that known trouble conditions do not get incorporated into the model. When the system is first installed, the managed population quite often contains a number of known trouble conditions that have not yet been noticed. It is important to discover these conditions and remove them from the model since otherwise they will be incorporated into the adaptive reference model as part of the normal state for a machine.

The Agent component (202) takes a snapshot of the state of each managed machine on a scheduled basis 924. The snapshot is transmitted and entered into the database as a snapshot. Snapshots may also be generated on demand or in response to a specific event such as application installation.

In the proactive problem management process shown, a periodic check of the latest snapshots against an up-to-date reference model is performed 912. The output of a periodic check is a set of anomalies, which are displayed to a user as results 914. The results also include any conditions that are identified as a result of matching the anomalies to recognition filters. Recognition filters may be defined as described above 916. The anomalies are passed through the recognition filters for interpretation, resulting in a set of conditions. Conditions can range in severity from something as benign as a Windows update to something as serious as a Trojan.
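Passing anomalies through recognition filters can be sketched as follows. The matching rule used here, that a filter recognizes a condition when every one of its asset-name patterns matches at least one anomalous asset, is an assumption made for illustration, as are the `RecognitionFilter` fields and the use of shell-style wildcards from Python's `fnmatch` module.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List


@dataclass
class RecognitionFilter:
    """Hypothetical, reduced form of a recognition filter."""
    condition: str             # textual description of the condition
    severity: str              # severity indicator
    asset_patterns: List[str]  # asset names, optionally with wildcards


def diagnose(anomalous_assets: List[str],
             filters: List[RecognitionFilter]) -> List[str]:
    """Return the conditions whose filters match the set of anomalies."""
    conditions = []
    for flt in filters:
        if flt.asset_patterns and all(
                any(fnmatchcase(asset, pattern) for asset in anomalous_assets)
                for pattern in flt.asset_patterns):
            conditions.append(flt.condition)
    return conditions


# Illustrative data only; the file and registry names are made up.
filters = [
    RecognitionFilter("Example worm", "high",
                      ["C:/windows/worm*.exe", "HKLM/Run/worm"]),
    RecognitionFilter("Operating system update", "low",
                      ["C:/windows/kb*.log"]),
]
anomalous = ["C:/windows/worm32.exe", "HKLM/Run/worm", "C:/temp/x.tmp"]
conditions = diagnose(anomalous, filters)
```

Anomalies that match no filter, such as the temporary file in this example, remain in the results as uninterpreted anomalies.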

The trouble conditions that can occur in a computer change as the hardware and software components that make up that computer evolve. Consequently, there is a continuous need to define and share new recognition filters as new combinations of anomalies are discovered. Recognition filters can be thought of as a very detailed and structured way to document trouble conditions, and as such they represent an important mechanism to facilitate collaboration. The embodiment shown comprises a mechanism for exporting recognition filters to an XML file and importing recognition filters from an XML file.
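An export and import round trip of this kind can be sketched with Python's standard `xml.etree.ElementTree` module. The XML layout below (a `recognitionFilters` root with `filter` and `asset` elements) is entirely hypothetical; the specification does not describe the schema of the XML file.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def filters_to_xml(filters: List[Dict]) -> str:
    """Serialise filters to an XML string (hypothetical schema)."""
    root = ET.Element("recognitionFilters")
    for flt in filters:
        node = ET.SubElement(root, "filter",
                             name=flt["name"], severity=flt["severity"])
        for asset in flt["assets"]:
            ET.SubElement(node, "asset").text = asset
    return ET.tostring(root, encoding="unicode")


def filters_from_xml(text: str) -> List[Dict]:
    """Parse filters back from the XML produced by filters_to_xml."""
    root = ET.fromstring(text)
    return [
        {
            "name": node.get("name"),
            "severity": node.get("severity"),
            "assets": [asset.text for asset in node.findall("asset")],
        }
        for node in root.findall("filter")
    ]


# Illustrative data only.
original = [
    {"name": "Example worm", "severity": "high",
     "assets": ["C:/windows/worm*.exe", "HKLM/Run/worm"]},
]
xml_text = filters_to_xml(original)
restored = filters_from_xml(xml_text)
```

A text-based interchange format of this sort lets filters defined at one site be shared with and imported at another.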

Once conditions are identified, reports documenting the results of a proactive check are generated 920. The reports may comprise, for example, a summary description of all conditions detected or a detailed description of a particular condition.

Figure 10 is a flow chart illustrating a reactive process in one embodiment of the present invention. In the process shown in Figure 10, it is assumed that an adaptive reference model has already been created. The process shown begins when a user calls a help desk to report a problem 1002. In the traditional help desk paradigm, the next step would be to verbally collect information about the symptoms being experienced by the user. In contrast, in the embodiment of the present invention shown, the next step is to run a diagnostic check of the suspect machine against the most recent snapshot 1003. If this does not produce an immediate diagnosis of a problem condition, three possibilities may exist: (1) the condition has occurred since the last snapshot was taken; (2) the condition is new and is not being recognized by its filters; or (3) the condition is outside the scope of analysis, e.g., a hardware problem.

If it is suspected that the trouble condition has occurred since the last snapshot was taken, then the user may cause the Agent component (202) on the client machine to take another snapshot 1006. Once the resulting snapshot is available, a new diagnostic check can be executed 1004.

If it is suspected that the trouble condition is new, the analyst may execute a compare function that provides a breakdown of the changes in the state of a machine over a specific window of time, such as new applications that may have been installed 1008. The user may also view a detailed representation of the state of a machine at various points in time 1010. If the analyst identifies a new trouble condition, the user can identify the set of assets as a recognition filter for subsequent analyses 1012.
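A compare function of the kind described can be sketched as a difference between two snapshots of the same machine. The function name `compare_snapshots` and the three-part result are illustrative assumptions; snapshots are again modelled as mappings from asset name to asset value.

```python
from typing import Dict, Union

AssetValue = Union[str, float]


def compare_snapshots(earlier: Dict[str, AssetValue],
                      later: Dict[str, AssetValue]) -> Dict[str, dict]:
    """Break down the state changes between two points in time."""
    added = {n: later[n] for n in later if n not in earlier}
    removed = {n: earlier[n] for n in earlier if n not in later}
    changed = {n: (earlier[n], later[n])
               for n in earlier
               if n in later and earlier[n] != later[n]}
    return {"added": added, "removed": removed, "changed": changed}


# Illustrative data only: one application installed, one setting changed.
monday = {"kernel.dll": "h1", "proxy.setting": "off", "old_tool.exe": "h3"}
friday = {"kernel.dll": "h1", "proxy.setting": "on", "new_app.exe": "h7"}
diff = compare_snapshots(monday, friday)
```

The assets listed under `added` and `changed` are natural candidates for the set of assets from which an analyst might define a new recognition filter.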

While conventional products have focused on enhancing the efficiency of the human-based support model, embodiments of the present invention are designed around a different paradigm, a machine-based support model. This fundamental difference in approach manifests itself most profoundly in the areas of data collection and analysis. Since a machine rather than a human will perform much of the analysis of the data collected, the data collected can be voluminous. For example, in one embodiment, the data collected from a single machine, referred to as the “health check” or snapshot for the machine, includes values for hundreds of thousands of attributes. The ability to collect a large volume of data provides embodiments of the present invention with a significant advantage over conventional systems in terms of the number and variety of conditions that can be detected.

Another embodiment of the present invention provides a powerful analytic capability. The foundation for high value analysis in such an embodiment is the ability to accurately distinguish between normal and abnormal conditions. For example, one system according to the present invention synthesizes its reference model automatically by mining statistically significant relationships from the snapshot data that it collects from its clients. The resulting “adaptive” reference model defines what is normal for that particular managed population at that particular moment in time.

One embodiment of the present invention combines the data collection and adaptive analysis features described above. In such an embodiment, the superior data collection capabilities combined with the analytic power of the adaptive reference model translate into a number of significant competitive advantages, including the capability of providing automatic protection against security threats by conducting daily security audits and checking for software updates to eliminate vulnerabilities. Such an embodiment may also be capable of proactively scanning all managed systems on a routine basis to find problems before they result in lost productivity or calls to the help desk.

An embodiment of the present invention implementing the adaptive reference model capabilities is also able to detect previously unknown trouble conditions. Further, such an embodiment is automatically synthesized and maintained, requiring little or no vendor updates to be effective. Such an embodiment is automatically customized to a particular managed population, enabling it to detect failure modes unique to that population.

An additional advantage of an embodiment of the present invention is that in the event that a trouble condition cannot be resolved automatically, such an embodiment can provide a massive amount of structured technical information to facilitate the job of the support analyst.

One embodiment of the present invention provides the capability of automatically repairing an identified problem. Such an embodiment, when combined with the adaptive reference model of the previously described embodiment, is uniquely capable of automated repair because of its ability to identify all aspects of a trouble condition.

Embodiments of the present invention also provide many advantages over conventional systems and methods in terms of the service levels described herein. For example, in terms of the Mass-Healing service level, it is considerably less expensive to prevent an incident than it is to resolve an incident once damage has occurred. Embodiments of the present invention substantially increase the percentage of incidents that can be detected/prevented without the need for human intervention and in a manner that embraces the diverse and dynamic nature of computers in real world environments.

Further, an embodiment of the present invention is able to address the Self-Healing service level by automatically detecting and repairing both known and unknown anomalies. An embodiment implementing the adaptive reference model described herein is uniquely suited to automatic detection and repair. The automatic service and repair also helps to eliminate or at least minimize the need for Self-Service and Desk-side Visits.

Embodiments of the present invention provide advantages at the Assisted Service level by providing superior diagnostic capabilities and extensive information resources. An embodiment collects and analyzes massive amounts of end-user data, facilitating a variety of needs associated with the human-based support model, including security audits, configuration audits, inventory management, performance analysis, and trouble diagnosis.

The foregoing description of embodiments of the invention has been presented only for the purpose of illustration and description and is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Numerous modifications and adaptations thereof will be apparent to those skilled in the art without departing from the spirit and scope of the present invention.

Claims

That which is claimed:

1. A method comprising: receiving a plurality of snapshots from a plurality of computers; storing the plurality of snapshots in a data store; creating an adaptive reference model based at least in part on the plurality of snapshots; comparing at least one of the plurality of snapshots to the adaptive reference model; and identifying at least one anomaly based on the comparison.

2. The method of claim 1, further comprising matching the at least one anomaly to at least one recognition filter to diagnose a condition on at least one of the plurality of computers.

3. The method of claim 2, further comprising responding to the condition.

4. The method of claim 3, wherein responding to the condition comprises at least one of: generating a notification; submitting a trouble ticket to a problem management system; requesting permission to take an action; and removing the condition from at least one of the plurality of computers.

5. The method of claim 4, wherein removing the condition comprises causing a repair program to be executed on the at least one of the plurality of computers affected by the condition.

6. The method of claim 2, further comprising: determining which of the plurality of computers are affected by the condition; and causing a response to the condition to be executed on behalf of each of the plurality of computers affected by the condition.

7. The method of claim 2, wherein diagnosing the condition comprises identifying a root cause of the at least one anomaly.

8. The method of claim 2, wherein the at least one recognition filter comprises at least one of: an asset name associated with the condition; an asset value associated with the condition; a combination of asset name and asset value associated with the condition; a maximum threshold associated with an asset value and with the condition; and a minimum threshold associated with an asset value and with the condition.

9. The method of claim 8, wherein the at least one recognition filter further comprises a wildcard.

10. The method of claim 8, wherein the at least one recognition filter comprises a control flag for determining whether to include the asset name and the asset value in the adaptive reference model.

11. The method of claim 2, wherein the at least one recognition filter comprises a textual description of the condition.

12. The method of claim 2, wherein the at least one recognition filter further comprises a severity indicator associated with the condition.

13. The method of claim 2, wherein the at least one recognition filter further comprises: a recognition filter identifier; a creator name; and an update date-time.

14. The method of claim 2, wherein the at least one recognition filter is associated with a link to an automated response for the condition.

15. The method of claim 2, wherein the condition comprises a condition class.

16. The method of claim 2, wherein matching the at least one anomaly to the at least one recognition filter comprises identifying an asset name associated with the at least one anomaly that is present in a plurality of recognition filters.

17. The method of claim 2, wherein the at least one recognition filter comprises a plurality of recognition filters and wherein matching the at least one anomaly to the at least one recognition filter comprises identifying an asset value in the at least one anomaly that matches an asset value in the plurality of recognition filters.

18. The method of claim 2, further comprising determining a quality of a match between the at least one recognition filter and the at least one anomaly.

19. The method of claim 1, wherein each of the plurality of snapshots comprises a plurality of assets associated with one of the plurality of computers.

20. The method of claim 19, wherein each of the plurality of assets comprises an asset name and an asset value.

21. The method of claim 19, wherein each of the plurality of assets is associated with an asset type.

22. The method of claim 21, wherein the asset type comprises one of a file, a registry key, a performance measure, a service, a hardware component, a running process, a log, and a communication port.

23. The method of claim 1, wherein the at least one anomaly comprises at least one of the following: an asset name that is unexpectedly present; an asset name that is unexpectedly absent; a string asset value that is unknown for an asset name associated with the string asset value; a numerical asset value that is less than a minimum threshold or more than a maximum threshold for an asset name associated with the numerical asset value.

24. The method of claim 1, wherein comparing at least one of the plurality of snapshots to the adaptive reference model comprises: generating a result; and providing the result to a user.

25. The method of claim 1, wherein each of the plurality of snapshots comprises a plurality of pairs of asset names and asset values, and wherein comparing the at least one of the plurality of snapshots to the adaptive reference model comprises: comparing each asset name to the adaptive reference model to identify an unexpectedly present asset name; comparing each asset name to the adaptive reference model to identify an unexpectedly absent asset name; and comparing each asset value to the adaptive reference model to identify an unknown or unusual asset value.

26. The method of claim 25, wherein comparing the at least one of the plurality of snapshots to the adaptive reference model further comprises determining whether an asset value falls below a minimum threshold or exceeds a maximum threshold.

27. The method of claim 25, wherein comparing the at least one of the plurality of snapshots to the adaptive reference model further comprises determining whether all of the asset names in the at least one of the plurality of snapshots exist within the adaptive reference model and are consistent with a plurality of high probability relationships among asset names.

28. The method of claim 25, wherein comparing the at least one of the plurality of snapshots to the adaptive reference model further comprises determining a confidence level associated with an analysis scope measure and a diagnostic accuracy measure.

29. The method of claim 1, wherein the plurality of snapshots comprises: a base snapshot associated with one of the plurality of computers, the base snapshot comprising a plurality of asset names and asset values that represent the state of the one of the plurality of computers at a particular point in time; and a delta snapshot associated with the one of the plurality of computers, the delta snapshot comprising a plurality of asset names and asset values that have changed since the base snapshot was created.

30. The method of claim 1, wherein the plurality of snapshots are created by a software agent that resides on each of the plurality of computers.

31. The method of claim 30, further comprising collecting asset information for one of the plurality of snapshots according to at least one of: a predetermined schedule and an occurrence of a predetermined event.

32. The method of claim 1, further comprising: identifying a first occurrence of an asset name or asset value in one of the plurality of snapshots; generating an identifier associated with the first occurrence of the asset name or asset value; associating the identifier with the first occurrence of the asset name or asset value; identifying a second occurrence of the asset name or asset value in the plurality of asset/value pairs; and associating the identifier with the second occurrence of the asset name or asset value.

33. A computer-readable medium on which is encoded program code, the program code comprising: program code for receiving a plurality of snapshots from a plurality of computers; program code for storing the plurality of snapshots in a data store; program code for creating an adaptive reference model based at least in part on the plurality of snapshots; program code for comparing at least one of the plurality of snapshots to the adaptive reference model; and program code for identifying at least one anomaly based on the comparison.

34. The computer-readable medium of claim 33, further comprising program code for matching the at least one anomaly to at least one recognition filter to diagnose a condition on at least one of the plurality of computers.

35. The computer-readable medium of claim 34, further comprising program code for responding to the condition.

36. The computer-readable medium of claim 35, wherein program code for responding to the condition comprises at least one of: program code for generating a notification; program code for submitting a trouble ticket to a problem management system; program code for requesting permission to take an action; and program code for removing the condition from at least one of the plurality of computers.

37. The computer-readable medium of claim 36, wherein program code for removing the condition comprises program code for causing a repair program to be executed on the at least one of the plurality of computers affected by the condition.

38. The computer-readable medium of claim 34, further comprising: program code for determining which of the plurality of computers are affected by the condition; and program code for causing a response to the condition to be executed on behalf of each of the plurality of computers affected by the condition.

39. The computer-readable medium of claim 34, wherein program code for diagnosing the condition comprises program code for identifying a root cause of the at least one anomaly.

40. The computer-readable medium of claim 34, wherein program code for matching the at least one anomaly to the at least one recognition filter comprises program code for identifying an asset name associated with the at least one anomaly that is present in a plurality of recognition filters.

41. The computer-readable medium of claim 34, wherein the at least one recognition filter comprises a plurality of recognition filters and wherein program code for matching the at least one anomaly to the at least one recognition filter comprises program code for identifying an asset value in the at least one anomaly that matches an asset value in the plurality of recognition filters.

42. The computer-readable medium of claim 34, further comprising program code for determining a quality of a match between the at least one recognition filter and the at least one anomaly.

43. The computer-readable medium of claim 33, wherein program code for comparing at least one of the plurality of snapshots to the adaptive reference m_odel comprises: progzram code for generating a result; and program code for providing the result to a user.

44. The computer-readable medium of claim 33, wherein each of the plurality of snapshots comprises a plurality of pairs of asset names and asset walues, and wherein program co«de for comparing the at least one of the plurality of snapshots to the adaptive reference model comprises: program code for comparing each asset name to the adaptive reference model to identify an wanexpectedly present asset name; progoram code for comparing each asset name to the adaptive reference model to : identify an vanexpectedly absent asset name; and program code for comparing each asset value to the adapti ve reference model to identify an vanknown or unusual asset value.

45. The <omputer-readable medium of claim 44, wherein program code for comparing the at least one Of the plurality of snapshots to the adaptive reference smodel further comprises program code for determining whether an asset value falls below a minimum threshold or exceeds a maximum threshold.

46. The computer-readable medium of claim 44, wherein progzam code for comparing the at least one of the plurality of snapshots to the adaptive reference rmodel further comprises program cod e for determining whether all of the asset names in thes at least one of the plurality of snapshots exist within the adaptive reference model and are consistent with a plurality of leigh probability relationships among asset names.

47. The computer-readable medium of claim 44, wherein program code for comparing the at least one of the plurality of snapshots to the adaptive reference model further comprises program code for determining a confidence level associated with an analysis scope measure and a diagnostic accuracy measure.

48. The computer-readable medium of claim 33, wherein the plurality of snapshots are created by a software agent that resides on each of a plurality of computers.

49. The computer-readable medium of claim 48, further comprising program code for collecting asset information for one of the plurality of snapshots according to at least one of: a predetermined schedule and an occurrence of a predetermined event.

50. The computer-readable medium of claim 33, further comprising: program code for identifying a first occurrence of an asset name or asset value in one of the plurality of snapshots; program code for generating an identifier associated with the first occurrence of the asset name or asset value; program code for associating the identifier with the first occurrence of the asset name or asset value; program code for identifying a second occurrence of the asset name or asset value in the plurality of asset/value pairs; and program code for associating the identifier with the second occurrence of the asset name or asset value.

51. A method of detecting abnormal system states in computers, comprising: receiving snapshots from a plurality of computers within a population of computers, wherein individual snapshots include data indicating a state of a respective computer; automatically generating an adaptive reference model comprising a rule set customized to characteristics of the population of computers, the rule set being developed by identifying patterns among the snapshots from the plurality of computers such that the adaptive reference model is indicative of normal states in the computers within the population; and comparing a snapshot from at least one of the computers to the adaptive reference model to determine whether an anomaly is present in the state of the at least one of the computers.

52. The method of claim 51, further comprising: comparing the anomaly to a recognition filter to diagnose a trouble condition on the at least one of the computers; and in the event of a trouble condition, generating an automated response to the trouble condition.

53. The method of claim 52, wherein the recognition filter comprises a particular pattern of anomalies that indicates the presence of a particular root cause condition or a generic class of conditions.

Amended 25 May 2007

54. The method of claim 51, further comprising: comparing a plurality of anomalies associated with a particular snapshot with a recognition filter to diagnose a trouble condition; and diagnosing a trouble condition on the at least one of the computers in response to at least a subset of the plurality of anomalies matching information in the recognition filter.

55. The method of claim 51, wherein individual snapshots include data associated with at least one of: system files, application files, a registry entry, a performance counter, a process, a communication port, a hardware configuration, a log file, a running task, services, and network connections.

56. The method of claim 51, further comprising: manually inserting rules into the rule set of the adaptive reference model to augment or override rules of the rule set automatically generated from the snapshots.

57. The method of claim 51, wherein the adaptive reference model is generated to include a value layer that determines whether an asset value contained in a snapshot is anomalous.

58. The method of claim 51, wherein the adaptive reference model is generated to include a cluster layer that tracks relationships between assets and identifies an anomaly in response to an asset being unexpectedly absent from or present in a set of assets in a snapshot.

59. The method of claim 51, wherein the adaptive reference model is generated to include a profile layer that identifies anomalies in response to violation of relationships of clusters of assets in a snapshot.

60. A system for detecting abnormal system states in computers, comprising: a plurality of software agents respectively residing on a plurality of computers within a population of computers, the software agents generating snapshots that include data indicating the state of the respective computers; and an analytic component operable to automatically generate an adaptive reference model comprising a rule set customized to characteristics of the population of computers, the rule set being developed by identifying patterns among the snapshots from the plurality of computers such that the adaptive reference model is indicative of normal states in the computers within the population, wherein the analytic component compares a snapshot from at least one of the computers to the adaptive reference model to determine whether an anomaly is present in the state of the at least one of the computers.

61. The system of claim 60, wherein the analytic component compares the anomaly to a recognition filter to diagnose a trouble condition on the at least one of the computers.

62. The system of claim 61, wherein the analytic component compares the trouble condition to a response agent library and generates an automated response to the trouble condition.

63. The system of claim 61, wherein the recognition filter comprises a particular pattern of anomalies that indicates the presence of a particular root cause condition or a generic class of conditions.

64. The system of claim 60, wherein the analytic component compares a plurality of anomalies associated with a particular snapshot with a recognition filter to diagnose a trouble condition; and diagnoses a trouble condition on the at least one of the computers in response to at least a subset of the plurality of anomalies matching information in the recognition filter.

65. The system of claim 60, wherein individual snapshots include data associated with at least one of: system files, application files, a registry entry, a performance counter, a process, a communication port, a hardware configuration, a log file, a running task, services, and network connections.

66. The system of claim 60, wherein the analytic component manually inserts rules into the rule set of the adaptive reference model to augment or override rules of the rule set automatically generated from the snapshots.

67. The system of claim 60, wherein the adaptive reference model includes a value layer that determines whether an asset value contained in a snapshot is anomalous.

68. The system of claim 60, wherein the adaptive reference model includes a cluster layer that tracks relationships between assets and identifies an anomaly in response to an asset being unexpectedly absent from or present in a set of assets in a snapshot.

69. The system of claim 60, wherein the adaptive reference model includes a profile layer that identifies anomalies in response to violation of relationships of clusters of assets in a snapshot.

70. A computer readable medium storing instructions, that when executed by a computer, cause the computer to perform functions of: receiving snapshots from a plurality of computers within a population of computers, wherein individual snapshots include data indicating a state of a respective computer; automatically generating an adaptive reference model comprising a rule set customized to characteristics of the population of computers, the rule set being developed by identifying patterns among the snapshots from the plurality of computers such that the adaptive reference model is indicative of normal states in the computers within the population; and comparing a snapshot from at least one of the computers to the adaptive reference model to determine whether an anomaly is present in the state of the at least one of the computers.

71. The computer readable medium of claim 70, further comprising instructions, that when executed, cause the computer to compare the anomaly to a recognition filter to diagnose a trouble condition on the at least one of the computers, and, in the event of a trouble condition, to generate an automated response to the trouble condition.

72. The computer readable medium of claim 70, wherein the instructions for comparing the snapshot to the adaptive reference model include instructions for determining whether an asset value contained in a snapshot is anomalous.

73. The computer readable medium of claim 70, wherein the instructions for comparing the snapshot to the adaptive reference model include instructions for tracking relationships between assets and identifying an anomaly in response to an asset being unexpectedly absent from or present in a set of assets in a snapshot.

74. The computer readable medium of claim 70, wherein the instructions for comparing the snapshot to the adaptive reference model include instructions for identifying anomalies in response to violation of relationships of clusters of assets in a snapshot.
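
Illustrative sketch (not part of the claims). The comparison recited in claims 44 and 51, and the recognition filter matching of claims 42 and 52 to 54, might be pictured with the minimal Python sketch below. Everything in it is an assumption made for illustration: the function names, the dictionary layout of a snapshot, the 0.8 frequency cutoff, and the sample asset names are hypothetical and do not come from the specification, which does not prescribe any particular implementation.

```python
from collections import Counter


def build_reference_model(snapshots, min_fraction=0.8):
    """Derive a toy rule set from snapshots (dicts of asset name -> asset value).

    Assumption: an asset is 'expected' if it appears in at least
    min_fraction of the population's snapshots.
    """
    total = len(snapshots)
    name_counts = Counter(name for snap in snapshots for name in snap)
    expected = {n for n, c in name_counts.items() if c / total >= min_fraction}
    values = {}
    for snap in snapshots:
        for name, value in snap.items():
            values.setdefault(name, set()).add(value)
    return {"known": set(name_counts), "expected": expected, "values": values}


def find_anomalies(snapshot, model):
    """Compare one snapshot to the model and return a set of anomalies."""
    anomalies = set()
    for name, value in snapshot.items():
        if name not in model["known"]:
            anomalies.add(("unexpectedly_present", name))
        elif value not in model["values"][name]:
            anomalies.add(("unknown_value", name))
    for name in model["expected"]:
        if name not in snapshot:
            anomalies.add(("unexpectedly_absent", name))
    return anomalies


def match_quality(anomalies, recognition_filter):
    """Fraction of the filter's anomaly pattern found in the snapshot's anomalies."""
    pattern = recognition_filter["pattern"]
    return len(pattern & anomalies) / len(pattern)


# Hypothetical population of three healthy machines.
population = [{"app.dll": "1.0", "svc.exe": "2.1"} for _ in range(3)]
model = build_reference_model(population)

# Hypothetical snapshot from a machine in trouble.
suspect = {"app.dll": "9.9", "trojan.exe": "0.1"}
anomalies = find_anomalies(suspect, model)

# Hypothetical recognition filter: a pattern of anomalies tied to one condition.
trojan_filter = {
    "name": "hypothetical_trojan",
    "pattern": {
        ("unexpectedly_present", "trojan.exe"),
        ("unexpectedly_absent", "svc.exe"),
    },
}
quality = match_quality(anomalies, trojan_filter)
```

In this sketch the suspect snapshot yields three anomalies (an unknown value, an unexpectedly present asset, and an unexpectedly absent asset), and the filter matches on a subset of them, which loosely mirrors the subset matching of claims 54 and 64. A fuller treatment would add the threshold checks of claim 45 and the layered model of claims 57 to 59.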