US7921410B1 - Analyzing and application or service latency - Google Patents

Info

Publication number
US7921410B1
Authority
US
United States
Prior art keywords
latency
transaction
data
normal
components
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US11/784,611
Inventor
Julie A. Symons
Ira Cohen
Gerald T. Wade
John M. Green
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Micro Focus LLC
Original Assignee
Hewlett Packard Development Co LP
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Development Co LP
Priority to US11/784,611
Assigned to HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GREEN, JOHN M., COHEN, IRA, SYMONS, JULIE A., WADE, GERALD T.
Application granted
Publication of US7921410B1
Assigned to HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.
Assigned to ENTIT SOFTWARE LLC: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Assigned to JPMORGAN CHASE BANK, N.A.: SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ARCSIGHT, LLC, ATTACHMATE CORPORATION, BORLAND SOFTWARE CORPORATION, ENTIT SOFTWARE LLC, MICRO FOCUS (US), INC., MICRO FOCUS SOFTWARE, INC., NETIQ CORPORATION, SERENA SOFTWARE, INC.
Assigned to JPMORGAN CHASE BANK, N.A.: SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ARCSIGHT, LLC, ENTIT SOFTWARE LLC
Assigned to MICRO FOCUS LLC: CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: ENTIT SOFTWARE LLC
Assigned to NETIQ CORPORATION, MICRO FOCUS LLC (F/K/A ENTIT SOFTWARE LLC), BORLAND SOFTWARE CORPORATION, SERENA SOFTWARE, INC, MICRO FOCUS (US), INC., MICRO FOCUS SOFTWARE INC. (F/K/A NOVELL, INC.), ATTACHMATE CORPORATION: RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0718. Assignors: JPMORGAN CHASE BANK, N.A.
Assigned to MICRO FOCUS LLC (F/K/A ENTIT SOFTWARE LLC): RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0577. Assignors: JPMORGAN CHASE BANK, N.A.
Active
Adjusted expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452 Performance evaluation by statistical analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3419 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment by assessing time
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3495 Performance evaluation by tracing or monitoring for systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/87 Monitoring of transactions

Definitions

  • Monitoring transaction or job latency is one measure for determining the health of an application or service tasked with performing the transaction (or job).
  • latency is a time delay between the moment a task is initiated and the moment the same task is completed.
  • the task may be a transaction, a job, or a component of such a transaction or job.
  • a transaction latency is response time of the transaction, i.e., the time delay between the moment the transaction is initiated by an application (or service) and the moment such a transaction is completed by the application (or service).
  • FIG. 1 illustrates a block diagram of a system wherein one or more embodiments may be practiced.
  • FIG. 3 illustrates a method for monitoring and analyzing a transaction latency, in accordance with one embodiment.
  • information technology, or IT, encompasses all forms of technology, including but not limited to the design, development, installation, and implementation of hardware and software information or computing systems and software applications, used to create, store, exchange and utilize information in its various forms including but not limited to business data, conversations, still images, motion pictures and multimedia presentations.
  • IT distributed environments may be employed, for example, by Internet Service Providers (ISP), web merchants, and web search engines to provide IT applications and services to users.
  • FIG. 1 illustrates a block diagram of a system 100 for monitoring and analyzing transaction or job latencies of an IT application or service, wherein an embodiment may be practiced.
  • various embodiments are discussed herein with reference to an application and a transaction performed by such an application.
  • the system 100 is operable to automatically induce a model of normality for a transaction latency, automatically produce a ranked list of components for abnormal occurrences, based on the degree of abnormality of each component, and automatically adapt to changes in the normality model.
  • the system 100 may be separate from or incorporated into the distributed system(s) that it monitors.
  • the system 100 includes a data collection module 110 and a latency analysis module 120 .
  • one or more data collectors are employed for the data collection module 110 .
  • a data collector is one or more software programs, software applications or software modules.
  • a software program, application, or module includes one or more machine-coded routines, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the data collector is used to monitor and measure the latency of transactions or jobs that are submitted to an IT application or service as implemented in a distributed system, such as an IT data center or an IT network system. Thus, it monitors the distributed system (not shown) to obtain the latency metrics (data measurements), which include latency metrics of individual components that contribute to the total latency of a transaction or job.
  • the data collector is operable to measure total response time of a transaction and also break down the total response time into the following components: network time, connection time, server time, and transfer time that correspond to the transaction components.
  • Each of the components may include measurable sub-components.
  • server time is made up of time spent in the web server, time spent in the application server, and time spent in the database server.
  • Examples of possible data collectors include but are not limited to: HP Asset and OpenView software from Hewlett Packard Company of Palo Alto, Calif.; BMC Discovery Express from BMC Software, Inc. of Houston, Tex.; and those data collectors available in the VMware CapacityPlanner software and CDAT software from IBM Corporation of Armonk, N.Y.
  • the latency analysis module 120 is also one or more software programs, software applications or software modules. It is operable through automation to statistically characterize normal component latencies of transactions or jobs that are performed by an application/service in a distributed system, to adapt to changes in such characterized normal behavior over time, and to recognize statistically significant changes in component latencies. To that extent, the latency analysis module 120 is operable to receive or provide a definition of normality 130 for latency of some unit of work, such as a transaction or job.
  • FIG. 2 illustrates a block diagram of a computerized system 200 that is operable to be used as a platform for implementing the system 100 , or any one of the modules 110 and 120 therein.
  • the computer system 200 includes one or more processors, such as processor 202 , providing an execution platform for executing software.
  • the computerized system 200 includes one or more single-core or multi-core processors of any of a number of computer processors, such as processors from Intel, AMD, and Cyrix.
  • a computer processor may be a general-purpose processor, such as a central processing unit (CPU) or any other multi-purpose processor or microprocessor.
  • a computer processor also may be a special-purpose processor, such as a graphics processing unit (GPU), an audio processor, a digital signal processor, or another processor dedicated for one or more processing purposes. Commands and data from the processor 202 are communicated over a communication bus 204 or through point-to-point links with other components in the computer system 200 .
  • the computer system 200 also includes a main memory 206 where software is resident during runtime, and a secondary memory 208 .
  • the secondary memory 208 may also be a computer-readable medium (CRM) that may be used to store software programs, applications, or modules that implement the modules 110 and 120 ( FIG. 1 ) and the method 300 ( FIG. 3 , as described below).
  • the main memory 206 and secondary memory 208 (and an optional removable storage unit 214 ) each includes, for example, a hard disk drive and/or a removable storage drive 212 representing a floppy diskette drive, a magnetic tape drive, a compact disk drive, etc., or a nonvolatile memory where a copy of the software is stored.
  • the secondary memory 208 also includes ROM (read only memory), EPROM (erasable, programmable ROM), EEPROM (electrically erasable, programmable ROM), or any other electronic, optical, magnetic, or other storage or transmission device capable of providing a processor or processing unit with computer-readable instructions.
  • the computer system 200 includes a display 220 connected via a display adapter 222 , user interfaces comprising one or more input devices 218 , such as a keyboard, a mouse, a stylus, and the like. However, the input devices 218 and the display 220 are optional.
  • a network interface 230 is provided for communicating with other computer systems via, for example, a network.
  • FIG. 3 illustrates a flow chart diagram of a method 300 for monitoring and analyzing a latency, or response time, of an IT application transaction, in accordance with one embodiment.
  • the method 300 is discussed in the context of the system 100 illustrated in FIG. 1 .
  • inputs are collected for the latency monitoring and analysis. The inputs collected include monitored latency data of a transaction of interest as performed by an application in a distributed system, a definition of normality for the transaction latency, and a latency-ranking policy or rule. Each of these inputs is described below.
  • the transfer time indicates the time it takes for data to be transferred to the source of the transaction request as a result of the processing of the transaction.
  • the data collected in each sample or trace for each latency component includes a measurement that is collected once per each predefined time interval, an average of multiple measurements collected per each predefined time interval, or any other suitable statistics about the measurement for each latency component per each predefined time interval.
  • the transaction latency L 1 may include other latency components, and each latency component may include contributing subcomponents therein.
  • the latency analysis module 120 then receives the collected transaction latency data from the data collection module 110 .
  • the method 300 continues at 312 , wherein the latency analysis module 120 determines whether the collected transaction latency data is normal or abnormal based on the predefined definition of normality. This determination is made for each collected sample of the transaction latency data.
  • if a data sample is determined to be normal, it is added to a training window.
  • the latency analysis module 120 proceeds to determine whether there is a sufficient amount of training data (e.g., number of data samples) in the training window to compute statistics about the normality of the latency components in the latency transaction data. Thus, testing for sufficient amount of training data may be delayed until there is abnormal latency data to analyze.
  • the sufficiency of the training window may be empirically set by a user based on one or more desired criteria, such as whether the training data in the training window is consistent for normal behavior patterns of each latency component of interest or whether there is enough training data for generating a normal distribution for each latency component.
  • the latency analysis module 120 proceeds to statistically compute a normal latency for each latency component of interest in the latency transaction data. In one embodiment, this is achieved by computing a normal distribution of each latency component based on the received data samples in the training window and the mean value and standard deviation value in the normal distribution. The range of normal latency values for each latency component is then based on the mean and standard deviation values of the normal distribution of such a component as desired. For example, in a normal distribution, approximately 68% of the values lie within one standard deviation of the mean, approximately 95% within two standard deviations, and approximately 99.7% within three (3) standard deviations.
  • a latency component is considered normal if its value ranges within one, two, or three standard deviations as desired.
  • Alternative embodiments are contemplated wherein the range of normal latency values for each latency component is based on any other desired statistics about the normal distribution of the latency component, such as percentiles of the normal distribution, or about any other desired variable, such as time, that is associated with the latency component.
  • the data sample collected and determined to be abnormal at 312 is then compared against these statistical computations to rank the latency components in the new data sample based on their degree of abnormality in accordance with the latency-ranking policy collected at 310 .
  • the latency components in an abnormal data sample collected for analysis are of the same respective types as those latency components in the data samples of the training window in order to perform the comparison.
  • the degree of abnormality may be set as desired by the user, as based on the latency-ranking policy, and depends on the amount or percent of difference (increase or decrease) from its normal latency calculated at 318 .
  • the latency analysis module 120 continuously executes the method 300 to receive transaction latency data samples and provide a moving training window at 314 as new data samples are collected and received.
  • each transaction latency data sample includes an indication as to whether it is normal or abnormal based on a determination external to the system 100 .
  • the determination of whether each data sample is normal at 312 is merely based on whether such a data sample carries a normal or abnormal indication, and the alternative embodiment proceeds in accordance with the remainder of the method 300.
  • the methods and systems as described herein are operable to provide automated analysis of transaction or job latencies and specifically pinpoint problematic latency components in each transaction latency, based on the aforementioned component ranking, so that corrective actions may be performed in the monitored distributed system to rectify the problems in the pinpointed latency components.
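The flow described in the items above (classify each sample at 312, add normal samples to a moving training window, check sufficiency, then compute a normal range per component) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `TrainingWindow`, the function `analyze`, the window sizes, the 2-second threshold default, and the choice of a range of `k` standard deviations around the mean are all hypothetical.

```python
from collections import deque
from statistics import mean, stdev
from typing import Deque, Dict, Optional, Tuple


class TrainingWindow:
    """Moving window holding component latencies of normal samples only."""

    def __init__(self, max_samples: int = 288, min_samples: int = 30) -> None:
        if min_samples < 2:
            raise ValueError("at least two samples are needed for a standard deviation")
        # deque(maxlen=...) drops the oldest sample when full (moving window).
        self._samples: Deque[Dict[str, float]] = deque(maxlen=max_samples)
        self.min_samples = min_samples

    def add(self, components: Dict[str, float]) -> None:
        self._samples.append(dict(components))

    def is_sufficient(self) -> bool:
        # Assumed sufficiency criterion: a simple sample count.
        return len(self._samples) >= self.min_samples

    def normal_stats(self) -> Dict[str, Tuple[float, float]]:
        """Return {component: (mean, standard deviation)} over the window."""
        names = self._samples[0].keys()
        return {
            n: (mean([s[n] for s in self._samples]),
                stdev([s[n] for s in self._samples]))
            for n in names
        }


def analyze(total: float, components: Dict[str, float], window: TrainingWindow,
            threshold: float = 2.0, k: float = 2.0) -> Optional[Dict[str, bool]]:
    """Return {component: is_outside_normal_range} for an abnormal sample.

    Returns None when the sample is normal (it is added to the window) or
    when the window does not yet hold enough training data.
    """
    if total < threshold:            # normal sample: train, nothing to rank
        window.add(components)
        return None
    if not window.is_sufficient():   # sufficiency is tested only when needed
        return None
    flags = {}
    for name, (mu, sigma) in window.normal_stats().items():
        low, high = mu - k * sigma, mu + k * sigma
        flags[name] = not (low <= components[name] <= high)
    return flags
```

Abnormal samples are never added to the window in this sketch, so the computed statistics describe normal behavior only.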

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A method for analyzing a latency of a transaction performance is provided. The method includes receiving first transaction latency data which includes a transaction latency of a transaction and a first plurality of latency components that contribute to the transaction latency, receiving a definition of normality for the transaction latency, determining whether the transaction latency is normal or abnormal based at least on the definition of normality, upon the determining that the transaction latency is abnormal, determining whether there is a sufficient amount of the first transaction latency data based on a predefined criterion, upon the determining that the amount of the first transaction latency data is sufficient, computing a normal latency for each of the first plurality of latency components; and ranking the first plurality of latency components based on a degree of abnormality of each of the first plurality of latency components, which is based on the computed normal latency for the each latency component.

Description

BACKGROUND
Monitoring transaction or job latency is one measure for determining the health of an application or service tasked with performing the transaction (or job). As referred to herein, latency is the time delay between the moment a task is initiated and the moment the same task is completed. The task may be a transaction, a job, or a component of such a transaction or job. Thus, for example, a transaction latency is the response time of the transaction, i.e., the time delay between the moment the transaction is initiated by an application (or service) and the moment the transaction is completed by the application (or service). Once a longer-than-normal latency is observed for a transaction, there is a desire to isolate the cause or primary component that is contributing to the longer latency in order to rectify the problem. However, the typical methods of looking at single measures of normal and abnormal latencies make it difficult to accurately assess the problem because such measures are not deterministic and are affected by noise and other external influences.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments are illustrated by way of example and not limitation in the following figure(s), in which like numerals indicate like elements:
FIG. 1 illustrates a block diagram of a system wherein one or more embodiments may be practiced.
FIG. 2 illustrates a block diagram of a computerized system wherein one or more system components may be practiced, in accordance with one embodiment.
FIG. 3 illustrates a method for monitoring and analyzing a transaction latency, in accordance with one embodiment.
DETAILED DESCRIPTION
For simplicity and illustrative purposes, the principles of the embodiments are described by referring mainly to examples thereof. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the embodiments. It will be apparent, however, to one of ordinary skill in the art that the embodiments may be practiced without limitation to these specific details. In other instances, well-known methods and structures have not been described in detail so as not to unnecessarily obscure the embodiments.
Described herein are methods and systems for determining the health or status of an information technology (IT) application or service by monitoring transaction or job latencies of the application (or service), determining normal-latency behaviors of components in the transaction latencies, and identifying those components that contribute most to instances when the transaction latencies are deemed abnormal or unhealthy. The methods and systems as described herein are also operable to monitor a transaction or job latency of an IT application (or service), statistically characterize normal latencies of components of the transaction latency, automatically recognize or identify statistically significant changes in the component latencies, and adapt to changes in such normal-latency behaviors over time. As referred to herein, and as understood in the art, information technology, or IT, encompasses all forms of technology, including but not limited to the design, development, installation, and implementation of hardware and software information or computing systems and software applications, used to create, store, exchange and utilize information in its various forms including but not limited to business data, conversations, still images, motion pictures and multimedia presentations. IT distributed environments may be employed, for example, by Internet Service Providers (ISP), web merchants, and web search engines to provide IT applications and services to users.
System
FIG. 1 illustrates a block diagram of a system 100 for monitoring and analyzing transaction or job latencies of an IT application or service, wherein an embodiment may be practiced. For simplification purposes, various embodiments are discussed herein with reference to an application and a transaction performed by such an application. However, it should be understood that any discussion regarding an application is also applicable to a service, and any discussion regarding a transaction is also applicable to a job or any other tasks performed by an application or service. The system 100 is operable to automatically induce a model of normality for a transaction latency, automatically produce a ranked list of components for abnormal occurrences, based on the degree of abnormality of each component, and automatically adapt to changes in the normality model. The system 100 may be separate from or incorporated into the distributed system(s) that it monitors.
The system 100 includes a data collection module 110 and a latency analysis module 120. In one embodiment, one or more data collectors are employed for the data collection module 110. A data collector is one or more software programs, software applications or software modules. As referred to herein, a software program, application, or module includes one or more machine-coded routines, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The data collector is used to monitor and measure the latency of transactions or jobs that are submitted to an IT application or service as implemented in a distributed system, such as an IT data center or an IT network system. Thus, it monitors the distributed system (not shown) to obtain the latency metrics (data measurements), which include latency metrics of individual components that contribute to the total latency of a transaction or job. For example, the data collector is operable to measure the total response time of a transaction and also break down the total response time into the following components: network time, connection time, server time, and transfer time that correspond to the transaction components. Each of the components may include measurable sub-components. For example, server time is made up of time spent in the web server, time spent in the application server, and time spent in the database server. Examples of possible data collectors include but are not limited to: HP Asset and OpenView software from Hewlett Packard Company of Palo Alto, Calif.; BMC Discovery Express from BMC Software, Inc. of Houston, Tex.; and those data collectors available in the VMware CapacityPlanner software and CDAT software from IBM Corporation of Armonk, N.Y.
In one embodiment, the latency analysis module 120 is also one or more software programs, software applications or software modules. It is operable through automation to statistically characterize normal component latencies of transactions or jobs that are performed by an application/service in a distributed system, to adapt to changes in such characterized normal behavior over time, and to recognize statistically significant changes in component latencies. To that extent, the latency analysis module 120 is operable to receive or provide a definition of normality 130 for latency of some unit of work, such as a transaction or job. It is also operable to receive or provide a normality detection policy 140 for: a) characterizing the normal and abnormal latency for each component of the unit of work, in light of the definition of normality; and b) ranking the work components by their degree of abnormality by comparing the latency measures of each component during an abnormal instance to the latency measures of the same component during times of characterized normal latency.
FIG. 2 illustrates a block diagram of a computerized system 200 that is operable to be used as a platform for implementing the system 100, or any one of the modules 110 and 120 therein. The computer system 200 includes one or more processors, such as processor 202, providing an execution platform for executing software. Thus, the computerized system 200 includes one or more single-core or multi-core processors of any of a number of computer processors, such as processors from Intel, AMD, and Cyrix. As referred herein, a computer processor may be a general-purpose processor, such as a central processing unit (CPU) or any other multi-purpose processor or microprocessor. A computer processor also may be a special-purpose processor, such as a graphics processing unit (GPU), an audio processor, a digital signal processor, or another processor dedicated for one or more processing purposes. Commands and data from the processor 202 are communicated over a communication bus 204 or through point-to-point links with other components in the computer system 200.
The computer system 200 also includes a main memory 206 where software is resident during runtime, and a secondary memory 208. The secondary memory 208 may also be a computer-readable medium (CRM) that may be used to store software programs, applications, or modules that implement the modules 110 and 120 (FIG. 1) and the method 300 (FIG. 3, as described below). The main memory 206 and secondary memory 208 (and an optional removable storage unit 214) each includes, for example, a hard disk drive and/or a removable storage drive 212 representing a floppy diskette drive, a magnetic tape drive, a compact disk drive, etc., or a nonvolatile memory where a copy of the software is stored. In one example, the secondary memory 208 also includes ROM (read only memory), EPROM (erasable, programmable ROM), EEPROM (electrically erasable, programmable ROM), or any other electronic, optical, magnetic, or other storage or transmission device capable of providing a processor or processing unit with computer-readable instructions. The computer system 200 includes a display 220 connected via a display adapter 222, user interfaces comprising one or more input devices 218, such as a keyboard, a mouse, a stylus, and the like. However, the input devices 218 and the display 220 are optional. A network interface 230 is provided for communicating with other computer systems via, for example, a network.
Process
FIG. 3 illustrates a flow chart diagram of a method 300 for monitoring and analyzing a latency, or response time, of an IT application transaction, in accordance with one embodiment. For illustrative purposes only and not to be limiting thereof, the method 300 is discussed in the context of the system 100 illustrated in FIG. 1.
At 310, inputs are collected for the latency monitoring and analysis. The inputs collected include monitored latency data of a transaction of interest as performed by an application in a distributed system, a definition of normality for the transaction latency, and a latency-ranking policy or rule. Each of these inputs is described below.
In one embodiment, the data collection module 110 is employed to monitor and collect the transaction latency data. The collection of the transaction latency data includes a plurality of samples or traces, each collected over a predetermined or predefined time interval (e.g., 5-minute intervals) for a given transaction and is represented by {Tn: L1, c1, c2, c3, . . . , cn}, where Tn denotes each particular time interval n, L1 denotes the collected transaction latency at time Tn, and c1 . . . cn denote the latency components of interest that contribute to the transaction latency L1 at time Tn. For example, the latency of a transaction as performed by an application in a distributed system is caused by at least a network time (c1), a connection time (c2), a server time (c3), and a transfer time (c4). The network time indicates the accumulated time for data to traverse the network of the distributed system in the performance of the transaction by the application. The connection time indicates the accumulated time for the application to complete connections (e.g., handshaking protocols) to various hardware elements (e.g., servers, databases) in the distributed system in order to perform and complete the transaction. The server time indicates the accumulated time for the various hardware elements in the distributed system to perform respective tasks as assigned by the application. The transfer time indicates the time it takes for data to be transferred to the source of the transaction request as a result of the processing of the transaction. Embodiments are contemplated wherein the data collected in each sample or trace for each latency component includes a measurement that is collected once per predefined time interval, an average of multiple measurements collected per predefined time interval, or any other suitable statistics about the measurement for each latency component per predefined time interval.
Also, it should be understood that the transaction latency L1 may include other latency components, and each latency component may include contributing subcomponents therein. The latency analysis module 120 then receives the collected transaction latency data from the data collection module 110.
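As an illustration only, one sample of the form {Tn: L1, c1, . . . , cn} could be held in a small record such as the following Python sketch. The class name LatencySample, the field names, and the numeric values are hypothetical and do not appear in the description above; the four component names follow the example of network, connection, server, and transfer time.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class LatencySample:
    """One trace collected over a single predefined time interval."""
    interval: int                 # index n of the time interval Tn
    latency: float                # overall transaction latency, in seconds
    components: Dict[str, float]  # latency components of interest, in seconds


# Made-up values for one 5-minute interval (assumption, for illustration).
sample = LatencySample(
    interval=1,
    latency=1.5,
    components={
        "network": 0.25,
        "connection": 0.25,
        "server": 0.75,
        "transfer": 0.25,
    },
)

# In this made-up sample the four components account for the whole latency;
# the description notes that other components may contribute as well.
accounted = sum(sample.components.values())
print(accounted)
```

A mapping keyed by component name is used here so that subcomponents or additional components can be added without changing the record layout.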
The definition of normality for the transaction latency is a predefined definition received by the latency analysis module 120. In one embodiment, this definition provides a threshold value for determining whether each received transaction latency is considered normal. For example, the definition of normality provides a threshold value of 2 seconds, wherein a latency or response time of less than 2 seconds for a given transaction is considered normal and greater than or equal to 2 seconds is considered abnormal or problematic. The definition of normality may be user defined and user input to the latency analysis module 120. However, alternative embodiments are contemplated wherein the definition of normality for the transaction latency is provided to the latency analysis module 120 based on other techniques, such as based on historical data of the distributed system. As referred herein, a user is any entity, human or otherwise, that is authorized to access the system 100, operate the system 100, modify the system 100, or perform any combination thereof. An example of a human user is a system operator or administrator. An example of an automated user is a hardware or software module operable to collect historical data of the distributed system performing the given transaction and calculate the definition of normality.
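The threshold form of the definition of normality can be sketched in Python as follows. The function name is_normal and its parameter names are hypothetical; the 2-second default is the example value given above, and the comparison follows the stated rule that a latency equal to the threshold is abnormal.

```python
def is_normal(latency_s: float, threshold_s: float = 2.0) -> bool:
    """Return True when the latency is strictly below the threshold.

    A latency greater than or equal to the threshold is treated as
    abnormal, matching the 2-second example in the description.
    """
    return latency_s < threshold_s


print(is_normal(1.2))  # below the threshold
print(is_normal(2.0))  # equal to the threshold, so abnormal
```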
The latency-ranking policy is a predefined policy received by the latency analysis module 120. In one embodiment, this policy provides instructions on how to rank the latency components of each abnormal transaction latency based on their degree of abnormality. Examples of a latency-ranking policy include standard deviations from the mean (or norm) of each latency component, actual or relative distance from the mean, percentage change from the mean, etc.
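The three example policies can each be viewed as a scoring function over a component's observed value and its computed mean and standard deviation. The Python sketch below is one possible reading, not a form prescribed by the description; the function names, the POLICIES table, and the handling of a zero denominator are assumptions made for illustration.

```python
import math


def _ratio(numerator: float, denominator: float) -> float:
    """Divide, treating a zero denominator as 'no change' or 'unbounded'."""
    if denominator == 0:
        return 0.0 if numerator == 0 else math.inf
    return numerator / denominator


def std_devs_from_mean(value: float, mean: float, std: float) -> float:
    """Number of standard deviations between the value and the mean."""
    return _ratio(abs(value - mean), std)


def distance_from_mean(value: float, mean: float, std: float) -> float:
    """Absolute distance from the mean (std is accepted but unused)."""
    return abs(value - mean)


def percent_change_from_mean(value: float, mean: float, std: float) -> float:
    """Absolute percentage change relative to the mean."""
    return 100.0 * _ratio(abs(value - mean), abs(mean))


# Hypothetical lookup table so a policy can be selected by name.
POLICIES = {
    "std_devs": std_devs_from_mean,
    "distance": distance_from_mean,
    "percent": percent_change_from_mean,
}

score = POLICIES["std_devs"](1.0, 0.5, 0.25)
print(score)
```

All three functions share one signature so that the ranking step can apply whichever policy was received without special cases.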
Referring back to FIG. 3, the method 300 continues at 312, wherein the latency analysis module 120 determines whether the collected transaction latency data is normal or abnormal based on the predefined definition of normality. This determination is made for each collected sample of the transaction latency data.
At 314, if a data sample is determined to be normal, it is added to a training window.
At 316, however, if a data sample is determined to be abnormal, the latency analysis module 120 proceeds to determine whether there is a sufficient amount of training data (e.g., number of data samples) in the training window to compute statistics about the normality of the latency components in the transaction latency data. Thus, testing for a sufficient amount of training data may be delayed until there is abnormal latency data to analyze. The sufficiency of the training window may be empirically set by a user based on one or more desired criteria, such as whether the training data in the training window is consistent for normal behavior patterns of each latency component of interest or whether there is enough training data for generating a normal distribution for each latency component. For example, a training window having 100 samples of transaction latency data collected over 100 time intervals is deemed sufficient for a statistical computation about the normality of the latency components. If there is not sufficient training data in the training window, the method 300 returns to 310 to continue collecting additional samples of the transaction latency data until there is sufficient training data in the training window as determined at 316.
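The branching at 312, 314, and 316 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name handle_sample, the returned status strings, and the use of a plain sample count as the sufficiency criterion are hypothetical, and the defaults of 2 seconds and 100 samples are the example values from the description.

```python
from typing import Dict, List

MIN_TRAINING_SAMPLES = 100  # example sufficiency size from the description


def handle_sample(
    latency_s: float,
    components: Dict[str, float],
    training_window: List[Dict[str, float]],
    threshold_s: float = 2.0,
    min_samples: int = MIN_TRAINING_SAMPLES,
) -> str:
    """Route one sample through steps 312, 314, and 316."""
    # Step 312: classify the sample using the threshold definition.
    if latency_s < threshold_s:
        # Step 314: normal samples extend the training window.
        training_window.append(components)
        return "added_to_training"
    # Step 316: the sufficiency test runs only for abnormal samples.
    if len(training_window) < min_samples:
        return "insufficient_training_data"
    return "ready_to_analyze"


window: List[Dict[str, float]] = []
print(handle_sample(1.0, {"server": 0.5}, window, min_samples=2))
print(handle_sample(3.0, {"server": 2.5}, window, min_samples=2))
print(handle_sample(1.1, {"server": 0.6}, window, min_samples=2))
print(handle_sample(3.0, {"server": 2.5}, window, min_samples=2))
```

The usage lines lower min_samples to 2 only to keep the example short.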
At 318, once there is sufficient training data in the training window, the latency analysis module 120 proceeds to statistically compute a normal latency for each latency component of interest in the transaction latency data. In one embodiment, this is achieved by computing a normal distribution of each latency component based on the received data samples in the training window and the mean value and standard deviation value in the normal distribution. The range of normal latency values for each latency component is then based on the mean and standard deviation values of the normal distribution of such a component as desired. For example, in a normal distribution, about 68% of the values lie within one standard deviation of the mean, about 95% within two standard deviations, and about 99.7% within three (3) standard deviations. Thus, a latency component is considered normal if its value falls within one, two, or three standard deviations of the mean, as desired. Alternative embodiments are contemplated wherein the range of normal latency values for each latency component is based on any other desired statistics about the normal distribution of the latency component, such as percentiles of the normal distribution, or about any other desired variable, such as time, that is associated with the latency component.
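A minimal Python sketch of the computation at 318 is given below, assuming each training sample is a mapping from component name to latency and that the normal range is taken as the mean plus or minus k standard deviations. The function name normal_ranges, the parameter k, and the choice of the sample standard deviation are assumptions; the description does not specify them.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def normal_ranges(
    training_window: List[Dict[str, float]], k: float = 2.0
) -> Dict[str, Tuple[float, float]]:
    """Return (low, high) bounds of normal latency per component.

    Bounds are mean - k*std and mean + k*std, where std is the sample
    standard deviation over the training window (needs >= 2 samples).
    """
    ranges = {}
    for name in training_window[0]:
        values = [sample[name] for sample in training_window]
        mu = mean(values)
        sigma = stdev(values)
        ranges[name] = (mu - k * sigma, mu + k * sigma)
    return ranges


# Made-up training data: three normal samples with a single component.
training = [{"server": 1.0}, {"server": 2.0}, {"server": 3.0}]
ranges = normal_ranges(training, k=2.0)
print(ranges["server"])
```

With these made-up values the mean is 2.0 and the sample standard deviation is 1.0, so the two-standard-deviation range runs from 0.0 to 4.0.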
At 320, once the normal latency of each latency component of interest is statistically computed, the data sample collected and determined to be abnormal at 312 is then compared against these statistical computations to rank the latency components in the new data sample based on their degree of abnormality in accordance with the latency-ranking policy collected at 310. It should be noted that the latency components in an abnormal data sample collected for analysis are of the same respective types as those latency components in the data samples of the training window in order to perform the comparison. The degree of abnormality may be set as desired by the user, as based on the latency-ranking policy, and depends on the amount or percent of difference (increase or decrease) from its normal latency calculated at 318. For example, for a latency-ranking policy based on standard deviations from the mean, if a first latency component has a value in the collected abnormal data sample that is within three standard deviations of the mean and a second latency component has a value in the collected abnormal data sample that is within two standard deviations of the mean, the first latency component is ranked at a higher abnormality level than the second latency component. Thus, the first latency component is deemed to be a bigger contributing factor to the overall abnormal latency transaction sample than the second latency component.
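The ranking at 320 under a policy of standard deviations from the mean can be sketched in Python as follows. The function name rank_components, the data layout, and the numeric values are hypothetical, and a component whose training data has zero spread is scored 0.0 here as a simplifying assumption.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def rank_components(
    abnormal: Dict[str, float], training_window: List[Dict[str, float]]
) -> List[Tuple[str, float]]:
    """Rank components of an abnormal sample, most abnormal first.

    The score is the number of sample standard deviations between the
    component's value and its mean over the training window.
    """
    scores = {}
    for name, value in abnormal.items():
        history = [sample[name] for sample in training_window]
        mu, sigma = mean(history), stdev(history)
        scores[name] = abs(value - mu) / sigma if sigma > 0 else 0.0
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up data: both components have mean 2.0 and standard deviation 1.0.
training = [
    {"network": 1.0, "server": 1.0},
    {"network": 2.0, "server": 2.0},
    {"network": 3.0, "server": 3.0},
]
abnormal_sample = {"network": 4.0, "server": 5.0}
ranking = rank_components(abnormal_sample, training)
print(ranking)
```

In this made-up case the server component lies three standard deviations from its mean and the network component two, so the server component is ranked first.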
In one embodiment, the latency analysis module 120 continuously executes the method 300 to receive transaction latency data samples and provide a moving training window at 314 as new data samples are collected and received. Referring back to the example wherein there are 100 data samples in the training window, the latency analysis module 120 (e.g., as specified by the user) may discard the oldest five, or any desired number, normal samples in the training windows to make room for five new normal data samples, wherein the normal latency for each latency component of interest is computed again at 318 so that up-to-date ranking of the latency components is continuously performed for better accuracy of the latency analysis.
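One simple way to keep such a moving training window is a bounded queue that drops its oldest entries as new ones arrive. The Python sketch below uses collections.deque with a maximum length of 100, the example window size from the description; integers stand in for normal data samples, and the variable names are hypothetical.

```python
from collections import deque

WINDOW_SIZE = 100  # example training-window size from the description

# Integers 0..99 stand in for the 100 oldest-to-newest normal samples.
training_window = deque(range(WINDOW_SIZE), maxlen=WINDOW_SIZE)

# Five new normal samples arrive; because the deque is bounded, extending
# it discards the five oldest entries from the left automatically.
new_normal_samples = [100, 101, 102, 103, 104]
training_window.extend(new_normal_samples)

print(len(training_window), training_window[0], training_window[-1])
```

After each such update, the normal latency for each component would be recomputed from the current window contents, as described for 318.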
In an alternative method to the method 300, the collected inputs at 310 do not include the definition of normality. Instead, each transaction latency data sample includes an indication as to whether it is normal or abnormal based on a determination external to the system 100. Thus, in the alternative method, the determination of whether each data sample is normal at 312 is merely based on whether such a data sample carries a normal or abnormal indication, and the alternative method proceeds in accordance with the remainder of the method 300.
Accordingly, the methods and systems as described herein are operable to provide automated analysis of transaction or job latencies and specifically pinpoint problematic latency components in each transaction latency, based on the aforementioned component ranking, so that corrective actions may be performed in the monitored distributed system to rectify the problems in the pinpointed latency components.
What has been described and illustrated herein is an embodiment along with some of its variations. The terms, descriptions and figures used herein are set forth by way of illustration only and are not meant as limitations. Those skilled in the art will recognize that many variations are possible within the spirit and scope of the subject matter, which is intended to be defined by the following claims—and their equivalents—in which all terms are meant in their broadest reasonable sense unless otherwise indicated.

Claims (20)

1. A method for analyzing a latency of a transaction performance, comprising:
receiving first transaction latency data which includes a transaction latency of a transaction and a first plurality of latency components that contribute to the transaction latency;
receiving a definition of normality for the transaction latency;
determining whether the transaction latency is normal or abnormal based at least on the definition of normality;
upon the determining that the transaction latency is abnormal, determining whether there is a sufficient amount of the first transaction latency data based on a predefined criterion;
upon the determining that the amount of the first transaction latency data is sufficient, computing a normal latency for each of the first plurality of latency components; and
ranking the first plurality of latency components based on a degree of abnormality of each of the first plurality of latency components, which is based on the computed normal latency for the each latency component.
2. The method of claim 1, further comprising:
receiving a policy for ranking the first plurality of latency components based on the degree of abnormality.
3. The method of claim 2, wherein ranking the first plurality of latency components comprises:
ranking the first plurality of latency components based on the policy for ranking.
4. The method of claim 1, wherein receiving the first transaction latency data comprises:
receiving a plurality of transaction latency data samples for a plurality of predefined time intervals, each sample includes therein the transaction latency and the first plurality of latency components monitored at each predefined time interval.
5. The method of claim 1, wherein receiving the definition of normality includes:
receiving a predefined threshold for determining a normality of the transaction latency data.
6. The method of claim 1, wherein receiving the policy for ranking the first plurality of latency components based on the degree of abnormality comprises:
receiving an instruction to rank each of the first plurality of latency components based on an amount of deviation of the each latency component from the computed normal latency of the each latency component.
7. The method of claim 1, wherein determining whether there is a sufficient amount of the transaction latency data comprises:
determining whether the first transaction latency data is sufficient for computing a normal latency for the at least one latency component therein as the predefined criterion.
8. The method of claim 4, wherein determining whether the transaction latency is normal or abnormal includes determining whether the transaction latency of each of the plurality of transaction latency data samples is normal or abnormal; and the method further comprises:
upon the determining that the transaction latency of one of the plurality of transaction latency data samples is normal, adding the one transaction latency data sample to a training window for the determining of whether there is a sufficient amount of the first transaction latency data.
9. The method of claim 1, wherein computing the normal latency for each of the first plurality of latency components comprises:
computing the normal latency based at least on a mean value of a normal distribution for each of the first plurality of latency components.
10. The method of claim 1, further comprising:
determining how much contribution of each of the first plurality of latency components to the abnormal transaction latency of the second transaction latency data based on the ranking.
11. A method for analyzing a latency of a transaction performance, comprising:
receiving first transaction latency data which includes:
a) a transaction latency of a transaction;
b) a first indication that the transaction latency is normal or abnormal; and
c) a first plurality of latency components that contribute to the transaction latency;
determining whether the transaction latency is normal or abnormal based on the first indication in the first transaction latency data;
upon the determining that the transaction latency is abnormal, determining whether there is a sufficient amount of the transaction latency data in a training window;
upon the determining that the amount of the transaction latency data is sufficient in the training window, computing a normal latency for each of the first plurality of latency components based on the data in the training window; and
ranking the first plurality of latency components based on a degree of abnormality of each of the first plurality of latency components, which is based on the computed normal latency for the each latency component.
12. The method of claim 11, wherein receiving the first transaction latency data comprises:
receiving a first plurality of transaction latency data samples, each monitored at a predefined time interval and includes therein the transaction latency and the first plurality of latency components monitored at the predefined time interval.
13. The method of claim 12, wherein determining whether the transaction latency is normal comprises:
determining whether each of the first plurality of transaction latency data samples is normal based on the first indication in the each of the first plurality of transaction data samples.
14. The method of claim 13, wherein determining whether there is a sufficient amount of the first transaction latency data in a training window comprises:
determining whether there is a sufficient number of samples in the first plurality of transaction latency data samples that have therein the first indication of normal.
15. The method of claim 14, further comprising:
moving the training window by replacing a predetermined number of oldest data samples therein with a corresponding number of new data samples received subsequent to the receiving the first transaction latency data.
16. The method of claim 15, wherein computing a normal latency for each of the first plurality of latency components comprises:
computing a normal latency for each of the first plurality of latency components based on the data in the moving training window.
17. The method of claim 16, wherein the degree of abnormality of one of the second plurality of latency components is further based on a value of the latency component in reference to a computed normal latency of the one latency component as based on a latency-ranking policy.
18. The method of claim 17, wherein the first transaction latency data further includes at least one latency sub-component of one of the first plurality of latency components, the at least one latency sub-component contributes to the latency of both the one latency component and the transaction latency in the first transaction latency data.
19. A computer readable medium on which is encoded computer-executable programming code that includes computer execution instructions to:
receive first transaction latency data which includes a transaction latency of a transaction and a first plurality of latency components that contribute to the transaction latency;
receive a definition of normality for the transaction latency;
determine whether the transaction latency is normal or abnormal based at least on the definition of normality;
determine whether there is a sufficient amount of the first transaction latency data based on a predefined criterion upon the determining that the transaction latency is abnormal;
compute a normal latency for each of the first plurality of latency components upon the determining that the amount of the first transaction latency data is sufficient; and
rank the first plurality of latency components based on a degree of abnormality of each of the first plurality of latency components, which is based on the computed normal latency for the each latency component.
20. The computer-readable medium of claim 19, wherein the computer execution instructions to receive the first transaction latency data include:
computer-execution instructions to receive a plurality of transaction latency data samples for a plurality of predefined time intervals, each sample includes therein the transaction latency and the first plurality of latency components monitored at each predefined time interval.
US11/784,611 2007-04-09 2007-04-09 Analyzing and application or service latency Active 2030-02-02 US7921410B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/784,611 US7921410B1 (en) 2007-04-09 2007-04-09 Analyzing and application or service latency

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11/784,611 US7921410B1 (en) 2007-04-09 2007-04-09 Analyzing and application or service latency

Publications (1)

Publication Number Publication Date
US7921410B1 true US7921410B1 (en) 2011-04-05

Family

ID=43805955

Family Applications (1)

Application Number Title Priority Date Filing Date
US11/784,611 Active 2030-02-02 US7921410B1 (en) 2007-04-09 2007-04-09 Analyzing and application or service latency

Country Status (1)

Country Link
US (1) US7921410B1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6061722A (en) * 1996-12-23 2000-05-09 T E Network, Inc. Assessing network performance without interference with normal network operations
US5872976A (en) * 1997-04-01 1999-02-16 Landmark Systems Corporation Client-based system for monitoring the performance of application programs
US6374371B1 (en) * 1998-03-18 2002-04-16 Micron Technology, Inc. Method and apparatus for monitoring component latency drifts
US20020120727A1 (en) * 2000-12-21 2002-08-29 Robert Curley Method and apparatus for providing measurement, and utilization of, network latency in transaction-based protocols
US20030023716A1 (en) * 2001-07-25 2003-01-30 Loyd Aaron Joel Method and device for monitoring the performance of a network
US20030056200A1 (en) * 2001-09-19 2003-03-20 Jun Li Runtime monitoring in component-based systems

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Myung-Sup Kim et al., "A Flow-based Method for Abnormal Network Traffic Detection", Apr. 2004. *
Sujata Benerjee et al., "Network Latency Optimizations in Distributed Database Systems", Feb. 1998. *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9647916B2 (en) 2012-10-27 2017-05-09 Arris Enterprises, Inc. Computing and reporting latency in priority queues
WO2014138894A1 (en) * 2013-03-15 2014-09-18 Imagine Communications Corp. Systems and methods for controlling branch latency within computing applications
US9182949B2 (en) 2013-03-15 2015-11-10 Imagine Communications Corp. Systems and methods for controlling branch latency within computing applications
US10346292B2 (en) * 2013-11-13 2019-07-09 Microsoft Technology Licensing, Llc Software component recommendation based on multiple trace runs
WO2019046996A1 (en) * 2017-09-05 2019-03-14 Alibaba Group Holding Limited Java software latency anomaly detection
US11463361B2 (en) 2018-09-27 2022-10-04 Hewlett Packard Enterprise Development Lp Rate adaptive transactions

Legal Events

Date Code Title Description
AS Assignment

Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SYMONS, JULIE A.;COHEN, IRA;WADE, GERALD T.;AND OTHERS;SIGNING DATES FROM 20070402 TO 20070409;REEL/FRAME:019212/0772

STCF Information on status: patent grant

Free format text: PATENTED CASE

CC Certificate of correction
FPAY Fee payment

Year of fee payment: 4

AS Assignment

Owner name: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP, TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.;REEL/FRAME:037079/0001

Effective date: 20151027

AS Assignment

Owner name: ENTIT SOFTWARE LLC, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP;REEL/FRAME:042746/0130

Effective date: 20170405

AS Assignment

Owner name: JPMORGAN CHASE BANK, N.A., DELAWARE

Free format text: SECURITY INTEREST;ASSIGNORS:ENTIT SOFTWARE LLC;ARCSIGHT, LLC;REEL/FRAME:044183/0577

Effective date: 20170901

Owner name: JPMORGAN CHASE BANK, N.A., DELAWARE

Free format text: SECURITY INTEREST;ASSIGNORS:ATTACHMATE CORPORATION;BORLAND SOFTWARE CORPORATION;NETIQ CORPORATION;AND OTHERS;REEL/FRAME:044183/0718

Effective date: 20170901

FEPP Fee payment procedure

Free format text: 7.5 YR SURCHARGE - LATE PMT W/IN 6 MO, LARGE ENTITY (ORIGINAL EVENT CODE: M1555); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8

AS Assignment

Owner name: MICRO FOCUS LLC, CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:ENTIT SOFTWARE LLC;REEL/FRAME:050004/0001

Effective date: 20190523

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 12

AS Assignment

Owner name: MICRO FOCUS LLC (F/K/A ENTIT SOFTWARE LLC), CALIFORNIA

Free format text: RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0577;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:063560/0001

Effective date: 20230131

Owner name: NETIQ CORPORATION, WASHINGTON

Free format text: RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0718;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:062746/0399

Effective date: 20230131

Owner name: MICRO FOCUS SOFTWARE INC. (F/K/A NOVELL, INC.), WASHINGTON

Free format text: RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0718;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:062746/0399

Effective date: 20230131

Owner name: ATTACHMATE CORPORATION, WASHINGTON

Free format text: RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0718;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:062746/0399

Effective date: 20230131

Owner name: SERENA SOFTWARE, INC, CALIFORNIA

Free format text: RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0718;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:062746/0399

Effective date: 20230131

Owner name: MICRO FOCUS (US), INC., MARYLAND

Free format text: RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0718;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:062746/0399

Effective date: 20230131

Owner name: BORLAND SOFTWARE CORPORATION, MARYLAND

Free format text: RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0718;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:062746/0399

Effective date: 20230131

Owner name: MICRO FOCUS LLC (F/K/A ENTIT SOFTWARE LLC), CALIFORNIA

Free format text: RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0718;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:062746/0399

Effective date: 20230131