CN107786616A - Cloud-based intelligent host monitoring system


Info

Publication number
CN107786616A
Authority
CN
China
Legal status
Pending
Application number
CN201610788477.XA
Other languages
Chinese (zh)
Inventor
黄红娟
刘苏苏
Current Assignee
Jiangsu Blue Union Data And Application Research Institute Co Ltd
Original Assignee
Jiangsu Blue Union Data And Application Research Institute Co Ltd
Application filed by Jiangsu Blue Union Data And Application Research Institute Co Ltd

Classifications

    • H04L67/1004 Server selection for load balancing
    • H04L43/045 Processing captured monitoring data, e.g. for logfile generation, for graphical visualisation of monitoring data
    • H04L67/563 Data redirection of data network streams
    • H04L41/0213 Standardised network management protocols, e.g. simple network management protocol [SNMP]

Abstract

The present invention proposes a cloud-based intelligent host monitoring system. The system comprises a front-end presentation layer, a back-end service layer, a physical host resource collection module, a virtual resource migration monitoring module and an analysis linkage strategy module, and its data store uses NoSQL. Under the cloud platform, physical server monitoring data are displayed graphically in the front end: for a physical host running cloud computing software, a red line represents the host's physical memory utilization and a blue line its CPU utilization, and a start time and end time can be set to query a given period. The design is novel and has good prospects for market adoption.

Description

Cloud-based intelligent host monitoring system
Technical field
The present invention relates to the technical field of software development, and in particular to a cloud-based intelligent host monitoring system.
Background
An enterprise private cloud monitoring and management platform is the core system for the information-based management of enterprise IT resources. It is intended to help implement IT planning strategies as required and to detect abnormal points in the operation of IT resources in advance, so that appropriate action can be taken and the normal operation of the system is safeguarded. Ultimately it improves the reliability and availability of the cloud platform and ensures that users can trust the cloud computing resources they use.
A configuration item (CI) attribute library is usually established in the configuration management database (CMDB). The library defines a series of attributes (i.e., configuration item attributes), such as model, U height (rack units), memory size and virtualization type, and specifies each attribute's value type (numeric, text, enumeration) and constraint rules. Attributes in the library can be grouped for easier maintenance and management, for example into a basic attribute group, an asset attribute group, a specification attribute group and a technical indicator group. Attributes in the CI attribute library can be shared by multiple CI templates and can be extended dynamically at any time as business needs require.
A key technical indicator library is usually also established in the CMDB. It defines a series of key technical indicators, such as CPU utilization, number of concurrent database connections and business transaction volume, and specifies each indicator's classification, unit, data source, default sampling frequency and presentation method. Key technical indicators in the library can be shared by multiple CI templates and can be extended dynamically at any time as business needs require.
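The attribute and indicator libraries above amount to typed metadata catalogs. The following is a minimal Java 17 sketch, not the platform's actual schema: `AttributeDef`, `KpiDef` and `CiValidator` are hypothetical names, and a min/max range or an allowed-value set stands in for the unspecified "constraint rules". It shows how a value type (numeric, text, enumeration) plus a constraint can be used to validate a configuration item's attribute values.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Value types allowed in the attribute library: numeric, text, enumeration.
enum ValueType { NUMERIC, TEXT, ENUM }

// Hypothetical attribute definition: name, attribute group, value type and a simple constraint
// (min/max for numeric values, the allowed set for enumerations; unused fields are null).
record AttributeDef(String name, String group, ValueType type,
                    Double min, Double max, Set<String> allowed) {

    /** Returns true if the raw value is present and satisfies the type and constraint. */
    boolean accepts(String raw) {
        if (raw == null) return false;
        return switch (type) {
            case TEXT -> !raw.isBlank();
            case ENUM -> allowed != null && allowed.contains(raw);
            case NUMERIC -> {
                try {
                    double v = Double.parseDouble(raw.trim());
                    yield (min == null || v >= min) && (max == null || v <= max);
                } catch (NumberFormatException e) {
                    yield false;
                }
            }
        };
    }
}

// Hypothetical KPI definition with the fields listed for the key technical indicator library.
record KpiDef(String name, String category, String unit, String dataSource,
              int defaultSampleSeconds, String presentation) {}

final class CiValidator {
    /** Returns the names of attributes whose values are missing or violate their definition. */
    static List<String> invalidAttributes(List<AttributeDef> defs, Map<String, String> values) {
        return defs.stream()
                .filter(d -> !d.accepts(values.get(d.name())))
                .map(AttributeDef::name)
                .toList();
    }
}
```

For example, a `virtualizationType` attribute declared as an enumeration of `ESX` and `RHEV-H` would flag a CI whose value is `KVM`.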
Key technical indicators comprise base indicators and aggregate indicators. Dividing them into base indicator items and aggregate indicator items establishes a hierarchy between indicators, in which one base indicator item can have multiple related aggregate indicator items.
The data of an aggregate indicator item are usually calculated from the data of base indicator items through an aggregation strategy (such as maximum, minimum, average or interval ratio). Some complex aggregate indicator items can be calculated further from other aggregate indicator items, i.e., multi-level aggregation is allowed. According to the dimension along which a technical indicator is aggregated, aggregate indicators can be further subdivided into two classes: time aggregation and service aggregation.
Most technical indicators only need aggregation over time or over service, but some need aggregation in both dimensions, and the type of aggregation may differ between the two dimensions. If an indicator must be aggregated both by time and by service, the order of the two aggregation operations must be set, because for some indicators a different order yields a value with a completely different meaning. Therefore, an aggregate indicator that requires aggregation over both time and service must also specify the order of the two.
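A small worked example makes the ordering point concrete. In this hypothetical Java sketch (class and method names are illustrative), two service nodes each report two CPU samples; averaging over time and then taking the maximum across the service gives a different value from taking the maximum across the service at each sampling time and then averaging over time.

```java
import java.util.Arrays;

/**
 * Illustrative only: rows are service nodes (e.g. cluster members),
 * columns are sampling times, values are CPU utilization in percent.
 */
final class AggregationOrderDemo {

    /** Time first: average each node over time, then take the max across nodes. */
    static double avgOverTimeThenMaxOverService(double[][] cpu) {
        return Arrays.stream(cpu)
                .mapToDouble(node -> Arrays.stream(node).average().orElse(Double.NaN))
                .max()
                .orElse(Double.NaN);
    }

    /** Service first: take the max across nodes at each time, then average over time. */
    static double maxOverServiceThenAvgOverTime(double[][] cpu) {
        int times = cpu[0].length;
        double sum = 0;
        for (int t = 0; t < times; t++) {
            double max = Double.NEGATIVE_INFINITY;
            for (double[] node : cpu) {
                max = Math.max(max, node[t]);
            }
            sum += max;
        }
        return sum / times;
    }
}
```

With samples {90, 10} and {10, 90}, the time-first order yields 50.0 and the service-first order yields 90.0: the first answers "how busy is the busiest node on average", the second "how high is the cluster peak on average".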
Through a built-in aggregation engine, the monitoring and management platform splits and slices raw data and feeds them into the engine for aggregation, which greatly improves aggregation performance on massive data, and it provides several extensible configuration options. Statistical calculations such as aggregation are supported on raw data, and collected data can be calculated further along the time dimension or the service dimension. Along the time dimension, the aggregation period can be defined, and aggregations such as sum, maximum, minimum, average, count and interval ratio are supported. Along the service dimension, technical indicators strongly related to a service are aggregated (for example, node data of an application server cluster). Indicators that need both time aggregation and service aggregation are also supported, and the calculation order can be adjusted.
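The time-dimension functions listed above can be sketched in Java as a group-by over fixed periods. This is an illustration under stated assumptions, not the platform's aggregation engine: `Sample`, `AggFn` and `TimeAggregator` are hypothetical, periods are aligned to the Unix epoch, and "interval ratio" is interpreted here as the share of samples at or above a threshold, which the text does not define precisely.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical raw sample: collection time (epoch seconds) and measured value.
record Sample(long epochSeconds, double value) {}

// Aggregation functions named in the text (interval ratio is handled separately below).
enum AggFn { SUM, MAX, MIN, AVG, COUNT }

final class TimeAggregator {

    /** Buckets samples into fixed periods aligned to the epoch and aggregates each bucket. */
    static Map<Long, Double> aggregate(List<Sample> samples, long periodSeconds, AggFn fn) {
        Map<Long, DoubleSummaryStatistics> buckets = samples.stream().collect(
                Collectors.groupingBy(
                        s -> Math.floorDiv(s.epochSeconds(), periodSeconds) * periodSeconds,
                        TreeMap::new,
                        Collectors.summarizingDouble(Sample::value)));
        Map<Long, Double> result = new TreeMap<>();
        buckets.forEach((start, stats) -> result.put(start, switch (fn) {
            case SUM -> stats.getSum();
            case MAX -> stats.getMax();
            case MIN -> stats.getMin();
            case AVG -> stats.getAverage();
            case COUNT -> (double) stats.getCount();
        }));
        return result;
    }

    /** Interval ratio, interpreted here as the fraction of samples with value >= threshold. */
    static double ratioAtLeast(List<Sample> samples, double threshold) {
        if (samples.isEmpty()) return 0.0;
        long hits = samples.stream().filter(s -> s.value() >= threshold).count();
        return (double) hits / samples.size();
    }
}
```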
Using a rule engine, the monitoring and management platform implements multi-dimensional diagnostic analysis based on data chains and event chains. Diagnosis and analysis rules are configured through pages and, combined with the relationships between configuration items, executed dynamically, which meets the requirements for rule generality and variability. Analysis linkage strategy management enables early warning and fault diagnosis based on complex logic. Alarms can be managed by severity level, for example critical, error, warning and information. Judgment modes include single-value judgment (e.g., greater than or less than), sampling probability judgment (e.g., 8 out of 10 consecutive samples) and sampling interval judgment (e.g., the proportion of the whole day's samples), among others; these eliminate occasional outliers and merge similar repeated events. Multiple analysis linkage strategies can be combined. According to the configured strategy, O&M events can be generated and passed to the operations management component, and automated workflows can be triggered for emergency handling.
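Merging similar repeated events can be sketched as follows. The sketch assumes events arrive sorted by time and treats two events as "similar" when they come from the same configuration item and the same rule within a configurable window; `Severity`, `RawEvent`, `MergedEvent` and `EventMerger` are hypothetical names, and the severity constants mirror the levels listed above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Alarm severities, declared from most to least severe so ordinal order means severity.
enum Severity { CRITICAL, ERROR, WARNING, INFO }

// Hypothetical raw event produced by a rule: which CI, which rule, severity, time.
record RawEvent(String ciName, String ruleId, Severity severity, long epochSeconds) {}

// Hypothetical merged event: first/last occurrence, most severe level seen, repeat count.
record MergedEvent(String ciName, String ruleId, Severity severity,
                   long firstSeen, long lastSeen, int count) {}

final class EventMerger {
    /**
     * Collapses repeats of the same rule on the same CI: an event that arrives within
     * windowSeconds of the previous occurrence is folded into the open merged event.
     */
    static List<MergedEvent> merge(List<RawEvent> events, long windowSeconds) {
        Map<String, MergedEvent> open = new LinkedHashMap<>();
        List<MergedEvent> result = new ArrayList<>();
        for (RawEvent e : events) {
            String key = e.ciName() + "|" + e.ruleId();
            MergedEvent m = open.get(key);
            if (m != null && e.epochSeconds() - m.lastSeen() <= windowSeconds) {
                // Keep the most severe level seen (lower ordinal = more severe).
                Severity worst = e.severity().compareTo(m.severity()) < 0 ? e.severity() : m.severity();
                open.put(key, new MergedEvent(m.ciName(), m.ruleId(), worst,
                        m.firstSeen(), e.epochSeconds(), m.count() + 1));
            } else {
                if (m != null) result.add(m); // gap too large: close the previous merged event
                open.put(key, new MergedEvent(e.ciName(), e.ruleId(), e.severity(),
                        e.epochSeconds(), e.epochSeconds(), 1));
            }
        }
        result.addAll(open.values());
        return result;
    }
}
```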
In summary, in view of the defects of the prior art, a cloud-based intelligent host monitoring system is needed to overcome these deficiencies.
Summary of the invention
The object of the invention is to provide a cloud-based intelligent host monitoring system that makes it convenient to monitor host performance and offers a high degree of automation.
The technical solution adopted by the present invention to solve its technical problem is as follows:
A cloud-based intelligent host monitoring system comprises a front-end presentation layer, a back-end service layer, a physical host resource collection module, a virtual resource migration monitoring module and an analysis linkage strategy module; the system's data store uses NoSQL;
The front-end presentation layer mainly comprises system administration, strategy configuration and data viewing modules;
The back-end service layer comprises data collection, a data receiving queue module, analysis linkage strategy, event processing and resource configuration management;
The physical host resource collection module collects physical host monitoring data under the cloud computing platform in Pull mode: the monitoring server actively queries, through an API, the monitoring data of the servers managed by the cloud control center, and the cloud control center organizes the data and returns them to the monitoring server;
The virtual resource migration monitoring module addresses the fact that application demand for resources changes frequently, so the infrastructure must adapt to changing resource requirements within a very short time. This requires a simple, automatic and configurable management approach that does not need excessive administrator intervention. Automatic virtual machine migration is a requirement of distributed resource scheduling: it continuously optimizes the cloud computing platform by automatically migrating virtual machines between multiple physical servers to balance their load. During migration, the virtual machine's business system keeps serving requests normally, without damaging any data or interrupting business continuity;
In the analysis linkage strategy module, after monitored devices upload monitoring data, the data are processed and stored in a NoSQL database; the analysis module then analyzes the data and, according to the configured strategies, triggers various processing flows.
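The text does not specify how migration targets are chosen, and VMware DRS uses its own, more elaborate algorithm. As a hedged illustration of the load-balancing goal only, this hypothetical Java sketch greedily moves the VM from the busiest host to the idlest host that best narrows the gap between them, stopping when no single move helps or after `maxMoves` steps. Loads are abstract numbers (for example CPU demand).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical planned migration of one VM between physical hosts.
record Move(String vm, String fromHost, String toHost) {}

final class GreedyBalancer {
    /** placement: host -> (vm -> load). Returns planned moves; the input is not modified. */
    static List<Move> plan(Map<String, Map<String, Double>> placement, int maxMoves) {
        // Work on a copy so the caller's placement map is left untouched.
        Map<String, Map<String, Double>> p = new HashMap<>();
        placement.forEach((host, vms) -> p.put(host, new HashMap<>(vms)));
        Comparator<String> byLoad = Comparator.comparingDouble(h -> load(p.get(h)));
        List<Move> moves = new ArrayList<>();
        for (int i = 0; i < maxMoves; i++) {
            String busy = Collections.max(p.keySet(), byLoad);
            String idle = Collections.min(p.keySet(), byLoad);
            double gap = load(p.get(busy)) - load(p.get(idle));
            // Moving a VM with load v changes this pair's gap to |gap - 2v|; pick the smallest.
            String best = null;
            double bestGap = gap;
            for (Map.Entry<String, Double> vm : p.get(busy).entrySet()) {
                double newGap = Math.abs(gap - 2 * vm.getValue());
                if (newGap < bestGap) {
                    bestGap = newGap;
                    best = vm.getKey();
                }
            }
            if (best == null) break; // no single move narrows the gap any further
            p.get(idle).put(best, p.get(busy).remove(best));
            moves.add(new Move(best, busy, idle));
        }
        return moves;
    }

    private static double load(Map<String, Double> vms) {
        return vms.values().stream().mapToDouble(Double::doubleValue).sum();
    }
}
```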
Further, system data flow upward from the bottom layer through processing and are finally displayed and invoked by the front end. The bottom layer is the data collection layer, comprising physical server data collection, resource dynamic migration collection and network device data collection modules. The physical server collection module is further divided into cloud platform data collection and traditional platform data collection, because collecting from physical servers under a cloud platform differs considerably from collecting from physical servers on a traditional platform. Under a cloud platform, a streamlined, efficient virtualization platform is installed on the bare-metal physical server to pool resources, and for the security and stability of the underlying system, commercial cloud platform vendors advise users not to install third-party monitoring agents on the cloud's base-layer system and recommend obtaining the data from the cloud control center instead. Physical server data collection on traditional platforms can use Agent mode, which supports more monitoring indicators (such as process monitoring and application monitoring) with high data accuracy; agentless mode can also be used, via SNMP, SSH, Syslog and similar protocols. Resource dynamic migration collection mainly gathers the dynamic location information of resources under the cloud platform: cloud computing pools physical resources so that virtual machines can migrate dynamically among multiple physical hosts, so the location of each virtual machine must be known in real time and therefore monitored. Collection is supported on VMware and RHEV platforms. Network device data are collected via SNMP;
After collection, the data must be uploaded to the server for processing. Because the server must aggregate a large volume of data, a data receiving queue module is abstracted to cope with the large number of I/O requests: monitoring data pass through the receiving queue module, which delivers the raw data to the data storage module. The data storage module supports both relational and non-relational storage; here, all monitoring data are stored in a non-relational database. The resource configuration management module is the core of the monitoring platform: each physical device in the monitoring platform can be represented abstractly as a combination of one or more configuration items. The design covers configuration item attributes, relationships between configuration items, key technical indicators and a configuration view module. Configuration item attributes include, for example, model, U height, memory size and virtualization type, and each attribute's value type (numeric, text, enumeration) and constraint rules are specified. Relationships between configuration items include hosting, connection, use, membership and management, and can be extended dynamically at any time as business needs require. The key technical indicator library defines a series of key technical indicators, such as CPU utilization, number of concurrent database connections and business transaction volume. The resource view library defines a series of resource views, such as machine-room views, rack views, resource pool views, configuration item relationship views and capacity views, and specifies each view's composition relationships, display structure and constraint rules;
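The role of the data receiving queue, decoupling many collectors from the storage writer and letting the writer persist data in batches, can be sketched with a bounded `BlockingQueue`. `ReceiveQueue` is a hypothetical class; capacity, batch size and the store callback are illustrative parameters, and a real deployment might use a message broker instead of an in-process queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical bounded queue between collectors (producers) and the storage writer (consumer). */
final class ReceiveQueue<T> {
    private final BlockingQueue<T> queue;

    ReceiveQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Called by collectors; returns false instead of blocking when the queue is full. */
    boolean offer(T item) {
        return queue.offer(item);
    }

    /**
     * Called by the storage writer: waits up to timeoutMillis for the first item, then
     * drains up to maxBatch items in total and hands them to the store in one call.
     * Returns the number of items stored (0 on timeout or interruption).
     */
    int drainBatch(int maxBatch, long timeoutMillis, Consumer<List<T>> store) {
        T first;
        try {
            first = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status for the caller
            return 0;
        }
        if (first == null) return 0;
        List<T> batch = new ArrayList<>(maxBatch);
        batch.add(first);
        queue.drainTo(batch, maxBatch - 1);
        store.accept(batch);
        return batch.size();
    }
}
```

Using `offer` rather than `put` means a collector learns immediately that the queue is full instead of blocking, so back-pressure can be handled (for example by retrying later) on the collection side.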
Analysis linkage strategy management enables early warning and fault diagnosis based on complex logic. The linkage module includes single-value judgment (for example, memory usage above 80%) and consecutive-sample processing, which judges whether a single key indicator stays above or below a threshold for N consecutive samples. Once key indicator analysis strategies and strategy thresholds are set, crossing a threshold triggers an event; an event processing module is therefore provided, which creates monitoring events and supports multiple notification channels, including e-mail, SMS and WeChat;
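The single-value judgment and notification path can be sketched as below. The `Notifier` interface stands in for the mail, SMS and WeChat gateways, which the text does not specify; `MonitorEvent` and `EventProcessor` are likewise hypothetical names, and the message format is illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical notification channel; mail, SMS or WeChat gateways would implement this.
interface Notifier {
    void send(String recipient, String message);
}

// Hypothetical monitoring event created when a strategy threshold is crossed.
record MonitorEvent(String ciName, String kpi, double value, double threshold) {
    String message() {
        // Locale.ROOT keeps the decimal point stable regardless of the JVM's default locale.
        return String.format(Locale.ROOT, "%s: %s=%.1f exceeds threshold %.1f",
                ciName, kpi, value, threshold);
    }
}

final class EventProcessor {
    private final List<Notifier> channels;
    private final List<MonitorEvent> events = new ArrayList<>();

    EventProcessor(List<Notifier> channels) {
        this.channels = List.copyOf(channels);
    }

    /** Single-value judgment: if value > threshold, record an event and notify every channel. */
    boolean check(String ciName, String kpi, double value, double threshold, String recipient) {
        if (value <= threshold) return false;
        MonitorEvent e = new MonitorEvent(ciName, kpi, value, threshold);
        events.add(e);
        for (Notifier n : channels) {
            n.send(recipient, e.message());
        }
        return true;
    }

    List<MonitorEvent> events() {
        return List.copyOf(events);
    }
}
```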
The top layer is the presentation layer, i.e., the system's interaction layer, comprising three major modules: system administration, strategy configuration and data viewing. System administration includes sub-modules such as user management and permission management; strategy configuration includes collection strategies, monitoring strategies and the like; in the data viewing module, users can check the current state of each type of device, triggered events and historical information, and can generate operating status reports for a specified time period.
Further, the collection method of the virtual resource migration monitoring module requires the following to be provided: the address of the REST API interface used for collection, the username and password required for interface authentication, the SSL certificate, and whether to skip certificate validation. Each row of the returned result set is the current monitoring information of one physical server: the first field is the server's configuration item name in the CMDB, the second is CPU utilization and the third is memory utilization;
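Parsing such a result set into typed records might look like this. The source fixes only the order of the three fields; the comma separator, the 0 to 100 percentage range and the choice to skip malformed rows are assumptions of this hypothetical `ResultSetParser` sketch.

```java
import java.util.ArrayList;
import java.util.List;

// One row of the collection result set: CI name in the CMDB, CPU usage, memory usage.
record HostUsage(String ciName, double cpuPercent, double memPercent) {}

final class ResultSetParser {
    /**
     * Parses "ciName,cpu,mem" rows (separator and value range are assumptions).
     * Rows with the wrong field count, non-numeric or out-of-range values are skipped.
     */
    static List<HostUsage> parse(List<String> rows) {
        List<HostUsage> out = new ArrayList<>();
        for (String row : rows) {
            String[] f = row.split(",");
            if (f.length != 3) continue;
            try {
                double cpu = Double.parseDouble(f[1].trim());
                double mem = Double.parseDouble(f[2].trim());
                if (cpu < 0 || cpu > 100 || mem < 0 || mem > 100) continue;
                out.add(new HostUsage(f[0].trim(), cpu, mem));
            } catch (NumberFormatException e) {
                // skip rows whose usage fields are not numeric
            }
        }
        return out;
    }
}
```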
In an enterprise private cloud, enabling the VMware vSphere DRS (Distributed Resource Scheduler) function balances load automatically across hosts, adjusts computing resources according to service priority, and powers off hosts during periods of low load to reduce energy consumption; vMotion can be used to balance load across hosts;
VMware vMotion provides live migration of virtual machines: a running virtual machine can be moved from one physical server to another without downtime. The virtual machine retains its network identity and connections, ensuring a seamless transition, while its active memory and precise execution state are transferred over a high-speed network, so that the virtual machine switches from running on the source vSphere host to running on the target vSphere host;
When a physical host fails, under VMware HA protection the virtual machines on that host are restarted on other hosts in the cluster. To know which virtual machines are affected when a physical host fails, the vMotion information in the VMware environment must be monitored.
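Knowing which virtual machines a failed host affects reduces to keeping an up-to-date VM-to-host map. In this hypothetical Java sketch, every initial placement or completed migration (for example a vMotion event read from the monitoring data) updates the map, and a host failure is answered by filtering it; how migration events are obtained from VMware is outside the sketch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical tracker of the current physical host of each virtual machine. */
final class PlacementTracker {
    private final Map<String, String> hostOfVm = new HashMap<>();

    /** Records an initial placement or a completed migration of vm to host. */
    void place(String vm, String host) {
        hostOfVm.put(vm, host);
    }

    /** Returns the VMs currently placed on the failed host, sorted for stable output. */
    List<String> affectedBy(String failedHost) {
        return hostOfVm.entrySet().stream()
                .filter(e -> e.getValue().equals(failedHost))
                .map(Map.Entry::getKey)
                .sorted()
                .toList();
    }
}
```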
Further, the objects processed by the analysis linkage engine of the analysis linkage strategy module include base indicators, such as the instantaneous CPU value, and aggregate indicators, such as the daily average CPU value. The analysis engine supports four judgment modes:
1) Single-value judgment, for example: device status != OK; daily average CPU load >= 45%;
2) Consecutive-sample probability judgment, for example: instantaneous CPU utilization >= 80% in 8 of 10 consecutive samples; application server thread count >= 25 in 3 consecutive samples;
3) Sampling-ratio judgment, for example: storage utilization >= 60% in at least 50% of the whole month's samples; CPU utilization >= 80% in at least 30% of the whole day's samples;
4) Advanced combined judgment, for example: tablespace status != Normal AND tablespace status != Backup; file system utilization >= 85% AND remaining space <= 10G;
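The four judgment modes can be expressed as small predicates over a metric's sample history. This is a hypothetical Java sketch, not the engine's implementation: samples are assumed to be ordered oldest first, mode 2 counts hits among the last n samples, mode 3 works on the samples of one period, and mode 4 is shown only as a logical AND.

```java
import java.util.List;
import java.util.function.DoublePredicate;

final class JudgmentModes {

    /** Mode 1, single value: the latest sample satisfies the condition. */
    static boolean singleValue(List<Double> samples, DoublePredicate cond) {
        return !samples.isEmpty() && cond.test(samples.get(samples.size() - 1));
    }

    /** Mode 2, consecutive-sample probability: at least k of the last n samples satisfy the condition. */
    static boolean kOfLastN(List<Double> samples, int k, int n, DoublePredicate cond) {
        if (n <= 0 || samples.size() < n) return false; // not enough history yet
        long hits = samples.subList(samples.size() - n, samples.size()).stream()
                .mapToDouble(Double::doubleValue)
                .filter(cond)
                .count();
        return hits >= k;
    }

    /** Mode 3, sampling ratio: the share of the period's samples satisfying the condition is >= minRatio. */
    static boolean ratioInPeriod(List<Double> samples, double minRatio, DoublePredicate cond) {
        if (samples.isEmpty()) return false;
        long hits = samples.stream().mapToDouble(Double::doubleValue).filter(cond).count();
        return (double) hits / samples.size() >= minRatio;
    }

    /** Mode 4, advanced combination: here simply the AND of independent conditions. */
    static boolean allOf(boolean... conditions) {
        for (boolean c : conditions) {
            if (!c) return false;
        }
        return true;
    }
}
```

For instance, the second example of mode 4 could be written as `JudgmentModes.allOf(fsUsagePercent >= 85, freeSpaceGb <= 10)`, with both variables supplied by the caller.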
The analysis engine supports three processing actions:
First, creating alarms of different levels; the alarm levels are Fatal, Error, Warning and Info;
Second, creating O&M workflow events by integrating with the O&M workflow management platform, so that O&M events and alarms form a closed loop;
Third, triggering automated operations by integrating with the automated management platform, so that automated operations and alarms form a closed loop; for example, when memory utilization >= 85% in at least 50% of all samples, memory is automatically expanded by 20%;
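The automation example above can be sketched as a policy that turns sample history into an action. `ActionType`, `EngineAction` and `MemoryExpansionPolicy` are hypothetical names; rounding the 20% growth up to a whole MB and the action text are illustrative choices, and actually resizing a VM would be the automated management platform's job.

```java
import java.util.List;

// The three processing actions listed for the analysis engine.
enum ActionType { ALARM, OPS_WORKFLOW_EVENT, AUTOMATION }

// Hypothetical action handed to the alarm, O&M workflow or automation platform.
record EngineAction(ActionType type, String detail) {}

final class MemoryExpansionPolicy {
    /**
     * Memory usage >= 85% in at least 50% of all samples triggers an automation action
     * that expands memory by 20% (rounded up to a whole MB).
     */
    static List<EngineAction> evaluate(List<Double> memUsagePercent, long currentSizeMb) {
        if (memUsagePercent.isEmpty()) return List.of();
        long hits = memUsagePercent.stream().filter(v -> v >= 85.0).count();
        if (hits * 2 < memUsagePercent.size()) return List.of(); // fewer than 50% of samples
        long growthMb = (currentSizeMb * 20 + 99) / 100;           // 20%, integer ceiling
        return List.of(new EngineAction(ActionType.AUTOMATION,
                "expand memory to " + (currentSizeMb + growthMb) + " MB"));
    }
}
```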
For the monitoring data of physical servers, the analysis linkage strategies are set as follows:
CPU utilization >= 50% and consecutive count >= 3
Memory utilization >= 75% and consecutive count >= 3
When the CPU utilization collected from a physical host is at least 50% in 3 consecutive collections, an alarm is sent by SMS and e-mail and an event is generated in the monitoring platform to prompt the administrator to handle it; likewise, the analysis linkage strategy for memory fires when memory utilization is at least 75% in 3 consecutive collections;
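Because these rules count consecutive collections, they can be evaluated incrementally as each sample arrives. This hypothetical `ConsecutiveRule` sketch keeps a per-host streak counter, resets it when a sample falls below the threshold, and reports a hit only when the streak first reaches the required length, so a long overload raises one alarm rather than one per sample. Whether the original platform re-alarms on a continuing streak is not stated.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical rule: "metric >= threshold for N consecutive collections", tracked per host. */
final class ConsecutiveRule {
    private final double threshold;
    private final int required;
    private final Map<String, Integer> streak = new HashMap<>();

    ConsecutiveRule(double threshold, int required) {
        this.threshold = threshold;
        this.required = required;
    }

    /** Feeds one sample; returns true exactly when the host's streak reaches the required count. */
    boolean onSample(String host, double value) {
        if (value < threshold) {
            streak.remove(host); // streak broken: start counting again
            return false;
        }
        int n = streak.merge(host, 1, Integer::sum);
        return n == required;
    }
}
```

The two strategies above would then be `new ConsecutiveRule(50, 3)` for CPU and `new ConsecutiveRule(75, 3)` for memory.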
After the physical host data collection interface is connected, key technical indicators are defined in the monitoring platform for the host data collected by the interface program; the indicator type and display content, data retention period, collection frequency and chart display type can be defined;
In the collection strategy configuration, a collection strategy name is defined. The collection methods supported by the monitoring platform's collection interface are: bat, HeartBeat, HTTP, Java, JDBC, JMX, Log, Mail, PING, RemoteSSH, SHELL, SnmpGet, SnmpTrap, SnmpWalk, Syslog, Telnet, VBS, Web Services, wmic, grab type, collection script and data processing script.
The advantages of the present invention are that, under the cloud platform, physical server monitoring data are displayed graphically in the front end: for a physical host running cloud computing software, a red line represents the host's physical memory utilization and a blue line its CPU utilization, and a start time and end time can be set for queries. The design is novel and has good prospects for market adoption.
Brief description of the drawings
The present invention is described in detail below with reference to the accompanying drawings and specific embodiments:
Fig. 1 is a business module diagram of the system proposed by the present invention;
Fig. 2 is an architecture diagram of physical server collection under the cloud computing platform of the present invention;
Embodiment
To make the technical means, inventive features, objectives and advantages of the present invention easy to understand, the present invention is further described below with reference to the drawings and specific embodiments.
As shown in Fig. 1, the cloud-based intelligent host monitoring system comprises a front-end presentation layer, a back-end service layer, a physical host resource collection module, a virtual resource migration monitoring module and an analysis linkage strategy module; the system's data store uses NoSQL;
Referring to Fig. 2, the physical host resource collection module collects physical host monitoring data under the cloud computing platform in Pull mode: the monitoring server actively queries, through an API, the monitoring data of the servers managed by the cloud control center, and the cloud control center organizes the data and returns them to the monitoring server;
Conventional monitoring systems fall into two classes according to how the monitored device and the monitoring server interact: Push mode and Pull mode. In Push mode, the monitored device actively sends monitoring data to the monitoring server, so this mode is also called active monitoring. In Pull mode, the monitoring server sends an information query request to the monitored device, which then collects the data and sends them to the monitoring server, so this mode is called passive monitoring. Each mode has advantages and disadvantages: Push mode offers good real-time performance and strong scalability but is more complex and consumes more resources; Pull mode has poorer real-time performance but lower resource consumption.
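The defining property of Pull mode, that the monitoring server initiates every query on its own schedule, can be sketched with a `ScheduledExecutorService`. `PullCollector` is a hypothetical class and the query `Supplier` stands in for the call to the cloud control center API; the fixed polling period corresponds to the configurable collection frequency.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical Pull-mode collector: the monitoring server polls on a fixed schedule. */
final class PullCollector<T> implements AutoCloseable {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    /** Every periodMillis, calls query (e.g. a control center API call) and passes the result to sink. */
    void start(Supplier<T> query, Consumer<T> sink, long periodMillis) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                sink.accept(query.get());
            } catch (RuntimeException e) {
                // A failed poll must not cancel the schedule; real code would log it here.
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Catching exceptions inside the task matters: `scheduleAtFixedRate` suppresses all later runs once an execution of the task throws.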
Traditional physical server monitoring resource is gathered mostly by disposing agent, is gathered by monitoring agent monitored Facility information, is then passed to monitoring server, and what we obtained under traditional mode is separate unit physical server information.But for The monitoring of physical server resource under cloud computing platform, then traditional agent patterns can not be used, this will reduce architecture The stability and security of layer.By Web Services interfaces, (we can refer to VMware here with cloud control centre for we VCenter or RHEV-M) connection, obtain physical server CPU, the real-time consumption data of internal memory.We design unification herein Cloud platform under physical server collection of resources module, the resource informations of more physical servers is once gathered, with unified letter Breath form returns to monitor supervision platform.Below the physical services under VMware and RedHat enterprise-level virtual platforms will be introduced respectively Device collection of resources.
A cloud computing platform usually contains hardware and software of many types, structures, and brands. The monitoring platform must therefore support this diversity and be broadly compatible with different hardware and software. It must also be able to monitor heterogeneous hardware devices and software at the same time. For these reasons, we design the various modules abstractly.
Under our abstracted cloud computing platform, the physical server monitoring collection indicator is as follows:
Table 1 Collection indicator
Indicator name: Physical server resource utilization
Configuration item template: {X86 server}
Configuration item attribute: {OS Type} = ESX/RHEV-H
Collection entry point: {Cloud control center} -> {Management network IP address}
Collection interface type: [REST API]
Collection frequency: Every 5 minutes (dynamically configurable)
Collection method description: we must provide the following for collection:
- the REST API endpoint address;
- the username and password required for interface authentication;
- the SSL certificate;
- whether certificate verification is ignored.

Each row in the returned result set is the current monitoring information of one physical server. Field 1 is the physical server's configuration item name in the CMDB, field 2 is CPU utilization, and field 3 is memory utilization, as shown in the following table:
Table 2 Collection method information
After the base data is collected, it is processed and stored in a NoSQL database. With large volumes of data, the NoSQL database used here queries considerably faster than a traditional database.
First, let us look at how the resource monitoring platform obtains raw physical server data through the VMware Web Services interface.

We first import the Java networking classes and the VMware Java SDK classes the program needs:
- The java.net and java.rmi packages provide network access. We use them to connect to the Web Services interface and to handle the related exceptions.
- com.vmware.vim25 is VMware's official vSphere development kit.
After importing the Java networking packages and the VMware packages, we define the relevant Java class and declare its member variables.
We define the following member variables:
- url, a string holding the Web Services access URL;
- userName, a string holding the user name required for authentication to the Web Services interface;
- password, a string holding the password required for authentication;
- help, a boolean that indicates whether usage help is requested. It defaults to false.
Next we define the parameter-checking function getConnectionParameters. When it is called, it verifies that the input parameters meet the requirements, and it prints usage help if they do not. The expected input parameters are --url url, --username username, and --password passwd.

The function works as follows:
1. It defines an integer variable ai initialized to 0, a string variable param that holds the parameter name, and a string variable val that holds the parameter value.
2. While ai is less than the length of the argument array, it takes element ai, trims spaces, and assigns it to param. If ai+1 is also less than the array length, it takes element ai+1 as val.
3. It compares param with "--help" using the case-insensitive function equalsIgnoreCase. If they match, it sets help to true and exits the while loop. After the loop, the function checks whether the input parameters are empty. At this point they are, so usage help is printed.
4. Otherwise, it checks whether param equals "--url" (ignoring case) and val is non-empty and does not start with "--". If both conditions hold, it sets url to val. If not, it applies the same test with "--username" and "--password".
5. After each round, it resets val to empty and increases ai by 2 to move to the next pair. This repeats until all parameters are processed.

Once all input parameters are processed, the check passes if url, username, and password are all non-empty. Otherwise, usage help is printed.
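The pairwise scan just described can be sketched in Python. This keeps all examples in one language; the original function is Java. It is a minimal illustration of the logic under stated assumptions, not the author's code. The function name get_connection_parameters and the convention of returning None when usage help should be shown are assumptions.

```python
USAGE = "usage: --url <url> --username <username> --password <passwd>"


def get_connection_parameters(args):
    """Scan flag/value pairs; return (url, username, password) or None.

    None means help was requested or a required parameter is missing,
    in which case the caller is expected to print USAGE.
    """
    url = username = password = ""
    ai = 0
    while ai < len(args):
        param = args[ai].strip()
        val = args[ai + 1].strip() if ai + 1 < len(args) else ""
        flag = param.lower()  # case-insensitive, like equalsIgnoreCase
        if flag == "--help":
            return None
        # A value is accepted only if non-empty and not itself a flag.
        valid = val != "" and not val.startswith("--")
        if flag == "--url" and valid:
            url = val
        elif flag == "--username" and valid:
            username = val
        elif flag == "--password" and valid:
            password = val
        ai += 2  # advance to the next flag/value pair
    if url and username and password:
        return url, username, password
    return None
```

For example, get_connection_parameters(["--URL", "https://vc/sdk", "--username", "admin", "--password", "pw"]) returns the three values. Any call that includes "--help" returns None.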
The host information function first establishes a connection to VMware Web Services and assigns it to the variable si. When instantiating ServiceInstance, we pass in the url, username, and password obtained above.

Through si we obtain the root managed object in VMware and assign it to the variable rootFolder. Starting from rootFolder, we traverse the inventory to search for "HostSystem" managed objects, and we assign the resulting collection to the variable host_views.
We iterate over the host_views object collection, where each object maps to one physical server. Using a for loop and following the format set out in the detailed design, we output, in order, the physical server's configuration item name, its current CPU utilization, and its memory utilization.
When obtaining host CPU information, we cannot directly obtain the host's total clock rate. Instead, we obtain the clock rate of a single CPU and the number of logical CPUs, and multiply them. The clock rate currently consumed by the CPUs can be obtained directly. CPU utilization is therefore:

currently consumed clock rate / (single-CPU clock rate × number of logical CPUs)

Total host memory is returned in bytes, whereas current memory consumption is returned in megabytes. The total must therefore be divided by 1024 and then by 1024 again to convert it to megabytes before the utilization is calculated. At the end of the function, we release the Web Services connection.
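The two utilization formulas above can be sketched as follows. This is a minimal Python illustration assuming that clock rates are given in MHz and that used memory is given in MB while total memory is in bytes. In the vSphere API these values plausibly correspond to the host summary's per-core cpuMhz, numCpuThreads, overallCpuUsage, memorySize, and overallMemoryUsage, but treat that mapping as an assumption to verify against the SDK.

```python
def cpu_utilization(cpu_mhz_per_core, logical_cpus, consumed_mhz):
    """CPU utilization (%) = consumed / (single-CPU clock x logical CPU count)."""
    capacity_mhz = cpu_mhz_per_core * logical_cpus
    return 100.0 * consumed_mhz / capacity_mhz


def memory_utilization(total_bytes, used_mb):
    """Memory utilization (%); total is converted from bytes to MB (/1024/1024) first."""
    total_mb = total_bytes / 1024 / 1024
    return 100.0 * used_mb / total_mb


print(cpu_utilization(2000, 32, 16000))          # 25.0
print(memory_utilization(256 * 1024**3, 65536))  # 25.0
```

Converting the total to MB before dividing keeps both operands in the same unit. This is the point the paragraph above makes about the byte/megabyte mismatch.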
Next we define the main function, which the monitoring platform calls. It first uses getConnectionParameters to check that the parameters are correct. If they are, it calls getHostInfo to fetch the physical server host information. If a parameter error occurs, it prints parameter usage instructions. For any other exception, it prints the exception stack trace.
The collection script gathers the raw data. The data processing script then packages the raw data into a form the monitoring platform can accept and stores it in the database.
The first column is the host name, the second column is CPU utilization, and the third column is memory utilization. Each row is the information for one VMware host.
In a RHEV environment, RHEV-H (Hypervisor) is installed on the physical hosts, and RHEV-M (Manager) runs in a virtual machine. The traditional monitoring-agent approach therefore cannot be used to monitor physical host resources.
We use Python to obtain the resource usage of the physical hosts by connecting to the RHEV-M REST API. A certificate is required to connect to RHEV-M. We download the RHEV-M certificate with a command, save it locally, and name it rhevm.cer.
To obtain the resource information of physical hosts in the RHEV environment, we use ovirtsdk, the Python API client. The first line of the code declares it as Python code, and the following lines import the Python classes the program needs.
We call the API to connect to the REST interface, specifying parameters such as url, username, password, insecure, and ca_file. The ca_file parameter is the certificate required for the connection and must give the certificate's exact location.
We iterate over the physical host collection and output the host name, CPU utilization, and memory utilization. CPU utilization is the user-mode CPU consumption plus the system-mode CPU consumption. Memory utilization is the currently used memory divided by the total system memory. After the output is processed, we disconnect the REST connection to release the program's connection resources.
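The per-host calculation above can be sketched in Python, operating on an already-fetched dictionary of host statistics. The statistic names (cpu.current.user, cpu.current.system, memory.used, memory.total) are assumptions modelled on oVirt/RHEV host statistics. The function rhev_host_row is hypothetical. Adapt both to what the SDK actually returns.

```python
def rhev_host_row(name, stats):
    """Build '<host> <cpu%> <mem%>' from a dict of host statistics.

    CPU% = user-mode CPU + system-mode CPU; memory% = used / total.
    The statistic names are assumptions, not confirmed SDK fields.
    """
    cpu_pct = stats["cpu.current.user"] + stats["cpu.current.system"]
    mem_pct = 100.0 * stats["memory.used"] / stats["memory.total"]
    return f"{name} {cpu_pct:.1f} {mem_pct:.1f}"


print(rhev_host_row("rhevh01", {
    "cpu.current.user": 12.0,
    "cpu.current.system": 3.5,
    "memory.used": 24 * 1024**3,
    "memory.total": 96 * 1024**3,
}))  # rhevh01 15.5 25.0
```

The output row deliberately has the same three-column shape as the VMware collector's output. The same processing script can then handle both environments.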
Exception handling pairs with the try block at the start of the program. When the code in the try block runs normally, it outputs the normal result set. When that code raises an exception, the exception is caught and its cause is printed.
To support the high availability and flexibility of the cloud computing platform and to guarantee the SLA, the business systems on the cloud platform must be able to migrate freely and dynamically between physical hosts. For example, suppose the physical host running an important business system raises a hardware alarm. To keep the business system continuously available, we need to migrate its virtual machine to another healthy physical host.
Applications' demand for resources changes frequently, and the infrastructure must adapt to these changes within a very short time. This calls for a simple, automatic, configurable management approach that does not require excessive administrator intervention.

Automatic virtual machine migration (that is, distributed resource scheduling) meets this need. It continuously optimizes the cloud computing platform and automatically migrates virtual machines among multiple physical servers to balance their load.
While a virtual machine is migrating, its business system continues to serve external requests normally. No data is damaged and business continuity is not interrupted.
Under our abstracted cloud computing platform, the virtual resource migration monitoring collection indicator is as follows:
Table 3 Collection indicator information
Indicator name: X86 virtual machine migration events
Configuration item template: {Operating system}
Configuration item attribute: {OS Type} = X86
Collection entry point: {Cloud control center} -> {Management network IP address}
Collection interface type: [REST API]
Collection frequency: Every 10 minutes (dynamically configurable)
Collection method description: we must provide the following for collection:
- the REST API endpoint address;
- the username and password required for interface authentication;
- the SSL certificate;
- whether certificate verification is ignored.

Each row in the returned result set describes one virtual machine migration event, as shown in the following table:
Table 4 Collection method information
Sample returned result set:
Table 5 Sample returned result
In an enterprise private cloud, enabling VMware vSphere DRS (Distributed Resource Scheduler) automatically balances load across hosts and adjusts compute resources according to service priority. During periods of low load, hosts are powered off to reduce energy consumption. vMotion technology can be used to balance load across hosts.
VMware vMotion is the live migration feature for virtual machines. It moves a running virtual machine from one physical server to another without downtime.

The virtual machine keeps its network identity and connections, which ensures a seamless transition. Its active memory and precise execution state are transferred over a high-speed network. The virtual machine therefore switches from running on the source vSphere host to running on the target vSphere host.
When a physical host fails, VMware HA restarts its virtual machines on other hosts in the cluster. To know which virtual machines a physical host failure affects, we need to monitor vMotion information in the VMware environment.
vMotion events in the VMware environment are obtained through the VMware Web Services interface.
We define the GetVmotionEvents class and declare the following private variables:
- url, the API address of the VMware vCenter we connect to;
- userName, the authenticating user;
- password, the authenticating user's password;
- si, the ServiceInstance of the Web Services connection.
We define the getConnectionParameters function to check whether the input parameters are valid. As described earlier, it checks the number of parameters and whether their types match the definitions, and it prints usage help if they do not.
Before connecting to the API, the main function calls getConnectionParameters(args) to check whether the input parameters are valid. Once they are, it creates the ServiceInstance si.
We then obtain the EventManager object. EventManager holds all system events, that is, all historical events in vCenter, so we need to set a filter to obtain only vMotion events. The filter applies three conditions:
- Event type: the only resource migration and change events we care about are VmRelocatedEvent, VmMigratedEvent, DrsVmMigratedEvent, DrsVmPoweredOnEvent, and VmPoweredOnEvent.
- Time window: by default all historical events are returned, but we collect every 10 minutes, so each query only needs events from the previous 10 minutes.
- Initiator: we set the event initiator to the system itself, administrator.
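In the vSphere API this filtering is normally expressed through an EventFilterSpec passed to the EventManager's event query. The sketch below only reproduces the selection logic in plain Python, over hypothetical event dictionaries with type, time, and user keys. It illustrates the three conditions above and does not show SDK usage.

```python
from datetime import datetime, timedelta

# Event types treated as resource migration/change events, per the text above.
MIGRATION_EVENT_TYPES = {
    "VmRelocatedEvent", "VmMigratedEvent", "DrsVmMigratedEvent",
    "DrsVmPoweredOnEvent", "VmPoweredOnEvent",
}


def filter_events(events, now, window=timedelta(minutes=10), user="administrator"):
    """Keep events of the listed types, from the last `window`, initiated by `user`.

    Each event is a dict with 'type', 'time' (datetime) and 'user' keys;
    this shape is hypothetical and stands in for vSphere Event objects.
    """
    start = now - window
    return [
        e for e in events
        if e["type"] in MIGRATION_EVENT_TYPES
        and start <= e["time"] <= now
        and e["user"] == user
    ]
```

Because the window length equals the 10-minute collection period, consecutive runs cover adjacent intervals. An event exactly on a boundary could be reported twice, so a deduplication step may be worth adding.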
After setting the event filter conditions, we run the query to obtain the event collection events. A for loop then iterates over the events in the collection that satisfy the filter, and the printEvent function processes and outputs each one.

Exception handling follows the for loop. On a parameter error, usage help is printed. For any other exception, the stack trace is printed. At the end of the program, we close the Web Services connection to release resources.
The host name is found through the OID of the physical host object instance in the VMware system. This name matches the device name in the CMDB, so it tells us on which hosts a virtual machine performed a vMotion migration.

Through the service instance si, we obtain all host system objects, hostsystems. We then iterate over hostsystems to find the host whose OID equals the given OID, and return that host's custom information.
The event output function works as follows:
1. Source host: it first checks whether the event's physical host information is empty. If not, it checks whether the evt object is of class com.vmware.vim25.VmRelocatedEvent. If so, it passes the OID of the source host in evt to the function findHostAnnonByOid, which returns the physical host's configuration item code in the monitoring platform. If evt is not a VmRelocatedEvent, the host information is set to empty and represented by the string null.
2. Destination host: it outputs the physical server to which the virtual machine migrated. It passes the OID of the destination host in evt to findHostAnnonByOid to obtain the destination server's configuration item code.
3. Storage: it checks again whether evt is a com.vmware.vim25.VmRelocatedEvent. If so, it outputs the source storage and the target storage the virtual machine migrated to. Otherwise, it outputs "null null".
4. Time: finally, it outputs the time at which the event occurred.

Whenever information cannot be obtained, null is substituted. This follows the output format agreed in Chapter 4, and formatting the output this way makes it easier for the downstream interface to process the collected information.
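The lookup-and-format logic above can be sketched in Python. The hosts dictionary (OID to configuration item code) is a hypothetical stand-in for iterating over the HostSystem objects. The event dictionary keys (type, vm, source_host_oid, host_oid, source_ds, ds, time) are also assumptions. PVVMDC0013, B05XDL580D, and B05XDL580A come from the document's own example. The storage names DS01/DS02 and the OIDs are made up.

```python
def find_host_annon_by_oid(hosts, oid):
    """Return the configuration item code of the host with this OID, or None.

    `hosts` is a hypothetical {oid: ci_code} map standing in for iterating
    over the HostSystem objects obtained through the service instance.
    """
    return hosts.get(oid) if oid else None


def format_event(hosts, evt):
    """Render one event as the six agreed fields, writing 'null' for gaps."""
    relocated = evt["type"] == "VmRelocatedEvent"
    fields = [
        evt["vm"],  # VM configuration item code
        find_host_annon_by_oid(hosts, evt.get("source_host_oid")) if relocated else None,
        find_host_annon_by_oid(hosts, evt.get("host_oid")),  # destination host
        evt.get("source_ds") if relocated else None,         # source storage
        evt.get("ds") if relocated else None,                # target storage
        evt["time"],                                         # migration time
    ]
    return " ".join("null" if f is None else str(f) for f in fields)
```

A relocation therefore yields a line such as "PVVMDC0013 B05XDL580D B05XDL580A DS01 DS02 2016-08-30T11:55:00". Other event types keep the same six-field shape, with null in the source-host and storage fields.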
Each line records one migration event, with six fields:
1. the virtual machine's configuration item code in the monitoring platform;
2. the source physical server code;
3. the destination server code;
4. the source storage code;
5. the target storage code;
6. the migration time.
Next, let us look at how to obtain virtual machine migration information in the RHEV environment. ovirtsdk is the Python API client. The first line of the code declares it as Python code, and the following lines import the Python classes the program needs, mainly ovirtsdk and the time-handling class datetime.
We call the API to connect to the REST interface, specifying parameters such as url, username, password, insecure, and ca_file. The ca_file parameter is the certificate required for the connection and must give the certificate's exact location.
The program runs every 10 minutes and queries the virtual machine migrations of the last 10 minutes. It proceeds as follows:
1. Obtain the current time, now.
2. Define the time offset aDay.
3. Combine the offset with the current time to get the time to query from, qTime.
4. Format qTime into rTmie, a string that fits the query condition.
5. Pass rTmie to the query function as a parameter.
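The window computation above can be sketched with Python's datetime module. The timestamp format string is an assumption, because the format the RHEV query actually expects is not given. The variable names in the comments map back to now, aDay, qTime, and rTmie in the text.

```python
from datetime import datetime, timedelta


def query_start(now, minutes=10, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the formatted start of the look-back window (now - minutes).

    `fmt` is an assumed timestamp format; adjust it to what the query accepts.
    """
    a_delta = timedelta(minutes=minutes)  # the offset called aDay in the text
    q_time = now - a_delta                # qTime
    return q_time.strftime(fmt)           # rTmie, passed to the query function


print(query_start(datetime(2016, 8, 30, 12, 5, 0)))  # 2016-08-30 11:55:00
```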
We define the collection event_list to hold the events that meet our needs. The first query selects events whose code equals 32, whose content contains "started", and which occurred within the last 10 minutes. The event description is split on spaces and recombined into the format we need:

vm:{VmName},fromHost:null,toHost:{HostName},formDS:null,toDS:null,eTime:{time}

This query captures virtual machines starting from a powered-off state. A virtual machine that is not running is not hosted on any physical host. It is placed on a physical host only when it starts. We need to record the hosting information of virtual machines as they start, so that we can determine the scope of impact when a physical host fails.
The next query selects events whose code equals 506, whose content contains "restarted", and which occurred within the last 10 minutes. Recording virtual machine restart events is also necessary. Sometimes a virtual machine restarts precisely because its physical host failed, so its hosting physical host has changed. The recorded content keeps the same format.
The next query selects events whose code equals 63, whose content contains "Migration", and which occurred within the last 10 minutes. These records are virtual machine migration information, and their format differs slightly: they record both the physical host that hosted the virtual machine and the target physical host it migrated to.
We record all three kinds of event information in the event_list collection. For ease of processing, we sort the events by time and output them in the agreed format after sorting.
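The three queries and the final sort can be sketched in Python. It is assumed that each event has already been fetched and that the VM and host names have already been extracted from the description into the fields vm, from_host, and to_host. This pre-extraction and the dictionary shape are hypothetical. The codes 32, 506, and 63, their keywords, and the output format come from the text above.

```python
from datetime import datetime

# (event code, keyword in the description) pairs taken from the text above.
QUERIES = [(32, "started"), (506, "restarted"), (63, "Migration")]


def build_event_list(events, start):
    """Select start/restart/migration events since `start`, sort by time, format."""
    event_list = []
    for e in events:
        if e["time"] < start:
            continue
        if not any(e["code"] == c and kw in e["description"] for c, kw in QUERIES):
            continue
        # Only migration events (code 63) carry a source host.
        from_host = e.get("from_host") if e["code"] == 63 else None
        event_list.append((e["time"], e["vm"], from_host, e["to_host"]))
    event_list.sort(key=lambda r: r[0])
    return [
        f"vm:{vm},fromHost:{fh or 'null'},toHost:{th},formDS:null,toDS:null,"
        f"eTime:{t:%Y-%m-%d %H:%M:%S}"
        for t, vm, fh, th in event_list
    ]


sample = [
    {"code": 63, "description": "Migration started", "time": datetime(2016, 8, 30, 11, 58),
     "vm": "vm2", "from_host": "h1", "to_host": "h2"},
    {"code": 32, "description": "VM vm1 started on Host h1", "time": datetime(2016, 8, 30, 11, 55),
     "vm": "vm1", "to_host": "h1"},
]
for line in build_event_list(sample, datetime(2016, 8, 30, 11, 50)):
    print(line)
```

Sorting by time across all three event kinds lets the downstream processing replay host changes in the order they happened.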
After output, the program disconnects. Exception handling follows at the end of the program: when the code above raises an exception, the exception is caught and its description is printed.
Running the program above produces the raw virtual machine migration data. Each record has six fields:
1. the host (virtual machine) information;
2. the source physical host;
3. the target physical host;
4. the source storage information (null if no storage migration occurred);
5. the target storage information (null if no storage migration occurred);
6. the migration time.
The Engine objects processed by the analysis linkage strategy module include base values, such as the instantaneous CPU value, and aggregated indicators, such as the daily average CPU value. The analysis engine supports 4 judgment modes:
1) Single-value judgment mode, for example: device state != OK; daily average CPU load >= 45%;
2) Consecutive-sample probability judgment mode, for example: instantaneous CPU utilization >= 80% in 8 of 10 consecutive samples; application server thread count >= 25 for 3 consecutive samples;
3) Sampling-interval ratio judgment mode, for example: storage utilization >= 60% in >= 50% of the month's samples; CPU utilization >= 80% in >= 30% of the day's samples;
4) Advanced combined judgment mode, for example: tablespace state != Normal AND tablespace state != Backup; file system utilization >= 85% AND remaining space <= 10G;
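Judgment modes 2 to 4 above can be sketched as small Python predicates over a list of numeric samples. Mode 1 is a plain comparison such as state != "OK" and needs no helper. The function names are hypothetical, and this is only an illustration of the semantics, not the platform's engine.

```python
def k_of_last_n(samples, threshold, k, n):
    """Mode 2: at least k of the last n samples are >= threshold."""
    window = samples[-n:]
    return len(window) == n and sum(s >= threshold for s in window) >= k


def ratio_of_samples(samples, threshold, min_ratio):
    """Mode 3: the share of samples >= threshold is at least min_ratio."""
    if not samples:
        return False
    hits = sum(s >= threshold for s in samples)
    return hits / len(samples) >= min_ratio


def all_of(*conditions):
    """Mode 4: AND-combination of already evaluated conditions."""
    return all(conditions)


# "CPU >= 80% in 8 of 10 consecutive samples"
print(k_of_last_n([85, 90, 70, 81, 82, 83, 84, 85, 86, 87], 80, 8, 10))  # True
# "file system >= 85% AND remaining space <= 10G"
print(all_of(88 >= 85, 8 <= 10))  # True
```

Requiring a full window of n samples in mode 2 avoids firing on a host that has only just started reporting.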
The analysis engine supports three kinds of processing actions:
First, creating alarms of different levels. The alarm levels are Fatal, Error, Warning, and Info;
Second, creating O&M workflow events by integrating with the O&M workflow management platform, so that O&M events and alarms form a closed loop;
Third, triggering automated operations by integrating with the automation management platform, so that automated operations and alarms form a closed loop. For example, when memory utilization is >= 85% in >= 50% of all samples, memory is automatically expanded by 20%;
For the monitoring data of physical servers, the analysis linkage strategies are set as follows:
CPU utilization >= 50% with consecutive count >= 3
Memory utilization >= 75% with consecutive count >= 3
When a physical host's CPU samples exceed 50% three consecutive times, an alarm is sent by SMS and email. An event is also generated in the monitoring platform to prompt the administrator to handle it. The analysis linkage strategy for memory is set here to memory exceeding 75% three consecutive times.
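The two strategies above can be sketched in Python. The sketch uses >= to match the rule definitions. The SMS/email notification and event creation are left out; the function only returns the alarm texts, and the message wording is hypothetical.

```python
# Thresholds from the strategies above; 3 consecutive samples for both.
RULES = {"cpu": 50.0, "mem": 75.0}
CONSECUTIVE = 3


def consecutive_breach(samples, threshold, count=CONSECUTIVE):
    """True if the most recent `count` samples are all >= threshold."""
    recent = samples[-count:]
    return len(recent) == count and all(s >= threshold for s in recent)


def evaluate_host(host, cpu_samples, mem_samples):
    """Return alarm messages for one physical host (notification not shown)."""
    alarms = []
    if consecutive_breach(cpu_samples, RULES["cpu"]):
        alarms.append(f"{host}: CPU >= 50% for 3 consecutive samples")
    if consecutive_breach(mem_samples, RULES["mem"]):
        alarms.append(f"{host}: memory >= 75% for 3 consecutive samples")
    return alarms


print(evaluate_host("h1", [40, 55, 60, 70], [80, 70, 90]))
# ['h1: CPU >= 50% for 3 consecutive samples']
```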
After the physical host data collection interface is connected, we define technical indicators in the monitoring platform for the host data that the interface program collects. For each indicator we can define its type and display content, data retention time, collection frequency, and chart display type.
In the collection strategy configuration, we define the collection strategy name. The collection methods supported by the monitoring platform's collection interface are: bat, HeartBeat, HTTP, Java, JDBC, JMX, Log, Mail, PING, RemoteSSH, SHELL, SnmpGet, SnmpTrap, SnmpWalk, Syslog, Telnet, VBS, Web Services, and wmic. We also define the grab type, the collection script, and the data processing script.
The collection script gathers the raw data. The data processing script then packages the raw data into a form the monitoring platform can accept and stores it in the database.
We write the data processing script in Groovy. Each line of raw data is split on spaces into 3 strings, and the three strings are assembled into a Map object. The Map objects are stored in a List object, which the monitoring platform then processes and stores in the database.
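The Groovy processing step can be sketched in Python for consistency with the other examples: each line becomes a dictionary (the Map) and the records are collected in a list (the List). The key names hostname, cpu, and mem are hypothetical, and blank or malformed lines are skipped. This is an assumption about how the original script handles them.

```python
def process_raw(raw, keys=("hostname", "cpu", "mem")):
    """Split each raw line on whitespace into 3 strings and collect them as dicts.

    The key names are hypothetical; values stay strings, as in the text.
    """
    records = []
    for line in raw.splitlines():
        parts = line.split()
        if len(parts) != len(keys):
            continue  # skip blank or malformed lines
        records.append(dict(zip(keys, parts)))
    return records


print(process_raw("h1 10.5 20.0\n\nh2 30 40\n"))
# [{'hostname': 'h1', 'cpu': '10.5', 'mem': '20.0'}, {'hostname': 'h2', 'cpu': '30', 'mem': '40'}]
```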
In the data collection strategy, we configure each collection indicator in the X86 virtualization controller's collection strategy. For each indicator we set whether it is enabled, its collection period, and so on.
In the collection strategy we define which technical indicators are collected. The data collected for these indicators is the base data. By processing the base data we can derive other data, such as the daily peak, daily mean, and weekly mean.
Once the technical indicator data has been collected, we have both base data and aggregated strategy data. The monitoring platform configures automated analysis linkage strategies on these results. Examples are an alarm when CPU utilization exceeds 50% three consecutive times, or when physical memory utilization exceeds 75% three consecutive times.
For virtual resource migration information, any collected data means that a virtual machine has been migrated. The resource ownership information in the monitoring platform is then no longer consistent.

For example, the table above shows that virtual machine PVVMDC0013, previously on physical host B05XDL580D, migrated to B05XDL580A. If B05XDL580D then suffers a fault such as an unexpected power loss, the correct list of affected virtual machines across the platform should not include PVVMDC0013.

Therefore, whenever virtual resource migration information is collected, we create an alarm in the monitoring platform and adjust the virtual machine resource ownership through an automated program.
The analysis linkage strategy is therefore set as follows:
Virtual resource migration data != empty
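The ownership adjustment above can be sketched in Python, using an in-memory dictionary that maps each VM to its current physical host. The functions, the alarm text, and the second VM PVVMDC0014 are hypothetical. PVVMDC0013, B05XDL580D, and B05XDL580A come from the example above. An empty migration list produces no alarm, which matches the "!= empty" rule.

```python
def apply_migrations(owner, migrations):
    """Update VM -> host ownership from (vm, src, dst) records; return alarm texts."""
    alarms = []
    for vm, src, dst in migrations:
        owner[vm] = dst
        alarms.append(f"VM {vm} migrated from {src} to {dst}")
    return alarms


def affected_vms(owner, failed_host):
    """VMs that a failure of `failed_host` would affect, per current ownership."""
    return sorted(vm for vm, host in owner.items() if host == failed_host)


owner = {"PVVMDC0013": "B05XDL580D", "PVVMDC0014": "B05XDL580D"}
print(apply_migrations(owner, [("PVVMDC0013", "B05XDL580D", "B05XDL580A")]))
print(affected_vms(owner, "B05XDL580D"))  # ['PVVMDC0014']
```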
We define the vMotion technical indicator in the monitoring platform. Its attributes include the indicator code, name, collection frequency, data retention time, and so on. The key step is to set the indicator type to List and define the List's related information. After the indicator collects data, an indicator post-processor persists the data to the database.

Here we mainly show the data collection configuration for VMware. In the RHEV environment, the main difference is the collection method; the raw data collected in both environments has the same format.
We set the collection interface to JAVA and the grab type to one-to-one. The collection script is as follows; the jar package must be uploaded to the monitoring server. The user name, password, and url in the collection script are obtained from the platform.
GetVmotionEvents.jar is the Java code we wrote. The script uses three placeholders:
- $$ip$$ is the IP address of the cloud control center, obtained from the monitoring platform;
- $$username$$ is the user name required to connect to the cloud control center, obtained from the monitoring platform;
- $$password$$ is the access password configured for the cloud control center.

We record these values in the monitoring platform, and the page supplies them through parameterization.
The data processing script processes the raw data from the collection script. It creates a List object, processes the data obtained by the collection script, and stores the results in the List object. The List is then passed to the back end for storage in the database.
Parameter mapping: we obtain the url, user, and password information stored in advance in the monitoring platform.
Below, taking a configuration item as an example, we describe how an x86 server is recorded and described in the private cloud resource monitoring platform.
T_CI is the core configuration item table. In the monitoring platform, every monitored object is a configuration item. Configuration items use various templates (T_CI_TEMPLATE) to fit the different devices in a data center, such as servers, switches, racks, storage devices, and firewalls. The configuration item table structure is as follows:
Table 6 T_CI (configuration item) table structure
The configuration item table above records these columns: configuration item ID, configuration item code, configuration item name, template ID, creation time, creator, last maintenance time, last maintainer, remarks, and configuration item state. The columns are designed as follows:
- The configuration item ID is the primary key (also used as a foreign key) and cannot be empty.
- The configuration item code consists of English letters and digits, following the platform's naming rules.
- The configuration item name is a Chinese description.
- The template ID is a foreign key.
- The configuration item state records whether the configuration item is currently available or disabled.
For example:
The first row's record contains nine fields:
1. the serial number 3538;
2. the code A06BR720_A;
3. the Chinese description "x86 server (A06BR720_A)";
4. the template ID 139;
5. the creation time 2012-11-02 13:51:56;
6. the creator code 179674;
7. the last modification time 2013-08-12 10:22:29;
8. the modifier code 180140;
9. the state 1, meaning the item is currently available.

Using this record as an example, we will next look at how the related attributes of an x86 server are displayed.
In the cloud center, all devices are represented as configuration items, and each device is an instance of a defined configuration item template. The configuration item template table structure is as follows.
Table 7 T_CI_TEMPLATE (configuration item template) table structure:
The configuration item template contains the template ID, template name, creator, creation time, label, and stereotype. The template ID is the primary key and foreign key. The actual record is as follows:
The configuration item template ID is 139, and its template name is x86 server.
Below is the configuration item template attribute relation table, which links configuration item templates to their attributes. It contains the template ID, the template attribute ID, and a sort order. The foreign keys are the template ID and the template attribute ID.
Table 8 T_CI_TEMPLATE_PROPERTY_REL (configuration item template attribute relation) table structure
The x86 server template has many attributes. To see specifically which attributes they are, we use the following table.
The configuration item template attribute table has these fields: template attribute ID, template attribute name, label, type (default 0), template attribute value, group, number, remarks, data type, and unit. The primary key and foreign key map to the template attribute ID.
Table 9 T_CI_TEMPLATE_PROPERTY (configuration item template attribute) table structure
The attributes of configuration item template 139 include manufacturer, model, firmware, CPU frequency, CPU model, Flash card type, and other information.
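The four tables above can be joined to list a configuration item's template attributes. The sketch below uses Python's built-in sqlite3 module. The table names are from the text, but every column name is hypothetical because the table structures themselves are not reproduced here. The values 3538, A06BR720_A, 139, and "x86 server" come from the records above; the attribute IDs are made up.

```python
import sqlite3

# Table names are from the text; all column names are hypothetical.
SCHEMA = """
CREATE TABLE T_CI (ci_id INTEGER PRIMARY KEY, ci_code TEXT, template_id INTEGER, state INTEGER);
CREATE TABLE T_CI_TEMPLATE (template_id INTEGER PRIMARY KEY, template_name TEXT);
CREATE TABLE T_CI_TEMPLATE_PROPERTY (prop_id INTEGER PRIMARY KEY, prop_name TEXT);
CREATE TABLE T_CI_TEMPLATE_PROPERTY_REL (template_id INTEGER, prop_id INTEGER, sort_no INTEGER);
"""


def template_properties(conn, ci_code):
    """List (template name, attribute name) for a CI, in the relation's sort order."""
    sql = """
        SELECT t.template_name, p.prop_name
        FROM T_CI c
        JOIN T_CI_TEMPLATE t ON t.template_id = c.template_id
        JOIN T_CI_TEMPLATE_PROPERTY_REL r ON r.template_id = t.template_id
        JOIN T_CI_TEMPLATE_PROPERTY p ON p.prop_id = r.prop_id
        WHERE c.ci_code = ?
        ORDER BY r.sort_no
    """
    return conn.execute(sql, (ci_code,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO T_CI VALUES (3538, 'A06BR720_A', 139, 1)")
conn.execute("INSERT INTO T_CI_TEMPLATE VALUES (139, 'x86 server')")
conn.executemany("INSERT INTO T_CI_TEMPLATE_PROPERTY VALUES (?, ?)",
                 [(1, "Manufacturer"), (2, "Model"), (3, "CPU frequency")])
conn.executemany("INSERT INTO T_CI_TEMPLATE_PROPERTY_REL VALUES (139, ?, ?)",
                 [(1, 1), (2, 2), (3, 3)])
print(template_properties(conn, "A06BR720_A"))
# [('x86 server', 'Manufacturer'), ('x86 server', 'Model'), ('x86 server', 'CPU frequency')]
```

Keeping attributes in a separate table linked through a relation table lets one template (here 139) carry any number of attributes without changing T_CI.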
The monitoring platform displays an overview of the cloud center's resources:
- the usage and total computing capacity of the current X86 resource pool, expressed in XCU, where XCU = total physical CPUs × cores per CPU × clock frequency (GHz);
- current minicomputer resources;
- NAS and SAN storage resources;
- the current usage and total capacity of the load balancing resource pool, and so on.

Green means resources are sufficient, yellow means resources are tight, and red means resources are scarce. When a resource turns yellow, the administrator is prompted to consider purchasing more. When it turns red, an alert is raised to expand that resource immediately.
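The XCU formula and the green/yellow/red display can be sketched in Python. The text does not state the utilization levels at which a pool turns yellow or red, so the 70% and 90% cut-offs below are purely hypothetical placeholders.

```python
def xcu(physical_cpus, cores_per_cpu, ghz):
    """XCU = physical CPUs x cores per CPU x clock frequency (GHz)."""
    return physical_cpus * cores_per_cpu * ghz


def pool_status(used, total, yellow_at=0.70, red_at=0.90):
    """Green/yellow/red by utilization; both cut-offs are hypothetical."""
    ratio = used / total
    if ratio >= red_at:
        return "red"     # expand immediately
    if ratio >= yellow_at:
        return "yellow"  # consider purchasing more
    return "green"


capacity = xcu(4, 8, 2.5)           # one host: 4 sockets x 8 cores x 2.5 GHz
print(capacity)                     # 80.0
print(pool_status(60.0, capacity))  # yellow (75% used)
```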
Under the cloud platform, the front end displays physical server monitoring data as graphs. The physical hosts run cloud computing software (Hypervisor). The red line shows the host's physical memory utilization and the blue line shows its CPU utilization. A start time and end time can also be set for queries.
By default, the data of the last 105 minutes is shown. The query start and end times can be set through the query conditions to display the chart for the corresponding period.
For virtual machine migration collection results, we can set a start time and end time to query the results. The display shows the collection time and the virtual resource migration format defined in Chapter 4: virtual machine configuration item code, source physical server, destination server, source storage, target storage, and migration time.
The basic principles, main features, and advantages of the invention have been shown and described above. Those skilled in the art should understand that the invention is not limited to the embodiments above; the embodiments and the description merely illustrate its principles. Various changes and improvements may be made without departing from the spirit and scope of the invention, and all such changes and improvements fall within the claimed scope. The scope of protection is defined by the appended claims and their equivalents.

Claims (4)

1. A cloud-based host intelligent monitoring system, characterized in that the system comprises a front-end presentation layer, a back-end service layer, a physical host resource collection module, a virtual resource migration monitoring module, and an analysis linkage strategy module, and that the system's data store uses NoSQL;
The front-end presentation layer mainly comprises the system administration, strategy configuration, and data viewing modules;
The back-end service layer comprises the data acquisition, data receiving queue, analysis linkage strategy, event handling, and resource configuration management modules;
The physical host resource collection module uses pull-mode collection to obtain physical host monitoring data on the cloud computing platform: the monitoring server actively queries the server monitoring data held by the cloud control center through its API, and the cloud control center returns the collated data to the monitoring server;
The virtual resource migration monitoring module addresses the fact that applications' resource demands change frequently, so the infrastructure must adapt to changing resource demands within a very short time. This calls for a simple, automated, configurable management approach that needs little administrator intervention. Automatic virtual machine migration serves distributed resource scheduling: it continuously optimizes the cloud computing platform by automatically migrating virtual machines among physical servers to balance their load, and while a virtual machine is migrating, its business system keeps serving requests normally, without losing data or interrupting business continuity;
In the analysis linkage strategy module, after the monitored devices upload monitoring data, the data is processed and stored in the NoSQL database; the analysis module then analyzes the data and, according to the configured strategies, triggers the corresponding handling flows.
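Pull-mode collection, in which the monitoring server initiates each request to the cloud control center rather than waiting for data to be pushed, can be sketched as a polling loop. The text does not specify the control center's API, so the `fetch` callable stands in for that query and `store` for the hand-off to the data receiving queue; both are hypothetical placeholders, as is the sample dictionary layout.

```python
import time
from typing import Callable, Dict, List

Sample = Dict[str, object]  # hypothetical layout, e.g. {"host": "esx-01", "cpu": 42.0}

def pull_cycle(fetch: Callable[[], List[Sample]],
               store: Callable[[Sample], None]) -> int:
    """One pull: ask the cloud control center for data, then hand each sample on."""
    samples = fetch()
    for sample in samples:
        store(sample)
    return len(samples)

def run_collector(fetch: Callable[[], List[Sample]],
                  store: Callable[[Sample], None],
                  interval_s: float, cycles: int) -> int:
    """Poll `cycles` times, `interval_s` seconds apart; a failed pull is logged and skipped."""
    total = 0
    for i in range(cycles):
        try:
            total += pull_cycle(fetch, store)
        except OSError as exc:  # network or API errors must not stop the collector
            print(f"pull {i} failed: {exc}")
        if i + 1 < cycles:
            time.sleep(interval_s)
    return total
```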
2. The cloud-based host intelligent monitoring system according to claim 1, characterized in that data in the system flows upward from the bottom layer, being processed along the way, and is finally called and displayed by the front-end presentation layer. The bottom layer is the data acquisition layer, comprising the physical server data acquisition, resource dynamic migration acquisition, and network device data acquisition modules. Physical server acquisition is further divided into cloud-platform data acquisition and traditional-platform data acquisition, because collecting from physical servers under a cloud platform differs considerably from collecting from physical servers on a traditional platform. Under a cloud platform, an efficient, lean virtualization platform is installed on the bare metal of each physical server to realize resource pooling; for the security and stability of the underlying system, commercial cloud platforms advise users not to install third-party monitoring agents on the cloud's base-layer system and recommend obtaining the data from the cloud control center instead. On a traditional platform, physical server data is collected in agent mode, which supports more monitoring indicators, such as process monitoring and application monitoring, with high data accuracy; agentless mode via SNMP, SSH, Syslog, and similar protocols can also be used. Resource dynamic migration acquisition mainly collects the dynamic location of resources under the cloud platform: cloud computing pools physical resources so that virtual machines can migrate dynamically among multiple physical hosts, so the location of each virtual machine must be known in real time and therefore monitored; collection from VMware and RHEV platforms is supported. Most network device data is collected via SNMP;
After data acquisition completes, the data must be uploaded to the server for processing. To cope with a large volume of I/O requests, a data receiving queue module is abstracted out, since the server must aggregate large amounts of data. Monitoring data passes through the receiving queue module, which delivers the raw data to the data storage module; the data storage module supports both relational and non-relational storage, and here all monitoring data is stored in a non-relational database. The resource configuration management module is the core of the monitoring platform. In the platform, a physical device can be abstracted as a combination of one or more configuration items, and the module is designed around configuration item attributes, relationships between configuration items, key technical indicators, and a configuration view module. Configuration item attributes include, for example, model, number of rack units (U), memory size, and virtualization type, and the value type (numeric, text, or enumerated) and constraint rules of each attribute are specified. Relationships between configuration items include hosts, connects, uses, belongs to, and manages, and new relationship types can be added dynamically at any time as the business requires. A series of key technical indicators, such as CPU utilization, number of concurrent database connections, and business transaction volume, is defined in the key technical indicator library. A series of resource views, such as the data center view, rack view, resource pool view, configuration item relationship view, and capacity view, is defined in the resource view library, and the components, display framework, and constraint rules of each resource view are specified;
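The configuration item model above (typed attributes with constraint rules, plus relationships whose set of types can be extended at run time) might be sketched as follows. All class names, attribute names, and the example rack-unit rule are hypothetical; the five initial relationship names are the ones listed in the text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional, Sequence, Tuple

class ValueType(Enum):
    NUMBER = "numeric"
    TEXT = "text"
    ENUM = "enumerated"

# Relationship types named in the text; the set can be extended at run time.
RELATION_TYPES = {"hosts", "connects", "uses", "belongs_to", "manages"}

def add_relation_type(name: str) -> None:
    RELATION_TYPES.add(name)

@dataclass
class AttributeDef:
    name: str
    value_type: ValueType
    choices: Sequence[str] = ()                      # allowed values for ENUM attributes
    rule: Optional[Callable[[object], bool]] = None  # extra constraint rule, if any

    def validate(self, value: object) -> None:
        if self.value_type is ValueType.NUMBER and not isinstance(value, (int, float)):
            raise TypeError(f"{self.name} must be numeric")
        if self.value_type is ValueType.TEXT and not isinstance(value, str):
            raise TypeError(f"{self.name} must be text")
        if self.value_type is ValueType.ENUM and value not in self.choices:
            raise ValueError(f"{self.name} must be one of {list(self.choices)}")
        if self.rule is not None and not self.rule(value):
            raise ValueError(f"{self.name} violates its constraint rule")

@dataclass
class ConfigItem:
    name: str
    attributes: Dict[str, object] = field(default_factory=dict)
    relations: List[Tuple[str, str]] = field(default_factory=list)  # (type, target CI)

    def set(self, attr: AttributeDef, value: object) -> None:
        attr.validate(value)               # reject values that break the type or rule
        self.attributes[attr.name] = value

    def relate(self, relation: str, other: "ConfigItem") -> None:
        if relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {relation}")
        self.relations.append((relation, other.name))
```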
Analysis linkage strategy management enables early warning and fault diagnosis based on complex logic. The linkage module includes single-value judgment, e.g. memory usage above 80%, and consecutive sample value processing, which determines whether a single key indicator stays above or below a threshold over N consecutive samples. Once the key indicator analysis strategy and its thresholds are set, a breach triggers an event, which is handled by the event handling module: it creates monitoring events and can send notifications through several channels, including email, SMS, and WeChat;
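The single-value judgment and the hand-off to event handling can be sketched as below, using the 80% memory example from the text. The names are hypothetical, and the notification senders are stand-ins for the email, SMS, and WeChat channels; a real deployment would call the respective gateways.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass(frozen=True)
class Event:
    metric: str
    value: float
    threshold: float

    def message(self) -> str:
        return f"{self.metric} at {self.value:g}% exceeds {self.threshold:g}%"

Notifier = Callable[[Event], None]

def handle_sample(metric: str, value: float, threshold: float,
                  events: List[Event], notifiers: Dict[str, Notifier]) -> Optional[Event]:
    """Single-value judgment: if value > threshold, record an event and notify every channel."""
    if value <= threshold:
        return None
    event = Event(metric, value, threshold)
    events.append(event)          # the event handling module records the monitoring event
    for send in notifiers.values():
        send(event)               # hypothetical senders for mail, SMS, WeChat, ...
    return event
```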
The top layer is the presentation layer, the system's interaction layer, comprising three main modules: system administration, strategy configuration, and data viewing. System administration includes submodules such as user management and permission management; strategy configuration covers acquisition strategies, monitoring strategies, and the like; in the data viewing module, users can check the current state of each kind of device, triggered events, and historical information, and can generate running-status reports for a specified time period.
3. The cloud-based host intelligent monitoring system according to claim 1, characterized in that the acquisition method of the virtual resource migration monitoring module requires the following to be provided: the REST API address used for collection, the username and password required for interface authentication, the SSL security certificate, and whether to ignore certificate verification. Each row of the returned result set is the currently obtained monitoring information for one physical server: the first item is the physical server's configuration item name in the CMDB, the second is CPU usage, and the third is memory usage;
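Parsing the returned result set can be sketched as follows. The text fixes the order of the three items per row (CMDB configuration item name, CPU usage, memory usage) but not the field separator or number format, so the comma separator and the optional trailing `%` are assumptions; the REST call itself and its authentication settings are not modelled.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class HostUsage:
    ci_name: str     # configuration item name of the physical server in the CMDB
    cpu_pct: float   # CPU usage
    mem_pct: float   # memory usage

def parse_result_set(text: str, sep: str = ",") -> List[HostUsage]:
    """Turn each non-empty row 'name, cpu, mem' into a HostUsage (separator assumed)."""
    rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        fields = [f.strip() for f in line.split(sep)]
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        name, cpu, mem = fields
        rows.append(HostUsage(name, float(cpu.rstrip("%")), float(mem.rstrip("%"))))
    return rows
```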
In an enterprise private cloud, enabling VMware vSphere DRS (Distributed Resource Scheduler) automatically balances load across hosts, adjusts computing resources according to service priority, and powers hosts down during periods of low load to reduce energy consumption; vMotion can be used to balance the load across hosts;
VMware vMotion provides live migration of virtual machines: an entire running virtual machine can be moved from one physical server to another without downtime, while keeping its network identity and connections, so the transition is seamless. The virtual machine's active memory and precise execution state are transferred over a high-speed network, allowing the virtual machine to switch from running on the source vSphere host to running on the target vSphere host;
When a physical host fails, VMware HA restarts the virtual machines of that host on other hosts in the physical host cluster. To know which virtual machines a physical host failure affects, the vMotion information in the VMware environment must be monitored.
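Determining which virtual machines a host failure affects amounts to replaying the collected vMotion records onto a known starting placement. The sketch below assumes the records arrive in chronological order as (virtual machine, source host, target host) tuples; that tuple shape and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Move = Tuple[str, str, str]  # (vm, source host, target host); a hypothetical record shape

def current_placement(initial: Dict[str, str], moves: Iterable[Move]) -> Dict[str, str]:
    """Apply migrations in chronological order to get each VM's current host."""
    placement = dict(initial)        # vm -> host at the start of the observation window
    for vm, _source, target in moves:
        placement[vm] = target       # the latest migration wins
    return placement

def affected_vms(placement: Dict[str, str], failed_host: str) -> List[str]:
    """VMs running on the failed host, i.e. the ones HA will restart elsewhere."""
    return sorted(vm for vm, host in placement.items() if host == failed_host)
```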
4. The cloud-based host intelligent monitoring system according to claim 1, characterized in that the analysis linkage engine objects processed by the analysis linkage strategy module include base values, such as the instantaneous CPU value, as well as aggregated indicators, such as the daily CPU average. The analysis engine supports four modes:
1) single-value judgment mode, e.g. device state != OK; daily average CPU load >= 45%;
2) consecutive sampling point probabilistic judgment mode, e.g. instantaneous CPU utilization >= 80% in 8 out of 10 consecutive samples; application server threads >= 25 in 3 consecutive samples;
3) sampling interval ratio judgment mode, e.g. storage utilization >= 60% in at least 50% of the month's samples; CPU utilization >= 80% in at least 30% of the day's samples;
4) advanced combined judgment mode, e.g. tablespace state != Normal AND tablespace state != Backup; file system utilization >= 85% AND remaining space <= 10 GB;
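The four judgment modes can be expressed as small predicate helpers, exercised here with the examples from the list. The function names are hypothetical, and "8 out of 10 consecutive samples" is interpreted as at least 8 of the 10 most recent samples.

```python
from typing import Callable, Mapping, Sequence

Pred = Callable[[float], bool]

def single_value(value: float, pred: Pred) -> bool:
    """Mode 1: judge one value, e.g. daily average CPU load >= 45%."""
    return pred(value)

def k_of_last_n(samples: Sequence[float], pred: Pred, k: int, n: int) -> bool:
    """Mode 2: at least k of the n most recent samples satisfy pred."""
    window = samples[-n:]
    return len(window) == n and sum(1 for s in window if pred(s)) >= k

def share_of_period(samples: Sequence[float], pred: Pred, min_share: float) -> bool:
    """Mode 3: the share of the period's samples satisfying pred is >= min_share."""
    if not samples:
        return False
    return sum(1 for s in samples if pred(s)) / len(samples) >= min_share

def all_conditions(record: Mapping[str, object],
                   *conds: Callable[[Mapping[str, object]], bool]) -> bool:
    """Mode 4: AND-combination of conditions over one record of fields."""
    return all(cond(record) for cond in conds)
```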
The analysis engine supports three kinds of handling action:
First, creating alarms of different levels; the alarm levels are Fatal, Error, Warning, and Info;
Second, creating O&M workflow events by integrating with the O&M workflow management platform, so that O&M events and alarms form a closed loop;
Third, triggering automated operations by integrating with the automation management platform, so that automated operations and alarms form a closed loop. For example, when memory usage >= 85% in at least 50% of all samples, memory is automatically expanded by 20%;
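The alarm levels and the automated memory expansion example can be sketched as follows. The severity ordering of the levels (Info lowest, Fatal highest) is assumed from their names, and the function only returns the new memory size rather than calling an automation platform, whose interface the text does not describe.

```python
from enum import IntEnum
from typing import Sequence

class AlarmLevel(IntEnum):
    # Severity order assumed from the level names: Info lowest, Fatal highest.
    INFO = 1
    WARNING = 2
    ERROR = 3
    FATAL = 4

def expanded_memory_mb(current_mb: int, usage_samples: Sequence[float],
                       usage_threshold: float = 85.0, min_share: float = 0.5,
                       growth: float = 0.20) -> int:
    """Return the new memory size if usage >= 85% in >= 50% of samples, else the old size."""
    if not usage_samples:
        return current_mb
    hits = sum(1 for s in usage_samples if s >= usage_threshold)
    if hits / len(usage_samples) >= min_share:
        return current_mb + int(current_mb * growth)   # expand by 20%
    return current_mb
```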
For physical server monitoring data, the analysis linkage strategies are configured as follows:
CPU usage >= 50% and consecutive count >= 3
Memory usage >= 75% and consecutive count >= 3
When the CPU values collected from a physical host exceed 50% three times in a row, the administrator is alerted by SMS and email, and an event is generated in the monitoring platform prompting the administrator to handle it. Likewise, the analysis linkage strategy for memory fires when memory usage exceeds 75% three times in a row;
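The two physical-server strategies reduce to checking whether the three most recent samples of a metric all reach the threshold. A minimal sketch with hypothetical names; it uses >= as in the strategy definitions above, and returns event messages instead of sending SMS or email.

```python
from typing import Dict, List, Sequence, Tuple

# The two strategies stated in the text: metric -> (threshold in %, consecutive count)
POLICIES: Dict[str, Tuple[float, int]] = {"cpu": (50.0, 3), "memory": (75.0, 3)}

def consecutive_breach(samples: Sequence[float], threshold: float, count: int) -> bool:
    """True if the last `count` samples are all >= threshold."""
    tail = samples[-count:]
    return len(tail) == count and all(s >= threshold for s in tail)

def check_host(host: str, history: Dict[str, Sequence[float]]) -> List[str]:
    """Evaluate every strategy against one host's sample history; return event messages."""
    events = []
    for metric, (threshold, count) in POLICIES.items():
        if consecutive_breach(history.get(metric, []), threshold, count):
            events.append(f"{host}: {metric} >= {threshold:g}% for {count} consecutive samples")
    return events
```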
For the host data collected by the interface program, once the physical host data acquisition interface is in place, key technical indicators are defined in the monitoring platform; for each indicator, its type, display content, data retention time, collection frequency, and chart display type can be defined;
In the acquisition strategy configuration, the acquisition strategy name is defined. The acquisition modes supported by the acquisition interfaces in the monitoring platform are: bat, HeartBeat, HTTP, Java, JDBC, JMX, Log, Mail, PING, RemoteSSH, SHELL, SnmpGet, SnmpTrap, SnmpWalk, Syslog, Telnet, VBS, Web Services, and wmic. The strategy also specifies the collection type, the collection script, and the data processing script.
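An acquisition strategy combining the indicator settings and the acquisition modes listed above might be validated as shown here. The class and field names are hypothetical, and the rule that the retention period must cover at least one collection interval is an illustrative assumption rather than something the text states.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Tuple

# Acquisition modes listed for the platform's acquisition interfaces.
ACQUISITION_MODES = frozenset({
    "bat", "HeartBeat", "HTTP", "Java", "JDBC", "JMX", "Log", "Mail", "PING",
    "RemoteSSH", "SHELL", "SnmpGet", "SnmpTrap", "SnmpWalk", "Syslog",
    "Telnet", "VBS", "Web Services", "wmic",
})

@dataclass(frozen=True)
class KpiDefinition:
    name: str
    value_type: str        # indicator type, e.g. "number" or "text"
    display: str           # display content
    retention: timedelta   # data retention time
    interval: timedelta    # collection frequency, expressed as the time between samples
    chart: str             # chart display type, e.g. "line"

@dataclass(frozen=True)
class AcquisitionStrategy:
    name: str
    mode: str
    kpis: Tuple[KpiDefinition, ...]
    collection_type: str = ""
    collection_script: str = ""
    processing_script: str = ""

    def __post_init__(self) -> None:
        if self.mode not in ACQUISITION_MODES:
            raise ValueError(f"unsupported acquisition mode: {self.mode}")
        if not self.kpis:
            raise ValueError("a strategy needs at least one key indicator")
        for kpi in self.kpis:
            # Illustrative sanity rule (assumption): keep at least one interval of data.
            if kpi.interval <= timedelta(0) or kpi.retention < kpi.interval:
                raise ValueError(f"{kpi.name}: invalid interval or retention")
```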
CN201610788477.XA 2016-08-30 2016-08-30 Main frame intelligent monitor system based on high in the clouds Pending CN107786616A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610788477.XA CN107786616A (en) 2016-08-30 2016-08-30 Main frame intelligent monitor system based on high in the clouds

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610788477.XA CN107786616A (en) 2016-08-30 2016-08-30 Main frame intelligent monitor system based on high in the clouds

Publications (1)

Publication Number Publication Date
CN107786616A true CN107786616A (en) 2018-03-09

Family

ID=61450520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610788477.XA Pending CN107786616A (en) 2016-08-30 2016-08-30 Main frame intelligent monitor system based on high in the clouds

Country Status (1)

Country Link
CN (1) CN107786616A (en)


Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180309