CN107786616A - Cloud-based intelligent host monitoring system - Google Patents
- Publication number: CN107786616A (application CN201610788477.XA)
- Authority: CN (China)
- Prior art keywords: data, module, resource, physical, platform
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
- H04L67/1004—Server selection for load balancing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/04—Processing captured monitoring data, e.g. for logfile generation
- H04L43/045—Processing captured monitoring data, e.g. for logfile generation for graphical visualisation of monitoring data
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/563—Data redirection of data network streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/02—Standardisation; Integration
- H04L41/0213—Standardised network management protocols, e.g. simple network management protocol [SNMP]
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Mining & Analysis (AREA)
- Debugging And Monitoring (AREA)
Abstract
The present invention proposes a cloud-based intelligent host monitoring system. The system comprises a front-end presentation layer, a back-end service layer, a physical host resource collection module, a virtual resource migration monitoring module, and an analysis and linkage policy module; its data store uses NoSQL. Monitoring data for physical servers under the cloud platform, on which cloud computing software is installed, is displayed in the front end as a chart: a red line represents the host's physical memory utilization and a blue line represents the host's CPU utilization. Queries can also be restricted by a start time and an end time. According to the applicant, the design is novel and has good prospects for market adoption.
Description
Technical field
The present invention relates to the technical field of software development, and more particularly to a cloud-based intelligent host monitoring system.
Background art
An enterprise private cloud monitoring and management platform is the core system for managing enterprise IT resources. It is intended to implement IT planning strategies as required and to detect abnormal points in the operation of IT resources in advance, so that they can be handled appropriately and the system kept running normally. Ultimately it improves the reliability and availability of the cloud platform, so that users can rely on cloud computing resources with confidence.
A configuration item attribute library is usually established in the CMDB. The library defines a series of attributes (configuration item attributes), such as model, number of rack units (U), memory size, and virtualization type, and specifies the value type (numeric, text, enumeration) and constraint rules of each attribute. Attributes in the library can be grouped for easier maintenance and management, for example into a basic attribute group, an asset attribute group, a specification attribute group, and a technical indicator group. Attributes in the configuration item attribute library can be used by multiple configuration item templates and can be extended dynamically at any time as the business requires.
A key technical indicator library is usually also established in the CMDB. It defines a series of key technical indicators, such as CPU utilization, number of concurrent database connections, and business transaction volume, and specifies each indicator's category, unit, data source, default sampling frequency, and presentation mode. Key technical indicators in the library can be used by multiple configuration item templates and can be extended dynamically at any time as the business requires.
Key technical indicators comprise base indicators and aggregate indicators. Dividing them into base indicator items and aggregate indicator items establishes a hierarchy among the indicators: one base indicator item can have multiple related aggregate indicator items.
The data of an aggregate indicator item is generally calculated from the data of a base indicator item by an aggregation strategy (such as maximum, minimum, average, or interval ratio). A complex aggregate indicator item can in turn be calculated by aggregating other aggregate indicator items, so multi-layer aggregation is allowed. According to the dimension along which a technical indicator is aggregated, aggregate indicators can be further subdivided into two classes: time aggregation and service aggregation.
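The aggregation strategies named above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the `AGGREGATORS` registry, and the reading of "interval ratio" as the share of samples inside a closed interval are all assumptions.

```python
from statistics import mean

def interval_ratio(samples, low, high):
    """Share of samples that fall inside the closed interval [low, high]."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if low <= s <= high) / len(samples)

# Hypothetical registry: aggregation strategy name -> function over samples.
AGGREGATORS = {"max": max, "min": min, "avg": mean}

def aggregate(samples, strategy):
    """Derive one aggregate indicator value from base indicator samples."""
    return AGGREGATORS[strategy](samples)

def average_of_daily_peaks(days):
    """Multi-layer aggregation: average the per-day maxima."""
    return aggregate([aggregate(day, "max") for day in days], "avg")

cpu_samples = [20.0, 40.0, 90.0, 50.0]           # base indicator: CPU utilization (%)
print(aggregate(cpu_samples, "max"))             # peak utilization
print(aggregate(cpu_samples, "avg"))             # mean utilization
print(interval_ratio(cpu_samples, 80.0, 100.0))  # share of samples in 80-100 %
```

`average_of_daily_peaks` shows the multi-layer case: an aggregate indicator computed from another aggregate indicator rather than from the base samples directly.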
For most technical indicators, aggregation is needed along only one dimension, either time or service. Some indicators, however, need aggregation along both dimensions, and the type of aggregation may differ between the two. If an indicator value must be aggregated both by time and by service, the order of the two aggregation operations must be specified, because for some indicators a different order yields a value with a completely different meaning.
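A small worked example shows why the order matters. It assumes, purely for illustration, that time aggregation takes the maximum and service aggregation takes the average over the nodes of a cluster; the node names and sample values are made up, and the series are assumed to be aligned on the same sampling instants.

```python
from statistics import mean

def time_then_service(series_by_node):
    """Peak over time per node first, then average across nodes."""
    return mean(max(series) for series in series_by_node.values())

def service_then_time(series_by_node):
    """Average across nodes per sampling instant first, then peak over time."""
    return max(mean(instant) for instant in zip(*series_by_node.values()))

# Two nodes whose load peaks at different instants.
cluster = {"node-a": [10.0, 90.0], "node-b": [90.0, 10.0]}
print(time_then_service(cluster))   # average of the per-node peaks
print(service_then_time(cluster))   # peak of the cluster-wide average
```

With these samples the first order answers "how high does a typical node peak" (90.0), while the second answers "how high does the cluster as a whole peak" (50.0), which are different questions.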
Using a built-in aggregation engine, the monitoring and management platform splits the raw data into slices and feeds them to the engine for aggregation, which greatly improves the aggregation of massive data, and it offers several extensible configuration options. Statistical calculations such as aggregation are supported on the raw data, and collected data can be further calculated along the time dimension or the service dimension. Along the time dimension, the platform supports defining the aggregation period and supports sum, maximum, minimum, average, count, and interval-ratio aggregation. Along the service dimension, it aggregates technical indicators that are strongly tied to a service (for example, aggregating the node data of an application server cluster). It also supports cases that need both time aggregation and service aggregation, with an adjustable calculation order.
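Aggregation over a configurable period can be sketched as slicing the raw samples into fixed time windows and reducing each window. This is an illustrative sketch only: the `(timestamp, value)` tuple layout, the fixed-window slicing, and the function name are assumptions, not details given in the source.

```python
from collections import defaultdict

def aggregate_by_period(points, period_seconds, func):
    """Slice (timestamp, value) points into fixed windows and reduce each one.

    Returns a dict mapping each window's start timestamp to func(values).
    """
    buckets = defaultdict(list)
    for timestamp, value in points:
        window_start = timestamp - (timestamp % period_seconds)
        buckets[window_start].append(value)
    return {start: func(values) for start, values in sorted(buckets.items())}

# Samples every 60 s, aggregated over a 300 s (5 minute) period.
points = [(0, 10.0), (60, 30.0), (300, 50.0), (360, 70.0)]
print(aggregate_by_period(points, 300, max))   # maximum per period
print(aggregate_by_period(points, 300, sum))   # sum per period
print(aggregate_by_period(points, 300, len))   # sample count per period
```

Passing the reducer as a parameter keeps the window slicing independent of the aggregation type, so sum, maximum, minimum, average, and count share one code path.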
The monitoring and management platform uses a rule engine to implement multi-dimensional diagnosis and analysis based on data chains and event chains. Diagnosis and analysis rules configured on a page are combined with the relationships between configuration items and executed dynamically, which meets the requirements for general and variable rules. Through analysis and linkage policy management, early warning and fault diagnosis with complex logic can be implemented. Alarms can be managed by level, for example critical, error, warning, and information. Judgment modes include single-value judgment (such as greater than or less than), sampling-probability judgment (such as 8 out of 10 consecutive samples), and sampling-interval judgment (such as the share of a whole day's samples), which eliminate occasional outliers and merge similar repeated events. Multiple analysis and linkage policies can be combined with AND. Depending on the configured policy, an operations and maintenance event can be generated and passed to the operations management component, and an automated process can be triggered for emergency handling.
In summary, in view of the defects of the prior art, a cloud-based intelligent host monitoring system is particularly needed to remedy them.
Summary of the invention
The object of the present invention is to provide a cloud-based intelligent host monitoring system that makes host performance monitoring convenient and is highly automated.
The technical solution adopted by the present invention to solve its technical problem is as follows:
A cloud-based intelligent host monitoring system comprises a front-end presentation layer, a back-end service layer, a physical host resource collection module, a virtual resource migration monitoring module, and an analysis and linkage policy module; the data store of the system uses NoSQL;
The front-end presentation layer mainly comprises system management, policy configuration, and data viewing modules;
The back-end service layer comprises data collection, a data receiving queue module, analysis and linkage policy, event handling, and resource configuration management;
The physical host resource collection module collects monitoring data of the physical hosts under the cloud computing platform in Pull mode: the monitoring server actively queries, through an API interface, the monitoring data of the servers managed by the cloud control center, and the cloud control center collates the data and returns it to the monitoring server;
Virtual resource migration monitoring module: an application's demand for resources changes frequently, and the infrastructure must be able to adapt to changing resource demand within a very short time. This calls for a simple, automatic, configurable management approach that does not need excessive administrator intervention. Automatic virtual machine migration is required for distributed resource scheduling: it continuously optimizes the cloud computing platform by migrating virtual machines automatically among multiple physical servers and balancing their load. During migration, the business system on the virtual machine continues to serve requests normally, with no damage to data or to business continuity;
Analysis and linkage policy module: after a monitored device uploads monitoring data, the data is processed and stored in the NoSQL database; the analysis module then analyzes the data and triggers different handling processes according to the configured policies.
Further, data in the system flows upward from the bottom layer, is processed along the way, and is finally displayed by the front end. The bottom layer is the data collection layer, which comprises physical server data collection, resource dynamic-migration collection, and a network device data collection module. The physical server collection module is further divided into data collection for the cloud platform and data collection for traditional platforms, because collecting from physical servers under a cloud platform differs considerably from collecting from physical servers on a traditional platform. Under a cloud platform, a streamlined, efficient virtualization platform is installed on the bare metal of the physical server to achieve resource pooling. For the sake of the security and stability of the underlying system, commercial cloud platform providers do not recommend that users install third-party monitoring agents on the cloud's underlying system; they recommend obtaining the data from the cloud control center instead. Physical server data on a traditional platform is collected in Agent mode, which supports more monitoring indicators, such as process monitoring and application monitoring, with high data accuracy; an agentless mode can also be used, via SNMP, SSH, Syslog, and the like. Resource dynamic-migration collection mainly gathers the dynamic location information of resources under the cloud platform: cloud computing pools physical resources so that virtual machines can migrate dynamically among multiple physical hosts, and the location of each virtual machine must be known in real time and therefore monitored. Most collection on VMware and RHEV platforms is supported. Network device data is collected via SNMP;
After data collection is complete, the data must be uploaded to the server for processing. Because large volumes of data converge on the server, a data receiving queue module is abstracted to cope with the large number of I/O requests. Monitoring data passes through the receiving queue module, which delivers the raw data to the data storage module. The data storage module supports both relational and non-relational storage; here all monitoring data is stored in a non-relational database. The resource configuration management module is the core of the monitoring platform. In the monitoring platform, a physical device can be represented abstractly as one configuration item or a combination of several. A configuration item is designed with configuration item attributes, relationships between configuration items, key technical indicators, and a configuration view module. Configuration item attributes include, for example, model, number of rack units (U), memory size, and virtualization type, each with a specified value type (numeric, text, enumeration) and constraint rules. Relationships between configuration items include hosting, connection, use, ownership, and management, and can be extended dynamically at any time as the business requires. The key technical indicator library defines a series of key technical indicators, such as CPU utilization, number of concurrent database connections, and business transaction volume. The resource view library defines a series of resource views, such as machine room view, rack view, resource pool view, configuration item relationship view, and capacity view, and specifies the component relationships, display framework, and constraint rules of each resource view;
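One possible in-memory shape for a configuration item, with attributes and typed relationships to other configuration items, is sketched below. The class name, field names, and relationship identifiers are hypothetical; the source names the relationship kinds (hosting, connection, use, ownership, management) but does not define a data model.

```python
from dataclasses import dataclass, field

# Relationship kinds listed in the text, under assumed identifiers.
RELATION_TYPES = {"hosts", "connects_to", "uses", "belongs_to", "manages"}

@dataclass
class ConfigItem:
    """A configuration item: a named device with attributes and relations."""
    name: str
    template: str
    attributes: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)  # (relation_type, target name)

    def relate(self, relation_type, target):
        """Record a typed relationship from this item to another item."""
        if relation_type not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {relation_type}")
        self.relations.append((relation_type, target.name))

host = ConfigItem("esx-host-01", "X86 server",
                  {"rack_units": 2, "memory_gb": 256, "virtualization": "ESX"})
vm = ConfigItem("vm-01", "virtual machine")
host.relate("hosts", vm)
print(host.relations)
```

Validating the relationship kind against a set keeps the list extensible: supporting a new kind means adding one entry to `RELATION_TYPES`.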
Through analysis and linkage policy management, early warning and fault diagnosis with complex logic can be implemented. The linkage module includes a single-value judgment module, which judges a single key indicator (for example, memory usage greater than 80%), and consecutive-sample handling, which checks whether the value is above or below a threshold in N consecutive samples. Once a key indicator analysis policy and its threshold are configured, reaching the threshold triggers an event. An event handling module therefore creates monitoring events and provides several notification channels, including e-mail, SMS, and WeChat;
The top layer is the presentation layer, the interaction layer of the system. It comprises three major modules: system management, policy configuration, and data viewing. System management in turn includes submodules such as user management and permission management; policy configuration includes collection policies, monitoring policies, and so on. In the data viewing module, users can view the current state of each kind of device, triggered events, and historical information, and can generate an operating status report for a specified time period.
Further, the collection method of the virtual resource migration monitoring module requires the REST API interface address used for collection, the user name and password needed to authenticate to the interface, the SSL security certificate, and a setting for whether to ignore verification of the certificate. Each row in the returned result set is the currently collected monitoring information for one physical server: the first item is the configuration item name of the physical server in the CMDB, the second item is the CPU utilization, and the third item is the memory utilization;
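The three-item row layout described above can be parsed as in the following sketch. Only the order of the items (configuration item name, CPU utilization, memory utilization) comes from the text; the `HostSample` type, the function name, and the assumption that values may arrive as strings are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HostSample:
    """One row of the result set: one physical server's current reading."""
    ci_name: str          # configuration item name in the CMDB
    cpu_usage: float      # CPU utilization, percent
    memory_usage: float   # memory utilization, percent

def parse_result_set(rows):
    """Convert raw three-item rows into typed samples, one per server."""
    samples = []
    for row in rows:
        name, cpu, memory = row   # raises ValueError on a malformed row
        samples.append(HostSample(str(name), float(cpu), float(memory)))
    return samples

# Made-up rows standing in for a response from the cloud control center.
rows = [("esx-host-01", "35.5", "62.0"), ("esx-host-02", "12.0", "48.5")]
for sample in parse_result_set(rows):
    print(sample)
```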
In an enterprise private cloud, enabling the VMware vSphere DRS (Distributed Resource Scheduler) function balances load automatically across hosts, adjusts computing resources according to business priority, and powers off hosts under low load to reduce energy consumption. vMotion technology can be used when balancing load across hosts;
VMware vMotion is the live virtual machine migration function: a running virtual machine can be moved in its entirety from one physical server to another without downtime. The virtual machine retains its network identity and connections, which ensures a seamless migration. The virtual machine's active memory and precise execution state are transferred over a high-speed network, allowing the virtual machine to switch from running on the source vSphere host to running on the target vSphere host;
When a physical host fails, under VMware HA protection the virtual machines of that host are restarted on other hosts in the physical host cluster. To know which virtual machines are affected when a physical host fails, the vMotion information in the VMware environment must be monitored.
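Tracking migrations so that the virtual machines affected by a host failure can be listed might look like the sketch below. It is a stand-alone illustration: the class, its methods, and the host and VM names are hypothetical, and it does not call any VMware API.

```python
class PlacementTracker:
    """Keeps the latest known host of every virtual machine."""

    def __init__(self):
        self._host_of = {}   # VM name -> current physical host name

    def record_migration(self, vm, target_host):
        """Apply one observed migration (or initial placement) event."""
        self._host_of[vm] = target_host

    def vms_on(self, host):
        """Virtual machines that would be affected if this host failed."""
        return sorted(vm for vm, h in self._host_of.items() if h == host)

tracker = PlacementTracker()
tracker.record_migration("vm-a", "esx-host-01")
tracker.record_migration("vm-b", "esx-host-01")
tracker.record_migration("vm-a", "esx-host-02")   # vm-a moved away
print(tracker.vms_on("esx-host-01"))
```

Because each event overwrites the previous placement, the answer reflects where a VM currently runs rather than where it was first deployed.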
Further, the objects handled by the analysis and linkage engine of the analysis and linkage policy module include base indicators, such as the instantaneous CPU value, and aggregate indicators, such as the daily average CPU value. The analysis engine supports four judgment modes:
1) Single-value judgment mode, for example: device status != OK; daily average CPU load >= 45%;
2) Consecutive-sample probability judgment mode, for example: instantaneous CPU utilization >= 80% in 8 of 10 consecutive samples; application server thread count >= 25 in 3 consecutive samples;
3) Sampling-interval ratio judgment mode, for example: samples with storage utilization >= 60% account for >= 50% of the whole month's samples; samples with CPU utilization >= 80% account for >= 30% of the whole day's samples;
4) Advanced combination judgment mode, for example: tablespace status != Normal AND tablespace status != Backup; file system utilization >= 85% AND remaining space <= 10 GB;
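The four judgment modes can be written as small predicates, as sketched below. The function names and signatures are assumptions made for illustration; the thresholds in the usage lines are the examples from the list above, and the sample values are made up.

```python
def single_value(value, threshold):
    """Mode 1: a single reading compared with a threshold."""
    return value >= threshold

def k_of_last_n(samples, threshold, k, n):
    """Mode 2: at least k of the last n consecutive samples reach the threshold."""
    recent = samples[-n:]
    if len(recent) < n:
        return False   # not enough history to judge yet
    return sum(1 for s in recent if s >= threshold) >= k

def share_over(samples, threshold, min_share):
    """Mode 3: samples at or above the threshold make up at least min_share."""
    if not samples:
        return False
    return sum(1 for s in samples if s >= threshold) / len(samples) >= min_share

def combine_and(*conditions):
    """Mode 4: AND-combination of already evaluated conditions."""
    return all(conditions)

cpu = [85, 90, 82, 88, 91, 60, 95, 83, 70, 86]   # ten instantaneous readings
print(single_value(47.0, 45.0))                   # daily average CPU load >= 45%
print(k_of_last_n(cpu, 80, k=8, n=10))            # >= 80% in 8 of 10 samples
print(share_over(cpu, 80, 0.30))                  # >= 80% for >= 30% of samples
print(combine_and(87.0 >= 85.0, 8.0 <= 10.0))     # usage >= 85% AND free <= 10 GB
```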
The analysis engine supports three kinds of handling action:
First, create alarms of different levels; the alarm levels are Fatal, Error, Warning, and Info;
Second, create operations and maintenance process events by integrating with the operations and maintenance process management platform, so that operations and maintenance events and alarms form a closed loop;
Third, trigger automated operations by integrating with the automation management platform, so that automated operations and alarms form a closed loop; for example, when samples with memory utilization >= 85% account for >= 50% of all samples, memory is automatically expanded by 20%;
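The automated-operation example above (expand memory by 20% when readings of at least 85% make up at least half of all samples) reduces to a short calculation. The function below is a hypothetical sketch of that decision only; how the expansion is actually carried out by the automation management platform is not described in the source.

```python
def plan_memory_size(samples, current_gb, threshold=85.0, min_share=0.5, growth=0.20):
    """Return the memory size to provision after applying the example policy.

    samples: memory utilization readings in percent.
    """
    if not samples:
        return current_gb
    share = sum(1 for s in samples if s >= threshold) / len(samples)
    if share >= min_share:
        return current_gb * (1 + growth)   # policy met: expand by 20%
    return current_gb                      # policy not met: leave unchanged

# Three of four readings are at or above 85%, so the share is 0.75.
print(plan_memory_size([90.0, 88.0, 40.0, 95.0], current_gb=10.0))
# Only one of four readings qualifies, so nothing changes.
print(plan_memory_size([90.0, 20.0, 40.0, 30.0], current_gb=10.0))
```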
For the monitoring data of physical servers, the analysis and linkage policies are set as follows:
CPU utilization >= 50% and consecutive occurrences >= 3
Memory utilization >= 75% and consecutive occurrences >= 3
When the collected CPU data of a physical host is at or above 50% three consecutive times, an alarm is sent by SMS and e-mail and an event is generated in the monitoring platform to prompt the administrator to handle it. The analysis and linkage policy for memory is set here to three consecutive samples at or above 75%;
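The two policies above amount to counting consecutive samples at or above a threshold. A minimal sketch follows; the class name, the `notify` stand-in, and the sample values are assumptions, and real SMS or e-mail delivery is replaced by appending to a list.

```python
class ConsecutivePolicy:
    """Holds when a metric is at or above a threshold N times in a row."""

    def __init__(self, metric, threshold, times):
        self.metric = metric
        self.threshold = threshold
        self.times = times
        self._streak = 0   # current run of consecutive breaching samples

    def observe(self, value):
        """Feed one sample; return True while the policy condition holds."""
        self._streak = self._streak + 1 if value >= self.threshold else 0
        return self._streak >= self.times

def notify(channels, message, outbox):
    """Stand-in for SMS and e-mail delivery: records what would be sent."""
    for channel in channels:
        outbox.append((channel, message))

cpu_policy = ConsecutivePolicy("cpu", threshold=50.0, times=3)
outbox = []
for sample in [55.0, 60.0, 40.0, 51.0, 52.0, 53.0]:
    if cpu_policy.observe(sample):
        notify(["sms", "email"], f"cpu >= 50% three times in a row ({sample})", outbox)
print(outbox)
```

The reading of 40.0 resets the streak, so the alarm fires only after the final three samples.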
After the physical host data collection interface is defined, the host data collected by the interface program is defined as key technical indicators in the monitoring platform. For each indicator, the type and display content, data retention time, collection frequency, and chart display type can be defined;
In the collection policy configuration, the collection policy name is defined. The collection methods supported by the collection interface in the monitoring platform are: bat, HeartBeat, HTTP, Java, JDBC, JMX, Log, Mail, PING, RmoteSSH, SHELL, SnmpGet, SnmpTrap, SnmpWalk, Syslog, Telnet, VBS, Web Services, and wmic. The configuration also covers the capture type, the collection script, and the data processing script.
The advantage of the present invention is that monitoring data for physical servers under the cloud platform, on which cloud computing software is installed, is displayed in the front end as a chart: a red line represents the host's physical memory utilization and a blue line represents the host's CPU utilization. Queries can also be restricted by a start time and an end time. According to the applicant, the design is novel and has good prospects for market adoption.
Brief description of the drawings
The present invention is described in detail below with reference to the accompanying drawings and specific embodiments:
Fig. 1 is the business module diagram of the system proposed by the present invention;
Fig. 2 is the architecture diagram of physical server collection under the cloud computing platform of the present invention;
Detailed description of the embodiments
To make the technical means, inventive features, objects, and effects of the present invention easy to understand, the present invention is further described below with reference to the drawings and specific embodiments.
As shown in Fig. 1, a cloud-based intelligent host monitoring system comprises a front-end presentation layer, a back-end service layer, a physical host resource collection module, a virtual resource migration monitoring module, and an analysis and linkage policy module; the data store of the system uses NoSQL;
The front-end presentation layer mainly comprises system management, policy configuration, and data viewing modules;
The back-end service layer comprises data collection, a data receiving queue module, analysis and linkage policy, event handling, and resource configuration management;
Referring to Fig. 2, the physical host resource collection module collects monitoring data of the physical hosts under the cloud computing platform in Pull mode: the monitoring server actively queries, through an API interface, the monitoring data of the servers managed by the cloud control center, and the cloud control center collates the data and returns it to the monitoring server;
Virtual resource migration monitoring module: an application's demand for resources changes frequently, and the infrastructure must be able to adapt to changing resource demand within a very short time. This calls for a simple, automatic, configurable management approach that does not need excessive administrator intervention. Automatic virtual machine migration is required for distributed resource scheduling: it continuously optimizes the cloud computing platform by migrating virtual machines automatically among multiple physical servers and balancing their load. During migration, the business system on the virtual machine continues to serve requests normally, with no damage to data or to business continuity;
Analysis and linkage policy module: after a monitored device uploads monitoring data, the data is processed and stored in the NoSQL database; the analysis module then analyzes the data and triggers different handling processes according to the configured policies.
Further, data in the system flows upward from the bottom layer, is processed along the way, and is finally displayed by the front end. The bottom layer is the data collection layer, which comprises physical server data collection, resource dynamic-migration collection, and a network device data collection module. The physical server collection module is further divided into data collection for the cloud platform and data collection for traditional platforms, because collecting from physical servers under a cloud platform differs considerably from collecting from physical servers on a traditional platform. Under a cloud platform, a streamlined, efficient virtualization platform is installed on the bare metal of the physical server to achieve resource pooling. For the sake of the security and stability of the underlying system, commercial cloud platform providers do not recommend that users install third-party monitoring agents on the cloud's underlying system; they recommend obtaining the data from the cloud control center instead. Physical server data on a traditional platform is collected in Agent mode, which supports more monitoring indicators, such as process monitoring and application monitoring, with high data accuracy; an agentless mode can also be used, via SNMP, SSH, Syslog, and the like. Resource dynamic-migration collection mainly gathers the dynamic location information of resources under the cloud platform: cloud computing pools physical resources so that virtual machines can migrate dynamically among multiple physical hosts, and the location of each virtual machine must be known in real time and therefore monitored. Most collection on VMware and RHEV platforms is supported. Network device data is collected via SNMP;
After data collection is complete, the data must be uploaded to the server for processing. Because large volumes of data converge on the server, a data receiving queue module is abstracted to cope with the large number of I/O requests. Monitoring data passes through the receiving queue module, which delivers the raw data to the data storage module. The data storage module supports both relational and non-relational storage; here all monitoring data is stored in a non-relational database. The resource configuration management module is the core of the monitoring platform. In the monitoring platform, a physical device can be represented abstractly as one configuration item or a combination of several. A configuration item is designed with configuration item attributes, relationships between configuration items, key technical indicators, and a configuration view module. Configuration item attributes include, for example, model, number of rack units (U), memory size, and virtualization type, each with a specified value type (numeric, text, enumeration) and constraint rules. Relationships between configuration items include hosting, connection, use, ownership, and management, and can be extended dynamically at any time as the business requires. The key technical indicator library defines a series of key technical indicators, such as CPU utilization, number of concurrent database connections, and business transaction volume. The resource view library defines a series of resource views, such as machine room view, rack view, resource pool view, configuration item relationship view, and capacity view, and specifies the component relationships, display framework, and constraint rules of each resource view;
Through analysis and linkage policy management, early warning and fault diagnosis with complex logic can be implemented. The linkage module includes a single-value judgment module, which judges a single key indicator (for example, memory usage greater than 80%), and consecutive-sample handling, which checks whether the value is above or below a threshold in N consecutive samples. Once a key indicator analysis policy and its threshold are configured, reaching the threshold triggers an event. An event handling module therefore creates monitoring events and provides several notification channels, including e-mail, SMS, and WeChat;
The top layer is the presentation layer, the interaction layer of the system. It comprises three major modules: system management, policy configuration, and data viewing. System management in turn includes submodules such as user management and permission management; policy configuration includes collection policies, monitoring policies, and so on. In the data viewing module, users can view the current state of each kind of device, triggered events, and historical information, and can generate an operating status report for a specified time period.
In traditional monitoring systems, two modes are distinguished by how the monitored device and the monitoring server communicate: Push mode and Pull mode. In Push mode, the monitored device actively sends monitoring data to the monitoring server, so this mode is also called active monitoring. In Pull mode, the monitoring server sends an information query request to the monitored device, and the monitored device then collects the data and sends it to the monitoring server, so this mode is called passive monitoring. Each mode has advantages and disadvantages: Push mode offers good real-time behavior and strong scalability, but consumes more resources and is more complex; Pull mode offers poorer real-time behavior but consumes fewer resources.
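The difference between the two modes lies in which side initiates the transfer. The sketch below models both against an in-memory monitoring server; every class and method name is hypothetical, and the stub control center merely returns canned rows instead of querying a real platform.

```python
class MonitoringServer:
    """Collects samples either pushed by devices or pulled from a source."""

    def __init__(self):
        self.samples = []

    def receive(self, sample):
        """Push mode: the monitored device calls this on its own schedule."""
        self.samples.append(sample)

    def pull(self, source):
        """Pull mode: the server sends the query and stores the reply."""
        self.samples.extend(source.query())

class StubControlCenter:
    """Stand-in for a cloud control center that answers queries."""

    def __init__(self, rows):
        self._rows = list(rows)

    def query(self):
        return list(self._rows)

server = MonitoringServer()
# Push: an agent on a traditional host reports on its own initiative.
server.receive(("legacy-host-01", 42.0, 63.0))
# Pull: the server asks once and gets rows for several hosts back.
server.pull(StubControlCenter([("esx-host-01", 35.5, 62.0),
                               ("esx-host-02", 12.0, 48.5)]))
print(len(server.samples))
```

In pull mode a single query can return data for many hosts at once, which fits the case where data is obtained from a cloud control center rather than from agents on each host.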
Traditional physical server resource monitoring mostly relies on deployed agents: a monitoring agent collects information from the monitored device and passes it to the monitoring server, and in this traditional mode the information obtained is for a single physical server. For physical servers under a cloud computing platform, however, the traditional agent mode cannot be used, because it would reduce the stability and security of the infrastructure layer. Instead, we connect to the cloud control center (here VMware vCenter or RHEV-M) through a Web Services interface and obtain the real-time CPU and memory consumption data of the physical servers. We design a unified physical server resource collection module for cloud platforms that collects the resource information of multiple physical servers at once and returns it to the monitoring platform in a unified format. Physical server resource collection under the VMware and Red Hat enterprise virtualization platforms is introduced below.
A cloud computing platform usually contains hardware and software of many types, structures, and brands. The monitoring platform must therefore support this diversity, be compatible with a wide range of hardware and software, and be able to monitor heterogeneous hardware devices and software at the same time. For this reason the modules need to be designed abstractly.
The physical server monitoring collection index we abstract for the cloud computing platform is as follows:
Table 1: Collection index
Collection index name: | Physical server resource utilization |
Configuration item template: | { x86 server } |
Configuration item attribute: | { OS type } = ESX/RHEV-H |
Collection entry: | { cloud control centre } -> { management network IP address } |
Collection interface type: | [REST API] |
Collection frequency: | Every 5 minutes (dynamically configurable) |
Collection method description: we need to provide the REST API address used during collection, the username and password required for interface authentication, the SSL certificate, and whether to ignore certificate verification. Each row of the returned result set is the currently obtained monitoring information of one physical server: the first item is the configuration item name of the physical server in the CMDB, the second is the CPU usage, and the third is the memory usage, as shown in the following table:
Table 2: Collection method information
After the basic data is collected, it is processed and stored in a NoSQL database. The advantage of the NoSQL database used here is that, for large volumes of data, its query speed is faster than that of a traditional database.
We first look at how the resource monitoring platform obtains the raw data of physical servers through the VMware Web Services interface. We first import the Java network classes needed for development and the Java SDK classes provided by VMware. The java.net and java.rmi packages are used for network access, since we connect to the Web Services interface and handle the related exceptions; com.vmware.vim25 is VMware's official vSphere development kit.
After importing the Java network packages and the VMware packages, we define the Java class and declare its member variables.
We define the string variable url for the Web Services access URL, the string variable userName for the user name required by Web Services authentication, the string variable password for the password required by authentication, and the boolean variable help, which indicates whether to show the usage prompt and defaults to false.
Next we define the parameter checking function getConnectionParameters, which verifies whether the parameters supplied on invocation meet the requirements and prints the usage prompt if they do not. The expected input parameters are --URL url, --USERNAME username, and --PASSWORD passwd. We first define an integer variable ai initialized to 0, a string variable param that records the parameter name, and a string variable val that records the value set for that parameter. While ai is less than the length of the argument array, we take the element at position ai, trim whitespace, and use it as param; if ai+1 is less than the length of the argument array, we take the following element as val. We then call equalsIgnoreCase, the case-insensitive comparison function. If param equals the string "--help" ignoring case, we set help to true and exit the while loop. After the loop exits, the check that the input parameters are not empty runs; at this point they are of course empty, so the usage prompt is printed. If param is not "--help", we test again: if param equals "--URL" ignoring case, and val does not begin with "--" and is not empty, we set url to val. If the condition is not met we continue testing in the same way as for url, with the string replaced by "--USERNAME" and "--PASSWORD". After one round of comparison we reset val to empty, add 2 to ai, and proceed to the next round until all parameters are handled. Once all input parameters have been processed, the check passes if url, username, and password are all non-empty; otherwise the usage prompt is printed.
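The parameter check described above can be sketched as follows. This is a minimal Python illustration of the described logic, not the Java code of the embodiment; the dictionary return value and the use of None to mean "print the usage prompt" are assumptions made for the sketch.

```python
# Hypothetical mapping from command-line flags to parameter names.
FLAGS = {"--url": "url", "--username": "username", "--password": "password"}


def get_connection_parameters(args):
    """Return a dict of url/username/password, or None when usage should be shown."""
    params = {"url": None, "username": None, "password": None}
    ai = 0
    while ai < len(args):
        param = args[ai].strip()
        # The value is the element following the flag, if there is one.
        val = args[ai + 1].strip() if ai + 1 < len(args) else ""
        if param.lower() == "--help":
            return None  # caller prints the usage prompt
        key = FLAGS.get(param.lower())
        # Accept the value only if it is non-empty and is not itself a flag.
        if key and val and not val.startswith("--"):
            params[key] = val
        ai += 2
    # All three parameters must be present for the check to pass.
    return params if all(params.values()) else None
```

Flags are compared case-insensitively, as with equalsIgnoreCase in the description, so "--URL" and "--url" are treated alike.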
In the host information function, we first establish a connection to VMware Web Services and assign the connection to the variable si; the url, username, and password obtained above are passed to the ServiceInstance constructor. Through si we obtain the root management object in VMware and assign it to the variable rootFolder. Starting from rootFolder, we traverse and search for "HostSystem" managed objects and assign them to the host collection variable host_views.
We traverse the host_views collection; each object in host_views maps to one physical server. Using a for loop, we output in turn the configuration item name, current CPU usage, and memory usage of each physical server, in the format agreed in the detailed design.
When obtaining host CPU information, we cannot obtain the host's total CPU frequency (Hz) directly; instead we obtain the frequency of a single CPU and the number of logical CPUs and multiply them. The frequency currently consumed by the CPU can be obtained directly, so the CPU usage is: frequency currently consumed / (single-CPU frequency × number of logical CPUs). When obtaining the host's total memory capacity, the unit returned is bytes, whereas the current consumption is returned in megabytes, so the total capacity must be divided by 1024 twice (total / 1024 / 1024) to obtain megabytes before the usage is calculated. At the end of the function we release the Web Services connection.
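The two calculations can be written out as a short sketch. The function and parameter names are hypothetical, and the sketch assumes that the consumed CPU frequency and the single-CPU frequency are expressed in the same unit.

```python
def cpu_usage_percent(used_freq, single_cpu_freq, logical_cpus):
    """CPU usage = consumed frequency / (single-CPU frequency x logical CPU count)."""
    total_freq = single_cpu_freq * logical_cpus
    return 100.0 * used_freq / total_freq


def memory_usage_percent(used_mb, total_bytes):
    """Total capacity arrives in bytes, consumption in MB, so convert the total first."""
    total_mb = total_bytes / 1024 / 1024
    return 100.0 * used_mb / total_mb
```

For instance, a host with 8 logical CPUs at 2000 MHz that is consuming 4000 MHz is at 25% CPU usage.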
Next we define the main function so that the monitoring platform can call the program. We first use getConnectionParameters to check whether the parameters are correct; if they are, we call getHostInfo to obtain the physical server host information. If a parameter error occurs, we print the usage instructions; if any other exception occurs, the exception stack trace is printed directly.
The collection script brings back the raw data, and the data processing script then packages the raw data into data that the monitoring platform can accept and stores it in the database.
In the raw data, the first column is the host name, the second column is the CPU usage, and the third column is the memory usage. Each row of data is the information of one VMware host.
In the RHEV environment, RHEV-H (Hypervisor) is installed on the physical hosts and RHEV-M (Manager) is set up in a virtual machine, so the traditional monitoring agent approach cannot be used for physical host resource monitoring.
We use Python to obtain the resource usage of physical hosts by connecting to the RHEV-M REST API. A certificate is required to connect to RHEV-M; we save the RHEV-M certificate locally with the following command and name it rhevm.cer.
To obtain the resource information of physical hosts in the RHEV environment, we use ovirtsdk, the Python API interface library. The first line of the code declares that it is Python code, after which we import the Python classes the program needs.
We call the API to connect to the REST interface, specifying parameters such as url, username, password, insecure, and ca_file; ca_file refers to the certificate required for the connection, and the exact location of the certificate must be given.
We traverse the collection of physical hosts and output the host name, CPU usage, and memory usage. The CPU usage is obtained by adding the user resource consumption to the system's own consumption; the memory usage is obtained by dividing the amount of memory currently in use by the system's total memory. After the output is handled, we disconnect the REST connection to release the program's connection resources.
For exception handling, the program body is wrapped in a try block: when the code in the try block runs normally, the normal result set is output; when it raises an exception, the exception is caught and its cause is printed.
To support the high availability and flexibility of the cloud computing platform and to guarantee the SLA, business systems in the cloud platform must be able to migrate freely and dynamically between physical hosts. For example, if a hardware alarm is found on the physical host running an important business system, the virtual machines of that business system need to be migrated to other healthy physical hosts to keep the business system continuously available.
Applications' demand for resources changes frequently, and the infrastructure must be able to adapt to changes in resource demand within a very short time. This calls for a simple, automatic, configurable management approach that does not require excessive administrator intervention. Automatic virtual machine migration (that is, distributed resource scheduling) is exactly what is needed: it continuously optimizes the cloud computing platform by automatically migrating virtual machines between multiple physical servers and balancing the load across them.
While a virtual machine is migrating, its business system continues to provide service normally, without damaging any data or business continuity.
The virtual resource migration monitoring collection index we abstract for the cloud computing platform is as follows:
Table 3: Collection index information
Collection index name: | x86 virtual machine migration events |
Configuration item template: | { operating system } |
Configuration item attribute: | { OS type } = X86 |
Collection entry: | { cloud control centre } -> { management network IP address } |
Collection interface type: | [REST API] |
Collection frequency: | Every 10 minutes (dynamically configurable) |
Collection method description: we need to provide the REST API address used during collection, the username and password required for interface authentication, the SSL certificate, and whether to ignore certificate verification. Each row of the returned result set is the currently obtained monitoring information of one physical server: the first item is the configuration item name of the physical server in the CMDB, the second is the CPU usage, and the third is the memory usage, as shown in the following table:
Table 4: Collection method information
Returned result set sample:
Table 5: Returned result sample
In an enterprise private cloud, enabling the VMware vSphere DRS (Distributed Resource Scheduler) function balances load automatically across hosts and adjusts computing resources according to business priority; hosts are powered off under low load to reduce energy consumption. vMotion technology can be used when balancing host load.
VMware vMotion is the live virtual machine migration function. It moves an entire running virtual machine from one physical server to another without downtime. The virtual machine retains its network identity and connections, which ensures a seamless migration process. The virtual machine's active memory and precise execution state are transferred over a high-speed network, allowing the virtual machine to switch from running on the source vSphere host to running on the target vSphere host.
When a physical host fails, under VMware HA protection the virtual machines of that host are restarted on other hosts in the physical host cluster. To know which virtual machines are affected when a physical host fails, the vMotion information in the VMware environment must be monitored.
vMotion events in the VMware environment are obtained through the VMware Web Services interface.
We define the GetVmotionEvents class and set its private variables: url is the address of the VMware vCenter API we connect to, userName is the authenticating user, password is that user's password, and si is the Web Services connection ServiceInstance.
We define the getConnectionParameters function to check whether the input parameters are valid. It checks the number of parameters and whether the parameter types match the defined types, and prints the usage prompt if they do not, as described earlier.
In the main function, before connecting to the API, we call getConnectionParameters(args) to check whether the input parameters are valid; once they pass, we create the ServiceInstance si.
We create an EventManager object. Because EventManager contains all events in the system, we need to set a filter to obtain the vMotion events. EventManager returns all historical events in vCenter, whereas the only event classes that indicate resource migration changes are "VmRelocatedEvent", "VmMigratedEvent", "DrsVmMigratedEvent", "DrsVmPoweredOnEvent", and "VmPoweredOnEvent". By default the full event history is returned, but since we collect on a schedule of once every 10 minutes, each run only needs the events of the preceding 10 minutes. We also set the event initiator to the system itself and the administrator.
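The effect of this filter can be illustrated with a small Python sketch that works on plain dictionaries rather than on the vSphere SDK objects. The dictionary keys "type" and "time" are assumptions made for the illustration, and the initiator restriction is not modelled here.

```python
from datetime import datetime, timedelta

# Event classes named in the description as indicating a resource migration change.
MIGRATION_EVENT_TYPES = {
    "VmRelocatedEvent",
    "VmMigratedEvent",
    "DrsVmMigratedEvent",
    "DrsVmPoweredOnEvent",
    "VmPoweredOnEvent",
}


def select_recent_migrations(events, now, window_minutes=10):
    """Keep only migration-type events that occurred within the last window."""
    start = now - timedelta(minutes=window_minutes)
    return [
        e for e in events
        if e["type"] in MIGRATION_EVENT_TYPES and start <= e["time"] <= now
    ]
```

Keeping the window equal to the collection interval means that each scheduled run picks up the events since the previous run.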
After setting the event filter conditions, we query to obtain the event collection events and use a for loop to traverse the events that meet the filter conditions. Each event is processed by calling printEvent, which outputs it. After the for loop comes the exception handling: a parameter exception prints the usage instructions, and any other exception prints the exception stack trace. At the end of the program we close the Web Services connection to release resources.
The name of a host is found from the OID of the physical host object instance in the VMware system. This name is consistent with the device name in the CMDB, which lets us know between which hosts a virtual machine was migrated by vMotion. Through the service instance si we obtain all host system objects hostsystems, then traverse hostsystems to find the host whose OID equals the given OID and return that host's custom information.
In the event output function, we first check whether the physical host information of the event is empty. If it is not empty, we check whether the evt object is of class "com.vmware.vim25.VmRelocatedEvent"; if it is, we pass the host OID in evt to the function findHostAnnonByOid to obtain the configuration item code of the physical host in the monitoring platform. If evt is not of class "com.vmware.vim25.VmRelocatedEvent", we set the host information to empty, represented by the string null, so that the output follows the format agreed in chapter 4. Next we output which physical server the virtual machine in evt migrated to: again we pass the OID of the destination host in evt to findHostAnnonByOid to obtain the configuration item code of the destination server. Next we output the storage migration information: we first check whether evt is "com.vmware.vim25.VmRelocatedEvent"; if so, we output the source storage and the target storage, otherwise we output "null null". Finally we output the time at which the event occurred. Whenever a piece of information cannot be obtained it is replaced with null, to conform to the design agreed in chapter 4; formatting the output in this way makes it convenient for subsequent interfaces to process the collected information.
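The null-substitution rule can be sketched as a small formatting helper. The function name is hypothetical, and the space-separated layout is an assumption inferred from the "null null" output mentioned above.

```python
def format_migration_line(vm, source_host, target_host, source_ds, target_ds, event_time):
    """Join the six fields with spaces, writing 'null' for anything not obtained."""
    fields = [vm, source_host, target_host, source_ds, target_ds, event_time]
    return " ".join("null" if f is None or f == "" else str(f) for f in fields)
```

A fixed number of fields per line lets the downstream data processing script split each row without special cases.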
Each row represents one migration event. The first item of each row is the configuration item code of the virtual machine in the monitoring platform, the second is the source physical server code, the third is the code of the destination server, the fourth is the source storage code, the fifth is the target storage code, and the sixth is the migration time.
Next we look at how the migration information of virtual machines in the RHEV environment is obtained. ovirtsdk is the Python API interface library. The first line of the code declares that it is Python code, after which we import the Python classes the program needs, mainly ovirtsdk and the time handling class datetime.
We call the API to connect to the REST interface, specifying parameters such as url, username, password, insecure, and ca_file; ca_file refers to the certificate required for the connection, and the exact location of the certificate must be given.
The program runs a query once every 10 minutes to find the virtual machine migrations of the last 10 minutes. We first obtain the current time now, then define the time offset aDay, combine the offset with the current time to get the query time qTime, and then format qTime to obtain rTmie, which matches the format required by the query condition; rTmie is passed to the query function as a parameter.
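The time calculation can be sketched with the standard datetime module. The variable names follow the description; the output format string is an assumption, since the exact format expected by the query is not given here.

```python
from datetime import datetime, timedelta


def query_start_time(now, minutes=10):
    """Return the formatted start of the query window ending at `now`."""
    a_day = timedelta(minutes=minutes)  # the offset called aDay in the description
    q_time = now - a_day                # qTime: the earliest time to query
    # Assumed format; adjust it to whatever the query condition requires.
    return q_time.strftime("%m/%d/%Y %H:%M:%S")
```

Passing the current time in as an argument, rather than reading the clock inside the function, keeps the calculation easy to check.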
We define the event_list collection to hold the event content that meets our needs. We first query events from the last 10 minutes whose code equals 32 and whose content contains "started". The event description is split on spaces and recombined into the formatted information we need: vm:{VmName},fromHost:null,toHost:{HostName},formDS:null,toDS:null,eTime:{time}. This query condition returns information about virtual machines being cold-started. A virtual machine that is not powered on is not hosted on any physical host; it is only hosted on a particular physical host once it is started. We need to record where each started virtual machine is hosted so that the scope of impact can be confirmed when a physical host fails.
The next query condition is events from the last 10 minutes whose code equals 506 and whose content contains "restarted". Recording virtual machine restart events is also necessary, because a virtual machine is sometimes restarted precisely because of a physical host failure, in which case its host changes; the record format remains the same.
The next query condition is events from the last 10 minutes whose code equals 63 and whose content contains "Migration". These events record virtual machine migrations. The record format is slightly different: it records which physical host the virtual machine was hosted on and which target physical host it migrated to.
We record all three kinds of event information in event_list and, for ease of processing, sort the three kinds of events by time and output them in the agreed format after sorting.
After the output we disconnect. The end of the program is the exception handling: when the preceding code raises an exception, it is caught and the exception description is output.
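Merging and sorting the three kinds of events can be sketched as below. The record layout is the one quoted in the description (including its formDS key); the input dictionaries and their keys are hypothetical, and event times are assumed to be strings that sort chronologically.

```python
RECORD = "vm:{vm},fromHost:{from_host},toHost:{to_host},formDS:null,toDS:null,eTime:{time}"


def merge_and_format(started, restarted, migrated):
    """Combine the three event lists, sort by time, and render each record."""
    event_list = list(started) + list(restarted) + list(migrated)
    event_list.sort(key=lambda e: e["time"])
    return [
        RECORD.format(
            vm=e["vm"],
            # Start and restart events have no source host, so write null.
            from_host=e.get("from_host") or "null",
            to_host=e["to_host"],
            time=e["time"],
        )
        for e in event_list
    ]
```

Sorting by time before output means the last record for a virtual machine always reflects its current host.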
Running the above program yields the raw virtual machine migration data. The first item is the virtual machine, the second is the source physical host, the third is the target physical host, the fourth is the source storage (null if no storage migration occurred), the fifth is the target storage (null if no storage migration occurred), and the sixth is the migration time.
The objects handled by the analysis engine of the analysis linkage strategy module include basic indicators, such as the CPU instantaneous value, and aggregated indicators, such as the CPU daily average. The analysis engine supports 4 modes:
1) Single-value judgment mode, for example: device status != OK; CPU daily average load >= 45%;
2) Consecutive sampling point probability judgment mode, for example: CPU instantaneous usage >= 80% in 8 of 10 consecutive samples; application server thread count >= 25 in 3 consecutive samples;
3) Sampling interval ratio judgment mode, for example: samples with storage usage >= 60% account for >= 50% of the whole month's samples; samples with CPU usage >= 80% account for >= 30% of the whole day's samples;
4) Advanced combined judgment mode, for example: table space status != Normal AND table space status != Backup; file system usage >= 85% AND remaining space <= 10G;
The analysis engine supports three kinds of processing actions:
First, creating alarms of different levels; the alarm levels are Fatal, Error, Warning, and Info;
Second, creating O&M process events by interfacing with the O&M process management platform, so that O&M events and alarms form a closed loop;
Third, triggering automated operations by interfacing with the automation management platform, so that automated operations and alarms form a closed loop; for example, when samples with memory usage >= 85% account for >= 50% of all samples, memory is automatically expanded by 20%;
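Judgment modes 2) and 3) can be illustrated with two small Python predicates. This is only a sketch of how such rules might be evaluated over a list of samples; the function names and the list-based representation are assumptions, not the analysis engine's actual interface.

```python
def hits_in_window(samples, threshold, window, min_hits):
    """Mode 2: at least min_hits of the last `window` samples reach the threshold."""
    recent = samples[-window:]
    if len(recent) < window:
        return False  # not enough consecutive samples collected yet
    return sum(s >= threshold for s in recent) >= min_hits


def ratio_over_interval(samples, threshold, min_ratio):
    """Mode 3: the share of samples reaching the threshold is at least min_ratio."""
    if not samples:
        return False
    return sum(s >= threshold for s in samples) / len(samples) >= min_ratio
```

The example "CPU instantaneous usage >= 80% in 8 of 10 consecutive samples" corresponds to hits_in_window(cpu_samples, 80, 10, 8).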
For the monitoring data of physical servers, the analysis linkage strategy is set as follows:
CPU usage >= 50% and consecutive count >= 3
Memory usage >= 75% and consecutive count >= 3
When the collected CPU data of a physical host exceeds 50% three times in a row, an alarm is sent by SMS and email and an event is generated in the monitoring platform to remind the administrator to handle it. The analysis linkage strategy for memory is set here to memory exceeding 75% three times in a row;
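The consecutive-count strategy for physical servers can be sketched as follows. The function names are hypothetical; the thresholds are the ones listed above, and the alarm itself (SMS, email, platform event) is left to the caller.

```python
def consecutive_breach(samples, threshold, times=3):
    """True when the most recent `times` samples all reach the threshold."""
    recent = samples[-times:]
    return len(recent) == times and all(s >= threshold for s in recent)


def check_host(cpu_samples, mem_samples):
    """Return the names of the indicators that should raise an alarm."""
    alerts = []
    if consecutive_breach(cpu_samples, 50):
        alerts.append("CPU")
    if consecutive_breach(mem_samples, 75):
        alerts.append("memory")
    return alerts
```

Requiring three consecutive samples avoids alarming on a single short spike.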
After the physical host data collection interface is in place, we define in the monitoring platform the technical indicators for the host data collected by the interface program; the indicator type and display content, data retention time, collection frequency, and chart display type can all be defined.
In the collection strategy configuration, we define the collection strategy name, the collection mode (the collection interfaces supported by the monitoring platform are: bat, HeartBeat, HTTP, Java, JDBC, JMX, Log, Mail, PING, RmoteSSH, SHELL, SnmpGet, SnmpTrap, SnmpWalk, Syslog, Telnet, VBS, Web Services, wmic), the capture type, the collection script, and the data processing script.
The collection script brings back the raw data, and the data processing script then packages the raw data into data that the monitoring platform can accept and stores it in the database.
The data processing script is written in Groovy. Each row of the raw data is split on spaces to obtain 3 strings; the three strings form a Map object, and the Map objects are stored in a List object. The monitoring platform then processes the List object and stores the data in the database.
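The row-to-Map processing can be sketched as below, written here in Python rather than Groovy purely for illustration. The key names are assumptions, as is the conversion of the two usage values to numbers; the description only states that three strings form a Map.

```python
def process_raw_data(raw_text):
    """Split each raw row on whitespace and collect one dict per host."""
    rows = []
    for line in raw_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed rows
        name, cpu, mem = parts
        rows.append({"host": name, "cpu_usage": float(cpu), "memory_usage": float(mem)})
    return rows
```

The resulting list of dictionaries plays the role of the List of Map objects handed to the monitoring platform.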
For the data collection strategy, we set in the collection strategy of the x86 virtualization controller whether each collection indicator is enabled, its collection period, and so on.
In the collection strategy we define the collection of technical indicators. From the basic data collected for these indicators we can derive other data by processing, such as the daily peak, daily average, and weekly average.
Once the technical indicator data is collected, we have both basic data and aggregated data. For these data the monitoring platform is configured with automated analysis linkage strategies, for example an alarm when CPU usage exceeds 50% three times in a row, or an alarm when physical memory usage exceeds 75% three times in a row.
For virtual resource migration information, the mere fact that data has been collected means that a virtual machine has been migrated, and the resource ownership information in the monitoring platform is then no longer consistent with reality. As the table above shows, virtual machine PVVMDC0013 was previously hosted on physical host B05XDL580D and has migrated to B05XDL580A. If B05XDL580D then suffers a failure such as an unexpected power loss, the affected virtual machines across the platform should correctly no longer include PVVMDC0013. Therefore, whenever virtual machine resource migration information is collected, we create a warning message in the monitoring platform and adjust the virtual machine resource ownership through an automatic program.
The analysis linkage strategy is therefore set as follows:
Virtual resource migration data != empty
We define the vMotion technical indicator in the monitoring platform. The indicator attributes include the indicator's code, name, collection frequency, data retention time, and so on. The key point is to specify the indicator type as List and define the List's related information; after the indicator collects data, the indicator post-processor persists the data to the database. Here we mainly show the data collection configuration under VMware (the difference under the RHEV environment is mainly the collection mode; the raw data format collected in the two environments is the same).
We specify the collection interface as JAVA and the capture type as one-to-one. The collection script is as follows (the jar package must be uploaded to the monitoring server); the user name, password, and url in the collection script are obtained from the platform.
GetVmotionEvents.jar is the Java code we wrote; $$ip$$ is the IP address of the cloud control centre obtained from the monitoring platform, $$username$$ is the user name required to connect to the cloud control centre, and $$password$$ is the access password configured for the cloud control centre. These data are recorded in the monitoring platform and can be obtained from the page through parameterization.
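The parameterization can be pictured as a simple placeholder substitution. The sketch below is an illustration only: the helper name, the template string used in the example, and the values are all hypothetical, and the platform's own substitution mechanism is not described here.

```python
import re


def fill_placeholders(template, values):
    """Replace each $$name$$ placeholder with the matching entry in `values`."""
    def repl(match):
        # A placeholder with no stored value raises KeyError rather than passing silently.
        return str(values[match.group(1)])
    return re.sub(r"\$\$(\w+)\$\$", repl, template)
```

For example, a template containing $$ip$$, $$username$$, and $$password$$ would be filled from the values recorded for the cloud control centre before the collection script is run.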
The data processing script processes the raw data from the collection script: we create a List object, store the processed data from the collection script in it, and hand it to the back end to be stored in the database.
Parameter mapping is how we obtain the url, user, and password information stored in advance in the monitoring platform.
Below, taking a configuration item as an example, we describe how an x86 server is recorded and described in the monitoring platform under the private cloud resource monitoring platform.
T_CI is the core configuration item table. In the monitoring platform, every monitored and managed object is a configuration item, and configuration items have various templates (T_CI_TEMPLATE) to accommodate the various devices in a data centre, such as servers, switches, racks, storage devices, and firewalls. The configuration item table structure is as follows:
Table 6: T_CI (configuration item) table structure
The configuration item table above records the configuration item number, configuration item code, configuration item name, template number, creation time, creator, latest maintenance time, latest maintainer, remarks, and configuration item state. The configuration item number is the primary key and a foreign key and cannot be empty. The configuration item code consists of English letters and digits arranged according to the rules of the platform design; the configuration item name is described in Chinese; the template number is designed as a foreign key; and the configuration item state records whether the configuration item is currently available or disabled.
For example:
In the first record, the first item is the serial number 3538; the second is the alphanumeric code A06BR720_A; the third is the Chinese description, x86 server (A06BR720_A); the fourth is the template number 139; the fifth is the creation time 2012-11-02 13:51:56; the sixth is the creator code 179674; the seventh is the latest modification time 2013-08-12 10:22:29; the eighth is the modifier code 180140; and the ninth is 1, meaning the item is currently available. Below we use this record as an example to show how the relevant attributes of an x86 server are presented.
In the cloud centre, all devices are represented as configuration items, and each device is an instance of a defined configuration item template. The configuration item template table structure is given below.
Table 7: T_CI_TEMPLATE (configuration item template) table structure:
The configuration item template contains the configuration item number, template name, creator, creation time, label, and template type; the primary key and foreign key are the template number. The actual record is as follows:
The configuration item template number is 139 and its template name is x86 server.
Next is the configuration item template attribute relation table, through which a configuration item template is linked to its attributes. The table contains the template number, template attribute number, and sort flag; the foreign keys are the template number and the template attribute number.
Table 8: T_CI_TEMPLATE_PROPERTY_REL (configuration item template attribute relation) table structure
The x86 server template has a relatively large number of attributes. To see exactly which attributes these are, we use the table below.
The configuration item template attribute table has the fields template attribute number, template attribute name, label, type (default 0), template attribute value, group, number, remarks, data type, and unit. The primary key and foreign key are the template attribute number.
Table 9: T_CI_TEMPLATE_PROPERTY (configuration item template attribute) table structure
The attributes of the configuration item template numbered 139 include manufacturer, model, firmware, CPU frequency, CPU model, flash card type, and so on.
The monitoring platform displays the overall resource information of the cloud centre. It shows the current usage and total amount of computing capacity in the x86 resource pool, expressed in XCU, which we define as: total number of physical CPUs × number of cores × clock frequency (GHz). It also shows the current minicomputer resources, NAS and SAN storage resources, and the current usage and total amount of the load balancing resource pool. Green indicates sufficient resources, yellow indicates that resources are tight, and red indicates that resources are scarce. When a resource turns yellow, the administrator is prompted to consider purchasing additional resources; when it turns red, a warning is issued to expand the resource immediately.
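The XCU definition can be written out as a short calculation. The function names and the per-host dictionary keys are hypothetical; summing per-host values to obtain a pool total is an assumption made for the illustration.

```python
def xcu(physical_cpus, cores_per_cpu, ghz):
    """XCU = number of physical CPUs x cores per CPU x clock frequency in GHz."""
    return physical_cpus * cores_per_cpu * ghz


def pool_xcu(hosts):
    """Total XCU of a resource pool, summed over its hosts."""
    return sum(xcu(h["cpus"], h["cores"], h["ghz"]) for h in hosts)
```

A host with 2 physical CPUs of 8 cores each at 2.5 GHz contributes 40 XCU to the pool.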
Under the cloud platform, the physical server monitoring data is displayed graphically in the front end. The physical hosts run the cloud computing software (Hypervisor); the red line represents the physical memory usage of the host and the blue line represents its CPU usage. A start time and end time can also be set for queries.
By default the data of the most recent 105 minutes is shown; the start time and end time of the query can be set through the query conditions to display the chart for the corresponding period.
For the virtual machine migration collection results, a start time and end time can be defined to query the collected results. The results displayed are the collection time and the virtual resource migration display format defined in chapter 4: virtual machine configuration item code, source physical server, target server, source storage, target storage, and migration time.
The basic principle, main features, and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited to the above embodiments; the above embodiments and the description merely illustrate the principle of the invention. Various changes and improvements may be made without departing from the spirit and scope of the invention, and all such changes and improvements fall within the scope of the claimed invention. The claimed scope of the invention is defined by the appended claims and their equivalents.
Claims (4)
1. A cloud-based intelligent host monitoring system, characterized in that the system comprises a foreground presentation layer, a background service layer, a physical host resource collection module, a virtual resource migration monitoring module, and an analysis linkage strategy module, and the database of the system uses NoSQL;
The foreground presentation layer mainly comprises the system management, strategy configuration, and data viewing modules;
The background service layer comprises the data acquisition module, the data receiving queue module, analysis linkage strategy, event handling, and resource configuration management;
The physical host resource collection module collects the monitoring data of the physical hosts under the cloud computing platform in Pull mode: the monitoring server actively queries, through an API interface, the monitoring data of the servers managed by the cloud control center, and the cloud control center returns the collated data to the monitoring server;
The virtual resource migration monitoring module addresses the following: the resource demand of applications changes frequently, and the infrastructure must adapt to the changing demand within a very short time, which requires a simple, automatic, configurable management mode that does not need excessive administrator intervention; automatic virtual machine migration is required for distributed resource scheduling, which continuously optimizes the cloud computing platform by automatically migrating virtual machines among multiple physical servers to balance their load; while a virtual machine is being migrated, its business system continues to provide services normally, and neither data nor business continuity is harmed;
The analysis linkage strategy module works as follows: after a monitored device uploads monitoring data, the data are processed and stored in the NoSQL database; the analysis module analyzes the data and, according to the configured strategies, triggers the corresponding processing flows.
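The Pull-mode flow in claim 1 (the monitoring server queries the cloud control center, stores what it receives, and hands it to the analysis step) can be illustrated with a minimal sketch. Everything here is hypothetical: `query_control_center` stands in for the real API call, a Python list stands in for the NoSQL database, and `analyze` stands in for the analysis linkage strategy module.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def pull_once(query_control_center: Callable[[], List[Row]],
              store: List[Row],
              analyze: Callable[[Row], None]) -> int:
    """Run one Pull cycle and return the number of rows collected.

    The monitoring server initiates the request; the control center only answers.
    """
    rows = query_control_center()   # active query through the (hypothetical) API
    for row in rows:
        store.append(row)           # stand-in for a write to the NoSQL database
        analyze(row)                # hand the stored row to the analysis step
    return len(rows)


def fake_control_center() -> List[Row]:
    # Stub used only for illustration; a real deployment would call the platform API.
    return [{"host": "esx01", "cpu": 42.5, "mem": 63.0},
            {"host": "esx02", "cpu": 91.0, "mem": 70.0}]


database: List[Row] = []
busy_hosts: List[str] = []


def flag_busy(row: Row) -> None:
    # Example strategy with an arbitrary threshold chosen for this sketch.
    if float(row["cpu"]) >= 80.0:
        busy_hosts.append(str(row["host"]))


collected = pull_once(fake_control_center, database, flag_busy)
```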
2. The cloud-based intelligent host monitoring system according to claim 1, characterized in that the data in the system flow upward from the bottom layer, being processed along the way, and are finally called and displayed by the foreground; the bottom layer is the data acquisition layer, comprising the physical server data acquisition, resource dynamic migration acquisition, and network equipment data acquisition modules; the physical server acquisition module is further divided into cloud platform data acquisition and traditional platform data acquisition, because collecting from physical servers under a cloud platform differs considerably from collecting from physical servers under a traditional platform; under a cloud platform, an efficient, streamlined virtualization platform is installed on the bare metal of the physical server to pool resources; for the security and stability of the underlying system, commercial cloud platforms do not recommend that users install third-party monitoring Agents on the underlying cloud system and recommend obtaining the data from the cloud control center instead; physical server data on a traditional platform are collected in Agent mode, which supports more monitoring indicators, such as process monitoring and application monitoring, with high data accuracy; an agentless mode may also be used, through SNMP, SSH, Syslog, and the like; resource dynamic migration acquisition mainly collects the dynamic location information of resources under the cloud platform: cloud computing technology pools the physical resources so that virtual machines can migrate dynamically among multiple physical hosts, and the location of each virtual machine must be known in real time and therefore monitored; collection from the VMware and RHEV platforms is supported; most network equipment data are collected through SNMP;
After data acquisition is complete, the data must be uploaded to the server for processing; to cope with the large number of I/O requests that arise when large volumes of data are aggregated on the server, a data receiving queue module is abstracted; after passing through the receiving queue module, the monitoring data are delivered as raw data to the data storage module, which supports both relational and non-relational data storage; here all monitoring data are stored in a non-relational database; the resource configuration management module is the core of the monitoring platform, in which a physical device can be represented abstractly as one configuration item or a combination of several configuration items; the configuration item design comprises configuration item attributes, relations between configuration items, key technical indicators, and configuration view modules; configuration item attributes include, for example, model, U number, memory size, and virtualization type, and the value type (numeric, text, enumeration) and constraint rules of each attribute are specified; the relations between configuration items include hosting, connecting, using, belonging to, and managing, and can be extended dynamically at any time as the business requires; a series of key technical indicators is defined in the key technical indicator library, such as CPU utilization, number of concurrent database connections, and business transaction volume; a series of resource views is defined in the resource view library, such as the machine room view, rack view, resource pool view, configuration item relation view, and capacity view, and the component relations, display framework, and constraint rules of each resource view are specified;
Through analysis linkage strategy management, early warning and fault diagnosis with complex logic can be realized; the linkage module includes single-value judgment, for example memory utilization greater than 80%, and consecutive sample value processing, which judges whether a single key indicator is greater than or less than a certain threshold in N consecutive samples; once a key indicator analysis strategy and its threshold are set, reaching the threshold triggers an event, so there is an event processing module, which can create monitoring events and provide multiple notification channels, including e-mail, SMS, and WeChat;
The top layer is the presentation layer, the interaction layer of the system, comprising three major modules: system management, strategy configuration, and data viewing; system management includes submodules such as user management and rights management; strategy configuration includes acquisition strategies, monitoring strategies, and so on; in the data viewing module, users can view the current state of each kind of device, the triggered events, and historical information, and can generate a running status report for a specified time period.
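The role of the data receiving queue in claim 2, buffering uploaded samples so that bursts of I/O do not hit the storage module directly, can be sketched with the Python standard library. This is an illustration under assumptions, not the patented implementation: a bounded `queue.Queue` plays the receiving queue, a single worker thread plays the storage writer, and a list stands in for the non-relational database.

```python
import queue
import threading
from typing import Dict, List

Sample = Dict[str, object]
_STOP = None  # sentinel telling the worker that no more samples will arrive


def run_receive_queue(samples: List[Sample], store: List[Sample],
                      maxsize: int = 1000) -> List[Sample]:
    """Push samples through a bounded queue and let one worker persist them."""
    q: "queue.Queue" = queue.Queue(maxsize=maxsize)

    def storage_worker() -> None:
        while True:
            item = q.get()
            if item is _STOP:
                break
            store.append(item)   # stand-in for a write to the data storage module

    worker = threading.Thread(target=storage_worker)
    worker.start()
    for sample in samples:
        q.put(sample)            # blocks when the queue is full (back-pressure)
    q.put(_STOP)
    worker.join()                # wait until every queued sample is stored
    return store


uploaded = [{"host": "esx01", "cpu": 10.0}, {"host": "esx02", "cpu": 55.0}]
stored = run_receive_queue(uploaded, [])
```

Because there is a single consumer and the queue is first-in, first-out, the stored order matches the upload order; with several consumers that guarantee would no longer hold.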
3. The cloud-based intelligent host monitoring system according to claim 1, characterized in that the acquisition method of the virtual resource migration monitoring module requires the following to be provided: the REST API interface address used for collection, the user name and password needed for interface authentication, the SSL security certificate, and whether to ignore certificate security verification; in the returned result set, each row is the currently obtained monitoring information of one physical server: the first item is the configuration item name of the physical server in the CMDB, the second item is the CPU utilization, and the third item is the memory utilization;
In an enterprise private cloud, the VMware vSphere DRS (Distributed Resource Scheduler) function is enabled to balance load automatically across hosts, adjust computing resources according to business priority, and shut down hosts during low load to reduce energy consumption; vMotion technology can be used when balancing the load on the hosts;
VMware vMotion is the live virtual machine migration function, which can move an entire running virtual machine from one physical server to another without downtime; the virtual machine retains its network identity and connections, which ensures a seamless migration; the active memory and the precise execution state of the virtual machine are transferred over a high-speed network, allowing the virtual machine to switch from running on the source vSphere host to running on the target vSphere host;
When a physical host fails, under the protection of VMware HA the virtual machines of that host are restarted on other hosts in the physical host cluster; to know which virtual machines are affected when a physical host fails, the vMotion information in the VMware environment must be monitored.
4. The cloud-based intelligent host monitoring system according to claim 1, characterized in that the objects processed by the analysis linkage engine of the analysis linkage strategy module include basic indicators, such as the instantaneous CPU value, and aggregate indicators, such as the daily mean CPU value; the analysis engine supports four judgment modes:
1) single-value judgment mode, for example: device status != OK; daily average CPU load >= 45%;
2) consecutive sampling point probability judgment mode, for example: instantaneous CPU utilization >= 80% in 8 of 10 consecutive samples; application server threads >= 25 in 3 consecutive samples;
3) sampling interval ratio judgment mode, for example: samples with storage utilization >= 60% account for >= 50% of the whole month's samples; samples with CPU utilization >= 80% account for >= 30% of the whole day's samples;
4) advanced combination judgment mode, for example: tablespace status != Normal AND tablespace status != Backup; file system utilization >= 85% AND remaining space <= 10G;
The analysis engine supports three kinds of processing actions:
First, creating alarms of different levels, the alarm levels including Fatal, Error, Warning, and Info;
Second, creating operation and maintenance process events by docking with the operation and maintenance process management platform, so that operation and maintenance events form a closed loop with alarms;
Third, triggering automated operations by docking with the automation management platform, so that automated operations form a closed loop with alarms; for example, when samples with memory utilization >= 85% account for >= 50% of all samples, the memory is automatically expanded by 20%;
For the monitoring data of physical servers, the analysis linkage strategies are set as follows:
CPU utilization >= 50% and consecutive count >= 3
Memory utilization >= 75% and consecutive count >= 3
When the collected CPU data of a physical host reach 50% or more three consecutive times, an alarm is sent by SMS and e-mail, an event is generated in the monitoring platform, and the administrator is reminded to handle it; the analysis linkage strategy for memory is set here to memory utilization reaching 75% or more three consecutive times;
For the host data collected from the interface program: once the physical host data acquisition interface is in place, key technical indicators are defined in the monitoring platform, and for each indicator the type and display content, data retention time, collection frequency, and chart display type can be defined;
In the acquisition strategy configuration, the acquisition strategy name is defined; the acquisition modes supported by the acquisition interface in the monitoring platform are: bat, HeartBeat, HTTP, Java, JDBC, JMX, Log, Mail, PING, RmoteSSH, SHELL, SnmpGet, SnmpTrap, SnmpWalk, Syslog, Telnet, VBS, Web Services, and wmic; the capture type, collection script, and data processing script are also defined.
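The four judgment modes of claim 4, together with the "three consecutive times" strategy for physical servers, can be sketched as small predicates. This is an illustration only: the function names are hypothetical, the engine's real rule syntax is not described in the text, and the choice to return False until a full window of samples is available is an assumption.

```python
from typing import Sequence


def single_value(value: float, threshold: float) -> bool:
    """Mode 1: one value compared with a threshold, e.g. daily CPU load >= 45."""
    return value >= threshold


def k_of_last_n(samples: Sequence[float], threshold: float, k: int, n: int) -> bool:
    """Mode 2: at least k of the last n samples reach the threshold.

    Assumption: the rule does not fire until n samples have been collected.
    """
    window = samples[-n:]
    return len(window) == n and sum(v >= threshold for v in window) >= k


def consecutive(samples: Sequence[float], threshold: float, count: int) -> bool:
    """Special case of mode 2: the last `count` samples all reach the threshold."""
    return k_of_last_n(samples, threshold, count, count)


def ratio_over(samples: Sequence[float], threshold: float, min_ratio: float) -> bool:
    """Mode 3: the share of samples reaching the threshold is at least min_ratio."""
    if not samples:
        return False
    return sum(v >= threshold for v in samples) / len(samples) >= min_ratio


def all_of(*conditions: bool) -> bool:
    """Mode 4: AND-combination of already evaluated conditions."""
    return all(conditions)


cpu = [40.0, 55.0, 60.0, 51.0]
cpu_alarm = consecutive(cpu, threshold=50.0, count=3)        # CPU >= 50 three times in a row
fs_alarm = all_of(single_value(88.0, 85.0), 8.0 <= 10.0)     # usage >= 85 AND free space <= 10G
```

Expressing the consecutive rule through `k_of_last_n` keeps a single windowing implementation, so the physical server strategy (CPU >= 50% three times, memory >= 75% three times) is just a parameter choice.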
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610788477.XA CN107786616A (en) | 2016-08-30 | 2016-08-30 | Main frame intelligent monitor system based on high in the clouds |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610788477.XA CN107786616A (en) | 2016-08-30 | 2016-08-30 | Main frame intelligent monitor system based on high in the clouds |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107786616A true CN107786616A (en) | 2018-03-09 |
Family
ID=61450520
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610788477.XA Pending CN107786616A (en) | 2016-08-30 | 2016-08-30 | Main frame intelligent monitor system based on high in the clouds |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107786616A (en) |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108429755A (en) * | 2018-03-21 | 2018-08-21 | 深圳天源迪科信息技术股份有限公司 | Basic network security information dynamic management platform and method |
CN112204521A (en) * | 2018-05-25 | 2021-01-08 | 微软技术许可有限责任公司 | Processor feature ID response for virtualization |
CN109412829A (en) * | 2018-08-30 | 2019-03-01 | 华为技术有限公司 | A kind of prediction technique and equipment of resource distribution |
CN109412829B (en) * | 2018-08-30 | 2020-11-17 | 华为技术有限公司 | Resource allocation prediction method and equipment |
CN109450686B (en) * | 2018-11-12 | 2020-11-03 | 北京交通大学 | Network resource management system and method based on pervasive network |
CN109450686A (en) * | 2018-11-12 | 2019-03-08 | 北京交通大学 | A kind of network resource management system and method based on pervasive network |
CN109274557A (en) * | 2018-11-14 | 2019-01-25 | 江苏鸿信***集成有限公司 | Intelligent CMDB management and cloud host monitor method under a kind of cloud environment |
CN109460223A (en) * | 2018-11-14 | 2019-03-12 | 沈阳林科信息技术有限公司 | A kind of API gateway management system and its method |
CN109728938A (en) * | 2018-12-11 | 2019-05-07 | 国云科技股份有限公司 | A kind of method of assessment system service level |
CN111414129B (en) * | 2019-01-07 | 2023-05-05 | 阿里巴巴集团控股有限公司 | Cloud-based FPGA control data configuration system and method and electronic equipment |
CN111414129A (en) * | 2019-01-07 | 2020-07-14 | 阿里巴巴集团控股有限公司 | System and method for configuring FPGA control data based on cloud and electronic equipment |
CN109901912A (en) * | 2019-03-01 | 2019-06-18 | 厦门容能科技有限公司 | A method of recommending the configuration of cloud host |
CN109933476A (en) * | 2019-03-20 | 2019-06-25 | 浪潮商用机器有限公司 | A kind of display methods and device of OpenPOWER server performance |
CN110290075A (en) * | 2019-04-17 | 2019-09-27 | 李士锋 | A kind of method for managing resource and system of police cloud computing platform |
CN112395152A (en) * | 2019-08-19 | 2021-02-23 | 阿里巴巴集团控股有限公司 | Server resource monitoring method and device |
CN112395152B (en) * | 2019-08-19 | 2022-04-12 | 阿里巴巴集团控股有限公司 | Server resource acquisition method and acquisition system |
CN110515701A (en) * | 2019-08-28 | 2019-11-29 | 杭州数梦工场科技有限公司 | A kind of thermomigration process and device of virtual machine |
CN111061612A (en) * | 2019-12-12 | 2020-04-24 | 天地伟业技术有限公司 | Embedded system state monitoring method |
CN111026336A (en) * | 2019-12-26 | 2020-04-17 | 中国建设银行股份有限公司 | Automatic operation and maintenance method and operation and maintenance system of SAN storage system |
CN111198854A (en) * | 2019-12-27 | 2020-05-26 | 南京金绿汇成信息科技有限公司 | Data state tracking method of multi-source data acquisition device |
CN112350855A (en) * | 2020-10-26 | 2021-02-09 | 浪潮云信息技术股份公司 | Configuration-based cloud center management method |
CN112350855B (en) * | 2020-10-26 | 2023-03-31 | 浪潮云信息技术股份公司 | Configuration-based cloud center management method |
CN113010491A (en) * | 2021-02-24 | 2021-06-22 | 光大兴陇信托有限责任公司 | Cloud-based data management method and system |
CN113010491B (en) * | 2021-02-24 | 2023-10-03 | 光大兴陇信托有限责任公司 | Cloud-based data management method and system |
CN114979158A (en) * | 2022-05-23 | 2022-08-30 | 深信服科技股份有限公司 | Resource monitoring method, system, equipment and computer readable storage medium |
CN114979158B (en) * | 2022-05-23 | 2024-04-09 | 深信服科技股份有限公司 | Resource monitoring method, system, equipment and computer readable storage medium |
CN114826968A (en) * | 2022-07-01 | 2022-07-29 | 锐盈云科技(天津)有限公司 | Enterprise intelligent cloud monitoring system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107786616A (en) | Main frame intelligent monitor system based on high in the clouds | |
CN105843904B (en) | For the monitoring warning system of database runnability | |
CN104463492B (en) | A kind of operation management method of power system cloud emulation platform | |
US8175863B1 (en) | Systems and methods for analyzing performance of virtual environments | |
Coutinho et al. | Elasticity in cloud computing: a survey | |
US10762452B2 (en) | System and method for designing and executing control loops in a cloud environment | |
CN110809017A (en) | Data analysis application platform system based on cloud platform and micro-service framework | |
CN109714192A (en) | A kind of monitoring method and system monitoring cloud platform | |
CN107943668A (en) | Computer server cluster daily record monitoring method and monitor supervision platform | |
WO2023142054A1 (en) | Container microservice-oriented performance monitoring and alarm method and alarm system | |
US10116534B2 (en) | Systems and methods for WebSphere MQ performance metrics analysis | |
CN108092813A (en) | Data center's total management system server hardware Governance framework and implementation method | |
CN106201754A (en) | Mission bit stream analyzes method and device | |
CN105323111A (en) | Operation and maintenance automation system and method | |
CN108777637A (en) | A kind of data center's total management system and method for supporting server isomery | |
CN109471845A (en) | Blog management method, server and computer readable storage medium | |
EP4020218B1 (en) | Analyzing large-scale data processing jobs | |
CN103295155A (en) | Security core service system monitoring method | |
CN105490864A (en) | Business module monitoring method based on OSGI | |
WO2023138014A1 (en) | Intelligent operation and maintenance system oriented to computing-network integration scenario and use method thereof | |
CN109165228A (en) | Smart grid Dispatching Control System real-time data base monitoring system and method | |
CN107704362A (en) | A kind of method and device based on Ambari monitoring big data components | |
Metsch et al. | Apex lake: a framework for enabling smart orchestration | |
Kocsis et al. | Measurement-based identification of infrastructures for trustworthy cyber-physical systems | |
Zurkowski et al. | Towards Self-Organizing Cloud Polyglot Database Systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20180309 |