CN111008646A

CN111008646A - Man-machine relationship verification method and device based on equipment use condition

Info

Publication number: CN111008646A
Application number: CN201911075033.1A
Authority: CN
Inventors: 王申华; 蒋红亮; 方小方; 何湘威; 吕齐; 陈澄; 柯公武; 严冬; 寿博仁; 刘吉权; 吴辉; 曹保良; 王挺; 张晨阳
Original assignee: Jinhua Power Supply Co of State Grid Zhejiang Electric Power Co Ltd; Wuyi Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Current assignee: Jinhua Power Supply Co of State Grid Zhejiang Electric Power Co Ltd; Wuyi Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority date: 2019-11-06
Filing date: 2019-11-06
Publication date: 2020-04-14

Abstract

The invention discloses a man-machine relationship verification method and device based on equipment usage. The method comprises the following steps: acquiring a device usage database and extracting a training set D from it; extracting a feature set A of the training set D, where the feature set A contains the features used to judge how the equipment is used; calculating, based on the ID3 algorithm, the empirical conditional entropy and information gain of each feature in the feature set A with respect to the training set D, so as to select suitable root and intermediate nodes; constructing a decision tree from the selected root and intermediate nodes; and analyzing, on the basis of the decision tree, whether the equipment is frequently used and whether it has been swapped or used without permission, so that the equipment can be adjusted and maintained as required. A corresponding apparatus is also disclosed. By identifying and computing equipment features, the invention can promptly detect whether equipment is frequently used and whether unauthorized replacement of equipment has occurred, thereby improving information security and accuracy and addressing problems such as idle equipment and unauthorized changes of equipment users.

Description

Man-machine relationship verification method and device based on equipment use condition

Technical Field

The invention relates to the technical field of power grid operation and maintenance, and in particular to a man-machine relationship verification method and device based on equipment usage.

Background

The communication department runs a communication operation-and-inspection team, which is a maintenance and construction work area under the operation and maintenance department. In practice, communication devices (particularly terminal devices) are widely distributed, and staff turnover and changes of responsibility are common. A long-term staff shortage has led the team to emphasize operation and maintenance while neglecting management, so updates to the equipment ledger are delayed, omissions occur frequently, and effective control measures are lacking. In particular, when employees are frequently transferred between posts, the personnel transfer is often completed while the equipment ledger is not updated. As a result, employees fail to hand over their equipment, equipment gets mixed up between users, and equipment sits idle.

Disclosure of Invention

The invention provides a man-machine relationship verification method and device based on equipment usage to solve the above technical problems.

To achieve this purpose, the invention adopts the following technical solution:

According to a first aspect of the embodiments of the present invention, a man-machine relationship verification method based on device usage is provided, including the following steps:

Step 101, acquiring a device usage database, and extracting a training set D from it;

Step 102, extracting a feature set A of the training set D, where the feature set A contains the features used to judge how the equipment is used;

Step 103, calculating, based on the ID3 algorithm, the empirical conditional entropy and information gain of each feature in the feature set A with respect to the training set D, so as to select suitable root and intermediate nodes;

Step 104, constructing a decision tree from the selected root and intermediate nodes;

Step 105, analyzing, on the basis of the decision tree, whether the equipment is frequently used and whether it has been swapped or used without permission, and adjusting and maintaining the equipment accordingly.

Preferably, step 103 includes:

Step 1031, classifying the training set D according to whether the device usage population changes, and calculating the empirical entropy of the training set D;

Step 1032, calculating in turn, based on the ID3 algorithm, the empirical conditional entropy and information gain of each feature in the feature set A with respect to the training set D;

Step 1033, selecting the feature with the largest information gain as the root node feature of the training set D, and dividing the training set D into a plurality of subsets according to the values of that feature;

Step 1034, calculating the empirical conditional entropy and information gain of the remaining features in the feature set A with respect to each subset, and selecting the leaf nodes.

Preferably, the training set D is classified according to whether the device usage population changes, and the empirical entropy of the training set D is calculated as follows:

"Whether the device usage population changes" is denoted as feature C, which has K possible values C = {C₁, C₂, ..., C_K}. The training set D is divided into K classes according to C; class C_k occurs with frequency p_k = |C_k|/|D|, where |C_k| is the number of samples in class C_k, 1 ≤ k ≤ K, and k and K are integers. The empirical entropy of the training set D is (with the convention 0·log₂0 = 0):

$$H(D) = -\sum_{k=1}^{K} p_k \log_2 p_k = -\sum_{k=1}^{K} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|}$$
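As a minimal illustration (not part of the original disclosure), the empirical entropy H(D) can be computed from the class counts |C_k| in a few lines of Python. The function name `empirical_entropy` is hypothetical; skipping zero counts implements the convention 0·log₂0 = 0.

```python
import math

def empirical_entropy(class_counts):
    """Return H(D) = -sum_k (|C_k|/|D|) * log2(|C_k|/|D|) in bits.

    class_counts holds |C_k| for each class of feature C. Zero counts are
    skipped, which implements the convention 0 * log2(0) = 0.
    """
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)

# 15 samples: 9 "yes" and 6 "no" for "whether the device usage population changes"
print(round(empirical_entropy([9, 6]), 3))  # 0.971
```

The 9/6 split is the 15-sample example used later in the description.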

Preferably, the empirical conditional entropy and information gain of each feature in the feature set A with respect to the training set D are calculated in turn, based on the ID3 algorithm, as follows:

Step 10321, the features in the feature set A are denoted A₁, A₂, …, A_m, where m is an integer greater than or equal to 1;

Step 10322, feature A₁ divides the training set D into n subsets D₁, D₂, D₃, …, D_n, where n is an integer greater than or equal to 1, and each subset is divided into K classes according to feature C; the empirical conditional entropy of feature A₁ with respect to the training set D is then:

$$H(D \mid A_1) = \sum_{i=1}^{n} \frac{|D_i|}{|D|} H(D_i) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \sum_{k=1}^{K} \frac{|D_{ik}|}{|D_i|} \log_2 \frac{|D_{ik}|}{|D_i|}$$

where 1 ≤ i ≤ n and 1 ≤ k ≤ K, with i, n, k and K integers; |D_i| is the number of samples in subset D_i, |D| is the number of samples in the training set D, and |D_ik| is the number of samples of the k-th class in subset D_i;
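A sketch of the empirical conditional entropy H(D|A), assuming the split produced by a feature is supplied as a list of per-subset class counts (`subset_class_counts[i][k]` = |D_ik|). The function names are hypothetical, and the counts in the usage line are made up for illustration; they are not the data of Table 1.

```python
import math

def empirical_entropy(class_counts):
    """H(D_i) in bits; zero counts are skipped (0 * log2(0) = 0)."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)

def conditional_entropy(subset_class_counts):
    """Return H(D|A) = sum_i (|D_i|/|D|) * H(D_i).

    subset_class_counts[i][k] is |D_ik|, the number of samples of class k in
    subset D_i. Empty subsets have weight |D_i|/|D| = 0 and are skipped.
    """
    total = sum(sum(counts) for counts in subset_class_counts)
    return sum(
        (sum(counts) / total) * empirical_entropy(counts)
        for counts in subset_class_counts
        if sum(counts) > 0
    )

# Made-up split of 8 samples into two subsets: [4 yes, 0 no] and [1 yes, 3 no]
print(round(conditional_entropy([[4, 0], [1, 3]]), 3))  # 0.406
```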

Step 10323, calculating the information gain of feature A₁:

$$g(D, A_1) = H(D) - H(D \mid A_1)$$

Step 10324, repeating steps 10322 and 10323 to obtain the empirical conditional entropy and information gain of each of the other features in the feature set A with respect to the training set D.
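Steps 10322 to 10324, together with the root selection of step 1033, can be sketched as a loop that computes g(D, A_j) for every feature and picks the largest. This is an illustration under assumptions: each sample is taken to be a dict mapping feature names to values plus a label key, and the field names `A1`, `A2` and `changed` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Empirical entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gains(samples, features, label="changed"):
    """Return {feature: g(D, feature)} with g(D, A) = H(D) - H(D|A)."""
    labels = [s[label] for s in samples]
    h_d = entropy(labels)
    gains = {}
    for feature in features:                      # step 10324: repeat for every A_j
        subsets = defaultdict(list)               # feature value -> labels of subset D_i
        for s in samples:
            subsets[s[feature]].append(s[label])
        h_d_a = sum(len(sub) / len(samples) * entropy(sub) for sub in subsets.values())
        gains[feature] = h_d - h_d_a              # step 10323
    return gains

samples = [
    {"A1": "frequent", "A2": "without access", "changed": "yes"},
    {"A1": "frequent", "A2": "frequent", "changed": "no"},
    {"A1": "occasional", "A2": "without access", "changed": "yes"},
    {"A1": "occasional", "A2": "frequent", "changed": "no"},
]
gains = information_gains(samples, ["A1", "A2"])
root = max(gains, key=gains.get)                  # step 1033: largest gain -> root node
print(root)  # A2
```

In this toy data A2 separates the two classes perfectly (gain 1.0), while A1 carries no information (gain 0.0).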

Preferably, each feature in the feature set A corresponds to a service platform on the device and represents how frequently that platform is used, and each feature takes one of four values: frequent, occasional, with access, without access.

Preferably, in step 101, only devices with an average of more than one visit per week are selected when the device usage database is acquired.
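The step-101 filter can be sketched as follows, assuming each device record carries a total visit count and the number of weeks observed. The `DeviceUsage` record and its fields are hypothetical, since the source does not specify the database schema; "exceeding 1" is read as a strict inequality.

```python
from dataclasses import dataclass

@dataclass
class DeviceUsage:
    device_id: str       # hypothetical identifier of the device
    total_visits: int    # visits recorded in the observation window
    weeks_observed: int  # length of the observation window, in weeks

def select_devices(records):
    """Keep devices whose average weekly visit count strictly exceeds 1."""
    return [
        r for r in records
        if r.weeks_observed > 0 and r.total_visits / r.weeks_observed > 1
    ]

records = [
    DeviceUsage("pc-01", 10, 4),  # 2.5 visits/week -> kept
    DeviceUsage("pc-02", 4, 4),   # exactly 1 visit/week -> dropped
    DeviceUsage("pc-03", 0, 4),   # never used -> dropped
]
print([r.device_id for r in select_devices(records)])  # ['pc-01']
```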

According to a second aspect of the embodiments of the present invention, a man-machine relationship verification apparatus based on device usage is provided, including:

a data extraction module, used to acquire a device usage database and extract a training set D from it;

a feature extraction module, used to extract a feature set A of the training set D, where the feature set A contains the features used to judge how the equipment is used;

an information gain calculation module, used to calculate, based on the ID3 algorithm, the empirical conditional entropy and information gain of each feature in the feature set A with respect to the training set D, so as to select suitable root and intermediate nodes;

a decision tree construction module, used to construct a decision tree from the selected root and intermediate nodes;

and an analysis module, used to analyze, on the basis of the decision tree, whether the equipment is frequently used and whether it has been swapped or used without permission, so that the equipment can be adjusted and maintained.
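One way to picture how the five modules cooperate is as a pipeline of injected callables. This is a hypothetical structural sketch, not the patented apparatus: the class name, field names and signatures are assumptions, and in practice each callable would be backed by the corresponding step of the method. The lambdas in the usage lines are trivial stubs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Sample = Dict[str, Any]

@dataclass
class ManMachineVerifier:
    """Hypothetical wiring of the five modules as injected callables."""
    extract_data: Callable[[], List[Sample]]                                 # data extraction
    extract_features: Callable[[List[Sample]], List[str]]                   # feature extraction
    compute_gains: Callable[[List[Sample], List[str]], Dict[str, float]]    # information gain calculation
    build_tree: Callable[[List[Sample], List[str], Dict[str, float]], Any]  # decision tree construction
    analyse: Callable[[Any, List[Sample]], List[str]]                       # analysis

    def run(self) -> List[str]:
        training_set = self.extract_data()
        features = self.extract_features(training_set)
        gains = self.compute_gains(training_set, features)
        tree = self.build_tree(training_set, features, gains)
        return self.analyse(tree, training_set)

# Stub modules, only to show the data flow between them.
verifier = ManMachineVerifier(
    extract_data=lambda: [{"device": "pc-01", "A1": "frequent"}],
    extract_features=lambda d: ["A1"],
    compute_gains=lambda d, f: {name: 0.0 for name in f},
    build_tree=lambda d, f, g: max(g, key=g.get),
    analyse=lambda tree, d: [s["device"] for s in d if tree == "A1"],
)
print(verifier.run())  # ['pc-01']
```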

Preferably, the information gain calculation module includes:

an empirical entropy calculation submodule, used to classify the training set D according to whether the device usage population changes and to calculate the empirical entropy of the training set D;

an information gain calculation submodule, used to calculate in turn, based on the ID3 algorithm, the empirical conditional entropy and information gain of each feature in the feature set A with respect to the training set D;

a root node selection submodule, which selects the feature with the largest information gain as the root node feature of the training set D and divides the training set D into a plurality of subsets according to that feature;

and a leaf node selection submodule, used to calculate the empirical conditional entropy and information gain of the remaining features in the feature set A with respect to each subset and to select the leaf nodes.

Preferably, when acquiring the device usage database, the data extraction module selects only devices with an average of more than one visit per week.

Compared with the prior art, by identifying and computing equipment features the invention can promptly detect whether equipment is frequently used and whether unauthorized replacement of equipment has occurred, thereby improving information security and accuracy and addressing problems such as idle equipment and unauthorized changes of equipment users.

Drawings

FIG. 1 is a flowchart of the man-machine relationship verification method based on device usage according to the present invention;

FIG. 2 is a flowchart of step 103 of the man-machine relationship verification method based on device usage according to the present invention;

FIG. 3 is a block diagram of the man-machine relationship verification apparatus based on device usage according to the present invention;

FIG. 4 is a block diagram of the information gain calculation module in the man-machine relationship verification apparatus based on device usage according to the present invention.

In the figure, 201-a data extraction module, 202-a feature extraction module, 203-an information gain calculation module, 204-a decision tree construction module, 205-an analysis module, 231-an empirical entropy calculation sub-module, 232-an information gain calculation sub-module, 233-a root node selection sub-module and 234-a leaf node selection sub-module.

Detailed Description

The present invention will be described in detail below with reference to specific embodiments shown in the drawings. These embodiments are not intended to limit the present invention, and structural, methodological, or functional changes made by those skilled in the art according to these embodiments are included in the scope of the present invention.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

As shown in fig. 1, a man-machine relationship verification method based on device usage includes the following steps:

In step 101, only devices with an average of more than one visit per week are selected when the device usage database is acquired; other data are discarded so that the data are representative and accurate. The device usage database contains data such as: switch (firewall) log data, with fields such as source address and destination address; statistics of destination addresses and service systems, i.e. network segment addresses and the corresponding service platform names; and the IP-MAC binding table, i.e. IP address, MAC address, device user and the device user's department. Typically, about 20 sets of data are extracted to form the training set D.
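The data sources listed for step 101 could be combined roughly as below to count, for each device, how often it reaches each service platform. This is a sketch under assumptions: firewall records are taken to be (source IP, destination IP) pairs, the destination-to-platform statistics are taken to be a mapping from network segments (CIDR strings) to platform names, and the IP-MAC binding is a plain dict. The function name and all addresses are hypothetical; only Python's standard `ipaddress` and `collections` modules are used.

```python
import ipaddress
from collections import Counter, defaultdict

def platform_visit_counts(log_records, segment_to_platform, ip_to_mac):
    """Count visits per device (MAC address) and service platform.

    log_records:         iterable of (source_ip, destination_ip) firewall entries
    segment_to_platform: {"10.1.0.0/16": "platform 1", ...} segment -> platform name
    ip_to_mac:           IP-MAC binding table {source_ip: mac}
    Destinations outside every known segment are not counted.
    """
    networks = [(ipaddress.ip_network(seg), name) for seg, name in segment_to_platform.items()]
    counts = defaultdict(Counter)
    for src, dst in log_records:
        mac = ip_to_mac.get(src)
        if mac is None:
            continue                      # source IP not bound to a known device
        dst_addr = ipaddress.ip_address(dst)
        for net, name in networks:
            if dst_addr in net:           # destination falls in this platform's segment
                counts[mac][name] += 1
                break
    return counts

logs = [("192.168.0.5", "10.1.3.4"), ("192.168.0.5", "10.1.9.9"),
        ("192.168.0.5", "10.2.0.1"), ("192.168.0.6", "10.1.0.1"),
        ("192.168.0.5", "8.8.8.8")]
counts = platform_visit_counts(
    logs,
    {"10.1.0.0/16": "platform 1", "10.2.0.0/16": "platform 2"},
    {"192.168.0.5": "aa:bb:cc:dd:ee:01"},
)
print(dict(counts["aa:bb:cc:dd:ee:01"]))  # {'platform 1': 2, 'platform 2': 1}
```

Joining the MAC address to the device user and department would then let the per-device counts be attached to people.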

In step 102, the training set D can be classified according to whether the device usage population changes, which defines the two main branches of the decision tree: the device usage population has changed, and it has not changed. Each feature in the feature set A identifies one of the service platforms used on the device and represents how frequently the device uses that platform; each feature takes one of four values: frequent, occasional, with access, without access. The feature values can be assigned using preset thresholds, so that the number of times each service platform is used on the device is graded, from high to low, as "frequent", "occasional", "with access" or "without access".
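The threshold-based grading of visit counts into the four feature values might look like this. The source only says that preset thresholds are used; the function name and the default thresholds below (10, 3 and 1 visits) are hypothetical placeholders chosen for illustration.

```python
def usage_level(visits, frequent_min=10, occasional_min=3, accessed_min=1):
    """Grade a platform visit count into one of the four feature values.

    The thresholds are illustrative placeholders for the "preset thresholds".
    """
    if visits >= frequent_min:
        return "frequent"
    if visits >= occasional_min:
        return "occasional"
    if visits >= accessed_min:
        return "with access"
    return "without access"

print([usage_level(n) for n in (12, 5, 1, 0)])
# ['frequent', 'occasional', 'with access', 'without access']
```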

Step 103 is described in further detail below; as shown in FIG. 2, it includes the following steps.

Step 1031, classifying the training set D according to whether the device usage population changes, and calculating the empirical entropy of the training set D.

The empirical entropy of the training set D, classified according to whether the device usage population changes, is calculated as follows:

$$H(D) = -\sum_{k=1}^{K} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|}$$

where, by convention, 0·log₂0 = 0.

Step 1032, calculating in turn, based on the ID3 algorithm, the empirical conditional entropy and information gain of each feature in the feature set A with respect to the training set D. An ID3 classification decision tree is highly readable and classifies quickly. It is used to screen groups rapidly over a large volume of data; the screened groups are then clustered, and the frequencies with which groups of different attributes use the different service platforms are used to judge whether the population attributes of the device users have changed.

The empirical conditional entropy and information gain of each feature in the feature set A with respect to the training set D are calculated in turn, based on the ID3 algorithm, as follows:

Step 10321, the features in the feature set A are denoted A₁, A₂, …, A_m, where m is an integer greater than or equal to 1; they identify the service platforms used on the device.

Step 10322, feature A₁ divides the training set D into n subsets D₁, D₂, …, D_n, and each subset is divided into K classes according to feature C; the empirical conditional entropy of feature A₁ with respect to the training set D is:

$$H(D \mid A_1) = \sum_{i=1}^{n} \frac{|D_i|}{|D|} H(D_i) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \sum_{k=1}^{K} \frac{|D_{ik}|}{|D_i|} \log_2 \frac{|D_{ik}|}{|D_i|}$$

where 1 ≤ i ≤ n and 1 ≤ k ≤ K, with i, n, k and K integers; |D_i| is the number of samples in subset D_i, |D| is the number of samples in the training set D, and |D_ik| is the number of samples of the k-th class in subset D_i.

Step 10323, calculating the information gain of feature A₁:

$$g(D, A_1) = H(D) - H(D \mid A_1)$$

Step 1033: because features with a larger information gain have stronger classification ability, the feature with the largest information gain is selected as the root node feature of the training set D, and the training set D is divided into a plurality of subsets according to its values.

The following example uses the training set D shown in Table 1.
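Applying steps 1033 and 1034 recursively gives the usual ID3 construction. The sketch below is an illustration under assumptions rather than the inventors' code: the tree is represented as nested dicts `{feature: {value: subtree_or_label}}`, all function names are hypothetical, and stopping with a majority-class leaf when no features remain is a standard ID3 convention that the source does not spell out. Ties in information gain go to the first feature listed, and the toy data are made up.

```python
import math
from collections import Counter

def entropy(labels):
    """Empirical entropy (bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def info_gain(rows, labels, feature):
    """g(D, A) = H(D) - H(D|A) for one feature."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[feature], []).append(label)
    h_cond = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - h_cond

def build_id3(rows, labels, features):
    """Build {feature: {value: subtree_or_label}} by recursive ID3 splitting."""
    if len(set(labels)) == 1:                      # one class only -> leaf node
        return labels[0]
    if not features:                               # no features left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: info_gain(rows, labels, f))   # step 1033
    rest = [f for f in features if f != best]
    tree = {best: {}}
    for value in sorted({row[best] for row in rows}):                # one subset per value
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = build_id3([rows[i] for i in idx],        # step 1034 on the subset
                                      [labels[i] for i in idx], rest)
    return tree

rows = [
    {"A1": "frequent", "A2": "without access"},
    {"A1": "frequent", "A2": "frequent"},
    {"A1": "occasional", "A2": "without access"},
    {"A1": "occasional", "A2": "frequent"},
]
labels = ["yes", "no", "yes", "no"]                # "whether the device usage population changes"
print(build_id3(rows, labels, ["A1", "A2"]))
# {'A2': {'frequent': 'no', 'without access': 'yes'}}
```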

TABLE 1

The training set D has 15 samples. Classifying D according to "whether the device usage population changes", 9 samples take the value "yes" and 6 take the value "no", so the empirical entropy of the training set D is:

$$H(D) = -\frac{9}{15}\log_2\frac{9}{15} - \frac{6}{15}\log_2\frac{6}{15} = 0.971$$

The information gain of each feature with respect to the training set D is then calculated.

Let A₁, A₂, A₃ and A₄ in the feature set A denote platform 1, platform 2, platform 3 and platform 4, respectively.

Feature A₁ ("platform 1") takes the values occasional, frequent, with access and without access. Dividing the training set D by this feature yields 4 sample subsets, denoted D₁ (platform 1 = occasional), D₂ (platform 1 = frequent), D₃ (platform 1 = with access) and D₄ (platform 1 = without access).

As shown in Table 1, D₁ contains 5 samples; the proportion of samples for which "whether the device usage population changes" takes the value "yes" is

and the proportion taking the value "no" is

D₂ contains 5 samples; the proportion of samples for which "whether the device usage population changes" takes the value "yes" is

and the proportion taking the value "no" is

D₃ contains 0 samples; D₄ contains 5 samples, and the proportion of samples for which "whether the device usage population changes" takes the value "yes" is

The empirical entropies of the branches are then:

$$H(D_3) = 0$$

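The empirical conditional entropy of feature A₁ with respect to the training set D is:

$$H(D \mid A_1) = \frac{5}{15}H(D_1) + \frac{5}{15}H(D_2) + \frac{0}{15}H(D_3) + \frac{5}{15}H(D_4) = 0.888$$

The information gain of feature A₁ is:

$$g(D, A_1) = H(D) - H(D \mid A_1) = 0.971 - 0.888 = 0.083$$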

Similarly, it can be calculated that:

the information gain of feature A₂ is g(D, A₂) = 0.324;

the information gain of feature A₃ is g(D, A₃) = 0.420;

the information gain of feature A₄ is g(D, A₄) = 0.363.

Comparing the information gains shows that feature A₃ has the largest value, so A₃ is selected as the optimal feature and the root node feature, dividing the training set into two subsets D₁ and D₂. D₁ contains sample points of only one class, so it is a leaf node; for D₂, a new feature must be selected from A₁, A₂ and A₄. The information gain of each of these features is:

g(D₂, A₁) = 0.251

g(D₂, A₂) = 0.918

g(D₂, A₄) = 0.474

Feature A₂ now has the largest information gain, so it is selected as the feature of the next-level intermediate node. This node has two child nodes, one corresponding to "yes" and the other to "no"; the samples in each child node all belong to the same class, so both are leaf nodes.
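The two selection steps of the worked example can be checked directly from the reported information gains. The dictionaries below simply restate the values given in the example; nothing new is computed except g(D, A₁) from H(D) and H(D|A₁).

```python
h_d, h_d_a1 = 0.971, 0.888
root_gains = {
    "A1": round(h_d - h_d_a1, 3),   # g(D, A1) = 0.971 - 0.888
    "A2": 0.324,
    "A3": 0.420,
    "A4": 0.363,
}
d2_gains = {"A1": 0.251, "A2": 0.918, "A4": 0.474}   # g(D2, A_j); A3 already used at the root

root = max(root_gains, key=root_gains.get)     # feature with the largest gain on D
second = max(d2_gains, key=d2_gains.get)       # feature with the largest gain on D2
print(root_gains["A1"], root, second)  # 0.083 A3 A2
```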

Proceeding in this way, a complete decision tree can be constructed. The tree makes the usage of each device clear at a glance, so it is easy to determine whether a device is frequently used and whether it has been swapped or used without permission. On this basis, the relevant maintenance and adjustment can be carried out: for example, the functions of the relevant devices can be adjusted so that some devices do not sit idle, and devices can be paired with their operators by function, with the permissions of devices and personnel modified accordingly.

By constructing a decision tree and identifying and computing equipment features, the invention can promptly detect frequent equipment use, abnormal equipment use, unauthorized replacement of equipment and similar behavior, thereby improving information security and accuracy and addressing problems such as idle equipment and unauthorized changes of equipment users.

Based on the above method, as shown in FIG. 3, the present invention further provides a man-machine relationship verification apparatus based on device usage, which includes:
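Once a tree is built, the analysis of step 105 amounts to walking each device's feature values down to a leaf. The sketch assumes a tree stored as nested dicts `{feature: {value: subtree_or_label}}`; the example tree only mirrors the shape of the worked example (root A₃, then A₂), and its branch values, leaf labels and device names are hypothetical because Table 1 is not reproduced here. Returning a default label for unseen values is also an assumption.

```python
def classify(tree, sample, default="unknown"):
    """Walk a nested-dict decision tree {feature: {value: subtree_or_label}} to a leaf."""
    node = tree
    while isinstance(node, dict):
        feature, branches = next(iter(node.items()))
        value = sample.get(feature)
        if value not in branches:
            return default                 # feature value never seen during training
        node = branches[value]
    return node

# Hypothetical tree with the shape of the worked example: root A3, then A2.
tree = {"A3": {"frequent": "no",
               "without access": {"A2": {"frequent": "no", "without access": "yes"}}}}

devices = {
    "pc-01": {"A3": "frequent", "A2": "without access"},
    "pc-02": {"A3": "without access", "A2": "without access"},
}
# Devices whose usage population appears to have changed are flagged for checking.
flagged = [d for d, feats in devices.items() if classify(tree, feats) == "yes"]
print(flagged)  # ['pc-02']
```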

a data extraction module 201, configured to acquire a device usage database and extract a training set D from it;

a feature extraction module 202, configured to extract a feature set A of the training set D, where the feature set A contains the features used to judge how the device is used;

an information gain calculation module 203, configured to calculate, based on the ID3 algorithm, the empirical conditional entropy and information gain of each feature in the feature set A with respect to the training set D, so as to select suitable root and intermediate nodes;

a decision tree construction module 204, configured to construct a decision tree from the selected root and intermediate nodes;

and an analysis module 205, configured to analyze, on the basis of the decision tree, whether the device is frequently used and whether it has been swapped or used without permission, so that the device can be adjusted and maintained.

As shown in FIG. 4, the information gain calculation module 203 includes:

an empirical entropy calculation submodule 231, configured to classify the training set D according to whether the device usage population changes and to calculate the empirical entropy of the training set D;

an information gain calculation submodule 232, which calculates in turn, based on the ID3 algorithm, the empirical conditional entropy and information gain of each feature in the feature set A with respect to the training set D;

a root node selection submodule 233, which selects the feature with the largest information gain as the root node feature of the training set D and divides the training set D into a plurality of subsets according to that feature;

and a leaf node selection submodule 234, which calculates the empirical conditional entropy and information gain of the remaining features in the feature set A with respect to each subset and selects the leaf nodes.

Each feature in the feature set A corresponds to a service platform on the device and represents how frequently that platform is used; each feature takes one of four values: frequent, occasional, with access, without access.

When acquiring the device usage database, the data extraction module selects only devices with an average of more than one visit per week.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims

1. A man-machine relationship verification method based on device usage, characterized by comprising the following steps:

2. The man-machine relationship verification method based on device usage according to claim 1, wherein step 103 comprises:

3. The man-machine relationship verification method based on device usage according to claim 2, wherein the training set D is classified according to whether the device usage population changes, and the empirical entropy of the training set D is calculated as follows:

4. The man-machine relationship verification method based on device usage according to claim 3, wherein the empirical conditional entropy and information gain of each feature in the feature set A with respect to the training set D are calculated in turn, based on the ID3 algorithm, as follows:

Step 10322, feature A₁ divides the training set D into n subsets D₁, D₂, D₃, …, D_n, where n is an integer greater than or equal to 1; each subset is divided into K classes according to feature C, and the empirical conditional entropy of feature A₁ with respect to the training set D is:

$$H(D \mid A_1) = \sum_{i=1}^{n} \frac{|D_i|}{|D|} H(D_i) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \sum_{k=1}^{K} \frac{|D_{ik}|}{|D_i|} \log_2 \frac{|D_{ik}|}{|D_i|}$$

Step 10323, calculating the information gain of feature A₁:

$$g(D, A_1) = H(D) - H(D \mid A_1)$$

5. The man-machine relationship verification method based on device usage according to claim 4, wherein each feature in the feature set A corresponds to a service platform on the device and represents how frequently that platform is used, and each feature takes one of four values: frequent, occasional, with access, without access.

6. The man-machine relationship verification method based on device usage according to any one of claims 1 to 5, wherein in step 101, only devices with an average of more than one visit per week are selected when the device usage database is acquired.

7. A man-machine relationship verification apparatus based on device usage, characterized by comprising:

8. The man-machine relationship verification apparatus based on device usage according to claim 7, wherein the information gain calculation module includes:

9. The man-machine relationship verification apparatus based on device usage according to claim 7, wherein each feature in the feature set A corresponds to a service platform on the device and represents how frequently that platform is used, and each feature takes one of four values: frequent, occasional, with access, without access.

10. The man-machine relationship verification apparatus based on device usage according to any one of claims 7 to 9, wherein the data extraction module selects only devices with an average of more than one visit per week when acquiring the device usage database.