CN109657060B - Safety production accident case pushing method and system

Info

Publication number: CN109657060B
Application number: CN201811571338.7A
Authority: CN
Inventors: 尹继尧; 汪大卫; 陈文刚
Original assignee: Shenzhen Technology Institute of Urban Public Safety Co Ltd
Current assignee: Shenzhen Technology Institute of Urban Public Safety Co Ltd
Priority date: 2018-12-21
Filing date: 2018-12-21
Publication date: 2021-01-12
Anticipated expiration: 2038-12-21
Also published as: CN109657060A

Abstract

The invention relates to the technical field of intelligent pushing, and in particular to a safety production accident case pushing method and system. The pushing method comprises the following steps: analyzing the data information of each case to obtain attribute text data and numerical data of the case; calculating the overall similarity between every two cases; clustering the cases according to the overall similarity between every two cases to generate a plurality of case clusters; and pushing at least one case in the case cluster where the target case is located. In the safety production accident case pushing method and system, the data information of a case is divided into attribute text data and numerical data, the similarity between cases is calculated separately for each of the two data types, the overall similarity of two cases is then calculated from these two similarities, and finally the cases are clustered based on the overall similarity to form case clusters, so that calculation efficiency and pushing accuracy are improved.

Description

Safety production accident case pushing method and system

Technical Field

The invention relates to the technical field of intelligent pushing, and in particular to a safety production accident case pushing method and system.

Background

To assist effectively in handling emergencies, and to give analysts and decision makers reference information from similar cases when the next safety production emergency occurs, it is necessary to develop an intelligent pushing technology for safety production accident cases: given a target case of concern as input, the system automatically retrieves similar cases and pushes them as reference for decision makers, thereby improving the decision-making capability of an emergency decision support system.

Intelligent pushing technology in the prior art generally uses a single similarity calculation, which limits its range of application. Safety production data, however, is divided into a text data structure and a numerical data structure, so the prior-art intelligent pushing methods cannot meet the requirements.

In view of the above, providing a new safety production accident case pushing method and system that overcomes the above drawbacks of the prior art is an urgent technical problem in the art.

Disclosure of Invention

The invention aims to provide a method and a system for pushing safety production accident cases, aiming at the defects in the prior art.

The invention provides a safety production accident case pushing method in a first aspect, which comprises the following steps:

analyzing the data information of the safety production accident case to obtain attribute text data and numerical data of the safety production accident case;

respectively calculating a first similarity between the attribute text data of each two cases and a second similarity between the numerical data, and calculating the overall similarity between each two cases according to the first similarity and the second similarity of each two cases;

clustering the multiple cases according to the overall similarity between every two cases to generate multiple case clusters;

receiving a target case, acquiring a case cluster where the target case is located, and selecting at least one case in the case cluster where the target case is located to push.

Preferably, the calculating of the first similarity between the attribute text data of each two cases includes:

calculating, based on WordNet semantic concept tree distance, the similarity $simA_k(G_i, G_j)$ of each item of attribute text of case $G_i$ and case $G_j$,

wherein $depth(G_{i,k})$ is the depth of the k-th item of attribute text of case $G_i$ in the semantic concept tree, $depth(G_{j,k})$ is the depth of the k-th item of attribute text of case $G_j$ in the semantic concept tree, $depth(lso(G_{i,k}, G_{j,k}))$ is the depth in the semantic concept tree of the nearest common ancestor of the k-th item of attribute text of case $G_i$ and the k-th item of attribute text of case $G_j$, $k = 1, 2, \ldots, n$, and n is the number of items of attribute text in the attribute text data;

calculating the first similarity $simA(G_i, G_j)$ of case $G_i$ and case $G_j$ according to the similarity $simA_k(G_i, G_j)$ of each item of attribute text and the weight coefficient $w_k$ of each item of attribute text in the similarity of the overall attribute text data,

wherein $k = 1, 2, \ldots, n$, and n is the number of items of attribute text in the attribute text data.

Preferably, the calculating of the second similarity between the numerical data of each two cases includes:

calculating, based on Euclidean distance, the second similarity $simC(G_i, G_j)$ of the numerical data of case $G_i$ and case $G_j$,

wherein the calculation uses the value of the y-th item of numerical data of case $G_i$, the value of the y-th item of numerical data of case $G_j$, and $w'_y$, the weight coefficient of the y-th item of numerical data in the overall numerical data similarity, with $y = 1, 2, \ldots, m$, where m is the number of items of numerical data.

Preferably, the step of calculating the overall similarity between each two cases according to the first similarity and the second similarity of each two cases includes:

overall similarity $sim(G_i, G_j) = \alpha \times simA(G_i, G_j) + \beta \times simC(G_i, G_j)$,

wherein $\alpha$ and $\beta$ are the weights of the first similarity and the second similarity, respectively, in the overall similarity.

Preferably, the step of clustering the cases according to the overall similarity between every two cases to generate a plurality of case clusters includes:

S1, randomly selecting an unprocessed case from the total case set and searching the total case set for all cases whose overall similarity to the unprocessed case is smaller than a first threshold; when the number of cases found is greater than or equal to a second threshold, establishing a case cluster with the unprocessed case as its core and adding the cases found to a candidate set of the case cluster; when the number of cases found is less than the second threshold, marking the unprocessed case as noise;

S2, adding each unprocessed case in the candidate set to the case cluster and searching the total case set for all cases whose overall similarity to that case is smaller than the first threshold; when the number of cases found is greater than or equal to the second threshold, adding the cases found to the candidate set;

S3, repeating step S2 until no unprocessed case exists in the candidate set;

S4, repeating steps S1 to S3 until no unprocessed case exists in the total case set.

Preferably, the pushing method further comprises:

and acquiring historical safety production accidents, and analyzing the data information of the historical safety production accident cases to respectively acquire attribute text data and numerical data of the historical safety production accident cases so as to establish a case library.

Preferably, the attribute text data comprises an accident type, an accident description, an accident cause, an accident location and an accountability situation; the numerical data comprises an accident time, an accident loss and a penalty situation.

The invention provides a safety production accident case pushing system in a second aspect, which comprises:

the case analysis module is used for analyzing the data information of the safety production accident case to obtain attribute text data and numerical data of the safety production accident case;

the similarity calculation module is used for calculating first similarity between attribute text data of every two cases and second similarity between numerical data, and calculating total similarity between every two cases according to the first similarity and the second similarity of every two cases;

the clustering module is used for clustering the multiple cases according to the overall similarity between every two cases so as to generate multiple case clusters;

the pushing module is used for receiving the target cases, acquiring the case clusters where the target cases are located, and selecting at least one case in the case clusters where the target cases are located to push.

Preferably, the push system further comprises:

and the case library is used for storing attribute text data and numerical data of the historical safety production accident case.

Preferably, the push system further comprises:

and the interaction module is used for receiving the target case query information input by the user.

According to the safety production accident case pushing method and system, data information of cases is divided into attribute text data and numerical data, the similarity of the two types of data between the cases is calculated respectively, then the overall similarity of the two cases is calculated according to the similarity of the two types of data between the cases, and finally clustering is carried out based on the overall similarity between the cases to form case clusters, so that the calculation efficiency and the pushing accuracy are improved.

Drawings

Fig. 1 is a flowchart of a safety production accident case pushing method according to a first embodiment of the present invention.

Fig. 2 is a schematic diagram of the distance principle of the semantic concept tree in the push method according to the preferred embodiment of the present invention.

Fig. 3 is a schematic diagram illustrating the principle of clustering candidate sets of DBSCAN in the push method according to the preferred embodiment of the present invention.

Fig. 4 is a flowchart of a safety production accident case pushing method according to a second embodiment of the present invention.

Fig. 5 is a block diagram of a software structure of the safety production accident case pushing system according to the first embodiment of the present invention.

Fig. 6 is a block diagram of a software structure of a safety production accident case pushing system according to a second embodiment of the present invention.

Fig. 7 is a block diagram of a hardware structure of a safety production accident case pushing system according to a third embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In order to make the description of the present disclosure more thorough and complete, the following description of the embodiments and examples of the present invention is given for illustrative purposes; it is not intended to be the only form in which the embodiments of the invention may be practiced or utilized. The embodiments are intended to cover the features of the various embodiments as well as the method steps and sequences for constructing and operating the embodiments. However, other embodiments may be utilized to achieve the same or equivalent functions and step sequences.

The safety production accident case pushing method provided by the embodiment of the invention divides the data information of a safety production accident case into attribute text data (semantic class) and numerical data (numerical class), based on the data structure characteristics of safety production accidents. Specifically, the attribute text data may include, but is not limited to: accident type, accident description, accident cause, accident location and accountability situation; the numerical data may include, but is not limited to: accident time, accident loss and penalty situation. When the pushing method calculates the similarity between two cases, the similarity between the cases is calculated separately for each of the two data types, and the overall similarity of the two cases is then calculated from these two similarities. Finally, clustering is carried out based on the overall similarity between cases to form case clusters, and at least one case in the case cluster where the target case is located is pushed as a similar case of the target case.

Fig. 1 is a flowchart of the safety production accident case pushing method according to the first embodiment of the present invention. Referring to Fig. 1, the method includes:

S101, analyzing the data information of the safety production accident case to obtain attribute text data and numerical data of the safety production accident case.

S102, respectively calculating a first similarity between the attribute text data of every two cases and a second similarity between the numerical data, and calculating the overall similarity between every two cases according to the first similarity and the second similarity of every two cases.

S103, clustering the multiple cases according to the overall similarity between every two cases to generate multiple case clusters.

S104, receiving the target case, acquiring the case cluster where the target case is located, and selecting at least one case in the case cluster where the target case is located to push.

In step S101, the safety production accident cases include a target case and the cases to be pushed other than the target case; the data information of all cases is analyzed to divide it into attribute text data and numerical data. The attribute text data includes a plurality of items of attribute text, for example item 1 attribute text, item 2 attribute text, …, item n attribute text; the numerical data includes a plurality of items of data, for example item 1 numerical data, item 2 numerical data, …, item m numerical data; n and m are both natural numbers.
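The split described in step S101 can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Case` class, the `parse_case` helper, and the rule of classifying a field by its value type are hypothetical and are not specified by the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Case:
    """One accident case, split into the two data types of step S101."""
    case_id: str
    # n items of attribute text (e.g. accident type, accident cause)
    attribute_text: List[str] = field(default_factory=list)
    # m items of numerical data (e.g. accident loss)
    numerical: List[float] = field(default_factory=list)


def parse_case(case_id: str, record: Dict[str, Union[str, int, float]]) -> Case:
    """Assumed rule: string values are attribute text, numbers are numerical data."""
    text = [v for v in record.values() if isinstance(v, str)]
    nums = [float(v) for v in record.values()
            if isinstance(v, (int, float)) and not isinstance(v, bool)]
    return Case(case_id, text, nums)


example = parse_case("c1", {"accident_type": "fire",
                            "accident_loss": 120000,
                            "accident_cause": "short circuit"})
```

In practice the mapping from raw fields to the two categories would more likely be fixed per field than inferred from value types.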

In step S102, based on the semantic characteristics of the attribute text data, when considering the similarity of the k-th items of attribute text $G_{i,k}$ and $G_{j,k}$ of two cases $G_i$ and $G_j$, the similarity $simA_k(G_i, G_j)$ is calculated from the distance between the two items of attribute text in the WordNet semantic concept tree; the similarities of all items of attribute text are then combined by weighted averaging to give the first similarity $simA(G_i, G_j)$.

Specifically, first, the similarity $simA_k(G_i, G_j)$ of each item of attribute text of case $G_i$ and case $G_j$ is calculated based on the WordNet semantic concept tree distance from the following depths:

$depth(G_{i,k})$ is the depth of the k-th item of attribute text of case $G_i$ in the semantic concept tree, $depth(G_{j,k})$ is the depth of the k-th item of attribute text of case $G_j$ in the semantic concept tree, and $depth(lso(G_{i,k}, G_{j,k}))$ is the depth in the semantic concept tree of the nearest common ancestor of the k-th item of attribute text of case $G_i$ and the k-th item of attribute text of case $G_j$; $k = 1, 2, \ldots, n$, and n is the number of items of attribute text in the attribute text data.

Specifically, referring to Fig. 2, the depths of $G_{i,k}$ and $G_{j,k}$ in Fig. 2 are both 4, that is, $depth(G_{i,k})$ is 4 and $depth(G_{j,k})$ is 4; point A in Fig. 2 is the nearest common ancestor and has a depth of 3, that is, $depth(lso(G_{i,k}, G_{j,k}))$ is 3.
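As a worked check on the Fig. 2 values, suppose the per-item similarity takes the Wu-Palmer form. This is an assumption: the formula itself is not reproduced in this text, but Wu-Palmer is a standard WordNet measure built from exactly these three depths. With depths of 4 and 4 for the two items and a depth of 3 at point A:

```latex
simA_k(G_i, G_j)
  = \frac{2 \cdot depth\bigl(lso(G_{i,k}, G_{j,k})\bigr)}
         {depth(G_{i,k}) + depth(G_{j,k})}
  = \frac{2 \times 3}{4 + 4}
  = 0.75
```

Under this assumed form the similarity reaches 1 only when both items sit at the same node as their common ancestor, and it decreases as the common ancestor moves toward the root.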

Then, the first similarity $simA(G_i, G_j)$ of case $G_i$ and case $G_j$ is calculated according to the similarity $simA_k(G_i, G_j)$ of each item of attribute text and the weight coefficient $w_k$ of each item of attribute text in the similarity of the overall attribute text data.
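The weighted combination can be sketched as below. The sketch rests on two assumptions that the text does not confirm: the per-item similarity uses the Wu-Palmer form, and the weights are normalised so that the result is a weighted average. The function names are hypothetical.

```python
from typing import Sequence


def item_similarity(depth_i: int, depth_j: int, depth_lso: int) -> float:
    """Per-item similarity from concept-tree depths (assumed Wu-Palmer form)."""
    return 2.0 * depth_lso / (depth_i + depth_j)


def first_similarity(item_sims: Sequence[float],
                     weights: Sequence[float]) -> float:
    """Weighted average of the per-item similarities simA_k with weights w_k."""
    if len(item_sims) != len(weights):
        raise ValueError("one weight is required per item of attribute text")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Dividing by the weight total makes this an average even when the
    # supplied weights do not already sum to 1 (an assumption of the sketch).
    return sum(w * s for w, s in zip(weights, item_sims)) / total


# Two items of attribute text: the Fig. 2 pair and an identical pair.
sims = [item_similarity(4, 4, 3), item_similarity(2, 2, 2)]
sim_a = first_similarity(sims, weights=[1.0, 1.0])
```

With equal weights, the Fig. 2 pair (0.75) and an identical pair (1.0) average to 0.875.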

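In step S102, based on the characteristics of numerical data, the second similarity $simC(G_i, G_j)$ of the numerical data of case $G_i$ and case $G_j$ is calculated based on Euclidean distance,

wherein the calculation uses the value of the y-th item of numerical data of case $G_i$, the value of the y-th item of numerical data of case $G_j$, and $w'_y$, the weight coefficient of the y-th item of numerical data in the overall numerical data similarity, with $y = 1, 2, \ldots, m$, where m is the number of items of numerical data.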
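A minimal Python sketch of a Euclidean-distance-based second similarity follows. The formula is not reproduced in this text, so both the weighted distance and the mapping `1 / (1 + d)` from distance to similarity are assumptions chosen for illustration, as are the function and parameter names. The values are assumed to be scaled to comparable ranges beforehand.

```python
import math
from typing import Sequence


def second_similarity(values_i: Sequence[float],
                      values_j: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Similarity of two cases' numerical data from a weighted Euclidean distance."""
    if not (len(values_i) == len(values_j) == len(weights)):
        raise ValueError("values and weights must all have m entries")
    # Weighted Euclidean distance over the m items of numerical data.
    d = math.sqrt(sum(w * (a - b) ** 2
                      for w, a, b in zip(weights, values_i, values_j)))
    # Assumed mapping: distance 0 gives similarity 1, large distances approach 0.
    return 1.0 / (1.0 + d)


sim_c = second_similarity([0.0, 0.0], [3.0, 4.0], [1.0, 1.0])
```

Any monotonically decreasing mapping from distance to similarity would serve the same purpose; this one was picked only because it stays within (0, 1].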

Finally, the overall similarity is calculated from the first similarity and the second similarity: $sim(G_i, G_j) = \alpha \times simA(G_i, G_j) + \beta \times simC(G_i, G_j)$, where $\alpha$ and $\beta$ are the weights of the first similarity and the second similarity, respectively, in the overall similarity.
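The combination, and the pairwise table it produces for a set of cases, can be sketched as follows. The default weights of 0.5 and the helper names are hypothetical; the text does not give values for α and β.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def overall_similarity(sim_a: float, sim_c: float,
                       alpha: float = 0.5, beta: float = 0.5) -> float:
    """sim = alpha * simA + beta * simC (the weight values are placeholders)."""
    return alpha * sim_a + beta * sim_c


def pairwise_matrix(cases: Sequence[T],
                    sim: Callable[[T, T], float]) -> List[List[float]]:
    """Symmetric matrix of similarities between every two cases."""
    n = len(cases)
    matrix = [[1.0] * n for _ in range(n)]  # a case is fully similar to itself
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = sim(cases[i], cases[j])
    return matrix


combined = overall_similarity(0.75, 0.25, alpha=0.6, beta=0.4)
# Toy "cases" are plain numbers here, compared with a stand-in similarity.
table = pairwise_matrix([1.0, 2.0, 4.0], lambda a, b: 1.0 / (1.0 + abs(a - b)))
```

Computing each pair once and mirroring it keeps the table symmetric, which the clustering step relies on.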

In step S103, all cases are clustered using the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm, a density-based spatial clustering algorithm that divides regions of sufficient density into clusters, finds clusters of arbitrary shape in a noisy spatial database, and defines a cluster as the maximal set of density-connected points. The aim is to assign every case to a case cluster or mark it as noise; in this specification, an unprocessed case is a case that has neither been assigned to any case cluster nor been marked as noise.

First, all cases are placed in a total case set Z; initially, all cases in the total case set Z are unprocessed. A search radius e and a minimum number minPts are determined, where e is the first threshold and minPts is the second threshold.

Then, an initial case is randomly selected from the total case set Z and an e-neighborhood density test is performed on it: all cases whose overall similarity to the initial case is smaller than the first threshold are searched for; when the number of cases found is greater than or equal to the second threshold, a case cluster C1 is established with the initial case as its core and the cases found are added to a candidate set N1; when the number of cases found is less than the second threshold, the initial case is marked as noise. That is, if the density of similar cases around the initial case is large enough, a case cluster can be established with it as the core case; otherwise, it is regarded as noise.

Then, for each unprocessed case in the candidate set N1, the above-mentioned e-neighborhood density test is performed, the selected case is added into the case cluster C1, then all cases whose overall similarity to the case is smaller than the first threshold are continuously searched in the total case set Z, when the number of searched cases is greater than or equal to the second threshold, the searched cases are continuously added into the candidate set N1, at this time, the candidate set N1 is continuously expanded, as shown in fig. 3. This step is repeated until there are no unprocessed cases in the candidate set N1, at which point the case cluster C1 is established.

Then, randomly selecting a second unprocessed case from the total case set Z, continuing the steps, searching all cases with the overall similarity smaller than the first threshold value in the total case set Z, establishing a case cluster C2 by taking the case as a core when the number of the searched cases is larger than or equal to the second threshold value, and adding the searched cases into the candidate set N2 of the case cluster C2; when the number of searched cases is less than a second threshold, the case is marked as noise.

For each unprocessed case in the candidate set N2, adding the case into the case cluster C2, searching all cases with overall similarity smaller than a first threshold value in the total case set Z, and when the number of searched cases is larger than or equal to a second threshold value, continuing to add the searched cases into the candidate set N2; this step is repeated until there are no unprocessed cases in the candidate set N2.

And repeating the steps until no unprocessed case exists in the case total set Z.
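The procedure above can be sketched in Python as follows. Several points are assumptions of the sketch rather than statements of the text: the compared quantity is supplied as a distance-like matrix `dist` (for example one minus the overall similarity), so that "smaller than the first threshold" means "close"; cases are visited in index order instead of at random so the result is reproducible; a case is not counted as its own neighbor; and, following the text literally, only unprocessed cases in a candidate set join the cluster, so a case already marked as noise is not re-absorbed.

```python
from typing import List, Optional, Sequence

NOISE = -1


def cluster_cases(dist: Sequence[Sequence[float]],
                  eps: float, min_pts: int) -> List[Optional[int]]:
    """Label each case with a cluster id, or NOISE, from a pairwise matrix."""
    n = len(dist)
    labels: List[Optional[int]] = [None] * n  # None means "unprocessed"
    cluster_id = 0

    def neighbours(p: int) -> List[int]:
        return [q for q in range(n) if q != p and dist[p][q] < eps]

    for p in range(n):                      # S1, repeated as in S4
        if labels[p] is not None:
            continue
        found = neighbours(p)
        if len(found) < min_pts:
            labels[p] = NOISE
            continue
        labels[p] = cluster_id              # p becomes the core of a new cluster
        candidates = list(found)            # candidate set of this cluster
        while candidates:                   # S2, repeated as in S3
            q = candidates.pop()
            if labels[q] is not None:
                continue                    # only unprocessed cases are added
            labels[q] = cluster_id
            found_q = neighbours(q)
            if len(found_q) >= min_pts:
                candidates.extend(found_q)  # the candidate set keeps expanding
        cluster_id += 1
    return labels


# Five toy cases on a line; the distance is the absolute difference.
points = [0.0, 1.0, 2.0, 10.0, 20.0]
toy_dist = [[abs(a - b) for b in points] for a in points]
toy_labels = cluster_cases(toy_dist, eps=1.5, min_pts=1)
```

Here the first three cases chain into one cluster through their mutual neighbors, while the two isolated cases fail the density test and are marked as noise.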

In step S104, according to the clustering result in step S103, the other cases in the case cluster where the target case is located are similar cases of the target case, and can be pushed.
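Selecting the cases to push from a clustering result might look like the sketch below. The label convention (an integer cluster id per case, `-1` for noise) and the `max_results` cap are assumptions of the sketch; the text only requires that at least one case from the target's cluster be pushed.

```python
from typing import List, Optional, Sequence


def similar_cases(labels: Sequence[Optional[int]], target_index: int,
                  max_results: Optional[int] = None) -> List[int]:
    """Indices of the other cases in the target case's cluster."""
    label = labels[target_index]
    if label is None or label == -1:
        return []  # a noise or unprocessed target has no cluster to draw from
    peers = [i for i, other in enumerate(labels)
             if other == label and i != target_index]
    return peers if max_results is None else peers[:max_results]


cluster_labels = [0, 0, 0, -1, 1, 1]
to_push = similar_cases(cluster_labels, target_index=0)
```

A fuller version could rank the peers by their overall similarity to the target before applying the cap.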

Fig. 4 is a flowchart of the safety production accident case pushing method according to the second embodiment of the present invention. Referring to Fig. 4, the method includes:

S201, acquiring historical safety production accidents, and analyzing the data information of the historical safety production accident cases to acquire attribute text data and numerical data of the historical safety production accident cases, so as to establish a case library.

S202, receiving a target case or target case query information input by a user.

S203, respectively calculating a first similarity between the attribute text data of each two cases and a second similarity between the numerical data, and calculating the overall similarity between each two cases according to the first similarity and the second similarity of each two cases.

S204, clustering the multiple cases according to the overall similarity between every two cases to generate multiple case clusters.

S205, at least one case in the case cluster where the target case is located is pushed.

In step S201, the data information of the historical safety production accident cases is analyzed, and the acquired attribute text data and numerical data of the historical safety production accident cases are stored in a case library to speed up calculation.

In step S202, target case query information input by a user is received, and when the target case is stored in the case base, the subsequent similarity calculation and clustering steps are directly performed. When the target case is not stored in the case base, the user inputs the information of the target case, or the information of the target case is automatically searched from the network, and then the information of the target case is analyzed to respectively obtain the attribute text data and the numerical data of the target case and store the attribute text data and the numerical data in the case base.
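The lookup-or-parse flow of step S202 can be sketched as follows. The dictionary-backed case base and the `fetch_and_parse` callback are hypothetical stand-ins for whatever storage and retrieval mechanism an implementation uses.

```python
from typing import Callable, Dict, List


def resolve_target(case_base: Dict[str, dict], case_id: str,
                   fetch_and_parse: Callable[[str], dict]) -> dict:
    """Return the stored case, parsing and storing it first if it is missing."""
    if case_id not in case_base:
        # Not stored yet: obtain the information (user input or network
        # search), parse it into the two data types, and store the result.
        case_base[case_id] = fetch_and_parse(case_id)
    return case_base[case_id]


calls: List[str] = []


def fake_fetch(case_id: str) -> dict:
    """Stand-in parser that records which cases had to be fetched."""
    calls.append(case_id)
    return {"attribute_text": ["fire"], "numerical": [1.0]}


base = {"known": {"attribute_text": ["collapse"], "numerical": [2.0]}}
hit = resolve_target(base, "known", fake_fetch)
miss = resolve_target(base, "new", fake_fetch)
```

Only the missing case triggers a fetch, and it is stored so that later queries for it go straight to similarity calculation and clustering.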

In step S203, the overall similarity between every two cases in the case base is calculated, and in step S204 clustering is performed; for the similarity calculation and clustering, refer to the description of the first embodiment, which is not repeated here.

Based on the same inventive concept, an embodiment of the invention also provides a safety production accident case pushing system, as described in the following embodiments. Because the principle by which the system solves the problem is similar to that of the safety production accident case pushing method, the implementation of the system can refer to the implementation of the method, and repeated parts are not described again. As used hereinafter, the terms "unit", "sub-module" or "module" may refer to a combination of software and/or hardware that implements predetermined functions. Although the functional modules of the mobile terminal described in the following embodiments are preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

Fig. 5 is a functional module schematic diagram of a safety production accident case pushing system according to a first embodiment of the present invention. The safety production accident case pushing system 100 of the present embodiment includes: the system comprises a case analysis module 10, a similarity calculation module 20, a clustering module 30 and a pushing module 40, wherein the case analysis module 10 is used for analyzing data information of safety production accident cases to obtain attribute text data and numerical data of the safety production accident cases; the similarity calculation module 20 is configured to calculate a first similarity between the attribute text data of each two cases and a second similarity between the numerical data, and calculate an overall similarity between each two cases according to the first similarity and the second similarity of each two cases; the clustering module 30 is configured to cluster the multiple cases according to the overall similarity between every two cases to generate multiple case clusters; the pushing module 40 is configured to receive the target cases, obtain the case cluster where the target cases are located, and select at least one case in the case cluster where the target cases are located to push.

Fig. 6 is a functional module schematic diagram of a safety production accident case pushing system according to a second embodiment of the present invention. This embodiment further includes, on the basis of the first embodiment shown in fig. 5: the system comprises a case base 50 and an interaction module 60, wherein the case base 50 is used for storing attribute text data and numerical data of historical safety production accident cases; the interaction module 60 is configured to receive target case query information input by a user.

Fig. 7 is a schematic diagram of hardware modules of a safety production accident case pushing system according to a third embodiment of the present invention. The system 100 may include: a processor 1001, such as a CPU, a communication bus 1002, and a memory 1003.

Wherein a communication bus 1002 is used to enable connective communication between these components. The memory 1003 may be a high-speed RAM memory or a non-volatile memory (e.g., a disk memory). The memory 1003 may alternatively be a storage device separate from the processor 1001.

Those skilled in the art will appreciate that the terminal structure shown in fig. 7 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.

As shown in Fig. 7, the memory 1003, which is a kind of computer storage medium, may include an operating system and a safety production accident case push processing program. The processor 1001 may be configured to call the safety production accident case push processing program stored in the memory 1003 and execute the operation steps of the safety production accident case pushing method.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A safety production accident case pushing method is characterized by comprising the following steps:

receiving a target case, acquiring a case cluster where the target case is located, and selecting at least one case in the case cluster where the target case is located to push;

wherein the calculating of the first similarity between the attribute text data of each two cases comprises:

calculating, based on WordNet semantic concept tree distance, the similarity $simA_k(G_i, G_j)$ of each item of attribute text of case $G_i$ and case $G_j$,

wherein $depth(G_{i,k})$ is the depth of the k-th item of attribute text of case $G_i$ in the semantic concept tree, $depth(G_{j,k})$ is the depth of the k-th item of attribute text of case $G_j$ in the semantic concept tree, $depth(lso(G_{i,k}, G_{j,k}))$ is the depth in the semantic concept tree of the nearest common ancestor of the k-th item of attribute text of case $G_i$ and the k-th item of attribute text of case $G_j$, $k = 1, 2, \ldots, n$, and n is the number of items of attribute text in the attribute text data;

2. The safety production accident case pushing method of claim 1, wherein the calculating of the second similarity between the numerical data of every two cases comprises:

calculating, based on Euclidean distance, the second similarity $simC(G_i, G_j)$ of the numerical data of case $G_i$ and case $G_j$,

wherein the calculation uses the value of the y-th item of numerical data of case $G_i$,

3. The safety production accident case pushing method of claim 2, wherein the step of calculating the overall similarity between every two cases according to the first similarity and the second similarity of every two cases comprises:

overall similarity $sim(G_i, G_j) = \alpha \times simA(G_i, G_j) + \beta \times simC(G_i, G_j)$,

4. The safety production accident case pushing method of claim 1, wherein the step of clustering a plurality of cases according to the overall similarity between every two cases to generate a plurality of case clusters comprises:

5. The safety production accident-case pushing method of claim 1, further comprising:

6. The safety production accident case pushing method according to claim 1, wherein the attribute text data includes an accident type, an accident description, an accident cause, an accident location and an accountability situation; the numerical data includes an accident time, an accident loss and a penalty situation.

7. A safety production accident case pushing system is characterized by comprising:

the pushing module is used for receiving the target cases, acquiring the case clusters where the target cases are located, and selecting at least one case in the case clusters where the target cases are located to push;

8. The safety production accident case pushing system of claim 7, further comprising:

9. The safety production accident case pushing system of claim 8, further comprising: