CN113221556A - Method, device and equipment for identifying potential safety hazard - Google Patents

Method, device and equipment for identifying potential safety hazard

Info

Publication number
CN113221556A
Authority
CN
China
Prior art keywords
potential safety
hazard
text
safety hazard
production
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110495872.XA
Other languages
Chinese (zh)
Inventor
康庆
王恒俭
周国龙
彭道发
万鹏
杨蒙威
王越越
张升
李超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Bowo Wisdom Technology Co ltd
Original Assignee
Shenzhen Bowo Wisdom Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Bowo Wisdom Technology Co ltd
Priority to CN202110495872.XA
Publication of CN113221556A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method, a device and equipment for identifying potential safety production hazards comprise the steps of obtaining a potential safety production hazard text to be identified; performing word segmentation processing on the potential safety production hazard text and extracting text characteristic data; inputting text characteristic data into a pre-trained potential safety production hazard identification model, identifying the potential safety production hazard text to obtain category information of potential safety hazard objects and potential safety hazard behavior states, and training the potential safety production hazard identification model on the basis of a plurality of training samples marked with the categories of the potential safety hazard objects and the potential safety hazard behavior states; and further determining the content of the potential safety hazard. Effective information of the potential safety production hazard text can be obtained through the text characteristic data; through the potential safety hazard identification model, the category information of the potential safety hazard objects and the potential safety hazard behavior states can be accurately obtained, and the potential safety hazard contents can be directly displayed through the combination form of the potential safety hazard objects and the potential safety hazard behavior states, so that the identification accuracy of the potential safety hazard contents is improved.

Description

Method, device and equipment for identifying potential safety hazard
Technical Field
The invention relates to the technical field of machine learning, in particular to a method, a device and equipment for identifying potential safety production hazards.
Background
Safe production is the foundation of social and economic development, and the state relies on a series of effective measures to ensure that every enterprise carries out production safely, stably and in an orderly manner. However, the production and operation activities of many enterprises involve safety production problems, such as dangerous states, unsafe behaviors of employees and defects in management, which may cause safety accidents and which the supervision department needs to inspect. The supervision department has many inspection items, and supervision departments in different industries emphasize different points. When potential safety production hazards are reported, different enterprises follow different entry specifications, the same hidden danger content is described in different ways, one text may contain several types of hidden dangers, staff have different entry habits, and a large number of phrases are non-standard. As a result, the supervision department has great difficulty in analyzing the trend of hidden dangers, identifying the areas where hidden dangers are concentrated, and classifying hidden danger objects.
At present, when hidden dangers are analyzed, a text containing potential safety production hazards can be decomposed into phrases through a word segmentation technique, and the resulting phrases are then aggregated and displayed. However, because hidden danger content involves many professional terms, the word segmentation result contains too many invalid phrases, such as 'organization inspection' and 'law enforcement officers', as well as phrases that cannot by themselves show what the hidden danger problem is, such as 'fire extinguishers', 'employees' and 'device platforms'. This approach therefore still has problems in judging the type of hidden danger, reflecting concentrated problems and revealing the underlying patterns of hidden dangers, and it cannot accurately identify the content of potential safety production hazards.
Disclosure of Invention
The embodiment of the invention provides a method, a device and equipment for identifying potential safety hazard, which are used for improving the identification accuracy of the content of the potential safety hazard.
According to a first aspect, an embodiment provides a method for identifying a potential safety hazard, the method comprising:
acquiring a potential safety hazard text to be identified;
performing word segmentation processing on the potential safety production hazard text to be identified, and extracting text characteristic data of the potential safety production hazard text to be identified;
inputting the text characteristic data of the potential safety production hazard text to be identified into a pre-trained potential safety production hazard identification model, and carrying out classification and identification on the potential safety production hazard text to be identified to obtain category information of potential safety hazard objects and potential safety hazard behavior states, wherein the potential safety production hazard identification model is obtained by training on the basis of a plurality of training samples marked with the categories of the potential safety hazard objects and the potential safety hazard behavior states;
and determining the content of the potential safety hazard in production according to the category information of the potential safety hazard object and the behavior state of the potential safety hazard.
Optionally, the text of the potential safety hazard to be identified includes: enterprise name, enterprise address, region to which the enterprise belongs, enterprise industry type, discovery date of the potential safety production hazard, accepting unit of the potential safety production hazard, level of the potential safety production hazard, and potential safety hazard description text.
Optionally, the safety production hidden danger identification model is obtained by training based on a support vector machine classification algorithm model, a convolutional neural network classification algorithm model or a random forest algorithm model.
Optionally, the text feature data is one of the following: term frequency-inverse document frequency (TF-IDF), word vector, sentence vector.
Optionally, the method further includes:
and according to the plurality of potential safety production hazard texts, carrying out statistical analysis on the potential safety hazard information, and respectively extracting potential safety hazard objects and category information of the potential safety hazard behavior state corresponding to each aspect of people, machines, materials, management and environment.
Optionally, the categories of the hidden danger objects and the hidden danger behavior states in the training samples are labeled based on the extracted category information.
Optionally, the method further includes:
and determining hidden danger rectification information of enterprises with hidden danger of safety production according to the hidden danger content of safety production.
According to a second aspect, an embodiment provides an identification apparatus for a potential safety hazard, the apparatus comprising:
the first acquisition module is used for acquiring a potential safety hazard text to be identified;
the second acquisition module is used for performing word segmentation processing on the potential safety production hazard text to be identified and extracting text characteristic data of the potential safety production hazard text to be identified;
the identification module is used for inputting the text characteristic data of the potential safety production hazard text to be identified into a pre-trained potential safety production hazard identification model, classifying and identifying the potential safety production hazard text to be identified to obtain category information of a potential hazard object and a potential hazard behavior state, wherein the potential safety production hazard identification model is obtained by training based on a plurality of training samples marked with categories of the potential hazard object and the potential hazard behavior state;
and the first determining module is used for determining the content of the potential safety hazard in production according to the category information of the potential safety hazard object and the potential safety hazard behavior state.
Optionally, the text of the potential safety hazard to be identified includes: enterprise name, enterprise address, region to which the enterprise belongs, enterprise industry type, discovery date of the potential safety production hazard, accepting unit of the potential safety production hazard, level of the potential safety production hazard, and potential safety hazard description text.
Optionally, the safety production hidden danger identification model is obtained by training based on a support vector machine classification algorithm model, a convolutional neural network classification algorithm model or a random forest algorithm model.
Optionally, the text feature data is one of the following: term frequency-inverse document frequency (TF-IDF), word vector, sentence vector.
Optionally, the apparatus further comprises:
and the third acquisition module is used for carrying out statistical analysis on the potential safety hazard information according to the plurality of potential safety production hazard texts and respectively extracting the category information of the potential safety hazard objects and the potential safety hazard behavior states corresponding to each aspect of people, machines, materials, management and environment.
Optionally, the categories of the hidden danger objects and the hidden danger behavior states in the training samples are labeled based on the extracted category information.
Optionally, the apparatus further comprises:
and the second determining module is used for determining hidden danger rectification information of an enterprise with hidden danger of safety production according to the hidden danger content of safety production.
According to a third aspect, there is provided in one embodiment an electronic device comprising: a memory for storing a program; a processor, configured to execute the program stored in the memory to implement the method for identifying a potential safety hazard according to any one of the first aspect.
According to a fourth aspect, an embodiment provides a computer-readable storage medium, on which a program is stored, the program being executable by a processor to implement the method for identifying a potential safety hazard of the first aspect.
The embodiment of the invention provides a method, a device and equipment for identifying potential safety hazard, which comprises the following steps: acquiring a potential safety hazard text to be identified; performing word segmentation processing on the potential safety production hazard text to be identified, and extracting text characteristic data of the potential safety production hazard text to be identified; inputting text characteristic data of a potential safety production hazard text to be recognized into a pre-trained potential safety production hazard recognition model, and classifying and recognizing the potential safety production hazard text to be recognized to obtain category information of a potential safety hazard object and a potential safety hazard behavior state, wherein the potential safety production hazard recognition model is obtained by training on the basis of a plurality of training samples marked with categories of the potential safety hazard object and the potential safety hazard behavior state; and determining the content of the potential safety hazard in production according to the category information of the potential safety hazard object and the behavior state of the potential safety hazard. 
By extracting the text characteristic data of the potential safety production hazard text to be identified, effective information of the potential safety production hazard text to be identified can be obtained; the potential safety production hazard identification model is obtained by training based on a plurality of training samples marked with the categories of the potential safety production hazard objects and the potential safety hazard behavior states, so that the potential safety production hazard texts to be identified are classified and identified through the potential safety production hazard identification model, the category information of the potential safety production hazard objects and the potential safety hazard behavior states can be accurately obtained, the potential safety production hazards existing in the potential safety production hazard objects can be directly displayed through the combination form of the category information of the potential safety production hazard objects and the category information of the potential safety hazard behavior states, and the identification accuracy of the potential safety production hazard contents is improved.
Drawings
Fig. 1 is a schematic flowchart of a first embodiment of a method for identifying a potential safety hazard according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of a second embodiment of a method for identifying a potential safety hazard according to an embodiment of the present invention;
fig. 3 is a schematic flowchart of a third embodiment of a method for identifying a potential safety hazard according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an identification apparatus for potential safety hazard according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following detailed description and accompanying drawings. Wherein like elements in different embodiments are numbered with like associated elements. In the following description, numerous details are set forth in order to provide a better understanding of the present application. However, those skilled in the art will readily recognize that some of the features may be omitted or replaced with other elements, materials, methods in different instances. In some instances, certain operations related to the present application have not been shown or described in detail in order to avoid obscuring the core of the present application from excessive description, and it is not necessary for those skilled in the art to describe these operations in detail, so that they may be fully understood from the description in the specification and the general knowledge in the art.
Furthermore, the features, operations, or characteristics described in the specification may be combined in any suitable manner to form various embodiments. Also, the various steps or actions in the method descriptions may be interchanged or reordered, as will be apparent to one of ordinary skill in the art. Thus, the various sequences in the specification and drawings are for the purpose of describing certain embodiments only and are not intended to imply a required sequence, unless it is otherwise indicated that such a sequence must be followed.
The numbering of the components as such, e.g., "first", "second", etc., is used herein only to distinguish the objects as described, and does not have any sequential or technical meaning. The term "connected" and "coupled" when used in this application, unless otherwise indicated, includes both direct and indirect connections (couplings).
At present, when hidden dangers are analyzed, a text containing potential safety production hazards can be decomposed into phrases through a word segmentation technique, and the resulting phrases are then aggregated and displayed. However, because hidden danger content involves many professional terms, the word segmentation result contains too many invalid phrases, such as 'organization inspection' and 'law enforcement officers', as well as phrases that cannot by themselves show what the hidden danger problem is, such as 'fire extinguishers', 'employees' and 'device platforms'. This approach therefore still has problems in judging the type of hidden danger, reflecting concentrated problems and revealing the underlying patterns of hidden dangers, and it cannot accurately identify the content of potential safety production hazards. In order to improve the identification accuracy of the content of the potential safety hazard, embodiments of the present invention provide a method, an apparatus, and a device for identifying the potential safety hazard, which are described in detail below.
Fig. 1 is a schematic flowchart of a first embodiment of a method for identifying a potential safety hazard according to an embodiment of the present invention, where an execution subject of the embodiment of the present invention is any device with processing capability. As shown in fig. 1, the method for identifying a potential safety hazard provided by this embodiment may include:
S101, acquiring a potential safety hazard text to be identified.
In specific implementation, the text of the potential safety hazard to be identified may include a potential safety hazard description text, that is, a text that mainly describes the content of the potential safety hazard, such as 'the number of fire extinguishers in the workshop is insufficient, and a potential safety hazard exists' or 'the number of fire extinguishers in the office is insufficient'.
S102, performing word segmentation on the potential safety production hazard text to be identified, and extracting text characteristic data of the potential safety production hazard text to be identified.
In specific implementation, word segmentation can be performed on the potential safety production hazard text to be identified by an existing word segmentation algorithm. For example, if the potential safety production hazard text is "the number of fire extinguishers in the workshop is not enough, and there is a safety hidden trouble", the result obtained after word segmentation is "the | number | of fire extinguishers | in the workshop | is not enough |, | there is | safety | hidden trouble".
Optionally, the extracted text feature data of the potential safety hazard text to be identified may be one of the following: TF-IDF (Term Frequency-Inverse Document Frequency), word vector, sentence vector.
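As an illustration of the TF-IDF option, the following is a minimal sketch that assumes the hazard texts have already been segmented into tokens. The function name `tf_idf`, the sample tokens, and the unsmoothed formula (term frequency multiplied by the logarithm of the inverse document frequency) are assumptions made for this example; library implementations often apply smoothing and normalization that are omitted here.

```python
import math
from collections import Counter


def tf_idf(segmented_docs):
    """Return one {term: weight} dict per segmented document.

    tf  = count of the term in the document / document length
    idf = log(number of documents / number of documents containing the term)
    """
    n_docs = len(segmented_docs)
    doc_freq = Counter()
    for tokens in segmented_docs:
        doc_freq.update(set(tokens))  # count each term once per document

    weights = []
    for tokens in segmented_docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical hazard texts, already split into tokens by a word segmenter
docs = [
    ["workshop", "fire extinguisher", "number", "insufficient"],
    ["office", "fire extinguisher", "number", "insufficient"],
    ["employee", "illegal", "charging"],
]
features = tf_idf(docs)
```

Terms that appear in few texts, such as "workshop", receive a higher weight than terms shared by several texts, such as "fire extinguisher".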
S103, inputting the text characteristic data of the potential safety hazard text to be recognized into a pre-trained potential safety hazard recognition model, and classifying and recognizing the potential safety hazard text to be recognized to obtain the category information of the potential safety hazard object and the behavior state of the potential safety hazard.
The safety production hidden danger identification model can be obtained by training based on a plurality of training samples marked with hidden danger objects and hidden danger behavior states. Optionally, the safety production hidden danger identification model may be obtained by training based on a support vector machine classification algorithm model, a convolutional neural network classification algorithm model, or a random forest algorithm model.
Based on the business and the hidden danger content, the hidden danger content is split into a hidden danger object and a hidden danger behavior state, for example 'employee' + 'illegal charging' or 'fire extinguisher' + 'insufficient equipment'. Hidden danger objects and hidden danger behavior states are constructed from five aspects, namely people, machines, materials, management and environment, to form a complete label system of hidden danger objects and hidden danger behavior states. By combining several hidden danger labels, the problems of a hidden danger object can be obtained intuitively. During specific implementation, the potential safety hazard information can be statistically analyzed according to a plurality of potential safety production hazard texts, and the category information of the hidden danger objects and hidden danger behavior states corresponding to each aspect of people, machines, materials, management and environment is extracted respectively.
For example, for the hidden danger factor of "person", the hidden danger objects obtained by statistics may include: ordinary staff, drivers, security personnel and management personnel, and the hidden danger behavior states corresponding to this factor may include: insufficient skill, operation in violation of regulations, on duty without a certificate, and the like; the hidden danger behavior states corresponding to the hidden danger factor of "machine" may include: blocked, facility blocked, out of specification setup, design not meeting industry standards, insufficient equipment, unscheduled inspections, and the like; the hidden danger behavior states corresponding to the hidden danger factor of "material" may include: non-standard transportation, not stored according to regulations, no anti-toppling measures, no protective measures, insufficient safety distance, and the like; the hidden danger behavior states corresponding to the hidden danger factor of "management" may include: missing, non-normative, non-informed, non-rehearsed, and the like; the hidden danger behavior states corresponding to the hidden danger factor of "environment" may include: damage, blockage, unclear layout, and the like.
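The label system described above could be held in a simple lookup structure. The sketch below is only one possible representation: the names `LABEL_SYSTEM` and `combine_label` are hypothetical, the label strings paraphrase the translated examples, and the empty object lists are placeholders for factors whose objects the text does not enumerate.

```python
# Label system sketched from the categories listed in the text.
# Objects are enumerated in the text only for "person"; the empty lists
# for the other factors are placeholders (assumption of this example).
LABEL_SYSTEM = {
    "person": {
        "objects": ["ordinary staff", "driver", "security personnel",
                    "management personnel"],
        "states": ["insufficient skill",
                   "operation in violation of regulations",
                   "on duty without certificate"],
    },
    "machine": {
        "objects": [],
        "states": ["blocked", "facility blocked", "out of specification setup",
                   "design not meeting industry standards",
                   "insufficient equipment", "unscheduled inspections"],
    },
    "material": {
        "objects": [],
        "states": ["non-standard transportation",
                   "not stored according to regulations",
                   "no anti-toppling measures", "no protective measures",
                   "insufficient safety distance"],
    },
    "management": {
        "objects": [],
        "states": ["missing", "non-normative", "non-informed", "non-rehearsed"],
    },
    "environment": {
        "objects": [],
        "states": ["damage", "blockage", "unclear layout"],
    },
}


def combine_label(factor, hazard_object, behavior_state):
    """Validate a label pair against the label system and combine it."""
    entry = LABEL_SYSTEM[factor]
    # Only check the object when the factor has an enumerated object list.
    if entry["objects"] and hazard_object not in entry["objects"]:
        raise ValueError(f"unknown hazard object for {factor}: {hazard_object}")
    if behavior_state not in entry["states"]:
        raise ValueError(f"unknown behavior state for {factor}: {behavior_state}")
    return f"{hazard_object} + {behavior_state}"


label = combine_label("machine", "fire extinguisher", "insufficient equipment")
```

Checking labels against a fixed structure of this kind is one way to keep manually assigned labels consistent with the label system.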
And the categories of the hidden danger objects and the hidden danger behavior states in the training samples can be labeled based on the extracted category information. For example, the method can organize manual labeling and cross inspection of the screened sample set aiming at hidden danger objects of different hidden danger factors (human, machine, material, management and environment), so as to audit the difficult and complicated hidden danger text labels, ensure the label accuracy of the hidden danger texts, and take the sample set manually labeled with the hidden danger objects and the classes of hidden danger behavior states as training samples of the safety production hidden danger identification model.
S104, determining the content of the potential safety hazard in production according to the category information of the potential safety hazard objects and the behavior states of the potential safety hazard.
In specific implementation, assume that the output of the safety production hidden danger identification model is (x, y), where x represents the category of the hidden danger object and y represents the category of the hidden danger behavior state. x may take the values 1, 2, 3, 4 and 5, where '1' denotes the hidden danger object 'person', '2' denotes 'machine', '3' denotes 'material', '4' denotes 'management', and '5' denotes 'environment'. Assuming that x is 1, y may take the values a, b, c and d, where 'a' denotes the hidden danger behavior state 'on duty without wearing protective articles', 'b' denotes 'operation in violation of regulations', 'c' denotes 'insufficient number of personal protective articles', and 'd' denotes 'production work management not in place'. Assuming that the output of the safety production hidden danger identification model is (1, b), the content of the potential safety hazard in production can be determined as: person + operation in violation of regulations.
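The decoding of a model output (x, y) into hazard content can be sketched as a pair of lookups. The dictionary and function names are hypothetical, the label strings paraphrase the translated text, and behavior-state codes are filled in only for x = 1 because the text gives no codes for the other object categories.

```python
# Object categories keyed by the code x (from the example in the text)
OBJECT_CATEGORIES = {
    1: "person",
    2: "machine",
    3: "material",
    4: "management",
    5: "environment",
}

# Behavior-state categories keyed by x, then by the code y.
# Only x = 1 is listed in the text; other entries are intentionally absent.
STATE_CATEGORIES = {
    1: {
        "a": "on duty without wearing protective articles",
        "b": "operation in violation of regulations",
        "c": "insufficient number of personal protective articles",
        "d": "production work management not in place",
    },
}


def hazard_content(x, y):
    """Combine the object category and the behavior-state category."""
    return f"{OBJECT_CATEGORIES[x]} + {STATE_CATEGORIES[x][y]}"


content = hazard_content(1, "b")
```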
The identification method of the potential safety production hazard provided by the embodiment of the invention comprises the steps of obtaining a potential safety production hazard text to be identified; performing word segmentation processing on the potential safety production hazard text to be identified, and extracting text characteristic data of the potential safety production hazard text to be identified; inputting text characteristic data of a potential safety production hazard text to be recognized into a pre-trained potential safety production hazard recognition model, and classifying and recognizing the potential safety production hazard text to be recognized to obtain category information of a potential safety hazard object and a potential safety hazard behavior state, wherein the potential safety production hazard recognition model is obtained by training on the basis of a plurality of training samples marked with categories of the potential safety hazard object and the potential safety hazard behavior state; and determining the content of the potential safety hazard in production according to the category information of the potential safety hazard object and the behavior state of the potential safety hazard. 
By extracting the text characteristic data of the potential safety production hazard text to be identified, effective information of the potential safety production hazard text to be identified can be obtained; the potential safety production hazard identification model is obtained by training based on a plurality of training samples marked with the categories of the potential safety production hazard objects and the potential safety hazard behavior states, so that the potential safety production hazard texts to be identified are classified and identified through the potential safety production hazard identification model, the category information of the potential safety production hazard objects and the potential safety hazard behavior states can be accurately obtained, the potential safety production hazards existing in the potential safety production hazard objects can be directly displayed through the combination form of the category information of the potential safety production hazard objects and the category information of the potential safety hazard behavior states, and the identification accuracy of the potential safety production hazard contents is improved.
As an implementation manner, according to the region to which the enterprise belongs, which is included in the potential safety production hazard text, the potential safety hazard contents of all enterprises with potential safety production hazards in the same region can be obtained through statistics; or, the hidden danger contents of all enterprises with potential safety production hazards in the same industry type can be obtained through statistics according to the industry type of the enterprise included in the potential safety production hazard text; or, the hidden danger contents of all enterprises with hidden safety hazards on the same discovery date can be obtained through statistics according to the discovery date of the hidden safety hazards included in the text of the hidden safety hazards. Therefore, the method is beneficial to mining the occurrence rule, the occurrence industry, key point investigation and the like of the potential safety hazard, and can be used for knowing the potential safety hazard distribution, characteristics, trend and the like of the enterprise, so that the situation perception of the potential safety hazard of the enterprise is realized, a supervision department can be assisted to accurately enforce law, and the method is used for specially treating the specific potential safety hazard, daily inspection of the enterprise with the key potential safety hazard and seasonal potential safety hazard inspection.
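The statistics by region, industry type or discovery date amount to grouping the identified hazard contents by one field of the hazard text. The sketch below assumes each record is a dictionary; the field names, the function name `count_hazards_by` and the sample records are all hypothetical.

```python
from collections import Counter, defaultdict


def count_hazards_by(records, field):
    """Count identified hazard contents for each value of the given field."""
    grouped = defaultdict(Counter)
    for record in records:
        grouped[record[field]][record["hazard_content"]] += 1
    return grouped


# Hypothetical records: metadata from the hazard text plus the identified content
records = [
    {"region": "District A", "industry": "chemical", "discovery_date": "2021-03-01",
     "hazard_content": "fire extinguisher + insufficient equipment"},
    {"region": "District A", "industry": "logistics", "discovery_date": "2021-03-01",
     "hazard_content": "fire extinguisher + insufficient equipment"},
    {"region": "District B", "industry": "chemical", "discovery_date": "2021-03-02",
     "hazard_content": "employee + illegal charging"},
]

by_region = count_hazards_by(records, "region")
by_industry = count_hazards_by(records, "industry")
by_date = count_hazards_by(records, "discovery_date")
```

The same helper serves all three groupings, so adding another dimension only requires another field in the records.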
In the embodiment of the invention, the case in which the safety production hidden danger identification model is trained based on the random forest algorithm model is taken as an example. In this case, the training process of the safety production hidden danger identification model may include the following steps:
Step a: obtaining a plurality of training samples, wherein each training sample comprises preset category data of the hidden danger object and the hidden danger behavior state, together with text characteristic data.
The preset category data can be used for identifying the category of the hidden danger behavior state of the hidden danger object, and the text feature data can be TF-IDF.
Step b: inputting the training samples into an initial random forest model to obtain a plurality of category prediction data.
Wherein, the initial random forest model may include a plurality of decision trees.
Step c: calculating index parameters of the initial random forest model according to the plurality of category prediction data and the plurality of preset category data.
In a specific implementation, the index parameters of the initial random forest model may include accuracy, precision, recall and F value, wherein the accuracy is the ratio of the number of correctly predicted samples to the total number of samples; the precision is the ratio of the number of samples correctly predicted as a first category to the number of all samples predicted as the first category; the recall is the ratio of the number of samples correctly predicted as the first category to the number of samples actually belonging to the first category; and the F value is the harmonic mean of the precision and the recall.
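The four index parameters can be written out directly from their definitions. The sketch below is illustrative only: the function name, the label strings and the sample predictions are assumptions, and the F value is taken as the harmonic mean of precision and recall.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall and F value for one category (`positive`)."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    predicted_pos = sum(p == positive for p in y_pred)
    actual_pos = sum(t == positive for t in y_true)

    accuracy = correct / len(y_true)
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    f_value = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return accuracy, precision, recall, f_value


# Illustrative preset categories and category predictions.
y_true = ["person", "person", "machine", "machine", "person"]
y_pred = ["person", "machine", "machine", "machine", "person"]
accuracy, precision, recall, f_value = classification_metrics(
    y_true, y_pred, positive="person")
```

With these sample values, four of five predictions are correct, both samples predicted as "person" are truly "person", and two of the three actual "person" samples are found.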
Step d: optimizing the initial random forest model according to the index parameters to obtain the safety production hidden danger identification model.
In a specific implementation, the optimization ranges of the model parameters can be preset. For example, the value ranges of the number of classifiers, the maximum depth of a decision tree and the minimum number of samples of a leaf node in the random forest model are 50-200, 5-50 and 1-10, respectively. The initial random forest model is then optimized with a grid search algorithm, that is, all model parameter combinations are traversed and the optimal combination is selected using the index parameters. The safety production hidden danger identification model is obtained by adjusting model parameters such as the number of classifiers in the initial random forest model, the maximum depth of the decision tree and the minimum number of samples of a leaf node.
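The grid search in step d can be sketched as an exhaustive loop over parameter combinations. In the sketch below, the candidate values are picked from within the stated ranges with arbitrary step sizes, and `toy_score` is a stand-in for "train the forest with these parameters and compute an index parameter such as the F value"; both are assumptions for illustration, not the embodiment's actual settings.

```python
from itertools import product

# Candidate values inside the ranges 50-200, 5-50 and 1-10; step sizes are assumed.
param_grid = {
    "n_classifiers": [50, 100, 150, 200],
    "max_depth": [5, 20, 35, 50],
    "min_samples_leaf": [1, 4, 7, 10],
}


def grid_search(param_grid, score_fn):
    """Try every parameter combination and return the best one with its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Stand-in scorer with an arbitrary peak; a real scorer would train and evaluate."""
    return -(abs(params["n_classifiers"] - 100)
             + abs(params["max_depth"] - 20)
             + abs(params["min_samples_leaf"] - 4))


best_params, best_score = grid_search(param_grid, toy_score)
```

The number of model trainings grows with the product of the candidate list lengths (64 combinations here), which is why the ranges are bounded in advance.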
Specifically, the RandomForestClassifier of the sklearn library of Python can be called to train the safety production hidden danger identification model based on the random forest algorithm model. The random forest algorithm model randomly draws a portion of the samples from the original samples with replacement to generate a new sample set, and repeats this operation to generate a plurality of sample sets, each of which subsequently generates one decision tree. During the generation of each decision tree, a portion of the features is randomly drawn to participate in the split at each node, and splitting then proceeds recursively; at each recursive split, a portion of the features is randomly drawn from the remaining features (because features that have already participated in a split do not appear in the nodes below that node). Finally, a plurality of decision trees is generated; when the category of a new input sample is predicted, each tree produces a prediction result, and the category of the new input sample is determined by majority vote.
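Two parts of the random forest procedure described above, sampling with replacement and majority voting, can be illustrated without a machine learning library. The sketch below is not a random forest implementation: tree construction is omitted, and the sample values and per-tree predictions are made up for the example.

```python
import random
from collections import Counter


def bootstrap_sample(samples, rng):
    """Draw len(samples) items with replacement to form one new sample set."""
    return [rng.choice(samples) for _ in samples]


def majority_vote(predictions):
    """Return the category predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = list(range(10))  # stand-ins for training samples

# One bootstrap sample set per decision tree (three trees here).
sample_sets = [bootstrap_sample(samples, rng) for _ in range(3)]

# Made-up predictions of the three trees for one new input sample.
tree_predictions = ["violation", "no certificate", "violation"]
final_category = majority_vote(tree_predictions)
```

Because sampling is with replacement, a sample set usually contains repeated items and omits some of the original samples, which is what makes the individual trees differ.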
Fig. 2 is a schematic flow diagram of a second embodiment of the identification method for potential safety hazard provided in the embodiment of the present invention, and as shown in fig. 2, the identification method for potential safety hazard provided in this embodiment may include:
S201, obtaining a potential safety hazard text to be identified.
S202, performing word segmentation on the potential safety production hazard text to be identified, and extracting text characteristic data of the potential safety production hazard text to be identified.
S203, inputting the text characteristic data of the potential safety hazard text to be recognized into a pre-trained potential safety hazard recognition model, and classifying and recognizing the potential safety hazard text to be recognized to obtain the category information of the potential safety hazard object and the behavior state of the potential safety hazard.
The safety production hidden danger identification model is obtained by training on the basis of a plurality of training samples marked with hidden danger objects and hidden danger behavior states.
S204, determining the content of the potential safety hazard in production according to the category information of the potential safety hazard objects and the behavior states of the potential safety hazard.
The specific implementation of S201-S204 can refer to the related descriptions of S101-S104 in the first embodiment.
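Steps S201 to S204 can be sketched as a short pipeline. In the sketch below the segmentation, feature extraction and classification functions are toy stand-ins supplied by the caller (whitespace splitting and keyword checks), not the trained model of the embodiment, and the "object: state" output format is an assumed way of combining the two categories.

```python
def identify_hazard(text, segment, extract_features, classify):
    """Run S201-S204 on one text with caller-supplied stand-in components."""
    tokens = segment(text)                               # S202: word segmentation
    features = extract_features(tokens)                  # S202: text feature data
    object_category, state_category = classify(features)  # S203: classification
    return f"{object_category}: {state_category}"        # S204: hazard content


# Toy stand-ins, purely illustrative.
def segment(text):
    return text.lower().split()


def extract_features(tokens):
    return set(tokens)


def classify(features):
    obj = "fire extinguisher" if "extinguisher" in features else "unknown object"
    state = "expired" if "expired" in features else "unknown state"
    return obj, state


content = identify_hazard("Extinguisher in workshop 2 expired",
                          segment, extract_features, classify)
```

Passing the components in as functions keeps the flow of the four steps visible while leaving the choice of segmenter, features and model open.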
S205, determining hidden danger rectification information of an enterprise with a potential safety production hazard according to the content of the potential safety production hazard.
According to the identification method of the potential safety hazard provided by the embodiment of the invention, the hidden danger rectification information of the enterprise with the potential safety production hazard is determined according to the content of the potential safety production hazard. The method can therefore assist a supervision department in enforcing the law accurately, for example in special treatment of specific potential safety hazards, daily inspection of enterprises with key potential safety hazards and seasonal potential safety hazard inspection, and plays a role in guiding safety management practice and assisting intelligent decision making.
The following describes a method for identifying potential safety hazards provided by the embodiment of the present invention by taking a specific implementation manner as an example. Fig. 3 is a schematic flow diagram of a third embodiment of the identification method for potential safety hazard provided in the embodiment of the present invention, and as shown in fig. 3, the identification method for potential safety hazard provided in this embodiment may include:
S301, constructing a label system for classifying the hidden danger texts, and splitting key information.
Based on an understanding of the business and of the hidden danger content, the hidden danger content is divided into a hidden danger object and a hidden danger behavior state. Labels for the hidden danger object and the hidden danger behavior state are each constructed from the five aspects of people, machines, materials, management and environment, forming a complete label system of hidden danger objects and hidden danger behavior states, which are combined into a plurality of types of hidden danger labels.
Specifically, algorithms such as word segmentation, part-of-speech tagging and named entity recognition are applied on the basis of a self-built professional word bank, the common hidden danger subjects in the hidden danger contents are counted, the hidden danger objects are obtained by comprehensive screening of the hidden danger texts, and the hidden danger objects are then assigned to the five aspects of people, machines, materials, management and environment.
Specifically, based on business knowledge, by studying the "safety production accident hidden danger scheduling table", the "safety production hidden danger classification standard" and the like, and combining them with actual hidden danger content descriptions, hidden danger state description label groups are extracted from the five aspects of people, machines, materials, management and environment. For example, the hidden danger behavior states corresponding to the hidden danger factor "people" may include: insufficient skill, violation of the specified operation, no certificate on duty, and the like.
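One possible way to hold such a label system is a mapping from each of the five hazard factors to its hidden danger objects and behavior states. In the sketch below, only the three "person" behavior states are taken from the text; every other object and state name is a made-up placeholder, and the nested-dictionary layout itself is an assumption.

```python
# Five hazard factors named in the text. Except for the three "person" states,
# all object and state labels below are made-up placeholders.
LABEL_SYSTEM = {
    "person": {
        "objects": ["operator"],
        "states": ["insufficient skill", "violation of the specified operation",
                   "no certificate on duty"],
    },
    "machine": {"objects": ["pressure vessel"], "states": ["overdue inspection"]},
    "material": {"objects": ["flammable liquid"], "states": ["improper storage"]},
    "management": {"objects": ["safety ledger"], "states": ["not established"]},
    "environment": {"objects": ["evacuation route"], "states": ["blocked"]},
}


def combined_labels(label_system):
    """Pair every hazard object with every behavior state of the same factor."""
    return [
        (factor, obj, state)
        for factor, entry in label_system.items()
        for obj in entry["objects"]
        for state in entry["states"]
    ]


labels = combined_labels(LABEL_SYSTEM)
```

Each resulting triple corresponds to one combined hidden danger label, for example the "person" factor with object "operator" and state "no certificate on duty".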
S302, acquiring a manually labeled hidden danger sample set.
For the hidden danger objects of the different hidden danger factors (people, machines, materials, management and environment), manual labeling and cross-checking of the screened sample set are organized, and the labels of difficult hidden danger texts are reviewed to ensure the labeling accuracy of the hidden danger texts.
S303, constructing, training and evaluating a potential safety hazard identification model.
Specifically, the term frequency-inverse document frequency (TF-IDF) features of the potential safety production hazard texts can be extracted using text word segmentation to form a feature matrix that any device with processing capability can process, so that the feature matrix can be used for model training. The TF-IDF features of the hidden danger texts and the corresponding labels are used as the model input and output, respectively. When the model is trained, tested and evaluated, the model parameter optimization method is the grid search algorithm, the model evaluation method is the confusion matrix, and the evaluation indexes are accuracy, precision, recall and F value.
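A confusion matrix for the evaluation can be built by counting pairs of actual and predicted categories. The sketch below is a minimal illustration with made-up labels and predictions; the row and column orientation (rows actual, columns predicted) is a convention chosen here, not one stated in the embodiment.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual categories, columns are predicted categories."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels]
            for actual in labels]


# Illustrative categories, preset labels and model predictions.
labels = ["person", "machine"]
y_true = ["person", "person", "machine", "machine", "person"]
y_pred = ["person", "machine", "machine", "machine", "person"]
matrix = confusion_matrix(y_true, y_pred, labels)

# Accuracy is the sum of the diagonal divided by the number of samples.
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(y_true)
```

Precision for a category can be read from its column (diagonal entry over column sum) and recall from its row (diagonal entry over row sum).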
Based on the results of technical exploration and verification, the number of classifiers in the safety production hidden danger identification model can be determined to be 150, the maximum number of features selected by a decision tree to be the integer part of the square root of the number of features, the maximum depth of the decision tree to be 34, and the minimum number of samples of a leaf node to be 8.
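The reported parameter values can be collected into a configuration as sketched below. The keyword names are an assumption: they follow the parameter names of scikit-learn's RandomForestClassifier, which the description mentions without showing how it is configured. The feature count of 5000 is likewise a made-up example used only to show the square root rule.

```python
import math


def max_features_for(n_features):
    """Integer part of the square root of the feature count."""
    return math.isqrt(n_features)


# Values reported in the text; keyword names assumed to match scikit-learn's
# RandomForestClassifier, where "sqrt" selects the square-root rule.
model_params = {
    "n_estimators": 150,
    "max_features": "sqrt",
    "max_depth": 34,
    "min_samples_leaf": 8,
}

# With a hypothetical vocabulary of 5000 TF-IDF features, each split would
# consider at most this many features.
example_max_features = max_features_for(5000)
```

Under that assumption, the dictionary could be unpacked into the classifier constructor as keyword arguments.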
Fig. 4 is a schematic structural diagram of an identification apparatus for a potential safety hazard according to an embodiment of the present invention, and as shown in fig. 4, the identification apparatus 40 for a potential safety hazard may include:
The first obtaining module 410 may be configured to obtain a potential safety production hazard text to be identified.
The second obtaining module 420 may be configured to perform word segmentation on the potential safety hazard text to be identified, and extract text feature data of the potential safety hazard text to be identified.
The identification module 430 may be configured to input text feature data of the potential safety hazard text to be identified into a pre-trained potential safety hazard identification model, and perform classification and identification on the potential safety hazard text to be identified to obtain category information of a potential safety hazard object and a potential safety hazard behavior state, where the potential safety hazard identification model is obtained by training based on a plurality of training samples labeled with categories of the potential safety hazard object and the potential safety hazard behavior state.
The first determining module 440 may be configured to determine the content of the hidden danger in the safety production according to the category information of the hidden danger object and the hidden danger behavior state.
According to the identification device for the potential safety hazard, provided by the embodiment of the invention, a potential safety hazard text to be identified is obtained through the first obtaining module; performing word segmentation processing on the potential safety production hazard text to be identified through a second acquisition module, and extracting text characteristic data of the potential safety production hazard text to be identified; inputting text characteristic data of a potential safety hazard text to be recognized into a pre-trained potential safety hazard recognition model through a recognition module, and classifying and recognizing the potential safety hazard text to be recognized to obtain category information of a potential safety hazard object and a potential safety hazard behavior state, wherein the potential safety hazard recognition model is obtained by training based on a plurality of training samples marked with categories of the potential safety hazard object and the potential safety hazard behavior state; and determining the content of the potential safety hazard in production by the first determining module according to the category information of the potential safety hazard object and the potential safety hazard behavior state. 
By extracting the text characteristic data of the potential safety production hazard text to be identified, effective information of the potential safety production hazard text to be identified can be obtained; the potential safety production hazard identification model is obtained by training based on a plurality of training samples marked with the categories of the potential safety production hazard objects and the potential safety hazard behavior states, so that the potential safety production hazard texts to be identified are classified and identified through the potential safety production hazard identification model, the category information of the potential safety production hazard objects and the potential safety hazard behavior states can be accurately obtained, the potential safety production hazards existing in the potential safety production hazard objects can be directly displayed through the combination form of the category information of the potential safety production hazard objects and the category information of the potential safety hazard behavior states, and the identification accuracy of the potential safety production hazard contents is improved.
Optionally, the potential safety production hazard text to be identified may include: an enterprise name, an enterprise address, an enterprise region, an enterprise industry type, a potential safety production hazard discovery date, a potential safety production hazard acceptance unit, a potential safety production hazard level, and a potential safety hazard description text.
Optionally, the safety production hidden danger identification model may be obtained by training based on a support vector machine classification algorithm model, a convolutional neural network classification algorithm model, or a random forest algorithm model.
Optionally, the text feature data may be one of the following: word frequency-inverse text frequency index TF-IDF, word vector, sentence vector.
Optionally, the device 40 for identifying a potential safety hazard may further include: a third acquisition module (not shown in the figure), configured to perform statistical analysis on the potential safety hazard information according to a plurality of potential safety production hazard texts, and to extract, for each of the aspects of people, machines, materials, management and environment, the category information of the corresponding hidden danger objects and hidden danger behavior states.
Optionally, the categories of the hidden danger objects and the hidden danger behavior states in the training samples may be labeled based on the extracted category information.
Optionally, the device 40 for identifying a potential safety hazard may further include: a second determining module (not shown in the figure), configured to determine the hidden danger rectification information of the enterprise with the potential safety production hazard according to the content of the potential safety production hazard.
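The fields that the potential safety production hazard text to be identified may include can be represented as a simple record type. The sketch below is one possible layout; the class name, the attribute names and the sample values are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class HazardText:
    """One hazard text record; attribute names are illustrative."""
    enterprise_name: str
    enterprise_address: str
    region: str
    industry_type: str
    discovery_date: str
    acceptance_unit: str
    hazard_level: str
    description: str


# Made-up sample values.
record = HazardText(
    enterprise_name="Example Co.",
    enterprise_address="1 Example Road",
    region="District A",
    industry_type="chemical",
    discovery_date="2021-03-01",
    acceptance_unit="District A safety office",
    hazard_level="general",
    description="Extinguisher in workshop 2 expired",
)
```

Only the description field would be segmented and classified; the remaining fields support statistics such as grouping by region, industry type or discovery date.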
In addition, corresponding to the method for identifying potential safety hazard provided by the above embodiment, an embodiment of the present invention further provides an electronic device, where the electronic device may include: a memory for storing a program; and a processor for executing the program stored in the memory to implement all the steps of the identification method of the potential safety hazard provided by the embodiment of the invention.
In addition, corresponding to the identification method of the potential safety hazard provided in the foregoing embodiment, an embodiment of the present invention further provides a computer-readable storage medium, where computer-executable instructions are stored in the computer-readable storage medium, and when the computer-executable instructions are executed by a processor, all steps of the identification method of the potential safety hazard provided in the embodiment of the present invention are implemented.
Those skilled in the art will appreciate that all or part of the functions of the various methods in the above embodiments may be implemented by hardware, or may be implemented by computer programs. When all or part of the functions of the above embodiments are implemented by a computer program, the program may be stored in a computer-readable storage medium, and the storage medium may include: a read only memory, a random access memory, a magnetic disk, an optical disk, a hard disk, etc., and the program is executed by a computer to realize the above functions. For example, the program may be stored in a memory of the device, and when the program in the memory is executed by the processor, all or part of the functions described above may be implemented. In addition, when all or part of the functions in the above embodiments are implemented by a computer program, the program may be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and may be downloaded or copied to a memory of a local device, or may be version-updated in a system of the local device, and when the program in the memory is executed by a processor, all or part of the functions in the above embodiments may be implemented.
The present invention has been described in terms of specific examples, which are provided to aid understanding of the invention and are not intended to be limiting. For a person skilled in the art to which the invention pertains, several simple deductions, modifications or substitutions may be made according to the idea of the invention.

Claims (10)

1. A method for identifying potential safety production hazards is characterized by comprising the following steps:
acquiring a potential safety hazard text to be identified;
performing word segmentation processing on the potential safety production hazard text to be identified, and extracting text characteristic data of the potential safety production hazard text to be identified;
inputting the text characteristic data of the potential safety production hazard text to be identified into a pre-trained potential safety production hazard identification model, and carrying out classification and identification on the potential safety production hazard text to be identified to obtain category information of potential safety hazard objects and potential safety hazard behavior states, wherein the potential safety production hazard identification model is obtained by training on the basis of a plurality of training samples marked with the categories of the potential safety hazard objects and the potential safety hazard behavior states;
and determining the content of the potential safety hazard in production according to the category information of the potential safety hazard object and the behavior state of the potential safety hazard.
2. The method of claim 1, wherein the production safety hazard text to be identified comprises: an enterprise name, an enterprise address, an enterprise region, an enterprise industry type, a potential safety production hazard discovery date, a potential safety production hazard acceptance unit, a potential safety production hazard level, and a potential safety hazard description text.
3. The method of claim 1, wherein the safety production hazard identification model is trained based on a support vector machine classification algorithm model, a convolutional neural network classification algorithm model, or a random forest algorithm model.
4. The method of claim 1, wherein the text feature data is one of: word frequency-inverse text frequency index TF-IDF, word vector, sentence vector.
5. The method of claim 1, wherein the method further comprises:
and according to the plurality of potential safety production hazard texts, carrying out statistical analysis on the potential safety hazard information, and respectively extracting potential safety hazard objects and category information of the potential safety hazard behavior state corresponding to each aspect of people, machines, materials, management and environment.
6. The method of claim 5, wherein the categories of the hidden danger objects and the hidden danger behavior states in the training samples are labeled based on the extracted category information.
7. The method of claim 2, wherein the method further comprises:
and determining hidden danger rectification information of enterprises with hidden danger of safety production according to the hidden danger content of safety production.
8. An identification device for potential safety production hazards, characterized in that the device comprises:
the first acquisition module is used for acquiring a potential safety hazard text to be identified;
the second acquisition module is used for performing word segmentation processing on the potential safety production hazard text to be identified and extracting text characteristic data of the potential safety production hazard text to be identified;
the identification module is used for inputting the text characteristic data of the potential safety production hazard text to be identified into a pre-trained potential safety production hazard identification model, classifying and identifying the potential safety production hazard text to be identified to obtain category information of a potential hazard object and a potential hazard behavior state, wherein the potential safety production hazard identification model is obtained by training based on a plurality of training samples marked with categories of the potential hazard object and the potential hazard behavior state;
and the first determining module is used for determining the content of the potential safety hazard in production according to the category information of the potential safety hazard object and the potential safety hazard behavior state.
9. An electronic device, comprising:
a memory for storing a program;
a processor for implementing the method of any one of claims 1-7 by executing a program stored by the memory.
10. A computer-readable storage medium, characterized in that the medium has stored thereon a program which is executable by a processor to implement the method according to any one of claims 1-7.
CN202110495872.XA 2021-05-07 2021-05-07 Method, device and equipment for identifying potential safety hazard Pending CN113221556A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110495872.XA CN113221556A (en) 2021-05-07 2021-05-07 Method, device and equipment for identifying potential safety hazard

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110495872.XA CN113221556A (en) 2021-05-07 2021-05-07 Method, device and equipment for identifying potential safety hazard

Publications (1)

Publication Number Publication Date
CN113221556A true CN113221556A (en) 2021-08-06

Family

ID=77091511

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110495872.XA Pending CN113221556A (en) 2021-05-07 2021-05-07 Method, device and equipment for identifying potential safety hazard

Country Status (1)

Country Link
CN (1) CN113221556A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114139880A (en) * 2021-11-08 2022-03-04 中国安全生产科学研究院 Enterprise safety management risk dynamic monitoring system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110347805A (en) * 2019-07-22 2019-10-18 中海油安全技术服务有限公司 Petroleum industry security risk key element extracting method, device, server and storage medium
CN110569330A (en) * 2019-07-18 2019-12-13 华瑞新智科技(北京)有限公司 text labeling system, device, equipment and medium based on intelligent word selection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
马非: "基于HD-MSCNN的煤矿安全隐患信息自动分类方法研究", 《中国优秀硕士学位论文全文数据库 工程科技Ⅰ辑》, no. 04, pages 021 - 111 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210806
