CN116992034B - Intelligent event marking method, device and storage medium - Google Patents

Intelligent event marking method, device and storage medium

Info

Publication number
CN116992034B
CN116992034B (application CN202311245716.3A)
Authority
CN
China
Prior art keywords
event
label
classification
model
marking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311245716.3A
Other languages
Chinese (zh)
Other versions
CN116992034A (en)
Inventor
李坤
黄泰峰
段曼妮
王永恒
李博康
王智
程军
陈钟鸣
梅莉
蔡阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202311245716.3A priority Critical patent/CN116992034B/en
Publication of CN116992034A publication Critical patent/CN116992034A/en
Application granted granted Critical
Publication of CN116992034B publication Critical patent/CN116992034B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/383Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/194Calculation of difference between files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0499Feedforward networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/26Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an intelligent event marking method, device and storage medium, wherein the method comprises the following steps: S1, acquiring an event feedback text and predicting the event primary label with a multi-classification model; S2, performing collision comparison between the primary label and an event expert rule base to determine the classification difficulty, turning to S3 if it is an easy-classification label and to S4 otherwise; S3, constructing a hierarchical classification model to determine a multi-level event label result; S4, constructing a retrieval model and outputting several similar label vectors according to the similarity scores between each event's vector and the label vectors; if at least one similarity score is not smaller than a preset threshold, taking the label selected by the user as the event label result, otherwise executing S5; S5, constructing a generating model that takes the event feedback text as input, outputs a recommended label and stores it in the event marking result table. Compared with the prior art, the method provides automatic marking and high accuracy in determining labels.

Description

Intelligent event marking method, device and storage medium
Technical Field
The present invention relates to the field of text processing technologies, and in particular, to an intelligent event marking method, apparatus, and storage medium.
Background
Currently, event marking is typically performed manually by an administrator, i.e., events are categorized and labeled according to their final feedback content. However, as the number of events grows and information-processing demands keep increasing, manual marking faces several challenges and limitations.
(1) Subjectivity and subjective error: due to differences in individual experience and knowledge levels of the manager, there may be subjectivity in their classification and labeling of events. This may result in different administrators giving different labels for the same event, thereby affecting subsequent data analysis and model training.
(2) Human error and inconsistency: human errors, such as label errors, classification errors, etc., are easily generated in the manual marking process. These errors may be caused by inattention, misjudgment, or fatigue. Furthermore, there is also inconsistency in the marking between different administrators, i.e. different administrators may give different labels for the same event.
(3) Work burden and inefficiency: the authorities generate a large number of events per day, and the manager needs to invest a lot of time and effort to process and annotate these events. Such manual operations are time consuming and laborious and may affect other important tasks of the manager, resulting in excessive workload and inefficiency.
(4) Expertise limitation: accurate understanding and judgment in event classification and labeling require considerable expertise and experience. However, expertise and experience vary among administrators, and some may be unfamiliar with a particular type of event, which reduces the accuracy of classification and labeling.
Therefore, it becomes necessary to introduce methods and techniques for automatic marking.
With the rapid development of natural language processing and machine learning, pre-training models have become one of the hot spots of research and application in recent years. By self-supervised learning on large-scale text data, a pre-training model can learn rich language representations and semantic understanding capability. However, existing pre-training models cannot cope with complex event-marking scenarios, so the marking results have low precision or fail to meet practical requirements.
Disclosure of Invention
The invention aims to provide an intelligent event marking method, device and storage medium, which are used for intelligently marking an event by utilizing a pre-training model and combining expert rules, event label hierarchical division rules and a generating model, so that marking efficiency and marking precision are improved.
The aim of the invention can be achieved by the following technical scheme:
an intelligent event marking method comprises the following steps:
s1, acquiring an event feedback text, and predicting an event primary label by using a multi-classification model;
s2, collision comparison is carried out on the predicted event primary label and an event expert rule base, the classification difficulty corresponding to the primary label is determined, if the event primary label is an easy-classification label, the step S3 is executed, and if the event primary label is an difficult-classification label, the step S4 is executed;
s3, intelligent marking based on classification: constructing a hierarchical classification model, taking event feedback texts classified as easy-classification labels as input, outputting multi-level event label results, and storing the event label results in an event marking result table;
s4, intelligent marking based on retrieval: constructing a retrieval model, taking event feedback texts classified into tags difficult to classify as input, calculating similarity scores of corresponding vectors and tag vectors of each event, sorting from high to low according to the similarity scores, outputting a plurality of similar tag vectors ranked at the front, if at least one of the similarity scores corresponding to the similar tag vectors is not smaller than a preset threshold value, selecting one tag from the similar tag vectors as an event tag result according to user selection, storing the event tag result in an event marking result table, and otherwise, executing step S5;
s5, constructing a generating model, taking the event feedback text as input, outputting a recommended label, and storing the recommended label in an event marking result table.
The training of the multi-classification model, the hierarchical classification model and the retrieval model is realized through pre-training and fine-tuning: the pre-training process performs self-supervised learning on large-scale unlabeled text data, while the fine-tuning process fixes part of the parameters determined during pre-training and updates the remaining model parameters using labeled event feedback texts.
The event expert rule base stores rules in a dictionary structure with event primary labels as keys; collision comparison looks up the primary label in the rule base and returns the corresponding value, namely the classification difficulty.
The activation function of the multi-classification model is the softmax function, so that the model outputs probabilities in (0, 1) that sum to 1, and the label with the highest probability is taken as the event primary-label prediction result.
The activation function of the hierarchical classification model is the sigmoid function, so that the model's output probabilities are limited to (0, 1), and all labels with probability greater than a preset value are output as the multi-level event label result.
The multi-level event label result is an N-level label result, where N is an integer and 4 ≤ N ≤ 6.
The search model calculates a similarity score by calculating cosine similarity between vectors.
An intelligent event marking apparatus, comprising:
the primary label prediction module is used for acquiring an event feedback text and predicting an event primary label by utilizing a multi-classification model;
the classification difficulty determining module is used for carrying out collision comparison between the predicted event primary label and the event expert rule base to determine the classification difficulty corresponding to the primary label; if the event primary label is an easy-classification label, the intelligent marking module based on classification is called, and if it is a difficult-classification label, the intelligent marking module based on retrieval is called;
the intelligent marking module based on classification is used for executing the following steps: constructing a hierarchical classification model, taking event feedback texts classified as easy-classification labels as input, outputting multi-level event label results, and storing the event label results in an event marking result table;
the intelligent marking module based on retrieval is used for executing the following steps: constructing a retrieval model, taking event feedback texts classified as difficult-classification labels as input, calculating the similarity score between each event's corresponding vector and every label vector, sorting from high to low by similarity score, and outputting the top-ranked similar label vectors; if at least one of the similarity scores corresponding to the similar label vectors is not smaller than a preset threshold, one label selected by the user from the similar label vectors is taken as the event label result and stored in the event marking result table; otherwise, the generation recommendation marking module is called;
the generation recommendation marking module is used for constructing a generation model, taking the event feedback text as input, outputting a recommendation label and storing the recommendation label in an event marking result table.
An intelligent event marking apparatus comprises a memory, a processor, and a program stored in the memory, wherein the processor implements the method described above when executing the program.
A storage medium having stored thereon a program which, when executed, performs the method described above.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention realizes intelligent marking in the field of event management, can assist managers in event marking work, and improves efficiency compared with previous manual marking.
(2) The method marks events with two schemes, classification and retrieval. According to rules preset by experts, events that are easy to classify are directly assigned a label, achieving accurate classification; for easily confused events, the Top-N most similar labels are retrieved and the manager manually selects the most suitable one, realizing human-machine-interactive event marking and improving the marking accuracy for easily confused labels.
(3) For events that cannot be matched to existing labels, a generating model recommends potential new event labels, coping with the complexity of events; recommending new labels helps managers grasp the latest event situation and resolve risks in time.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
fig. 2 is a schematic view of the structure of the device of the present invention.
Detailed Description
The invention will now be described in detail with reference to the drawings and specific examples. The embodiments are implemented on the premise of the technical scheme of the invention and give a detailed implementation and specific operation process, but the protection scope of the invention is not limited to the following examples.
The embodiment provides an intelligent event marking method, as shown in fig. 1, comprising the following steps:
s1, acquiring an event feedback text, and predicting an event primary label by using a multi-classification model;
s2, collision comparison is carried out on the predicted event primary label and an event expert rule base, the classification difficulty corresponding to the primary label is determined, if the event primary label is an easy-classification label, the step S3 is executed, and if the event primary label is an difficult-classification label, the step S4 is executed;
s3, intelligent marking based on classification: constructing a hierarchical classification model, taking event feedback texts classified as easy-classification labels as input, outputting multi-level event label results, and storing the event label results in an event marking result table;
s4, intelligent marking based on retrieval: constructing a retrieval model, taking event feedback texts classified into tags difficult to classify as input, calculating similarity scores of corresponding vectors and tag vectors of each event, sorting from high to low according to the similarity scores, outputting a plurality of similar tag vectors ranked at the front, if at least one of the similarity scores corresponding to the similar tag vectors is not smaller than a preset threshold value, selecting one tag from the similar tag vectors as an event tag result according to user selection, storing the event tag result in an event marking result table, and otherwise, executing step S5;
s5, constructing a generating model, taking the event feedback text as input, outputting a recommended label, and storing the recommended label in an event marking result table.
In this embodiment, training of the multi-classification model, the hierarchical classification model and the retrieval model follows the pre-train-then-fine-tune paradigm: the pre-training process performs self-supervised learning on large-scale unlabeled text data, while the fine-tuning process fixes part of the parameters determined during pre-training and updates the remaining model parameters using labeled event feedback texts.
a) Pretraining: pretraining refers to self-supervised learning on large-scale unlabeled text data. By exposing the model to a large amount of text, this process lets it automatically learn the statistical regularities and semantic information of the language. Pre-training models usually adopt structures such as autoencoders and Transformers, perform multiple rounds of iterative training on the data, and learn text representations through methods such as maximum likelihood estimation.
b) Fine-tuning: fine-tuning refers to supervised learning on labeled event data for the event label classification task, using the pre-trained model as initialization. Its goal is to further optimize the model parameters on the labeled event dataset so that the model adapts to the characteristics and requirements of the task. Fine-tuning generally comprises two steps: freezing most of the parameters of the pre-trained model and training only the last few layers or the classifier; then training with the labeled data of the event label classification task and updating the model parameters by optimizing the loss function.
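As a toy illustration of this freeze-backbone, train-head recipe (all names, dimensions and data below are invented for the sketch, not taken from the patent), a fixed random projection plays the role of the frozen pre-trained encoder and only the classifier weights are updated by gradient descent on a logistic loss:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "pretrained backbone": a fixed random projection standing in
# for the encoder; its weights are never updated during fine-tuning.
W_backbone = rng.normal(size=(10, 8))

def encode(x):
    return np.maximum(0.0, x @ W_backbone)  # frozen features

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Trainable classifier head: the only parameters fine-tuning touches.
w_head = np.zeros(8)

X = rng.normal(size=(64, 10))
y = (X[:, 0] > 0).astype(float)  # toy binary labels

def loss(w):
    p = sigmoid(encode(X) @ w)
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

loss_before = loss(w_head)
for _ in range(300):                       # gradient descent on the head only
    H = encode(X)
    p = sigmoid(H @ w_head)
    grad = H.T @ (p - y) / len(y)
    w_head -= 0.05 * grad
loss_after = loss(w_head)
```

The backbone features never change; only `w_head` moves, which is the essence of the two-step recipe above.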
And predicting the corresponding label according to the event feedback content through the model file obtained through the pre-training and fine-tuning.
In step S1, in one embodiment, the multi-classification model is implemented with the ERNIE 3.0 model through the Baidu PaddlePaddle deep learning framework and is based on a Transformer network structure comprising multiple encoder layers. Each encoder layer consists of a multi-head self-attention mechanism and a feedforward neural network for feature extraction and representation of the input text. The attention mechanism is computed as:
Attention(Q, K, V) = softmax(Similarity(Q, K)) V
where Q, K and V are the matrices of input queries, keys and values, and Similarity(Q, K) is typically the scaled dot product QK^T/√d_k.
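A minimal numpy sketch of this attention computation, assuming the common scaled dot product QK^T/√d_k as the similarity function (the text only names it Similarity(Q, K)), with invented dimensions:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # assumed Similarity(Q, K)
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out, w = attention(Q, K, V)
```

Each row of `w` is a probability distribution over the keys, so the output is a convex combination of value vectors.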
The calculation formula of the feedforward neural network is as follows:
FFN(x) = max(0,xW_1 +b_1)W_2 + b_2
where W_1, b_1, W_2 and b_2 are learnable weights and biases.
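The feedforward formula translates directly into numpy (the dimensions below are illustrative, not the patent's):

```python
import numpy as np

def ffn(x, W1, b1, W2, b2):
    # FFN(x) = max(0, x W_1 + b_1) W_2 + b_2
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32  # illustrative sizes only
x = rng.normal(size=(4, d_model))
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)
y = ffn(x, W1, b1, W2, b2)
```

The inner ReLU expands to `d_ff` dimensions and the second projection returns to `d_model`, matching the formula term by term.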
In the pre-training stage, ERNIE 3.0 is trained on a large-scale unsupervised pre-training dataset of massive text content including encyclopedias, forums and question-answering communities. Through the masked language modeling (Masked Language Modeling) and next sentence prediction (Next Sentence Prediction) tasks, ERNIE 3.0 learns rich semantic representations. In the fine-tuning stage, this embodiment keeps the pre-trained parameters (including all weights and biases in the Transformer network) fixed and adds a fully connected layer on top for classification prediction; only the weights of the fully connected layer need to be fine-tuned to adapt to the specific downstream text multi-classification task.
In a specific embodiment, the multi-classification model has 6 layers (6-layer), 768 hidden units (768-hidden), 12 attention heads (12-heads) and 75 million parameters (75M parameters), trained on a Chinese corpus; see https://paddlenlp.readthedocs.io/zh/latest/model_zoo/transformers/ERNIE/contents.html.
In another embodiment, the multi-classification model may also adopt other model structures without affecting the object of the invention; the activation function of the network is the softmax function, as shown in equation (1), so that the model outputs probabilities in (0, 1) that sum to 1, and the label with the highest probability is taken as the event primary-label prediction result.
softmax(z_i) = exp(z_i) / Σ_j exp(z_j)    (1)
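As a quick plain-Python illustration of equation (1), the scores below are invented logits for three labels; the probabilities lie in (0, 1), sum to 1, and the argmax gives the primary-label prediction:

```python
import math

def softmax(logits):
    m = max(logits)                       # subtract max for stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.0, 1.0, 0.1]                 # invented scores for three labels
probs = softmax(logits)
predicted = probs.index(max(probs))      # label with highest probability
```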
In step S2, the event expert rule base stores rules in a dictionary structure with the event primary label as the key; collision comparison looks the label up in the rule base and returns the corresponding value, namely the classification difficulty. For example, the rule base stores {"dispute": easy-classification label, "help": easy-classification label, "consultation": difficult-classification label, ...}.
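The collision comparison then reduces to a dictionary lookup keyed by the primary label. The sketch below mirrors the example rule base above; the fallback for labels absent from the rule base is an assumption of this sketch (the text does not specify it):

```python
# Event expert rule base as a dictionary: primary label -> difficulty.
rule_base = {
    "dispute": "easy",
    "help": "easy",
    "consultation": "difficult",
}

def classification_difficulty(primary_label, default="difficult"):
    # Collision comparison: look up the key; unknown labels fall back
    # to the retrieval path (an assumption of this sketch).
    return rule_base.get(primary_label, default)
</antml>```

Routing unknown labels to the retrieval path is the conservative choice, since retrieval can still fall through to label generation.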
In this embodiment, the event expert rule base divides the classification difficulty into only 2 classes; in another embodiment, more dimensions may be defined, with a separate model built for each dimension, to improve marking accuracy.
In step S3, the activation function of the hierarchical classification model is the sigmoid function, as shown in formula (2), so that the model's output probabilities are limited to (0, 1), and all labels with probability greater than a preset value are output as the multi-level event label result.
sigmoid(x) = 1 / (1 + exp(-x))    (2)
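Applying formula (2) per label with an independent threshold yields the multi-label output; the label names, logits and 0.5 threshold below are invented for illustration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def multi_label_predict(logits_by_label, threshold=0.5):
    # Each label is scored independently; all labels whose probability
    # exceeds the preset value are returned (unlike softmax, the
    # probabilities need not sum to 1).
    return [label for label, z in logits_by_label.items()
            if sigmoid(z) > threshold]

labels = multi_label_predict({"noise": 2.3, "nighttime": 0.8, "traffic": -1.5})
```

This independence between labels is why sigmoid, not softmax, suits the multi-level output of step S3.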
In one embodiment, the structure of the hierarchical classification model may refer to the structure of the multi-classification model described in this embodiment, the difference being the activation function used. Moreover, differences in the model structure of the hierarchical classification model do not affect the achievement of the object of the invention.
The multi-level event label result is generally divided into 4 to 6 levels: with too few levels the result is too general to pinpoint the problem, while too many levels increase the complexity of analysis and slow the response. In one embodiment, the multi-level event label result is a five-level label result.
In step S4, the retrieval model also follows the pre-train-then-fine-tune approach; the pre-trained model may be rocketqa-zh-dureader-query-encoder, but the downstream task becomes returning the Top-N most similar labels for each event. After training, every event and every label is encoded as a 768-dimensional vector, and the event marking problem becomes finding the Top-N vectors most similar to each event's vector, i.e., obtaining the N most similar labels and their similarity scores via cosine similarity. This realizes an event-label recommendation function, and the user can then select 1 label as the final label to complete marking.
In one embodiment, the retrieval model uses a RocketQA model implemented with the Baidu PaddlePaddle deep learning framework, based on a Transformer network architecture comprising multiple encoder layers; the structure of each encoder layer may refer to the multi-classification model described above and is not repeated here. In the pre-training phase, RocketQA is trained on massive text data such as a large-scale unsupervised pre-training dataset (e.g., DuReader_retrieval). Through masked language modeling and reading comprehension tasks, RocketQA learns rich semantic representations. In the fine-tuning stage, this embodiment keeps the pre-trained parameters (including all weights and biases in the Transformer network) fixed and adds a fully connected layer on top to extract text vectors; only the weights of the fully connected layer are fine-tuned so that downstream tasks can retrieve similar vectors. The cosine similarity is computed as cosine_similarity(A, B) = dot_product(A, B) / (||A|| · ||B||), where dot_product(A, B) is the dot product of vectors A and B, and ||A|| and ||B|| are the norms (lengths) of A and B.
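A numpy sketch of the cosine-similarity Top-N retrieval over 768-dimensional vectors (the vectors here are random stand-ins; in the embodiment they would come from the fine-tuned RocketQA encoder, and all names are invented):

```python
import numpy as np

def cosine_similarity(a, b):
    # cosine_similarity(A, B) = dot_product(A, B) / (||A|| * ||B||)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_n_labels(event_vec, label_vecs, n=5):
    # Score the event vector against every label vector, sort descending,
    # and keep the Top-N (label, score) pairs.
    scored = [(name, cosine_similarity(event_vec, v))
              for name, v in label_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

rng = np.random.default_rng(0)
label_vecs = {f"label_{i}": rng.normal(size=768) for i in range(20)}
# An event vector deliberately close to label_7, plus small noise.
event_vec = label_vecs["label_7"] + 0.1 * rng.normal(size=768)
ranked = top_n_labels(event_vec, label_vecs, n=3)
```

The Top-N pairs are exactly what step S4 presents to the user for the final selection.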
In step S4, if at least one of the similarity scores corresponding to the similar label vectors is not less than 0.5, the process ends and the result is stored in the intelligent marking result table; if all scores are less than 0.5, the event is only weakly related to the existing labels and corresponds to a potential new label, so step S5 is executed to recommend a label.
In step S5, the generating model directly adopts an already-trained large model, such as GPT-4, without additional training.
To verify the performance of the invention, this embodiment conducted experiments on a given event dataset with 413,000 accumulated events, divided into a training set (90%) and a test set (10%). As can be seen from Table 1, the retrieval-based method performs well, especially event-to-label retrieval: the top-3 and top-5 hit rates are basically above 80%, indicating high marking precision.
Table 1 experimental results
The present invention also provides a computer readable storage medium storing a computer program operable to perform an intelligent event marking method as provided in fig. 1 above.
The invention also provides an intelligent event marking device, as shown in fig. 2, comprising:
the primary label prediction module is used for acquiring an event feedback text and predicting an event primary label by utilizing a multi-classification model;
the classification difficulty determining module is used for carrying out collision comparison between the predicted event primary label and the event expert rule base to determine the classification difficulty corresponding to the primary label; if the event primary label is an easy-classification label, the intelligent marking module based on classification is called, and if it is a difficult-classification label, the intelligent marking module based on retrieval is called;
the intelligent marking module based on classification is used for executing the following steps: constructing a hierarchical classification model, taking event feedback texts classified as easy-classification labels as input, outputting multi-level event label results, and storing the event label results in an event marking result table;
the intelligent marking module based on retrieval is used for executing the following steps: constructing a retrieval model, taking event feedback texts classified as difficult-classification labels as input, calculating the similarity score between each event's corresponding vector and every label vector, sorting from high to low by similarity score, and outputting the top-ranked similar label vectors; if at least one of the similarity scores corresponding to the similar label vectors is not smaller than a preset threshold, one label selected by the user from the similar label vectors is taken as the event label result and stored in the event marking result table; otherwise, the generation recommendation marking module is called;
the generation recommendation marking module is used for constructing a generation model, taking the event feedback text as input, outputting a recommendation label and storing the recommendation label in an event marking result table.
At the hardware level, the intelligent event marking device includes a processor, an internal bus, a network interface, memory and non-volatile storage, and may include hardware required by other services. The processor reads the corresponding computer program from the non-volatile storage into memory and runs it to implement the method described above with respect to fig. 1. Of course, the invention does not exclude other implementations, such as logic devices or combinations of hardware and software; that is, the execution subject of the processing flows is not limited to logic units and may also be hardware or logic devices.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller; examples of microcontrollers include, but are not limited to, ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, besides implementing the controller purely in computer-readable program code, the method steps can be logically programmed so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like to achieve the same functions. Such a controller may therefore be regarded as a hardware component, and the means included in it for realizing various functions may be regarded as structures within the hardware component, or even as both software modules implementing the method and structures within the hardware component.
For convenience of description, the above apparatus is described as being divided into various units by function. Of course, when implementing the present invention, the functions of the units may be implemented in one or more pieces of software and/or hardware.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The foregoing describes preferred embodiments of the present invention in detail. It should be understood that one of ordinary skill in the art can make numerous modifications and variations according to the concept of the invention without creative effort. Therefore, all technical solutions that can be obtained by a person skilled in the art through logical analysis, reasoning, or limited experimentation based on the prior art and the inventive concept shall fall within the scope of protection defined by the claims.

Claims (9)

1. An intelligent event marking method is characterized by comprising the following steps:
s1, acquiring an event feedback text, and predicting an event primary label by using a multi-classification model;
s2, collision comparison is carried out on the predicted event primary label and an event expert rule base, the classification difficulty corresponding to the primary label is determined, if the event primary label is an easy-classification label, the step S3 is executed, and if the event primary label is an difficult-classification label, the step S4 is executed; the event expert rule base stores rules in a dictionary structure, event primary labels are used as keys, collision comparison is carried out in the event expert rule base, and corresponding values, namely classification difficulty, are searched;
s3, intelligent marking based on classification: constructing a hierarchical classification model, taking event feedback texts classified as easy-classification labels as input, outputting multi-level event label results, and storing the event label results in an event marking result table;
s4, intelligent marking based on retrieval: constructing a retrieval model, taking event feedback texts classified into tags difficult to classify as input, calculating similarity scores of corresponding vectors and tag vectors of each event, sorting from high to low according to the similarity scores, outputting a plurality of similar tag vectors ranked at the front, if at least one of the similarity scores corresponding to the similar tag vectors is not smaller than a preset threshold value, selecting one tag from the similar tag vectors as an event tag result according to user selection, storing the event tag result in an event marking result table, and otherwise, executing step S5;
s5, constructing a generating model, taking the event feedback text as input, outputting a recommended label, and storing the recommended label in an event marking result table.
2. The method according to claim 1, wherein the training of the multi-classification model, the hierarchical classification model, and the retrieval model is achieved through pre-training followed by fine-tuning, wherein the pre-training process performs self-supervised learning on large-scale unlabeled text data, and the fine-tuning process fixes part of the model parameters determined by the pre-training process and updates the remaining model parameters using labeled event feedback text.
3. The method of claim 1, wherein the activation function of the multi-classification model adopts a softmax function, so that the probabilities output by the model lie in (0, 1) and sum to 1, and the label with the highest probability is taken as the event primary label prediction result.
4. The method for marking intelligent events according to claim 1, wherein the activation function of the hierarchical classification model adopts a sigmoid function, so that the probability of model output is limited between (0, 1), and all labels with probability larger than a preset value are output as multi-level event label results.
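The contrast between claims 3 and 4 can be illustrated with the two activation functions side by side: softmax normalizes the logits into a distribution summing to 1 (single-label choice), while sigmoid scores each label independently in (0, 1) (multi-label output). The logits and the 0.5 threshold below are hypothetical values for illustration:

```python
import math

def softmax(logits):
    # Outputs lie in (0, 1) and sum to 1: pick the single most likely label.
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sigmoid(z):
    # Each output lies independently in (0, 1): keep every label whose
    # probability exceeds a preset value, yielding a multi-label result.
    return 1.0 / (1.0 + math.exp(-z))

logits = [2.0, 0.5, -1.0]                 # hypothetical model outputs
probs = softmax(logits)
primary = probs.index(max(probs))         # single-label prediction (claim 3 style)
multi = [i for i, z in enumerate(logits) if sigmoid(z) > 0.5]  # claim 4 style
```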
5. The intelligent event marking method according to claim 1, wherein the multi-level event label result is an N-level label result, N is 4-6, and N is an integer.
6. The method of claim 1, wherein the search model calculates the similarity score by calculating cosine similarity between vectors.
7. An intelligent event marking apparatus, comprising:
the primary label prediction module is used for acquiring an event feedback text and predicting an event primary label by utilizing a multi-classification model;
the classification difficulty determining module is used for carrying out collision comparison between the predicted event primary label and the event expert rule base to determine the classification difficulty corresponding to the primary label, calling the intelligent marking module based on classification if the event primary label is an easy-classification label, and calling the intelligent marking module based on retrieval if the event primary label is a difficult-classification label;
the intelligent marking module based on classification is used for executing the following steps: constructing a hierarchical classification model, taking event feedback texts classified as easy-classification labels as input, outputting multi-level event label results, and storing the event label results in an event marking result table;
the intelligent marking module based on retrieval is used for executing the following steps: constructing a retrieval model, taking event feedback texts classified as difficult-classification labels as input, calculating similarity scores between the vector corresponding to each event and the label vectors, sorting from high to low according to the similarity scores, and outputting a plurality of top-ranked similar label vectors; if at least one of the similarity scores corresponding to the similar label vectors is not smaller than a preset threshold value, selecting one label from the similar label vectors as the event label result according to user selection and storing the event label result in an event marking result table; otherwise, calling the generation recommendation marking module;
the generation recommendation marking module is used for constructing a generation model, taking the event feedback text as input, outputting a recommendation label and storing the recommendation label in an event marking result table.
8. An intelligent event marking apparatus comprising a memory, a processor, and a program stored in the memory, wherein the processor implements the method of any of claims 1-6 when executing the program.
9. A storage medium having a program stored thereon, wherein the program, when executed, implements the method of any of claims 1-6.
CN202311245716.3A 2023-09-26 2023-09-26 Intelligent event marking method, device and storage medium Active CN116992034B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311245716.3A CN116992034B (en) 2023-09-26 2023-09-26 Intelligent event marking method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311245716.3A CN116992034B (en) 2023-09-26 2023-09-26 Intelligent event marking method, device and storage medium

Publications (2)

Publication Number Publication Date
CN116992034A CN116992034A (en) 2023-11-03
CN116992034B true CN116992034B (en) 2023-12-22

Family

ID=88534063

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311245716.3A Active CN116992034B (en) 2023-09-26 2023-09-26 Intelligent event marking method, device and storage medium

Country Status (1)

Country Link
CN (1) CN116992034B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111045847A (en) * 2019-12-18 2020-04-21 Guangdong OPPO Mobile Telecommunications Co., Ltd. Event auditing method and device, terminal equipment and storage medium
CN111737484A (en) * 2020-05-15 2020-10-02 Zhejiang University of Technology Warning situation knowledge graph construction method based on joint learning
CN112597366A (en) * 2020-11-25 2021-04-02 China Electronics Technology Cyber Security Co., Ltd. Encoder-Decoder-based event extraction method
CN114090781A (en) * 2022-01-20 2022-02-25 Beijing Lingdian Yuanjing Network Technology Co., Ltd. Text data-based repulsion event detection method and device
CN114595333A (en) * 2022-04-27 2022-06-07 Zhejiang Lab Semi-supervision method and device for public opinion text analysis
CN115080748A (en) * 2022-08-16 2022-09-20 Zhejiang Lab Weak supervision text classification method and device based on noisy label learning
CN115203421A (en) * 2022-08-02 2022-10-18 Ping An Life Insurance Company of China, Ltd. Method, device and equipment for generating label of long text and storage medium
WO2022227207A1 (en) * 2021-04-30 2022-11-03 Ping An Technology (Shenzhen) Co., Ltd. Text classification method, apparatus, computer device, and storage medium
CN115422352A (en) * 2022-07-29 2022-12-02 Suzhou Industrial Park Branch of Suzhou Public Security Bureau Event label detection method based on similarity and element knowledge model fusion

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10055481B2 (en) * 2016-07-20 2018-08-21 LogsHero Ltd. Method and system for automatic event classification
CN108334605B (en) * 2018-02-01 2020-06-16 Tencent Technology (Shenzhen) Co., Ltd. Text classification method and device, computer equipment and storage medium

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111045847A (en) * 2019-12-18 2020-04-21 Guangdong OPPO Mobile Telecommunications Co., Ltd. Event auditing method and device, terminal equipment and storage medium
CN111737484A (en) * 2020-05-15 2020-10-02 Zhejiang University of Technology Warning situation knowledge graph construction method based on joint learning
CN112597366A (en) * 2020-11-25 2021-04-02 China Electronics Technology Cyber Security Co., Ltd. Encoder-Decoder-based event extraction method
WO2022227207A1 (en) * 2021-04-30 2022-11-03 Ping An Technology (Shenzhen) Co., Ltd. Text classification method, apparatus, computer device, and storage medium
CN114090781A (en) * 2022-01-20 2022-02-25 Beijing Lingdian Yuanjing Network Technology Co., Ltd. Text data-based repulsion event detection method and device
CN114595333A (en) * 2022-04-27 2022-06-07 Zhejiang Lab Semi-supervision method and device for public opinion text analysis
WO2023092961A1 (en) * 2022-04-27 2023-06-01 Zhejiang Lab Semi-supervised method and apparatus for public opinion text analysis
CN115422352A (en) * 2022-07-29 2022-12-02 Suzhou Industrial Park Branch of Suzhou Public Security Bureau Event label detection method based on similarity and element knowledge model fusion
CN115203421A (en) * 2022-08-02 2022-10-18 Ping An Life Insurance Company of China, Ltd. Method, device and equipment for generating label of long text and storage medium
CN115080748A (en) * 2022-08-16 2022-09-20 Zhejiang Lab Weak supervision text classification method and device based on noisy label learning

Also Published As

Publication number Publication date
CN116992034A (en) 2023-11-03

Similar Documents

Publication Publication Date Title
CN110209806B (en) Text classification method, text classification device and computer readable storage medium
Kadous Temporal classification: Extending the classification paradigm to multivariate time series
EP3173983A1 (en) A method and apparatus for providing automatically recommendations concerning an industrial system
Ye et al. Few-shot learning with a strong teacher
He et al. Dynamic feature selection for dependency parsing
CN113343690B (en) Text readability automatic evaluation method and device
CN111274790A (en) Chapter-level event embedding method and device based on syntactic dependency graph
Zhang et al. n-BiLSTM: BiLSTM with n-gram Features for Text Classification
CN113254675B (en) Knowledge graph construction method based on self-adaptive few-sample relation extraction
US20230386646A1 (en) Combined vision and language learning models for automated medical reports generation
CN116737967B (en) Knowledge graph construction and perfecting system and method based on natural language
Ohashi et al. Convolutional neural network for classification of source codes
CN113836891A (en) Method and device for extracting structured information based on multi-element labeling strategy
Popa et al. Implicit discourse relation classification with syntax-aware contextualized word representations
Athavale et al. Predicting algorithm classes for programming word problems
Basu et al. Word difficulty prediction using convolutional neural networks
Preetham et al. Comparative Analysis of Research Papers Categorization using LDA and NMF Approaches
CN116992034B (en) Intelligent event marking method, device and storage medium
Wang et al. $ k $-Nearest Neighbor Augmented Neural Networks for Text Classification
CN113722477B (en) Internet citizen emotion recognition method and system based on multitask learning and electronic equipment
Pradhan et al. Knowledge graph generation with deep active learning
Ng et al. Sentiment analysis using learning-based approaches: A comparative study
Baad Automatic job skill taxonomy generation for recruitment systems
Baginski Automatic Detection and classification of suicide-related content in English texts
Sithole et al. Mining knowledge graphs to map heterogeneous relations between the internet of things patterns

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant