CN115935245A - Automatic classification and distribution method for government affair hotline cases - Google Patents

Automatic classification and distribution method for government affair hotline cases

Info

Publication number
CN115935245A
CN115935245A
Authority
CN
China
Prior art keywords
case
model
training
department
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310228000.6A
Other languages
Chinese (zh)
Other versions
CN115935245B (en)
Inventor
杨伊态
李颖
李军霞
王敬佩
柯宝宝
黄亚林
张兆文
李成涛
陈胜鹏
付卓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Geospace Information Technology Co ltd
Original Assignee
Geospace Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Geospace Information Technology Co ltd filed Critical Geospace Information Technology Co ltd
Priority to CN202310228000.6A priority Critical patent/CN115935245B/en
Publication of CN115935245A publication Critical patent/CN115935245A/en
Application granted granted Critical
Publication of CN115935245B publication Critical patent/CN115935245B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the field of artificial intelligence and provides an automatic classification and allocation method for government affairs hotline cases, comprising the following steps: S1, establishing a sample set and training a pre-training model; S2, traversing the sample set with the trained pre-training model to generate a case category correlation linked list and a department correlation linked list; S3, using the pre-training model to predict the case category and responsible department with the highest probability for each sample, combining the case category correlation linked list and department correlation linked list to obtain case category enhancement vectors and department enhancement vectors, and obtaining a case classification model and a department allocation model through training and parameter updating; and S4, acquiring input case content, processing it with the case classification model and the department allocation model, and outputting the predicted case classification and allocation results. The method achieves higher accuracy in case classification and allocation, and can automatically classify complaints received by a citizen-service government hotline and allocate them to the competent departments for handling.

Description

Automatic classification and distribution method for government affair hotline cases
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to an automatic classification and allocation method for government affair hotline cases.
Background
In a government affairs service hotline, citizens submit appeals by phone call, WeChat mini-program, APP, portal-website message and other channels, and a hotline operator classifies each case according to the appeal content and then allocates it to the relevant responsibility unit and handling department. With the wide adoption of government service hotlines, on the one hand the volume of accepted cases grows daily and the cost of manual handling rises with it; when a hot social issue arises, the agent seats are often overloaded and cannot meet citizens' demands. On the other hand, hotline cases have many categories with unobvious distinctions between them, and the associated departments are numerous with complex administrative levels, so achieving fast and accurate case classification and allocation has become an urgent problem for improving the efficiency of government hotline service.
At present, the automatic classification and allocation method of the government affair hotline is roughly divided into three categories:
The first category is methods based on rule decision trees. These methods first design filtering and matching rules and then classify cases according to the rules, e.g., keyword matching and knowledge-base lookup. They work well on cases whose categories are clearly distinguished, but their accuracy is poor for classifying and allocating cases with complicated or similar categories.
The second category is methods based on machine learning, which classify and allocate case texts with a machine learning algorithm or model. Common methods include the XGBoost algorithm, cosine similarity and the SVM (support vector machine). These methods can learn more category features and some semantic features; their accuracy on cases with complex or similar categories improves over the first category but is still not ideal.
The third category is methods based on neural networks. These methods extract deep semantic features of the case text with a multi-layer neural network and, compared with machine learning methods, achieve higher accuracy in classifying similar case categories and allocating to similar departments. However, existing neural-network methods still have poor accuracy on highly similar case categories, and on allocating cases of the same class to departments at different administrative levels.
To achieve efficient case classification and allocation in practice, computers currently replace manual work through methods based on rule decision trees, machine learning algorithms and deep neural networks, after which cases are distributed to the relevant responsibility departments for handling. However, existing methods have low accuracy on categories whose distinctions are not obvious, such as the case categories "construction site noise (daytime)", "construction site noise (night)" and "commercial noise problem"; they also have low accuracy in allocating among same-class departments at different levels, e.g., it is difficult to distinguish the case-handling scopes of the "city management committee" and a "district management committee".
Disclosure of Invention
In view of the above problems, the present invention aims to provide an automatic classification and allocation method for government affair hotline cases, which aims to solve the technical problem of low accuracy of the existing method.
The invention adopts the following technical scheme:
the invention provides an automatic classification and allocation method for government affair hotline cases, which comprises the following steps:
s1, establishing a sample set and training a pre-training model;
s2, traversing a sample set according to a trained pre-training model to generate a case category related linked list and a department related linked list;
s3, predicting case categories and departments in charge with highest probability of samples by using a pre-training model, obtaining case category enhancement vectors and department enhancement vectors by combining case category related linked lists and department related linked lists, and obtaining a case classification model and a department allocation model through training and parameter updating;
and S4, acquiring the input case content, processing through a case classification model and a department distribution model, and outputting a predicted case classification result and a predicted division result.
The invention has the beneficial effects that: the invention provides an automatic classification and allocation method for government affairs hotline cases based on a neural network model. The case classification model improves classification accuracy on similar case categories by specifically training on the differences between similar categories; the department allocation model improves allocation accuracy by fusing administrative-division information and specifically training on the differences between the responsible departments of similar cases. Compared with existing methods, the method achieves higher accuracy in case classification and allocation, can automatically classify complaints in a citizen-service government hotline and allocate them to the competent departments for handling, improves hotline service efficiency, reduces labor cost, raises the degree of hotline intelligence and automation, and improves the level of public service.
Drawings
FIG. 1 is a flow chart of an automatic classification and allocation method for government affair hotline cases according to an embodiment of the present invention;
FIG. 2 is a schematic process diagram of a pre-training model provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of case category correlation linked list generation;
FIG. 4 is a process diagram of case classification model and department assignment model.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
As shown in fig. 1, the method for automatically classifying and allocating government affairs hotline cases provided by this embodiment includes the following steps:
s1, establishing a sample set and training a pre-training model.
The step is used for establishing a pre-training model and fine-tuning parameters of the pre-training model, and the specific process of the step is as follows with reference to fig. 2:
s11, establishing a sample set, wherein the sample format is [ case information, area information, case category and administrative department ], the area information is optional, and the sample set is divided into a training sample set and a verification sample set according to the proportion.
For example, sample a is ["The daytime noise keeps people from sleeping; a complaint was filed, but an hour later no one has handled it, far too slow", "Da district", "440", "105"], where 440 and 105 are the case category code and the department code, respectively.
And S12, for each sample in the training sample set, converting the case information in the sample into an embedded vector by using a BERT model.
First, if the sample has area information, it is concatenated with the case information to obtain the merged case information. For example, the merged case information of sample a is: "In the Da district, the daytime noise keeps people from sleeping; a complaint was filed, but an hour later no one has handled it, far too slow".
Then the merged case information is converted into the corresponding token codes by the BERT tokenizer, and special-character codes are added at the head and tail to form the token codes of the case information. In this embodiment, the BERT model is Chinese-BERT-wwm-ext (BERT: Bidirectional Encoder Representations from Transformers).
For example, the merged information of sample a is converted into token codes as follows: [101, 1515, 1515, 1277, 1921, 1692, 7509, 6375, 782, 3187, 3791, 4717, 6230, 8024, 2347, 2832, 6401, 8024, 6814, 749, 671, 702, 2207, 3198, 738, 3766, 782, 5052, 8024, 1922, 2714, 8024, 1922, 2714, 102]. Here 101 is the code of the special character [CLS] and 102 is the code of the special character [SEP].
Finally, the token codes of the case information are input into the BERT model to obtain the embedding vector E_CLS of the special character [CLS] and a token vector for each token, [E_1, E_2, E_3, …, E_n], where n is the number of tokens.
The BERT model usually has two main outputs: the embedding vector of the special character [CLS], and the token vector of each token. The [CLS] embedding vector is obtained by pooling all the token vectors. Simple classification typically uses the [CLS] embedding vector directly, but the individual token vectors can be used if the model requires them. In general the [CLS] embedding vector is not used together with the token vectors; it is essentially derived from the token vectors and represents the semantic features of the whole text. Different samples yield different embedding vectors.
S13, inputting the embedding vector into two linear layers respectively, feeding each linear layer's output into a SOFTMAX layer to obtain the case category number and the responsible-department number respectively, and updating the model parameters of the pre-training model by gradient descent.
The two linear layers are L1 and L2. In the training phase of the pre-training model, the embedding vector E_CLS is input into linear layer L1, the loss value is computed with a cross entropy function, and the model parameters are updated by gradient descent. The input dimension of linear layer L1 equals the length of the embedding vector E_CLS, and its output dimension is the number of case categories N_class. In the same way, the embedding vector E_CLS is input into linear layer L2, the loss value is computed with a cross entropy function, and the model parameters are updated by gradient descent. The input dimension of linear layer L2 equals the length of E_CLS, and its output dimension is the number of departments N_depa. This embodiment uses the PyTorch framework, and the cross entropy function used is CrossEntropyLoss().
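As an informal illustration of this training step, the single-sample cross entropy computation that CrossEntropyLoss() performs can be sketched in plain Python (toy logits only; the real implementation operates on batched tensors and is fused with log-softmax):

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating, for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, target):
    # Negative log-probability assigned to the correct class index.
    probs = softmax(logits)
    return -math.log(probs[target])

# Toy logits for 3 case categories; assume the correct category index is 0.
logits = [2.0, 1.0, 0.1]
loss = cross_entropy(logits, target=0)
```

The loss shrinks as the model assigns more probability to the correct class, which is what gradient descent on this quantity encourages.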
When used in subsequent models, in the prediction phase the embedding vector E_CLS is input into linear layer L1 and then into a softmax layer to obtain the probability value of each case category, and the case category number with the highest probability value is taken as the prediction result. In the same way, E_CLS is input into linear layer L2 and then into a softmax layer to obtain the probability value of each department, and the department number with the highest probability value is taken as the prediction result.
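The selection of the highest-probability number (and of the top-k numbers, as used later when building the correlation linked lists) can be sketched as follows; the helper names are illustrative, not from the patent:

```python
def predict(probs):
    # Index of the highest probability value = predicted category/department number.
    return max(range(len(probs)), key=lambda i: probs[i])

def top_k(probs, k):
    # Numbers of the k highest-probability classes, in descending probability order.
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

probs = [0.05, 0.6, 0.1, 0.25]
best = predict(probs)        # index 1 has the highest probability
best2 = top_k(probs, 2)      # [1, 3]
```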
S14, after the pre-training model is iteratively trained with the training sample set, the accuracy of the model is verified with the verification sample set, and the model with the highest verification accuracy is taken as the trained pre-training model; in the prediction phase, the output of the pre-training model is the probability value of each case category and department for the sample.
The pre-training model is iteratively trained on the training sample set as above, its accuracy is verified with the verification sample set, and the model with the highest accuracy is selected. In concrete implementation:
firstly, after the pre-training model traverses all training sample sets, parameters are frozen, in the embodiment, a pitoch frame is used, and parameters are frozen by using a model.
Then each sample in the verification sample set is input into the pre-training model, obtaining a predicted case category and a predicted department for each sample. For each verification sample, the prediction is correct if the predicted case category is consistent with the case category in the sample; otherwise it is wrong. Department predictions are judged the same way. The accuracy is the number of correct predictions divided by the total number of verification samples, and the overall verification accuracy of the pre-training model is (case accuracy + department accuracy) / 2. If this accuracy is higher than the current best, the model parameters of this version are saved and the current best accuracy is updated.
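A minimal sketch of this verification-accuracy computation (the prediction and label lists below are made up for illustration):

```python
def accuracy(preds, labels):
    # Fraction of verification samples whose prediction matches the label.
    correct = sum(1 for p, y in zip(preds, labels) if p == y)
    return correct / len(labels)

# Hypothetical predictions vs. correct labels over 4 verification samples.
case_acc = accuracy([449, 326, 1, 2], [449, 326, 1, 1])       # 3 of 4 correct
dept_acc = accuracy([105, 100, 13, 13], [105, 100, 100, 13])  # 3 of 4 correct
overall = (case_acc + dept_acc) / 2
```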
And S2, traversing the sample set according to the trained pre-training model to generate a case category related linked list and a department related linked list.
The process of generating the case category correlation linked list is as follows:
s211, setting an empty case type correlation linked list, wherein in the correlation linked list, the position sequence number x stores the correlation case type of the case type x, and the length of the correlation linked list is N class
S212, inputting each sample into the trained pre-training model to obtain the probability value of each case category, and outputting in order the numbers of the k_1 case categories with the largest probability values;
s213, comparing the output case types with the correct case types of the sample one by one in sequence, and ending when the comparison is consistent or the comparison is not consistent; when inconsistency occurs in the comparison process, recording information in a position corresponding to the sample correct case type serial number of the related linked list, and rearranging the information from large to small according to the count; if the current case type is y and the correct case type is z, the information recording rule is as follows: if the information of the correct case type z exists in the position y of the related linked list, adding 1 to the information count corresponding to the correct case type z; if there is no correct case category z information, add information { case category z: count 1}.
In this step, each sample is fed to the trained pre-training model to obtain the probability value of each case category, the k_1 case categories with the largest probability values are selected, and their numbers are output in descending order of probability.
Then the output case categories are compared one by one with the sample's correct case category, starting from the category with the largest probability value. If the current case category y is inconsistent with the correct case category z recorded in the sample, information is recorded at position y of the case category correlation linked list. The recording rule is: if information for category z already exists at position y, its count is increased by 1; otherwise the record {case category z: count 1} is added.
The comparison is repeated until a case category y matches the correct case category z of the sample record, or all k_1 case categories have been compared. When the model prediction is correct, the first comparison is consistent and no information is recorded; when all k_1 predicted case categories are inconsistent, k_1 records are made.
Fig. 3 illustrates the generation of the case category correlation linked list, taking the top-3 case categories with 4 samples as input.
Input sample 1, model prediction result 1: the top-3 case categories by predicted probability are 449, 326, 11; the correct case category is 326.
The predicted case category 449 does not match the correct category 326, so {326: 1} is recorded at position 449 of the case category correlation linked list.
The predicted case category 326 matches the correct category 326, and the comparison ends.
Input sample 2, model prediction result 2: the top-3 case categories are 449, 440, 326; the correct case category is 326.
The predicted category 449 does not match the correct category 326; a record for category 326 already exists at position 449, so its count is increased by 1, giving {326: 2}.
The predicted category 440 does not match the correct category 326, so {326: 1} is recorded at position 440.
The predicted category 326 matches the correct category 326, and the comparison ends.
Input sample 3, model prediction result 3: the top-3 case categories are 449, 440, 1; the correct case category is 1.
The predicted category 449 does not match the correct category 1, so {1: 1} is added at position 449.
The records at position 449 are sorted by count: {326: 2}, {1: 1}.
The predicted category 440 does not match the correct category 1, so {1: 1} is added at position 440.
The records at position 440 are sorted by count: {326: 1}, {1: 1}.
The predicted category 1 matches the correct category 1, and the comparison ends.
Input sample 4, model prediction result 4: the top-3 case categories are 1, 2, 449; the correct case category is 1.
The predicted category 1 matches the correct category 1, and the comparison ends.
This completes the generation of the case category correlation linked list.
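The whole generation procedure (S211 to S213) can be sketched in plain Python, using a dict of dicts in place of the linked-list positions; the four samples below are the ones from the worked example:

```python
def build_correlation_table(samples):
    """Build the correlation table from (top_k_predictions, correct_label) pairs.

    table[y] collects, for a wrongly predicted class y, how often each correct
    class was the ground truth; each position is kept sorted by descending count.
    """
    table = {}
    for preds, correct in samples:
        for y in preds:
            if y == correct:  # first consistent comparison: stop recording
                break
            entry = table.setdefault(y, {})
            entry[correct] = entry.get(correct, 0) + 1
    # Re-sort each position's records by count, largest first.
    return {y: sorted(e.items(), key=lambda kv: -kv[1]) for y, e in table.items()}

# The four samples from the example: (top-3 predicted categories, correct category).
samples = [([449, 326, 11], 326),
           ([449, 440, 326], 326),
           ([449, 440, 1], 1),
           ([1, 2, 449], 1)]
table = build_correlation_table(samples)
# table[449] == [(326, 2), (1, 1)]; table[440] == [(326, 1), (1, 1)]
```

Positions 449 and 440 end up with exactly the records derived in the figure; sample 4 contributes nothing because its first prediction is already correct.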
The department correlation linked list is generated in the same way as the case category correlation linked list: S221, setting up an empty department correlation linked list; S222, inputting each sample into the trained pre-training model to obtain the probability value of each department, and outputting in order the numbers of the k_1 departments with the largest probability values; S223, comparing the output departments one by one, in order, with the sample's correct department, ending when the first comparison is consistent or all comparisons are inconsistent. When an inconsistency occurs, information is recorded at the position of the correlation linked list corresponding to the current predicted department number, and the records are re-sorted by count from large to small. If the current department is y' and the correct department is z', the recording rule is: if information for the correct department z' already exists at position y' of the correlation linked list, its count is increased by 1; otherwise the record {department z': count 1} is added.
S3, predicting the case category and responsible department with the highest probability for the sample using the pre-training model, obtaining a case category enhancement vector and a department enhancement vector by combining the case category correlation linked list and the department correlation linked list, and obtaining a case classification model and a department allocation model through training and parameter updating.
As shown in FIG. 4, the case classification model and the department distribution model are trained in the same process. For obtaining a case classification model, the specific process is as follows:
s311, predicting the case type c with the highest probability value of the current sample by using the pre-training model.
First, the sample set is input and divided into a training sample set and a verification sample set in a given proportion, and the case category c with the highest probability for the sample is predicted by the pre-training model.
After sample a is converted into token codes and input into the BERT model, the embedding vector E_CLS and the token vectors [E_1, E_2, E_3, …, E_n] are obtained. The embedding vector E_CLS is input into linear layer L1 and then a softmax layer, and the case category c with the highest probability value is predicted. Assume the predicted case category is 449 and the predicted department is 105.
S312, taking out the top k_2 case categories related to case category c from position c of the case category correlation linked list, denoted c_1, c_2, …, c_k2.
S313, taking out from the case category vector group the case category vectors with numbers c_1, c_2, …, c_k2 and the case category vector with number c, denoted Rc_1, Rc_2, …, Rc_k2 and Rc respectively.
The case category vector group is a matrix of dimension [N_class, E], where N_class is the number of case categories and E is the token dimension, consistent with the length of the embedding vector E_CLS, typically 768.
For example, assume the predicted case category is 449 and k_2 = 2; the top-2 categories most relevant to case category 449 are 362 and 1, so the case category vectors in rows 362, 1 and 449 are taken from the case category vector group.
S314, calculating the case category enhancement vector corresponding to each case category vector:
R̂c_i = Σ_{j=1…n} (Rc_i ⊙ E_j), i = 1, 2, …, k_2, and R̂c = Σ_{j=1…n} (Rc ⊙ E_j),
where E_j is the j-th token vector of the sample, n is the number of tokens, and ⊙ denotes the element-wise (dot) product of the category vector with the token vector.
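Under the element-wise reading of the product above, the enhancement computation can be sketched with toy vectors (dimension 4 instead of the typical 768):

```python
def enhance(category_vec, token_vecs):
    # Element-wise multiply the category vector with each token vector,
    # then sum the n resulting vectors into one enhancement vector.
    dim = len(category_vec)
    out = [0.0] * dim
    for tok in token_vecs:
        for k in range(dim):
            out[k] += category_vec[k] * tok[k]
    return out

# Toy example: one category vector of dimension 4, three token vectors.
Rc = [1.0, 2.0, 0.5, -1.0]
tokens = [[1.0, 0.0, 2.0, 1.0],
          [0.0, 1.0, 2.0, 1.0],
          [1.0, 1.0, 0.0, 0.0]]
enhanced = enhance(Rc, tokens)   # [2.0, 4.0, 2.0, -2.0]
```

Each output component is the category-vector component scaled by the sum of the corresponding token-vector components, so the result keeps the [1, E] shape of the category vector.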
S315, inputting the case category enhancement vectors into two linear layers in sequence and then into a SOFTMAX layer.
The case category enhancement vectors R̂c_1, R̂c_2, …, R̂c_k2, R̂c are input into two linear layers L3 and L4. The input and output dimensions of linear layer L3 and the input dimension of linear layer L4 all equal the length of the embedding vector E_CLS; the output dimension of linear layer L4 is the number of case categories N_class.
For example, for the category vector corresponding to case category 362, the dimension is [1, E] and each token vector is [1, E]; the category vector is element-wise multiplied with each token vector to obtain n category-token vectors of dimension [1, E], which are then summed to obtain a case category enhancement vector of dimension [1, E].
Then the case category enhancement vector of dimension [1, E] is input into linear layer L3, giving a result of dimension [1, E]; this result is input into linear layer L4, giving a result of dimension [1, N_class]. Finally the result is input into a SOFTMAX layer for normalization.
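The dimension flow through the two linear layers can be illustrated with a naive plain-Python matrix multiplication (toy sizes E = 4 and N_class = 3; the identity weights for L3 and constant weights for L4 are placeholders, not trained values):

```python
def matmul(a, b):
    # Naive matrix multiplication: [p, q] x [q, r] -> [p, r].
    p, q, r = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(q)) for j in range(r)]
            for i in range(p)]

E, n_class = 4, 3
enhanced_vec = [[2.0, 4.0, 2.0, -2.0]]  # [1, E] enhancement vector
w3 = [[1.0 if i == j else 0.0 for j in range(E)] for i in range(E)]  # L3: [E, E]
w4 = [[0.1] * n_class for _ in range(E)]                             # L4: [E, N_class]
h = matmul(enhanced_vec, w3)   # stays [1, E]
logits = matmul(h, w4)         # becomes [1, N_class], ready for softmax
```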
S316, calculating the loss value with a cross entropy function and updating the model parameters by gradient descent; only linear layers L3 and L4 and the case category vector group are trained, while the parameters of the BERT model, linear layer L1 and linear layer L2 remain frozen, i.e., they are not updated during this iterative training.
S317, after the case classification model is iteratively trained with the training sample set, the accuracy of the model is verified with the verification sample set, and the version of the model with the highest verification accuracy is taken as the trained case classification model.
After the case classification model has traversed the whole training sample set, the parameters are frozen; this embodiment uses the PyTorch framework. Each sample in the verification sample set is input into the case classification model to obtain a predicted case category; for each verification sample, the prediction is correct if the predicted case category is consistent with the case category in the sample, otherwise it is wrong. The accuracy is the number of correct predictions divided by the total number of verification samples. If the accuracy is higher than the current best, the case classification model parameters of this version are saved and the current best accuracy is updated.
The department allocation model is trained by the same process, briefly described as follows:
S321, predicting the department d with the highest probability value for the current sample using the pre-training model; specifically, the embedding vector E_CLS is input into linear layer L2 and then a softmax layer, and the department d with the highest probability value is predicted.
S322, taking out the top k_2 departments related to department d from position d of the department correlation linked list, denoted d_1, d_2, …, d_k2.
S323, taking out from the department vector group the department vectors with numbers d_1, d_2, …, d_k2 and d, denoted Rd_1, Rd_2, …, Rd_k2 and Rd respectively. For example, the top-2 departments most relevant to department 105 are 100 and 13, so the department vectors in rows 100, 13 and 105 are taken from the department vector group.
S324, calculating the department enhancement vector corresponding to each department vector:
R̂d_i = Σ_{j=1…n} (Rd_i ⊙ E_j), i = 1, 2, …, k_2, and R̂d = Σ_{j=1…n} (Rd ⊙ E_j),
where E_j is the j-th token vector of the sample and ⊙ denotes the element-wise (dot) product; the department vector group is a matrix of dimension [N_depa, E], where N_depa is the number of departments.
S325, inputting the department enhancement vectors R̂d_1, R̂d_2, …, R̂d_k2, R̂d into two linear layers L5 and L6 in sequence and then into a SOFTMAX layer; the input and output dimensions of linear layer L5 and the input dimension of linear layer L6 all equal the length of the embedding vector E_CLS, and the output dimension of linear layer L6 is the number of departments N_depa.
S326, calculating the loss value with a cross entropy function and updating the model parameters by gradient descent.
S327, after the department allocation model is iteratively trained with the training sample set, verifying its accuracy with the verification sample set and taking the model with the highest verification accuracy as the trained department allocation model.
And S4, acquiring the input case content, processing it through the case classification model and the department allocation model, and outputting the predicted case classification result and the predicted department allocation result.
Firstly, the case content is input, and area information is optionally input.
Then, prediction is carried out with the pre-training model to obtain the case category number and the department number with the highest probability values.
Next, the output results of linear layer L4 and linear layer L6 are obtained from the case classification model and the department allocation model: the output of linear layer L4 is input into the softmax layer to obtain a probability value for each case category, and the case category number with the highest probability value is taken as the predicted case classification result; the output of linear layer L6 is input into the softmax layer to obtain a probability value for each department, and the department number with the highest probability value is taken as the predicted department allocation result.
Finally, the predicted case classification result and the predicted department allocation result are output as the final result.
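The prediction flow of step S4 can be sketched as follows; the weight matrices stand in for the trained linear layers L4 and L6, and all names and sizes are hypothetical:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict(e_cls, W4, b4, W6, b6):
    """Sketch of step S4: score the case embedding with the classification
    head (linear layer L4) and the allocation head (linear layer L6),
    softmax each, and return the highest-probability serial numbers."""
    cat_probs = softmax(W4 @ e_cls + b4)      # probability per case category
    dep_probs = softmax(W6 @ e_cls + b6)      # probability per department
    return int(cat_probs.argmax()), int(dep_probs.argmax())

rng = np.random.default_rng(1)
E, n_cat, n_dep = 8, 6, 4
category, department = predict(rng.normal(size=E),
                               rng.normal(size=(n_cat, E)), rng.normal(size=n_cat),
                               rng.normal(size=(n_dep, E)), rng.normal(size=n_dep))
print(category, department)
```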
In summary, the invention provides an automatic classification and allocation method for government affair hotline cases. Based on a neural network model, the method uses a two-step training approach to improve the model's classification accuracy on similar case categories, and integrates administrative division information to improve the accuracy of case allocation. Compared with existing methods, it achieves higher accuracy in both case classification and case allocation. The method can automatically classify complaints received by the citizen-service government hotline and dispatch them to the competent departments for handling, reducing labor cost and improving the hotline's level of intelligence and automation.
The above description is intended to be illustrative of the preferred embodiment of the present invention and should not be taken as limiting the invention, but rather, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

Claims (4)

1. A government affair hotline case automatic classification and distribution method is characterized by comprising the following steps:
s1, establishing a sample set and training a pre-training model;
s2, traversing a sample set according to a trained pre-training model to generate a case category related linked list and a department related linked list;
s3, predicting case categories and departments in charge with highest probability of samples by using a pre-training model, obtaining case category enhancement vectors and department enhancement vectors by combining case category related linked lists and department related linked lists, and obtaining a case classification model and a department allocation model through training and parameter updating;
and S4, acquiring the input case content, processing it through the case classification model and the department allocation model, and outputting the predicted case classification result and the predicted department allocation result.
2. The method for automatically classifying and allocating government hotline cases according to claim 1, wherein the step S1 is implemented as follows:
S11, establishing a sample set, wherein the sample format is [case information, area information, case category, administrative department], the area information being optional, and dividing the sample set into a training sample set and a verification sample set in a preset ratio;
s12, for each sample in the training sample set, converting case information in the sample into an embedded vector by using a BERT model;
S13, inputting the embedded vector into two linear layers respectively, feeding the output of each linear layer to a softmax layer to obtain the case category number and the department number respectively, and updating the model parameters of the pre-training model by using a gradient descent method;
and S14, after iterative training is carried out on the pre-training model by using the training sample set, the accuracy of the model is verified by using the verification sample set, the model with the highest verification accuracy is used as the trained pre-training model, and the output of the pre-training model in the prediction stage is the probability value of each case type and department to which the sample belongs.
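Claim 2's steps S12–S14 can be sketched as follows. This is a minimal stand-in, not the patent's implementation: random vectors replace the BERT embeddings, and each "linear layer + softmax" head is trained as plain softmax regression under cross-entropy loss by gradient descent (all sizes hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
E, n_cat, n_dep = 16, 4, 3           # embedding size, #case categories, #departments

# Stand-ins for the BERT [CLS] embeddings of the training samples (S12);
# a real implementation would obtain these from a BERT encoder.
X = rng.normal(size=(32, E))
y_cat = rng.integers(0, n_cat, size=32)
y_dep = rng.integers(0, n_dep, size=32)

W_cat = np.zeros((E, n_cat))         # case category head (one linear layer)
W_dep = np.zeros((E, n_dep))         # department head (the other linear layer)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

lr = 0.1
for _ in range(200):                 # S13: both heads, gradient descent
    for W, y in ((W_cat, y_cat), (W_dep, y_dep)):
        p = softmax(X @ W)                   # softmax layer
        p[np.arange(len(y)), y] -= 1.0       # dCE/dlogits for cross-entropy
        W -= lr * (X.T @ p) / len(y)         # in-place parameter update

# S14 in spirit: measure accuracy (here on the training stand-in itself)
acc = (softmax(X @ W_cat).argmax(axis=1) == y_cat).mean()
print(round(float(acc), 2))
```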
3. The method for automatically classifying and allocating government affair hotline cases according to claim 2, wherein in the step S2, the process of generating the case category related linked list and the department related linked list is the same;
for generating the case category correlation linked list, the process is as follows:
s211, setting an empty case type correlation linked list, wherein in the correlation linked list, the position serial number x stores the correlation case type of the case type x;
s212, inputting each sample into the trained pre-training model to obtain the probability value of each case type to which the sample belongs, and sequentially outputting the front k with the maximum probability value 1 The serial number of each case type;
s213, comparing the output case categories with the correct case categories of the sample one by one in sequence, and ending when the comparison is consistent or all the comparisons are inconsistent; when inconsistency occurs in the comparison process, recording information in a position corresponding to the sample correct case type serial number of the related linked list, and rearranging the information from large to small according to the counting; if the current case type is y and the correct case type is z, the information recording rule is as follows: if the information of the correct case type z exists in the position y of the related linked list, adding 1 to the information count corresponding to the correct case type z; if there is no correct case category z information, add information { case category z: count 1}.
4. The method for automatically classifying and allocating government affair hotline cases according to claim 3, wherein in the step S3, the process of obtaining the case classification model and the department allocation model is the same;
the specific process for obtaining the case classification model is as follows:
s311, predicting the case type c with the highest probability value of the current sample by using a pre-training model;
s312, taking out from the case type related linked list sequence number cFront k 2 The case categories related to the case category c are respectively marked as c 1 ,c 2 …c k2
S313, taking out the case type vector group with the serial numbers c 1 ,c 2 …c k2 And case type vector with sequence number c, respectively denoted as Rc 1 ,Rc 2 …Rc k2 ,Rc;
S314, calculating the case category enhancement vectors corresponding to the case category vectors Rc1, Rc2, …, Rck2 and Rc; in the enhancement formulas (presented as figures), Tj is the jth lemma vector of the sample and the vector operator denotes the dot product;
s315, inputting the case type enhancement vectors to two linear layers in sequence, and then inputting the case type enhancement vectors to a SOFTMAX layer;
s316, calculating a loss value by using a cross entropy function, and updating model parameters by using a gradient descent method;
s317, after iterative training is carried out on the case classification model through the training sample set, the accuracy of the model is verified through the verification sample set, and the version of the model with the highest verification accuracy is used as the well-trained case classification model.
CN202310228000.6A 2023-03-10 2023-03-10 Automatic classification and allocation method for government affair hot line cases Active CN115935245B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310228000.6A CN115935245B (en) 2023-03-10 2023-03-10 Automatic classification and allocation method for government affair hot line cases

Publications (2)

Publication Number Publication Date
CN115935245A true CN115935245A (en) 2023-04-07
CN115935245B CN115935245B (en) 2023-05-26

Family

ID=85818574

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310228000.6A Active CN115935245B (en) 2023-03-10 2023-03-10 Automatic classification and allocation method for government affair hot line cases

Country Status (1)

Country Link
CN (1) CN115935245B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116611453A (en) * 2023-07-19 2023-08-18 天津奇立软件技术有限公司 Intelligent order-distributing and order-following method and system based on big data and storage medium
CN116861302A (en) * 2023-09-05 2023-10-10 吉奥时空信息技术股份有限公司 Automatic case classifying and distributing method

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239529A (en) * 2017-05-27 2017-10-10 中国矿业大学 A kind of public sentiment hot category classification method based on deep learning
US20180204135A1 (en) * 2017-01-18 2018-07-19 Wipro Limited Systems and methods for improving accuracy of classification-based text data processing
WO2019149200A1 (en) * 2018-02-01 2019-08-08 腾讯科技(深圳)有限公司 Text classification method, computer device, and storage medium
WO2019214133A1 (en) * 2018-05-08 2019-11-14 华南理工大学 Method for automatically categorizing large-scale customer complaint data
CN111177367A (en) * 2019-11-11 2020-05-19 腾讯科技(深圳)有限公司 Case classification method, classification model training method and related products
CN112488551A (en) * 2020-12-11 2021-03-12 浪潮云信息技术股份公司 XGboost algorithm-based hot line intelligent order dispatching method
CN112581106A (en) * 2021-02-23 2021-03-30 苏州工业园区测绘地理信息有限公司 Government affair event automatic order dispatching method fusing grid semantics of handling organization
CN112800232A (en) * 2021-04-01 2021-05-14 南京视察者智能科技有限公司 Big data based case automatic classification and optimization method and training set correction method
WO2022093982A1 (en) * 2020-10-30 2022-05-05 Convey, Llc Machine learning event classification and automated case creation
CN114547315A (en) * 2022-04-25 2022-05-27 湖南工商大学 Case classification prediction method and device, computer equipment and storage medium
CN115242487A (en) * 2022-07-19 2022-10-25 浙江工业大学 APT attack sample enhancement and detection method based on meta-behavior
CN115344695A (en) * 2022-07-27 2022-11-15 中国人民解放军空军工程大学 Service text classification method based on field BERT model
CN115455315A (en) * 2022-11-10 2022-12-09 吉奥时空信息技术股份有限公司 Address matching model training method based on comparison learning
CN115659974A (en) * 2022-09-30 2023-01-31 中国科学院软件研究所 Software security public opinion event extraction method and device based on open source software supply chain


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JINGYU LUO et al.: "Research on Civic Hotline Complaint Text Classification Model Based on word2vec", 2018 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery *
XUE Bin: "Design of a hot-spot mining *** for civic hotline texts", Journal of China Jiliang University *


Also Published As

Publication number Publication date
CN115935245B (en) 2023-05-26

Similar Documents

Publication Publication Date Title
CN115935245A (en) Automatic classification and distribution method for government affair hotline cases
CN109446331B (en) Text emotion classification model establishing method and text emotion classification method
CN109325116B (en) Urban event automatic classification dispatching method and device based on deep learning
CN102591854B (en) For advertisement filtering system and the filter method thereof of text feature
CN111694924A (en) Event extraction method and system
CN112182246B (en) Method, system, medium, and application for creating an enterprise representation through big data analysis
CN110750635A (en) Joint deep learning model-based law enforcement recommendation method
CN109543764B (en) Early warning information validity detection method and detection system based on intelligent semantic perception
CN112036842B (en) Intelligent matching device for scientific and technological service
CN111274817A (en) Intelligent software cost measurement method based on natural language processing technology
CN111125520B (en) Event line extraction method based on deep clustering model for news text
CN110807324A (en) Video entity identification method based on IDCNN-crf and knowledge graph
CN111581368A (en) Intelligent expert recommendation-oriented user image drawing method based on convolutional neural network
CN115936624A (en) Basic level data management method and device
CN116541755A (en) Financial behavior pattern analysis and prediction method based on time sequence diagram representation learning
CN115905538A (en) Event multi-label classification method, device, equipment and medium based on knowledge graph
CN109543038B (en) Emotion analysis method applied to text data
CN113869054A (en) Deep learning-based electric power field project feature identification method
CN116843162B (en) Contradiction reconciliation scheme recommendation and scoring system and method
CN113920379A (en) Zero sample image classification method based on knowledge assistance
CN110705310B (en) Article generation method and device
CN113742498B (en) Knowledge graph construction and updating method
CN115688729A (en) Power transmission and transformation project cost data integrated management system and method thereof
CN113657091A (en) Government affair hot line work order allocation method based on event extraction and authority list
CN111782803B (en) Work order processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant