CN115935245A - Automatic classification and distribution method for government affair hotline cases - Google Patents

Automatic classification and distribution method for government affair hotline cases

Info

Publication number
CN115935245A
CN115935245A
Authority
CN
China
Prior art keywords
case
model
training
department
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310228000.6A
Other languages
Chinese (zh)
Other versions
CN115935245B (en)
Inventor
杨伊态
李颖
李军霞
王敬佩
柯宝宝
黄亚林
张兆文
李成涛
陈胜鹏
付卓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Geospace Information Technology Co ltd
Original Assignee
Geospace Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Geospace Information Technology Co ltd filed Critical Geospace Information Technology Co ltd
Priority to CN202310228000.6A priority Critical patent/CN115935245B/en
Publication of CN115935245A publication Critical patent/CN115935245A/en
Application granted granted Critical
Publication of CN115935245B publication Critical patent/CN115935245B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the field of artificial intelligence and provides an automatic classification and allocation method for government affairs hotline cases, comprising the following steps: S1, establishing a sample set and training a pre-training model; S2, traversing the sample set with the trained pre-training model to generate a case category correlation linked list and a department correlation linked list; S3, using the pre-training model to predict the case category and responsible department with the highest probability for each sample, combining the case category correlation linked list and department correlation linked list to obtain case category enhancement vectors and department enhancement vectors, and obtaining a case classification model and a department allocation model through training and parameter updating; and S4, acquiring input case content, processing it with the case classification model and the department allocation model, and outputting the predicted case classification and allocation results. The method achieves higher accuracy in case classification and allocation, and can automatically classify complaints received by a citizen-service government hotline and allocate them to the competent departments for handling.

Description

Automatic classification and distribution method for government affair hotline cases
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to an automatic classification and allocation method for government affair hotline cases.
Background
In a government affairs service hotline, citizens submit appeals by phone call, WeChat mini-program, APP, portal-website message and other channels, and a hotline operator classifies each case according to the appeal content and then allocates it to the relevant responsibility unit and handling department. With the wide adoption of government service hotlines, on the one hand the volume of accepted cases grows daily and the cost of manual handling rises with it; when a hot social issue arises, the agent seats are often overloaded and cannot meet citizens' demands. On the other hand, hotline cases have many categories with unobvious distinctions between them, and the associated departments are numerous with complex administrative levels, so achieving fast and accurate case classification and allocation has become an urgent problem for improving the efficiency of government hotline service.
At present, the automatic classification and allocation method of the government affair hotline is roughly divided into three categories:
The first category is methods based on rule decision trees. These methods first design filtering and matching rules and then classify cases according to the rules, e.g., keyword matching and knowledge-base lookup. They work well on cases whose categories are clearly distinguished, but their accuracy is poor for classifying and allocating cases with complicated or similar categories.
The second category is methods based on machine learning, which classify and allocate case texts with a machine learning algorithm or model. Common methods include the XGBoost algorithm, cosine similarity and the SVM (support vector machine). These methods can learn more category features and some semantic features; their accuracy on cases with complex or similar categories improves over the first category but is still not ideal.
The third category is methods based on neural networks. These methods extract deep semantic features of the case text with a multi-layer neural network and, compared with machine learning methods, achieve higher accuracy in classifying similar case categories and allocating to similar departments. However, existing neural-network methods still have poor accuracy on highly similar case categories, and on allocating cases of the same class to departments at different administrative levels.
To achieve efficient case classification and allocation in practice, computers currently replace manual work through methods based on rule decision trees, machine learning algorithms and deep neural networks, after which cases are distributed to the relevant responsibility departments for handling. However, existing methods have low accuracy on categories whose distinctions are not obvious, such as the case categories "construction site noise (daytime)", "construction site noise (night)" and "commercial noise problem"; they also have low accuracy in allocating among same-class departments at different levels, e.g., it is difficult to distinguish the case-handling scopes of the "city management committee" and a "district management committee".
Disclosure of Invention
In view of the above problems, the present invention aims to provide an automatic classification and allocation method for government affair hotline cases, which aims to solve the technical problem of low accuracy of the existing method.
The invention adopts the following technical scheme:
the invention provides an automatic classification and allocation method for government affair hotline cases, which comprises the following steps:
s1, establishing a sample set and training a pre-training model;
s2, traversing a sample set according to a trained pre-training model to generate a case category related linked list and a department related linked list;
s3, predicting case categories and departments in charge with highest probability of samples by using a pre-training model, obtaining case category enhancement vectors and department enhancement vectors by combining case category related linked lists and department related linked lists, and obtaining a case classification model and a department allocation model through training and parameter updating;
and S4, acquiring the input case content, processing through a case classification model and a department distribution model, and outputting a predicted case classification result and a predicted division result.
The invention has the beneficial effects that: the invention provides an automatic classification and allocation method for government affairs hotline cases based on a neural network model. The case classification model improves classification accuracy on similar case categories by specifically training on the differences between similar categories; the department allocation model improves allocation accuracy by fusing administrative-division information and specifically training on the differences between the responsible departments of similar cases. Compared with existing methods, the method achieves higher accuracy in case classification and allocation, can automatically classify complaints in a citizen-service government hotline and allocate them to the competent departments for handling, improves hotline service efficiency, reduces labor cost, raises the degree of hotline intelligence and automation, and improves the level of public service.
Drawings
FIG. 1 is a flow chart of an automatic classification and allocation method for government affair hotline cases according to an embodiment of the present invention;
FIG. 2 is a schematic process diagram of a pre-training model provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of case category correlation linked list generation;
FIG. 4 is a process diagram of case classification model and department assignment model.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
As shown in fig. 1, the method for automatically classifying and allocating government affairs hotline cases provided by this embodiment includes the following steps:
s1, establishing a sample set and training a pre-training model.
The step is used for establishing a pre-training model and fine-tuning parameters of the pre-training model, and the specific process of the step is as follows with reference to fig. 2:
s11, establishing a sample set, wherein the sample format is [ case information, area information, case category and administrative department ], the area information is optional, and the sample set is divided into a training sample set and a verification sample set according to the proportion.
For example, sample a is ["The daytime noise keeps people from sleeping; a complaint was filed, but an hour later no one has handled it, far too slow", "Da district", "440", "105"], where 440 and 105 are the case category code and the department code, respectively.
And S12, for each sample in the training sample set, converting the case information in the sample into an embedded vector by using a BERT model.
First, if the sample has area information, it is concatenated with the case information to obtain the merged case information. For example, the merged case information of sample a is: "In the Da district, the daytime noise keeps people from sleeping; a complaint was filed, but an hour later no one has handled it, far too slow".
Then the merged case information is converted into the corresponding token codes by the BERT tokenizer, and special-character codes are added at the head and tail to form the token codes of the case information. In this embodiment, the BERT model is Chinese-BERT-wwm-ext (BERT: Bidirectional Encoder Representations from Transformers).
For example, the merged information of sample a is converted into token codes as follows: [101, 1515, 1515, 1277, 1921, 1692, 7509, 6375, 782, 3187, 3791, 4717, 6230, 8024, 2347, 2832, 6401, 8024, 6814, 749, 671, 702, 2207, 3198, 738, 3766, 782, 5052, 8024, 1922, 2714, 8024, 1922, 2714, 102]. Here 101 is the code of the special character [CLS] and 102 is the code of the special character [SEP].
Finally, the token codes of the case information are input into the BERT model to obtain the embedding vector E_CLS of the special character [CLS] and a token vector for each token, [E_1, E_2, E_3, …, E_n], where n is the number of tokens.
The BERT model usually has two main outputs: the embedding vector of the special character [CLS], and the token vector of each token. The [CLS] embedding vector is obtained by pooling all the token vectors. Simple classification typically uses the [CLS] embedding vector directly, but the individual token vectors can be used if the model requires them. In general the [CLS] embedding vector is not used together with the token vectors; it is essentially derived from the token vectors and represents the semantic features of the whole text. Different samples yield different embedding vectors.
S13, inputting the embedding vector into two linear layers respectively, feeding each linear layer's output into a SOFTMAX layer to obtain the case category number and the responsible-department number respectively, and updating the model parameters of the pre-training model by gradient descent.
The two linear layers are L1 and L2. In the training phase of the pre-training model, the embedding vector E_CLS is input into linear layer L1, the loss value is computed with a cross entropy function, and the model parameters are updated by gradient descent. The input dimension of linear layer L1 equals the length of the embedding vector E_CLS, and its output dimension is the number of case categories N_class. In the same way, the embedding vector E_CLS is input into linear layer L2, the loss value is computed with a cross entropy function, and the model parameters are updated by gradient descent. The input dimension of linear layer L2 equals the length of E_CLS, and its output dimension is the number of departments N_depa. This embodiment uses the PyTorch framework, and the cross entropy function used is CrossEntropyLoss().
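As an informal illustration of this training step, the single-sample cross entropy computation that CrossEntropyLoss() performs can be sketched in plain Python (toy logits only; the real implementation operates on batched tensors and is fused with log-softmax):

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating, for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, target):
    # Negative log-probability assigned to the correct class index.
    probs = softmax(logits)
    return -math.log(probs[target])

# Toy logits for 3 case categories; assume the correct category index is 0.
logits = [2.0, 1.0, 0.1]
loss = cross_entropy(logits, target=0)
```

The loss shrinks as the model assigns more probability to the correct class, which is what gradient descent on this quantity encourages.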
When used in subsequent models, in the prediction phase the embedding vector E_CLS is input into linear layer L1 and then into a softmax layer to obtain the probability value of each case category, and the case category number with the highest probability value is taken as the prediction result. In the same way, E_CLS is input into linear layer L2 and then into a softmax layer to obtain the probability value of each department, and the department number with the highest probability value is taken as the prediction result.
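The selection of the highest-probability number (and of the top-k numbers, as used later when building the correlation linked lists) can be sketched as follows; the helper names are illustrative, not from the patent:

```python
def predict(probs):
    # Index of the highest probability value = predicted category/department number.
    return max(range(len(probs)), key=lambda i: probs[i])

def top_k(probs, k):
    # Numbers of the k highest-probability classes, in descending probability order.
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

probs = [0.05, 0.6, 0.1, 0.25]
best = predict(probs)        # index 1 has the highest probability
best2 = top_k(probs, 2)      # [1, 3]
```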
S14, after the pre-training model is iteratively trained with the training sample set, the accuracy of the model is verified with the verification sample set, and the model with the highest verification accuracy is taken as the trained pre-training model; in the prediction phase, the output of the pre-training model is the probability value of each case category and department for the sample.
The pre-training model is iteratively trained on the training sample set as above, its accuracy is verified with the verification sample set, and the model with the highest accuracy is selected. In concrete implementation:
firstly, after the pre-training model traverses all training sample sets, parameters are frozen, in the embodiment, a pitoch frame is used, and parameters are frozen by using a model.
Then each sample in the verification sample set is input into the pre-training model, obtaining a predicted case category and a predicted department for each sample. For each verification sample, the prediction is correct if the predicted case category is consistent with the case category in the sample; otherwise it is wrong. Department predictions are judged the same way. The accuracy is the number of correct predictions divided by the total number of verification samples, and the overall verification accuracy of the pre-training model is (case accuracy + department accuracy) / 2. If this accuracy is higher than the current best, the model parameters of this version are saved and the current best accuracy is updated.
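A minimal sketch of this verification-accuracy computation (the prediction and label lists below are made up for illustration):

```python
def accuracy(preds, labels):
    # Fraction of verification samples whose prediction matches the label.
    correct = sum(1 for p, y in zip(preds, labels) if p == y)
    return correct / len(labels)

# Hypothetical predictions vs. correct labels over 4 verification samples.
case_acc = accuracy([449, 326, 1, 2], [449, 326, 1, 1])       # 3 of 4 correct
dept_acc = accuracy([105, 100, 13, 13], [105, 100, 100, 13])  # 3 of 4 correct
overall = (case_acc + dept_acc) / 2
```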
And S2, traversing the sample set according to the trained pre-training model to generate a case category related linked list and a department related linked list.
The process of generating the case category correlation linked list is as follows:
s211, setting an empty case type correlation linked list, wherein in the correlation linked list, the position sequence number x stores the correlation case type of the case type x, and the length of the correlation linked list is N class
S212, inputting each sample into the trained pre-training model to obtain the probability value of each case category, and outputting in order the numbers of the k_1 case categories with the largest probability values;
s213, comparing the output case types with the correct case types of the sample one by one in sequence, and ending when the comparison is consistent or the comparison is not consistent; when inconsistency occurs in the comparison process, recording information in a position corresponding to the sample correct case type serial number of the related linked list, and rearranging the information from large to small according to the count; if the current case type is y and the correct case type is z, the information recording rule is as follows: if the information of the correct case type z exists in the position y of the related linked list, adding 1 to the information count corresponding to the correct case type z; if there is no correct case category z information, add information { case category z: count 1}.
In this step, each sample is fed to the trained pre-training model to obtain the probability value of each case category, the k_1 case categories with the largest probability values are selected, and their numbers are output in descending order of probability.
Then the output case categories are compared one by one with the sample's correct case category, starting from the category with the largest probability value. If the current case category y is inconsistent with the correct case category z recorded in the sample, information is recorded at position y of the case category correlation linked list. The recording rule is: if information for category z already exists at position y, its count is increased by 1; otherwise the record {case category z: count 1} is added.
The comparison is repeated until a case category y matches the correct case category z of the sample record, or all k_1 case categories have been compared. When the model prediction is correct, the first comparison is consistent and no information is recorded; when all k_1 predicted case categories are inconsistent, k_1 records are made.
Fig. 3 illustrates the generation of the case category correlation linked list, taking the top-3 case categories with 4 samples as input.
Input sample 1, model prediction result 1: the top-3 case categories by predicted probability are 449, 326, 11; the correct case category is 326.
The predicted case category 449 does not match the correct category 326, so {326: 1} is recorded at position 449 of the case category correlation linked list.
The predicted case category 326 matches the correct category 326, and the comparison ends.
Input sample 2, model prediction result 2: the top-3 case categories are 449, 440, 326; the correct case category is 326.
The predicted category 449 does not match the correct category 326; a record for category 326 already exists at position 449, so its count is increased by 1, giving {326: 2}.
The predicted category 440 does not match the correct category 326, so {326: 1} is recorded at position 440.
The predicted category 326 matches the correct category 326, and the comparison ends.
Input sample 3, model prediction result 3: the top-3 case categories are 449, 440, 1; the correct case category is 1.
The predicted category 449 does not match the correct category 1, so {1: 1} is added at position 449.
The records at position 449 are sorted by count: {326: 2}, {1: 1}.
The predicted category 440 does not match the correct category 1, so {1: 1} is added at position 440.
The records at position 440 are sorted by count: {326: 1}, {1: 1}.
The predicted category 1 matches the correct category 1, and the comparison ends.
Input sample 4, model prediction result 4: the top-3 case categories are 1, 2, 449; the correct case category is 1.
The predicted category 1 matches the correct category 1, and the comparison ends.
This completes the generation of the case category correlation linked list.
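The whole generation procedure (S211 to S213) can be sketched in plain Python, using a dict of dicts in place of the linked-list positions; the four samples below are the ones from the worked example:

```python
def build_correlation_table(samples):
    """Build the correlation table from (top_k_predictions, correct_label) pairs.

    table[y] collects, for a wrongly predicted class y, how often each correct
    class was the ground truth; each position is kept sorted by descending count.
    """
    table = {}
    for preds, correct in samples:
        for y in preds:
            if y == correct:  # first consistent comparison: stop recording
                break
            entry = table.setdefault(y, {})
            entry[correct] = entry.get(correct, 0) + 1
    # Re-sort each position's records by count, largest first.
    return {y: sorted(e.items(), key=lambda kv: -kv[1]) for y, e in table.items()}

# The four samples from the example: (top-3 predicted categories, correct category).
samples = [([449, 326, 11], 326),
           ([449, 440, 326], 326),
           ([449, 440, 1], 1),
           ([1, 2, 449], 1)]
table = build_correlation_table(samples)
# table[449] == [(326, 2), (1, 1)]; table[440] == [(326, 1), (1, 1)]
```

Positions 449 and 440 end up with exactly the records derived in the figure; sample 4 contributes nothing because its first prediction is already correct.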
The department correlation linked list is generated in the same way as the case category correlation linked list: S221, setting up an empty department correlation linked list; S222, inputting each sample into the trained pre-training model to obtain the probability value of each department, and outputting in order the numbers of the k_1 departments with the largest probability values; S223, comparing the output departments one by one, in order, with the sample's correct department, ending when the first comparison is consistent or all comparisons are inconsistent. When an inconsistency occurs, information is recorded at the position of the correlation linked list corresponding to the current predicted department number, and the records are re-sorted by count from large to small. If the current department is y' and the correct department is z', the recording rule is: if information for the correct department z' already exists at position y' of the correlation linked list, its count is increased by 1; otherwise the record {department z': count 1} is added.
S3, predicting the case category and responsible department with the highest probability for the sample using the pre-training model, obtaining a case category enhancement vector and a department enhancement vector by combining the case category correlation linked list and the department correlation linked list, and obtaining a case classification model and a department allocation model through training and parameter updating.
As shown in FIG. 4, the case classification model and the department distribution model are trained in the same process. For obtaining a case classification model, the specific process is as follows:
s311, predicting the case type c with the highest probability value of the current sample by using the pre-training model.
First, the sample set is input and divided into a training sample set and a verification sample set in a given proportion, and the case category c with the highest probability for the sample is predicted by the pre-training model.
After sample a is converted into token codes and input into the BERT model, the embedding vector E_CLS and the token vectors [E_1, E_2, E_3, …, E_n] are obtained. The embedding vector E_CLS is input into linear layer L1 and then a softmax layer, and the case category c with the highest probability value is predicted. Assume the predicted case category is 449 and the predicted department is 105.
S312, taking out the top k_2 case categories related to case category c from position c of the case category correlation linked list, denoted c_1, c_2, …, c_k2.
S313, taking out from the case category vector group the case category vectors with numbers c_1, c_2, …, c_k2 and the case category vector with number c, denoted Rc_1, Rc_2, …, Rc_k2 and Rc respectively.
The case category vector group is a matrix of dimension [N_class, E], where N_class is the number of case categories and E is the token dimension, consistent with the length of the embedding vector E_CLS, typically 768.
For example, assume the predicted case category is 449 and k_2 = 2; the top-2 categories most relevant to case category 449 are 362 and 1, so the case category vectors in rows 362, 1 and 449 are taken from the case category vector group.
S314, calculating the case category enhancement vector corresponding to each case category vector:
R̂c_i = Σ_{j=1…n} (Rc_i ⊙ E_j), i = 1, 2, …, k_2, and R̂c = Σ_{j=1…n} (Rc ⊙ E_j),
where E_j is the j-th token vector of the sample, n is the number of tokens, and ⊙ denotes the element-wise (dot) product of the category vector with the token vector.
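Under the element-wise reading of the product above, the enhancement computation can be sketched with toy vectors (dimension 4 instead of the typical 768):

```python
def enhance(category_vec, token_vecs):
    # Element-wise multiply the category vector with each token vector,
    # then sum the n resulting vectors into one enhancement vector.
    dim = len(category_vec)
    out = [0.0] * dim
    for tok in token_vecs:
        for k in range(dim):
            out[k] += category_vec[k] * tok[k]
    return out

# Toy example: one category vector of dimension 4, three token vectors.
Rc = [1.0, 2.0, 0.5, -1.0]
tokens = [[1.0, 0.0, 2.0, 1.0],
          [0.0, 1.0, 2.0, 1.0],
          [1.0, 1.0, 0.0, 0.0]]
enhanced = enhance(Rc, tokens)   # [2.0, 4.0, 2.0, -2.0]
```

Each output component is the category-vector component scaled by the sum of the corresponding token-vector components, so the result keeps the [1, E] shape of the category vector.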
S315, inputting the case category enhancement vectors into two linear layers in sequence and then into a SOFTMAX layer.
The case category enhancement vectors R̂c_1, R̂c_2, …, R̂c_k2, R̂c are input into two linear layers L3 and L4. The input and output dimensions of linear layer L3 and the input dimension of linear layer L4 all equal the length of the embedding vector E_CLS; the output dimension of linear layer L4 is the number of case categories N_class.
For example, for the category vector corresponding to case category 362, the dimension is [1, E] and each token vector is [1, E]; the category vector is element-wise multiplied with each token vector to obtain n category-token vectors of dimension [1, E], which are then summed to obtain a case category enhancement vector of dimension [1, E].
Then the case category enhancement vector of dimension [1, E] is input into linear layer L3, giving a result of dimension [1, E]; this result is input into linear layer L4, giving a result of dimension [1, N_class]. Finally the result is input into a SOFTMAX layer for normalization.
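The dimension flow through the two linear layers can be illustrated with a naive plain-Python matrix multiplication (toy sizes E = 4 and N_class = 3; the identity weights for L3 and constant weights for L4 are placeholders, not trained values):

```python
def matmul(a, b):
    # Naive matrix multiplication: [p, q] x [q, r] -> [p, r].
    p, q, r = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(q)) for j in range(r)]
            for i in range(p)]

E, n_class = 4, 3
enhanced_vec = [[2.0, 4.0, 2.0, -2.0]]  # [1, E] enhancement vector
w3 = [[1.0 if i == j else 0.0 for j in range(E)] for i in range(E)]  # L3: [E, E]
w4 = [[0.1] * n_class for _ in range(E)]                             # L4: [E, N_class]
h = matmul(enhanced_vec, w3)   # stays [1, E]
logits = matmul(h, w4)         # becomes [1, N_class], ready for softmax
```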
S316, calculating the loss value with a cross entropy function and updating the model parameters by gradient descent; only linear layers L3 and L4 and the case category vector group are trained, while the parameters of the BERT model, linear layer L1 and linear layer L2 remain frozen, i.e., they are not updated during this iterative training.
S317, after the case classification model is iteratively trained with the training sample set, the accuracy of the model is verified with the verification sample set, and the version of the model with the highest verification accuracy is taken as the trained case classification model.
After the case classification model has traversed the whole training sample set, the parameters are frozen; this embodiment uses the PyTorch framework. Each sample in the verification sample set is input into the case classification model to obtain a predicted case category; for each verification sample, the prediction is correct if the predicted case category is consistent with the case category in the sample, otherwise it is wrong. The accuracy is the number of correct predictions divided by the total number of verification samples. If the accuracy is higher than the current best, the case classification model parameters of this version are saved and the current best accuracy is updated.
The department allocation model is trained by the same process, briefly described as follows:
S321, predicting the department d with the highest probability value for the current sample using the pre-training model; specifically, the embedding vector E_CLS is input into linear layer L2 and then a softmax layer, and the department d with the highest probability value is predicted.
S322, taking out the top k_2 departments related to department d from position d of the department correlation linked list, denoted d_1, d_2, …, d_k2.
S323, taking out from the department vector group the department vectors with numbers d_1, d_2, …, d_k2 and d, denoted Rd_1, Rd_2, …, Rd_k2 and Rd respectively. For example, the top-2 departments most relevant to department 105 are 100 and 13, so the department vectors in rows 100, 13 and 105 are taken from the department vector group.
S324, calculating the department enhancement vector corresponding to each department vector:
R̂d_i = Σ_{j=1…n} (Rd_i ⊙ E_j), i = 1, 2, …, k_2, and R̂d = Σ_{j=1…n} (Rd ⊙ E_j),
where E_j is the j-th token vector of the sample and ⊙ denotes the element-wise (dot) product; the department vector group is a matrix of dimension [N_depa, E], where N_depa is the number of departments.
S325, inputting the department enhancement vectors R̂d_1, R̂d_2, …, R̂d_k2, R̂d into two linear layers L5 and L6 in sequence and then into a SOFTMAX layer; the input and output dimensions of linear layer L5 and the input dimension of linear layer L6 all equal the length of the embedding vector E_CLS, and the output dimension of linear layer L6 is the number of departments N_depa.
S326, calculating the loss value with a cross entropy function and updating the model parameters by gradient descent.
S327, after the department allocation model is iteratively trained with the training sample set, verifying its accuracy with the verification sample set and taking the model with the highest verification accuracy as the trained department allocation model.
And S4, acquiring the input case content, processing it through the case classification model and the department allocation model, and outputting the predicted case classification result and the predicted department allocation result.
Firstly, the case content is input, and area information is optionally input.
Then, prediction is carried out with the pre-training model to obtain the case category number and the department number with the highest probability values.
Next, the output results of linear layer L4 and linear layer L6 are obtained from the case classification model and the department allocation model: the output of linear layer L4 is input into the softmax layer to obtain a probability value for each case category, and the case category number with the highest probability value is taken as the predicted case classification result; the output of linear layer L6 is input into the softmax layer to obtain a probability value for each department, and the department number with the highest probability value is taken as the predicted department allocation result.
Finally, the predicted case classification result and the predicted department allocation result are output as the final result.
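The prediction flow of step S4 can be sketched as follows; the weight matrices stand in for the trained linear layers L4 and L6, and all names and sizes are hypothetical:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict(e_cls, W4, b4, W6, b6):
    """Sketch of step S4: score the case embedding with the classification
    head (linear layer L4) and the allocation head (linear layer L6),
    softmax each, and return the highest-probability serial numbers."""
    cat_probs = softmax(W4 @ e_cls + b4)      # probability per case category
    dep_probs = softmax(W6 @ e_cls + b6)      # probability per department
    return int(cat_probs.argmax()), int(dep_probs.argmax())

rng = np.random.default_rng(1)
E, n_cat, n_dep = 8, 6, 4
category, department = predict(rng.normal(size=E),
                               rng.normal(size=(n_cat, E)), rng.normal(size=n_cat),
                               rng.normal(size=(n_dep, E)), rng.normal(size=n_dep))
print(category, department)
```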
In summary, the invention provides an automatic classification and allocation method for government affair hotline cases. Based on a neural network model, the method uses a two-step training approach to improve the model's classification accuracy on similar case categories, and integrates administrative division information to improve the accuracy of case allocation. Compared with existing methods, it achieves higher accuracy in both case classification and case allocation. The method can automatically classify complaints received by the citizen-service government hotline and dispatch them to the competent departments for handling, reducing labor cost and improving the hotline's level of intelligence and automation.
The above description is intended to be illustrative of the preferred embodiment of the present invention and should not be taken as limiting the invention, but rather, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

Claims (4)

1. A government affair hotline case automatic classification and distribution method is characterized by comprising the following steps:
s1, establishing a sample set and training a pre-training model;
s2, traversing a sample set according to a trained pre-training model to generate a case category related linked list and a department related linked list;
s3, predicting case categories and departments in charge with highest probability of samples by using a pre-training model, obtaining case category enhancement vectors and department enhancement vectors by combining case category related linked lists and department related linked lists, and obtaining a case classification model and a department allocation model through training and parameter updating;
and S4, acquiring the input case content, processing it through the case classification model and the department allocation model, and outputting the predicted case classification result and the predicted department allocation result.
2. The method for automatically classifying and allocating government hotline cases according to claim 1, wherein the step S1 is implemented as follows:
S11, establishing a sample set, wherein the sample format is [case information, area information, case category, administrative department], the area information being optional, and dividing the sample set into a training sample set and a verification sample set in a preset ratio;
s12, for each sample in the training sample set, converting case information in the sample into an embedded vector by using a BERT model;
S13, inputting the embedded vector into two linear layers respectively, feeding the output of each linear layer to a softmax layer to obtain the case category number and the department number respectively, and updating the model parameters of the pre-training model by using a gradient descent method;
and S14, after iterative training is carried out on the pre-training model by using the training sample set, the accuracy of the model is verified by using the verification sample set, the model with the highest verification accuracy is used as the trained pre-training model, and the output of the pre-training model in the prediction stage is the probability value of each case type and department to which the sample belongs.
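Claim 2's steps S12–S14 can be sketched as follows. This is a minimal stand-in, not the patent's implementation: random vectors replace the BERT embeddings, and each "linear layer + softmax" head is trained as plain softmax regression under cross-entropy loss by gradient descent (all sizes hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
E, n_cat, n_dep = 16, 4, 3           # embedding size, #case categories, #departments

# Stand-ins for the BERT [CLS] embeddings of the training samples (S12);
# a real implementation would obtain these from a BERT encoder.
X = rng.normal(size=(32, E))
y_cat = rng.integers(0, n_cat, size=32)
y_dep = rng.integers(0, n_dep, size=32)

W_cat = np.zeros((E, n_cat))         # case category head (one linear layer)
W_dep = np.zeros((E, n_dep))         # department head (the other linear layer)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

lr = 0.1
for _ in range(200):                 # S13: both heads, gradient descent
    for W, y in ((W_cat, y_cat), (W_dep, y_dep)):
        p = softmax(X @ W)                   # softmax layer
        p[np.arange(len(y)), y] -= 1.0       # dCE/dlogits for cross-entropy
        W -= lr * (X.T @ p) / len(y)         # in-place parameter update

# S14 in spirit: measure accuracy (here on the training stand-in itself)
acc = (softmax(X @ W_cat).argmax(axis=1) == y_cat).mean()
print(round(float(acc), 2))
```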
3. The method for automatically classifying and allocating government affair hotline cases according to claim 2, wherein in the step S2, the process of generating the case category related linked list and the department related linked list is the same;
for generating the case category correlation linked list, the process is as follows:
s211, setting an empty case type correlation linked list, wherein in the correlation linked list, the position serial number x stores the correlation case type of the case type x;
s212, inputting each sample into the trained pre-training model to obtain the probability value of each case type to which the sample belongs, and sequentially outputting the front k with the maximum probability value 1 The serial number of each case type;
s213, comparing the output case categories with the correct case categories of the sample one by one in sequence, and ending when the comparison is consistent or all the comparisons are inconsistent; when inconsistency occurs in the comparison process, recording information in a position corresponding to the sample correct case type serial number of the related linked list, and rearranging the information from large to small according to the counting; if the current case type is y and the correct case type is z, the information recording rule is as follows: if the information of the correct case type z exists in the position y of the related linked list, adding 1 to the information count corresponding to the correct case type z; if there is no correct case category z information, add information { case category z: count 1}.
4. The method for automatically classifying and allocating government affair hotline cases according to claim 3, wherein in the step S3, the process of obtaining the case classification model and the department allocation model is the same;
the specific process for obtaining the case classification model is as follows:
s311, predicting the case type c with the highest probability value of the current sample by using a pre-training model;
s312, taking out from the case type related linked list sequence number cFront k 2 The case categories related to the case category c are respectively marked as c 1 ,c 2 …c k2
S313, taking out the case type vector group with the serial numbers c 1 ,c 2 …c k2 And case type vector with sequence number c, respectively denoted as Rc 1 ,Rc 2 …Rc k2 ,Rc;
S314, calculating the case category enhancement vectors corresponding to the case category vectors Rc1, Rc2, …, Rck2 and Rc; in the enhancement formulas (presented as figures), Tj is the jth lemma vector of the sample and the vector operator denotes the dot product;
s315, inputting the case type enhancement vectors to two linear layers in sequence, and then inputting the case type enhancement vectors to a SOFTMAX layer;
s316, calculating a loss value by using a cross entropy function, and updating model parameters by using a gradient descent method;
s317, after iterative training is carried out on the case classification model through the training sample set, the accuracy of the model is verified through the verification sample set, and the version of the model with the highest verification accuracy is used as the well-trained case classification model.
CN202310228000.6A 2023-03-10 2023-03-10 Automatic classification and allocation method for government affair hot line cases Active CN115935245B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310228000.6A CN115935245B (en) 2023-03-10 2023-03-10 Automatic classification and allocation method for government affair hot line cases

Publications (2)

Publication Number Publication Date
CN115935245A true CN115935245A (en) 2023-04-07
CN115935245B CN115935245B (en) 2023-05-26

Family

ID=85818574

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310228000.6A Active CN115935245B (en) 2023-03-10 2023-03-10 Automatic classification and allocation method for government affair hot line cases

Country Status (1)

Country Link
CN (1) CN115935245B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116611453A (en) * 2023-07-19 2023-08-18 天津奇立软件技术有限公司 Intelligent order-distributing and order-following method and system based on big data and storage medium
CN116861302A (en) * 2023-09-05 2023-10-10 吉奥时空信息技术股份有限公司 Automatic case classifying and distributing method

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239529A (en) * 2017-05-27 2017-10-10 中国矿业大学 A kind of public sentiment hot category classification method based on deep learning
US20180204135A1 (en) * 2017-01-18 2018-07-19 Wipro Limited Systems and methods for improving accuracy of classification-based text data processing
WO2019149200A1 (en) * 2018-02-01 2019-08-08 腾讯科技(深圳)有限公司 Text classification method, computer device, and storage medium
WO2019214133A1 (en) * 2018-05-08 2019-11-14 华南理工大学 Method for automatically categorizing large-scale customer complaint data
CN111177367A (en) * 2019-11-11 2020-05-19 腾讯科技(深圳)有限公司 Case classification method, classification model training method and related products
CN112488551A (en) * 2020-12-11 2021-03-12 浪潮云信息技术股份公司 XGboost algorithm-based hot line intelligent order dispatching method
CN112581106A (en) * 2021-02-23 2021-03-30 苏州工业园区测绘地理信息有限公司 Government affair event automatic order dispatching method fusing grid semantics of handling organization
CN112800232A (en) * 2021-04-01 2021-05-14 南京视察者智能科技有限公司 Big data based case automatic classification and optimization method and training set correction method
WO2022093982A1 (en) * 2020-10-30 2022-05-05 Convey, Llc Machine learning event classification and automated case creation
CN114547315A (en) * 2022-04-25 2022-05-27 湖南工商大学 Case classification prediction method and device, computer equipment and storage medium
CN115242487A (en) * 2022-07-19 2022-10-25 浙江工业大学 APT attack sample enhancement and detection method based on meta-behavior
CN115344695A (en) * 2022-07-27 2022-11-15 中国人民解放军空军工程大学 Service text classification method based on field BERT model
CN115455315A (en) * 2022-11-10 2022-12-09 吉奥时空信息技术股份有限公司 Address matching model training method based on comparison learning
CN115659974A (en) * 2022-09-30 2023-01-31 中国科学院软件研究所 Software security public opinion event extraction method and device based on open source software supply chain


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JINGYU LUO et al.: "Research on Civic Hotline Complaint Text Classification Model Based on word2vec", 2018 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery *
XUE Bin: "Design of a hot-spot mining *** for civic hotline texts", Journal of China Jiliang University *


Also Published As

Publication number Publication date
CN115935245B (en) 2023-05-26

Similar Documents

Publication Publication Date Title
CN115935245A (en) Automatic classification and distribution method for government affair hotline cases
CN109446331B (en) Text emotion classification model establishing method and text emotion classification method
CN109325116B (en) Urban event automatic classification dispatching method and device based on deep learning
CN102591854B (en) For advertisement filtering system and the filter method thereof of text feature
CN111694924A (en) Event extraction method and system
CN112182246B (en) Method, system, medium, and application for creating an enterprise representation through big data analysis
CN110750635A (en) Joint deep learning model-based law enforcement recommendation method
CN109543764B (en) Early warning information validity detection method and detection system based on intelligent semantic perception
CN112036842B (en) Intelligent matching device for scientific and technological service
CN111274817A (en) Intelligent software cost measurement method based on natural language processing technology
CN111125520B (en) Event line extraction method based on deep clustering model for news text
CN110807324A (en) Video entity identification method based on IDCNN-crf and knowledge graph
CN111581368A (en) Intelligent expert recommendation-oriented user image drawing method based on convolutional neural network
CN115936624A (en) Basic level data management method and device
CN116541755A (en) Financial behavior pattern analysis and prediction method based on time sequence diagram representation learning
CN115905538A (en) Event multi-label classification method, device, equipment and medium based on knowledge graph
CN109543038B (en) Emotion analysis method applied to text data
CN113869054A (en) Deep learning-based electric power field project feature identification method
CN116843162B (en) Contradiction reconciliation scheme recommendation and scoring system and method
CN113920379A (en) Zero sample image classification method based on knowledge assistance
CN110705310B (en) Article generation method and device
CN113742498B (en) Knowledge graph construction and updating method
CN115688729A (en) Power transmission and transformation project cost data integrated management system and method thereof
CN113657091A (en) Government affair hot line work order allocation method based on event extraction and authority list
CN111782803B (en) Work order processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant