CN111695052A - Label classification method, data processing device and readable storage medium

Info

Publication number
CN111695052A
Authority
CN
China
Prior art keywords: processed, data, label, features, fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010537640.1A
Other languages
Chinese (zh)
Inventor
沈大框
张莹
陈成才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xiaoi Robot Technology Co Ltd
Original Assignee
Shanghai Xiaoi Robot Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Xiaoi Robot Technology Co Ltd filed Critical Shanghai Xiaoi Robot Technology Co Ltd
Priority to CN202010537640.1A priority Critical patent/CN111695052A/en
Publication of CN111695052A publication Critical patent/CN111695052A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/955Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9562Bookmark management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/251Fusion techniques of input or preprocessed data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis


Abstract

A label classification method, a data processing device and a readable storage medium are provided. The method includes: acquiring data to be processed, where the data to be processed includes a corpus to be processed; extracting semantic features of the data to be processed, and performing logical operation processing on the extracted semantic features and the data to be processed to obtain fusion features of the data to be processed; calculating a value for each candidate category label based on the fusion features of the data to be processed, the value characterizing the degree of association between that candidate category label and the corpus to be processed; and obtaining the candidate category labels whose values meet a preset first selection condition, based on the values of the candidate category labels, to obtain a category label prediction set. This scheme can improve the accuracy of label classification prediction results.

Description

Label classification method, data processing device and readable storage medium
Technical Field
Embodiments of this specification relate to the technical field of information processing, and in particular to a label classification method, a data processing device and a readable storage medium.
Background
In the era of the internet information explosion, internet information is classified and labeled with tags (Tags) of the corresponding classifications so that required information can be quickly retrieved from the massive information on the internet. A tag is usually a key feature that is strongly relevant to the information and easy to identify, so that users can search and filter by it.
At present, tag labeling of internet information usually adopts one of two modes: manual classification and automatic classification. The manual mode is costly and inefficient, and cannot keep up with the growth rate of internet information. The automatic mode requires a large amount of training data to train a label classification model in advance, and existing label classification models have weak structural generalization capability and poor universality, so the accuracy of their label classification prediction results is low.
Disclosure of Invention
In view of this, embodiments of the present specification provide a label classification method, a data processing device, and a readable storage medium, which can improve the accuracy of label classification prediction results.
An embodiment of the present specification provides a label classification method, including:
acquiring data to be processed, where the data to be processed includes a corpus to be processed;
extracting semantic features of the data to be processed, and performing logical operation processing on the extracted semantic features and the data to be processed to obtain fusion features of the data to be processed;
calculating the value of each candidate category label based on the fusion features of the data to be processed, to characterize the degree of association between each candidate category label and the corpus to be processed; and
obtaining the candidate category labels whose values meet a preset first selection condition, based on the values of the candidate category labels, to obtain a category label prediction set.
An embodiment of the present specification further provides a label classification method, including:
acquiring data to be processed, where the data to be processed includes a corpus to be processed; and
inputting the data to be processed into a preset label classification model, which extracts semantic features of the data to be processed, performs logical operation processing on the extracted semantic features and the data to be processed to obtain fusion features of the data to be processed, calculates the value of each candidate class label based on the fusion features, and obtains the candidate class labels whose values meet a preset first selection condition, yielding a class label prediction set.
An embodiment of the present invention further provides a data processing device including a memory and a processor, where the memory is adapted to store one or more computer instructions which, when executed by the processor, perform the steps of the method of any of the above embodiments.
An embodiment of the present invention further provides a computer-readable storage medium on which computer instructions are stored; when the computer instructions are executed, the steps of the method described in any of the above embodiments are performed.
By adopting the label classification scheme of the embodiments of this specification, after the data to be processed is acquired, performing logical operations on the extracted semantic features of the data to be processed and the data to be processed itself fuses the semantic information originally present in the data with the semantic information in the extracted semantic features. This avoids semantic feature extraction errors or the loss of key semantic information affecting the label classification prediction result. The fusion features therefore contain rich semantic information and can represent data to be processed with complicated content or variable sources, which facilitates flexible handling of single-label or multi-label classification tasks; the value of each candidate class label can be calculated more accurately, the correct candidate class labels are obtained to represent the classification information present in the corpus to be processed, and the accuracy of the label classification result is improved.
Drawings
FIG. 1 is a flowchart of a label classification method in an embodiment of the present specification;
FIG. 2 is a flowchart of another label classification method in an embodiment of the present specification;
FIG. 3 is a schematic structural diagram of a label classification model in an embodiment of the present specification;
FIG. 4 is a schematic structural diagram of an iteration layer in an embodiment of the present specification;
FIG. 5 is a schematic structural diagram of another label classification model in an embodiment of the present specification;
FIG. 6 is a schematic structural diagram of another label classification model in an embodiment of the present specification;
FIG. 7 is a schematic structural diagram of another label classification model in an embodiment of the present specification;
FIG. 8 is a flowchart of a method for training a label classification model in an embodiment of the present specification;
FIG. 9 is a flowchart of another method for training a label classification model in an embodiment of the present specification;
FIG. 10 is a schematic structural diagram of another label classification model in an embodiment of the present specification.
Detailed Description
As described above, in the era of the internet information explosion, internet information is classified and labeled with tags of the corresponding classifications so that required information can be quickly retrieved from the massive information on the internet. At present, tag labeling of internet information usually adopts one of two modes: manual labeling and machine learning.
The manual mode is costly and inefficient, and cannot keep up with the growth rate of internet information. The machine learning mode requires a large amount of training data in advance to train the label classification model.
However, conventional label classification models have weak generalization capability and poor universality: they can only perform single-label classification on network information and cannot efficiently handle the more complex Multi-label Classification tasks.
This is because, in a multi-label classification task, the content information of a picture or document must be characterized with multiple class labels, so the labels in the preset category label set are not completely independent but have certain dependency or mutual-exclusion relationships. Moreover, multi-label classification tasks often involve a large number of labels with complex associations among the class labels, which makes them harder to analyze than single-label classification tasks, increases the difficulty of constructing and training a label classification model, and lowers the accuracy of the label classification prediction results.
In view of the above problems, embodiments of the present specification provide a label classification scheme. After the data to be processed is acquired, the fusion features of the data to be processed are obtained by extracting semantic features of the data and performing logical operations on the extracted semantic features and the data itself; the values of the candidate class labels are then calculated from the fusion features, and a class label prediction set representing the classification information in the corpus to be processed is obtained.
To enable those skilled in the art to more clearly understand and practice the concepts, implementations and advantages of the embodiments of the present specification, detailed descriptions are provided below for specific application scenarios with reference to the accompanying drawings.
Referring to the flowchart of a label classification method shown in FIG. 1, in an embodiment of the present specification the label classification method may include the following steps:
S11: acquire data to be processed, where the data to be processed includes a corpus to be processed.
In a specific implementation, the data to be processed may include corpora of different language types depending on the actual situation; for example, a Chinese corpus to be processed, an English corpus to be processed, and so on.
The corpus to be processed may be manually input text data, text data acquired from a public network, or text data extracted from a picture by Optical Character Recognition (OCR).
S12: extract semantic features of the data to be processed, and perform logical operation processing on the extracted semantic features and the data to be processed to obtain fusion features of the data to be processed.
In practical applications, the data to be processed belongs to a space of real-world meaning that humans can understand, while to a computer the acquired data is merely a string of characters; the computer cannot directly understand the language information the data is meant to convey. The data to be processed is therefore converted into digital data that the computer can understand and process, mapping the data from the real-meaning space into the computer's digital space.
In a specific implementation, according to preset feature extraction parameters, operations such as combining, sorting and screening can be performed on part or all of the data to be processed to obtain features capable of representing the semantic information in the data, namely semantic features, so that the computer can understand the language information the data is meant to convey. Then, according to preset logical operation parameters, the extracted semantic features are combined with the data to be processed through logical operations to obtain semantic-information fusion features, namely fusion features.
Depending on the feature extraction parameters and logical operation parameters set for the actual situation, the number of semantic features and the number of fusion features obtained may each be one or more.
It can be understood that, depending on the logical operation scheme actually set, the number of semantic features need not equal the number of fusion features. For example, when there are multiple semantic features, each semantic feature may be combined with the data to be processed in a separate logical operation to obtain multiple fusion features, or all semantic features may be combined with the data to be processed in a single logical operation to obtain one fusion feature. Conversely, when there is one semantic feature, it may be combined with different parts of the data to be processed in separate logical operations to obtain multiple fusion features, or with the whole data to be processed to obtain one fusion feature.
S13: calculate, based on the fusion features of the data to be processed, the value of each candidate class label used for labeling the corpus to be processed, to characterize the degree of association between each candidate class label and the corpus.
In a specific implementation, a candidate category label set containing the candidate category labels representing classification information may be preset, and the degree of association between each candidate category label and the corpus to be processed can be calculated from the fusion features of the data to be processed. Since each candidate class label represents classification information, the higher the value of a candidate class label, the stronger the correlation between the classification information it represents and the classification information present in the corpus, and the more suitable that label is for labeling the corpus.
S14: based on the values of the candidate category labels, obtain the candidate category labels whose values meet a preset first selection condition, to obtain a category label prediction set.
In a specific implementation, the first selection condition may be set according to the actual situation.
For example, the first selection condition may be: the value is greater than a preset threshold. The candidate category labels with values above the threshold are selected to obtain a category label prediction set, which is used to label the corpus to be processed and thereby represent its classification information.
For another example, the first selection condition may be: the value is the maximum. The candidate class label with the largest value is selected to obtain the category label prediction set, which likewise labels the corpus to represent its classification information.
Selecting the candidate class labels whose values exceed a preset threshold supports multi-label classification, while selecting the candidate class label with the maximum value supports single-label classification, so the method of this embodiment is applicable to both tasks; the corresponding classification task is performed according to the preset selection condition.
In practical applications, the first selection condition may include a preset selection condition for the single-label classification task and a preset selection condition for the multi-label classification task, and the selection condition for the corresponding task can be obtained through the received classification instruction: if a single-label classification instruction is received, the selection condition of the single-label classification task is obtained, realizing single-label classification processing; if a multi-label classification instruction is received, the selection condition of the multi-label classification task is obtained, realizing multi-label classification processing.
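As an illustration, the following minimal sketch (hypothetical helper names; it assumes the label values have already been computed as an array of scores) shows how the two selection conditions map onto the multi-label and single-label cases:

```python
import numpy as np

def select_labels(scores, candidate_labels, mode="multi", threshold=0.5):
    """Select the candidate category labels whose values satisfy the
    first selection condition (illustrative helper, not from the patent)."""
    scores = np.asarray(scores)
    if mode == "multi":
        # Multi-label task: keep every label whose value exceeds the threshold.
        idx = np.flatnonzero(scores > threshold)
    else:
        # Single-label task: keep only the label with the maximum value.
        idx = [int(np.argmax(scores))]
    return [candidate_labels[i] for i in idx]

labels = ["sports", "finance", "music"]
print(select_labels([0.8, 0.1, 0.6], labels, mode="multi"))   # ['sports', 'music']
print(select_labels([0.8, 0.1, 0.6], labels, mode="single"))  # ['sports']
```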
By adopting the above scheme, performing logical operations on the extracted semantic features of the data to be processed and the data itself fuses the semantic information originally present in the data with the semantic information in the extracted semantic features. This avoids semantic feature extraction errors or the loss of key semantic information affecting the label classification prediction result. The fusion features therefore contain rich semantic information and can represent data to be processed with complicated content or variable sources, which facilitates flexible handling of single-label or multi-label classification tasks; the value of each candidate class label can be calculated more accurately, the correct candidate class labels are obtained to represent the classification information present in the corpus to be processed, and the accuracy of the label classification result is improved.
In a specific implementation, semantic features of the data to be processed are extracted according to preset feature extraction parameters. A single group of feature extraction parameters may not extract all semantic features and, because its extraction range is limited, the extracted semantic features may not reflect all the semantic information contained in the data to be processed.
In an embodiment of the present specification, three groups of feature extraction parameters may be preset, from which three feature extraction functions F1, F2 and F3 mapping the data to be processed into semantic features are obtained. Based on F1, F2 and F3, the semantic features of the data to be processed can be obtained separately as A1 = F1(x), A2 = F2(x) and A3 = F3(x), where x denotes the data to be processed. Based on preset logical operation parameters, the groups of semantic features A1, A2 and A3 are combined with the data to be processed x through logical operations to obtain the fusion features.
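A minimal sketch of this step, with the three feature extraction functions stood in by simple nonlinear maps (the actual extraction parameters in the patent are learned; the random matrices below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                  # dimensionality of the digital space
x = rng.normal(size=d)                 # data to be processed, already vectorized

# Three groups of feature extraction parameters -> three extraction functions.
W1, W2, W3 = (rng.normal(size=(d, d)) for _ in range(3))
F1 = lambda v: np.tanh(W1 @ v)
F2 = lambda v: np.tanh(W2 @ v)
F3 = lambda v: np.tanh(W3 @ v)

A1, A2, A3 = F1(x), F2(x), F3(x)       # semantic features of different granularities

# One possible logical operation: sum the semantic features with x itself,
# so the fusion feature keeps the original semantic information of the data.
fusion = x + A1 + A2 + A3
```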
By adopting this scheme, semantic features of different granularities can be extracted from the data to be processed by setting different feature extraction parameters, giving the extracted semantic features diversity and universality. More of the semantic information contained in the data can be conveyed through semantic features of different granularities, strengthening the fusion features' ability to represent data with complicated content or variable sources, and improving the generalization capability and universality of accurate prediction across different data to be processed.
In a specific implementation, the closer the semantic information conveyed by the fusion features in the digital space is to the semantic information contained in the data to be processed, the stronger the fusion features' ability to represent the data, and the higher the accuracy. When performing logical operations on the semantic features and the data to be processed, different weight coefficients and offset coefficients can be set, and a weighted logical operation performed on each group of semantic features and the data to be processed, where the weight coefficients can be set according to the actual situation, so that different fusion features can be obtained.
Through the weighted logical operation, the relative importance of the various semantic features and of the data to be processed within the operation can therefore be controlled, improving the accuracy of the operation result, strengthening the fusion features' representation of the data, and increasing the reliability of the label classification prediction result.
In a specific implementation, to obtain the weight coefficients quickly and reliably, at least one group of semantic features may be input into a preset nonlinear function for nonlinear mapping, weight coefficients may be assigned to the other groups of semantic features and to the data to be processed based on the mapping result, and a weighted logical operation may then be performed on the other groups of semantic features and the data to be processed using the assigned weight coefficients.
For example, after obtaining three groups of semantic features A1, A2 and A3, the semantic feature A1 is input into a nonlinear function F4 to obtain the result F4(A1). F4(A1) is then input into the weight coefficient calculation functions F5, F6 and F7 to obtain the weight coefficients a1 = F5[F4(A1)], a2 = F6[F4(A1)] and a3 = F7[F4(A1)]. Based on the assigned weight coefficients a1, a2 and a3, a fusion feature calculation function F8 mapping semantic features to fusion features can be obtained. The weight coefficients a1, a2 and a3 are assigned to the data to be processed x and to the other groups of semantic features A2 and A3, which are input into F8 to perform the weighted logical operation F8(a1·x, a2·A2, a3·A3), yielding the fusion features.
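Continuing the example, a hedged sketch of how the weight coefficients might be produced from A1 and applied (F4–F8 are learned functions in the patent; the tanh mapping and the dot-product weight heads below are illustrative stand-ins):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(1)
d = 8
x, A1, A2, A3 = (rng.normal(size=d) for _ in range(4))

# F4: nonlinear mapping of one group of semantic features.
h = np.tanh(A1)                        # F4(A1)

# F5, F6, F7: scalar weight heads over the mapped features (assumed form).
w5, w6, w7 = (rng.normal(size=d) for _ in range(3))
a1, a2, a3 = sigmoid(w5 @ h), sigmoid(w6 @ h), sigmoid(w7 @ h)

# F8: weighted logical operation fusing the remaining groups with x.
fusion = a1 * x + a2 * A2 + a3 * A3    # F8(a1*x, a2*A2, a3*A3)
```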
Acquiring the weight coefficients from the semantic features in this way improves both the efficiency of weight acquisition and the reliability of the weight coefficients.
In a specific implementation, to highlight the key semantic information and facilitate subsequent numerical calculation, the fusion features may be optimized iteratively, specifically as follows:
after determining that a preset iteration condition is satisfied, acquire the fusion features of the current round, extract the semantic features of those fusion features, and perform logical operations on the extracted semantic features and the fusion features to obtain the iterated fusion features;
after determining that the iteration condition is no longer satisfied, take the iterated fusion features as the fusion features of the data to be processed, for determining the value of each candidate class label.
The iteration condition may be set as a threshold on the number of iterations, or as another condition. The fusion features of the first round are those obtained through the original logical operation; once the preset iteration condition is determined to be satisfied, the subsequently obtained fusion features are the iterated fusion features.
It will be appreciated that the feature extraction parameters used for the iteration may or may not be the same as those used to extract the semantic features of the data to be processed; similarly, the logical operation parameters used for the iteration may or may not be the same as those used for the logical operation on the data to be processed and its semantic features. The present specification does not limit this.
By adopting this scheme, performing semantic extraction and logical operations on the fusion features allows the iterated fusion features to highlight the key semantic information more strongly, enhancing the representational capability of the fusion features and improving the accuracy of the label classification result.
In a specific implementation, multiple groups of feature extraction parameters for the fusion features may be preset; the semantic features of the fusion features are extracted separately based on each group of parameters, and logical operations are then performed on these groups of semantic features and the fusion features to obtain the iterated fusion features.
In a specific implementation, a weighted logical operation may be performed on each group of fusion-feature-based semantic features and the fusion features; for the method of obtaining the weight coefficients, reference may be made to the related embodiments above, and details are not repeated here.
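The loop below sketches the iterative optimization, assuming an iteration-count threshold as the iteration condition and a generic extract-and-fuse step (both are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8
W = rng.normal(size=(d, d))            # iteration-specific extraction parameters

def extract(v):
    return np.tanh(W @ v)              # semantic features of the fusion features

def fuse(feat, fusion):
    return fusion + feat               # logical operation on features and fusion

fusion = rng.normal(size=d)            # first-round fusion features
max_iters = 3                          # preset iteration condition (assumed)
for _ in range(max_iters):
    fusion = fuse(extract(fusion), fusion)
# 'fusion' is now used to determine the value of each candidate class label.
```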
In a specific implementation, to convert the data to be processed into information that a computer can recognize, the data may be divided before its semantic features are extracted, obtaining a corresponding sequence to be processed. Depending on the application scenario and the language type, different division methods can be applied to the data to be processed to obtain the corresponding data sequences. For convenience of explanation, the smallest component that can be divided according to a preset requirement is referred to as a division unit; the division process thus divides the data to be processed x into n division units x1, x2, …, xn.
For example, suppose the data to be processed includes the Chinese corpus {你们好。} ("Hello, everyone."). With a character-and-punctuation division, the corpus is divided into {你/们/好/。}, where 你, 们, 好 and 。 are each a division unit of the corpus; with a word-and-punctuation division, the corpus is divided into {你们/好/。}, where 你们, 好 and 。 are each a division unit of the corpus.
It should be understood that "/" is used only to illustrate the effect of the division and is not a symbol actually present after division; other symbols may be used to separate the division units, and the embodiments of this specification do not specifically limit the separator symbol.
It should be noted that "{ }" here is used only to delimit the content of the examples and is not part of the corpus content; those skilled in the art may use other symbols that are not easily confused to delimit the corpus content. The same applies to "{ }" throughout.
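In code, the two division modes of the example could look like the following; the word-level segmentation result is a hypothetical stand-in for a real Chinese word segmenter:

```python
# Character-and-punctuation division: every character is a division unit.
corpus = "你们好。"
char_units = list(corpus)              # ['你', '们', '好', '。']

# Word-and-punctuation division: a segmenter groups characters into words.
# A real system would use a trained segmenter; this result is illustrative.
word_units = ["你们", "好", "。"]

print("/".join(char_units))            # 你/们/好/。
print("/".join(word_units))            # 你们/好/。
```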
In a specific implementation, the richer the sequence information contained in the data to be processed, the more accurately the semantic features can be extracted. Therefore, before extracting the semantic features, the attribute information of the corpus to be processed may be identified based on its semantic structure, and corresponding candidate attribute tags selected from a preset candidate attribute tag set to obtain an attribute tag sequence. The data to be processed may thus further include the attribute tag sequence, whose division unit is an attribute tag.
The attribute information may include at least one of: the position information of each division unit in the corpus to be processed, and the grammar information of the corpus; the grammar information may include at least one of part-of-speech information and punctuation information. Accordingly, the attribute tag sequence obtained from the corpus may include at least one of a position tag sequence and a grammar tag sequence, and the grammar tag sequence may include at least one of part-of-speech tags and punctuation tags.
The following is a detailed description of several specific embodiments.
In an embodiment of the present specification, a candidate position tag set containing a position tag for each kind of position information may be preset. After the corpus to be processed is divided, the position information present in the corpus is identified to obtain the position information of each division unit, and the corresponding position tag is marked at each division unit according to its distribution position in the corpus, yielding a position tag sequence. For example, for the corpus to be processed {你们好。} divided into four units, the corresponding position tag sequence may be {1 2 3 4}, where "1", "2", "3" and "4" are position tags indicating the first, second, third and fourth positions, respectively.
In another embodiment of the present specification, a candidate grammar tag set containing grammar tags corresponding to the kinds of grammar information may be preset. After the grammar information present in the corpus to be processed is identified, the grammar information of each division unit is obtained, and the corresponding grammar tag can be marked at each division unit according to its grammar information.
The grammar tags may further include punctuation tags and part-of-speech tags. A punctuation tag is marked at the punctuation mark corresponding to the punctuation information; the part-of-speech tags may include a start-position tag marked at the first division unit of the span corresponding to a piece of part-of-speech information, and a non-start-position tag marked at the remaining division units of that span.
Through the combination of the start-position and non-start-position tags for each piece of part-of-speech information, the corpus to be processed can be labeled uniformly, and the start and end positions of the part-of-speech information in the corpus obtained. Combined with the punctuation tags, every division unit of the corpus is marked with a corresponding tag, so the resulting grammar tag sequence fully reflects the grammar information of the corpus.
For example, the corpus to be processed may be: {《离开》是由张宇演唱、作曲。} ("Leave" is sung and composed by Zhang Yu.)
Then, according to the candidate grammar tag set, the following grammar tag sequence can be obtained:
{W-B NW-B NW-I W-B V-B P-B NR-B NR-I V-B V-I W-B V-B V-I W-B}
Here "W-B" represents a punctuation tag; "NW-B" and "NW-I" denote the start-position tag and non-start-position tag, respectively, of a work-title noun; "P-B" represents the start-position tag of a preposition; "NR-B" and "NR-I" represent the start-position tag and non-start-position tag, respectively, of a person's name; and "V-B" and "V-I" denote the start-position tag and non-start-position tag, respectively, of a verb.
By adopting this scheme, the corresponding attribute tag sequence can be obtained from the attribute information in the corpus to be processed. Because of the co-occurrence between the corpus and its attribute tag sequence, adding the attribute tag sequence does not damage the semantic information of the corpus, while enriching the sequence information contained in the data to be processed.
In a specific implementation, to extract the semantic features of the data to be processed more accurately, after the attribute tag sequence is obtained, the corpus to be processed and the attribute tag sequence may be combined to obtain combined data to be processed, which is used for semantic feature extraction and logical operation processing. A Concat function may be used for the combination.
By adopting this scheme, semantic information of the attribute dimension can be extracted after the corpus and the attribute tag sequence are combined, so the subsequently processed features also contain semantic information of the attribute dimension. This expands the dimensionality of the semantic information in the semantic features and the fusion features, and combining multi-dimensional semantic information allows the value of each candidate category label to be calculated more accurately.
In a specific implementation, the fusion features may be represented by values, vectors or matrices according to the preset logical operation parameters, and may not correspond one-to-one with the preset candidate class label set. Therefore, a fusion feature vector can be generated from the fusion features of the data to be processed, where the dimension of the fusion feature vector equals the total number of candidate category labels in the preset candidate category label set, and the value of each element of the vector characterizes the degree of association between the corresponding candidate category label and the corpus to be processed.
For example, if the preset candidate category label set is LB = {lb1, lb2, lb3}, a fusion feature vector RX = {rx1, rx2, rx3} is obtained through the feature vector generation function, where the dimension of the fusion feature vector is 3, consistent with the total number of candidate category labels in the preset set. The elements rx1, rx2 and rx3 of the fusion feature vector respectively characterize the values of the candidate category labels lb1, lb2 and lb3 for labeling the corpus to be processed.
Correspondingly, after the fusion feature vector is generated, obtaining the candidate category labels whose values meet the preset first selection condition may include:
determining the distribution positions of the elements of the fusion feature vector that meet the preset first selection condition, and acquiring the candidate category labels at the corresponding distribution positions in the preset candidate category label set, to obtain the category label prediction set.
By generating a fusion feature vector whose dimension equals the total number of candidate class labels, the vector corresponds position-by-position to the candidate class labels, which helps accurately acquire from the candidate class label set those labels meeting the first selection condition.
In a specific implementation, dimension reduction can be achieved by applying a dimension conversion to the fusion features, and the reduced vector can be mapped into a specified interval, which makes it convenient to set the first selection condition and to select the candidate category labels that meet it.
Specifically, a feature vector generation function is formed from preset feature vector generation parameters; the fusion features are input into this function, which applies a data dimension transformation to them, yielding a q-dimensional feature transformation vector, where q is the total number of candidate category labels.
As an optional example, by applying a nonlinear conversion to the q-dimensional feature transformation vector, the value of each of its elements can be mapped into a specified numerical interval. The feature transformation vector after nonlinear conversion is used as the fusion feature vector: its dimension is consistent with the total number of candidate category labels in the preset set, and its elements characterize the values of the candidate category labels for labeling the corpus to be processed.
The nonlinear conversion may use a numerical calculation function such as Sigmoid, which maps the value of each element of the q-dimensional feature transformation vector into the interval [0, 1]. The elements are calculated independently of one another, so the feature transformation vector after Sigmoid conversion can be used for multi-label classification tasks.
As another optional example, a probability conversion may be applied to the q-dimensional feature transformation vector, mapping the value of each element into a specified probability interval. The feature transformation vector after probability conversion is used as the fusion feature vector, with its dimension again consistent with the total number of candidate category labels in the preset set, and its elements characterizing the values of the candidate category labels for labeling the corpus to be processed.
The probability conversion may use a numerical calculation function such as Softmax, which maps the values of the elements of the q-dimensional feature transformation vector into the probability interval [0, 1]. The element values constrain one another and sum to 1 after conversion, so the Softmax function can be used for single-label classification tasks.
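A sketch of the two conversions on a q-dimensional feature transformation vector (q = 3 here), showing why Sigmoid suits the multi-label case and Softmax the single-label case:

```python
import numpy as np

z = np.array([2.0, -1.0, 0.5])         # q-dimensional feature transformation vector

# Sigmoid: each element mapped into [0, 1] independently -> multi-label.
sig = 1.0 / (1.0 + np.exp(-z))         # several values may exceed a threshold

# Softmax: elements mapped into [0, 1] jointly, summing to 1 -> single-label.
exp = np.exp(z - z.max())              # subtract the max for numerical stability
soft = exp / exp.sum()

print(sig)    # independent scores, not constrained to sum to 1
print(soft)   # a probability distribution over the candidate labels
```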
In a specific implementation, if the attribute information present in the corpus to be processed is identified by a tag classification method, with the corresponding candidate attribute tags marked at each division unit of the corpus, the processing parameters may further include attribute identification parameters.
In a specific implementation, to convert the data to be processed into information that a computer can recognize, Embedding processing may be performed on the data before its semantic features are extracted, vectorizing the division units of the data. Specifically, each division unit of the corpus to be processed and each candidate attribute tag in the attribute tag sequence may be characterized as a vector, so that both the corpus and the attribute tag sequence can be characterized as matrices. The processing parameters may further include the embedding processing parameters that implement the embedding.
By vectorizing each division unit and each candidate attribute tag, a more accurate matrix representation can be obtained; the corpus and the attribute tag sequence in matrix form facilitate subsequent feature extraction and logical operations, improving data processing efficiency.
In a specific implementation, the vectors obtained from the embedding are static and carry no polysemy. The data to be processed can therefore be encoded, converting the static vectors into dynamic vectors that change with the context of the corpus and thus capture polysemy; the semantic features are then extracted from the encoded data. The processing parameters may further include encoding processing parameters.
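A minimal sketch of these two steps: a lookup table assigns each division unit a static vector, and a context-dependent encoder turns static vectors into dynamic ones (the neighbor-averaging encoder below is a toy stand-in for a trained encoder):

```python
import numpy as np

rng = np.random.default_rng(3)
vocab = {"你": 0, "们": 1, "好": 2, "。": 3}
emb_table = rng.normal(size=(len(vocab), 8))    # embedding processing parameters

units = ["你", "们", "好", "。"]
static = emb_table[[vocab[u] for u in units]]   # static vectors, one per unit

# Toy contextual encoding: mix each vector with its neighbors, so the same
# unit receives different vectors in different contexts (purely illustrative;
# a real system would use trained encoding processing parameters).
dynamic = np.stack([static[max(i - 1, 0):i + 2].mean(axis=0)
                    for i in range(len(units))])
```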
In a specific implementation, preset processing parameters may be acquired before the semantic features of the data to be processed are extracted.
The processing parameters include feature extraction parameters, logical operation parameters and numerical calculation parameters. The preset processing parameters are obtained by adjusting initial processing parameters using preset training data, the real category label set of the training data and a preset Loss Function, where the loss function is established based on the label classification prediction results of the training data. The training data includes a training corpus, and the real category label set of the training data includes the candidate category labels that actually label the training corpus.
By adopting this scheme, the processing parameters are adjusted through training so that their values converge to an ideal state, improving the accuracy of the label classification prediction result.
In a specific implementation, the acquired training data may form a training data set, which may be divided into multiple batches for training the processing parameters and performing the label classification prediction operations; each batch may include a list of training corpus sentences, the size of the list being determined by the actual situation. Alternatively, the training data set may be divided into sentence-level training data according to a preset set of sentence-ending punctuation marks, and the sentence-level training data used over multiple passes to iteratively train the processing parameters, with the label classification prediction operations performed respectively.
In a specific implementation, the training data may further include the attribute tag sequence of the training corpus, which may be obtained by manually labeling attribute tags at the division units, or by identifying the attribute information present in the corpus with a preset attribute labeling model and marking the corresponding candidate attribute tags at the division units of the corpus.
Based on the semantic structure of the corpus, the attribute information may include at least one of: the position information of each division unit in the corpus, and the grammar information of the corpus, where the grammar information may include at least one of part-of-speech information and punctuation information. Accordingly, the attribute tag sequence obtained from the corpus may include at least one of a position tag sequence and a grammar tag sequence, and the grammar tag sequence may include at least one of part-of-speech tags and punctuation tags.
For the processing of training data that includes an attribute tag sequence, reference may be made to the description of the relevant parts of the label classification method above, and details are not repeated here.
In a specific implementation, the vectors obtained through the embedding are static and carry no polysemy, so the training data can be encoded to convert the static vectors into dynamic vectors that change with the context of the corpus and thus have polysemy.
To determine whether the training data is encoded accurately and to obtain correctly encoded training data, the encoded training data may be decoded, and the accuracy of the encoding result verified from the prediction over the encoded training data.
In an implementation, the encoded training data may be decoded using a Conditional Random Field (CRF) network.
The conditional random field network may be preset with a state transition matrix [A]a,b and an emission matrix Pθ, where [A]a,b represents the transition probability between two time steps from the a-th state to the b-th state, and the element [Pθ]t,[v]t represents the score that position t of the input is output as the candidate label [v]t, with θ including the processing parameters. The prediction sequence is obtained when the conditional random field score

s(X, v) = Σt ( [A][v]t−1,[v]t + [Pθ]t,[v]t )

is highest. Furthermore, the conditional random field model can compute argmaxv s(X, v) using the Viterbi method, thereby obtaining the prediction sequence corresponding to the optimal path.
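A compact Viterbi decoder over a transition matrix A and an emission matrix P, consistent with the score above (the matrix values are random placeholders):

```python
import numpy as np

def viterbi(A, P):
    """Return the label sequence maximizing the summed transition and
    emission scores. A: (k, k) transitions, P: (T, k) emissions."""
    T, k = P.shape
    score = P[0].copy()                    # best score ending in each state
    back = np.zeros((T, k), dtype=int)     # backpointers
    for t in range(1, T):
        cand = score[:, None] + A + P[t]   # cand[a, b]: best via prev state a
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]                      # optimal-path prediction sequence

rng = np.random.default_rng(4)
A = rng.normal(size=(3, 3))                # state transition matrix [A]a,b
P = rng.normal(size=(5, 3))                # emission scores for 5 positions
print(viterbi(A, P))                       # prints the optimal label indices
```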
In this way, a loss function can be established jointly from the label classification prediction result and the encoding processing result, allowing the initial processing parameters to be adjusted along multiple dimensions so that they converge quickly, improving parameter tuning efficiency.
Taking the encoded attribute tag sequence as an example, to distinguish the attribute tag sequences before encoding and after decoding, the attribute tag sequence before encoding may be regarded as the real attribute tag sequence, and the attribute tag sequence predicted by decoding as the predicted attribute tag sequence.
According to the preset candidate attribute tag set, multiple candidate attribute tag labeling sequences are obtained by permutation and combination. Using the conditional random field network on the encoded attribute tag sequence, the probability that each candidate attribute tag in a labeling sequence labels the corresponding division unit is predicted, yielding a probability value for each candidate attribute tag labeling sequence; the labeling sequences whose probability values meet a preset second selection condition are taken as the predicted attribute tag sequences.
By matching the real attribute tag sequence against the predicted attribute tag sequence, it can be determined whether the encoding result is accurate.
It should be understood that the above embodiments are only examples; in application, the corresponding encoded training data may be selected for decoding according to the actual situation, for example the encoded grammar tag sequence, position tag sequence, corpus, and so on.
It can be understood that, in practical applications and depending on the steps of the label classification method, the processing parameters may further include embedding processing parameters, encoding processing parameters, attribute identification parameters, feature extraction parameters for iteration, logical operation parameters for iteration, and the like. These parameters can likewise be obtained through adjustment using the preset training data, the real category label set of the training data and the preset loss function; the embodiments of this specification do not limit the specific parameter types included in the processing parameters.
In specific implementation, after the data to be processed is obtained, the class label prediction set of the data to be processed can be obtained through a preset label classification model.
Specifically, referring to the flowchart of another label classification method shown in FIG. 2, in an embodiment of this specification the method may include the following steps:
S21: acquire data to be processed, where the data to be processed includes a corpus to be processed.
S22: input the data to be processed into a preset label classification model, which extracts semantic features of the data to be processed, performs logical operation processing on the extracted semantic features and the data to be processed to obtain fusion features of the data, calculates the value of each candidate class label based on the fusion features, and obtains the candidate class labels whose values meet a preset first selection condition, yielding a class label prediction set.
By adopting this label classification scheme, after the data to be processed is acquired, the preset label classification model fuses the original semantic information in the data with the semantic information in the extracted semantic features, avoiding semantic feature extraction errors or the loss of key semantic information affecting the prediction result. The fusion features contain rich semantic information and can represent data with complicated content or variable sources, so the value of each candidate class label can be calculated more accurately, the correct candidate class labels are obtained to represent the classification information in the corpus, and the accuracy of the label classification result is improved.
In a specific implementation, a label classification model may be constructed from the acquired preset processing parameters. As shown in FIG. 3, the label classification model 30 may include an input layer 31, an encoding layer 32, a feature extraction layer 33, a feature fusion layer 34, a decoding layer 35 and an output layer 36, where the feature extraction layer 33 is adapted to extract the semantic features of the data to be processed.
As an optional example, the feature extraction layer 33 may employ a convolutional neural network architecture. In this case the feature extraction parameters include convolutional neural network parameters, and depending on those parameters the network may be a common Convolutional Neural Network (CNN) or a variant thereof.
In an embodiment of the present specification, by setting a Dilation Rate parameter among the feature extraction parameters, the label classification model can extract the semantic features of the data to be processed through a variant of the convolutional neural network, namely a Dilated Convolutional Neural Network (DCNN).
The feature extraction layer 33 may include at least one dilated convolutional neural network, with the parameters of each network set separately; the dimension of each dilated convolutional neural network may be one-dimensional or multi-dimensional, and when parameter values such as the convolution kernel (Kernel), window (Window) and dilation rate of the dilated convolutional neural networks are the same, their receptive fields are the same.
For example, when the dilated convolutional neural network is one-dimensional, with a convolution kernel size of 3 and a dilation rate of 2, the receptive field of the network is 7 × 1.
For another example, when the dilated convolutional neural network is two-dimensional, with a convolution kernel size of 3 and a dilation rate of 4, the receptive field of the network is 15 × 15.
By adopting this scheme, extracting the semantic features of the data to be processed through a dilated convolutional neural network allows longer-range semantic information to be extracted from the corpus without increasing the number of parameters and without preprocessing the data to remove invalid characters, so the semantic features contain broader semantic information.
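The receptive-field figures quoted above (7 and 15) are consistent with the usual practice of stacking dilated convolution layers with increasing dilation rates 1, 2, 4, …; the arithmetic below reproduces them under that stacking assumption:

```python
def stacked_receptive_field(kernel, dilations):
    """Cumulative receptive field of stacked dilated convolutions
    (stride 1): each layer adds (kernel - 1) * dilation positions."""
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d
    return rf

# Kernel size 3, layers with dilation rates 1 then 2: receptive field 7.
print(stacked_receptive_field(3, [1, 2]))      # 7
# Adding a dilation-4 layer: receptive field 15 (15 x 15 in the 2-D case).
print(stacked_receptive_field(3, [1, 2, 4]))   # 15
```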
In a specific implementation, as shown in FIG. 3, the label classification model 30 may further include a feature fusion layer 34, connected to the feature extraction layer 33 and the encoding layer 32 and adapted to perform logical operation processing on the extracted semantic features and the data to be processed to obtain the fusion features of the data. The feature fusion layer 34 may adopt any neural network architecture capable of implementing logical operations, for example a perceptron neural network architecture, and its parameters can be set through the logical operation parameters.
It should be understood that, in describing the embodiments of this specification, to facilitate describing the data interactions among the neural networks, a neural network that independently implements a corresponding function may be regarded as a sub-model of the label classification model. For example, a convolutional neural network that independently extracts the semantic features of the data to be processed may be regarded as a semantic feature extraction sub-model, and a neural network that independently implements the logical operation processing may be regarded as a logical operation sub-model.
In practical applications, a semantic feature extraction sub-model can be obtained from each preset group of feature extraction parameters, and the logical operation sub-model can then perform logical operations on each group of semantic features and the data to be processed to obtain the fusion features.
By adopting this scheme, semantic features of different granularities can be extracted from the data to be processed by setting different feature extraction parameters, so that the extracted semantic features have diversity and universality. More of the semantic information contained in the data to be processed can be conveyed through semantic features of different granularities, which strengthens the ability of the fusion features to represent data to be processed with complicated content or variable sources, and improves the generalization ability and universality of accurately predicting different data to be processed.
In a specific implementation, based on preset logical operation parameters, the logical operation submodel may perform weighted logical operation on each group of semantic features and the data to be processed.
Therefore, through the weighted logical operation, the importance of the various semantic features and of the data to be processed in the logical operation can be controlled. This improves the accuracy of the logical operation result, enhances how accurately the fusion features represent the data to be processed, and improves the reliability of the label classification prediction result.
In specific implementation, in order to obtain the weight coefficients quickly and reliably, the logical operation sub-model may input at least one group of semantic features into a preset nonlinear function for nonlinear mapping processing, allocate weight coefficients to the other groups of semantic features and the data to be processed based on the processing result, and perform the weighted logical operation on the other groups of semantic features and the data to be processed based on the allocated weight coefficients. Nonlinear functions such as Sigmoid, Tanh, and ReLU may be used for the nonlinear mapping processing.
In an embodiment of the present specification, two semantic feature extraction sub-models respectively output extracted semantic features. The logical operation sub-model may use the Sigmoid nonlinear function to perform nonlinear mapping processing on one group of semantic features through a preset neural network to obtain Sigmoid(E1), and perform a weighted logical operation with the other group of semantic features E2 and the data to be processed X to obtain the fusion features Y.
As an alternative example, the weighted logical operation may be performed using the following formula:

Y = σ ⊗ E2 + (1 − σ) ⊗ X

where σ = Sigmoid(E1) and the sign ⊗ is a tensor (element-wise) product operation.
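A minimal sketch of this weighted logical operation, assuming all three tensors share the same shape (the tensor names mirror the symbols above and are otherwise hypothetical):

```python
import torch

def gated_fusion(e1: torch.Tensor, e2: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Weighted logical operation from the formula above:
    Y = sigma * E2 + (1 - sigma) * X, with sigma = Sigmoid(E1)."""
    sigma = torch.sigmoid(e1)              # weight coefficients from one group of semantic features
    return sigma * e2 + (1.0 - sigma) * x  # element-wise product and sum

# Usage: e.g. tensors of shape (batch, seq_len, dim).
e1 = torch.randn(2, 50, 128)
e2 = torch.randn(2, 50, 128)
x = torch.randn(2, 50, 128)
y = gated_fusion(e1, e2, x)
```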
It should be understood that the above-described embodiments are only examples. In practical applications, different numbers of semantic feature extraction sub-models, nonlinear functions, and logical operation formulas may be selected according to the actual situation, and the embodiments of this specification do not limit this.
Therefore, by acquiring the weight coefficients from the semantic features, both the efficiency of weight acquisition and the reliability of the weight coefficients can be improved.
In a specific implementation, as shown in fig. 3, as an optional example, the tag classification model 30 may further include an iteration layer 37 located between the feature fusion layer 34 and the decoding layer 35. It is adapted to, after determining that a preset iteration condition is satisfied, obtain the fusion features of the current round, extract the semantic features of the fusion features, and perform a logical operation on the semantic features extracted from the fusion features and the fusion features themselves to obtain the iterated fusion features; after determining that the iteration condition is no longer satisfied, it takes the iterated fusion features as the fusion features of the data to be processed for determining the value of each candidate class label.
The semantic feature extraction sub-model for extracting the semantic features of the fusion features and the semantic feature extraction sub-model for extracting the semantic features of the data to be processed can adopt the same neural network architecture and, depending on the dilation rate, can be an ordinary convolutional neural network or a dilated convolutional neural network. The parameters of the semantic feature extraction sub-model for the fusion features may be the same as or different from those of the sub-model for the data to be processed. Similarly, the logical operation sub-model for iteratively processing the fusion features and their extracted semantic features and the logical operation sub-model for processing the data to be processed and its fusion features can adopt the same neural network architecture.
It can be understood that, in describing the embodiments of the present disclosure, to distinguish the two, the semantic feature extraction sub-model for extracting the semantic features of the data to be processed may be referred to as the first semantic feature extraction sub-model, and the semantic feature extraction sub-model for extracting the semantic features of the fusion features may be referred to as the second semantic feature extraction sub-model. Similarly, the logical operation sub-model for processing the data to be processed and its fusion features can be referred to as the first logical operation sub-model, and the logical operation sub-model for iteratively processing the fusion features and their extracted semantic features can be referred to as the second logical operation sub-model.
In practical application, according to a preset iteration-count threshold, one or more sublayers can be preset in the iteration layer and connected in series to form a multiple-iteration relation. The first sublayer receives the input fusion features, extracts their semantic features, and performs a logical operation on the extracted semantic features and the fusion features to obtain the fusion features after one iteration; the second sublayer receives the fusion features after the first iteration, extracts their semantic features, and performs a logical operation on the extracted semantic features and those fusion features to obtain the fusion features after the second iteration; and so on, so that the fusion features after multiple iterations can be obtained after multiple sublayers.
By adopting this scheme, performing semantic extraction and logical operations on the fusion features allows the iterated fusion features to better highlight key semantic information, which strengthens the characterization ability of the fusion features and improves the accuracy of the label classification result.
In specific implementation, multiple groups of feature extraction parameters for extracting the fusion features may be preset. The semantic features of the fusion features are respectively extracted based on each group of feature extraction parameters to obtain each group of semantic features based on the fusion features, and then a logical operation is performed on these groups of semantic features and the fusion features to obtain the iterated fusion features.
For example, in an embodiment of the present specification, referring to fig. 4, the iteration layer 40 may include two sublayers, namely a first sublayer 41 and a second sublayer 42. The first sublayer 41 may include two second semantic feature extraction sub-models 411 and 412 and a second logical operation sub-model 413, where the inputs of the sub-models 411 and 412 are connected with the input of the sub-model 413, and the outputs of the sub-models 411 and 412 are also connected with the input of the sub-model 413. The second sublayer 42 may include two second semantic feature extraction sub-models 421 and 422 and a second logical operation sub-model 423, connected in the same way.

The fusion feature X400 serves as the input feature of the first sublayer 41 and is respectively input into the second semantic feature extraction sub-model 411, the second semantic feature extraction sub-model 412, and the second logical operation sub-model 413. The sub-models 411 and 412 produce the semantic features X411 and X412 of the first sublayer 41, and the second logical operation sub-model 413 performs a logical operation on X411, X412, and the fusion feature X400 to obtain the fusion feature of the first sublayer 41, i.e., the fusion feature X413 after one iteration.

The fusion feature matrix X413 after one iteration serves as the input feature of the second sublayer 42 and is respectively input into the second semantic feature extraction sub-model 421, the second semantic feature extraction sub-model 422, and the second logical operation sub-model 423. The sub-models 421 and 422 produce the semantic features X421 and X422 of the second sublayer 42, and the second logical operation sub-model 423 performs a logical operation on X421, X422, and the fusion feature matrix X413 to obtain the fusion feature of the second sublayer 42, i.e., the fusion feature X423 after the second iteration.
It should be understood that the foregoing embodiment is only an example, and the iteration layer may set the number of sub-layers and the number of semantic feature extraction sub-models and logical operation sub-models included in each sub-layer according to an actual situation, which is not limited in this embodiment of the present specification.
In a specific implementation, the parameters of each second semantic feature extraction sub-model may be set separately, and the parameters of the second semantic feature extraction sub-models of the same sublayer may be the same or different. For example, the iteration layer may include three sublayers, where the dilation rate of the first sublayer may be 2, that of the second sublayer 4, and that of the third sublayer 1, as sketched below.
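The following sketch (an assumption-laden illustration, not the patent's implementation) stacks iteration sublayers with individually set dilation rates; the module name, feature dimension, and the rates 2, 4, 1 follow the example above:

```python
import torch
import torch.nn as nn

class IterationSublayer(nn.Module):
    """One iteration sublayer: extract semantic features of the fusion
    feature with two dilated convolutions, then gate them back into it."""
    def __init__(self, dim: int, dilation: int):
        super().__init__()
        pad = dilation  # keeps sequence length for kernel_size=3
        self.conv_gate = nn.Conv1d(dim, dim, 3, dilation=dilation, padding=pad)
        self.conv_feat = nn.Conv1d(dim, dim, 3, dilation=dilation, padding=pad)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        sigma = torch.sigmoid(self.conv_gate(x))
        return sigma * self.conv_feat(x) + (1 - sigma) * x

# Three sublayers in series with dilation rates 2, 4 and 1, as in the example.
iteration_layer = nn.Sequential(
    IterationSublayer(dim=128, dilation=2),
    IterationSublayer(dim=128, dilation=4),
    IterationSublayer(dim=128, dilation=1),
)
fused = iteration_layer(torch.randn(1, 128, 50))
```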
In a specific implementation, with continued reference to fig. 3, the tag classification model 30 may further include: the input layer 31. The input layer 31 is adapted to perform a division process on the data to be processed to obtain a corresponding data sequence to be processed before extracting the semantic features of the data to be processed. The data sequence to be processed may include one or more partitioning units, where the partitioning unit is a minimum unit into which the data to be processed may be partitioned according to a preset requirement.
As an optional example, since the partition units of the data to be processed have various expression forms, in order to improve the extraction efficiency of the semantic features, before the semantic features of the data to be processed are extracted through the semantic extraction layer in the tag classification model, the data to be processed may be subjected to an embedding process, and the partition units of the data to be processed may be subjected to vectorization. Specifically, each partition unit in the corpus to be processed and each candidate attribute tag in the attribute tag sequence may be characterized in a vector manner, so that both the corpus to be processed and the attribute tag sequence may be characterized in a matrix manner.
For example, the embedding process may be performed using a dictionary mapping method: through a preset mapping dictionary, the index value of each division unit of the data to be processed in the mapping dictionary is acquired, giving the data to be processed after dictionary mapping processing. Since the dictionary-mapped data to be processed comprises the index values of all division units, it can be represented in a vector manner.
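A minimal sketch of this dictionary-mapping step; the dictionary contents and the fallback index are hypothetical:

```python
# Each division unit is replaced by its index value in a preset mapping
# dictionary; unknown units fall back to a reserved index.
mapping_dict = {"[UNK]": 0, "小": 1, "明": 2, "，": 3, "出": 4, "生": 5}

def dictionary_map(units):
    return [mapping_dict.get(u, mapping_dict["[UNK]"]) for u in units]

units = ["小", "明", "，", "出", "生"]
print(dictionary_map(units))  # [1, 2, 3, 4, 5]
```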
In a specific implementation, with continued reference to fig. 3, the tag classification model 30 may further include an encoding layer 32, adapted to encode the division units in the data to be processed to obtain the encoded data to be processed. Based on preset encoding processing parameters, each division unit can be encoded according to the context information of the data to be processed to obtain the encoding feature vector of each division unit, with the dimension of each encoding feature vector determined by the preset encoding processing parameters. Since the encoded data to be processed is composed of the encoding feature vectors of the division units, it can be represented in a matrix manner.
As an optional example, the encoding layer 32 may encode the data to be processed by using any one of the following encoding processing manners:
1) adopting a time series neural network sub-model;
2) a preset mapping matrix is used.
The time-series neural network sub-model may include: a Transformer network model with a self-attention mechanism (self-attention), a Bi-directional Long Short-Term Memory (BiLSTM) network model, a GRU (Gated Recurrent Unit) network model, and the like. The total number of row vectors or column vectors in the mapping matrix is not less than the total number of division units in the data to be processed.
In a specific implementation, when the encoding layer includes the time-series neural network sub-model, the sub-model may be pre-trained before the data to be processed is encoded, so that the pre-trained time-series neural network sub-model can deeply capture the context information in the data to be processed. Two example methods are described below:
the first method is to adopt a Language Model (LM) training method to perform pre-training.
Specifically, random pre-training corpora are obtained from a pre-training corpus set and input into the initial time-series neural network sub-model, which predicts the next word segmentation unit of the pre-training corpus given the preceding information. When the probability of accurate prediction reaches a preset pre-training threshold, pre-training is determined to be complete, yielding the pre-trained time-series neural network sub-model. Otherwise, after adjusting the parameters of the time-series neural network sub-model, pre-training continues on the pre-training corpora until the probability of accurate prediction reaches the preset pre-training threshold.
The second method is pre-training with a Masked Language Model (MLM) training method.
Pre-training corpora with a randomly masked portion of a preset proportion are obtained from a pre-training corpus set and input into the time-series neural network sub-model, which predicts the masked portion given the context information. When the probability of accurate prediction reaches a preset pre-training threshold, pre-training is determined to be complete, yielding the pre-trained time-series neural network sub-model. Otherwise, after adjusting the parameters of the time-series neural network sub-model, pre-training continues on the pre-training corpora until the probability of accurate prediction reaches the preset pre-training threshold.
It should be understood that the above-mentioned pre-training method is only an example, and in practical applications, the above-mentioned method or other pre-training methods may be selected according to a usage scenario, and this is not limited in this embodiment of the present disclosure.
In an embodiment of this specification, the pre-trained time-series neural network sub-model may be a pre-trained BERT (Bidirectional Encoder Representations from Transformers) sub-model. Before the data to be processed is input into the tag classification model, it may be pre-processed according to the input rules of the BERT sub-model, specifically: adding a head tag CLS at the start position of the data to be processed, and adding an end tag SEP at the end position of the data to be processed.
By adopting the scheme, the head label CLS has semantic information of the whole data to be processed after encoding, feature extraction and feature fusion, and is beneficial to a label classification model to acquire rich semantic information.
As an optional example, when the data to be processed is divided into multiple batches and input into the tag classification model for processing, a length threshold may be preset for the tag classification model, and if the length of the data to be processed in one batch does not satisfy the length threshold, Padding (Padding) processing may be performed on the data to be processed.
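A sketch of the BERT-style tag insertion and padding described above, assuming string tokens and a hypothetical length threshold (BERT's own tokenizer API is not shown):

```python
def preprocess_for_bert(units, length_threshold, pad_token="[PAD]"):
    """Add the head tag CLS and the end tag SEP, then pad the sequence
    up to the preset length threshold."""
    seq = ["[CLS]"] + list(units) + ["[SEP]"]
    if len(seq) < length_threshold:
        seq = seq + [pad_token] * (length_threshold - len(seq))
    return seq

print(preprocess_for_bert(["小", "明"], length_threshold=8))
# ['[CLS]', '小', '明', '[SEP]', '[PAD]', '[PAD]', '[PAD]', '[PAD]']
```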
Therefore, the data to be processed is coded through the pre-training time sequence neural network submodel, and coding efficiency and coding result accuracy can be improved.
In a specific implementation, in order to improve the processing efficiency of the tag classification model, dimension reduction processing may be performed on the features in the tag classification model. For example, the fused features may be dimension reduced.
In specific implementation, the richer the sequence information contained in the data to be processed, the more accurately the semantic features can be extracted. Therefore, before extracting the semantic features of the data to be processed, based on the semantic structure of the corpus to be processed, the attribute information of the corpus to be processed may be identified and corresponding candidate attribute tags selected from a preset candidate attribute tag set to obtain an attribute tag sequence. The data to be processed may thus further include the attribute tag sequence, whose division units can be attribute tags.
The attribute tag sequence may be obtained through a tag classification model or a preset attribute labeling model, and the attribute tag sequence obtaining method may refer to the related embodiments in the tag classification method, which is not described herein again.
By adopting this scheme, the corresponding attribute tag sequence can be obtained according to the attribute information in the corpus to be processed. Due to the co-occurrence characteristic of the corpus to be processed and the attribute tag sequence, adding the attribute tag sequence does not damage the semantic information of the corpus to be processed, while enriching the sequence information contained in the data to be processed.
In a specific implementation, as shown in fig. 5, as an optional example, the tag classification model 30 may include a combination layer 38 located between the encoding layer 32 and the feature extraction layer 33. The difference from fig. 3 is that the encoding layer 32 does not establish a connection with the feature fusion layer 34, while the combination layer does. The combination layer 38 is adapted to, after the attribute tag sequence is obtained, perform combination processing on the corpus to be processed and the attribute tag sequence to obtain the combined data to be processed, for subsequent semantic feature extraction and logical operation processing.
By adopting this scheme, semantic information of the attribute dimension can be extracted after the corpus to be processed and the attribute tag sequence are combined, and the subsequently processed features also contain semantic information of the attribute dimension. This expands the dimensions of semantic information in the semantic features and the fusion features, and combining multi-dimensional semantic information allows the value of each candidate category label to be calculated more accurately.
In specific implementation, since the encoded corpus to be processed and the encoded attribute tag sequence can be represented in a matrix manner, a row vector or column vector splicing method can be adopted to perform combination processing according to the row vector or the column vector to obtain combined data to be processed, and the combined data to be processed can also be represented in a matrix manner.
For example, the Concat function may be used to combine the n m1-dimensional row vectors of the encoded corpus to be processed with the n m2-dimensional row vectors at corresponding distribution positions in the encoded attribute tag sequence, obtaining n (m1+m2)-dimensional row vectors and thereby the combined data to be processed, where n, m1, and m2 are natural numbers, and m1 and m2 may or may not be equal.
Or, a splicing method of matrix operation may be adopted to perform matrix operation processing on the encoded corpus to be processed and the encoded attribute tag sequence, so as to obtain combined data to be processed.
For example, n m-dimensional row vectors in the encoded corpus to be processed and corresponding n m-dimensional row vectors in the encoded attribute tag sequence may be added to obtain n m-dimensional row vectors, where n and m are natural numbers.
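The two combination strategies (row-vector concatenation and element-wise matrix addition) might be sketched as follows, using assumed shapes:

```python
import torch

corpus = torch.randn(50, 96)   # n = 50 row vectors of dimension m1 = 96
labels = torch.randn(50, 32)   # n = 50 row vectors of dimension m2 = 32

# Concat-style combination: n (m1 + m2)-dimensional row vectors.
combined_concat = torch.cat([corpus, labels], dim=-1)   # shape (50, 128)

# Matrix-operation combination: element-wise addition of equal shapes.
labels_same_dim = torch.randn(50, 96)
combined_add = corpus + labels_same_dim                 # shape (50, 96)
```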
In a specific implementation, as shown in fig. 3 or 4, the tag classification model 30 may further include a decoding layer 35, adapted to calculate a value of each candidate class tag according to the fusion feature of the data to be processed, so as to represent the degree of association between each candidate class tag and the corpus to be processed, and obtain, according to the value of each candidate class tag, a candidate class tag whose value meets a preset first selection condition, so as to obtain a class tag prediction set.
In practical application, the fusion features can be represented by numerical values, vectors or matrixes according to preset logical operation parameters, and cannot be in one-to-one correspondence with a preset candidate class label set. Therefore, the decoding layer 35 may generate a fused feature vector based on the fused feature of the to-be-processed data, where a dimension of the fused feature vector is consistent with a total number of candidate category tags in a preset candidate category tag set, and a numerical value of each element in the fused feature vector represents a degree of association between a corresponding candidate category tag and the to-be-processed corpus.
And determining the distribution positions of the elements which accord with a preset first selection condition in the fusion characteristic vector according to the fusion characteristic vector, and obtaining candidate category labels corresponding to the distribution positions in a preset candidate category label set to obtain the category label prediction set.
As an optional example, a feature vector generation function is formed by preset feature vector generation parameters, the fusion features are input into the feature vector generation function, and data dimension transformation processing is performed on the fusion features through the feature vector generation function to obtain a q-dimensional feature transformation vector, where q is the total number of candidate class labels.
Existing data dimension transformation methods such as the Reshape function, Resize function, Swapaxes function, Flatten function, Unsqueeze function, and Expand function may be adopted; alternatively, a data dimension transformation method can be customized, and the fusion features transformed into the fusion feature vector according to a preset data dimension transformation rule.
As an optional example, if the fusion features are represented by a matrix, i.e., a fusion feature matrix, the data dimension transformation of the fusion features may specifically be: performing position conversion on each element in the fusion feature matrix in a preset order to obtain a position conversion vector, and performing dimension reduction on the position conversion vector until its dimension is consistent with the total number of candidate category labels in the preset candidate category label set. The dimension-reduced position conversion vector is taken as the feature transformation vector, so that the dimension of the obtained feature transformation vector is consistent with the total number of candidate category labels in the preset candidate category label set.
The position conversion vector may be dimension-reduced using a neural network architecture, for example, a Multi-Layer Perceptron (MLP) neural network architecture.
As an optional example, by performing nonlinear conversion processing on the q-dimensional feature transformation vector, the value of each of its elements can be transformed into a specified numerical interval. The feature transformation vector after nonlinear conversion processing is then used as the fusion feature vector, whose dimension is consistent with the total number of candidate category labels in the preset candidate category label set, and whose elements represent the values with which each candidate category label labels the corpus to be processed.
The feature transformation vector may be subjected to nonlinear conversion processing by using a numerical calculation function such as Sigmoid. The Sigmoid numerical calculation function can convert the numerical value of each element in the q-dimensional feature transformation vector into a numerical value interval [0,1], the numerical value of each element in the q-dimensional feature transformation vector is independently calculated, and the feature transformation vector after nonlinear conversion can be used for a multi-label classification task.
As an optional example, a q-dimensional feature transformation vector may be subjected to probability transformation processing, and a numerical value of each element in the q-dimensional feature transformation vector is transformed into a specified probability interval, so that the feature transformation vector after the probability transformation processing is used as a fusion feature vector, a dimension of the fusion feature vector is consistent with a total number of candidate category labels in a preset candidate category label set, and an element in the fusion feature vector represents each candidate category label for labeling a numerical value of the corpus to be processed.
Wherein, probability conversion processing can be performed on the feature transformation vector by adopting numerical calculation functions such as Softmax and the like. The Softmax numerical calculation function can convert numerical values of all elements in the q-dimensional feature transformation vector into a probability interval [0,1], the numerical values of all elements in the q-dimensional feature transformation vector are mutually constrained, the sum of the numerical values of all elements in the feature transformation vector after probability conversion processing is 1, and the Softmax numerical calculation function can be used for single-label classification tasks.
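A sketch contrasting the two conversions of the q-dimensional feature transformation vector; the label names, logits, and the 0.5 threshold standing in for the "first selection condition" are all assumed values:

```python
import torch

q_logits = torch.tensor([2.1, -0.7, 0.9, -3.2])  # assumed q = 4 feature transformation vector
candidate_labels = ["place of birth", "birth date", "nationality", "friend"]

# Multi-label task: Sigmoid maps each element independently into [0, 1].
probs = torch.sigmoid(q_logits)
selected = [lbl for lbl, p in zip(candidate_labels, probs) if p > 0.5]
print(selected)  # ['place of birth', 'nationality']

# Single-label task: Softmax makes the elements sum to 1; take the maximum.
probs_single = torch.softmax(q_logits, dim=-1)
print(candidate_labels[int(probs_single.argmax())])  # 'place of birth'
```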
In order that those skilled in the art will better understand and appreciate the foregoing aspects, the detailed description and specific examples are set forth below in connection with the drawings.
In an embodiment of the present specification, as shown in fig. 6, a schematic structural diagram of another tag classification model according to the present specification is shown. The label classification model 60 includes:
(1) The input layer 61 is adapted to preprocess the received data, and may specifically include: dividing the corpus S to be processed to obtain the corpus sequence to be processed {s1, s2, …, sm} composed of division units; then, mapping according to a preset mapping dictionary, respectively obtaining the index value of each division unit of the corpus sequence to be processed in the mapping dictionary and converting the division units into the corresponding numerical values, to obtain the corpus to be processed after dictionary mapping processing, namely the corpus vector to be processed SID = {sid1, sid2, …, sidm}, where s1, s2, …, sm are the division units of the corpus sequence to be processed and m is the total number of division units in the corpus sequence to be processed.
(2) The encoding layer 62 is adapted to perform attribute identification and encoding on the received data, and specifically may include:
(2.1) In the first area 621, the corpus vector SID to be processed is input into a preset first time-series neural network sub-model for encoding processing, obtaining the first encoding feature vectors corresponding to the division units in the corpus S to be processed, which form the corpus feature matrix [ES] = (ES1, ES2, …, ESm). Each first encoding feature vector ES1, ES2, …, ESm may be a dense vector of dimension k, with the value of k determined by the parameters of the first time-series neural network sub-model.
(2.2) In the second area 622, through the corpus vector SID to be processed, the first attribute labeling sub-model identifies the grammar information of the corpus S to be processed and labels the corresponding grammar tag at each division unit of the corpus S to be processed to obtain a grammar tag sequence, and dictionary mapping processing is performed on the grammar tag sequence to obtain the grammar tag sequence vector PID = {pid1, pid2, …, pidm}.
(2.3) In the second area 622, the grammar tag sequence vector PID is input into a preset second time-series neural network sub-model for encoding processing, obtaining the second encoding feature vectors corresponding to the division units in the grammar tag sequence, which form the grammar tag feature matrix [EP] = (EP1, EP2, …, EPm). Each second encoding feature vector EP1, EP2, …, EPm may be a dense vector of dimension j, with the value of j determined by the parameters of the second time-series neural network sub-model.
(2.4) In the third area 623, through the corpus vector SID to be processed, the second attribute labeling sub-model identifies the position information of the corpus S to be processed and labels the corresponding position tag at each division unit of the corpus S to be processed to obtain a position tag sequence, and dictionary mapping processing is performed on the position tag sequence to obtain the position tag sequence vector QID = {qid1, qid2, …, qidm}.
(2.5) In the third area 623, the position tag sequence vector QID is input into a preset mapping matrix for encoding processing, obtaining the third encoding feature vectors corresponding to the division units in the position tag sequence, which form the position tag feature matrix [EQ] = (EQ1, EQ2, …, EQm). Each third encoding feature vector EQ1, EQ2, …, EQm may be a dense vector of dimension h, with the value of h determined by the parameters of the mapping matrix.
(3) The combination layer 63 is adapted to perform combination processing on the received data, and may specifically include: performing combination processing on the corpus feature matrix [ES], the grammar tag feature matrix [EP], and the position tag feature matrix [EQ] to obtain the combined feature matrix [E] = (E1, E2, …, Em). Each combined feature vector in [E] may be obtained by concatenating the encoding feature vectors at corresponding distribution positions in [ES], [EP], and [EQ], e.g., Ei = {ESi, EPi, EQi}, where i is a natural number and i ∈ [1, m]. Each combined feature vector E1, E2, …, Em may be a dense vector of dimension h + j + k.
(4) The first fully connected layer 64 is adapted to perform dimension reduction processing on the received data, and may specifically include: using the first multi-layer perceptron sub-model 641, performing dimension reduction processing on each combined feature vector E1, E2, …, Em in the combined feature matrix [E], thereby obtaining the combined feature matrix after dimension reduction [E'] = (E1', E2', …, Em'). The dimension of each dimension-reduced combined feature vector E1', E2', …, Em' may be p = (h + j + k)/2^d, where d is a natural number and 2^d is a divisor of (h + j + k).
(5) The feature extraction layer 65 is adapted to perform semantic feature extraction on the received data, and may specifically include: performing semantic feature extraction processing on the dimension-reduced combined feature matrix [E'] through the two dilated convolution sub-models 651 and 652, respectively, to obtain two semantic feature matrices. Each semantic feature vector in either semantic feature matrix may be a dense vector of dimension p.
(6) The feature fusion layer 66 is adapted to perform logical operation processing on the received data, and may specifically include: performing a logical operation on the two semantic feature matrices and the dimension-reduced combined feature matrix [E'] to obtain the fusion feature matrix [D] = (D1, D2, …, Dm), where each fusion feature vector D1, D2, …, Dm may be a dense vector of dimension p.
(7) The iteration layer 67 is adapted to perform iterative processing on the received data, and may specifically include: performing four iterations on the fusion feature matrix [D] through four sublayers to obtain the fusion feature matrix after four iterations. The dimension of each iterated fusion feature vector in the iterated fusion feature matrix may be p, and the dilation rate of the first sublayer may be 2, that of the second sublayer 4, that of the third sublayer 8, and that of the fourth sublayer 1.
(8) The decoding layer 68 is adapted to predict the received data to obtain the class label prediction set, and may specifically include:

(8.1) Performing position conversion processing on each element in the fusion feature matrix after four iterations in a preset order to obtain the position conversion vector f = {a1, a2, …, am×p}, where f may be a dense vector of dimension (m × p) and a1, a2, …, am×p are its elements.

(8.2) Using the second multi-layer perceptron sub-model 681, performing dimension reduction processing on the position conversion vector f to obtain the dimension-reduced position conversion vector f' = {a1', a2', …, aq'}, where the dimension of f' is q, q is the total number of candidate category labels in the preset candidate category label set, and a1', a2', …, aq' are its elements; the dimension-reduced position conversion vector f' is taken as the feature transformation vector.

(8.3) Performing nonlinear conversion processing on the position conversion vector f' to obtain the feature transformation vector f'' after nonlinear conversion processing; taking f'' as the fusion feature vector, determining the distribution positions of the elements in f'' that meet the preset first selection condition, and obtaining the candidate category labels corresponding to those distribution positions in the preset candidate category label set, the class label prediction set Y1 is obtained.

(9) The class label prediction set Y1 of the data to be processed is output through the output layer 69.
In one embodiment of the present specification, as shown in fig. 7, a schematic diagram of another tag classification model in the present specification is shown. The label classification model 70 differs from the label classification model 60 in fig. 6 in that: an input layer 71 and an encoding layer 72.
Specifically, after the corpus S to be processed is obtained, the grammar tag sequence PO and the position tag sequence QO are obtained through a preset attribute labeling model, and then the corpus S to be processed, the grammar tag sequence PO and the position tag sequence QO are input to the tag classification model 70 as data to be processed.
The input layer 71 divides the corpus S to be processed to obtain the corpus sequence to be processed {s1, s2, …, sm} composed of division units. Then, mapping according to a preset mapping dictionary, the index value of each division unit of the corpus sequence to be processed, the grammar tag sequence, and the position tag sequence in the mapping dictionary is respectively obtained, and each division unit is converted into the corresponding numerical value, giving the corpus to be processed, grammar tag sequence, and position tag sequence after dictionary mapping processing, namely the corpus vector to be processed SID = {sid1, sid2, …, sidm}, the grammar tag sequence vector PID = {pid1, pid2, …, pidm}, and the position tag sequence vector QID = {qid1, qid2, …, qidm}, where s1, s2, …, sm are the division units of the corpus sequence to be processed and m is the total number of division units in the corpus sequence to be processed.
Since the encoding layer 72 does not need to identify and label the attribute information, it can directly encode the received data.
In the first area 721, the corpus vector SID to be processed is input into a preset first time-series neural network sub-model for encoding, obtaining the first encoding feature vectors corresponding to the division units in the corpus S to be processed, which form the corpus feature matrix [ES] = (ES1, ES2, …, ESm). Each first encoding feature vector ES1, ES2, …, ESm may be a dense vector of dimension k, with the value of k determined by the parameters of the time-series neural network sub-model.
In the second area 722, the grammar tag sequence vector PID is input into a preset second time-series neural network sub-model for encoding processing, obtaining the second encoding feature vectors corresponding to the division units in the grammar tag sequence, which form the grammar tag feature matrix [EP] = (EP1, EP2, …, EPm). Each second encoding feature vector EP1, EP2, …, EPm may be a dense vector of dimension j, with the value of j determined by the parameters of the second time-series neural network sub-model.
In the third area 723, the position tag sequence vector QID is input into a preset mapping matrix for encoding processing, obtaining the third encoding feature vectors corresponding to the division units in the position tag sequence, which form the position tag feature matrix [EQ] = (EQ1, EQ2, …, EQm). Each third encoding feature vector EQ1, EQ2, …, EQm may be a dense vector of dimension h, with the value of h determined by the parameters of the mapping matrix.
The rest of the label classification model 70 can refer to the above description of the label classification model 60 of fig. 6, and is not repeated here.
In practical application, the type of the candidate category label set can be set according to specific requirements, so that the label classification method can be applied to the identification field of corresponding types.
For example, a candidate category label set of the relationship type may be set and may include: colleague labels, friend labels, couple labels, singing labels, nationality labels, residence labels, work labels, and other candidate category labels. The label classification method can thereby be applied to the field of relationship identification.
For another example, a candidate category label set of emotion types may be set and may include: happy labels, calm labels, angry labels, question labels, and other candidate category labels. The label classification method can thereby be applied to the field of emotion recognition.
The wider the coverage of the candidate category labels of the relevant type in the candidate category label set, the richer the identification that the label classification method can perform. Taking the field of relationship identification as an example, suppose the corpus to be processed is:

{Xiaoming, born in 2020, Sanyuan, Shanxi, Han nationality.};

by executing the label classification method described in the related embodiments, the class label prediction set {birth date, place of birth, nationality} can be obtained.
In specific implementation, in order to improve the accuracy of the label classification prediction result, an initial label classification model may be trained: the model parameters of the label classification model are adjusted through preset training data, the class label real set of the training data, and a preset loss function, so that the label classification model converges to an ideal state and model training is completed. The trained label classification model is then used as the preset label classification model with which the label classification method is implemented. In order that the embodiments of the present disclosure may be more clearly understood and implemented by those skilled in the art, the following description is made with reference to the accompanying drawings of the embodiments of the present disclosure.
Referring to a flowchart of a training method for a label classification model shown in fig. 8, in an embodiment of this specification, the method may specifically include the following steps:
s81, acquiring training data and a category label real set of the training data, wherein the training data comprises training corpora.
In a specific implementation, the training corpus may include, but is not limited to, Chinese and Chinese punctuation marks, and a training corpus of the corresponding language category may be selected according to the language category the tag classification model will actually predict.
Training data from different fields can be acquired so that the source of the training data is wider, and calibrated training data can be acquired so that the format of the training data is more uniform and standard. The training data can be manually arranged data or data acquired from a public network.
S82, inputting the training data and the category label real set into an initial label classification model, so as to extract the semantic features of the training data, perform a logical operation on the extracted semantic features and the training data to obtain the fusion features of the training data, calculate the value of each candidate category label based on the fusion features to represent the degree of association between each candidate category label and the training corpus, and acquire the candidate category labels whose values meet a preset first selection condition to obtain the class label prediction set of the training data.
And S83, carrying out error calculation on the category label real set and the category label prediction set to obtain a result error value.
After the prediction of the label classification model, a class label prediction set of the training corpus can be obtained, and a result error value between the class label real set and the class label prediction set can be obtained through calculation by a preset loss function.
S84, determining whether the label classification model meets the training completion condition or not based on the result error value, and adjusting the parameters of the label classification model when the label classification model does not meet the training completion condition.
In a specific implementation, a result error threshold and an error coincidence time threshold may be preset, so as to determine whether the parameters of the tag classification model are adjusted.
Specifically, when the result error value is greater than the result error threshold, the tag classification model does not meet the first preset condition, and the parameters of the tag classification model may be adjusted. When the result error value is smaller than the result error threshold, the error coincidence count is incremented by one, and it is determined whether the error coincidence count is greater than or equal to the error coincidence count threshold. If so, the label classification model meets the first preset condition and is determined to have completed training; otherwise, the label classification model does not meet the first preset condition, and its parameters are adjusted.
The parameters of the label classification model can be adjusted using a gradient descent method or a back propagation method.
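A training-loop sketch of the threshold and error-coincidence-count logic of steps S83 to S85; the model, data loader, optimizer choice, and hyperparameter values are all assumptions for illustration:

```python
import torch

def train(model, loader, loss_fn, *, error_threshold=0.05,
          coincide_threshold=3, lr=1e-3, max_epochs=100):
    """Keep adjusting parameters (back propagation plus a gradient-descent
    optimizer) until the result error value stays below the result error
    threshold for enough consecutive checks."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    coincide = 0
    for _ in range(max_epochs):
        for data, y_true in loader:
            y_pred = model(data)
            loss = loss_fn(y_pred, y_true)           # result error value
            if loss.item() < error_threshold:
                coincide += 1
                if coincide >= coincide_threshold:   # training completion condition
                    return model
            else:
                coincide = 0
                optimizer.zero_grad()
                loss.backward()                      # back propagation
                optimizer.step()                     # gradient descent
    return model
```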
And S85, inputting the training data and the class label real set into the adjusted label classification model, until the label classification model meets the training completion condition.
In a specific implementation, in order to verify whether the adjusted label classification model has completed training, the training data and the class label real set of the training data may be input into the adjusted label classification model again, and the adjusted label classification model performs the above steps again, until the label classification model meets the training completion condition.
According to this scheme, performing a logical operation on the extracted semantic features of the training data and the training data itself fuses the original semantic information in the training data with the extracted semantic information in the semantic features, preserving the diversity of semantic information in the fusion features. The label classification model can thus obtain richer feature information from the fusion features, which strengthens its generalization ability and universality and improves the accuracy of the label classification prediction result.
In a specific implementation, the training data may further include the attribute tag sequence of the training corpus. The attribute tag sequence can be obtained by manually labeling attribute tags at the division units, or the attribute information present in the training corpus can be identified through the label classification model or a preset attribute labeling model and the corresponding candidate attribute tags labeled at the division units of the training corpus.
Based on the semantic structure of the corpus, the attribute information may include: at least one of position information of each dividing unit in the corpus to be processed and grammar information of the corpus to be processed; the syntax information may include: at least one of part-of-speech information and punctuation information. Accordingly, the attribute tag sequence obtained from the corpus to be processed may include: at least one of a sequence of position tags and a sequence of grammar tags; the sequence of grammar tags may include: at least one of part-of-speech tags and punctuation tags. For the processing process of the training data including the attribute tag sequence, reference may be specifically made to the description of the relevant part of the tag classification method, and details are not described here again.
In a specific implementation, after the class label prediction set of the training data is obtained, the error between the class label real set and the class label prediction set may be calculated through a preset loss function. The loss function can be established according to the global or local prediction results of the label classification model.
For example, the following first loss function loss1 may be established based on the label classification prediction result, and the value calculated by the first loss function loss1 used as the result error value between the category label real set and the category label prediction set:

loss1 = −Σ_{i=1…q} [ yi·log(ŷi) + (1 − yi)·log(1 − ŷi) ]

where yi represents the i-th element in the label classification real vector y corresponding to the category label real set, ŷi represents the i-th element in the fusion feature vector corresponding to the class label prediction set, and q is the total number of candidate category labels.
Further optionally, the label classification real vector corresponding to the category label real set may be obtained as follows: according to the classification information actually present in the training corpus and the distribution positions, in the preset candidate category label set, of the candidate category labels actually used for labeling the training corpus, the real class label vector can be generated. For example, if the preset candidate category label set is {place of birth, birth date, nationality, friend} and the candidate category labels actually used for labeling the training corpus are {place of birth, birth date, nationality}, the real class label vector may be y = (1, 1, 1, 0).

Here "1" may indicate that the candidate category label at the corresponding position is valid, that is, actually used for labeling the training corpus, and "0" may indicate that the candidate category label at the corresponding position is invalid, that is, not actually used for labeling the training corpus. It is understood that, in specific implementations, other values may be used to represent the valid and invalid bits, and the embodiments of this specification are not limited thereto.
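A sketch of building the real class label vector and evaluating loss1 on it; the label names follow the example above, and the prediction values are invented placeholders:

```python
import torch

candidate_labels = ["place of birth", "birth date", "nationality", "friend"]
true_labels = {"place of birth", "birth date", "nationality"}

# Multi-hot real class label vector: 1 marks a candidate label actually used.
y = torch.tensor([1.0 if lbl in true_labels else 0.0 for lbl in candidate_labels])
print(y)  # tensor([1., 1., 1., 0.])

# loss1 as a binary cross-entropy over the fusion feature vector y_hat.
y_hat = torch.tensor([0.9, 0.8, 0.7, 0.2])  # invented predictions in [0, 1]
loss1 = -(y * torch.log(y_hat) + (1 - y) * torch.log(1 - y_hat)).sum()
```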
Further, in order to determine whether the training data is accurately encoded and an accurate encoded attribute tag sequence is obtained, the encoded training data may be decoded and predictions made on it to verify whether the encoding result is accurate. A loss function can therefore be established jointly from the label classification prediction result and the encoding processing result, so that the initial processing parameters can be adjusted in multiple dimensions and converge quickly, improving parameter adjustment efficiency.
In an embodiment of the present specification, as shown in fig. 9, the training method of the label classification model may include:
s91, acquiring training data and a category label real set of the training data, wherein the training data comprises training corpora and attribute label sequences.
And S92, inputting the training data and the category label real set into an initial label classification model to encode the training data, extracting semantic features of the training data, and performing logical operation on the extracted semantic features and the training data to obtain fusion features of the training data.
S93, calculating the numerical value of each candidate class label based on the fusion characteristics to represent the degree of association between each candidate class label and the training corpus, and obtaining the candidate class label of which the numerical value meets a preset first selection condition to obtain the class label prediction set of the training data.
And S94, based on the coded attribute label sequences, the label classification model calculates the probability value of each candidate attribute label labeling sequence, obtains the candidate attribute label labeling sequence with the probability value meeting the preset second selection condition, and obtains the attribute label prediction sequence of the training data.
And S95, calculating a first error between the category label real set and the category label prediction set, calculating a second error between the attribute label sequence and the attribute label prediction sequence, and calculating the first error and the second error to obtain a result error value.
S96, determining whether the label classification model meets the training completion condition or not based on the result error value, and adjusting the parameters of the label classification model when the label classification model does not meet the training completion condition.
And S97, inputting the training data and the class label real set of the training data into the adjusted label classification model until the label classification model meets the training completion condition.
In specific implementation, a loss function may be jointly established based on a prediction result output by the label classification model and a decoding processing result, and a parameter of the label classification model may be adjusted by using a gradient descent method or a back propagation method based on a preset jointly established loss function.
Optionally, the conditional random field network is adopted to decode the encoded attribute tag sequence, so as to obtain an attribute tag prediction sequence.
The conditional random field network may be preset with a state transition matrix [A]a,b and an emission matrix [P]t,y, where [A]a,b represents the state transition probability from the a-th state to the b-th state between two time steps, and [P]t,y represents the score of outputting the candidate attribute label y at position t after the encoded attribute tag sequence (i.e., the attribute tag feature matrix [ET]) is input, where θ contains the parameters of the entire label classification model. The score of a candidate attribute label labeling sequence y^p = (y1, y2, …, yT) is

s([ET], y^p; θ) = Σ_{t=1…T} ([A]_{y(t−1), y(t)} + [P]_{t, y(t)}),

and when the conditional random field score s([ET], y^p; θ) reaches its maximum value, the attribute label prediction sequence is obtained. Moreover, the conditional random field model can compute the argmax of s([ET], y^p; θ) over y^p by the Viterbi method, thereby obtaining the candidate attribute label labeling sequence corresponding to the optimal path as the attribute label prediction sequence.
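A toy sketch of the conditional random field score above: a path score as the sum of transition and emission terms, with a brute-force argmax standing in for the Viterbi method; all tensor values are invented:

```python
import torch
from itertools import product

T, num_tags = 3, 2                   # toy sequence length and tag count
A = torch.randn(num_tags, num_tags)  # state transition matrix [A]a,b
P = torch.randn(T, num_tags)         # emission scores [P]t,y derived from [ET]

def path_score(tags):
    """s([ET], y^p) = sum over t of transition + emission scores."""
    s = P[0, tags[0]]
    for t in range(1, T):
        s = s + A[tags[t - 1], tags[t]] + P[t, tags[t]]
    return s

# Brute-force search over all tag paths; the Viterbi method computes the
# same argmax in O(T * num_tags^2) by dynamic programming.
best = max(product(range(num_tags), repeat=T),
           key=lambda tags: path_score(tags).item())
print(best)  # attribute label prediction sequence (as tag indices)
```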
The second loss function loss2 is jointly established from the first loss sub-function losslabel, established based on the label classification prediction result, and the second loss sub-function lossclassify, established based on the prediction result of the decoding processing, specifically:

loss2 = λ1·losslabel + λ2·lossclassify

where losslabel may take the form of the first loss function loss1 described above; lossclassify may be the negative log-likelihood of the attribute label prediction sequence under the conditional random field, i.e., lossclassify = −log P(y^p | [ET]); [ET] represents the attribute tag feature matrix containing T attribute tag feature vectors; y^p represents the attribute label prediction sequence containing T candidate attribute labels; and λ1 and λ2 are positive numbers.
It is understood that the above-described embodiments are merely exemplary, and the loss function may be established according to actual situations.
In a specific implementation, after the loss function is jointly established, the weight coefficients of the loss sub-functions may be adjusted, so as to control the direction and strength of the parameter adjustment of the model. For example, for the loss function loss2 = λ1·loss_label + λ2·loss_classify, when λ1 is greater than λ2, the gradient descent method and the back propagation method tend to adjust the parameters of the class label prediction; when λ1 is less than λ2, they tend to adjust the parameters of the attribute label prediction.
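For illustration only, the following sketch shows one possible computation of loss_classify via the forward algorithm, together with the weighted combination loss2 whose coefficients steer the parameter adjustment as described above; it assumes numpy arrays and is not the actual implementation of this specification.

import numpy as np

def crf_negative_log_likelihood(emissions, transitions, gold_path):
    # loss_classify sketch: -s([ET], y_gold) + log sum over all y of exp(s([ET], y)).
    T, K = emissions.shape

    # Score of the real attribute tag labeling sequence.
    gold_score = emissions[0, gold_path[0]]
    for t in range(1, T):
        gold_score += transitions[gold_path[t - 1], gold_path[t]] + emissions[t, gold_path[t]]

    # Forward algorithm: log-partition over all candidate labeling sequences.
    log_alpha = emissions[0].copy()
    for t in range(1, T):
        scores = log_alpha[:, None] + transitions + emissions[t][None, :]
        log_alpha = np.logaddexp.reduce(scores, axis=0)
    log_partition = np.logaddexp.reduce(log_alpha)

    return log_partition - gold_score

def joint_loss(loss_label, loss_classify, lam1, lam2):
    # lam1 > lam2 favors the class label prediction parameters,
    # lam1 < lam2 favors the attribute label prediction parameters.
    return lam1 * loss_label + lam2 * loss_classify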
In order that those skilled in the art will better understand and appreciate the foregoing aspects, the detailed description and specific examples are set forth below in connection with the drawings.
In an embodiment of the present specification, fig. 10 shows a schematic structural diagram of another label classification model of the present specification. It differs from the label classification models shown in fig. 6 and fig. 7 in that it further includes a decoding layer 101 and an output layer 102.
Specifically, after the training data is input into the label classification model to obtain the iterated fusion feature matrix [D_train], the decoding layer 101 may perform position conversion processing on each element in the iterated fusion feature matrix according to a preset order to obtain a position conversion vector f_train. With the second multi-layer perceptron sub-model 1011, the decoding layer 101 may further perform dimension reduction on the position conversion vector f_train, so as to obtain the dimension-reduced position conversion vector f_train' as the feature transformation vector, and perform nonlinear conversion processing on the feature transformation vector f_train' to obtain the nonlinearly converted feature transformation vector f_train'' as the fusion feature vector.
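For illustration only, the following is a minimal sketch of this feature pipeline of the decoding layer, assuming the iterated fusion feature matrix is a (T, H) tensor; the module name, sizes, and the choice of sigmoid as the nonlinear conversion are assumptions for the sketch.

import torch

class DecodingHead(torch.nn.Module):
    def __init__(self, seq_len, hidden, num_class_labels):
        super().__init__()
        # Stand-in for the second multi-layer perceptron sub-model 1011:
        # reduces the position conversion vector to one value per candidate
        # category label.
        self.mlp = torch.nn.Linear(seq_len * hidden, num_class_labels)

    def forward(self, fusion_matrix):
        # Position conversion: rearrange the matrix elements in a preset
        # order into a single vector f_train.
        f_train = fusion_matrix.flatten()
        # Dimension reduction -> f_train'.
        f_reduced = self.mlp(f_train)
        # Nonlinear conversion -> f_train'', the fusion feature vector, with
        # each element in (0, 1) as an association degree.
        return torch.sigmoid(f_reduced)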
Based on the encoded attribute tag sequence, i.e., the attribute tag feature matrix [ ET ], the decoding layer 101 may calculate a probability value of each candidate attribute tag label sequence by using the conditional random field model 1012, obtain a candidate attribute tag label sequence having a probability value meeting a preset second selection condition, and obtain an attribute tag prediction sequence of the training data.
Then, the decoding layer 101 may calculate an error between the category label real set and the category label prediction set according to a preset loss function, so as to obtain a result error value loss. The output layer 102 may output the result error value loss, so as to determine whether the label classification model is trained. The loss function may refer to the related embodiments described above, and will not be described herein again.
It can be understood that, the process of obtaining the corresponding fusion feature matrix and attribute tag feature matrix according to the training data by the tag classification model may refer to the description of the related embodiments of the tag classification method, and is not described herein again.
In practical applications, the type of the candidate category label set can be set according to specific requirements, so that a label classification model trained by the above training method can be applied to the corresponding type of recognition task.
For example, a candidate category label set of the relation type may be set, and the candidate category label set may include: colleague labels, friend labels, couple labels, singing labels, nationality labels, residence labels, work labels, and other candidate category labels. In this way, the trained label classification model can be applied to the field of relation recognition.
For another example, a candidate category label set of the emotion type may be set, and the candidate category label set may include: happy labels, calm labels, angry labels, question labels, and other candidate category labels. In this way, the trained label classification model can be applied to the field of emotion recognition.
The wider the coverage of the candidate category labels of the relevant type in the candidate category label set, the richer the recognition that the trained label classification model can perform.
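For illustration only, the following sketch shows how a candidate category label set might be configured per recognition field and how the labels whose numerical values meet the first selection condition could be picked out; the label names follow the examples above, and the fixed threshold is an assumed form of the selection condition.

RELATION_LABELS = ["colleague", "friend", "couple", "singing",
                   "nationality", "residence", "work"]
EMOTION_LABELS = ["happy", "calm", "angry", "question"]

def select_labels(fusion_feature_vector, candidate_labels, threshold=0.5):
    # Each element of the fusion feature vector is the association degree of
    # the candidate label at the same position; keep those above threshold.
    return [label for value, label in zip(fusion_feature_vector, candidate_labels)
            if value >= threshold]

# e.g. select_labels([0.9, 0.1, 0.2, 0.7], EMOTION_LABELS) -> ['happy', 'question']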
It is to be understood that the terms "first," "second," "third," etc. herein are used merely for convenience to distinguish one term from another, and do not denote any order, size, or importance.
The present specification also provides a data processing apparatus, including a memory and a processor, where the memory may store one or more computer-executable instructions, and the processor may call the one or more computer-executable instructions to perform the label classification method or the training method of the label classification model provided in the embodiments of the present specification.
In a specific implementation, the data processing device may further include a display interface and a display accessed through the display interface. The display may display result information obtained by the processor executing the label classification method or the training method of the label classification model provided in the embodiments of the present specification.
Wherein the result information may include: the attribute label prediction sequence of the data to be processed, the category label prediction set of the data to be processed, the attribute label prediction sequence of the training data, the category label prediction set of the training data, and the like.
An embodiment of the present specification further provides a computer-readable storage medium on which computer instructions are stored; when the computer instructions are run, the steps of the method according to any one of the above embodiments may be performed. The computer-readable storage medium may be any of various suitable readable storage media, such as an optical disc, a mechanical hard disk, or a solid-state hard disk. The instructions stored on the computer-readable storage medium may be used to perform the methods of the above embodiments, which may specifically refer to the above embodiments and are not described herein again.
The computer-readable storage medium may include, for example, any suitable type of memory unit, memory device, memory article, memory medium, storage device, storage article, storage medium, and/or storage unit, for example, memory, removable or non-removable media, erasable or non-erasable media, writeable or re-writeable media, digital or analog media, hard disk, floppy disk, compact disc read-only memory (CD-ROM), compact disc recordable (CD-R), compact disc rewritable (CD-RW), optical disk, magnetic media, magneto-optical media, removable memory cards or disks, various types of digital versatile disc (DVD), tape, cassette, or the like.
The computer instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, encrypted code, and the like, implemented using any suitable high-level, low-level, object-oriented, visual, compiled and/or interpreted programming language.
Although the embodiments are disclosed above, the present disclosure is not limited thereto. Various changes and modifications may be made by one of ordinary skill in the art without departing from the scope and spirit of the embodiments of the present disclosure, and the scope of protection of the embodiments of the present disclosure shall therefore be subject to the scope defined by the appended claims.

Claims (10)

1. A method of tag classification, comprising:
acquiring data to be processed, wherein the data to be processed comprises corpora to be processed;
extracting semantic features of the data to be processed, and performing logical operation processing on the extracted semantic features and the data to be processed to obtain fusion features of the data to be processed;
calculating the numerical value of each candidate category label based on the fusion characteristics of the data to be processed so as to represent the association degree of each candidate category label and the corpus to be processed;
and obtaining candidate category labels with the numerical values meeting a preset first selection condition based on the numerical values of the candidate category labels to obtain a category label prediction set.
2. The label classification method according to claim 1, wherein the extracting semantic features of the data to be processed, and performing logical operation processing on the extracted semantic features and the data to be processed to obtain fusion features of the data to be processed comprises:
respectively extracting semantic features of the data to be processed based on preset feature extraction parameters of each group to obtain the semantic features of each group;
and performing logic operation on the semantic features of each group and the data to be processed to obtain fusion features.
3. The label classification method according to claim 2, wherein the performing a logical operation on the semantic features of the groups and the data to be processed comprises:
inputting at least one group of semantic features into a preset nonlinear function for nonlinear mapping processing, and distributing weight coefficients for other groups of semantic features and the data to be processed based on processing results;
and performing weighted logic operation on the semantic features of the other groups and the data to be processed based on the distributed weight coefficients.
4. The label classification method according to any one of claims 1 to 3, wherein before the calculating, based on the fusion features of the data to be processed, the numerical value of each candidate category label for labeling the corpus to be processed, the method further comprises:
after the fact that preset iteration conditions are met is determined, acquiring fusion features of the current round, extracting semantic features of the fusion features, and performing logic operation on the semantic features extracted by the fusion features and the fusion features to obtain the fusion features after iteration;
and after the situation that the iteration condition is not met is determined, taking the iterated fusion feature as the fusion feature of the data to be processed so as to determine the numerical value of each candidate class label.
5. The label classification method according to claim 1, further comprising, before the extracting semantic features of the data to be processed:
identifying attribute information existing in the linguistic data to be processed, and acquiring an attribute tag corresponding to the attribute information to obtain an attribute tag sequence;
combining the linguistic data to be processed and the attribute tag sequence to obtain combined data to be processed for extracting semantic features;
wherein the attribute information comprises at least one of: position information of each division unit in the corpus to be processed, and grammar information of the corpus to be processed.
6. The label classification method according to claim 5, wherein the calculating the numerical value of each candidate class label based on the fusion feature of the data to be processed comprises:
generating a fusion feature vector based on the fusion feature of the data to be processed, wherein the dimension of the fusion feature vector is consistent with the total number of candidate category labels in a preset candidate category label set, and the numerical value of each element in the fusion feature vector represents the association degree of the corresponding candidate category label and the corpus to be processed;
the obtaining of the candidate category label with the value meeting a preset first selection condition based on the value of each candidate category label to obtain a category label prediction set includes:
and determining the distribution position of the element with the numerical value meeting the preset first selection condition in the fusion feature vector, and acquiring the candidate category label corresponding to the distribution position in the preset candidate category label set to obtain the category label prediction set.
7. The label classification method according to claim 1, wherein before the extracting semantic features of the data to be processed, the method further comprises:
acquiring preset processing parameters;
wherein the processing parameters comprise: feature extraction parameters, logical operation parameters, and numerical calculation parameters; the preset processing parameters are obtained by adjusting initial processing parameters through preset training data, a category label real set of the training data, and a preset loss function; the loss function is established based on the label classification prediction result of the training data; the training data comprises: a training corpus; and the category label real set of the training data comprises: candidate category labels that actually label the training corpus.
8. A method of tag classification, comprising:
acquiring data to be processed, wherein the data to be processed comprises corpora to be processed;
inputting the data to be processed into a preset label classification model to extract semantic features of the data to be processed, performing logical operation processing on the extracted semantic features and the data to be processed to obtain fusion features of the data to be processed, calculating values of all candidate class labels based on the fusion features of the data to be processed, obtaining candidate class labels of which the values meet preset first selection conditions, and obtaining a class label prediction set.
9. A data processing apparatus comprising a memory and a processor; wherein the memory is adapted to store one or more computer instructions, wherein the processor when executing the computer instructions performs the steps of the method of any one of claims 1 to 8.
10. A computer readable storage medium having computer instructions stored thereon, wherein the computer instructions when executed perform the steps of the method of any one of claims 1 to 8.
CN202010537640.1A 2020-06-12 2020-06-12 Label classification method, data processing device and readable storage medium Pending CN111695052A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010537640.1A CN111695052A (en) 2020-06-12 2020-06-12 Label classification method, data processing device and readable storage medium

Publications (1)

Publication Number Publication Date
CN111695052A true CN111695052A (en) 2020-09-22

Family

ID=72480672

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010537640.1A Pending CN111695052A (en) 2020-06-12 2020-06-12 Label classification method, data processing device and readable storage medium

Country Status (1)

Country Link
CN (1) CN111695052A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165380A (en) * 2018-07-26 2019-01-08 咪咕数字传媒有限公司 A kind of neural network model training method and device, text label determine method and device
CN111241842A (en) * 2018-11-27 2020-06-05 阿里巴巴集团控股有限公司 Text analysis method, device and system
CN109871444A (en) * 2019-01-16 2019-06-11 北京邮电大学 A kind of file classification method and system
CN110717039A (en) * 2019-09-17 2020-01-21 平安科技(深圳)有限公司 Text classification method and device, electronic equipment and computer-readable storage medium

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112163377A (en) * 2020-10-13 2021-01-01 北京智芯微电子科技有限公司 Method and device for acquiring transformer temperature early warning model and temperature prediction method
CN112817560A (en) * 2021-02-04 2021-05-18 深圳市永达电子信息股份有限公司 Method and system for processing calculation task based on table function and computer readable storage medium
CN112817560B (en) * 2021-02-04 2023-07-04 深圳市永达电子信息股份有限公司 Computing task processing method, system and computer readable storage medium based on table function
CN112860900A (en) * 2021-03-23 2021-05-28 上海壁仞智能科技有限公司 Text classification method and device, electronic equipment and storage medium
CN113095604A (en) * 2021-06-09 2021-07-09 平安科技(深圳)有限公司 Fusion method, device and equipment of product data and storage medium
CN113095604B (en) * 2021-06-09 2021-09-10 平安科技(深圳)有限公司 Fusion method, device and equipment of product data and storage medium
WO2023273720A1 (en) * 2021-06-28 2023-01-05 京东科技控股股份有限公司 Method and apparatus for training model, and device, and storage medium
CN113434575A (en) * 2021-06-30 2021-09-24 平安普惠企业管理有限公司 Data attribution processing method and device based on data warehouse and storage medium
CN113434575B (en) * 2021-06-30 2023-09-08 上海赢链通网络科技有限公司 Data attribution processing method, device and storage medium based on data warehouse
CN113672727A (en) * 2021-07-28 2021-11-19 重庆大学 Financial text entity relation extraction method and system
CN113672727B (en) * 2021-07-28 2024-04-05 重庆大学 Financial text entity relation extraction method and system
CN113704519A (en) * 2021-08-26 2021-11-26 北京市商汤科技开发有限公司 Data set determination method and device, computer equipment and storage medium
CN113704519B (en) * 2021-08-26 2024-04-12 北京市商汤科技开发有限公司 Data set determining method and device, computer equipment and storage medium
CN113627568A (en) * 2021-08-27 2021-11-09 广州文远知行科技有限公司 Bidding supplementing method, device, equipment and readable storage medium
CN113627568B (en) * 2021-08-27 2024-07-02 广州文远知行科技有限公司 Method, device and equipment for supplementing marks and readable storage medium
CN113987187A (en) * 2021-11-09 2022-01-28 重庆大学 Multi-label embedding-based public opinion text classification method, system, terminal and medium
CN114860905A (en) * 2022-04-24 2022-08-05 支付宝(杭州)信息技术有限公司 Intention identification method, device and equipment
CN115080689B (en) * 2022-06-15 2024-05-07 昆明理工大学 Hidden space data enhanced multi-label text classification method based on fusion label association
CN115080689A (en) * 2022-06-15 2022-09-20 昆明理工大学 Label association fused hidden space data enhanced multi-label text classification method
CN117746167A (en) * 2024-02-20 2024-03-22 四川大学 Training method and classifying method for oral panorama image swing bit error classification model
CN117746167B (en) * 2024-02-20 2024-04-19 四川大学 Training method and classifying method for oral panorama image swing bit error classification model

Similar Documents

Publication Publication Date Title
CN111695052A (en) Label classification method, data processing device and readable storage medium
CN113656570B (en) Visual question-answering method and device based on deep learning model, medium and equipment
CN111985239B (en) Entity identification method, entity identification device, electronic equipment and storage medium
CN112100346B (en) Visual question-answering method based on fusion of fine-grained image features and external knowledge
CN111695053A (en) Sequence labeling method, data processing device and readable storage medium
CN111694924A (en) Event extraction method and system
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
CN110969020A (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN114168709B (en) Text classification method based on lightweight pre-training language model
CN116151256A (en) Small sample named entity recognition method based on multitasking and prompt learning
CN113806645A (en) Label classification system and training system of label classification model
CN113886626B (en) Visual question-answering method of dynamic memory network model based on multi-attention mechanism
CN113806646A (en) Sequence labeling system and training system of sequence labeling model
CN114528835A (en) Semi-supervised specialized term extraction method, medium and equipment based on interval discrimination
CN114153971A (en) Error-containing Chinese text error correction, identification and classification equipment
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN114781375A (en) Military equipment relation extraction method based on BERT and attention mechanism
Selvam et al. A transformer-based framework for scene text recognition
CN113723111B (en) Small sample intention recognition method, device, equipment and storage medium
CN113705222B (en) Training method and device for slot identification model and slot filling method and device
CN114398482A (en) Dictionary construction method and device, electronic equipment and storage medium
Ambili et al. Siamese Neural Network Model for Recognizing Optically Processed Devanagari Hindi Script
Zhang et al. Open-domain document-based automatic QA models based on CNN and attention mechanism
Le et al. An Attention-Based Encoder–Decoder for Recognizing Japanese Historical Documents
CN115422934B (en) Entity identification and linking method and system for space text data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination