CN116453137A - Expression semantic extraction method, device, equipment and storage medium - Google Patents

Expression semantic extraction method, device, equipment and storage medium

Info

Publication number
CN116453137A
CN116453137A
Authority
CN
China
Prior art keywords
expression
semantic
model
semantic classification
classification model
Prior art date
Legal status
Pending
Application number
CN202310426259.1A
Other languages
Chinese (zh)
Inventor
姜敏华
张茜
张莉
任杰
于佳玉
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202310426259.1A
Publication of CN116453137A
Status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/191Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/353Clustering; Classification into predefined classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/70Labelling scene content, e.g. deriving syntactic or semantic representations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/191Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to artificial intelligence technology and discloses an expression semantic extraction method, which comprises the following steps: performing expression text semantic classification on images in an expression image set to obtain a semantic classification annotation set; performing continuous pre-training on a pre-trained corpus recognition model by using a preset expression corpus set to obtain an expression characterization semantic classification model; performing semi-supervised bidirectional optimization training on the expression characterization semantic classification model and a pre-constructed expression image semantic classification model by using the semantic classification annotation set to obtain an expression semantic extraction model; and performing semantic extraction on the expression image set and the expression corpus set by using the expression semantic extraction model to obtain an expression semantic result. The invention also relates to blockchain technology, and the expression semantic result can be stored in a node of the blockchain. The invention further provides an expression semantic extraction device, an electronic device and a readable storage medium. The invention can accurately extract expression semantics.

Description

Expression semantic extraction method, device, equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to an expression semantic extraction method, an expression semantic extraction device, an electronic device, and a readable storage medium.
Background
With the diversification of semantic expression in the Internet age, people increasingly prefer vivid expressions over plain text as carriers of meaning, making up for the expressive limitations of text; in a certain sense, the various expressions active on the Internet have clearly become a new kind of "network language". Expression semantic information generally comprises three components: context information, the expression image and the expression text. The context information, i.e. the context in which the expression appears, often constrains the ideographic range of the expression itself; the expression image refers to the image information containing the expression and can be divided into static images and dynamic images; the expression text is incidental information contained in some expressions and is superimposed on the expression image in text form.
However, from the perspective of semantic understanding, the widespread substitution of expressions for words poses challenges to traditional NLP language models. On the one hand, the large-scale emergence of expressions degrades the performance of NLP models based on pure text representation when processing corpora containing expressions, and such models may even fail to identify the true meaning of the corpus; on the other hand, an effective fused-information extraction approach is needed to deconstruct the true semantics of an expression in a specific context. Therefore, a method for accurately extracting expression semantics is needed.
Disclosure of Invention
The invention provides an expression semantic extraction method, an expression semantic extraction device, an electronic device and a readable storage medium, with the main purpose of accurately extracting expression semantics.
In order to achieve the above object, the present invention provides a method for extracting expression semantics, comprising:
acquiring an expression image set, and carrying out expression text semantic classification on images in the expression image set to obtain a semantic classification annotation set;
continuously pre-training the pre-trained corpus recognition model by using a preset expression corpus set to obtain an expression characterization semantic classification model;
performing semi-supervised bidirectional optimization training on the expression characterization semantic classification model and the pre-constructed expression image semantic classification model by using the semantic classification annotation set to obtain an expression semantic extraction model;
and carrying out semantic extraction on the expression image set and the expression corpus set by using the expression semantic extraction model to obtain an expression semantic result.
Optionally, the performing expression text semantic classification on the images in the expression image set to obtain a semantic classification annotation set includes:
recognizing the expression text of the images in the expression image set by utilizing an OCR (optical character recognition) technology;
carrying out expression recognition on the expression text by using a preset text semantic recognition model to obtain an expression recognition result, and receiving an expression labeling result obtained by labeling the expression text by a user;
and taking the expression recognition result and the expression labeling result as semantic classification labeling results, and summarizing the semantic classification labeling results, the expression text and the expression image set to obtain the semantic classification annotation set.
Optionally, the performing continuous pre-training on the pre-trained corpus recognition model by using a preset expression corpus set to obtain an expression characterization semantic classification model includes:
performing classification training on the corpus recognition model by using the expression corpus set to obtain a first classification model;
performing classification training on the first classification model by using expression corpus in a preset field in the expression corpus set to obtain a second classification model;
and connecting a fully connected layer after the second classification model to obtain the expression characterization semantic classification model.
Optionally, the performing semi-supervised bidirectional optimization training on the expression characterization semantic classification model and the pre-constructed expression image semantic classification model by using the semantic classification annotation set to obtain an expression semantic extraction model includes:
outputting text prediction labels of the expression texts in the semantic classification label set and corresponding prediction probabilities by using the expression characterization semantic classification model;
outputting image prediction labels and corresponding prediction probabilities of images in the expression image set in the semantic classification label set by using the expression image semantic classification model;
and taking all prediction labels with the prediction probability larger than or equal to a preset prediction threshold value as pseudo labels, and carrying out iterative optimization training on the expression characterization semantic classification model and the expression image semantic classification model by using the pseudo labels to obtain the expression semantic extraction model.
Optionally, the performing iterative optimization training on the expression characterization semantic classification model and the expression image semantic classification model by using the pseudo label to obtain the expression semantic extraction model includes:
calculating the prediction accuracy of the pseudo labels by using the semantic classification labeling result;
if the prediction accuracy is smaller than a preset accuracy threshold, adjusting model parameters in the expression characterization semantic classification model and the expression image semantic classification model, and returning to the step of outputting text prediction labels and corresponding prediction probabilities of the expression texts in the semantic classification label set by using the expression characterization semantic classification model;
and stopping training when the prediction accuracy is greater than or equal to the accuracy threshold, determining that the expression characterization semantic classification model and the expression image semantic classification model have converged, and taking the converged expression characterization semantic classification model as the expression semantic extraction model.
Optionally, the semantic extraction of the expression image set and the expression corpus set by using the expression semantic extraction model to obtain an expression semantic result includes:
carrying out vector characterization on the expression image set and the expression corpus set by using a second classification model in the expression semantic extraction model to obtain an expression semantic characterization vector set;
outputting prediction vector labels of the vectors in the expression semantic characterization vector set and the prediction vector probabilities corresponding to the prediction vector labels by using a fully connected layer in the expression semantic extraction model;
and taking the prediction vector labels with prediction vector probabilities greater than or equal to a preset vector probability threshold, together with the corresponding expression semantic characterization vectors, as the expression semantic result.
Optionally, the prediction accuracy of the pseudo labels is calculated by the following formula:
Accuracy = T1 / T
wherein Accuracy represents the prediction accuracy, T1 represents the number of pseudo labels that are consistent with the labels in the semantic classification labeling result, and T represents the number of all labels in the semantic classification label set.
In order to solve the above problems, the present invention further provides an expression semantic extraction device, which includes:
the expression text semantic annotation module is used for acquiring an expression image set, carrying out expression text semantic classification on images in the expression image set, and obtaining a semantic classification annotation set;
the expression semantic classification model training module is used for performing continuous pre-training on the pre-trained corpus recognition model by using a preset expression corpus set to obtain an expression characterization semantic classification model;
the model optimization training module is used for performing semi-supervised bidirectional optimization training on the expression characterization semantic classification model and the pre-constructed expression image semantic classification model by utilizing the semantic classification annotation set to obtain an expression semantic extraction model;
and the expression semantic extraction module is used for carrying out semantic extraction on the expression image set and the expression corpus set by using the expression semantic extraction model to obtain an expression semantic result.
In order to solve the above-mentioned problems, the present invention also provides an electronic apparatus including:
a memory storing at least one computer program; and
a processor that executes the computer program stored in the memory to implement the expression semantic extraction method described above.
In order to solve the above-mentioned problems, the present invention also provides a computer-readable storage medium having stored therein at least one computer program that is executed by a processor in an electronic device to implement the expression semantic extraction method described above.
According to the invention, by performing expression text semantic classification on the images in the expression image set, the semantic classification annotation set can be obtained without a large amount of manual labeling, which improves model training efficiency. Semi-supervised bidirectional optimization training is then performed on the expression characterization semantic classification model and the pre-constructed expression image semantic classification model by using the semantic classification annotation set, yielding an expression semantic extraction model capable of fused information extraction from the images and the text information carried by the expressions. Meanwhile, through semi-supervised interactive training, the expression semantic extraction model can fully fuse the image and text information of the expression, thereby improving the accuracy of expression semantic extraction. Therefore, the expression semantic extraction method, the expression semantic extraction device, the electronic device and the computer-readable storage medium can accurately extract expression semantics.
Drawings
Fig. 1 is a flow chart of an expression semantic extraction method according to an embodiment of the present invention;
FIG. 2 is a functional block diagram of an expression semantic extraction device according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device for implementing the expression semantic extraction method according to an embodiment of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The embodiment of the invention provides an expression semantic extraction method. The execution subject of the expression semantic extraction method includes, but is not limited to, at least one of a server, a terminal and other electronic devices that can be configured to execute the method provided by the embodiments of the invention. In other words, the expression semantic extraction method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server side includes, but is not limited to, a single server, a server cluster, a cloud server or a cloud server cluster, and the like. The server may be an independent server, or may be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), big data and artificial intelligence platforms.
Referring to fig. 1, a flowchart of an expression semantic extraction method according to an embodiment of the present invention is shown. In this embodiment, the expression semantic extraction method includes the following steps S1 to S4:
s1, acquiring an expression image set, and carrying out expression text semantic classification on images in the expression image set to obtain a semantic classification annotation set.
In the embodiment of the invention, the expression image set refers to an image set containing expressions and text from actual business in different fields, for example, a set of conversation screenshots containing expressions between customer service and users in the financial field.
In the embodiment of the present invention, performing semantic classification on the expression text of the image in the expression image set to obtain a semantic classification annotation set includes:
recognizing the expression text of the images in the expression image set by utilizing an OCR (optical character recognition) technology;
carrying out expression recognition on the expression text by using a preset text semantic recognition model to obtain an expression recognition result, and receiving an expression labeling result obtained by labeling the expression text by a user;
and taking the expression recognition result and the expression labeling result as semantic classification labeling results, and summarizing the semantic classification labeling results, the expression text and the expression image set to obtain the semantic classification annotation set.
In an alternative embodiment of the present invention, OCR (Optical Character Recognition) refers to a technology that analyzes and recognizes image files containing text data to obtain the text and its layout information; it is used here to extract the expression text from the expression images. Since the expression text in an expression image is often very short, the preset text semantic recognition model may be a simple NLP (Natural Language Processing) classification model (such as LDA, or a recognition model with predefined recognition rules). Meanwhile, the label of the expression package (sticker pack) to which an expression belongs can also be used as a semantic classification labeling result, and only a small amount of manually labeled expression labeling results needs to be introduced to complement the OCR-based results where necessary, which improves model training efficiency.
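As a concrete illustration of this step, the following Python sketch shows how an annotation set could be assembled: OCR pulls the expression text out of each image, a placeholder text classifier supplies labels where it can, and manually supplied labels fill the remaining gaps. It is only a minimal sketch of the described flow, not the patent's implementation: pytesseract is assumed as the OCR backend, and classify_expression_text, AnnotatedSample and the example label set are illustrative names.

```python
# Minimal sketch of building the semantic classification annotation set (step S1).
# pytesseract is an assumed OCR backend; classify_expression_text stands in for the
# "preset text semantic recognition model" and uses toy rules for illustration only.
from dataclasses import dataclass
from typing import Dict, List, Optional

from PIL import Image
import pytesseract

SEMANTIC_LABELS = ["happy", "sad", "angry", "praise"]  # example label set

@dataclass
class AnnotatedSample:
    image_path: str
    expression_text: str
    label: str      # semantic classification labeling result
    source: str     # "model" (automatic) or "manual"

def classify_expression_text(text: str) -> Optional[str]:
    """Placeholder for the preset text semantic recognition model (e.g. LDA or rules)."""
    rules = {"haha": "happy", "cry": "sad", "angry": "angry", "nice": "praise"}
    for keyword, label in rules.items():
        if keyword in text.lower():
            return label
    return None  # no confident automatic label -> fall back to manual labeling

def build_annotation_set(image_paths: List[str],
                         manual_labels: Dict[str, str]) -> List[AnnotatedSample]:
    samples = []
    for path in image_paths:
        # OCR extracts the (usually very short) expression text from the image.
        text = pytesseract.image_to_string(Image.open(path), lang="chi_sim+eng").strip()
        label = classify_expression_text(text)
        if label is not None:
            samples.append(AnnotatedSample(path, text, label, source="model"))
        elif path in manual_labels:
            # only a small amount of manual labeling is needed to complement OCR
            samples.append(AnnotatedSample(path, text, manual_labels[path], source="manual"))
    return samples
```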
S2, performing continuous pre-training on the pre-trained corpus recognition model by using a preset expression corpus set to obtain an expression characterization semantic classification model.
In the embodiment of the present invention, the preset expression corpus set may be a set of expression corpora in different fields crawled from the Internet. The pre-trained corpus recognition model may be an open-source generic BERT model. Continuous pre-training (continual pre-training) means continuing to train the model on a corpus of a specific field and task so that the model better fits the actual business. For example, in the financial field, the model is first trained on the preset expression corpus set, and then a financial corpus is selected from the expression corpus set for further training, so as to obtain the trained BERT model.
In detail, the performing continuous pre-training on the pre-trained corpus recognition model by using the preset expression corpus set to obtain the expression characterization semantic classification model includes:
performing classification training on the corpus recognition model by using the expression corpus set to obtain a first classification model;
performing classification training on the first classification model by using expression corpus in a preset field in the expression corpus set to obtain a second classification model;
and connecting a fully connected layer after the second classification model to obtain the expression characterization semantic classification model.
In an alternative embodiment of the invention, the BERT model trained on the specific field (the second classification model) is used as the backbone network (backbone), and a fully connected layer is connected after it, so that semantic classification prediction can be performed on the sentence in which the expression appears or on the N words of the expression context.
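The following sketch illustrates one plausible shape of the expression characterization semantic classification model described above: a BERT backbone with a fully connected classification head whose parameters remain trainable. The checkpoint name bert-base-chinese, the class name ExpressionTextClassifier and the number of labels are assumptions for illustration; the continual pre-training itself is only indicated in comments.

```python
# Sketch of the expression characterization semantic classification model: a
# domain-adapted BERT backbone (the "second classification model") followed by a
# fully connected classification head. The checkpoint name and label count are
# assumptions; continual pre-training is only indicated in the comments.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class ExpressionTextClassifier(nn.Module):
    def __init__(self, backbone_name: str = "bert-base-chinese", num_labels: int = 4):
        super().__init__()
        # In the patent's flow this backbone would first be continually pre-trained on
        # the expression corpus set, then on the domain (e.g. financial) sub-corpus.
        self.backbone = AutoModel.from_pretrained(backbone_name)
        self.classifier = nn.Linear(self.backbone.config.hidden_size, num_labels)
        # Backbone and head are both kept "updatable" so that the semi-supervised
        # training described later can correct their parameters.
        for p in self.parameters():
            p.requires_grad = True

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor):
        hidden = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        cls_vec = hidden.last_hidden_state[:, 0]   # [CLS] characterization vector
        logits = self.classifier(cls_vec)          # semantic classification prediction
        return cls_vec, logits

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
```

In this sketch the [CLS] vector doubles as the expression semantic characterization vector that the later extraction step reuses.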
S3, performing semi-supervised bidirectional optimization training on the expression representation semantic classification model and the pre-constructed expression image semantic classification model by utilizing the semantic classification annotation set to obtain an expression semantic extraction model.
In the embodiment of the invention, the expression characterization semantic classification model is used for performing expression semantic prediction (such as happy, sad, angry, praise) on the sentence or expression context text in which the expression appears. The pre-constructed expression image semantic classification model performs semantic category prediction on the expression images based on an image classification algorithm (such as a ResNet model). Interactive training of the expression characterization semantic classification model and the expression image semantic classification model based on semi-supervised bidirectional optimization allows the three characteristics of expression semantics (namely expression context information, expression image information and expression text) to be trained jointly, which greatly improves the accuracy of model recognition.
In detail, the performing semi-supervised bidirectional optimization training on the expression characterization semantic classification model and the pre-constructed expression image semantic classification model by using the semantic classification annotation set to obtain an expression semantic extraction model comprises the following steps:
outputting text prediction labels of the expression texts in the semantic classification label set and corresponding prediction probabilities by using the expression characterization semantic classification model;
outputting image prediction labels and corresponding prediction probabilities of images in the expression image set in the semantic classification label set by using the expression image semantic classification model;
and taking all prediction labels with the prediction probability larger than or equal to a preset prediction threshold value as pseudo labels, and carrying out iterative optimization training on the expression characterization semantic classification model and the expression image semantic classification model by using the pseudo labels to obtain the expression semantic extraction model.
In the embodiment of the invention, the semantic classification standard used for the expression images is consistent with that used for the expression text; that is, the text prediction labels and the image prediction labels follow the same set of labeling standards.
In an optional embodiment of the present invention, the expression characterization semantic classification model and the expression image semantic classification model are used to perform expression semantic prediction on the expression text and the expression images respectively; prediction labels with high confidence (for example, a predicted probability of some category >= 0.95) are regarded as pseudo labels and added to the semantic classification annotation set for the next round of training of the two models, until both models converge.
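A minimal sketch of one such training round is given below, assuming the 0.95 confidence threshold from the example above. The names text_model, image_model, sample.text_inputs, sample.image_tensor and train(...) are assumed interfaces used only to make the exchange of pseudo labels between the two "views" concrete; they are not APIs defined by the patent.

```python
# Minimal sketch of one semi-supervised bidirectional optimization round: each model
# labels the unlabeled samples it is confident about, and the confident predictions
# of one "view" are handed to the other view as pseudo labels for the next round.
# text_model, image_model, sample.text_inputs, sample.image_tensor and train(...) are
# assumed interfaces, not APIs defined by the patent.
import torch

CONF_THRESHOLD = 0.95  # example confidence threshold from the description

def co_training_round(text_model, image_model, labeled_set, unlabeled_set, train):
    new_text_labels, new_image_labels = [], []
    for sample in unlabeled_set:
        # text branch: expression text / context -> predicted semantic label
        _, text_logits = text_model(**sample.text_inputs)   # tokenizer output dict
        text_probs = torch.softmax(text_logits, dim=-1).squeeze(0)
        t_conf, t_label = text_probs.max(dim=-1)
        if t_conf.item() >= CONF_THRESHOLD:
            new_text_labels.append((sample, t_label.item()))

        # image branch: expression image -> predicted semantic label (e.g. a ResNet)
        image_logits = image_model(sample.image_tensor)
        image_probs = torch.softmax(image_logits, dim=-1).squeeze(0)
        i_conf, i_label = image_probs.max(dim=-1)
        if i_conf.item() >= CONF_THRESHOLD:
            new_image_labels.append((sample, i_label.item()))

    # Each model is retrained with the pseudo labels produced by the other view.
    train(text_model, labeled_set + new_image_labels)
    train(image_model, labeled_set + new_text_labels)
    return new_text_labels + new_image_labels
```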
Specifically, the performing iterative optimization training on the expression characterization semantic classification model and the expression image semantic classification model by using the pseudo label to obtain the expression semantic extraction model includes:
calculating the prediction accuracy of the pseudo labels by using the semantic classification labeling result;
if the prediction accuracy is smaller than a preset accuracy threshold, adjusting model parameters in the expression characterization semantic classification model and the expression image semantic classification model, and returning to the step of outputting text prediction labels and corresponding prediction probabilities of the expression texts in the semantic classification label set by using the expression characterization semantic classification model;
and stopping training when the prediction accuracy is greater than or equal to the accuracy threshold, determining that the expression characterization semantic classification model and the expression image semantic classification model have converged, and taking the converged expression characterization semantic classification model as the expression semantic extraction model.
In an alternative embodiment of the present invention, the prediction accuracy of the pseudo labels is calculated by the following formula:
Accuracy = T1 / T
wherein Accuracy represents the prediction accuracy, T1 represents the number of pseudo labels that are consistent with the labels in the semantic classification labeling result, and T represents the number of all labels in the semantic classification label set.
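For clarity, a tiny helper matching this formula might look as follows; pseudo_labels and reference_labels are assumed to be dictionaries keyed by sample identifier.

```python
# Tiny helper matching the accuracy formula above: the share of pseudo labels that
# agree with the semantic classification labeling results. Both arguments are assumed
# to be dictionaries keyed by sample identifier.
def pseudo_label_accuracy(pseudo_labels: dict, reference_labels: dict) -> float:
    consistent = sum(1 for key, label in pseudo_labels.items()
                     if reference_labels.get(key) == label)
    return consistent / len(reference_labels) if reference_labels else 0.0
```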
In an optional embodiment of the invention, since the expression characterization semantic classification model and the expression image semantic classification model have different "perspectives", the semi-supervised interactive optimization mechanism can continuously adjust the model parameters and improve the performance of both models on the expression semantic classification task. For example, in the parameter settings, the parameters of the second classification model and of the fully connected layer of the expression characterization semantic classification model are all set to be "updatable", so that the semi-supervised training mechanism can effectively correct the model parameters and improve the accuracy of model recognition.
S4, carrying out semantic extraction on the expression image set and the expression corpus set by using the expression semantic extraction model to obtain an expression semantic result.
In the embodiment of the invention, the expression semantic result is a representation vector fusing knowledge of expression context, expression image and expression text.
In detail, the semantic extraction of the expression image set and the expression corpus set by using the expression semantic extraction model to obtain an expression semantic result comprises:
carrying out vector characterization on the expression image set and the expression corpus set by using a second classification model in the expression semantic extraction model to obtain an expression semantic characterization vector set;
outputting prediction vector labels of the vectors in the expression semantic characterization vector set and the prediction vector probabilities corresponding to the prediction vector labels by using a fully connected layer in the expression semantic extraction model;
and taking the prediction vector labels with prediction vector probabilities greater than or equal to a preset vector probability threshold, together with the corresponding expression semantic characterization vectors, as the expression semantic result.
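The following sketch shows what this extraction step could look like for the text branch, reusing the ExpressionTextClassifier sketch above: the backbone provides the characterization vector, the fully connected layer provides the label and its probability, and only predictions above the threshold are kept. The 0.5 threshold and the function name are illustrative assumptions.

```python
# Sketch of step S4 for the text branch, reusing the ExpressionTextClassifier sketch
# above: the backbone yields the characterization vector, the fully connected layer
# yields the label and its probability, and only predictions whose probability meets
# the vector probability threshold are kept. The 0.5 threshold is an assumption.
import torch

VECTOR_PROB_THRESHOLD = 0.5

@torch.no_grad()
def extract_expression_semantics(model, tokenizer, texts):
    results = []
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        cls_vec, logits = model(inputs["input_ids"], inputs["attention_mask"])
        probs = torch.softmax(logits, dim=-1).squeeze(0)
        prob, label = probs.max(dim=-1)
        if prob.item() >= VECTOR_PROB_THRESHOLD:
            # expression semantic result: (characterization vector, predicted label)
            results.append((cls_vec.squeeze(0), label.item()))
    return results
```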
In another optional embodiment of the present invention, the expression semantic result may be used in various downstream NLP scenarios in the financial field. Since the expression-image-based semantic classification model is also optimized during the semi-supervised interactive training, the method also has a certain semantic reasoning capability for OOV expressions (i.e. new expressions that never appeared in the training corpus).
According to the invention, by performing expression text semantic classification on the images in the expression image set, the semantic classification annotation set can be obtained without a large amount of manual labeling, which improves model training efficiency. Semi-supervised bidirectional optimization training is then performed on the expression characterization semantic classification model and the pre-constructed expression image semantic classification model by using the semantic classification annotation set, yielding an expression semantic extraction model capable of fused information extraction from the images and the text information carried by the expressions. Meanwhile, through semi-supervised interactive training, the expression semantic extraction model can fully fuse the image and text information of the expression, thereby improving the accuracy of expression semantic extraction. Therefore, the expression semantic extraction method provided by the invention can accurately extract expression semantics.
Fig. 2 is a functional block diagram of an expression semantic extraction device according to an embodiment of the present invention.
The expression semantic extraction device 100 of the present invention may be installed in an electronic device. According to the implemented functions, the expression semantic extraction device 100 may include an expression text semantic labeling module 101, an expression semantic classification model training module 102, a model optimization training module 103, and an expression semantic extraction module 104. The module of the invention, which may also be referred to as a unit, refers to a series of computer program segments, which are stored in the memory of the electronic device, capable of being executed by the processor of the electronic device and of performing a fixed function.
In the present embodiment, the functions concerning the respective modules/units are as follows:
the expression text semantic annotation module 101 is configured to obtain an expression image set, and perform expression text semantic classification on images in the expression image set to obtain a semantic classification annotation set;
the expression semantic classification model training module 102 is configured to perform continuous pre-training on the pre-trained corpus recognition model by using a preset expression corpus set to obtain an expression characterization semantic classification model;
the model optimization training module 103 is configured to perform semi-supervised bidirectional optimization training on the expression characterization semantic classification model and the pre-constructed expression image semantic classification model by using the semantic classification annotation set to obtain an expression semantic extraction model;
The expression semantic extraction module 104 is configured to perform semantic extraction on the expression image set and the expression corpus set by using the expression semantic extraction model, so as to obtain an expression semantic result.
In detail, the specific implementation modes of the modules of the expression semantic extraction device 100 are as follows:
step one, acquiring an expression image set, and carrying out expression text semantic classification on images in the expression image set to obtain a semantic classification annotation set.
In the embodiment of the invention, the expression image set refers to an image set containing expressions and text from actual business in different fields, for example, a set of conversation screenshots containing expressions between customer service and users in the financial field.
In the embodiment of the present invention, performing semantic classification on the expression text of the image in the expression image set to obtain a semantic classification annotation set includes:
recognizing the expression text of the images in the expression image set by utilizing an OCR (optical character recognition) technology;
carrying out expression recognition on the expression text by using a preset text semantic recognition model to obtain an expression recognition result, and receiving an expression labeling result obtained by labeling the expression text by a user;
and taking the expression recognition result and the expression labeling result as semantic classification labeling results, and summarizing the semantic classification labeling results, the expression text and the expression image set to obtain the semantic classification annotation set.
In an alternative embodiment of the present invention, OCR (Optical Character Recognition) refers to a technology that analyzes and recognizes image files containing text data to obtain the text and its layout information; it is used here to extract the expression text from the expression images. Since the expression text in an expression image is often very short, the preset text semantic recognition model may be a simple NLP (Natural Language Processing) classification model (such as LDA, or a recognition model with predefined recognition rules). Meanwhile, the label of the expression package (sticker pack) to which an expression belongs can also be used as a semantic classification labeling result, and only a small amount of manually labeled expression labeling results needs to be introduced to complement the OCR-based results where necessary, which improves model training efficiency.
And secondly, performing continuous pre-training on the pre-trained corpus recognition model by using a preset expression corpus set to obtain an expression characterization semantic classification model.
In the embodiment of the present invention, the preset expression corpus set may be a set of expression corpora in different fields crawled from the Internet. The pre-trained corpus recognition model may be an open-source generic BERT model. Continuous pre-training (continual pre-training) means continuing to train the model on a corpus of a specific field and task so that the model better fits the actual business. For example, in the financial field, the model is first trained on the preset expression corpus set, and then a financial corpus is selected from the expression corpus set for further training, so as to obtain the trained BERT model.
In detail, the performing continuous pre-training on the pre-trained corpus recognition model by using the preset expression corpus set to obtain the expression characterization semantic classification model includes:
performing classification training on the corpus recognition model by using the expression corpus set to obtain a first classification model;
performing classification training on the first classification model by using expression corpus in a preset field in the expression corpus set to obtain a second classification model;
and connecting a fully connected layer after the second classification model to obtain the expression characterization semantic classification model.
In an alternative embodiment of the invention, the BERT model trained on the specific field (the second classification model) is used as the backbone network (backbone), and a fully connected layer is connected after it, so that semantic classification prediction can be performed on the sentence in which the expression appears or on the N words of the expression context.
And thirdly, performing semi-supervised bidirectional optimization training on the expression representation semantic classification model and the pre-constructed expression image semantic classification model by utilizing the semantic classification annotation set to obtain an expression semantic extraction model.
In the embodiment of the invention, the expression characterization semantic classification model is used for performing expression semantic prediction (such as happy, sad, angry, praise) on the sentence or expression context text in which the expression appears. The pre-constructed expression image semantic classification model performs semantic category prediction on the expression images based on an image classification algorithm (such as a ResNet model). Interactive training of the expression characterization semantic classification model and the expression image semantic classification model based on semi-supervised bidirectional optimization allows the three characteristics of expression semantics (namely expression context information, expression image information and expression text) to be trained jointly, which greatly improves the accuracy of model recognition.
In detail, the performing semi-supervised bidirectional optimization training on the expression characterization semantic classification model and the pre-constructed expression image semantic classification model by using the semantic classification annotation set to obtain an expression semantic extraction model comprises the following steps:
outputting text prediction labels of the expression texts in the semantic classification label set and corresponding prediction probabilities by using the expression characterization semantic classification model;
outputting image prediction labels and corresponding prediction probabilities of images in the expression image set in the semantic classification label set by using the expression image semantic classification model;
and taking all prediction labels with the prediction probability larger than or equal to a preset prediction threshold value as pseudo labels, and carrying out iterative optimization training on the expression characterization semantic classification model and the expression image semantic classification model by using the pseudo labels to obtain the expression semantic extraction model.
In the embodiment of the invention, the semantic classification standard used for the expression images is consistent with that used for the expression text; that is, the text prediction labels and the image prediction labels follow the same set of labeling standards.
In an optional embodiment of the present invention, the expression characterization semantic classification model and the expression image semantic classification model are used to perform expression semantic prediction on the expression text and the expression images respectively; prediction labels with high confidence (for example, a predicted probability of some category >= 0.95) are regarded as pseudo labels and added to the semantic classification annotation set for the next round of training of the two models, until both models converge.
Specifically, the performing iterative optimization training on the expression characterization semantic classification model and the expression image semantic classification model by using the pseudo label to obtain the expression semantic extraction model includes:
calculating the prediction accuracy of the pseudo labels by using the semantic classification labeling result;
if the prediction accuracy is smaller than a preset accuracy threshold, adjusting model parameters in the expression characterization semantic classification model and the expression image semantic classification model, and returning to the step of outputting text prediction labels and corresponding prediction probabilities of the expression texts in the semantic classification label set by using the expression characterization semantic classification model;
and stopping training when the prediction accuracy is greater than or equal to the accuracy threshold, determining that the expression characterization semantic classification model and the expression image semantic classification model have converged, and taking the converged expression characterization semantic classification model as the expression semantic extraction model.
In an alternative embodiment of the present invention, the prediction accuracy of the pseudo labels is calculated by the following formula:
Accuracy = T1 / T
wherein Accuracy represents the prediction accuracy, T1 represents the number of pseudo labels that are consistent with the labels in the semantic classification labeling result, and T represents the number of all labels in the semantic classification label set.
In an optional embodiment of the invention, since the expression characterization semantic classification model and the expression image semantic classification model have different "perspectives", the semi-supervised interactive optimization mechanism can continuously adjust the model parameters and improve the performance of both models on the expression semantic classification task. For example, in the parameter settings, the parameters of the second classification model and of the fully connected layer of the expression characterization semantic classification model are all set to be "updatable", so that the semi-supervised training mechanism can effectively correct the model parameters and improve the accuracy of model recognition.
And fourthly, carrying out semantic extraction on the expression image set and the expression corpus set by using the expression semantic extraction model to obtain an expression semantic result.
In the embodiment of the invention, the expression semantic result is a representation vector fusing knowledge of expression context, expression image and expression text.
In detail, the semantic extraction of the expression image set and the expression corpus set by using the expression semantic extraction model to obtain an expression semantic result comprises:
carrying out vector characterization on the expression image set and the expression corpus set by using a second classification model in the expression semantic extraction model to obtain an expression semantic characterization vector set;
outputting prediction vector labels of the vectors in the expression semantic characterization vector set and the prediction vector probabilities corresponding to the prediction vector labels by using a fully connected layer in the expression semantic extraction model;
and taking the prediction vector labels with prediction vector probabilities greater than or equal to a preset vector probability threshold, together with the corresponding expression semantic characterization vectors, as the expression semantic result.
In another optional embodiment of the present invention, the expression semantic result may be used in various downstream NLP scenarios in the financial field. Since the expression-image-based semantic classification model is also optimized during the semi-supervised interactive training, the device also has a certain semantic reasoning capability for OOV expressions (i.e. new expressions that never appeared in the training corpus).
According to the invention, by performing expression text semantic classification on the images in the expression image set, the semantic classification annotation set can be obtained without a large amount of manual labeling, which improves model training efficiency. Semi-supervised bidirectional optimization training is then performed on the expression characterization semantic classification model and the pre-constructed expression image semantic classification model by using the semantic classification annotation set, yielding an expression semantic extraction model capable of fused information extraction from the images and the text information carried by the expressions. Meanwhile, through semi-supervised interactive training, the expression semantic extraction model can fully fuse the image and text information of the expression, thereby improving the accuracy of expression semantic extraction. Therefore, the expression semantic extraction device provided by the invention can accurately extract expression semantics.
Fig. 3 is a schematic structural diagram of an electronic device for implementing the expression semantic extraction method according to an embodiment of the present invention.
The electronic device may comprise a processor 10, a memory 11, a communication interface 12 and a bus 13, and may further comprise a computer program, such as an expression semantic extraction program, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, including flash memory, a removable hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device, such as a removable hard disk of the electronic device. The memory 11 may in other embodiments also be an external storage device of the electronic device, such as a plug-in removable hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash Card provided on the electronic device. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device. The memory 11 may be used not only for storing application software installed in the electronic device and various data, such as the code of the expression semantic extraction program, but also for temporarily storing data that has been output or is to be output.
The processor 10 may be comprised of integrated circuits in some embodiments, for example, a single packaged integrated circuit, or may be comprised of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (Central Processing Unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the entire electronic device using various interfaces and lines, and executes various functions of the electronic device and processes data by running or executing programs or modules (e.g., expression semantic extraction programs, etc.) stored in the memory 11, and calling data stored in the memory 11.
The communication interface 12 is used for communication between the electronic device and other devices, including network interfaces and user interfaces. Optionally, the network interface may include a wired interface and/or a wireless interface (e.g., WI-FI interface, bluetooth interface, etc.), typically used to establish a communication connection between the electronic device and other electronic devices. The user interface may be a Display (Display), an input unit such as a Keyboard (Keyboard), or alternatively a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch, or the like. The display may also be referred to as a display screen or display unit, as appropriate, for displaying information processed in the electronic device and for displaying a visual user interface.
The bus 13 may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus, or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, among others. The bus 13 may be classified into an address bus, a data bus, a control bus, and the like. The bus 13 is arranged to enable a connection communication between the memory 11 and at least one processor 10 etc.
Fig. 3 shows only an electronic device with components, and it will be understood by those skilled in the art that the structure shown in fig. 3 is not limiting of the electronic device and may include fewer or more components than shown, or may combine certain components, or a different arrangement of components.
For example, although not shown, the electronic device may further include a power source (such as a battery) for supplying power to the respective components, and preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management, and the like are implemented through the power management device. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure detection circuit, power converter or inverter, power status indicator, etc. The electronic device may further include various sensors, bluetooth modules, wi-Fi modules, etc., which are not described herein.
Further, the electronic device may also include a network interface, optionally, the network interface may include a wired interface and/or a wireless interface (e.g., WI-FI interface, bluetooth interface, etc.), typically used to establish a communication connection between the electronic device and other electronic devices.
Optionally, the electronic device may further comprise a user interface, which may be a Display, an input unit, such as a Keyboard (Keyboard), or a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch, or the like. The display may also be referred to as a display screen or display unit, as appropriate, for displaying information processed in the electronic device and for displaying a visual user interface.
It should be understood that the embodiments described are for illustrative purposes only and are not intended to limit the scope of the patent application to this configuration.
The expression semantic extraction program stored in the memory 11 of the electronic device is a combination of instructions that, when executed in the processor 10, can implement:
Acquiring an expression image set, and carrying out expression text semantic classification on images in the expression image set to obtain a semantic classification annotation set;
continuously pre-training the pre-trained corpus recognition model by using a preset expression corpus set to obtain an expression characterization semantic classification model;
performing semi-supervised bidirectional optimization training on the expression characterization semantic classification model and the pre-constructed expression image semantic classification model by using the semantic classification annotation set to obtain an expression semantic extraction model;
and carrying out semantic extraction on the expression image set and the expression corpus set by using the expression semantic extraction model to obtain an expression semantic result.
In particular, the specific implementation method of the above instructions by the processor 10 may refer to the description of the relevant steps in the corresponding embodiment of the drawings, which is not repeated herein.
Further, if the modules/units integrated in the electronic device are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. The computer-readable storage medium may be volatile or non-volatile. For example, the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM).
The present invention also provides a computer readable storage medium storing a computer program which, when executed by a processor of an electronic device, can implement:
acquiring an expression image set, and carrying out expression text semantic classification on images in the expression image set to obtain a semantic classification annotation set;
continuously pre-training the pre-trained corpus recognition model by using a preset expression corpus set to obtain an expression characterization semantic classification model;
performing semi-supervised bidirectional optimization training on the expression characterization semantic classification model and the pre-constructed expression image semantic classification model by using the semantic classification annotation set to obtain an expression semantic extraction model;
and carrying out semantic extraction on the expression image set and the expression corpus set by using the expression semantic extraction model to obtain an expression semantic result.
In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The embodiments of the present invention may acquire and process the related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer or a digital-computer-controlled machine to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics and the like. Artificial intelligence software technology mainly covers computer vision, robotics, biometric recognition, speech processing, natural language processing, and machine learning/deep learning.
The blockchain is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association by cryptographic methods, each data block containing a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. A plurality of units or devices recited in the system claims may also be implemented by one unit or device through software or hardware. Terms such as first and second are used to denote names and do not denote any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims (10)

1. An expression semantic extraction method, which is characterized by comprising the following steps:
acquiring an expression image set, and carrying out expression text semantic classification on images in the expression image set to obtain a semantic classification annotation set;
continuously pre-training the pre-trained corpus recognition model by using a preset expression corpus set to obtain an expression characterization semantic classification model;
performing semi-supervised bidirectional optimization training on the expression characterization semantic classification model and the pre-constructed expression image semantic classification model by using the semantic classification annotation set to obtain an expression semantic extraction model;
and carrying out semantic extraction on the expression image set and the expression corpus set by using the expression semantic extraction model to obtain an expression semantic result.
2. The expression semantic extraction method of claim 1, wherein performing the expression text semantic classification on the images in the expression image set to obtain the semantic classification annotation set comprises:
recognizing the expression text of the images in the expression image set by utilizing optical character recognition (OCR) technology;
carrying out expression recognition on the expression text by using a preset text semantic recognition model to obtain an expression recognition result, and receiving an expression labeling result obtained by labeling the expression text by a user;
and taking the expression recognition result and the expression labeling result as semantic classification labeling results, and summarizing the semantic classification labeling results, the expression text and the expression image set to obtain the semantic classification annotation set.
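As a concrete but non-authoritative illustration of claim 2, the sketch below recognizes the text embedded in each expression image with OCR, obtains a model prediction for it, and merges that prediction with any user-provided label into one annotation record. The choice of pytesseract/PIL and the text_classifier and human_labels interfaces are assumptions made for this sketch; the claim does not prescribe a particular OCR engine or API.

```python
# Hedged sketch of claim 2: OCR the expression text, classify it, and merge the
# model prediction with the user's label into the semantic classification
# annotation set. pytesseract/PIL and the callable/dict interfaces are assumptions.

from PIL import Image
import pytesseract  # requires the Tesseract OCR engine to be installed

def build_annotation_set(image_paths, text_classifier, human_labels):
    """text_classifier: callable text -> label; human_labels: dict path -> label (may be partial)."""
    annotation_set = []
    for path in image_paths:
        text = pytesseract.image_to_string(Image.open(path), lang="chi_sim+eng").strip()
        predicted = text_classifier(text)      # expression recognition result
        manual = human_labels.get(path)        # expression labeling result from the user
        annotation_set.append({
            "image": path,
            "text": text,
            "labels": [label for label in (predicted, manual) if label is not None],
        })
    return annotation_set
```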
3. The expression semantic extraction method as claimed in claim 1, wherein the continued pre-training of the pre-trained corpus recognition model by using the preset expression corpus set to obtain the expression characterization semantic classification model comprises:
performing classification training on the corpus recognition model by using the expression corpus set to obtain a first classification model;
performing classification training on the first classification model by using expression corpus in a preset field in the expression corpus set to obtain a second classification model;
and connecting a fully connected layer after the second classification model to obtain the expression characterization semantic classification model.
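The following PyTorch sketch illustrates one way the two training stages of claim 3 and the final fully connected head could be wired together; the generic training loop, the data loaders, the feature dimension and the hyperparameters are assumptions for illustration only, not details taken from the patent.

```python
# Hedged PyTorch sketch of claim 3: train on the full expression corpus, then on
# the domain-specific subset, then attach a fully connected classification head.
# Loaders, dimensions and hyperparameters are illustrative assumptions.

import torch
import torch.nn as nn

def train_classifier(model, loader, epochs=1, lr=2e-5):
    # generic supervised classification loop reused for both training stages
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for features, labels in loader:
            optimizer.zero_grad()
            loss_fn(model(features), labels).backward()
            optimizer.step()
    return model

class ExpressionCharacterizationModel(nn.Module):
    """Second classification model followed by a fully connected layer."""
    def __init__(self, second_classifier, feature_dim, num_semantic_classes):
        super().__init__()
        self.backbone = second_classifier                  # second classification model
        self.fc = nn.Linear(feature_dim, num_semantic_classes)

    def forward(self, x):
        return self.fc(self.backbone(x))                   # feature_dim must match backbone output

# first_model  = train_classifier(corpus_recognition_model, general_corpus_loader)
# second_model = train_classifier(first_model, domain_corpus_loader)
# characterization_model = ExpressionCharacterizationModel(second_model, 768, num_classes)
```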
4. The expression semantic extraction method according to claim 2, wherein the performing semi-supervised bidirectional optimization training on the expression characterization semantic classification model and the pre-constructed expression image semantic classification model by using the semantic classification annotation set to obtain an expression semantic extraction model comprises:
outputting text prediction labels of the expression texts in the semantic classification annotation set and the corresponding prediction probabilities by using the expression characterization semantic classification model;
outputting image prediction labels and the corresponding prediction probabilities for the images of the expression image set in the semantic classification annotation set by using the expression image semantic classification model;
and taking all prediction labels whose prediction probability is greater than or equal to a preset prediction threshold as pseudo labels, and carrying out iterative optimization training on the expression characterization semantic classification model and the expression image semantic classification model by using the pseudo labels to obtain the expression semantic extraction model.
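The pseudo-label selection described in claim 4 can be illustrated by the short PyTorch sketch below, which keeps only predictions whose probability meets the preset prediction threshold; the tensor shapes and the threshold value are assumptions made for this sketch.

```python
# Hedged sketch of claim 4's pseudo-label selection: keep only predictions whose
# probability is at or above the preset prediction threshold. Shapes are assumed.

import torch
import torch.nn.functional as F

def select_pseudo_labels(logits, threshold=0.9):
    """logits: tensor of shape (N, num_classes) from either classification model."""
    probabilities = F.softmax(logits, dim=-1)
    confidence, labels = probabilities.max(dim=-1)   # prediction probability and label
    keep = confidence >= threshold                   # mask of confident predictions
    return labels[keep], keep                        # pseudo labels and the kept-sample mask

# text_pseudo,  text_mask  = select_pseudo_labels(text_logits)
# image_pseudo, image_mask = select_pseudo_labels(image_logits)
# the retained pseudo labels then supervise the next training round of both models
```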
5. The expression semantic extraction method of claim 4, wherein performing iterative optimization training on the expression characterization semantic classification model and the expression image semantic classification model by using the pseudo-labels to obtain the expression semantic extraction model comprises:
calculating the prediction accuracy of the pseudo labels by using the semantic classification labeling result;
if the prediction accuracy is less than a preset accuracy threshold, adjusting model parameters in the expression characterization semantic classification model and the expression image semantic classification model, and returning to the step of outputting text prediction labels of the expression texts in the semantic classification annotation set and the corresponding prediction probabilities by using the expression characterization semantic classification model;
and stopping training when the prediction accuracy is greater than or equal to the accuracy threshold, determining that the expression characterization semantic classification model and the expression image semantic classification model have converged, and taking the converged expression characterization semantic classification model as the expression semantic extraction model.
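The stopping criterion of claim 5 amounts to a loop that repeats until the pseudo-label accuracy reaches the preset threshold. The sketch below expresses that loop with caller-supplied callables for the per-round prediction and parameter update; these callables, the round limit and the threshold value are placeholders standing in for the model-specific details.

```python
# Hedged sketch of claim 5's iterative optimization: repeat prediction and
# parameter updates until the pseudo-label accuracy reaches the threshold.
# predict_pairs and train_one_round are caller-supplied placeholders.

def iterate_until_converged(predict_pairs, train_one_round, total_label_count,
                            accuracy_threshold=0.95, max_rounds=50):
    """predict_pairs() -> list of (pseudo_label, reference_label) pairs;
    train_one_round(pairs) adjusts the parameters of both models."""
    accuracy = 0.0
    for _ in range(max_rounds):
        pairs = predict_pairs()
        t = sum(1 for pseudo, reference in pairs if pseudo == reference)
        accuracy = t / total_label_count             # Accuracy = t / T (cf. claim 7)
        if accuracy >= accuracy_threshold:
            break                                    # both models treated as converged
        train_one_round(pairs)                       # adjust model parameters and repeat
    return accuracy
```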
6. The expression semantic extraction method of claim 3, wherein performing the semantic extraction on the expression image set and the expression corpus set by using the expression semantic extraction model to obtain an expression semantic result comprises:
carrying out vector characterization on the expression image set and the expression corpus set by using the second classification model in the expression semantic extraction model to obtain an expression semantic characterization vector set;
outputting, by using the fully connected layer in the expression semantic extraction model, predicted vector labels of the vectors in the expression semantic characterization vector set and the predicted vector probabilities corresponding to the predicted vector labels;
and taking the predicted vector labels and the expression semantic characterization vectors whose predicted vector probability is greater than or equal to a preset vector probability threshold as the expression semantic result.
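For claim 6, the inference path can be sketched as follows: the second classification model produces the characterization vectors, the fully connected layer yields the predicted vector labels and probabilities, and only pairs above the vector probability threshold enter the expression semantic result. The .backbone/.fc attributes follow the sketch given after claim 3 and, like the threshold value, are assumptions.

```python
# Hedged PyTorch sketch of claim 6's inference: characterization vectors from the
# second classification model, labels/probabilities from the fully connected
# layer, thresholded to form the expression semantic result. Interfaces assumed.

import torch
import torch.nn.functional as F

@torch.no_grad()
def extract_expression_semantics(model, batch, probability_threshold=0.8):
    vectors = model.backbone(batch)                          # expression semantic characterization vectors
    probabilities = F.softmax(model.fc(vectors), dim=-1)
    vector_probs, vector_labels = probabilities.max(dim=-1)
    keep = vector_probs >= probability_threshold
    return [(int(label), vector)                             # (predicted vector label, characterization vector)
            for label, vector in zip(vector_labels[keep], vectors[keep])]
```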
7. The expression semantic extraction method of claim 5, wherein the prediction accuracy of the pseudo labels is calculated by the following formula:
Accuracy = t / T
wherein Accuracy represents the prediction accuracy, t represents the number of pseudo labels that are consistent with the labels in the semantic classification labeling result, and T represents the number of all labels in the semantic classification annotation set.
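In code, the accuracy of claim 7 is a simple ratio; the helper below is a direct, illustrative transcription (the function name and argument layout are assumptions for this sketch).

```python
# Direct transcription of the formula in claim 7: Accuracy = t / T.

def pseudo_label_accuracy(pseudo_labels, reference_labels, total_label_count):
    # t: pseudo labels consistent with the semantic classification labeling result
    t = sum(1 for p, r in zip(pseudo_labels, reference_labels) if p == r)
    # T: number of all labels in the semantic classification annotation set
    return t / total_label_count
```

For example, pseudo_label_accuracy([1, 2, 3], [1, 2, 0], total_label_count=4) evaluates to 0.5.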
8. An expression semantic extraction device, characterized in that the device comprises:
the expression text semantic annotation module is used for acquiring an expression image set, carrying out expression text semantic classification on images in the expression image set, and obtaining a semantic classification annotation set;
the expression semantic classification model training module is used for carrying out continued pre-training processing on the pre-trained corpus recognition model by utilizing a preset expression corpus set to obtain an expression characterization semantic classification model;
the model optimization training module is used for performing semi-supervised bidirectional optimization training on the expression characterization semantic classification model and the pre-constructed expression image semantic classification model by utilizing the semantic classification annotation set to obtain an expression semantic extraction model;
and the expression semantic extraction module is used for carrying out semantic extraction on the expression image set and the expression corpus set by using the expression semantic extraction model to obtain an expression semantic result.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the expression semantic extraction method of any one of claims 1 to 7.
10. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the expression semantic extraction method according to any one of claims 1 to 7.
CN202310426259.1A 2023-04-11 2023-04-11 Expression semantic extraction method, device, equipment and storage medium Pending CN116453137A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310426259.1A CN116453137A (en) 2023-04-11 2023-04-11 Expression semantic extraction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310426259.1A CN116453137A (en) 2023-04-11 2023-04-11 Expression semantic extraction method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116453137A true CN116453137A (en) 2023-07-18

Family

ID=87126985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310426259.1A Pending CN116453137A (en) 2023-04-11 2023-04-11 Expression semantic extraction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116453137A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination