CN115187996B - Semantic recognition method and device, terminal equipment and storage medium - Google Patents

Semantic recognition method and device, terminal equipment and storage medium

Info

Publication number
CN115187996B
CN115187996B (application CN202211102098.2A)
Authority
CN
China
Prior art keywords
image
character
vector
semantic
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211102098.2A
Other languages
Chinese (zh)
Other versions
CN115187996A (en)
Inventor
刘博�
杜俊博
屈玉涛
阮威健
何耀彬
胡金晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Smart City Research Institute Of China Electronics Technology Group Corp
Southern University of Science and Technology
Original Assignee
Smart City Research Institute Of China Electronics Technology Group Corp
Southern University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Smart City Research Institute Of China Electronics Technology Group Corp and Southern University of Science and Technology
Priority to CN202211102098.2A
Publication of CN115187996A
Application granted
Publication of CN115187996B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/16 - Image preprocessing
    • G06V30/162 - Quantising the image signal
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/14 - Image acquisition
    • G06V30/148 - Segmentation of character regions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/19 - Recognition using electronic means
    • G06V30/191 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147 - Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/19 - Recognition using electronic means
    • G06V30/191 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173 - Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Discrimination (AREA)

Abstract

The application belongs to the technical field of data recognition and provides a semantic recognition method and device, a terminal device, and a storage medium. The semantic recognition method comprises the following steps: vectorizing each character in the text to be recognized to obtain a character vector of each character; acquiring an image vector of each character, wherein the image vector represents the global features and local features of the original pictographic character image of the corresponding character; determining a semantic classification result for each character according to its character vector and image vector, wherein the semantic classification result comprises one or more semantic classifications to which the character belongs; and combining characters of the same semantic classification to obtain a semantic recognition result for the text to be recognized. Embodiments of the application can improve the accuracy of semantic recognition.

Description

Semantic recognition method and device, terminal equipment and storage medium
Technical Field
The present application belongs to the technical field of data recognition, and in particular relates to a semantic recognition method, apparatus, terminal device, and storage medium.
Background
With the development of science and technology, semantic recognition is applied ever more widely; it can be used to recognize the meaning of a text and to extract labels from it. However, existing semantic recognition approaches generally classify each character based on information such as its character code and pinyin. For characters with the same pinyin and similar character codes, semantic classification is therefore prone to errors, resulting in low semantic recognition accuracy.
Disclosure of Invention
Embodiments of the present application provide a semantic recognition method, apparatus, terminal device, and storage medium, which can address the problem of low semantic recognition accuracy.
A first aspect of an embodiment of the present application provides a semantic identification method, including:
vectorizing each character in the text to be recognized respectively to obtain a character vector of each character;
acquiring an image vector of each character, wherein the image vector is used for representing the global characteristics and the local characteristics of the original pictographic character image of the corresponding character;
determining semantic classification results of corresponding characters according to the character vectors and the image vectors, wherein the semantic classification results comprise one or more semantic classifications to which the corresponding characters belong;
and combining the same semantically classified characters to obtain a semantic recognition result of the text to be recognized.
A second aspect of the embodiments of the present application provides a semantic recognition apparatus, including:
the character vector acquisition unit is used for respectively vectorizing each character in the text to be recognized to obtain the character vector of each character;
the image vector acquisition unit is used for acquiring the image vector of each character, and the image vector is used for representing the global characteristic and the local characteristic of the original pictographic character image of the corresponding character;
the semantic classification unit is used for determining semantic classification results of corresponding characters according to the character vectors and the image vectors, and the semantic classification results comprise one or more semantic classifications to which the corresponding characters belong;
and the semantic recognition unit is used for combining the same semantically classified characters to obtain a semantic recognition result of the text to be recognized.
A third aspect of the embodiments of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the semantic recognition method when executing the computer program.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium, where a computer program is stored, and the computer program, when executed by a processor, implements the steps of the semantic recognition method.
A fifth aspect of embodiments of the present application provides a computer program product, which, when running on a terminal device, enables the terminal device to execute the semantic recognition method described in the first aspect.
In embodiments of the present application, each character in the text to be recognized is vectorized to obtain its character vector; an image vector of each character is acquired, where the image vector represents the global and local features of the original pictographic character image of the corresponding character; a semantic classification result for each character is determined according to its character vector and image vector; and characters of the same semantic classification are combined to obtain the semantic recognition result of the text to be recognized. Because glyph information supplements character-code and pinyin information, characters that share pinyin or have similar codes can be distinguished, improving the accuracy of semantic recognition.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart illustrating an implementation of a semantic recognition method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a word skipping model provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of obtaining a fusion vector based on a cross-attention mechanism provided in an embodiment of the present application;
fig. 4 is a schematic flowchart of a specific implementation of obtaining an image vector according to an embodiment of the present application;
FIG. 5 is a schematic diagram of obtaining an image vector according to an embodiment of the present application;
FIG. 6 is a schematic diagram of obtaining semantic classification results through a CRF target recognition model according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a semantic recognition apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the present application and are not intended to limit it. All other embodiments obtained by those skilled in the art, without creative effort, on the basis of the embodiments of the present application fall within the scope of protection of the present application.
In order to explain the technical means of the present application, the following description will be given by way of specific examples.
Fig. 1 illustrates the implementation flow of a semantic recognition method provided by an embodiment of the present application. The method can be applied to a terminal device, in scenarios where semantic recognition accuracy needs to be improved. The terminal device may be an intelligent device such as a mobile phone, robot, tablet computer, notebook computer, desktop computer, or wearable device; this is not limited by the present application.
Specifically, the semantic recognition method may include the following steps S101 to S104.
Step S101, each character in the text to be recognized is vectorized to obtain the character vector of each character.
The text to be recognized is the text needing semantic classification.
In implementations of the present application, the terminal device may acquire a text to be recognized, which often contains multiple characters. Since a character string of a preset length can represent one character, each character in the text can be obtained by character segmentation, and each character can then be vectorized. It should be understood that the preset length depends on the encoding used by the terminal device and may be set according to the encoding actually in use.
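The character-segmentation step can be sketched in Python; because iterating a Python string yields one Unicode character at a time, the preset-length byte handling reduces to simple iteration (the helper name is illustrative):

```python
def split_characters(text: str) -> list[str]:
    """Split a text to be recognized into its individual characters,
    dropping whitespace; each element is a candidate for vectorization."""
    return [ch for ch in text if not ch.isspace()]

chars = split_characters("明天去西湖")  # five single-character strings
```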
The manner of acquiring the text to be recognized is not limited by the present application: the terminal device may acquire text input by a user, or text stored on other devices. For example, a user may submit the text to be recognized to a server through a web page, and the server stores it; when the number of stored texts reaches a threshold, the terminal device acquires each text to be recognized from the server.
To convert the characters of the text to be recognized into a form that a deep learning model can process more easily, and to facilitate the subsequent fusion of different feature information, embodiments of the present application convert the text into a numerical representation through vectorization, obtaining character vectors. Vectorization can be implemented in different ways. For example, in some embodiments the terminal device may use One-Hot Encoding. In other embodiments, a Continuous Bag-of-Words Model (CBOW) or a Skip-Gram model may be used. FIG. 2 shows the structure of the skip-gram model, which mainly comprises an input layer (input), a projection layer (projection), and an output layer (output). Its basic principle is to use the input vector of a character w(t) to predict its context characters w(t-2), w(t-1), w(t+1), and w(t+2); the vectors corresponding to these characters form the output character vector. Here the input vector of w(t) may be the one-hot encoding of the corresponding character in the text to be recognized.
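The one-hot input vectors mentioned above can be sketched minimally with numpy (the vocabulary and function name are illustrative):

```python
import numpy as np

def one_hot_vectors(vocab: list) -> dict:
    """Map each character in the vocabulary to a one-hot row vector,
    the form of input vector fed to the skip-gram model's input layer."""
    size = len(vocab)
    return {ch: np.eye(size)[i] for i, ch in enumerate(vocab)}

vecs = one_hot_vectors(["明", "天", "去", "西", "湖"])
# Each vector has a single 1 at the character's own index, zeros elsewhere.
```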
Step S102, image vectors of all characters are obtained.
In embodiments of the present application, the image vector is used to represent the global features and local features of the original pictographic character image of the corresponding character. The global features represent the overall glyph of the character, while the local features represent parts of the glyph, i.e., its detail information.
And step S103, determining semantic classification results of corresponding characters according to the character vectors and the image vectors.
The semantic classification result comprises one or more semantic classifications to which the corresponding character belongs. For example, for the word "tomorrow", the classification result of the character "bright" may be "B-Time" and that of the character "day" may be "I-Time", where B marks the starting character of a span, I marks a subsequent character of the same span, and Time indicates that the span belongs to the Time category.
Specifically, the terminal device may concatenate the character vector and the image vector of the same character to obtain a fusion vector, or may fuse the two vectors based on an attention mechanism. The attention mechanism may be, for example, a multi-head attention mechanism or a cross-attention mechanism (Cross Attention). Referring to fig. 3, with cross attention the acquired image vector and text vector are input into a cross-attention module for inter-modal association; the resulting vectors are concatenated and passed through another Transformer unit, followed by a 1d-CNN and a pooling operation that fuse all the features and perform similarity matching, so that the result attends more strongly to the similar portions of the image vector and the text vector. The output is the fusion vector.
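A minimal single-head sketch of cross-attention fusion follows (numpy; a real implementation would add learned projection matrices, the multi-head split, and the Transformer/1d-CNN stages of fig. 3 — this only illustrates the similarity-weighted attention core):

```python
import numpy as np

def cross_attention(query: np.ndarray, context: np.ndarray) -> np.ndarray:
    """Rows of `query` (e.g. character vectors) attend over rows of `context`
    (e.g. image vectors); each output row is weighted toward similar context rows."""
    d = query.shape[-1]
    scores = query @ context.T / np.sqrt(d)          # similarity matching
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over context rows
    return weights @ context

rng = np.random.default_rng(0)
char_vecs = rng.normal(size=(5, 8))   # 5 characters, dimension 8
img_vecs = rng.normal(size=(5, 8))
# One simple fusion: concatenate each character vector with its attended image vector.
fused = np.concatenate([char_vecs, cross_attention(char_vecs, img_vecs)], axis=-1)
```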
And step S104, combining the characters classified by the same semantic meaning to obtain a semantic meaning recognition result of the text to be recognized.
For example, given that the classification result of the character "bright" in the text "tomorrow" is "B-Time" and that of the character "day" is "I-Time", the characters are combined according to their shared semantic classification Time, yielding the semantic recognition result of "tomorrow": the time is "tomorrow".
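The combination step can be sketched as a standard B/I span merge in Python (the sample text 明天去西湖, "tomorrow go to West Lake", and the function name are illustrative):

```python
def merge_bio(chars: list, tags: list) -> list:
    """Combine characters whose tags share a semantic class:
    B-X starts a span, I-X continues it, O (or a mismatch) closes it."""
    spans, label, buf = [], None, []
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            if label:
                spans.append((label, "".join(buf)))
            label, buf = tag[2:], [ch]
        elif tag.startswith("I-") and label == tag[2:]:
            buf.append(ch)
        else:  # "O" or an inconsistent tag closes any open span
            if label:
                spans.append((label, "".join(buf)))
            label, buf = None, []
    if label:
        spans.append((label, "".join(buf)))
    return spans

result = merge_bio(list("明天去西湖"), ["B-Time", "I-Time", "O", "B-Loc", "I-Loc"])
# → [("Time", "明天"), ("Loc", "西湖")]
```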
After obtaining the semantic recognition results, in some embodiments of the present application the terminal device may perform secondary analysis based on them. When there are multiple texts to be recognized, the terminal device may determine, from their semantic recognition results, the semantics whose frequency of occurrence falls within a preset range — for example, analyzing the semantics with the highest probability of occurrence across the texts in order to build a user portrait.
In other embodiments, after the semantic recognition result is obtained, the terminal device may further control the target device based on the semantic recognition result of the text to be recognized, for example, control the robot to execute a task corresponding to the semantic recognition result, or control the smart home to be turned on and turned off.
In embodiments of the present application, each character in the text to be recognized is vectorized to obtain its character vector, an image vector of each character is acquired, a semantic classification result for each character is determined according to its character vector and image vector, and characters of the same semantic classification are combined to obtain the semantic recognition result of the text to be recognized. Since the image vector represents both the global and the local features of the character's original pictographic character image, glyph information complements character-code and pinyin information, which improves the accuracy of semantic recognition.
In some embodiments of the present application, as shown in fig. 4, the image vector may be acquired through the following steps S401 to S404.
Step S401, the original pictograph image of each character is crawled.
In some embodiments of the present application, the terminal device may enter a keyword into the web-page input box of a search server to obtain the image search results the server returns for that keyword; the images in the results may be sorted according to a preset sorting condition. The preset sorting condition may be one entered by the user, or the search server's default — for example, highest relevance, largest image size, or most clicks.
In embodiments of the present application, the keyword may comprise a single character together with a pictograph type. It should be understood that there are many types of pictographs, such as oracle-bone pictographs and Dongba pictographs. The keyword is composed of a character from the text to be recognized and a pictograph type; for example, "bright" and "Dongba pictographs" may be combined as the keyword to obtain the image search results, so that the original pictograph image obtained for each character belongs to the same type of pictograph.
After obtaining the image search results, the terminal device can take the highest-ranked image in the results as the original pictographic character image of the corresponding character.
Step S402, image segmentation is carried out on the original pictograph image to obtain a segmented image.
The image segmentation method is not limited by the present application. In some embodiments, threshold segmentation may be used. In other embodiments, an edge operator may be used, such as the Sobel edge operator or the Roberts edge operator; edge detection then segments the original pictograph image into a plurality of segmented images. It should be understood that, since each segmented image is a portion of the original pictograph image, its image features represent local features of the original pictograph image.
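A minimal numpy sketch of the Sobel operator on a grayscale glyph array follows (illustrative; a real pipeline would follow the gradient map with thresholding and connected-component extraction to produce the segmented images):

```python
import numpy as np

def sobel_edges(img: np.ndarray) -> np.ndarray:
    """Apply the 3x3 Sobel operator to a 2-D grayscale array and
    return the gradient magnitude (valid region only)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T  # vertical-gradient kernel
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            out[i, j] = np.hypot((patch * kx).sum(), (patch * ky).sum())
    return out

img = np.zeros((5, 5))
img[:, 3:] = 1.0          # a vertical edge between columns 2 and 3
edges = sobel_edges(img)  # responds only near the edge
```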
Step S403, vectorizing the original pictographic image and the divided image to obtain vectors corresponding to the original pictographic image and the divided image, respectively.
Specifically, in some embodiments of the present application, picture embedding may be performed with a deep residual network (ResNet) to obtain the vectors corresponding to the original pictograph image and the segmented images. The basic ResNet unit convolves the input image (the original pictograph image or a segmented image) through convolution layers, adds the convolution result to the input via a shortcut connection, and passes the sum through a ReLU activation function, thereby realizing vectorization.
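The basic unit just described — ReLU of the transformed path plus the shortcut input — can be sketched with a linear map standing in for the convolution path (illustrative only; a real ResNet block uses stacked convolutions with batch normalization):

```python
import numpy as np

def residual_block(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Minimal ResNet unit: the transformed path F(x) is added to the
    input through the shortcut connection, then passed through ReLU."""
    fx = x @ weight                  # stand-in for the convolution path F(x)
    return np.maximum(fx + x, 0.0)   # shortcut addition + ReLU activation

x = np.array([[1.0, -2.0, 3.0, 0.5]])
out = residual_block(x, np.zeros((4, 4)))  # zero F(x): output is ReLU(x)
```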
After vectorization, the vector corresponding to the original pictographic image can be used to characterize the global features of the original pictographic image, while the vector corresponding to each segmented image can be used to characterize the local features of the original pictographic image. That is, the terminal device may acquire an original pictograph image corresponding to each character, and segment the original pictograph image to obtain a segmented image to determine an image vector based on the original pictograph image and the segmented image.
Moreover, the vectors corresponding to the original pictograph image and to the segmented images have the same dimensionality as the character vector, which facilitates subsequent vector fusion.
Step S404, determining image vectors according to the corresponding vectors of the original pictographic character image and the segmentation image.
In some embodiments of the present application, for the same character, the terminal device may fuse the vectors corresponding to the original pictographic character image and the segmented images by point-wise addition or vector concatenation (concatenate) to determine its image vector.
In some embodiments of the present application, the image vector may be obtained by weighted addition, considering that in practical applications, different segmented images contain different information with different degrees of importance.
Specifically, the terminal device may determine the weight of each segmented image according to the reference information of each segmented image, then obtain the weight of the original pictograph image, and further perform weighted addition on vectors corresponding to the original pictograph image and the segmented images respectively according to the weight of the original pictograph image and the weight of each segmented image, so as to obtain an image vector.
Wherein the weight of the original pictograph image may be a preset value, and the weight of each divided image may be determined based on the reference information. The reference information may include at least one of a position of the divided image in the original pictograph image, an area ratio of the divided image in the original pictograph image, and pixel point information in the divided image.
Some Chinese characters — for example, two characters that both mean "hammer" — share the same pinyin and similar glyphs, and are distinguished mainly by their radicals. Radicals generally lie on the periphery of the glyph. Therefore, if a segmented image is located within a preset region of the original pictographic character image, its weight may be set to a first weight; otherwise, to a second weight. The preset region may be the region outside the image center region, where the image center region is a rectangular region centered on the image center with a preset length and a preset width, both of which may be set by the user. The first weight is greater than the second weight, so that radicals located on the periphery of the glyph receive higher weight.
The area ratio of a segmented image within the original pictographic character image, and the pixel-point information within it, can represent the complexity of the local glyph it depicts. It should be understood that a simple local glyph tends to appear in many different Chinese characters — for example, the component "kou" (mouth) appears in the characters for "country", "shout", "sell", and others — and is therefore of little help for semantic recognition, whereas a more complex local glyph appears in fewer characters. Thus, the more complex the local glyph represented by a segmented image, the more it aids recognition. Specifically, if the area ratio of the segmented image within the original pictograph image exceeds a preset ratio, or the number of pixels whose values fall within a preset range exceeds a preset count (for example, the number of pixels with value 0 exceeds the preset count), the weight of the segmented image may be set to a third weight; otherwise, to a fourth weight. Both the preset ratio and the preset count may be set by the user, and the third weight is greater than the fourth weight, so that among the local glyphs segmented from the whole glyph, the more complex ones receive higher weight.
By weighting and adding the vectors corresponding to the original pictographic character image and the segmented images according to at least one of the position, the area ratio, and the pixel-point information, the resulting image vector becomes more favorable for semantic classification.
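The weighted addition can be sketched as follows (numpy; the weight values are illustrative stand-ins for the first/second and third/fourth weights derived from the position, area-ratio, and pixel rules above):

```python
import numpy as np

def weighted_image_vector(global_vec, part_vecs, part_weights, global_weight=1.0):
    """Weighted point-wise addition of the whole-glyph vector and the
    segmented-part vectors to form the final image vector."""
    acc = global_weight * np.asarray(global_vec, dtype=float)
    for vec, w in zip(part_vecs, part_weights):
        acc = acc + w * np.asarray(vec, dtype=float)
    return acc

# A peripheral radical part gets the (larger) first weight, an inner part the second.
iv = weighted_image_vector([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], [0.7, 0.3])
```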
Referring to fig. 5, which shows the image vector acquisition process: the terminal device segments an original pictograph image 51 of the character "bright" from the text to be recognized with the Sobel edge operator, obtaining segmented images 52 and 53. After the vector 54 corresponding to the original pictograph image 51 and the vectors 55 and 56 corresponding to the segmented images 52 and 53 are obtained through ResNet processing, the vectors are fused into the image vector 57.
After the image vector and the character vector are obtained, the terminal device can fuse them into a fusion vector that simultaneously represents the character features, the overall glyph features, and the local glyph features of the character. The terminal device can input the fusion vector into a pre-trained target recognition network to obtain the semantic classification result that the network outputs for the corresponding character. The target recognition network may be a neural network based on a Conditional Random Field (CRF), a Convolutional Neural Network (CNN), a Long Short-Term Memory network (LSTM), or the like.
The training process of the target recognition network may include: acquiring sample service data, treating each character in the sample service data as a sample character, and obtaining a reference classification result for each sample character. For example, the sample service data may be obtained in the same manner as the text to be recognized; each character is then manually annotated with a category such as person, time, or place as its reference classification result, and the sample service data together with the reference classification results constitute the training data.
Then, a sample fusion vector for each sample character is determined in the same manner as the fusion vector described above, and is input into the recognition network to be trained for iterative training. If the similarity between the output of the network and the reference classification result does not meet the similarity condition, and the number of iterations does not meet the iteration-count condition, the model parameters of the network are adjusted and the sample fusion vectors are input again for further training. Training continues until the similarity between the network's predictions and the reference classification results meets the similarity condition, or the number of iterations meets the iteration-count condition, at which point the trained network is stored in the memory of the terminal device as the target recognition network. Both the similarity condition and the iteration-count condition may be set by the user according to practical requirements such as accuracy, training efficiency, and model generalization.
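The two stopping conditions — similarity threshold or iteration cap — can be sketched generically (the callbacks below are toy placeholders, not the actual CRF training step):

```python
def train_until(similarity_fn, step_fn, params, sim_threshold=0.95, max_iters=1000):
    """Repeat a training step until the prediction/reference similarity meets
    the similarity condition, or the iteration count meets the count condition."""
    for i in range(max_iters):
        if similarity_fn(params) >= sim_threshold:
            return params, i          # similarity condition met
        params = step_fn(params)      # adjust model parameters
    return params, max_iters          # iteration-count condition met

# Toy example: nudge a scalar parameter toward a target of 1.0.
params, iters = train_until(lambda p: 1 - abs(1 - p), lambda p: p + 0.1, 0.0)
```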
Referring to fig. 6: for each of the characters "bright", "day", "go", "west", and "lake" in the text to be recognized, "tomorrow go to West Lake", a fusion vector is determined and input into the CRF target recognition model, which outputs the semantic classification results "B-Time" for "bright", "I-Time" for "day", "O" for "go", "B-Loc" for "west", and "I-Loc" for "lake". Here B marks the starting character of a span, I marks a subsequent character of the same span, Time indicates the Time category, Loc indicates the Place category, and O indicates other categories. Characters of the same category are combined to obtain the semantic recognition result of the text "tomorrow go to West Lake": the time is "tomorrow" and the place is "West Lake".
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts, but those skilled in the art will recognize that the present application is not limited by the order of the acts described, as some steps may be performed in other orders in accordance with the present application.
Fig. 7 is a schematic structural diagram of a semantic recognition apparatus 700 according to an embodiment of the present disclosure, where the semantic recognition apparatus 700 is configured on a terminal device.
Specifically, the semantic recognition apparatus 700 may include:
a character vector obtaining unit 701, configured to separately vectorize each character in a text to be recognized, so as to obtain a character vector of each character;
an image vector obtaining unit 702, configured to obtain an image vector of each character, where the image vector is used to represent the global features and the local features of the original pictograph image of the corresponding character;
a semantic classification unit 703, configured to determine a semantic classification result of the corresponding character according to the character vector and the image vector, where the semantic classification result includes one or more semantic classifications to which the corresponding character belongs;
and a semantic recognition unit 704, configured to combine characters of the same semantic classification to obtain a semantic recognition result of the text to be recognized.
In some embodiments of the present application, the image vector acquiring unit 702 may be specifically configured to: crawling the original pictographic character image of each character; carrying out image segmentation on the original pictographic character image to obtain a segmented image; vectorizing the original pictograph image and the segmentation image respectively to obtain vectors corresponding to the original pictograph image and the segmentation image respectively, wherein the vector corresponding to the original pictograph image is used for representing the global features, and the vector corresponding to the segmentation image is used for representing the local features; and determining the image vector according to the vectors respectively corresponding to the original pictograph image and the segmented image.
In some embodiments of the present application, the image vector acquiring unit 702 may be specifically configured to: for a single character, obtaining an image search result output by a search server according to a keyword, wherein the keyword comprises the single character and the type of a pictograph, and images in the image search result are sorted according to a preset sorting condition; and taking the image with the highest ranking priority in the image searching results as the original pictograph image of the single character.
In some embodiments of the present application, the image vector obtaining unit 702 may be specifically configured to: determining the weight of each segmented image according to the reference information of each segmented image, wherein the reference information comprises at least one of the position of the segmented image in the original pictograph image, the area ratio of the segmented image in the original pictograph image and the pixel point information in the segmented image; acquiring the weight of the original pictographic character image; and weighting and adding vectors corresponding to the original pictograph image and the segmentation images respectively according to the weight of the original pictograph image and the weight of each segmentation image to obtain the image vector.
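The weighted-addition step can be written directly. A minimal numpy sketch; how the weights are derived from segment position, area ratio, or pixel information is left open by the embodiment, so they appear here as plain inputs:

```python
import numpy as np

def weighted_image_vector(global_vec, global_weight, segment_vecs, segment_weights):
    """Weighted addition of the original pictograph image vector
    (global features) and each segmented-image vector (local features),
    yielding the per-character image vector."""
    image_vec = global_weight * np.asarray(global_vec, dtype=float)
    for weight, vec in zip(segment_weights, segment_vecs):
        image_vec += weight * np.asarray(vec, dtype=float)
    return image_vec
```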
In some embodiments of the present application, the semantic classifying unit 703 may be specifically configured to: based on a cross attention mechanism, fusing the character vector and the image vector of the same character to obtain a fusion vector of the corresponding character; and determining the semantic classification result of the corresponding characters by using the fusion vector.
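A single-head cross-attention fusion in the spirit of this step can be sketched as follows. The projection matrices, the scaled-dot-product scoring, and concatenation as the final fusion vector are illustrative choices — the embodiment names a cross attention mechanism but does not fix its exact formulation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(char_vecs, image_vecs, Wq, Wk, Wv):
    """Character vectors attend over image vectors: queries come from
    the character side, keys and values from the image side; the
    attended image features are concatenated onto the character
    vectors to form the fusion vectors."""
    q = char_vecs @ Wq                                  # (T, d) queries
    k = image_vecs @ Wk                                 # (M, d) keys
    v = image_vecs @ Wv                                 # (M, d) values
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))    # (T, M) attention
    attended = scores @ v                               # (T, d)
    return np.concatenate([char_vecs, attended], axis=-1)  # (T, 2d) fusion
```

Wq, Wk and Wv stand for learned projection parameters; in a trained model they would come from the recognition network rather than being supplied by the caller.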
In some embodiments of the present application, the semantic classifying unit 703 may be specifically configured to: and inputting the fusion vector to a pre-trained target recognition network, and acquiring a semantic classification result of the corresponding characters output by the target recognition network.
In some embodiments of the present application, the semantic recognition apparatus 700 may include a training unit, configured to use each character in the sample service data as a sample character, and obtain a reference classification result of each sample character; and determine a sample fusion vector of each sample character, input the sample fusion vector into a recognition network to be trained for iterative training, and take the recognition network to be trained as the target recognition network when the similarity between a prediction result output by the recognition network to be trained and the reference classification result meets a similarity condition, or the iteration count of the iterative training meets a count condition.
In some embodiments of the present application, the semantic identifying apparatus 700 may include a semantic application unit configured to: controlling the target equipment based on the semantic recognition result; or when the number of the texts to be recognized is multiple, determining semantics of the texts to be recognized, the occurrence frequency of which is within a preset frequency range, according to the semantic recognition results of the multiple texts to be recognized.
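The frequency-range branch of this unit can be sketched with a counter. Treating each (category, value) pair as one "semantic" and using an inclusive [low, high] range are assumptions; the embodiment only says the range is preset:

```python
from collections import Counter

def semantics_in_frequency_range(recognition_results, low, high):
    """Count each recognized semantic across the semantic recognition
    results of multiple texts to be recognized, and keep those whose
    occurrence count lies within the preset range [low, high]."""
    counts = Counter()
    for result in recognition_results:  # one list of semantics per text
        counts.update(result)
    return {semantic: n for semantic, n in counts.items() if low <= n <= high}
```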
It should be noted that, for convenience and simplicity of description, the specific working process of the semantic recognition device 700 may refer to the corresponding process of the method described in fig. 1 to fig. 6, and is not described herein again.
Fig. 8 is a schematic diagram of a terminal device according to an embodiment of the present application. The terminal device 8 may include: a processor 80, a memory 81 and a computer program 82, such as a semantic recognition program, stored in said memory 81 and executable on said processor 80. The processor 80, when executing the computer program 82, implements the steps in the above-described respective embodiments of the semantic recognition method, such as the steps S101 to S103 shown in fig. 1. Alternatively, the processor 80 executes the computer program 82 to implement the functions of the modules/units in the device embodiments, such as the character vector acquisition unit 701, the image vector acquisition unit 702, the semantic classification unit 703 and the semantic recognition unit 704 shown in fig. 7.
The computer program may be divided into one or more modules/units, which are stored in the memory 81 and executed by the processor 80 to complete the application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used for describing the execution process of the computer program in the terminal device.
For example, the computer program may be divided into: the device comprises a character vector acquisition unit, an image vector acquisition unit, a semantic classification unit and a semantic identification unit.
The specific functions of each unit are as follows: the character vector acquisition unit is used for respectively vectorizing each character in the text to be recognized to obtain the character vector of each character; the image vector acquisition unit is used for acquiring the image vector of each character, and the image vector is used for representing the global characteristic and the local characteristic of the original pictographic character image of the corresponding character; the semantic classification unit is used for determining semantic classification results of corresponding characters according to the character vectors and the image vectors, and the semantic classification results comprise one or more semantic classifications to which the corresponding characters belong; and the semantic recognition unit is used for combining the same semantically classified characters to obtain a semantic recognition result of the text to be recognized.
The terminal device may include, but is not limited to, a processor 80, a memory 81. Those skilled in the art will appreciate that fig. 8 is merely an example of a terminal device and is not limiting and may include more or fewer components than shown, or some components may be combined, or different components, e.g., the terminal device may also include input output devices, network access devices, buses, etc.
The Processor 80 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 81 may be an internal storage unit of the terminal device, such as a hard disk or a memory of the terminal device. The memory 81 may also be an external storage device of the terminal device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash memory card (Flash Card), or the like provided on the terminal device. Further, the memory 81 may include both an internal storage unit and an external storage device of the terminal device. The memory 81 is used for storing the computer program and other programs and data required by the terminal device. The memory 81 may also be used to temporarily store data that has been output or is to be output.
It should be noted that, for convenience and simplicity of description, the structure of the terminal device may also refer to the detailed description of the structure in the method embodiment, and is not described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the present application. For the specific working processes of the units and modules in the system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one type of logical function division, and other division manners may be available in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow in the methods of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and which, when executed by a processor, realizes the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately added or reduced according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals or telecommunication signals.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (9)

1. A method of semantic identification, comprising:
vectorizing each character in the text to be recognized respectively to obtain a character vector of each character;
acquiring an image vector of each character, wherein the image vector is used for representing the global characteristics and the local characteristics of the original pictographic character image of the corresponding character;
determining semantic classification results of corresponding characters according to the character vectors and the image vectors, wherein the semantic classification results comprise one or more semantic classifications to which the corresponding characters belong;
combining the characters of the same semantic classification to obtain a semantic recognition result of the text to be recognized;
the obtaining of the image vector of each character includes:
crawling the original pictographic character image of each character;
carrying out image segmentation on the original pictographic character image to obtain a segmented image;
vectorizing the original pictograph image and the segmentation image respectively to obtain vectors corresponding to the original pictograph image and the segmentation image respectively, wherein the vector corresponding to the original pictograph image is used for representing the global features, and the vector corresponding to the segmentation image is used for representing the local features;
and determining the image vector according to the vectors respectively corresponding to the original pictograph image and the segmented image.
2. The semantic recognition method according to claim 1, wherein in the step of crawling the original pictographic image of each text, the operation of crawling the single text comprises:
acquiring an image search result output by a search server according to a keyword, wherein the keyword comprises the types of the single character and the pictographic character, and images in the image search result are sorted according to a preset sorting condition;
and taking the image with the highest priority in the image searching results as the original pictographic character image of the single character.
3. The semantic recognition method according to claim 1, wherein the determining the image vector according to the vectors corresponding to the original pictograph image and the segmented image respectively comprises:
determining the weight of each segmented image according to the reference information of each segmented image, wherein the reference information comprises at least one of the position of the segmented image in the original pictographic character image, the area ratio of the segmented image in the original pictographic character image and the pixel point information in the segmented image;
acquiring the weight of the original pictographic character image;
and according to the weight of the original pictographic character image and the weight of each segmented image, carrying out weighted addition on vectors corresponding to the original pictographic character image and the segmented images respectively to obtain the image vector.
4. The semantic recognition method according to any one of claims 1 to 3, wherein the determining the semantic classification result of the corresponding text according to the character vector and the image vector comprises:
based on a cross attention mechanism, fusing the character vector and the image vector of the same character to obtain a fusion vector of the corresponding character;
and determining the semantic classification result of the corresponding characters by using the fusion vector.
5. The method for semantic recognition according to claim 4, wherein the determining the semantic classification result of the corresponding text by using the fused vector comprises:
inputting the fusion vector to a pre-trained target recognition network, and acquiring semantic classification results of corresponding characters output by the target recognition network;
the training process of the target recognition network comprises the following steps:
acquiring sample service data;
taking each character in the sample service data as a sample character, and acquiring a reference classification result of each sample character;
and determining a sample fusion vector of each sample character, inputting the sample fusion vector into a recognition network to be trained for iterative training, and taking the recognition network to be trained as the target recognition network when the similarity between a prediction result output by the recognition network to be trained and the reference classification result meets a similarity condition or the iteration number of the iterative training meets a number condition.
6. The method according to any one of claims 1 to 3, wherein after the words of the same semantic classification are combined to obtain a semantic recognition result of the text to be recognized, the method further comprises:
controlling the target equipment based on the semantic recognition result;
or when the number of texts to be recognized is multiple, determining, according to the semantic recognition results of the multiple texts to be recognized, the semantics whose occurrence frequency in all the texts to be recognized is within a preset frequency range.
7. A semantic recognition apparatus, comprising:
the character vector acquisition unit is used for respectively vectorizing each character in the text to be recognized to obtain the character vector of each character;
the image vector acquisition unit is used for acquiring the image vector of each character, and the image vector is used for representing the global characteristic and the local characteristic of the original pictographic character image of the corresponding character;
the semantic classification unit is used for determining semantic classification results of corresponding characters according to the character vectors and the image vectors, and the semantic classification results comprise one or more semantic classifications to which the corresponding characters belong;
the semantic recognition unit is used for combining the same semantically classified characters to obtain a semantic recognition result of the text to be recognized;
the image vector acquisition unit is specifically configured to: crawling the original pictographic character image of each character; carrying out image segmentation on the original pictographic character image to obtain a segmented image; vectorizing the original pictograph image and the segmentation image respectively to obtain vectors corresponding to the original pictograph image and the segmentation image respectively, wherein the vector corresponding to the original pictograph image is used for representing the global features, and the vector corresponding to the segmentation image is used for representing the local features; and determining the image vector according to the vectors respectively corresponding to the original pictographic image and the segmented image.
8. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor when executing the computer program realizes the steps of the semantic recognition method according to any one of claims 1 to 6.
9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the semantic recognition method according to any one of claims 1 to 6.
CN202211102098.2A 2022-09-09 2022-09-09 Semantic recognition method and device, terminal equipment and storage medium Active CN115187996B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211102098.2A CN115187996B (en) 2022-09-09 2022-09-09 Semantic recognition method and device, terminal equipment and storage medium


Publications (2)

Publication Number Publication Date
CN115187996A CN115187996A (en) 2022-10-14
CN115187996B (en) 2023-01-06

Family

ID=83524449

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211102098.2A Active CN115187996B (en) 2022-09-09 2022-09-09 Semantic recognition method and device, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115187996B (en)

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104134064A (en) * 2013-05-02 2014-11-05 百度国际科技(深圳)有限公司 Character recognition method and device
CN104484666A (en) * 2014-12-17 2015-04-01 中山大学 Advanced image semantic parsing method based on human-computer interaction
CN106023993A (en) * 2016-07-29 2016-10-12 西安旭天电子科技有限公司 Robot control system based on natural language and control method thereof
CN108877839A (en) * 2018-08-02 2018-11-23 南京华苏科技有限公司 The method and system of perceptual evaluation of speech quality based on voice semantics recognition technology
CN109948148A (en) * 2019-02-28 2019-06-28 北京学之途网络科技有限公司 A kind of text information emotion determination method and decision maker
CN110263324A (en) * 2019-05-16 2019-09-20 华为技术有限公司 Text handling method, model training method and device
CN110414498A (en) * 2019-06-14 2019-11-05 华南理工大学 A kind of natural scene text recognition method based on intersection attention mechanism
CN110852368A (en) * 2019-11-05 2020-02-28 南京邮电大学 Global and local feature embedding and image-text fusion emotion analysis method and system
CN111275046A (en) * 2020-01-10 2020-06-12 中科鼎富(北京)科技发展有限公司 Character image recognition method and device, electronic equipment and storage medium
CN112149642A (en) * 2020-10-28 2020-12-29 腾讯科技(深圳)有限公司 Text image recognition method and device
CN112686265A (en) * 2021-01-07 2021-04-20 南京大学 Hierarchic contour extraction-based pictograph segmentation method
CN112817996A (en) * 2021-02-23 2021-05-18 杭州安恒信息技术股份有限公司 Illegal keyword library updating method, device, equipment and storage medium
CN113723426A (en) * 2021-07-28 2021-11-30 北京工业大学 Image classification method and device based on deep multi-flow neural network
CN113836298A (en) * 2021-08-05 2021-12-24 合肥工业大学 Text classification method and system based on visual enhancement
CN113901803A (en) * 2021-10-08 2022-01-07 中山大学 Small sample OOV learning method and system based on Chinese character structure
CN114398505A (en) * 2022-01-19 2022-04-26 腾讯科技(深圳)有限公司 Target word determining method, model training method and device and electronic equipment
CN114579723A (en) * 2022-03-02 2022-06-03 平安科技(深圳)有限公司 Interrogation method and apparatus, electronic device, and storage medium
CN114625877A (en) * 2022-03-22 2022-06-14 中国平安人寿保险股份有限公司 Text classification method and device, electronic equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060218191A1 (en) * 2004-08-31 2006-09-28 Gopalakrishnan Kumar C Method and System for Managing Multimedia Documents
CN102968637B (en) * 2012-12-20 2015-06-03 山东科技大学 Complicated background image and character division method
CN104778242B (en) * 2015-04-09 2018-07-13 复旦大学 Cartographical sketching image search method and system based on image dynamic partition
CN110717498A (en) * 2019-09-16 2020-01-21 腾讯科技(深圳)有限公司 Image description generation method and device and electronic equipment
CN112418209B (en) * 2020-12-15 2022-09-13 润联软件***(深圳)有限公司 Character recognition method and device, computer equipment and storage medium
CN113449081A (en) * 2021-07-08 2021-09-28 平安国际智慧城市科技股份有限公司 Text feature extraction method and device, computer equipment and storage medium


Also Published As

Publication number Publication date
CN115187996A (en) 2022-10-14


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20221209

Address after: 518000 19 / F, building C, Shenzhen International Innovation Center, 1006 Shennan Avenue, Huafu street, Futian District, Shenzhen City, Guangdong Province

Applicant after: THE SMART CITY RESEARCH INSTITUTE OF CHINA ELECTRONICS TECHNOLOGY Group Corp.

Applicant after: Southern University of Science and Technology

Address before: 518000 19 / F, building C, Shenzhen International Innovation Center, 1006 Shennan Avenue, Huafu street, Futian District, Shenzhen City, Guangdong Province

Applicant before: THE SMART CITY RESEARCH INSTITUTE OF CHINA ELECTRONICS TECHNOLOGY Group Corp.

GR01 Patent grant