CN116246276A - Information identification method, device, equipment and readable storage medium


Info

Publication number
CN116246276A
Authority
CN
China
Prior art keywords
target
image
character
word
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211659334.0A
Other languages
Chinese (zh)
Inventor
唐锦阳
郭亚
祝慧佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202211659334.0A
Publication of CN116246276A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/14 - Image acquisition
    • G06V30/1444 - Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/19 - Recognition using electronic means
    • G06V30/191 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147 - Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/19 - Recognition using electronic means
    • G06V30/191 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173 - Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Discrimination (AREA)

Abstract

The specification discloses an information identification method, device, equipment, and readable storage medium. A target image is generated as a training sample at least from target word pairs composed of target words that have a correspondence, and the correspondence between the target words in each pair serves as the first label of the training sample. The characters contained in the target words and the coordinate information of those characters in the target image are input into a recognition model to obtain a predicted correspondence between pending words, and the model is trained with minimizing the difference between the predicted correspondence and the first label as the training target. Generating target images from target word pairs as training samples thus alleviates the problem of insufficient training samples, and because the target words used to generate the samples have known correspondences, the recognition model learns to output the correspondences between the words contained in an image, which improves both the efficiency of extracting structured information from images and the security of private information.

Description

Information identification method, device, equipment and readable storage medium
Technical Field
This specification relates to the field of computer technology, and in particular to an information identification method, device, equipment, and readable storage medium.
Background
As users pay more attention to the privacy of their data and information technology continues to develop, more and more users obtain services from service providers through a service platform. To ensure that a service provider delivers its services compliantly, the service platform needs to audit the provider for compliance. Auditing efficiency can be improved by extracting information from the images uploaded by the service provider and auditing the extracted information. Currently, compliance auditing may first extract characters from an image using optical character recognition and then parse structured information from the extracted characters. How to extract information from images more quickly and effectively remains a problem to be solved.
Based on this, the present specification provides an information recognition method.
Disclosure of Invention
The present specification provides an information recognition method, apparatus, device, and readable storage medium, to partially solve the above-mentioned problems of the prior art.
The technical scheme adopted in the specification is as follows:
The specification provides an information identification method, which comprises the following steps:
generating a target image as a training sample at least according to target word pairs composed of target words having a correspondence, and determining a first label of the training sample according to the correspondence between the target words in each target word pair;
recognizing each character in the training sample, and determining coordinate information of each recognized character in the target image;
inputting each recognized character and the coordinate information of each character into a recognition model to be trained, to obtain the pending words composed of the characters and the predicted correspondence between the pending words output by the recognition model;
training the recognition model with minimizing the difference between the predicted correspondence and the first label of the training sample as the training target;
and in response to a recognition request carrying an image to be recognized, recognizing the words contained in the image to be recognized through the trained recognition model as the recognition result.
The present specification provides an information identification apparatus, including:
a generating module, configured to generate a target image as a training sample at least according to target word pairs composed of target words having a correspondence, and determine a first label of the training sample according to the correspondence between the target words in each target word pair;
a first recognition module, configured to recognize each character in the training sample, and determine coordinate information of each recognized character in the target image;
a predicted correspondence determining module, configured to input each recognized character and the coordinate information of each character into a recognition model to be trained, to obtain the pending words composed of the characters and the predicted correspondence between the pending words output by the recognition model;
a training module, configured to train the recognition model with minimizing the difference between the predicted correspondence and the first label of the training sample as the training target;
a second recognition module, configured to recognize, in response to a recognition request carrying an image to be recognized, the words contained in the image to be recognized through the trained recognition model as the recognition result.
The present specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-described information identifying method.
The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above information identification method when executing the program.
At least one of the technical schemes adopted in this specification can achieve the following beneficial effects:
In the information identification method provided by this specification, a target image is generated as a training sample at least from target word pairs composed of target words having a correspondence, the correspondence between the target words in each pair is used as the first label of the training sample, the characters contained in the target words and their coordinate information in the target image are input into a recognition model to be trained to determine the predicted correspondence between the pending words, and the model is trained with minimizing the difference between the predicted correspondence and the first label as the training target. Generating target images from target word pairs as training samples alleviates the problem of insufficient training samples, and because the target words used to generate the samples have known correspondences, the trained recognition model can recognize the correspondences between the words contained in an image, improving the efficiency of extracting structured information from images.
Drawings
The accompanying drawings described here are included to provide a further understanding of the specification and constitute a part of it; the exemplary embodiments of the specification and their descriptions explain the specification and do not unduly limit it.
In the drawings:
Fig. 1 is a schematic flow chart of an information identification method in the present specification;
Fig. 2 is a schematic flow chart of an information identification method in the present specification;
Fig. 3 is a schematic diagram of an information recognition model in the present specification;
Fig. 4 is a schematic flow chart of an information identification method in the present specification;
Fig. 5 is a schematic diagram of an information identification device provided in the present specification;
Fig. 6 is a schematic diagram of the electronic device corresponding to Fig. 1 provided in the present specification.
Detailed Description
To make the objects, technical solutions, and advantages of this specification clearer, the technical solutions of this specification are described clearly and completely below with reference to specific embodiments and the corresponding drawings. The described embodiments are obviously only some, not all, of the embodiments of this specification. All other embodiments obtained by one of ordinary skill in the art based on the embodiments in this specification without creative effort fall within the scope of protection of this specification.
In addition, it should be noted that all actions of acquiring signals, information, or data are performed in compliance with the data protection laws and policies of the relevant jurisdiction and with the authorization of the owner of the corresponding device.
When services are provided for users, the service platform can audit service providers for compliance in order to ensure that the providers deliver their services compliantly. The audited content includes at least the service provider's business license, industry qualification certificates, storefront photo, and the like; the service platform can also audit the credentials, such as invoices, retained when the service provider delivers a service to a user.
To simplify the auditing process and improve auditing efficiency, the service provider generally uploads the content to be audited to the service platform in the form of an image, and the platform audits the content presented by the image to determine the provider's compliance. Accordingly, to review the content contained in a submitted image quickly, text must be extracted from the image. How to extract information from images quickly and effectively is a problem to be solved.
Based on this, the present specification provides an information recognition method for achieving the purpose of extracting text from an image.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of an information identification method provided in the present specification.
S100: Generating a target image as a training sample at least according to target word pairs composed of target words having a correspondence, and determining a first label of the training sample according to the correspondence between the target words in each target word pair.
The embodiments of this specification provide an information identification method that involves training a recognition model. The training process may be executed by an electronic device such as a server used for model training; this device may be the same as or different from the electronic device that executes the information identification method, which this specification does not limit.
In practice, to ensure that a service provider delivers its services compliantly, the service platform may audit the provider for compliance. The audited content includes at least information that can characterize the provider's qualification to provide services compliantly, such as its business license, industry qualification certificates, storefront photo, and the like; the platform can also audit a credential (such as an invoice) retained when the provider delivers a service to a user.
Further, to simplify the auditing process, the service provider generally uploads the content to be audited to the service platform as an image, and the platform audits the content the image presents. However, because images that contain audit content and carry accurate labels are scarce, it is difficult to train a highly accurate recognition model on them alone. A recognition model with higher accuracy in recognizing audit content can instead be obtained by fine-tuning a pre-trained natural language model on a small number of accurately labeled images containing audit content. Even so, this approach is not applicable when not even a small number of annotated images is available.
In view of this, the information identification method provided in this specification selects several target words from an existing target word stock and generates a target image at least from target word pairs composed of target words having a correspondence. The target image may be generated by filling the target word pairs into designated positions on a background image. The designated positions may be chosen at random, determined from an existing template of images containing audit content, or obtained by replacing at least one word pair in an original image with a target word pair selected from the target word stock; this specification does not limit the choice.
The recognition model can extract structured information containing characters from an image to be recognized, for example extracting the text of an invoice from an invoice image, such as the name or the taxpayer identification number. In general, the structured information contained in an image to be recognized appears in pairs, such as the target word pair "name" and "Zhang San" in a document image. For this reason, the target image used as a training sample can be generated at least from target word pairs composed of target words having a correspondence.
In practical applications, however, the structured information contained in the images the model recognizes does not always appear in pairs; an image may contain target words that appear alone. For example, the specification/model field on an invoice is optional, so some invoices leave it blank; in that case the words "specification/model" in the invoice image have no structured information corresponding to them and appear alone. To make the structured information in the generated target image closer to reality, in one or more embodiments of this specification the target image may be generated from both target word pairs composed of corresponding target words and target words without a correspondence, so that the training samples contain word pairs that appear in pairs as well as words that appear alone. This better matches the actual recognition scene and improves the accuracy of recognizing structured information in images.
In the embodiments of this specification, recognizing the structured information that exists in pairs in an image is one of the recognition model's training targets. The correspondence between the target words in each target word pair can therefore be used as the first label of the training sample. That is, target words in the same target word pair have a correspondence, while target words not in the same pair do not. For example, if the target image is a photograph of a certificate, with the target words "name" and "Zhang San" forming one target word pair and "sex" and "man" forming another, then "name" corresponds to "Zhang San" and "sex" corresponds to "man", but "sex" does not correspond to "Zhang San", and a correspondence between "name" and "man" is likewise improbable.
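To make this sampling step concrete, the following minimal sketch (not the patented implementation; Pillow, the layout, and the font are assumptions for illustration) draws corresponding word pairs onto a blank background and records the correspondences as the first label:
```python
# A minimal sketch of building one training sample: word pairs with a known
# correspondence are drawn onto a background image, and the correspondence
# itself becomes the first label. Positions, font, and size are assumptions.
from PIL import Image, ImageDraw, ImageFont

def make_training_sample(word_pairs, size=(640, 480)):
    """word_pairs: list of (word, word) tuples that are known to correspond."""
    image = Image.new("RGB", size, "white")        # blank background image
    draw = ImageDraw.Draw(image)
    font = ImageFont.load_default()
    first_label = []                               # the correspondences
    for row, (left, right) in enumerate(word_pairs):
        y = 20 + 30 * row
        draw.text((20, y), left, fill="black", font=font)    # e.g. "name"
        draw.text((220, y), right, fill="black", font=font)  # e.g. "Zhang San"
        first_label.append((left, right))          # these two words correspond
    return image, first_label

sample_image, first_label = make_training_sample([("name", "Zhang San"), ("sex", "man")])
```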
S102: Recognizing each character in the training sample, and determining coordinate information of each recognized character in the target image.
Currently, compliance auditing may extract characters from images using optical character recognition (OCR) and then parse structured information from the extracted characters. That scheme, however, generally parses structured information by classifying each character to obtain word-level structure, that is, by determining which words the image contains from the characters recognized in it.
In the scenario of this embodiment, the focus is rather on extracting word pairs that have a correspondence from the image. For example, an image of a business license uploaded by a service provider may contain the two target words "business scope" and "dining"; if the correspondence between them is not recognized, the structured information that the provider's "business scope" is "dining" cannot be extracted from the image.
Based on this, starting from characters first extracted from the image by OCR, together with the training sample and its first label determined in step S100, the recognition model provided in this embodiment can further learn the ability to extract corresponding word pairs from the image.
Thus, in this embodiment the characters contained in the target words are still determined first. Because the target words are selected from the target word stock, the characters they contain can be determined directly; and because the target image is generated from the target words, the position of each target word on the target image is fixed during generation, so the coordinate information of the characters in the target image can also be determined directly. Of course, OCR can equally be used to recognize the characters in the training sample and determine each character's coordinates in the target image.
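Since the generation step fixes where every target word is drawn, per-character coordinates can be recorded directly instead of running OCR. A sketch under that assumption (character widths are measured with Pillow's textlength; all names are illustrative):
```python
# Per-character coordinates of a word assumed to be drawn at `origin`;
# no OCR is needed because the rendering positions are known at generation time.
from PIL import Image, ImageDraw, ImageFont

def character_coordinates(word, origin, draw, font):
    """Return a list of (character, (x, y)) for a word drawn at origin."""
    x, y = origin
    coords = []
    for ch in word:
        coords.append((ch, (x, y)))
        x += draw.textlength(ch, font=font)  # advance by the rendered width
    return coords

canvas = Image.new("RGB", (640, 480), "white")
draw = ImageDraw.Draw(canvas)
font = ImageFont.load_default()
print(character_coordinates("Zhang San", (220, 20), draw, font))
```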
S104: Inputting each recognized character and the coordinate information of each character into a recognition model to be trained, to obtain the pending words composed of the characters and the predicted correspondence between the pending words output by the recognition model.
Specifically, from the input characters and their coordinate information, the recognition model can determine the pending words composed of those characters. One pending word may contain several characters; the characters may repeat, and they may be any existing type of text character, such as Chinese characters or digits, which this specification does not limit.
The recognition model can then extract each pending word's semantic features and coordinate features, and determine accordingly whether correspondences exist between the pending words.
S106: Training the recognition model with minimizing the difference between the predicted correspondence and the first label of the training sample as the training target.
S108: In response to a recognition request carrying an image to be recognized, recognizing the words contained in the image to be recognized through the trained recognition model as the recognition result.
In practical application, similarly to the training process, OCR can extract the characters and their coordinate information from the image to be recognized; the extracted characters and coordinates are then input into the trained recognition model, whose first and second feature extraction layers yield the words in the image and their features, and whose classification layer yields the correspondences between the words. Meanwhile, because determining the words composed of the characters is a prerequisite for predicting the correspondences between them, the determined words can also be output by the recognition model. Each word and the correspondences between the words together form the recognition result. A recognition model trained with the first label of the training sample can therefore predict, from the characters in an image, both the words in the image and the correspondences between them, so structured information can be extracted from images more quickly and efficiently.
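As a sketch of this inference flow (every name here, from ocr_extract to the model's output format, is an assumption for illustration, not the patent's API):
```python
# Inference: OCR yields characters plus coordinates; the trained recognition
# model turns them into words and predicted correspondences.
def ocr_extract(image):
    """Stand-in for an OCR engine: returns [(character, (x, y)), ...]."""
    return [("n", (20, 20)), ("a", (28, 20)), ("m", (36, 20)), ("e", (44, 20))]

def recognize(trained_model, image):
    chars_with_coords = ocr_extract(image)
    words, correspondences = trained_model(chars_with_coords)
    return {"words": words, "correspondences": correspondences}

dummy_model = lambda chars: (["name"], [])   # toy stand-in for a trained model
print(recognize(dummy_model, image=None))
```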
In the information identification method provided by this specification, generating target images from target word pairs as training samples alleviates the problem of insufficient training samples, and because the target words used to generate the samples have known correspondences, the trained recognition model can recognize the correspondences between the words contained in an image, improving the accuracy and efficiency of extracting structured information from images.
In this embodiment, step S104 of fig. 1, in which the recognized characters and their coordinate information are input into the recognition model to be trained to obtain the pending words composed of the characters and the predicted correspondence between the pending words output by the model, can be implemented by the following steps, as shown in fig. 2:
S200: Inputting each recognized character and the coordinates of each character into the recognition model to be trained, and obtaining the features of each character through the first feature extraction layer of the recognition model.
As shown in fig. 3, the recognition model in this embodiment includes at least a first feature extraction layer, a second feature extraction layer, and a classification layer.
Specifically, the characters and their coordinate information in the target image are input into the first feature extraction layer of the recognition model, and the features of the characters are determined from its output; a character's features may include semantic features and position features.
S202: Determining each pending word composed of the characters according to the features of the characters.
Further, classifying the characters according to their features can be understood as aggregating characters whose semantics are similar and whose positions on the target image are close, to obtain several pending words. The aggregation may input each character's features into a classification subnet of the recognition model and determine the pending word each character belongs to from the subnet's output; or it may combine the characters two by two, compute the difference between their features, and merge characters whose difference is below a threshold into one pending word. Other existing aggregation methods are also possible, which this specification does not limit.
S204: For each pending word, inputting the features of the characters it contains into the second feature extraction layer of the recognition model to obtain the features of the pending word.
Specifically, the second feature extraction layer may determine a pending word's features from the features of the characters belonging to it. It may be a pooling layer: pooling retains the main features of the characters while reducing parameters and computation, which helps prevent overfitting.
S206: Determining the predicted correspondence between the pending words according to the features of the pending words and the classification layer of the recognition model.
Further, a pending word's features represent its semantics and its position on the target image. From those features, the classification layer of the recognition model determines whether correspondences exist between the pending words, and its output is taken as the predicted correspondence between the pending words.
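A minimal PyTorch sketch of the three layers named above may help fix ideas; the embedding sizes, the choice of a Transformer encoder for the first layer, and mean pooling for the second layer are assumptions for illustration, not the patent's architecture:
```python
import torch
import torch.nn as nn

class RecognitionModel(nn.Module):
    def __init__(self, vocab_size=5000, dim=128):
        super().__init__()
        self.char_embed = nn.Embedding(vocab_size, dim)   # character semantics
        self.coord_proj = nn.Linear(2, dim)               # (x, y) position
        # first feature extraction layer: contextual character features
        encoder_layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.first_layer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        # classification layer over a spliced pair of pending-word features
        self.classifier = nn.Linear(2 * dim, 2)

    def char_features(self, char_ids, coords):
        x = self.char_embed(char_ids) + self.coord_proj(coords)  # (n, dim)
        return self.first_layer(x.unsqueeze(0)).squeeze(0)       # contextualised

    @staticmethod
    def word_features(char_feats, word_ids):
        # second feature extraction layer: pool the characters of each word
        return torch.stack([char_feats[word_ids == w].mean(dim=0)
                            for w in word_ids.unique()])

    def pair_logits(self, word_feats, i, j):
        target = torch.cat([word_feats[i], word_feats[j]])  # spliced target feature
        return self.classifier(target)   # 2 classes: correspondence or not

model = RecognitionModel()
chars = torch.tensor([0, 1, 2, 3])                          # character ids
coords = torch.tensor([[0., 0.], [1., 0.], [5., 0.], [6., 0.]])
feats = model.char_features(chars, coords)
words = model.word_features(feats, torch.tensor([0, 0, 1, 1]))  # two pending words
print(model.pair_logits(words, 0, 1))                       # correspondence logits
```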
In one or more embodiments of this specification, when the predicted correspondence between the pending words is determined from their features and the classification layer of the recognition model as in step S206 of fig. 2, the pending words can be combined two by two into pending word pairs, as follows:
First, combine the pending words two by two to determine the pending word pairs.
Specifically, one pending word may be selected and paired with every other pending word, and the process repeated over all pending words to obtain pending word pairs of different combinations.
This embodiment only illustrates combining two by two, that is, a pending word pair containing exactly two pending words. Pending word groups containing more than two pending words may also be determined, along with the correspondences among the words they contain, which this specification does not limit.
Second, for each pending word pair, concatenate the features of the pending words it contains to obtain the target feature of the pair.
The features may be concatenated by any existing feature concatenation scheme, which this specification does not limit.
Then, input the target feature of each pending word pair into the classification layer to obtain the predicted correspondence, output by the classification layer, between the pending words contained in the pair.
Specifically, the classification layer may be a binary classification network: the target features of the pending word pairs are input, and whether a correspondence exists between the pending words in each pair is determined from the output. It may also be a network that outputs a probability, in which case a higher output indicates a higher probability that a correspondence exists between the pending words in the pair.
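Continuing the sketch above, the pairwise combination and scoring could look as follows (the binary softmax and the threshold are assumptions):
```python
import itertools
import torch

def predict_correspondences(model, word_feats, threshold=0.5):
    """Combine pending words two by two and keep pairs the classifier accepts."""
    pairs = []
    for i, j in itertools.combinations(range(len(word_feats)), 2):
        logits = model.pair_logits(word_feats, i, j)
        p = torch.softmax(logits, dim=-1)[1].item()  # P(correspondence exists)
        if p > threshold:
            pairs.append((i, j, p))
    return pairs
```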
In one or more embodiments of this specification, the recognition model trained as in steps S100 to S106 of fig. 1 can take as a training task not only recognizing the words in an image and the correspondences between them, but also determining the coordinate information of the words in the image, as follows:
The first step: determine the coordinate information of each target word in the target image as the second label of the training sample.
The coordinate information of each target word in the target image may be determined from the generation process of the training sample, or the target image may be segmented to re-determine the coordinates of each target word in it.
In general a target word contains several characters, and representing the target word's coordinates on the target image by the coordinates of every one of its characters is complex and hinders training of the recognition model.
Specifically, for each target word, a first designated character may be selected from the characters it contains. The selection scheme may be preset, for example taking the first character of the target word from left to right, which this specification does not limit.
The coordinate information of the target word in the target image is then determined from the coordinate information of the first designated character. In general, the first designated character's coordinates on the target image can be used directly as the target word's coordinates. Other schemes are also possible, for example rearranging the coordinates of the first designated characters of all target words: with three target words whose first designated characters originally sit at (0.9, 1.2), (0.8, 2.5), and (3, 1.1), the coordinates can be normalized so that after rearrangement they become (1, 1), (1, 2), and (2, 1). Other existing alternatives are possible, which this specification does not limit.
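The rearrangement in the example above can be read as collapsing coordinates that are close on one axis to the same integer index. A sketch reproducing the example (the merging tolerance is an assumption):
```python
def rearrange(values, tol=0.5):
    """Map each value to a 1-based index, merging values closer than tol."""
    index, mapping, last = 0, {}, None
    for v in sorted(set(values)):
        if last is None or v - last > tol:
            index += 1       # far enough from the previous value: new index
        mapping[v] = index
        last = v
    return mapping

coords = [(0.9, 1.2), (0.8, 2.5), (3, 1.1)]
xs = rearrange([x for x, _ in coords])
ys = rearrange([y for _, y in coords])
print([(xs[x], ys[y]) for x, y in coords])  # [(1, 1), (1, 2), (2, 1)]
```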
The second step: determine the predicted coordinate information of each pending word in the target image according to the characters it contains and their coordinate information in the target image.
Specifically, similarly to determining the second label above, a second designated character may be selected from the characters contained in a pending word, and the pending word's predicted coordinate information determined from the second designated character's coordinates. The selection scheme for the second designated character is generally the same as that for the first designated character, but it may be adapted to the specific application scenario, which this specification does not limit.
The third step: train the recognition model with the training target of minimizing both the difference between the predicted correspondence and the first label of the training sample and the difference between the predicted coordinate information and the second label of the training sample.
Training the recognition model is then a multi-task learning process whose training target minimizes the difference between the predicted correspondence and the first label together with the difference between the predicted coordinate information and the second label. The trained model thereby learns to recognize from an image the words, the correspondences between the words, and the coordinate information of the words in the image.
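The multi-task training target can be sketched as one loss term per label (the particular loss functions and their equal weighting are assumptions; the specification only requires that both differences be minimized):
```python
import torch.nn.functional as F

def training_loss(pair_logits, first_label, pred_coords, second_label):
    correspondence_loss = F.cross_entropy(pair_logits, first_label)  # first label
    coordinate_loss = F.l1_loss(pred_coords, second_label)           # second label
    return correspondence_loss + coordinate_loss                     # joint target
```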
When generating the target image from target word pairs as in step S100 of fig. 1, the types of target words acquired need not be limited, which increases the number of training samples. To further improve the model's accuracy on images containing structured information of a specific field, however, an optional embodiment of this specification first determines the target type of the images the recognition model to be trained is meant to recognize, then screens the words corresponding to that target type from the target word stock as target words, and generates the training samples from them. The recognition model can then learn from the training samples the ability to recognize structured information of the target type, so that after training it extracts structured information more accurately from images containing that type of structured information.
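Screening the word stock by target type might look like this sketch (the stock layout and type names are assumptions; the word pairs reuse this specification's examples):
```python
# hypothetical target word stock keyed by the target type of image
word_stock = {
    "certificate": [("name", "Zhang San"), ("sex", "man")],
    "business_license": [("business scope", "dining")],
}
target_type = "certificate"             # type of image the model will recognize
target_words = word_stock[target_type]  # only these words generate the samples
```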
In an alternative embodiment, when generating target images from the target words as in step S100 of fig. 1, a large number of target images can be generated as training samples from a limited number of target words by the following scheme, shown in fig. 4:
S300: a background image is acquired.
Specifically, the background image may be a blank image of preset size, that is, an image containing no target words, or it may be an image of preset size that already contains target words. Different types of background image call for different ways of filling in the target words. For a background image containing no target words, the target words can be drawn directly once their positions on the background are determined; for a background image that already contains target words, the original words on the background can be replaced with the target words to be filled in under this scheme.
Different types of background image therefore correspond to different filling modes, and both can be determined in advance for the specific application scenario, which this specification does not limit.
S302: Determining the positions on the background image of the target word pairs having a correspondence and of the target words having no correspondence.
Specifically, the images to be recognized in practice may contain target words that do not appear in pairs. To adapt the recognition model better to the actual application scene, the target image is generated jointly from target word pairs having a correspondence and target words having none, so the generated target image may contain both word pairs that appear in pairs and words that appear alone.
S304: Filling each target word pair and each target word into the background image according to their respective positions on the background image to obtain a first image.
Similarly to step S300, the positions of the target word pairs and target words on the background image may be determined by different schemes according to the type of background image, which is not repeated here.
S306: Deforming the first image to a plurality of different angles to obtain a plurality of second images.
In practical applications, images uploaded by a service provider to the service platform may be photographed at a skewed angle. To make the recognition model robust to images with skewed shooting angles, and to further increase the number of training samples, the first image can be deformed to several different angles to obtain several second images. Deforming the first image to multiple angles is only one option of this embodiment; the first image may also be processed with noise, occlusion, or lighting transformations to simulate the varied conditions of images to be recognized in practice, which this specification does not specifically limit.
S308: Taking the first image and each second image as the target images.
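Steps S300 to S308 can be sketched as follows (Pillow's rotate stands in for the deformation, and the angles, layout, and font are assumptions):
```python
from PIL import Image, ImageDraw, ImageFont

def generate_target_images(word_pairs, lone_words, angles=(-10, -5, 5, 10)):
    background = Image.new("RGB", (640, 480), "white")  # S300: background image
    draw = ImageDraw.Draw(background)
    font = ImageFont.load_default()
    y = 20
    for left, right in word_pairs:                      # S302/S304: fill word pairs
        draw.text((20, y), left, fill="black", font=font)
        draw.text((220, y), right, fill="black", font=font)
        y += 30
    for word in lone_words:                             # words with no counterpart
        draw.text((20, y), word, fill="black", font=font)
        y += 30
    first_image = background
    second_images = [first_image.rotate(a, expand=True, fillcolor="white")
                     for a in angles]                   # S306: deformed copies
    return [first_image] + second_images                # S308: all target images

targets = generate_target_images([("name", "Zhang San")], ["specification/model"])
```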
Fig. 5 is a schematic diagram of an information identifying apparatus provided in the present specification, specifically including:
the generating module 400 is configured to generate a target image as a training sample according to at least a target word pair composed of target words having a correspondence, and determine a first tag of the training sample according to the correspondence between the target words in the target word pair;
a first recognition module 402, configured to recognize each character in the training sample, and determine coordinate information of the recognized each character in the target image;
the prediction correspondence determining module 404 is configured to input each recognized character and coordinate information of each character into a recognition model to be trained, so as to obtain each pending word composed of each character and a prediction correspondence between each pending word output by the recognition model;
a training module 406, configured to train the recognition model with a difference between the prediction correspondence and the first label of the training sample minimized as a training target;
the second recognition module 408 is configured to recognize, in response to a recognition request carrying an image to be recognized, a word included in the image to be recognized as a recognition result through training a completed recognition model.
Optionally, the predicted correspondence determining module 404 is specifically configured to: input each recognized character and the coordinates of each character into the recognition model to be trained, and obtain the features of each character through the first feature extraction layer of the recognition model; determine each pending word composed of the characters according to the features of the characters; for each pending word, input the features of the characters it contains into the second feature extraction layer of the recognition model to obtain the features of the pending word; and determine the predicted correspondence between the pending words according to the features of the pending words and the classification layer of the recognition model.
Optionally, the predicted correspondence determining module 404 is specifically configured to: combine the pending words two by two to determine the pending word pairs; for each pending word pair, concatenate the features of the pending words it contains to obtain the target feature of the pair; and input the target features of the pending word pairs into the classification layer to obtain the predicted correspondence, output by the classification layer, between the pending words contained in each pair.
Optionally, the second recognition module 408 is specifically configured to: input the image to be recognized into a pre-trained character recognition model to obtain each character contained in the image and its coordinate information in the image, as output by the character recognition model; and input the characters contained in the image to be recognized and their coordinate information into the trained recognition model to obtain, as the recognition result, the correspondences between the words contained in the image output by the recognition model.
Optionally, the apparatus further comprises:
a predicted coordinate information determining module 410, specifically configured to determine the coordinate information of each target word in the target image as the second label of the training sample, and to determine the predicted coordinate information of each pending word in the target image according to the characters it contains and their coordinate information in the target image;
the training module 406 is specifically configured to train the recognition model with the training target of minimizing both the difference between the predicted correspondence and the first label of the training sample and the difference between the predicted coordinate information and the second label of the training sample.
Optionally, the second recognition module 408 is specifically configured to: input the image to be recognized into a pre-trained character recognition model to obtain each character contained in the image and its coordinate information in the image, as output by the character recognition model; and input the characters contained in the image to be recognized and their coordinate information into the trained recognition model to obtain, as the recognition result, the coordinate information of each word contained in the image and the correspondences between the words output by the recognition model.
Optionally, the predicted coordinate information determining module 410 is specifically configured to: for each target word, select a first designated character from the characters it contains, and determine the coordinate information of the target word in the target image according to the coordinate information of the first designated character.
Optionally, the predicted coordinate information determining module 410 is specifically configured to: for each pending word, select a second designated character from the characters it contains, and determine the predicted coordinate information of the pending word in the target image according to the coordinate information of the second designated character in the target image.
Optionally, the apparatus further comprises:
a target word determining module 412, specifically configured to determine the target type of the images to be recognized by the recognition model to be trained, and to screen the words corresponding to the target type from the target word stock as the target words.
Optionally, the generating module 400 is specifically configured to: acquire a background image; determine the positions on the background image of the target word pairs having a correspondence and of several target words having no correspondence; fill the target word pairs and the target words into the background image according to those positions to obtain a first image; deform the first image to several different angles to obtain several second images; and take the first image and each second image as the target images.
The present specification also provides a computer-readable storage medium storing a computer program operable to execute the information identifying method shown in fig. 1 described above.
The present specification also provides a schematic structural diagram of the electronic device shown in fig. 6. At the hardware level, as illustrated in fig. 6, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile storage, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile storage into the memory and runs it to implement the information identification method shown in fig. 1. Other implementations, such as logic devices or a combination of hardware and software, are of course not excluded; that is, the execution subject of the processing flow is not limited to logic units and may also be hardware or a logic device.
In the 1990s, an improvement to a technology could be clearly distinguished as an improvement in hardware (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). With the development of technology, however, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures: designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually making integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which resembles the software compilers used in program development; the original code to be compiled must be written in a specific programming language called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be clear to those skilled in the art that a hardware circuit implementing a logic method flow can easily be obtained merely by slightly logic-programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller may also be implemented as part of the memory's control logic. Those skilled in the art also know that, besides implementing a controller purely as computer-readable program code, it is entirely possible to logic-program the method steps so that the controller implements the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the means included in it for implementing various functions can also be regarded as structures within the hardware component, or even as both software modules implementing a method and structures within a hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present specification.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory, random access memory (RAM), and/or non-volatile memory in a computer-readable medium, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape/magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present specification may be provided as a method, a system, or a computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present specification may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.
The present specification may be described in the general context of computer-executable instructions, such as program modules, executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including memory storage devices.
In this specification, the embodiments are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are described relatively simply because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding description of the method embodiments.
The foregoing is merely exemplary of the present specification and is not intended to limit it. Various modifications and alterations to this specification will be apparent to those skilled in the art. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present specification is intended to be included within the scope of its claims.

Claims (20)

1. An information identification method, the method comprising:
generating a target image as a training sample at least according to a target word pair composed of target words between which a correspondence exists, and determining a first label of the training sample according to the correspondence between the target words in the target word pair;
recognizing each character in the training sample, and determining coordinate information of each recognized character in the target image;
inputting the recognized characters and the coordinate information of the characters into a recognition model to be trained, to obtain a predicted correspondence, output by the recognition model, between pending words composed of the characters;
training the recognition model with minimization of the difference between the predicted correspondence and the first label of the training sample as a training target;
and in response to a recognition request carrying an image to be recognized, recognizing words contained in the image to be recognized as a recognition result through the trained recognition model.
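To make the training procedure of claim 1 concrete, the following is a minimal PyTorch sketch of one training step. Everything named below (the model interface, the per-pair binary correspondence label, the loss choice) is an assumption for illustration, not the patent's actual implementation.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def train_step(model: nn.Module,
               optimizer: optim.Optimizer,
               char_ids: torch.Tensor,      # (N,) characters recognized in the target image
               char_coords: torch.Tensor,   # (N, 4) coordinate box of each character
               first_label: torch.Tensor) -> float:  # (P,) float 0/1 correspondence per word pair
    """One update: predict pair correspondences and shrink the gap to the first label."""
    optimizer.zero_grad()
    pred_corr = model(char_ids, char_coords)  # (P,) logits, one per pending word pair
    # training target: minimize the difference between prediction and first label
    loss = nn.functional.binary_cross_entropy_with_logits(pred_corr, first_label)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this reading, each synthetic target image contributes one sample, and its first label encodes which of the word pairs drawn into that image actually correspond.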
2. The method of claim 1, wherein inputting the recognized characters and the coordinate information of the characters into the recognition model to be trained, to obtain the predicted correspondence, output by the recognition model, between the pending words composed of the characters, specifically comprises:
inputting the recognized characters and the coordinate information of the characters into the recognition model to be trained, and obtaining features of the characters through a first feature extraction layer of the recognition model;
determining the pending words composed of the characters according to the features of the characters;
for each pending word, inputting the features of the characters contained in the pending word into a second feature extraction layer of the recognition model, to obtain a feature of the pending word;
and determining the predicted correspondence between the pending words according to the features of the pending words and a classification layer of the recognition model.
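Read literally, claim 2 describes two stacked feature extractors followed by a classifier. The sketch below is one plausible shape for that stack; the embedding, transformer, and GRU choices are assumptions, and the step that groups characters into pending words is elided.

```python
import torch
import torch.nn as nn

class RecognitionModelSketch(nn.Module):
    """Hypothetical stack: character features -> word features -> pair classifier."""

    def __init__(self, vocab_size: int = 5000, dim: int = 128):
        super().__init__()
        self.char_embed = nn.Embedding(vocab_size, dim)
        self.coord_proj = nn.Linear(4, dim)  # character coordinates -> same feature space
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.first_stage = nn.TransformerEncoder(layer, num_layers=2)
        self.second_stage = nn.GRU(dim, dim, batch_first=True)
        self.classifier = nn.Linear(2 * dim, 1)  # concatenated word pair -> one logit

    def char_features(self, char_ids: torch.Tensor, char_coords: torch.Tensor) -> torch.Tensor:
        # first feature extraction layer: fuse identity and position per character
        x = self.char_embed(char_ids) + self.coord_proj(char_coords)
        return self.first_stage(x.unsqueeze(0)).squeeze(0)  # (N, dim)

    def word_feature(self, char_feats: torch.Tensor) -> torch.Tensor:
        # second feature extraction layer: one feature per pending word
        _, h = self.second_stage(char_feats.unsqueeze(0))
        return h[-1, 0]  # (dim,)
```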
3. The method of claim 2, wherein determining the predicted correspondence between the pending words according to the features of the pending words and the classification layer of the recognition model specifically comprises:
combining the pending words pairwise to determine pending word pairs;
for each pending word pair, concatenating the features of the pending words contained in the pending word pair to obtain a target feature of the pending word pair;
and inputting the target feature of the pending word pair into the classification layer, to obtain the predicted correspondence, output by the classification layer, between the pending words contained in the pending word pair.
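Claim 3's pairing step maps naturally onto itertools.combinations. A sketch with invented names, reusing a classifier like the one in the model above:

```python
from itertools import combinations
from typing import Dict, List, Tuple

import torch
import torch.nn as nn

def classify_pairs(word_feats: List[torch.Tensor],
                   classifier: nn.Module) -> Dict[Tuple[int, int], torch.Tensor]:
    """Combine pending words pairwise, concatenate their features, classify each pair."""
    pair_logits = {}
    for i, j in combinations(range(len(word_feats)), 2):
        target_feat = torch.cat([word_feats[i], word_feats[j]])  # target feature of the pair
        pair_logits[(i, j)] = classifier(target_feat)            # predicted correspondence
    return pair_logits
```

Pairwise enumeration is quadratic in the number of pending words, which is generally tolerable at the scale of a single document image.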
4. The method of claim 1, wherein recognizing the words contained in the image to be recognized as the recognition result through the trained recognition model specifically comprises:
inputting the image to be recognized into a pre-trained character recognition model, to obtain the characters contained in the image to be recognized and the coordinate information of each character in the image to be recognized, output by the character recognition model;
and inputting the characters contained in the image to be recognized and the coordinate information of the characters in the image to be recognized into the trained recognition model, to obtain the correspondence, output by the recognition model, between the words contained in the image to be recognized as the recognition result.
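At inference time, claim 4 chains two models: a pre-trained character recognizer (OCR) and the trained relation model. A trivial wiring sketch, with both models and their output formats as placeholders:

```python
def recognize(image, char_model, relation_model):
    """Claim-4 style inference: character recognition first, relation extraction second."""
    # step 1: characters in the image plus each character's coordinate information
    chars, coords = char_model(image)
    # step 2: correspondence between the words those characters form
    return relation_model(chars, coords)
```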
5. The method of claim 1, further comprising:
determining coordinate information of each target word in the target image as a second label of the training sample;
and determining predicted coordinate information of each pending word in the target image according to the characters contained in the pending word and the coordinate information of the characters in the target image;
wherein training the recognition model with minimization of the difference between the predicted correspondence and the first label of the training sample as the training target specifically comprises:
training the recognition model with minimization of the difference between the predicted correspondence and the first label of the training sample, and minimization of the difference between the predicted coordinate information and the second label of the training sample, as training targets.
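Claim 5 adds a second supervised quantity, word coordinates, so the training target becomes a two-term loss. A hedged sketch; the L1 choice and the alpha weighting are assumptions:

```python
import torch
import torch.nn.functional as F

def joint_loss(pred_corr: torch.Tensor,    # (P,) pair logits
               first_label: torch.Tensor,  # (P,) float 0/1 correspondence labels
               pred_coords: torch.Tensor,  # (W, 2) predicted word coordinates
               second_label: torch.Tensor, # (W, 2) labeled word coordinates
               alpha: float = 1.0) -> torch.Tensor:
    corr_term = F.binary_cross_entropy_with_logits(pred_corr, first_label)
    coord_term = F.l1_loss(pred_coords, second_label)  # coordinate regression error
    return corr_term + alpha * coord_term  # minimize both differences jointly
```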
6. The method of claim 5, wherein recognizing the words contained in the image to be recognized as the recognition result through the trained recognition model specifically comprises:
inputting the image to be recognized into a pre-trained character recognition model, to obtain the characters contained in the image to be recognized and the coordinate information of each character in the image to be recognized, output by the character recognition model;
and inputting the characters contained in the image to be recognized and the coordinate information of the characters in the image to be recognized into the trained recognition model, to obtain the coordinate information of the words contained in the image to be recognized and the correspondence between the words, output by the recognition model, as the recognition result.
7. The method of claim 5, wherein determining the coordinate information of each target word in the target image specifically comprises:
for each target word, selecting a first designated character from the characters contained in the target word;
and determining the coordinate information of the target word in the target image according to the coordinate information of the first designated character.
8. The method of claim 5, wherein determining the predicted coordinate information of each pending word in the target image according to the characters contained in the pending word and the coordinate information of the characters in the target image specifically comprises:
for each pending word, selecting a second designated character from the characters contained in the pending word;
and determining the predicted coordinate information of the pending word in the target image according to the coordinate information of the second designated character in the target image.
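Claims 7 and 8 both reduce a word's coordinates to those of a single designated character. One simple reading, taking the first character and assuming an (x0, y0, x1, y1) box format:

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x0, y0, x1, y1)

def word_coordinates(char_boxes: List[Box]) -> Tuple[float, float]:
    """Derive a word's coordinates from its designated (here: first) character."""
    x0, y0, _, _ = char_boxes[0]  # the designated character's box
    return (x0, y0)               # serves as the word's coordinate information
```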
9. The method of claim 1, further comprising, before generating the target image at least according to the target word pair composed of target words between which the correspondence exists:
determining a target type of the image to be recognized that the recognition model to be trained is to recognize;
and screening words corresponding to the target type from a target word library as the target words.
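Claim 9 amounts to a lookup step before generation: pick from the word library only the words matching the type of image the model will face. The library contents below are invented examples:

```python
from typing import Dict, List

# hypothetical target word library keyed by image type
TARGET_WORD_LIBRARY: Dict[str, List[str]] = {
    "invoice": ["Amount", "Date", "Payer"],
    "id_card": ["Name", "Number", "Address"],
}

def screen_target_words(target_type: str) -> List[str]:
    """Screen the words corresponding to the target type as target words."""
    return TARGET_WORD_LIBRARY.get(target_type, [])
```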
10. The method of claim 1, wherein generating the target image at least according to the target word pair composed of target words between which the correspondence exists specifically comprises:
acquiring a background image;
determining respective positions, on the background image, of the target word pair having the correspondence and of a plurality of target words having no correspondence;
filling the target word pair and the target words into the background image according to their respective positions on the background image, to obtain a first image;
deforming the first image to a plurality of different angles, to obtain a plurality of second images;
and taking the first image and the second images as target images.
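Claim 10's generator can be sketched with Pillow: draw the corresponding word pairs and some uncorrelated filler words onto a background at chosen positions, then warp the result to several angles. Fonts, colors, and the angle set are assumptions:

```python
from typing import Dict, List, Tuple

from PIL import Image, ImageDraw

def generate_target_images(background: Image.Image,
                           word_pairs: List[Tuple[str, str]],   # words with a correspondence
                           filler_words: List[str],             # words without one
                           positions: Dict[str, Tuple[int, int]]) -> List[Image.Image]:
    first = background.copy()
    draw = ImageDraw.Draw(first)
    for key_word, value_word in word_pairs:          # fill each target word pair
        draw.text(positions[key_word], key_word, fill="black")
        draw.text(positions[value_word], value_word, fill="black")
    for word in filler_words:                        # fill the uncorrelated target words
        draw.text(positions[word], word, fill="black")
    # deform the first image to several different angles -> the second images
    seconds = [first.rotate(angle, expand=True, fillcolor="white")
               for angle in (-10, -5, 5, 10)]
    return [first] + seconds                         # all serve as target images
```

The first label for such a sample would mark exactly the pairs drawn from word_pairs as corresponding.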
11. An information identification apparatus, comprising:
a generating module, configured to generate a target image as a training sample at least according to a target word pair composed of target words between which a correspondence exists, and determine a first label of the training sample according to the correspondence between the target words in the target word pair;
a first recognition module, configured to recognize each character in the training sample and determine coordinate information of each recognized character in the target image;
a predicted correspondence determining module, configured to input the recognized characters and the coordinate information of the characters into a recognition model to be trained, to obtain pending words composed of the characters and a predicted correspondence, output by the recognition model, between the pending words;
a training module, configured to train the recognition model with minimization of the difference between the predicted correspondence and the first label of the training sample as a training target;
and a second recognition module, configured to recognize, in response to a recognition request carrying an image to be recognized, words contained in the image to be recognized as a recognition result through the trained recognition model.
12. The apparatus of claim 11, wherein the predicted correspondence determining module is specifically configured to: input the recognized characters and the coordinate information of the characters into the recognition model to be trained, and obtain features of the characters through a first feature extraction layer of the recognition model; determine the pending words composed of the characters according to the features of the characters; for each pending word, input the features of the characters contained in the pending word into a second feature extraction layer of the recognition model to obtain a feature of the pending word; and determine the predicted correspondence between the pending words according to the features of the pending words and a classification layer of the recognition model.
13. The apparatus of claim 12, wherein the predicted correspondence determining module is specifically configured to: combine the pending words pairwise to determine pending word pairs; for each pending word pair, concatenate the features of the pending words contained in the pending word pair to obtain a target feature of the pending word pair; and input the target feature of the pending word pair into the classification layer to obtain the predicted correspondence, output by the classification layer, between the pending words contained in the pending word pair.
14. The apparatus of claim 11, wherein the second recognition module is specifically configured to: input the image to be recognized into a pre-trained character recognition model to obtain the characters contained in the image to be recognized and the coordinate information of each character in the image to be recognized, output by the character recognition model; and input the characters contained in the image to be recognized and the coordinate information of the characters in the image to be recognized into the trained recognition model to obtain the correspondence, output by the recognition model, between the words contained in the image to be recognized as a recognition result.
15. The apparatus of claim 11, further comprising:
a predicted coordinate information determining module, configured to determine coordinate information of each target word in the target image as a second label of the training sample, and determine predicted coordinate information of each pending word in the target image according to the characters contained in the pending word and the coordinate information of the characters in the target image;
wherein the training module is specifically configured to train the recognition model with minimization of the difference between the predicted correspondence and the first label of the training sample, and minimization of the difference between the predicted coordinate information and the second label of the training sample, as training targets.
16. The apparatus of claim 15, wherein the second recognition module is specifically configured to: input the image to be recognized into a pre-trained character recognition model to obtain the characters contained in the image to be recognized and the coordinate information of each character in the image to be recognized, output by the character recognition model; and input the characters contained in the image to be recognized and the coordinate information of the characters in the image to be recognized into the trained recognition model to obtain the coordinate information of the words contained in the image to be recognized and the correspondence between the words, output by the recognition model, as a recognition result.
17. The apparatus of claim 15, wherein the predicted coordinate information determining module is specifically configured to: for each target word, select a first designated character from the characters contained in the target word; and determine the coordinate information of the target word in the target image according to the coordinate information of the first designated character.
18. The apparatus of claim 11, wherein the generating module is specifically configured to: acquire a background image; determine respective positions, on the background image, of the target word pair having the correspondence and of a plurality of target words having no correspondence; fill the target word pair and the target words into the background image according to their respective positions on the background image to obtain a first image; deform the first image to a plurality of different angles to obtain a plurality of second images; and take the first image and the second images as target images.
19. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of claims 1 to 10.
20. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the method of any one of claims 1 to 10.
CN202211659334.0A 2022-12-22 2022-12-22 Information identification method, device, equipment and readable storage medium Pending CN116246276A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211659334.0A CN116246276A (en) 2022-12-22 2022-12-22 Information identification method, device, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211659334.0A CN116246276A (en) 2022-12-22 2022-12-22 Information identification method, device, equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN116246276A true CN116246276A (en) 2023-06-09

Family

ID=86626797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211659334.0A Pending CN116246276A (en) 2022-12-22 2022-12-22 Information identification method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN116246276A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117556266A (en) * 2024-01-11 2024-02-13 之江实验室 Model training method, signal identification method and device
CN117556266B (en) * 2024-01-11 2024-03-22 之江实验室 Signal identification model training method, signal identification method and device

Similar Documents

Publication Publication Date Title
CN113887608B (en) Model training method, image detection method and device
CN113516480B (en) Payment risk identification method, device and equipment
CN114332873A (en) Training method and device for recognition model
CN116152933A (en) Training method, device, equipment and storage medium of anomaly detection model
CN116246276A (en) Information identification method, device, equipment and readable storage medium
CN116824339A (en) Image processing method and device
CN111368902A (en) Data labeling method and device
CN117369783B (en) Training method and device for security code generation model
CN110046621A (en) Certificate recognition methods and device
CN111507250B (en) Image recognition method, device and storage medium
CN117113174A (en) Model training method and device, storage medium and electronic equipment
CN116863484A (en) Character recognition method, device, storage medium and electronic equipment
CN115545353A (en) Method and device for business wind control, storage medium and electronic equipment
CN114926437A (en) Image quality evaluation method and device
CN112560530B (en) Two-dimensional code processing method, device, medium and electronic device
CN114359935A (en) Model training and form recognition method and device
CN115017905A (en) Model training and information recommendation method and device
CN111242195B (en) Model, insurance wind control model training method and device and electronic equipment
CN111899264A (en) Target image segmentation method, device and medium
CN115730233B (en) Data processing method and device, readable storage medium and electronic equipment
CN116340852B (en) Model training and business wind control method and device
CN112115952B (en) Image classification method, device and medium based on full convolution neural network
CN117523323B (en) Detection method and device for generated image
CN118154947A (en) Risk image interception method and device, storage medium and electronic equipment
CN116844183A (en) Model training method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination