CN112101367A - Text recognition method, image recognition and classification method and document recognition processing method - Google Patents

Text recognition method, image recognition and classification method and document recognition processing method

Info

Publication number
CN112101367A
Authority
CN
China
Prior art keywords: text, image, recognition, type, document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010968750.3A
Other languages
Chinese (zh)
Inventor
徐青松
李青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Glority Software Ltd
Original Assignee
Hangzhou Glority Software Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Glority Software Ltd filed Critical Hangzhou Glority Software Ltd
Priority to CN202010968750.3A priority Critical patent/CN112101367A/en
Publication of CN112101367A publication Critical patent/CN112101367A/en
Priority to PCT/CN2021/117222 priority patent/WO2022057707A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Character Discrimination (AREA)

Abstract

The invention provides a text recognition method, an image recognition and classification method, and a document recognition processing method. During text recognition, the text lines of the text to be recognized in a text image are first labeled with text line boxes; a character recognition model then recognizes each text line to obtain a preliminary recognition result of the text to be recognized; the language types involved in the preliminary recognition result are identified next, and for each identified language type the corresponding language recognition model is called to further recognize the character part of that language type, yielding an optimized character recognition result. In this way, after the preliminary recognition result of the text to be recognized is obtained, a dedicated language recognition model performs refined recognition according to each language type involved in the preliminary result, which improves the accuracy of text recognition.

Description

Text recognition method, image recognition and classification method and document recognition processing method
Technical Field
The present invention relates to the field of machine learning technologies, and in particular, to a text recognition method, an image recognition and classification method, a document recognition processing method, an electronic device, and a computer-readable storage medium.
Background
OCR (Optical Character Recognition) refers to the process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting dark and light patterns, and then translates those shapes into computer text by a character recognition method. For printed characters, the characters in a paper document are optically converted into an image file of a black-and-white dot matrix, and recognition software then converts the characters in the image into a text format for further editing and processing by word processing software.
During OCR, the characters in a document can be recognized with a recognition model. However, documents in different languages cannot be recognized with the same model: one must know which language the document is in to call the corresponding recognition model, and a document that mixes languages is even harder to recognize. Existing OCR technology therefore suffers from low text recognition accuracy for documents in different languages.
In addition, recognized documents cannot be effectively classified, which makes them messy to manage and inconvenient to find. Moreover, because the document to be recognized may suffer from problems such as page curvature, the layout after recognition may be inconsistent with the original document, and garbled characters may even appear.
Disclosure of Invention
The invention aims to provide a text recognition method, an image recognition and classification method, a document recognition processing method, an electronic device and a computer-readable storage medium. The specific technical scheme is as follows:
in order to achieve the above object, the present invention provides a text recognition method, including:
recognizing text lines in a text to be recognized in a text image, and labeling each text line with a general text line frame;
identifying characters in each text line by adopting a character identification model to obtain a preliminary identification result of the text to be identified;
performing language recognition on the preliminary recognition result by adopting a language classification model, acquiring a language type related in the preliminary recognition result, and dividing the preliminary recognition result into a plurality of different character parts according to the language type;
and calling a corresponding language recognition model according to the language type, and recognizing the corresponding character part to obtain a target recognition result of the text to be recognized.
Optionally, in the text recognition method, the method further includes: identifying the direction of a text to be identified in a text image, and if the direction does not accord with a preset condition, correcting the direction of the text to be identified;
wherein, the direction of the text to be recognized in the text image is recognized, including:
and identifying the direction of the text to be identified in the text image by adopting a direction identification model, wherein the direction identification model is a neural network model based on CNN.
Optionally, in the text recognition method, the character recognition model is a neural network model based on the CTC (Connectionist Temporal Classification) technique and an Attention mechanism.
Optionally, in the text recognition method, the character recognition model is trained with a training sample set including the CJK character set and the ISO 8859-1 to ISO 8859-16 character sets.
Optionally, in the text recognition method, the language classification model is a fastText N-gram language classification model trained on a Wikipedia dataset.
Based on the same inventive concept, the invention also provides an image identification and classification method, which comprises the following steps:
identifying the image to be classified by adopting an image identification model, and identifying a text image or a non-text image;
recognizing the text in the text type image or the non-text type image by adopting the text recognition method to obtain a text recognition result of the text type image or the non-text type image;
determining a keyword according to the text recognition result, determining a first subdivision type of the content of the text type image or a second subdivision type of the content of the non-text type image according to the keyword, classifying the text type image into a folder corresponding to the first subdivision type, and classifying the non-text type image into a folder corresponding to the second subdivision type.
Optionally, in the image recognition and classification method, after determining the keyword, the method further includes:
and automatically naming the text type image or the non-text type image by utilizing the keywords.
Optionally, in the image identification and classification method, after identifying the text-type image or the non-text-type image, the method further includes:
classifying the text type images into a text type image folder, and classifying the non-text type images into a non-text type image folder;
correspondingly, the classifying the text-class images into the folder corresponding to the first subdivision type and the classifying the non-text-class images into the folder corresponding to the second subdivision type includes:
classifying the text type images in the text type image folder into a folder corresponding to the first subdivision type, and classifying the non-text type images in the non-text type image folder into a folder corresponding to the second subdivision type.
Optionally, in the above image recognition and classification method, the first subdivision type includes one or more of: notes, identification documents, receipts, screenshots, documents, and certificates.
Optionally, in the image recognition and classification method, for the recognized non-text type image, the image recognition model recognizes content in the non-text type image;
the method further comprises the following steps:
and determining the second subdivision type according to the content of the non-text type image, and classifying the non-text type image into a folder corresponding to the second subdivision type.
Optionally, in the image recognition and classification method, after recognizing the content in the non-text image, the method further includes:
and automatically naming the non-text images according to the contents in the non-text images.
Optionally, in the image recognition and classification method, after classifying the text class image in the text class image folder into a folder corresponding to the first subdivision type, the method further includes:
responding to the operation of inputting a search word by a user, searching whether a keyword matched with the search word exists, and if so, outputting a text image corresponding to the matched keyword.
Optionally, in the image recognition and classification method, after classifying the text class image in the text class image folder into a folder corresponding to the first subdivision type, the method further includes:
and in response to the printing operation of the user, importing all text type images in the folder corresponding to the first subdivision type for printing according to a one-key import function configured in advance.
Optionally, in the image recognition and classification method, before performing printing, the method further includes:
if the text images needing to be signed exist in all the imported text images, signing in a signature area preset in the text images needing to be signed;
and/or if the text images with the defects exist in all the imported text images, performing filter processing on the text images with the defects.
Based on the same inventive concept, the invention also provides a document identification processing method, which comprises the following steps:
acquiring an input image, wherein the input image comprises an original document to be identified;
recognizing the original document in the input image by adopting the text recognition method to obtain a character recognition result of the original document;
and arranging the character recognition results of the original document according to the position information of each character of the original document in the input image to obtain a recognition document.
Optionally, in the document identification processing method, arranging the character identification result of the original document according to the position information of each character of the original document in the input image to obtain an identification document, where the method includes:
and replacing the original text in the original document with the character recognition result of the original document according to the position information of each character of the original document in the input image to obtain a recognition document.
Optionally, in the document identification processing method, after obtaining the identification document, the method further includes:
comparing the original document with the identification document, judging whether the identification document and the original document have a difference point, and if so, correcting the difference point in the identification document.
Optionally, in the document identification processing method, before identifying the input image, the method further includes:
and identifying the curve radian of the original document in the input image by adopting a correction model, and if the curve radian meets a preset correction condition, correcting the original document in the input image to remove the curve radian of the original document.
Optionally, in the document identification processing method, after obtaining the identification document, the method further includes:
identifying the input image by adopting an annotation identification model so as to identify the annotation content in the original document;
and typesetting the character recognition result corresponding to the marked content into a format consistent with the original document in the recognition document.
The invention also provides electronic equipment which comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory finish mutual communication through the communication bus;
the memory is used for storing a computer program;
the processor is configured to implement the steps of the text recognition method as described above, or implement the steps of the image recognition classification method as described above, or implement the steps of the document recognition processing method as described above, when executing the program stored in the memory.
The present invention also provides a computer readable storage medium having stored thereon instructions which, when executed, implement the steps in a text recognition method as described above, or implement the steps in an image recognition classification method as described above, or implement the steps in a document recognition processing method as described above.
Compared with the prior art, the text recognition method, the image recognition and classification method, the document recognition processing method, the electronic equipment and the computer readable storage medium provided by the invention have the following advantages:
when text recognition is performed, the text lines in the text to be recognized are first labeled with text line boxes; a character recognition model then recognizes each text line to obtain a preliminary recognition result of the text to be recognized; the language types of the preliminary recognition result are then identified, and for each identified language type the corresponding language recognition model is called to further recognize the character part of that language type, yielding an optimized character recognition result. In this way, after the preliminary recognition result of the text to be recognized is obtained, a dedicated language recognition model performs refined recognition according to each language type involved in the preliminary result, which improves the accuracy of text recognition.
According to the image identification and classification method, the OCR text identification method can be adopted for text identification on both the text image and the non-text image to obtain the text identification results of the text image and the non-text image, and the keywords are determined according to the text identification results to classify the text image and the non-text image. In addition, the non-text images can be classified by adopting the image content, and the accuracy of the classification result is also improved.
With the document recognition processing method, the corresponding electronic device, and the computer-readable storage medium provided by the invention, the document to be recognized in the input image is recognized by an OCR text recognition method to obtain a recognition document, so that a non-editable document is converted into an editable one; this makes it convenient to subsequently find the document by keyword search within its content and enables rapid document retrieval. In addition, by correcting the curvature of the input image and by recognizing and adjusting the annotated and cited fonts in the document, errors in converting the document to be recognized in the input image into editable electronic text are reduced and the conversion accuracy is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a text recognition method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating an image recognition and classification method according to an embodiment of the present invention;
FIG. 3 is an exemplary diagram of an image recognition classification presentation;
FIG. 4 is a flowchart illustrating a document identification processing method according to an embodiment of the invention;
FIG. 5a is a diagram of an example of an input image containing an original document;
FIG. 5b is an exemplary diagram of an identified document resulting from the identification of the input image shown in FIG. 5a using the method of the present invention;
FIG. 6a is another exemplary diagram of an input image containing an original document;
FIG. 6b is an exemplary diagram of a recognized document obtained by recognizing the input image shown in FIG. 6a using a conventional method;
FIG. 6c is an exemplary diagram of an identified document resulting from the identification of the input image shown in FIG. 6a using the method of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The text recognition method, image recognition and classification method, document recognition processing method, electronic device, and computer-readable storage medium of the present invention are described in detail below with reference to the accompanying drawings and specific embodiments. The advantages and features of the present invention will become more apparent from the following description. It should be noted that the drawings are in a greatly simplified form and not to precise scale, and are used only to conveniently and clearly explain the embodiments of the present invention. The structures, ratios, and sizes shown in the drawings and described in the specification are intended only to match the disclosure of the specification so that it can be understood and read by those skilled in the art; they do not limit the conditions under which the invention can be implemented, and any structural modification, change of proportion, or adjustment of size that does not affect the efficacy or achievable purpose of the invention still falls within the scope of the present invention.
In order to solve the problems in the prior art, the invention provides a text recognition method. Fig. 1 shows a flowchart of a text recognition method according to an exemplary embodiment of the present invention, which may be implemented in an application (app) installed on a smart terminal such as a mobile phone, a tablet computer, or the like. As shown in fig. 1, the method may include:
step S101, identifying text lines in a text to be identified in a text image, and labeling each text line with a universal text line frame.
In the present invention, a text image refers to an image whose content is mainly text, for example a business card image, a receipt image, a certificate image, or a note image; such images may be obtained by photographing or by scanning text. For example, a note image may be an image obtained by photographing handwritten characters on paper.
Generally speaking, the text to be recognized in a text image comprises one or more text lines. The invention performs text recognition with a character-level OCR method: during recognition, each text line is recognized separately, and the recognition result of the whole text to be recognized is finally obtained by combining the recognition results of all the text lines. Therefore, each text line in the text to be recognized must be identified in the text image and labeled with a general text line box.
It should be noted that, when recognizing text lines, the language in the text lines is not limited, but only the text lines are processed, that is, when characters in one text line have multiple language types, the characters are marked in the same general text line box as long as the characters are located in the same text line.
It should be noted that one picture may contain a plurality of documents, for example the front and back sides of an identity card in one text image, and the two documents need to be recognized separately. Therefore, before step S101 is executed, the document regions in the text image (i.e., the regions where the texts to be recognized are located) may also be identified and sliced apart; for example, a document region may be sliced out through a label box, or the edge of the document region may be found by an edge detection method and the region sliced along that edge.
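As an illustration of the slicing step described above, the following minimal sketch (in Python, using OpenCV) crops candidate document regions out of a text image by contour detection. It is only one possible realization under assumed parameters (e.g., the min_area threshold); the patent leaves the concrete slicing technique open.

    # Hypothetical sketch: splitting a text image into separate document regions
    # before text-line labeling, using simple contour detection as one possible approach.
    import cv2

    def slice_document_regions(image_path, min_area=10000):
        image = cv2.imread(image_path)
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        # Binarize so that document areas stand out from the background.
        _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        regions = []
        for contour in contours:
            x, y, w, h = cv2.boundingRect(contour)
            if w * h >= min_area:          # ignore small noise blobs
                regions.append(image[y:y + h, x:x + w])
        return regions                      # each region is then recognized separately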
Preferably, before performing step S101, the method further includes: and identifying the direction of the text to be identified in the text image, and if the direction does not accord with the preset condition, correcting the direction of the text to be identified.
It can be understood that, before the text to be recognized in the text image is recognized, it is required to ensure that the direction of the text to be recognized in the text image satisfies a preset condition, for example, ensure that characters in a text line of the text to be recognized are arranged along a certain reference direction in the text image. Therefore, the direction of the text to be recognized in the text image needs to be corrected first. Specifically, a direction recognition model may be used to recognize the direction of the text to be recognized in the text image, and the direction recognition model may be a CNN-based neural network model.
The reference direction may be set to the positive horizontal direction. The direction recognition model can recognize the angle between the arrangement direction of the characters in a text line and the positive horizontal direction of the text image; if the angle is 0, no correction is needed, and if it is not 0, the text image needs to be corrected. Specifically, the correction rotates the text image so that the angle between the characters in the text lines of the text to be recognized and the positive horizontal direction of the text image becomes 0. In this embodiment, the direction pointing to the right along the horizontal axis is taken as the positive horizontal direction; in other embodiments other directions may be set as the positive direction, which is not limited by the present invention.
The correction may also use the average slope of a plurality of text lines as a correction reference, or use other correction methods, which is not limited by the present invention.
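The following sketch illustrates how the direction correction described above might be wired together. Here orientation_model stands in for the CNN-based direction recognition model and is assumed to return one of four angles; the rotation mapping is an illustrative assumption, not the patent's implementation.

    # Hedged sketch of the orientation-correction step. "orientation_model" is assumed
    # to return the angle (0, 90, 180 or 270 degrees) between the text lines and the
    # horizontal reference direction.
    import cv2

    ROTATIONS = {
        90: cv2.ROTATE_90_COUNTERCLOCKWISE,
        180: cv2.ROTATE_180,
        270: cv2.ROTATE_90_CLOCKWISE,
    }

    def correct_orientation(image, orientation_model):
        angle = orientation_model(image)    # predicted deviation from the reference direction
        if angle == 0:
            return image                    # already meets the preset condition
        return cv2.rotate(image, ROTATIONS[angle])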
And S102, recognizing the characters in each text line by adopting a character recognition model to obtain a preliminary recognition result of the text to be recognized.
In this embodiment, the character recognition model is an all-in-one model obtained by training on a plurality of character sets, such as the CJK character set and the ISO 8859-1 to ISO 8859-16 character sets, so the character recognition model supports recognition of CJK and Latin-based scripts. The character recognition model is a neural network model based on the CTC (Connectionist Temporal Classification) technique and the Attention mechanism. Each text line is input into the character recognition model, which outputs a character recognition result for that line; the character recognition results of all text lines are then combined to obtain the character recognition result of the text to be recognized, which serves as the preliminary recognition result.
Connectionist Temporal Classification (CTC) is a sequence classification algorithm that does not require strict alignment between input units and label units, and it is currently widely used in Optical Character Recognition (OCR) and speech recognition. The main function of the CTC component is to construct a loss function over the sequence and, during back propagation, to pass the gradient determined from this loss function back to the preceding layers to complete training of the model.
The Attention mechanism greatly improves sequence learning tasks. In an encoder-decoder framework, an attention model can be added on the encoder side to apply a weighted transformation to the source data sequence, or introduced on the decoder side to weight the target data, which effectively improves the performance of sequence-to-sequence systems in their natural form.
The invention constructs the character recognition model with the CTC (Connectionist Temporal Classification) technique and the Attention mechanism, which improves the accuracy of character recognition.
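As a concrete illustration of a character recognition model of this kind, the sketch below defines a CRNN-style text-line recognizer trained with CTC loss in PyTorch. The layer sizes, the vocabulary size, and the omission of the Attention branch are assumptions made purely for illustration; this is not the patent's actual network.

    # Illustrative sketch: a CRNN-style text-line recognizer with CTC loss.
    import torch
    import torch.nn as nn

    class CRNN(nn.Module):
        def __init__(self, num_classes=8000, img_height=32):
            super().__init__()
            self.cnn = nn.Sequential(               # visual feature extractor
                nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            )
            feat_height = img_height // 4
            self.rnn = nn.LSTM(128 * feat_height, 256, bidirectional=True, batch_first=True)
            self.fc = nn.Linear(512, num_classes + 1)   # +1 for the CTC blank symbol

        def forward(self, images):                  # images: (batch, 1, H, W)
            feats = self.cnn(images)                # (batch, C, H/4, W/4)
            b, c, h, w = feats.shape
            feats = feats.permute(0, 3, 1, 2).reshape(b, w, c * h)  # one time step per column
            out, _ = self.rnn(feats)
            return self.fc(out).log_softmax(dim=-1)  # (batch, time, classes), for CTCLoss

    # Training would pair this with torch.nn.CTCLoss, e.g.:
    # loss = nn.CTCLoss(blank=8000)(logits.permute(1, 0, 2), targets, input_lens, target_lens)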
Step S103, performing language identification on the preliminary identification result by adopting a language classification model, acquiring the language type related in the preliminary identification result, and dividing the preliminary identification result into a plurality of different character parts according to the language type.
Since the character recognition model used in step S102 is trained on character sets of several different languages, its accuracy on the characters within a text line is limited. The preliminary recognition result therefore needs to be optimized by further recognizing the characters of each language separately, so as to improve the accuracy of character recognition.
First, a language classification model performs language recognition on the preliminary recognition result to obtain the language types involved in it. Language type recognition may be performed with the langid technique, and the language classification model is a fastText N-gram language classification model trained on a Wikipedia dataset.
fastText is a word-vector and text-classification tool whose typical application scenario is supervised text classification. It provides a simple and efficient method for text classification and representation learning, with accuracy comparable to deep learning approaches at much higher speed.
The N-gram is a language model commonly used in large-vocabulary continuous speech recognition; for Chinese it may be called the Chinese Language Model (CLM). It uses the collocation information between adjacent words in the context to convert input automatically into Chinese characters. Specifically, when continuous unsegmented pinyin, strokes, or digits representing letters or strokes must be converted into a Chinese character string (i.e., a sentence), the sentence with the maximum probability can be computed from the collocation information between adjacent words, so that the conversion to Chinese characters is automatic, manual selection by the user is not required, and the problem of many Chinese characters sharing the same pinyin (or stroke string or digit string) is avoided.
Using the language classification model to perform language recognition on the preliminary recognition result allows the language types involved in it to be obtained more accurately. After the language types are recognized, the preliminary recognition result can be divided into a plurality of different character parts, i.e., the characters of each language type are grouped into the same character part.
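A minimal sketch of this language-splitting step is given below. It uses the off-the-shelf langid package as a stand-in for the fastText N-gram language classification model, and the word-level granularity (splitting on whitespace) is an illustrative assumption that would not hold for unsegmented scripts such as Chinese.

    # Minimal sketch: dividing the preliminary recognition result into per-language character parts.
    import langid
    from collections import defaultdict

    def split_by_language(preliminary_text):
        parts = defaultdict(list)
        for token in preliminary_text.split():
            lang, _score = langid.classify(token)   # e.g. 'zh', 'en', 'ja'
            parts[lang].append(token)
        return {lang: " ".join(tokens) for lang, tokens in parts.items()}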
And step S104, calling a corresponding language recognition model according to the language type, and recognizing a corresponding character part to obtain a target recognition result of the text to be recognized.
In this embodiment, each language type has a corresponding language identification model, and after the language type related to the text to be identified and the character part corresponding to each language type are obtained in step S103, the corresponding language identification model is called to identify the corresponding character part, so that a more accurate character identification result of each character part can be obtained, and further, a target identification result of the text to be identified is obtained.
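The dispatch logic of step S104 can be sketched as follows; the model registry and its recognize interface are illustrative assumptions rather than part of the patent.

    # Hedged sketch of step S104: each language type maps to its own recognition model,
    # and the matching model re-recognizes the corresponding character part.
    def refine_by_language(character_parts, language_models):
        """character_parts: {lang_code: text}, language_models: {lang_code: model}."""
        target_result = {}
        for lang, part in character_parts.items():
            model = language_models.get(lang)
            # Fall back to the preliminary result if no dedicated model exists.
            target_result[lang] = model.recognize(part) if model else part
        return target_result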
In summary, in the text recognition method provided by the present invention, when performing text recognition, a text line in a text to be recognized is labeled by a general text line frame, then a character recognition model is used to recognize each text line, so as to obtain a preliminary recognition result of the text to be recognized, then a language type is recognized for the preliminary recognition result, and a corresponding language recognition model is called according to the recognized language type to further recognize a character part corresponding to the language type, so as to obtain an optimized character recognition result. According to the embodiment, after the initial recognition result of the text to be recognized is obtained, the independent language recognition model is adopted for accurate recognition according to the language type involved in the initial recognition result, so that the accuracy of text recognition is improved.
On the basis of the text recognition method, the invention also provides an image recognition and classification method which is used for classifying and sorting a large number of images and classifying the images with similar contents in the same folder so as to facilitate the user to look up and search.
As shown in fig. 2, the image recognition and classification method includes the following steps:
step S201, an image recognition model is adopted to recognize the image to be classified, and a text image or a non-text image is recognized.
In this embodiment, the image to be classified may be a newly shot image or an image that has already been shot and saved in a folder, for example an image saved in a mobile phone album. A text-type image refers to an image whose content is mainly text, for example a business card image, a receipt image, a certificate image, or a note image; such images may be obtained by photographing or by scanning text. For example, a note image may be an image obtained by photographing handwritten characters on paper. A non-text-type image refers to an image whose content is not mainly characters, such as a daily-life photo, a landscape photo, or an animal or plant photo.
The image to be classified is identified through the image identification model, and whether the image to be classified belongs to a text image or a non-text image can be identified, so that the text image and the non-text image can be classified.
After the text images and the non-text images are identified and classified, the images are automatically classified and stored into different preset folders. That is, after a text-class image is recognized, the text-class image is classified into a text-class image folder, and after a non-text-class image is recognized, the non-text-class image is classified into a non-text-class folder.
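A minimal sketch of this coarse text/non-text split is shown below, assuming an image recognition model that exposes a simple predict interface; the folder names merely follow the description above.

    # Sketch of the coarse text / non-text split and automatic filing into preset folders.
    import shutil
    from pathlib import Path

    def coarse_classify(image_path, image_recognition_model):
        label = image_recognition_model.predict(image_path)   # assumed: "text" or "non-text"
        folder = Path("text-type images" if label == "text" else "non-text-type images")
        folder.mkdir(exist_ok=True)
        shutil.move(image_path, folder / Path(image_path).name)
        return label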
Step S202, recognizing the text in the text type image or the non-text type image to obtain the text recognition result of the text type image or the non-text type image.
Specifically, the text recognition method shown in fig. 1 may be used to recognize the text in the text-type image or the non-text-type image. The specific identification process is not described herein. And classifying different pictures according to the language type of the text recognition result.
Step S203, determining a keyword according to the text recognition result, determining a first subdivision type of the content of the text type image or a second subdivision type of the non-text type image according to the keyword, classifying the text type image into a folder corresponding to the first subdivision type, and classifying the non-text type image into a folder corresponding to the second subdivision type.
Specifically, a keyword classification model may be used to obtain keywords from the text recognition result, and then a first subdivision type of the content of the text-based image or a second subdivision type of the content of the non-text-based image is determined according to the keywords, so that the text-based image is classified into a folder corresponding to the first subdivision type, and the non-text-based image is classified into a folder corresponding to the second subdivision type.
The first subdivision type includes one or more of: notes, identification documents, receipts, screenshots, documents, and certificates, but is not limited thereto.
For example, if the text-type image is an identity card image and the text recognition result includes characters such as "Resident Identity Card of the People's Republic of China", the keyword classification model may obtain the keyword "identity card" from the text recognition result, so the first subdivision type of the content of the text-type image can be determined as "document image" according to the keyword, and the text-type image can be classified into the folder of the subdivision type "document image".
The subdivision type "document image" may be divided further into a plurality of specific types, for example identity cards, driver's licenses, passports, military officer certificates, employee ID cards, birth certificates, household registers, and the like. The specific type of a text-type image can thus be determined from the keywords, and the text-type image can be further classified into the subfolder of that specific type under the folder corresponding to the first subdivision type. For example, for the text-type image in the preceding example, the keyword classification model obtains the keyword "identity card" from the text recognition result, so the image can be further classified into the specific-type subfolder "identity card" under the subdivision-type folder "document image". It will be understood that a plurality of specific-type subfolders, such as identity cards, driver's licenses, passports, military officer certificates, employee ID cards, birth certificates, household registers, and the like, may be arranged under the subdivision-type folder "document image".
By the method, the classified text images can be set into the file tree, and all folders are named layer by layer, so that all the text images to be classified can be automatically classified into the corresponding folders. In addition, in order to facilitate the search of the text type images, the text type images can be automatically named by using the keywords.
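The keyword-driven file tree described above might look like the following sketch. The keyword table, folder names, and renaming scheme are invented examples for illustration only.

    # Illustrative sketch of keyword-based classification into a folder tree,
    # with automatic naming of the image using the keyword.
    import shutil
    from pathlib import Path

    KEYWORD_RULES = {                     # keyword -> (subdivision type, specific type)
        "identity card": ("document image", "identity card"),
        "passport": ("document image", "passport"),
        "invoice": ("receipt", None),
    }

    def classify_text_image(image_path, keywords, root="All Documents"):
        for keyword in keywords:
            if keyword in KEYWORD_RULES:
                subdivision, specific = KEYWORD_RULES[keyword]
                folder = Path(root) / subdivision
                if specific:
                    folder = folder / specific
                folder.mkdir(parents=True, exist_ok=True)
                # Automatic naming with the keyword, keeping the original suffix.
                target = folder / f"{keyword}{Path(image_path).suffix}"
                shutil.move(image_path, target)
                return target
        return None                       # no matching keyword: leave unclassified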
For example, the classification may be performed as shown in fig. 3, classifying all images in the album: All Documents is displayed first, and then hand write notes, ID Card & pass, script, Screens, Certificate, Other images, and so on are displayed in sequence. Of course this is only an example, and in practical applications the classification can be done in other ways as well. The classified images can be sorted by modification time or by shooting time, or the sorting mode can be set as required.
In practical application, a user can search the classified text images according to the keywords so as to find a target file quickly. Specifically, in response to an operation of inputting a search word by a user, whether a keyword matched with the search word exists is searched for, and if yes, a text image corresponding to the keyword is output. For example, when the search word input by the user is "identity", searching whether a keyword matched with the search word exists, and if the matched keyword "identity card" exists, outputting and displaying a text type image corresponding to the keyword "identity card" to the user.
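The keyword search can be sketched as follows, assuming (for illustration) that the classification step maintained a keyword index mapping keywords to image paths.

    # Small sketch of the keyword search: partial matches of the search word against
    # stored keywords return the corresponding text-type images.
    def search_images(search_word, keyword_index):
        """keyword_index: {keyword: [image paths]} built during classification."""
        matches = []
        for keyword, paths in keyword_index.items():
            if search_word in keyword:        # e.g. "identity" matches "identity card"
                matches.extend(paths)
        return matches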
The second subdivision type may include: people's life pictures, landscape pictures, animal pictures, plant pictures, etc.
For example, if a non-text-type image is a photo of Leifeng Pagoda and the image contains the three characters "Leifeng Pagoda", the text recognition result of the non-text-type image is "Leifeng Pagoda" and the keyword can be determined to be "Leifeng Pagoda". The second subdivision type of the content of the non-text-type image can therefore be determined as "landscape photo" according to the keyword, and the non-text-type image can be classified into the folder of the subdivision type "landscape photo".
In addition, the subdivision type "landscape photo" can be divided further, for example according to scenic spot names. The specific type of a non-text-type image can thus be determined according to the recognized scenic spot name, and the non-text-type image can be further classified into the subfolder of that specific type under the folder corresponding to the second subdivision type. For example, for the non-text-type image in the preceding example, since it is recognized as a photo of Leifeng Pagoda, it can be further classified into the specific-type subfolder "Leifeng Pagoda" under the subdivision-type folder "landscape photo". It will be understood that specific-type subfolders corresponding to different scenic spots can be arranged under the subdivision-type folder "landscape photo".
In other embodiments, for a recognized non-text-type image, when the image recognition model is used for recognition in step S201 it may also identify the content of the non-text-type image, so the second subdivision type may also be determined from the content of the non-text-type image and the image classified into the folder corresponding to the second subdivision type. For example, if the image recognition model recognizes that the content shown in the non-text-type image is Leifeng Pagoda, the second subdivision type of the content of the non-text-type image may be determined as "landscape", and the non-text-type image may be classified into the folder of the subdivision type "landscape".
By the method, the classified non-text images can be set into the file tree, and all folders are named layer by layer, so that all the non-text images to be classified can be automatically classified into the corresponding folders.
The non-text type images can be automatically named according to the content of the non-text type images. For example, the content of the non-text image may include the name of the identified animal and plant, the name of the scenic spot, etc., so the non-text image may be automatically named according to the name of the identified animal and plant, the name of the scenic spot, etc. Or automatically naming the non-text images according to the keywords acquired by the keyword classification model. By automatic naming, the search for the non-text type image can be facilitated.
Non-text-type images may also be classified according to the shooting time and place of the image, the relationships between the persons in it, their names, and the like.
Preferably, the classified text-type images and non-text-type images may be encrypted to ensure file security; for example, important files of the certificate type, or private personal-life photos, may be encrypted. During encryption a single file may be encrypted, or the corresponding folder may be encrypted.
Preferably, for the convenience of the user, when printing is needed the documents to be printed can be imported with one key and processed in association with each other according to the classification result. For example, when a certificate photo shot at a certain time or place needs to be printed, the picture to be printed can be found by keyword search and imported, thereby implementing the printing function.
Further, before performing printing, the method further includes: if the text images needing to be signed exist in all the imported text images, signing in a signature area preset in the text images needing to be signed; and/or if the text images with the defects exist in all the imported text images, performing filter processing on the text images with the defects.
Specifically, some documents to be signed are provided with signature areas; a signature can be applied directly on the image, and the signed document is then printed.
The image with defects is subjected to filter processing, for example, the following processing is performed:
a) some text images have shadows due to problems of light and the like during shooting, and the shadows can be removed in order to ensure the effect during printing;
b) old or damaged photos can be restored;
c) the handwritten characters, smearing, oil stain and the like in the text can be automatically removed during printing;
d) in order to save the amount of ink used at the time of printing, binarization processing may also be performed on the image.
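As one example of the filter processing above, the sketch below binarizes an image with Otsu's method before printing, as mentioned in item d); shadow removal and restoration of old photos would require more involved processing and are not shown.

    # Hedged sketch: binarizing a text-type image before printing to save ink.
    import cv2

    def binarize_for_printing(image_path, output_path):
        gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        # Otsu's method picks the threshold automatically.
        _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        cv2.imwrite(output_path, binary)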
In summary, the image recognition and classification method provided by the invention can be used for both text images and non-text images to perform text recognition by using the OCR text recognition method described above, so as to obtain the text recognition results of the text images and the non-text images, and determine the keywords according to the text recognition results to further classify the text images and the non-text images. In addition, the non-text images can be classified by adopting the image content, and the accuracy of the classification result is also improved.
On the basis of the above text recognition method, the invention further provides a document recognition processing method for converting files of different types, such as scanned files, PDF files, or pictures, into text that can be searched or edited at any time. When a user wants to find a file or a picture but does not remember its title and can only recall a few words in the document, the document cannot be searched by those words because it is in a non-editable format. With the document recognition processing method provided by the invention, the non-editable document is converted into an editable document, so the search can be performed on the document content or on the characters in a picture: the user only needs to enter keywords in the search box, and titles, content, remarks, and characters on pictures are all searched intelligently.
As shown in fig. 4, the document recognition processing method includes the following steps:
step S301, an input image is obtained, wherein the input image comprises an original document to be identified.
The type of the original document may be a paper document, the input image may be formed by taking a picture or scanning, or the type of the original document may be an electronic document, such as a non-editable text PDF document or a picture document, and the input image may be directly obtained at this time.
Step S302, identifying the original document in the input image to obtain a character identification result of the original document.
Specifically, the original document in the input image may be recognized by using a text recognition method as shown in fig. 1. The specific identification process is not described herein.
Step S303, arranging the character recognition result of the original document according to the position information of each character of the original document in the input image to obtain a recognition document.
Specifically, the arranging the character recognition results of the original document according to the position information of each character of the original document in the input image to obtain a recognized document includes:
and replacing the original text in the original document with the character recognition result of the original document according to the position information of each character of the original document in the input image to obtain a recognition document.
As shown in fig. 5a and 5b, fig. 5a shows an input image containing an original document, and fig. 5b shows a final recognition document, as can be seen from fig. 5a and 5b, coordinate information of each character of the original document in the input image can be obtained during processing, so that after a character recognition result of the original document is obtained, each character is placed in a corresponding position in the input image according to the coordinate information of the character to replace the character in the original document, thereby obtaining the recognition document.
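Step S303 can be sketched as follows: each recognized character is drawn back at the coordinates where it was found, so the layout of the recognition document matches the input image. The character-record format and the font file are illustrative assumptions.

    # Illustrative sketch: placing recognized characters back at their original coordinates.
    from PIL import Image, ImageDraw, ImageFont

    def rebuild_document(input_image_path, char_results, output_path,
                         font_path="NotoSansCJK-Regular.ttc"):   # assumed font file
        """char_results: list of dicts like {"char": "A", "x": 120, "y": 48, "size": 24}."""
        image = Image.open(input_image_path).convert("RGB")
        draw = ImageDraw.Draw(image)
        for record in char_results:
            x, y, size = record["x"], record["y"], record["size"]
            # Blank out the original character region, then draw the recognized character.
            draw.rectangle([x, y, x + size, y + size], fill="white")
            font = ImageFont.truetype(font_path, size)
            draw.text((x, y), record["char"], fill="black", font=font)
        image.save(output_path)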
As can be seen from the above, the characters in the input image can be converted into editable characters by OCR; the characters are recognized intelligently without manual typing, and PPT files, PDF files, pictures, business cards, test papers, and the like can be instantly converted into editable electronic recognition documents. To ensure the accuracy of the conversion, the original document can be compared with the recognition document to determine whether there are differences between them; if so, the differences are corrected in the recognition document. For example, a manual verification step may compare the original document with the editable electronic text of the output recognition document and find the points where the conversion differs from the original.
Preferably, when a thick book is scanned, the page is photographed with a curvature, so the original document in the acquired input image may fail to be recognized because of the curvature, or the recognition document may contain garbled characters. In this case the curvature of the original document in the input image needs to be corrected, and text recognition and output are performed on the corrected, flattened input image, thereby avoiding garbled output. Specifically, a correction model may be used to identify the curvature of the original document in the input image, and if the curvature meets a preset correction condition, the original document in the input image is corrected to remove the curvature.
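The curvature check and correction flow can be sketched as follows; curvature_model and dewarp stand in for the correction model and the dewarping algorithm, which the patent does not specify, and the threshold is an invented example of a preset correction condition.

    # Hedged sketch of the curvature check before OCR.
    def correct_curvature(input_image, curvature_model, dewarp, threshold_degrees=5.0):
        curvature = curvature_model(input_image)     # estimated page curvature
        if abs(curvature) >= threshold_degrees:      # meets the preset correction condition
            return dewarp(input_image)               # remove the curvature before recognition
        return input_image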
As shown in fig. 6a and 6b, for annotated or cited text in an original document (usually set in a smaller font size than the body text), the annotated content may become inconsistent with the original document after recognition by existing document recognition processing methods, as shown by the boxed regions in fig. 6a and 6b; the user then has to check and correct it manually, which greatly reduces efficiency. For this situation, the invention uses an annotation recognition model to recognize the input image and identify the annotated content in the original document, and in the recognition document the character recognition result corresponding to the annotated content is typeset into a format consistent with the original document. The invention recognizes the input image through the annotation recognition model, distinguishes the annotated content from the body characters of the original document, and outputs the annotated content in a form consistent with the original document rather than in the same character form as the other content. Fig. 6c shows the recognition document processed by the method of the present invention; as can be seen from fig. 6a and 6c, the annotations are recognized automatically by the annotation recognition model during OCR, and the recognition document is then automatically typeset into a format consistent with the original document after verification against the recognition result, so that the text recognized by OCR matches the original image without manual proofreading.
In summary, the document identification processing method provided by the invention identifies the document to be identified in the input image by using an OCR text identification method, so as to obtain the identified document, and since the non-editable document is converted into the editable document, convenience is provided for obtaining the document by subsequently adopting keyword search in the document, and rapid search of the document is realized. In addition, by correcting the radian of the input image and identifying and adjusting the fonts marked and quoted in the document, errors in the process of converting the document to be identified in the input image into the editable electronic text are reduced, and the conversion accuracy is improved.
Based on the same inventive concept, the invention also provides electronic equipment. As shown in fig. 7, the electronic device includes a processor 301, a communication interface 302, a memory 303 and a communication bus 304, wherein the processor 301, the communication interface 302 and the memory 303 complete communication with each other through the communication bus 304;
the memory 303 is used for storing computer programs;
the processor 301, when executing the program stored in the memory 303, may implement the steps in the text recognition method as described above, or implement the steps in the image recognition classification method as described above, or implement the steps in the document recognition processing method as described above.
The communication bus 304 mentioned in the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 304 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface 302 is used for communication between the above-described electronic apparatus and other apparatuses.
The Processor 301 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor. The processor 301 is the control center of the electronic device and connects the various parts of the whole electronic device through various interfaces and lines.
The memory 303 may be used for storing the computer program, and the processor 301 implements various functions of the electronic device by running or executing the computer program stored in the memory 303 and calling data stored in the memory 303.
The memory 303 may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
Based on the same inventive concept, the present invention also provides a computer-readable storage medium having stored thereon instructions which, when executed, may implement the steps in the text recognition method as described above, or implement the steps in the image recognition classification method as described above, or implement the steps in the document recognition processing method as described above.
Similarly, computer-readable storage media in embodiments of the invention may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. It should be noted that the computer-readable storage media described herein are intended to comprise, without being limited to, these and any other suitable types of memory.
It is to be noted that the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In general, the various exemplary embodiments of this invention may be implemented in hardware or special purpose circuits, software, firmware, logic or any combination thereof. Certain aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device. While various aspects of the embodiments of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that the blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
It should be noted that, in the present specification, all the embodiments are described in a related manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the electronic device and the computer-readable storage medium, since they are substantially similar to the method embodiments, the description is simple, and the relevant points can be referred to the partial description of the method embodiments.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only for the purpose of describing the preferred embodiments of the present invention, and is not intended to limit the scope of the present invention, and any variations and modifications made by those skilled in the art based on the above disclosure are within the scope of the appended claims.

Claims (20)

1. A text recognition method, comprising:
recognizing text lines in a text to be recognized in a text image, and labeling each text line with a general text line frame;
recognizing characters in each text line by adopting a character recognition model to obtain a preliminary recognition result of the text to be recognized;
performing language recognition on the preliminary recognition result by adopting a language classification model, obtaining the language type involved in the preliminary recognition result, and dividing the preliminary recognition result into a plurality of different character parts according to the language type;
and calling a corresponding language recognition model according to the language type, and recognizing the corresponding character part to obtain a target recognition result of the text to be recognized.
2. The text recognition method of claim 1, further comprising: identifying the direction of a text to be identified in a text image, and if the direction does not accord with a preset condition, correcting the direction of the text to be identified;
wherein, the direction of the text to be recognized in the text image is recognized, including:
and identifying the direction of the text to be identified in the text image by adopting a direction identification model, wherein the direction identification model is a neural network model based on CNN.
3. The text recognition method of claim 1, wherein the character recognition model is a neural network model based on the CTC (Connectionist Temporal Classification) technique and the Attention mechanism.
4. The method of claim 1, wherein the character recognition model is trained using a training sample set comprising a CJK character set and the ISO 8859-1 to ISO 8859-16 character sets.
5. The text recognition method of claim 1, wherein the language classification model is a fastText N-gram language classification model based on a Wikipedia dataset.
6. An image recognition and classification method is characterized by comprising the following steps:
recognizing the image to be classified by adopting an image recognition model, and identifying it as a text-type image or a non-text-type image;
recognizing the text in the text type image or the non-text type image by adopting the text recognition method according to any one of claims 1 to 5 to obtain a text recognition result of the text type image or the non-text type image;
determining a keyword according to the text recognition result, determining a first subdivision type of the content of the text type image or a second subdivision type of the content of the non-text type image according to the keyword, classifying the text type image into a folder corresponding to the first subdivision type, and classifying the non-text type image into a folder corresponding to the second subdivision type.
7. The image recognition and classification method according to claim 6, further comprising, after determining the keyword:
and automatically naming the text type image or the non-text type image by utilizing the keywords.
8. The image recognition and classification method of claim 6, after recognizing the text-type image or the non-text-type image, further comprising:
classifying the text type images into a text type image folder, and classifying the non-text type images into a non-text type image folder;
correspondingly, the classifying the text-class images into the folder corresponding to the first subdivision type and the classifying the non-text-class images into the folder corresponding to the second subdivision type includes:
classifying the text type images in the text type image folder into a folder corresponding to the first subdivision type, and classifying the non-text type images in the non-text type image folder into a folder corresponding to the second subdivision type.
9. The image recognition classification method of claim 6, wherein the first subdivision type comprises one or more of: notes, credentials, receipts, screenshots, documents, and certificates.
10. The image recognition and classification method according to claim 6, wherein for the recognized non-text type image, the image recognition model recognizes the content in the non-text type image;
the method further comprises the following steps:
and determining the second subdivision type according to the content of the non-text type image, and classifying the non-text type image into a folder corresponding to the second subdivision type.
11. The image recognition and classification method of claim 10, after recognizing the content in the non-text type image, further comprising:
and automatically naming the non-text images according to the contents in the non-text images.
12. The image recognition and classification method of claim 6, wherein after classifying the text-type images in the text-type image folder into a folder corresponding to the first subdivision type, the method further comprises:
responding to the operation of inputting a search word by a user, searching whether a keyword matched with the search word exists, and if so, outputting a text image corresponding to the matched keyword.
13. The image recognition and classification method of claim 6, wherein after classifying the text-type images in the text-type image folder into a folder corresponding to the first subdivision type, the method further comprises:
and in response to the printing operation of the user, importing all text type images in the folder corresponding to the first subdivision type for printing according to a one-key import function configured in advance.
14. The image recognition and classification method according to claim 13, before performing printing, further comprising:
if any of the imported text-type images require a signature, signing in a signature area preset in the text-type images requiring a signature;
and/or, if any of the imported text-type images have defects, performing filter processing on the defective text-type images.
15. A document identification processing method is characterized by comprising the following steps:
acquiring an input image, wherein the input image comprises an original document to be identified;
identifying the original document in the input image by adopting the text identification method according to any one of claims 1 to 5 to obtain a character identification result of the original document;
and arranging the character recognition results of the original document according to the position information of each character of the original document in the input image to obtain a recognition document.
16. The document recognition processing method of claim 15, wherein arranging the character recognition results of the original document according to the position information of each character of the original document in the input image to obtain a recognized document comprises:
and replacing the original text in the original document with the character recognition result of the original document according to the position information of each character of the original document in the input image to obtain a recognition document.
17. The document identification processing method of claim 15, after obtaining the identification document, further comprising:
comparing the original document with the identification document, determining whether there are differences between the identification document and the original document, and if so, correcting the differences in the identification document.
18. The document identification processing method of claim 15, further comprising, before identifying the input image:
and recognizing the curvature of the original document in the input image by adopting a correction model, and if the curvature meets a preset correction condition, correcting the original document in the input image to remove the curvature of the original document.
19. The document identification processing method of claim 15, after obtaining the identification document, further comprising:
identifying the input image by adopting an annotation identification model so as to identify the annotation content in the original document;
and, in the recognition document, typesetting the character recognition result corresponding to the annotation content into a format consistent with the original document.
20. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete communication with each other through the communication bus;
the memory is used for storing a computer program;
the processor, when executing the program stored on the memory, implements the steps of the method of any one of claims 1 to 19.
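For orientation only, the sketch below illustrates, in Python, the order of operations recited in claim 1: recognizing text lines, recognizing characters to obtain a preliminary recognition result, classifying the languages involved, and re-recognizing each character part with the recognition model for its language. Every callable (line_detector, char_recognizer, language_classifier, language_models) is a hypothetical stand-in and not an interface defined by this application.

def recognize_text(text_image, line_detector, char_recognizer,
                   language_classifier, language_models):
    # 1. Recognize text lines and mark each with a generic text-line box.
    lines = line_detector(text_image)
    # 2. Recognize the characters in every line (preliminary recognition result).
    preliminary = [char_recognizer(line) for line in lines]
    # 3. Determine the language types involved and split the preliminary
    #    result into character parts, one per language type.
    parts = language_classifier(preliminary)  # e.g. {"zh": [...], "en": [...]}
    # 4. Call the recognition model for each language type on its part.
    return {lang: language_models[lang](part) for lang, part in parts.items()}

# Toy demonstration with dummy stand-ins for the models:
if __name__ == "__main__":
    result = recognize_text(
        text_image="dummy image",
        line_detector=lambda img: ["line-1", "line-2"],
        char_recognizer=lambda line: line.upper(),
        language_classifier=lambda prelim: {"en": prelim},
        language_models={"en": lambda part: part},
    )
    print(result)  # {'en': ['LINE-1', 'LINE-2']}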
CN202010968750.3A 2020-09-15 2020-09-15 Text recognition method, image recognition and classification method and document recognition processing method Pending CN112101367A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010968750.3A CN112101367A (en) 2020-09-15 2020-09-15 Text recognition method, image recognition and classification method and document recognition processing method
PCT/CN2021/117222 WO2022057707A1 (en) 2020-09-15 2021-09-08 Text recognition method, image recognition classification method, and document recognition processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010968750.3A CN112101367A (en) 2020-09-15 2020-09-15 Text recognition method, image recognition and classification method and document recognition processing method

Publications (1)

Publication Number Publication Date
CN112101367A true CN112101367A (en) 2020-12-18

Family

ID=73759143

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010968750.3A Pending CN112101367A (en) 2020-09-15 2020-09-15 Text recognition method, image recognition and classification method and document recognition processing method

Country Status (2)

Country Link
CN (1) CN112101367A (en)
WO (1) WO2022057707A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104954605B (en) * 2014-03-31 2018-02-06 京瓷办公信息***株式会社 Image processing system, image formation system and image forming method
KR20160015838A (en) * 2014-07-31 2016-02-15 삼성전자주식회사 Method and device for classifying contents
RU2571545C1 (en) * 2014-09-30 2015-12-20 Общество с ограниченной ответственностью "Аби Девелопмент" Content-based document image classification
CN112101367A (en) * 2020-09-15 2020-12-18 杭州睿琪软件有限公司 Text recognition method, image recognition and classification method and document recognition processing method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040037470A1 (en) * 2002-08-23 2004-02-26 Simske Steven J. Systems and methods for processing text-based electronic documents
US20100329555A1 (en) * 2009-06-23 2010-12-30 K-Nfb Reading Technology, Inc. Systems and methods for displaying scanned images with overlaid text
US20160189008A1 (en) * 2014-12-31 2016-06-30 Xiaomi Inc. Methods and deivces for classifying pictures
WO2019012570A1 (en) * 2017-07-08 2019-01-17 ファーストアカウンティング株式会社 Document classification system and method, and accounting system and method
CN110569830A (en) * 2019-08-01 2019-12-13 平安科技(深圳)有限公司 Multi-language text recognition method and device, computer equipment and storage medium
CN110766020A (en) * 2019-10-30 2020-02-07 哈尔滨工业大学 System and method for detecting and identifying multi-language natural scene text

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022057707A1 (en) * 2020-09-15 2022-03-24 杭州睿琪软件有限公司 Text recognition method, image recognition classification method, and document recognition processing method
CN113420622A (en) * 2021-06-09 2021-09-21 四川百川四维信息技术有限公司 Intelligent scanning, recognizing and filing system based on machine deep learning
CN113254595A (en) * 2021-06-22 2021-08-13 北京沃丰时代数据科技有限公司 Chatting recognition method and device, electronic equipment and storage medium
CN113792659A (en) * 2021-09-15 2021-12-14 上海金仕达软件科技有限公司 Document identification method and device and electronic equipment
CN113792659B (en) * 2021-09-15 2024-04-05 上海金仕达软件科技股份有限公司 Document identification method and device and electronic equipment
CN114173019A (en) * 2021-12-23 2022-03-11 青岛黄海学院 Multifunctional archive scanning device and working method thereof
CN114173019B (en) * 2021-12-23 2023-12-01 青岛黄海学院 Multifunctional archive scanning device and working method thereof
WO2023123763A1 (en) * 2021-12-31 2023-07-06 上海合合信息科技股份有限公司 Direction correction method and apparatus for document image
CN114419636A (en) * 2022-01-10 2022-04-29 北京百度网讯科技有限公司 Text recognition method, device, equipment and storage medium
CN114596566B (en) * 2022-04-18 2022-08-02 腾讯科技(深圳)有限公司 Text recognition method and related device
WO2023202197A1 (en) * 2022-04-18 2023-10-26 腾讯科技(深圳)有限公司 Text recognition method and related apparatus
CN114596566A (en) * 2022-04-18 2022-06-07 腾讯科技(深圳)有限公司 Text recognition method and related device
CN115205868A (en) * 2022-06-24 2022-10-18 荣耀终端有限公司 Image checking method
CN117593752A (en) * 2024-01-18 2024-02-23 星云海数字科技股份有限公司 PDF document input method, PDF document input system, storage medium and electronic equipment
CN117593752B (en) * 2024-01-18 2024-04-09 星云海数字科技股份有限公司 PDF document input method, PDF document input system, storage medium and electronic equipment

Also Published As

Publication number Publication date
WO2022057707A1 (en) 2022-03-24

Similar Documents

Publication Publication Date Title
WO2022057707A1 (en) Text recognition method, image recognition classification method, and document recognition processing method
US10783400B2 (en) Generating searchable text for documents portrayed in a repository of digital images utilizing orientation and text prediction neural networks
US9626555B2 (en) Content-based document image classification
CN110097329B (en) Information auditing method, device, equipment and computer readable storage medium
US11106891B2 (en) Automated signature extraction and verification
US8340425B2 (en) Optical character recognition with two-pass zoning
US20030004991A1 (en) Correlating handwritten annotations to a document
CN111914597B (en) Document comparison identification method and device, electronic equipment and readable storage medium
Hazra et al. Optical character recognition using KNN on custom image dataset
Manwatkar et al. Text recognition from images
CN112508011A (en) OCR (optical character recognition) method and device based on neural network
US11379690B2 (en) System to extract information from documents
WO2022161293A1 (en) Image processing method and apparatus, and electronic device and storage medium
US20220269898A1 (en) Information processing device, information processing system, information processing method, and non-transitory computer readable medium
Thammarak et al. Automated data digitization system for vehicle registration certificates using Google Cloud Vision API
RU2744769C1 (en) Method for image processing using adaptive technologies based on neural networks and computer vision
US11410442B2 (en) Information processing apparatus and non-transitory computer readable medium
CN116384344A (en) Document conversion method, device and storage medium
Kumar et al. Survey paper of script identification of Telugu language using OCR
CN116030469A (en) Processing method, processing device, processing equipment and computer readable storage medium
US11170253B2 (en) Information processing apparatus and non-transitory computer readable medium
CN116324910A (en) Method and system for performing image-to-text conversion on a device
Kathiriya et al. Gujarati text recognition: a review
US11881041B2 (en) Automated categorization and processing of document images of varying degrees of quality
Hajamohideen et al. Kalanjiyam: Unconstrained offline tamil handwritten database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201218