CN115050037A - Card text recognition method, device and storage medium - Google Patents

Card text recognition method, device and storage medium

Info

Publication number
CN115050037A
Authority
CN
China
Prior art keywords
text
image
card
region
recognized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110213987.5A
Other languages
Chinese (zh)
Inventor
洪芳宇
施烈航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202110213987.5A
Priority to PCT/CN2022/077038 (published as WO2022179471A1)
Publication of CN115050037A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/16 Image preprocessing
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Character Input (AREA)

Abstract

The application relates to the field of optical character recognition within the technical field of artificial intelligence, and in particular to a card text recognition method, device and storage medium. The method comprises the following steps: acquiring a first image to be recognized of a card; detecting the first image to be recognized to obtain at least one first text region, wherein the first text region represents a region where a text in the first image to be recognized is located; performing rotation correction on the first image to be recognized according to the first text region to obtain a second image to be recognized; detecting the second image to be recognized to obtain at least one second text region, wherein the second text region represents a region where a text in the second image to be recognized is located; and recognizing the image in the second text region to obtain a first target text corresponding to the second text region. The card text recognition method and device of the application can improve the accuracy of card text recognition and improve the user experience.

Description

Card text recognition method, device and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, and a storage medium for card text recognition.
Background
Artificial intelligence (AI) comprises theories, methods, techniques and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best results. In other words, artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines can perceive, reason and make decisions.
Optical character recognition (OCR) is an important direction in the field of artificial intelligence. Based on deep learning, OCR provides various services, for example intelligently recognizing the characters on a picture and converting them into structured text, and it has wide application scenarios. In current card-recognition OCR scenarios, users urgently need an OCR solution with quicker response, higher precision and stronger universality.
Disclosure of Invention
In view of the above, a method, an apparatus and a storage medium for card text recognition are provided.
In a first aspect, an embodiment of the present application provides a card text recognition method, where the method is used for a terminal device, and the method includes: acquiring a first image to be recognized of a card; detecting the first image to be recognized to obtain at least one first text region, where the first text region represents a region where a text in the first image to be recognized is located; performing rotation correction on the first image to be recognized according to the first text region to obtain a second image to be recognized; detecting the second image to be recognized to obtain at least one second text region, where the second text region represents a region where a text in the second image to be recognized is located; and recognizing the image in the second text region to obtain a first target text corresponding to the second text region.
According to this embodiment of the application, a first image to be recognized of a card is acquired; the first image to be recognized is detected to obtain at least one first text region; the first image to be recognized is rotationally corrected according to the first text region to obtain a second image to be recognized; the second image to be recognized is detected to obtain at least one second text region; and the image in the second text region is recognized to obtain a first target text corresponding to the second text region. The input is thus an image and the output is the card's text content. Rotation correction adjusts the angle of the card text to a better state, so that the text content of a tilted card image can be recognized; secondary detection avoids missed text regions, which improves the detection accuracy for text regions of tilted card images and hence also the recognition accuracy of the text content. Because the method runs on the terminal device, detection and recognition respond quickly and power consumption can be reduced; the problems of network disconnection and slow response caused by calling the method on the cloud side are avoided, and the user experience is improved.
According to a first aspect, in a first possible implementation manner of the method for recognizing a card text, performing rotation correction on the first image to be recognized according to the first text region to obtain a second image to be recognized, including: and performing rotation correction on the first image to be recognized according to the average inclination angle of at least one text region with the longest length in the first text region to obtain the second image to be recognized.
According to the embodiment of the application, the first image to be recognized is subjected to rotation correction through the average inclination angles of the first text regions with the longest length, so that the correction accuracy can be improved, and the detection accuracy is further improved.
According to the first aspect or the first possible implementation manner of the first aspect, in a second possible implementation manner of the method for recognizing a card text, detecting the second image to be recognized to obtain at least one second text region includes: determining a horizontal slope of the second text region; and correcting the left edge and the right edge of the second text region according to the horizontal slope of the second text region, wherein after correction, the left edge and the right edge of the second text region are respectively perpendicular to the upper edge and/or the lower edge of the second text region.
According to the embodiment of the application, the horizontal slope of the second text region is determined, the left edge and the right edge of the second text region are corrected according to the horizontal slope of the second text region, and after correction, the left edge and the right edge of the second text region are respectively perpendicular to the upper edge and/or the lower edge of the second text region, so that the situation of word deformation after perspective transformation of the text region due to irregularity of the text region can be prevented, the text in the text region is easier to recognize, and the accuracy of the recognition of the card text is further improved.
According to the first aspect or the first or second possible implementation manner of the first aspect, in a third possible implementation manner of the method for recognizing a card text, detecting the second image to be recognized to obtain at least one second text region includes: determining a horizontal slope and a height of the second text region; and according to the horizontal slope of the second text region, respectively extending the upper edge and the lower edge of the second text region to two sides, wherein the extending distance is determined according to the height.
According to the embodiment of the application, the horizontal slope and the height of the second text area are determined, the upper edge and the lower edge of the second text area are respectively extended to two sides according to the horizontal slope of the second text area, the problem that characters are cut and missed due to the fact that the text area is too close to the text can be prevented, the text in the text area can be identified more easily, and therefore the accuracy of card text identification is further improved.
According to the first aspect or the first, second, or third possible implementation manner of the first aspect, in a fourth possible implementation manner of the card text recognition method, recognizing the image in the second text region to obtain the first target text corresponding to the second text region includes: recognizing the image in the second text region to obtain a second target text corresponding to the second text region; determining an attribute of the second target text; filtering the connectionist temporal classification (CTC) sequence corresponding to the second text region according to the attribute of the second target text to obtain a filtered CTC sequence; and obtaining the first target text according to the categories and corresponding confidences in the filtered CTC sequence.
According to this embodiment of the application, the image in the second text region is recognized to obtain a second target text corresponding to the second text region; the attribute of the second target text is determined; the CTC sequence corresponding to the second text region is filtered according to the attribute of the second target text to obtain a filtered CTC sequence; and the first target text is obtained according to the categories and corresponding confidences in the filtered CTC sequence, so that interference categories can be screened out and the recognition precision further improved.
According to the first aspect or the first, second, third or fourth possible implementation manner of the first aspect, in a fifth possible implementation manner of the method for identifying a card text, the method further includes: training the detection model and the recognition model according to the training sample to obtain a trained detection model and a trained recognition model; the training sample comprises a positive sample and a negative sample, the positive sample corresponds to the negative sample in a one-to-one mode, the positive sample comprises a card picture sample, the card picture sample comprises a text region, the negative sample comprises the card picture sample obtained after the text region is covered, the detection model after the training is used for detecting the first text region and the second text region, and the recognition model after the training is used for recognizing the first target text and the second target text.
According to this embodiment of the application, the detection model and the recognition model are trained on the training samples to obtain a trained detection model and a trained recognition model; the trained detection model is used for detecting the first text region and the second text region, and the trained recognition model is used for recognizing the first target text and the second target text, which can reduce ROM occupation on the terminal device and prevent the terminal device from stuttering. The training samples comprise positive samples and negative samples in one-to-one correspondence; a positive sample comprises a card picture sample containing a text region, and the corresponding negative sample comprises the card picture sample obtained after the text region is covered. This enables adversarial learning on positive and negative samples, which enhances the detection model's ability to distinguish text regions from non-text regions, improves recognition accuracy under complex backgrounds, and improves the robustness and precision of the model.
In a second aspect, an embodiment of the present application provides a card text recognition apparatus, where the apparatus is used for a terminal device, and the apparatus includes: an acquisition module, configured to acquire a first image to be recognized of a card; a first detection module, configured to detect the first image to be recognized to obtain at least one first text region, where the first text region represents a region where a text in the first image to be recognized is located; a correction module, configured to perform rotation correction on the first image to be recognized according to the first text region to obtain a second image to be recognized; a second detection module, configured to detect the second image to be recognized to obtain at least one second text region, where the second text region represents a region where a text in the second image to be recognized is located; and a recognition module, configured to recognize the image in the second text region to obtain a first target text corresponding to the second text region.
In a first possible implementation manner of the card text recognition apparatus according to the second aspect, the correcting module includes: and the first correction sub-module is used for performing rotation correction on the first image to be recognized according to the average inclination angle of at least one text region with the longest length in the first text region to obtain the second image to be recognized.
In a second possible implementation manner of the card text recognition apparatus according to the second aspect or the first possible implementation manner of the second aspect, the second detection module includes: a first determination module, configured to determine a horizontal slope of the second text region; and a second correction submodule, configured to correct the left edge and the right edge of the second text region according to the horizontal slope of the second text region, where, after correction, the left edge and the right edge of the second text region are respectively perpendicular to the upper edge and/or the lower edge of the second text region.
In a third possible implementation manner of the card text recognition apparatus according to the second aspect or the first or second possible implementation manner of the second aspect, the second detection module includes: a second determination module to determine a horizontal slope and a height of the second text region; and the extension module is used for respectively extending the upper edge and the lower edge of the second text region to two sides according to the horizontal slope of the second text region, and the extension distance is determined according to the height.
In a fourth possible implementation manner of the card text recognition apparatus according to the second aspect or the first, second or third possible implementation manner of the second aspect, the recognition module includes: a recognition submodule, configured to recognize the image in the second text region to obtain a second target text corresponding to the second text region; a third determination module, configured to determine an attribute of the second target text; a filtering module, configured to filter the connectionist temporal classification (CTC) sequence corresponding to the second text region according to the attribute of the second target text to obtain a filtered CTC sequence; and a fourth determination module, configured to obtain the first target text according to the categories and corresponding confidences in the filtered CTC sequence.
According to the second aspect or the first, second, third or fourth possible implementation manner of the second aspect, in a fifth possible implementation manner of the card text recognition apparatus, the apparatus further includes: the training module is used for training the detection model and the recognition model according to the training sample to obtain a trained detection model and a trained recognition model; the training sample comprises a positive sample and a negative sample, the positive sample corresponds to the negative sample in a one-to-one mode, the positive sample comprises a card picture sample, the card picture sample comprises a text region, the negative sample comprises a card picture sample obtained after the text region is covered, the detection model after training is used for detecting the first text region and the second text region, and the recognition model after training is used for recognizing the first target text and the second target text.
In a third aspect, an embodiment of the present application provides a card text recognition apparatus, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to implement the method for card text recognition of the first aspect described above or one or more of many possible implementations of the first aspect when executing the instructions.
In a fourth aspect, embodiments of the present application provide a non-transitory computer-readable storage medium having stored thereon computer program instructions, which, when executed by a processor, implement the method for card text recognition of the first aspect or one or more of the many possible implementations of the first aspect.
In a fifth aspect, an embodiment of the present application provides a terminal device, where the terminal device may perform the method for recognizing a card text in the first aspect or in one or more of multiple possible implementation manners of the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product, which includes computer readable code or a non-transitory computer readable storage medium carrying computer readable code, and when the computer readable code runs in an electronic device, a processor in the electronic device executes a card text recognition method of the first aspect or one or more of the many possible implementations of the first aspect.
These and other aspects of the present application will be more readily apparent from the following description of the embodiment(s).
Drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate exemplary embodiments, features, and aspects of the application and, together with the description, serve to explain the principles of the application.
Fig. 1 shows a schematic diagram of an application scenario according to an embodiment of the present application.
Fig. 2 shows a flow chart of a method of card text recognition according to an embodiment of the application.
FIG. 3 shows a flow diagram for generating negative examples according to an embodiment of the present application.
Fig. 4 shows a flow chart of a method of card text recognition according to an embodiment of the application.
Fig. 5 shows a flow chart of rotation correction of a picture according to an embodiment of the present application.
Fig. 6 is a schematic diagram illustrating an effect of performing rotation correction on a picture according to an embodiment of the present application.
Fig. 7a shows a schematic diagram of a quadrangular text box obtained through secondary detection.
Fig. 7b is a schematic diagram illustrating the effect of directly performing perspective transformation on the quadrangular text box obtained through secondary detection.
Fig. 7c is a schematic diagram illustrating edge correction of the quadrangular text box obtained by the secondary detection.
Fig. 7d is a schematic diagram illustrating the effect of performing edge correction and then performing perspective transformation on the quadrangular textbox obtained by secondary detection.
Fig. 7e shows a schematic diagram of a quadrangular text box that fits its text too tightly.
FIG. 7f is a schematic diagram of edge expansion of the quadrangular text box obtained by the secondary detection.
FIG. 8 illustrates a flow chart for performing edge correction according to an embodiment of the present application.
FIG. 9 shows a flow diagram for performing edge augmentation according to an embodiment of the present application.
Fig. 10 shows a schematic diagram of confidence filtering based on CTC sequences according to an embodiment of the application.
Fig. 11 shows a flow chart of a method of card text recognition according to an embodiment of the application.
FIG. 12 shows a flow diagram of a method of card text recognition according to an embodiment of the present application.
FIG. 13 shows a flow diagram of a method of card text recognition according to an embodiment of the present application.
FIG. 14 shows a flow diagram of a method of card text recognition according to an embodiment of the present application.
Fig. 15 shows a block diagram of a card text recognition apparatus according to an embodiment of the present application.
Fig. 16 shows a schematic structural diagram of a terminal device according to an embodiment of the present application.
Fig. 17 shows a block diagram of a software configuration of a terminal device according to an embodiment of the present application.
Detailed Description
Various exemplary embodiments, features and aspects of the present application will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present application. It will be understood by those skilled in the art that the present application may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present application.
At present, in card text recognition scenarios, when a user performs card text recognition on a terminal device, the terminal device needs to send the card picture to a server on the cloud side for recognition, and the result is returned to the user after recognition completes. A card includes any type of certificate with a certain shape and format, such as an identity card, a bank card, an employee card or a business license, and the card picture may be a picture stored on the terminal device by the user, a picture taken instantly by the user, a picture obtained by the user scanning the card with the terminal device, and so on. When a cloud-side server is used for recognition, card text recognition on the device side, that is, directly on the terminal device, is not supported. Moreover, because the terminal device and the server must transmit data over a network, recognition is impossible without a network, and even with a network there is network delay and the response speed is low. Meanwhile, in this case, the recognition accuracy for oblique card text is not high.
In another case, when card text recognition is performed on the terminal device, the accuracy of text recognition for an inclined card is very low; for card text recognition in complex scenes, such as letterpress and embossed-seal fonts on the card, low distinction between the card text and the background, illumination interference, and blurred card pictures, the recognition accuracy is also very low, and some card text pictures cannot be recognized after preview. Meanwhile, if a dedicated detection and recognition model needs to be trained for the terminal device, it occupies a large amount of the terminal device's read-only memory (ROM) storage and causes the terminal device to stutter.
To solve these technical problems, the application provides a card text recognition method. The card text recognition method of the embodiments of the application can detect the text regions on a card picture and recognize the text in those text regions, and the method can be applied to a terminal device, improving the recognition accuracy of the text.
Fig. 1 shows a schematic diagram of an application scenario according to an embodiment of the present application. In a possible implementation manner, the text recognition method provided in this embodiment of the present application may be applied to scenarios in which text on a card is recognized on a terminal device, for example bank card number recognition or driver's license information recognition. In the bank card number recognition scenario, after the user uploads or scans a non-horizontal bank card photo as shown in fig. 1(a) with the terminal device, the recognized bank card number "6214 XXXX 73469446" is obtained. In the driver's license information recognition scenario, the user can hold the driver's license shown in fig. 1(b) and scan it or photograph and upload it with the terminal device, obtaining the recognized driver's license information: "name: color RoxX; sex: male; certificate number: 3408111992XXXX 6319; driving type: C1; date of issue: 2011-02-14; validity period: 2017-02-14 to 2027-02-14". After the relevant information is recognized, the recognized key information can be further processed; for example, key information such as the driver's license number and name is matched one-to-one with preset fields to form structured text information, which greatly improves information processing efficiency.
The terminal device may be a device with a wireless connection function; it can connect to other terminal devices wirelessly, for example via Wi-Fi or Bluetooth, and may also support wired communication. The terminal device of the application may have a touch screen, a non-touch screen, or no screen at all. With a touch screen, the terminal device can be controlled by clicking, sliding and the like on the display screen with a finger or stylus; a non-touch-screen device can be connected to input devices such as a mouse, keyboard or touch panel and controlled through them; a screenless device may be, for example, a screenless Bluetooth speaker. For example, the terminal device of the present application may be a smartphone, netbook, tablet computer, notebook computer, wearable electronic device (such as a smart band or smart watch), TV, virtual reality device, speaker, electronic ink device, and the like. The application does not limit the type of terminal device, nor the card types the terminal device can recognize; the method can be applied to recognizing the information contained in any card in any scenario (including complex scenarios such as natural scenes and printed scenes), and also to other scenarios.
FIG. 2 shows a flow diagram of a method of card text recognition according to an embodiment of the present application. As shown in fig. 2, a flow of a card text recognition method according to an embodiment of the present application includes:
step S101, training phase.
The detection model and the recognition model can be trained on a training set to obtain a trained detection model and a trained recognition model. The training set may include card picture samples and their corresponding labels. The detection model and the recognition model may be common OCR models; the application does not limit the categories of the detection model and the recognition model.
For example, fine-tuning training may be performed on the basis of the structures and parameters of the original detection and recognition models. Since no new detection or recognition model is added and fine-tuning is performed only on top of the general models, zero ROM increase can be achieved and stuttering of the terminal device can be reduced. The training set used may include positive samples, which represent card pictures containing text content, and negative samples, which represent card pictures containing no text content.
To increase the robustness of the model, the positive samples may be transformed and the transformed positive samples included in the training set, so that the original model adapts better to new scenes. Methods of transforming a positive sample may include: random translation (simulating a user not shooting head-on, with the lens shifted horizontally), random scaling (simulating varying shooting distances), random rotation (simulating in-plane tilt of the shooting angle), perspective transformation (simulating front-to-back tilt of the shooting angle), blurring (simulating defocus and lens shake), and random aspect ratio (simulating pictures of different sizes and aspect ratios taken by different phones). A sketch of such augmentations follows.
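As a rough illustration, the following Python sketch applies such perturbations with OpenCV. It is only a sketch under stated assumptions: the patent discloses no code, so the function name, parameter ranges and probabilities here are illustrative choices, not values from the application.

```python
import cv2
import numpy as np

def augment_positive_sample(img, rng=np.random.default_rng()):
    """Randomly translate, scale, rotate, perspective-warp, blur and
    re-stretch one card picture. All ranges are illustrative assumptions."""
    h, w = img.shape[:2]
    # Random rotation (in-plane tilt) and random scaling (shooting distance).
    M = cv2.getRotationMatrix2D((w / 2, h / 2),
                                rng.uniform(-15, 15), rng.uniform(0.8, 1.2))
    # Random translation (lens shifted off-center).
    M[:, 2] += rng.uniform(-0.05, 0.05, size=2) * (w, h)
    img = cv2.warpAffine(img, M, (w, h), borderMode=cv2.BORDER_REPLICATE)
    # Perspective transformation (front-to-back tilt): jitter the corners.
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jitter = (rng.uniform(-0.03, 0.03, size=(4, 2)) * (w, h)).astype(np.float32)
    img = cv2.warpPerspective(img, cv2.getPerspectiveTransform(src, src + jitter),
                              (w, h), borderMode=cv2.BORDER_REPLICATE)
    # Blurring (defocus / lens shake), applied to half the samples.
    if rng.random() < 0.5:
        img = cv2.GaussianBlur(img, (5, 5), 0)
    # Random aspect ratio (different phone cameras).
    return cv2.resize(img, (int(w * rng.uniform(0.9, 1.1)), h))
```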
To enable the model to perform adversarial learning on positive and negative samples, and to avoid wrong boxes, missed boxes, extra boxes and too few boxes when a text region is deeply fused with the picture background and hard to distinguish (for example, when the font is a letterpress font, or the text is printed over a landscape painting), the positive samples in the training set can each be made into a one-to-one corresponding negative sample, and each positive sample is input in association with its negative sample during the training stage, so that the model obtains better detection and recognition results. FIG. 3 shows a flowchart for generating negative examples according to an embodiment of the present application. As shown in fig. 3, the process of generating a negative sample includes steps S201 to S205:
step S201 reads in the label and the positive sample.
The label may be a labeling label of the positive sample after labeling in the training set, and may represent coordinates of a text region and a non-text region in the positive sample, for example.
In step S202, a mask map of the text region and the non-text region is generated.
For example, the corresponding mask map may be generated according to the text regions and non-text regions determined by the coordinates marked in the label. The mask map may be a black-and-white picture in which, for example, text regions are white and non-text regions are black.
Step S203, selecting a non-text area in the neighborhood of the text area.
For example, the non-text region may be a black portion in the mask map, indicating content that needs to be retained.
Step S204, covering the text area.
For example, pixels of the non-text area are cut out according to the mask map and filled into the text area, so that the text area is covered while the non-text area of the positive sample is displayed normally, forming the processed negative-sample picture.
In step S205, a negative sample is saved.
After the negative sample is generated, it may be associated with the corresponding positive sample and stored in the training set.
In this way, the detection model's ability to distinguish text regions from non-text regions can be enhanced. A sketch of the covering step follows.
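A minimal sketch of steps S201 to S205 is shown below, assuming one rectangular text region per label and covering it with a same-sized strip taken from the neighbourhood above it; the patent itself only requires that non-text pixels near the text region be used as filling, so the strip choice here is an assumption.

```python
import numpy as np

def make_negative_sample(positive, text_box):
    """Cover one labelled text region of a positive sample (steps S201-S205).

    positive: H x W x 3 uint8 card picture; text_box: (x1, y1, x2, y2)
    coordinates of the text region read from the label.
    """
    negative = positive.copy()
    x1, y1, x2, y2 = text_box
    h = y2 - y1
    # Select a non-text strip in the neighbourhood above the text region
    # (if the text region touches the top of the picture, the strip may
    # overlap it -- a corner case this sketch ignores).
    src_y1 = max(0, y1 - h)
    strip = positive[src_y1:src_y1 + h, x1:x2]
    # Cover the text region; the rest of the positive sample stays intact.
    negative[y1:y2, x1:x2] = strip
    return negative
```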
Referring back to fig. 2, after the training phase of step S101 is completed, the preprocessing phase is entered in step S102.
The original input picture can be preprocessed to obtain a processed picture that is adjusted to serve as input to the detection model. For example, the picture may be normalized to fit the input size of the detection model; a sketch is given below.
The picture input to the preprocessing stage can be a card picture uploaded by the user. The upload methods may include the user directly photographing the card, the user uploading a picture stored on the terminal device, the user scanning the card with the terminal device, and so on; the upload method may also be any other method, and the application does not limit how the user uploads the card picture.
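A minimal preprocessing sketch is given below; the target size and the normalization constants are assumptions for illustration, since the patent only states that the picture is adjusted to the detection model's input.

```python
import cv2
import numpy as np

def preprocess(img, target=(640, 640)):
    """Resize a card picture to the detector's input size and normalize it.

    target and the [-1, 1] normalization are illustrative assumptions; the
    real values depend on the detection model actually deployed.
    """
    img = cv2.resize(img, target)
    x = img.astype(np.float32) / 255.0   # scale pixels to [0, 1]
    x = (x - 0.5) / 0.5                  # normalize to [-1, 1]
    return np.transpose(x, (2, 0, 1))[None]  # NCHW batch of one
```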
Step S103, text detection stage.
Text detection is performed twice on the preprocessed picture using the trained detection model. The first text detection yields detected quadrilateral text boxes, and the picture is rotationally corrected according to those text boxes so that the text lines in the picture tend to be horizontal; the rotation-corrected picture is then input to the second text detection to obtain new detected quadrilateral text boxes.
For the input picture, a plurality of text boxes may be detected, the text boxes may correspond to the detected text regions, and the text regions may include related card information to be identified, such as a bank card number.
Step S104, text recognition stage.
Edge expansion, edge correction and perspective transformation are performed on each quadrilateral text box to obtain a rectangular picture block; the picture block is recognized using the trained recognition model, and the corresponding text content is output.
The attribute of the obtained text content can further be determined; according to the attribute, the text content can be recognized more accurately, and the recognized text content can be checked and corrected.
On the basis of fig. 2, fig. 4 shows a flowchart of a method for card text recognition according to an embodiment of the present application. As shown in fig. 4, the process of card text recognition specifically includes:
step S301, inputting a card picture.
The input card picture can be the card picture which is processed in the preprocessing stage and accords with the input size of the detection model.
Step S302, the input picture is detected by using the detection model, and the detected candidate text area is obtained.
For a given input picture, multiple candidate text regions contained in the picture can be detected. Each candidate text region may include coordinates and a corresponding confidence; candidate text regions with similar coordinates may correspond to the same text, and the confidence represents the probability that the corresponding candidate text region is the most suitable text region pointing to a given text.
For example, for a text representing a card number in a bank card picture, the text may correspond to a plurality of candidate text regions and corresponding confidence levels, the confidence level corresponding to a candidate text region that does not completely contain the card number text is relatively low, and the confidence level corresponding to a candidate text region that completely contains the card number text is relatively high.
The detection model may be obtained by performing fine-tuning training on the positive and negative samples, and the training process may refer to step S101 in fig. 2.
As can be seen from the above, for a given text in the picture (e.g. the card number on a bank card), a plurality of corresponding candidate text regions may be detected, so there may be a large amount of redundancy. The quadrilateral text box corresponding to each text in the picture is determined by fusing and filtering the redundant candidate text regions.
For example, a non-maximum suppression (NMS) algorithm may be used to fuse and filter the candidate text regions: for multiple candidate text regions corresponding to the same text, a best-fit candidate text region is finally determined to form the quadrilateral text box. A simplified sketch of NMS follows.
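For illustration, a plain axis-aligned NMS sketch is given below. The patent's detector outputs quadrilaterals, for which the overlap would be computed over polygons; the axis-aligned simplification here is an assumption made to keep the sketch short.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-confidence box per text and drop redundant ones.

    boxes: N x 4 array of (x1, y1, x2, y2); scores: N confidences.
    Returns the indices of the retained candidate text regions.
    """
    boxes, scores = np.asarray(boxes, float), np.asarray(scores, float)
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Overlap of the current best box with all remaining boxes.
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = ((boxes[order[1:], 2] - boxes[order[1:], 0])
                  * (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + area_r - inter)
        order = order[1:][iou <= iou_thresh]
    return keep
```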
Step S303, rotationally correcting the picture according to the first m longer quadrangular text boxes to enable the text lines in the picture to tend to be horizontal.
After step S302, quadrilateral text boxes corresponding to a plurality of text regions have been determined in the picture. To counter the drop in detection accuracy caused by picture tilt, the picture is rotated in step S303, and in step S304 the rotation-corrected picture is input into the detection model for secondary detection to obtain new quadrilateral text boxes. Fig. 5 is a flowchart illustrating rotation correction of a picture according to an embodiment of the present application, which can be read as an elaboration of steps S301 to S304: step S401 corresponds to step S301 in fig. 4, and steps S402 and S403 correspond to step S302 in fig. 4, where the post-processing may include fusing and filtering the redundant candidate text regions. As shown in fig. 5, the flow of rotation correction further includes:
in step S404, the first m longer text boxes of the plurality of quadrangular text boxes are obtained.
In step S405, the average tilt angle α of the m text boxes is calculated.
Step S406, the picture is rotated by an angle alpha to obtain the rotationally corrected picture.
Thus, the lines of text in the picture may tend to be horizontal.
For example, the five longest text boxes in a driver's license picture may be selected, and the tilt angles corresponding to the five text boxes calculated as α1, α2, α3, α4 and α5; the average tilt angle is then α = (α1 + α2 + α3 + α4 + α5)/5. In one possible implementation, the image rotation can be implemented using the warpAffine function in OpenCV, with the boundary background of the rotated picture filled in (for example, copy-edge filling). The tilt angle may be measured relative to the horizontal direction or to the vertical direction. A sketch of this rotation correction follows.
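The following sketch combines steps S404 to S406, assuming each detected box is a 4 x 2 array of corners ordered top-left, top-right, bottom-right, bottom-left (an assumption; the patent does not fix a corner order), and glossing over sign conventions for the rotation angle.

```python
import cv2
import numpy as np

def rotate_by_average_tilt(img, boxes, m=3):
    """Rotate the picture by the average tilt angle of the m longest boxes.

    boxes: list of 4 x 2 float arrays (TL, TR, BR, BL corner order assumed).
    """
    # Steps S404/S405: pick the m longest boxes, average their top-edge tilt.
    longest = sorted(boxes, key=lambda b: np.linalg.norm(b[1] - b[0]),
                     reverse=True)[:m]
    angles = [np.degrees(np.arctan2(b[1, 1] - b[0, 1], b[1, 0] - b[0, 0]))
              for b in longest]
    alpha = float(np.mean(angles))
    # Step S406: rotate by alpha; BORDER_REPLICATE gives the copy-edge filling.
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), alpha, 1.0)
    return cv2.warpAffine(img, M, (w, h), borderMode=cv2.BORDER_REPLICATE)
```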
In step S407, the size of the rotation-corrected image is adjusted to obtain an image that fits the input size of the detection model.
In one example, fig. 6 is a schematic diagram illustrating the effect of performing rotation correction on a picture according to an embodiment of the present application. As shown in fig. 6, fig. 6(a) may represent the bank card picture input at the first detection; the white-bordered text boxes 1, 2, 3 and 4 determined by white dots on the bank card picture in fig. 6(b) may represent the detected quadrilateral text boxes; and fig. 6(c) may represent the bank card picture obtained after rotation correction. The quadrilateral text boxes 1, 2, 3 and 4 shown in fig. 6(b) can be obtained by detecting the bank card picture for the first time; the three longest quadrilateral text boxes (e.g. text boxes 1, 2 and 3) can be selected, and the image in the white box shown in fig. 6(b) rotated using the warpAffine function. The area containing the bank card picture inside the white box in fig. 6(c) represents the rotated image, and the color outside that area can be obtained by copy-edge filling with the warpAffine function, finally forming the rotation-corrected bank card picture. The size of the rotation-corrected bank card picture shown in fig. 6(c) can also be adjusted to fit the input size required by the detection model.
As shown in fig. 6(b), when the image is detected without rotation correction, some text regions may be missed. The text lines in the image shown in fig. 6(c) tend to be horizontal, and detecting the rotation-corrected image greatly reduces the difficulty of text detection, so the detected text regions are more accurate and essentially nothing is missed.
And step S408, inputting the image after the rotation correction into a detection model, and performing secondary detection to obtain a new quadrangular text box.
For the output of the secondary detection, the NMS algorithm may again be used to fuse and filter the candidate text regions it contains; for multiple candidate text regions corresponding to the same text, the new quadrilateral text box corresponding to the most suitable candidate text region is finally determined.
Compared with the quadrilateral text boxes obtained by the first detection, the new quadrilateral text boxes obtained in this way are more accurate and have a lower probability of missed detection.
Referring back to fig. 4, after step S304, in step S305 the quadrilateral text boxes from the secondary detection are edge-corrected and edge-expanded, and perspective-transformed into rectangular picture blocks.
Fig. 7a shows a schematic diagram of a quadrilateral text box obtained through secondary detection. As shown in fig. 7a, such a text box may have left and right edges that are not perpendicular to its upper and lower edges. Fig. 7b illustrates the effect of directly applying perspective transformation to the quadrilateral text box obtained through secondary detection: as shown in fig. 7b, directly perspective-transforming the text box of fig. 7a produces an obliquely deformed text line, and inputting such a picture block into the recognition model degrades the recognition result.
Therefore, the quadrilateral text box from the secondary detection can be edge-corrected so that its left and right edges are perpendicular to its upper and lower edges. FIG. 8 illustrates a flowchart for performing edge correction according to an embodiment of the present application. As shown in fig. 8, the flow of edge correction includes:
step S501, determining the quadrangular text box detected by the detection model.
Wherein the quadrangular text box can be shown as a white quadrangular text box in fig. 7a, for example.
In step S502, the horizontal slope k of the quadrangle is calculated.
The horizontal slope may be obtained according to the inclination degree of the upper and lower sides of the quadrangular text box on the horizontal line, for example, the horizontal slope may be represented by a tangent of an angle between the line segment AD (or the line segment BC) and the horizontal line in fig. 7 c.
And step S503, drawing a vertical line through the midpoints of the left side and the right side of the quadrilateral frame.
And step S504, calculating the intersection points of the vertical line and the upper and lower sides of the quadrangular frame, and determining a new quadrangular frame.
In one example, fig. 7c is a schematic diagram illustrating edge correction of a quadrilateral text box obtained through secondary detection. As shown in fig. 7c, for a given quadrilateral text box the horizontal slope k can be obtained from the tilt of the AD side and the BC side. Vertical lines a and b are drawn through the midpoints of the left and right sides (point E and point F shown in fig. 7c), and the intersections of these vertical lines with the upper and lower sides (point A, point B, point C and point D shown in fig. 7c) are computed; the quadrilateral formed by these intersections is the edge-corrected quadrilateral text box (quadrilateral ABCD shown in fig. 7c). Fig. 7d is a schematic diagram illustrating the effect of edge correction followed by perspective transformation on the quadrilateral text box obtained by secondary detection: as shown in fig. 7d, a better text effect is obtained by perspective-transforming the edge-corrected quadrilateral text box.
The method of edge-correcting the quadrilateral text box is not limited to the above; for example, the perpendicular to the upper and lower sides may be drawn through a non-midpoint of the left and right sides (the AB side and the CD side), as long as the left and right sides of the corrected quadrilateral text box are perpendicular to its upper and lower sides. A sketch of the midpoint variant follows.
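A sketch of the midpoint variant (steps S501 to S504) follows, assuming corner order A (top-left), B (bottom-left), C (bottom-right), D (top-right) as in fig. 7c; the corner ordering and the line-intersection formulation are assumptions for illustration.

```python
import numpy as np

def _intersect(p, d, q, e):
    """Intersection of the 2-D lines p + t*d and q + s*e."""
    t, _ = np.linalg.solve(np.array([d, -e]).T, q - p)
    return p + t * d

def correct_edges(quad):
    """Replace the left/right edges of a quadrilateral text box with lines
    through the midpoints of the original sides, perpendicular to the
    tilted top/bottom edges (steps S501-S504)."""
    A, B, C, D = np.asarray(quad, dtype=float)
    u = (D - A) / np.linalg.norm(D - A)   # direction of the top edge (slope k)
    v = np.array([-u[1], u[0]])           # perpendicular direction
    mid_left, mid_right = (A + B) / 2, (C + D) / 2
    return np.array([
        _intersect(mid_left, v, A, u),       # new A: left line x top edge
        _intersect(mid_left, v, B, C - B),   # new B: left line x bottom edge
        _intersect(mid_right, v, B, C - B),  # new C: right line x bottom edge
        _intersect(mid_right, v, A, u),      # new D: right line x top edge
    ])
```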
In one possible case, the text box detected by the detection model fits the characters too tightly, causing characters to be cut off or missed. Fig. 7e shows a schematic diagram of a quadrilateral text box that fits too tightly. As shown in fig. 7e, the first digit 6 and the last digit 0 of the bank card number are not completely contained in the quadrilateral text box; inputting such a text box into the recognition model may leave the recognition model unable to recognize the incompletely contained digits 6 and 0.
Therefore, the quadrilateral text box can be edge-expanded so that the expanded text box completely contains the corresponding text. FIG. 9 shows a flowchart for performing edge expansion according to an embodiment of the present application. As shown in fig. 9, the process of edge expansion includes:
and step S601, determining the quadrangular text box after edge correction.
In step S602, the height h of the quadrangular text box is calculated.
Step S603, the upper and lower sides of the quadrangular text box are extended by h/2 (or other multiples of the height) respectively according to the horizontal slope k.
Step S604, verifying the validity of the edge correction and the edge extension.
The validity of the quadrilateral box after edge correction and edge expansion is checked; it can be determined whether the corrected quadrilateral text box contains content outside the picture.
In one example, fig. 7f is a schematic diagram of performing edge expansion on the quadrilateral text box obtained by secondary detection. As shown in fig. 7f, the height h and the horizontal slope k of the text box may be calculated, and the upper and lower sides of the quadrilateral text box ABCD extended by h/2 along the horizontal slope k to obtain the quadrilateral text box A1B1C1D1 (the extended portion is shown as a dashed box in fig. 7f); the quadrilateral text box A1B1C1D1 contains more content of the picture than the quadrilateral text box ABCD.
The method may further include performing a validity check on the modified quadrilateral text box, for example checking whether any of the vertex coordinates A1, B1, C1, D1 of the expanded quadrilateral text box lies outside the picture. A sketch combining edge expansion and this check follows.
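A sketch of steps S601 to S604 is given below, using the same assumed A, B, C, D corner order; clipping to the picture is used here as a stand-in for the validity check, whereas the patent only describes verifying whether vertices fall outside the picture.

```python
import numpy as np

def expand_edges(quad, img_w, img_h, factor=0.5):
    """Extend the top and bottom edges of an edge-corrected text box by
    factor * height on each side along the horizontal slope k."""
    A, B, C, D = np.asarray(quad, dtype=float)
    h = np.linalg.norm(B - A)              # height of the text box (step S602)
    u = (D - A) / np.linalg.norm(D - A)    # unit vector along the slope k
    shift = factor * h * u                 # h/2 by default (step S603)
    expanded = np.array([A - shift, B - shift, C + shift, D + shift])
    # Step S604 stand-in: keep every vertex inside the picture.
    expanded[:, 0] = np.clip(expanded[:, 0], 0, img_w - 1)
    expanded[:, 1] = np.clip(expanded[:, 1], 0, img_h - 1)
    return expanded
```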
After the text box has been edge-corrected and edge-expanded, perspective transformation can be applied to it: the quadrilateral corresponding to the text box is transformed into a rectangle, yielding a rectangular picture block.
The quadrangle corresponding to the text box may include any quadrangle such as a parallelogram and a trapezoid.
For example, the perspective transformation may be to project a quadrilateral text box in a picture to a new view plane, resulting in a rectangular picture block.
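A perspective-transformation sketch follows; the output height of 32 pixels is an assumption (a common recognizer input height), not a value from the patent.

```python
import cv2
import numpy as np

def rectify(img, quad, out_h=32):
    """Perspective-transform a quadrilateral text box into a rectangular
    picture block for the recognition model (A, B, C, D corner order
    as in the sketches above)."""
    A, B, C, D = np.asarray(quad, dtype=np.float32)
    # Preserve the box's aspect ratio when choosing the output width.
    out_w = max(1, int(round(np.linalg.norm(D - A) * out_h
                             / np.linalg.norm(B - A))))
    src = np.float32([A, D, C, B])  # TL, TR, BR, BL
    dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
    return cv2.warpPerspective(img, cv2.getPerspectiveTransform(src, dst),
                               (out_w, out_h))
```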
Referring back to fig. 4, after the edge expansion and correction of step S305 are completed, in step S306 the picture block is input into the recognition model to obtain the recognized text content.
The recognition model may be obtained by performing fine-tuning training on the targeted positive and negative samples, and the training process may refer to step S101 in fig. 2.
Step S307, determining the attribute of the text content according to the recognized text content and the corresponding coordinates.
Here, in key-value matching, the 'key' may represent the attribute of the text content and the 'value' the text content itself.
For example, when recognizing driver's license information, if the text content 'zhang san' is recognized and its coordinates are confirmed to lie in a predetermined specific area of the driver's license (for example, the area indicating the name), the attribute of the text content may be determined as 'name' according to a preset correspondence between specific areas and attributes; the attribute may be a preset custom attribute. A sketch of this region-based matching follows.
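A minimal sketch of this key-value matching is shown below; the field-region table is hypothetical, since the patent only states that specific areas and their attributes are preset per card type.

```python
def match_attribute(box_center, field_regions):
    """Return the attribute ('key') whose preset area contains the center
    of a recognized text box, or None if no area matches.

    field_regions: {attribute: (x1, y1, x2, y2)} -- a hypothetical layout
    table for one card type.
    """
    cx, cy = box_center
    for attribute, (x1, y1, x2, y2) in field_regions.items():
        if x1 <= cx <= x2 and y1 <= cy <= y2:
            return attribute
    return None

# Hypothetical usage for a driver's license layout:
# match_attribute((120, 40), {"name": (80, 20, 300, 60),
#                             "sex":  (80, 70, 300, 110)})
```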
And step S308, performing confidence degree filtering and re-recognition according to the attribute of the text content to obtain a re-recognition result.
In complex scenarios, characters in bank card numbers and identity card numbers, for example, are easily confused, such as the following pairs:
(0) and (O, o, D), (1) and (I, l, |), (5) and (S, s), (6) and (b), (8) and (B), (9) and (q, g), (7) and (T), (4) and (+, H), etc.
In card text recognition scenarios, the text on a card generally does not cover all text categories (for example, a bank card number contains only digit categories). Confidence filtering and re-recognition can therefore be performed, according to the attribute of the text content, on the connectionist temporal classification (CTC) sequence output inside the recognition model, where the CTC sequence represents an intermediate sequence formed when the CTC algorithm is used to solve the character alignment problem. Fig. 10 illustrates confidence filtering based on the CTC sequence according to an embodiment of the application. As shown in fig. 10, when recognizing a bank card number, step S305 yields a picture block as shown at the far left of fig. 10, and the picture block is input into the recognition model. For a given '0' in the middle of the card number, the CTC sequence shown in the figure contains 7357 categories and their corresponding confidences, where 7357 represents the total number of output categories and each confidence represents the probability that the text content is that category. According to step S307, the attribute of the text content corresponding to the picture block may be determined as "bank card number". Without filtering the interference items, the 'D' with confidence 0.9 shown in the figure would be determined as the final text content, causing a misrecognition. Because the bank card number belongs to the digit category, the 7357 categories can be filtered according to the attribute of the text content (bank card number): 7346 interference items of non-digit categories are screened out, and the remaining 10 categories are retained as digit attention items, forming a new CTC sequence. Re-recognition is then performed on the filtered CTC sequence of the 10 remaining attention items; for example, according to the confidences corresponding to the 10 attention items, the item with the highest confidence is determined as the re-recognition result, and the '0' with confidence 0.8 shown in the figure is output as the final recognition result. The recognition precision is thereby further improved. A sketch of such class-restricted decoding follows.
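The sketch below illustrates the idea with a greedy CTC decode restricted to an allowed character set; the real recognition model's alphabet, blank index and decoding strategy are not disclosed in the patent, so those details are assumptions.

```python
import numpy as np

def filter_ctc_decode(ctc_probs, alphabet, allowed="0123456789", blank=0):
    """Greedy CTC decode keeping only the classes allowed by the text
    attribute (e.g. digits for a bank card number).

    ctc_probs: T x C per-timestep class probabilities from the recognizer;
    alphabet: the C class labels, with alphabet[blank] the CTC blank.
    """
    keep = [blank] + [i for i, ch in enumerate(alphabet) if ch in allowed]
    probs = ctc_probs[:, keep]        # screen out the interference classes
    best = probs.argmax(axis=1)       # greedy path over the remaining classes
    out, prev = [], None
    for t in best:
        idx = keep[int(t)]
        if idx != blank and idx != prev:  # collapse repeats, drop blanks
            out.append(alphabet[idx])
        prev = idx if idx != blank else None
    return "".join(out)
```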
Step S309, checking and correcting the re-identification result according to the attribute and the checking rule of the text content to obtain the final text content.
The verification rule may be, for example, the encoding rule of a bank card. For example, when recognizing a bank card number, with the attribute of the text content determined as the card number, it is checked whether the re-recognized content is numeric, and according to the bank card encoding rule it can be determined whether, for example, the starting digits of the card number correspond to the issuer of the bank card. When the starting digits of the card number do not match the bank card's issuer, for example when one digit differs, that digit can be corrected according to the starting digits corresponding to the issuer.
For example, the Luhn algorithm may be used, according to the verification rule, to verify a bank card number. A sketch of the Luhn check follows.
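A sketch of the Luhn check is given below; using it to validate a recognized card number is straightforward, while the correction step described above would additionally need an issuer prefix table, which the patent does not disclose.

```python
def luhn_valid(card_number: str) -> bool:
    """Luhn checksum for a bank card number: from the rightmost digit,
    double every second digit, subtract 9 from doubled values over 9,
    and require the total to be divisible by 10."""
    digits = [int(ch) for ch in card_number if ch.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d = d * 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

# e.g. luhn_valid("79927398713") -> True (a standard Luhn test number)
```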
FIG. 11 shows a flow diagram of a method of card text recognition according to an embodiment of the present application. The method is applied to a terminal device, and as shown in fig. 11, the method includes:
step S1101, acquiring a first to-be-identified image of a card;
step S1102, detecting the first image to be recognized to obtain at least one first text region, where the first text region represents a region where a text in the first image to be recognized is located;
step S1103, performing rotation correction on the first image to be recognized according to the first text region to obtain a second image to be recognized;
step S1104, detecting the second image to be recognized to obtain at least one second text region, where the second text region represents a region where a text in the second image to be recognized is located;
step S1105, recognizing the image in the second text region to obtain a first target text corresponding to the second text region.
According to the embodiment of the application, the first image to be recognized of the card is acquired; the first image to be recognized is detected to obtain at least one first text region; the first image to be recognized is rotation-corrected according to the first text region to obtain a second image to be recognized; the second image to be recognized is detected to obtain at least one second text region; and the image in the second text region is recognized to obtain a first target text corresponding to the second text region. In this way, inputting a card image and outputting the text content of the card can be realized. The angle of the card text can be adjusted to a better state through rotation correction, so that the text content of a tilted card image can be recognized; missed detection of text regions can be avoided through secondary detection, which improves the detection accuracy for text regions of tilted card images and thus the recognition accuracy of the text content. Because the method is used on the terminal device, responses during detection and recognition are fast, power consumption can be reduced, the problems of network disconnection and slow response caused by invoking the method on the cloud side are avoided, and user experience is improved.
The first image to be recognized may include a card picture uploaded to the terminal device by the user; the card picture may include a card picture directly taken by the user, a picture uploaded to and stored in the terminal device by the user, a card picture obtained by scanning with the terminal device, and the like. The first text region and the second text region may refer to the quadrangular text box described above and may represent the region where any text in the image to be recognized is located; the number of second text regions may be greater than or equal to the number of first text regions. The first target text may include text on the card, which may be determined based on the purpose of card recognition; for example, if the card number of a bank card is to be recognized, the first target text may include the card number of the bank card, e.g., "6214 XXXX 73469446".
In a possible implementation manner, the rotation correction is performed on the first image to be recognized according to the first text region, where the rotation correction may be performed on the first image to be recognized according to an average inclination angle of at least one text region with the longest length in the first text region (e.g., the first image to be recognized is rotated by the average inclination angle), so as to obtain the second image to be recognized.
According to the embodiment of the application, the first image to be recognized is subjected to rotation correction through the average inclination angles of the first text regions with the longest length, so that the correction accuracy can be improved, and the detection accuracy is further improved.
The number of the at least one text region may be selected according to the requirement, which is not limited in the present application.
Step S1103 may refer to steps S404 to S407 shown in fig. 5.
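For reference, a minimal sketch of the rotation correction in step S1103, assuming OpenCV, boxes given as 4x2 corner arrays ordered top-left, top-right, bottom-right, bottom-left, and k longest regions (k is a free parameter, per the note above):

import cv2
import numpy as np

def rotate_by_longest_regions(image, boxes, k=3):
    # Average the inclination angles of the k longest detected text boxes.
    boxes = sorted(boxes, key=lambda b: np.linalg.norm(b[1] - b[0]),
                   reverse=True)[:k]
    angles = [np.degrees(np.arctan2(b[1][1] - b[0][1], b[1][0] - b[0][0]))
              for b in boxes]
    avg_angle = float(np.mean(angles))
    # Rotate the whole image by the average inclination angle (sign
    # convention follows OpenCV image coordinates).
    h, w = image.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), avg_angle, 1.0)
    return cv2.warpAffine(image, m, (w, h))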
Fig. 12 shows a flow chart of a method of card text recognition according to an embodiment of the application. As shown in fig. 12, detecting the second image to be recognized to obtain at least one second text region includes:
step S1201, determining the horizontal slope of the second text region;
step S1202, correcting a left edge and a right edge of the second text region according to the horizontal slope of the second text region, where after correction, the left edge and the right edge of the second text region are respectively perpendicular to an upper edge and/or a lower edge of the second text region.
According to the embodiment of the application, the horizontal slope of the second text region is determined, and the left and right edges of the second text region are corrected according to the horizontal slope so that, after correction, the left and right edges are respectively perpendicular to the upper edge and/or the lower edge of the second text region. This can prevent character deformation after perspective transformation caused by an irregular text region, making the text in the text region easier to recognize and further improving the accuracy of card text recognition.
Wherein, the horizontal slope may represent a degree of inclination of the second text region, the left edge and the right edge of the second text region may refer to the left edge and the right edge of the quadrangular text box described above in fig. 7c, the upper edge and the lower edge of the second text region may refer to the upper edge and the lower edge of the quadrangular text box described above, and after rectification, the second text region may represent a new text region in the second image to be recognized.
Step S1201 may refer to step S502 shown in fig. 8, and step S1202 may refer to steps S503 to S504 shown in fig. 8.
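A geometric sketch of the left/right edge correction in steps S1201 to S1202 (the corner ordering top-left, top-right, bottom-right, bottom-left is an assumption):

import numpy as np

def rectify_side_edges(box):
    # Make the left/right edges perpendicular to the top edge.
    tl, tr, br, bl = np.asarray(box, dtype=float)
    u = (tr - tl) / np.linalg.norm(tr - tl)   # unit vector along the top edge
    n = np.array([-u[1], u[0]])               # unit normal: corrected side direction
    h = np.dot(bl - tl, n)                    # signed height along the normal
    # Drop the bottom corners onto the normals through the top corners.
    return np.stack([tl, tr, tr + h * n, tl + h * n])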
FIG. 13 shows a flow diagram of a method of card text recognition according to an embodiment of the present application. As shown in fig. 13, detecting the second image to be recognized to obtain at least one second text region includes:
step S1301, determining a horizontal slope and a height of the second text region;
step S1302, according to the horizontal slope of the second text region, respectively extending the upper edge and the lower edge of the second text region to two sides, where the extended distance is determined according to the height.
According to the embodiment of the application, the horizontal slope and the height of the second text region are determined, and the upper and lower edges of the second text region are extended to both sides according to the horizontal slope. This can prevent characters from being cut off or missed because the text region fits too tightly around the text, making the text in the text region easier to recognize and further improving the accuracy of card text recognition.
The extension may expand the range of the second image to be recognized that the second text region can cover; for example, the extended second text region may include text not originally contained in the region. The extension distance may be preset, for example, 1/2 of the height or another multiple, which is not limited in the present application.
The method for calculating the height in step S1301 may refer to step S602 in fig. 9, and step S1302 may refer to step S603 in fig. 9.
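A sketch of the edge extension in steps S1301 to S1302, assuming the same corner ordering; the ratio 0.5 mirrors the 1/2-height example above and is not fixed by the method:

import numpy as np

def extend_text_box(box, ratio=0.5):
    tl, tr, br, bl = np.asarray(box, dtype=float)
    u = (tr - tl) / np.linalg.norm(tr - tl)   # direction of the upper edge
    height = np.linalg.norm(bl - tl)          # approximate box height
    pad = ratio * height                      # extension distance per side
    # Extend the upper and lower edges to both sides along the slope (the
    # lower edge is assumed parallel to the upper edge in this sketch).
    return np.stack([tl - pad * u, tr + pad * u, br + pad * u, bl - pad * u])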
Fig. 14 shows a flow chart of a method of card text recognition according to an embodiment of the application. As shown in fig. 14, recognizing the image in the second text region to obtain a first target text corresponding to the second text region includes:
step S1401, identifying the image in the second text region to obtain a second target text corresponding to the second text region;
step S1402, determining the attribute of the second target text;
step S1403, according to the attribute of the second target text, filtering the connectionist temporal classification (CTC) sequence corresponding to the second text region to obtain a filtered CTC sequence;
step S1404, obtaining the first target text according to the category and the corresponding confidence in the filtered CTC sequence.
According to the embodiment of the application, the image in the second text region is recognized to obtain a second target text corresponding to the second text region; the attribute of the second target text is determined; the connectionist temporal classification (CTC) sequence corresponding to the second text region is filtered according to that attribute to obtain a filtered CTC sequence; and the first target text is obtained according to the categories and corresponding confidence levels in the filtered CTC sequence, so that interference items can be screened out and the possibility of misrecognition reduced.
The first target text may represent the target text obtained after CTC-sequence filtering and re-recognition are performed on the basis of the second target text. The attribute of the second target text may be user-defined, or may be obtained through the second target text (for example, according to the correspondence between the position of the second target text on the card and the attribute). The confidence level may represent the probability that a category in the corresponding CTC sequence is the first target text. The filtered CTC sequence may contain only the categories corresponding to the attribute of the second target text and their confidence levels.
For example, in the case where the attribute of the second target text is "bank card number", non-numeric items in the CTC sequence may be filtered out, and only numeric items are retained, reducing the possibility of misidentification.
Step S1402 may refer to step S307 of fig. 4, and an example of a CTC sequence may refer to the CTC sequence containing 7357 categories and corresponding confidence levels as shown in fig. 10.
In one possible implementation, the method further includes: training the detection model and the recognition model according to the training sample to obtain a trained detection model and a trained recognition model; the training sample comprises a positive sample and a negative sample, the positive sample corresponds to the negative sample in a one-to-one mode, the positive sample comprises a card picture sample, the card picture sample comprises a text region, the negative sample comprises a card picture sample obtained after the text region is covered, the detection model after training is used for detecting the first text region and the second text region, and the recognition model after training is used for recognizing the first target text and the second target text.
According to the embodiment of the application, the trained detection model and the trained recognition model are obtained by training the detection model and the recognition model on the training samples; the trained detection model is used to detect the first text region and the second text region, and the trained recognition model is used to recognize the first target text and the second target text, which can reduce the occupation of the ROM in the terminal device and prevent the terminal device from stalling. The training samples include positive samples and negative samples in one-to-one correspondence: a positive sample includes a card picture sample containing a text region, and the corresponding negative sample includes the card picture sample obtained after covering the text region. This enables adversarial learning between the positive and negative samples, enhances the detection model's discrimination between text regions and non-text regions, improves recognition accuracy against complex backgrounds, and improves the robustness and precision of the model.
The recognition model and the detection model may include general OCR models; the type of model is not limited in the present application. The training method may include fine-tuning training, and the manner of training the models may refer to step S101 in fig. 2. The card picture samples may include samples to which random translation, random scaling, random rotation, perspective transformation, blurring, and random aspect-ratio processing have been applied. The manner of covering a card picture sample may include filling the text region with pixels from non-text regions of the card picture.
The manner of generating the negative sample may refer to steps S201 to S205 in fig. 3.
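One possible realization of covering the text regions to build a negative sample (inpainting is used here to fill text pixels from the surrounding non-text background; this is an illustrative choice, not the only pixel-filling method):

import cv2
import numpy as np

def make_negative_sample(card_img, text_boxes):
    # Mask every labeled text region of the positive sample.
    mask = np.zeros(card_img.shape[:2], dtype=np.uint8)
    for box in text_boxes:                     # box: 4x2 array of corners
        cv2.fillPoly(mask, [np.asarray(box, dtype=np.int32)], 255)
    # Fill the masked text pixels with content synthesized from nearby
    # non-text pixels, yielding the covered card picture sample.
    return cv2.inpaint(card_img, mask, 5, cv2.INPAINT_TELEA)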
The steps of training the detection model and the recognition model according to the training sample to obtain the trained detection model and the trained recognition model can be executed on the terminal device or the server, and the terminal device can download at least one of the trained detection model and the trained recognition model from the server.
Fig. 15 shows a block diagram of a card text recognition apparatus according to an embodiment of the present application. As shown in fig. 15, the apparatus is for a terminal device, and includes:
the acquiring module 1501 is configured to acquire a first to-be-identified image of a card;
a first detecting module 1502, configured to detect the first image to be recognized to obtain at least one first text region, where the first text region represents a region where a text in the first image to be recognized is located;
the correction module 1503 is configured to perform rotation correction on the first image to be recognized according to the first text region to obtain a second image to be recognized;
a second detecting module 1504, configured to detect the second image to be recognized to obtain at least one second text region, where the second text region represents a region where a text in the second image to be recognized is located;
the identifying module 1505 is configured to identify the image in the second text region to obtain a first target text corresponding to the second text region.
According to the embodiment of the application, the first image to be recognized of the card is acquired; the first image to be recognized is detected to obtain at least one first text region; the first image to be recognized is rotation-corrected according to the first text region to obtain a second image to be recognized; the second image to be recognized is detected to obtain at least one second text region; and the image in the second text region is recognized to obtain a first target text corresponding to the second text region. In this way, inputting a card image and outputting the text content of the card can be realized. The angle of the card text can be adjusted to a better state through rotation correction, so that the text content of a tilted card image can be recognized; missed detection of text regions can be avoided through secondary detection, which improves the detection accuracy for text regions of tilted card images and thus the recognition accuracy of the text content. Because the apparatus is used in the terminal device, responses during detection and recognition are fast, power consumption can be reduced, the problems of network disconnection and slow response caused by invoking the method on the cloud side are avoided, and user experience is improved.
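For reference, a minimal sketch of how the five modules above might compose into an on-device pipeline, reusing the helper sketches from earlier sections; detector, recognizer, and the perspective-crop helper are illustrative assumptions, not the exact structure of the apparatus:

import cv2
import numpy as np

def crop(image, box):
    # Perspective-crop the quadrilateral box (tl, tr, br, bl) to a rectangle.
    box = np.asarray(box, dtype=np.float32)
    w = max(int(np.linalg.norm(box[1] - box[0])), 1)
    h = max(int(np.linalg.norm(box[3] - box[0])), 1)
    dst = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype=np.float32)
    return cv2.warpPerspective(image, cv2.getPerspectiveTransform(box, dst), (w, h))

class CardTextRecognizer:
    def __init__(self, detector, recognizer):
        self.detector = detector        # image -> list of 4x2 corner boxes
        self.recognizer = recognizer    # image crop -> text string

    def recognize(self, first_image):
        first_regions = self.detector(first_image)                  # first detection
        second_image = rotate_by_longest_regions(first_image, first_regions)
        second_regions = self.detector(second_image)                # second detection
        texts = []
        for box in second_regions:
            box = rectify_side_edges(extend_text_box(box))          # per-region fixes
            texts.append(self.recognizer(crop(second_image, box)))
        return texts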
In one possible implementation, the orthotic module comprises: and the first correction submodule is used for performing rotation correction on the first image to be recognized according to the average inclination angle of at least one text region with the longest length in the first text region to obtain the second image to be recognized.
According to the embodiment of the application, the first image to be recognized is subjected to rotation correction through the average inclination angles of the first text regions with the longest length, so that the correction accuracy can be improved, and the detection accuracy is further improved.
In one possible implementation, the second detection module includes: a first determination module to determine a horizontal slope of the second text region; and the second correction submodule is used for correcting the left edge and the right edge of the second text region according to the horizontal slope of the second text region, wherein after correction, the left edge and the right edge of the second text region are respectively vertical to the upper edge and/or the lower edge of the second text region.
According to the embodiment of the application, the horizontal slope of the second text region is determined, and the left and right edges of the second text region are corrected according to the horizontal slope so that, after correction, the left and right edges are respectively perpendicular to the upper edge and/or the lower edge of the second text region. This can prevent character deformation after perspective transformation caused by an irregular text region, making the text in the text region easier to recognize and further improving the accuracy of card text recognition.
In one possible implementation, the second detection module includes: a second determination module to determine a horizontal slope and a height of the second text region; and the extension module is used for respectively extending the upper edge and the lower edge of the second text region to two sides according to the horizontal slope of the second text region, and the extension distance is determined according to the height.
According to the embodiment of the application, the horizontal slope and the height of the second text region are determined, and the upper and lower edges of the second text region are extended to both sides according to the horizontal slope. This can prevent characters from being cut off or missed because the text region fits too tightly around the text, making the text in the text region easier to recognize and further improving the accuracy of card text recognition.
In one possible implementation, the identification module includes: the recognition sub-module is used for recognizing the image in the second text region to obtain a second target text corresponding to the second text region; a third determining module, configured to determine an attribute of the second target text; the filtering module is used for filtering the connectionist temporal classification (CTC) sequence corresponding to the second text region according to the attribute of the second target text to obtain a filtered CTC sequence; and the fourth determining module is used for obtaining the first target text according to the categories and corresponding confidence levels in the filtered CTC sequence.
According to the embodiment of the application, the image in the second text region is recognized to obtain a second target text corresponding to the second text region; the attribute of the second target text is determined; the connectionist temporal classification (CTC) sequence corresponding to the second text region is filtered according to that attribute to obtain a filtered CTC sequence; and the first target text is obtained according to the categories and corresponding confidence levels in the filtered CTC sequence, so that interference items can be screened out and the possibility of misrecognition reduced.
In one possible implementation, the apparatus further includes: the training module is used for training the detection model and the recognition model according to the training sample to obtain a trained detection model and a trained recognition model; the training sample comprises a positive sample and a negative sample, the positive sample corresponds to the negative sample in a one-to-one mode, the positive sample comprises a card picture sample, the card picture sample comprises a text region, the negative sample comprises the card picture sample obtained after the text region is covered, the detection model after the training is used for detecting the first text region and the second text region, and the recognition model after the training is used for recognizing the first target text and the second target text.
According to the embodiment of the application, the detection model and the recognition model are trained on the training samples to obtain the trained detection model and the trained recognition model; the trained detection model is used to detect the first text region and the second text region, and the trained recognition model is used to recognize the first target text and the second target text, which can reduce the occupation of the ROM in the terminal device and prevent the terminal device from stalling. The training samples include positive samples and negative samples in one-to-one correspondence: a positive sample includes a card picture sample containing a text region, and the corresponding negative sample includes the card picture sample obtained after covering the text region. This enables adversarial learning between the positive and negative samples, enhances the detection model's discrimination between text regions and non-text regions, improves recognition accuracy against complex backgrounds, and improves the robustness and precision of the model.
Fig. 16 shows a schematic structural diagram of a terminal device according to an embodiment of the present application. Taking the terminal device as a mobile phone as an example, fig. 16 shows a schematic structural diagram of the mobile phone 200.
The mobile phone 200 may include a processor 210, an external memory interface 220, an internal memory 221, a USB interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 251, a wireless communication module 252, an audio module 270, a speaker 270A, a receiver 270B, a microphone 270C, an earphone interface 270D, a sensor module 280, keys 290, a motor 291, an indicator 292, a camera 293, a display 294, a SIM card interface 295, and the like. The sensor module 280 may include a gyroscope sensor 280A, an acceleration sensor 280B, a proximity light sensor 280G, a fingerprint sensor 280H, and a touch sensor 280K (of course, the mobile phone 200 may further include other sensors, such as a temperature sensor, a pressure sensor, a distance sensor, a magnetic sensor, an ambient light sensor, an air pressure sensor, a bone conduction sensor, and the like, which are not shown in the figure).
It is to be understood that the illustrated structure of the embodiment of the present application does not specifically limit the mobile phone 200. In other embodiments of the present application, handset 200 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 210 may include one or more processing units, such as: the processor 210 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a Neural-Network Processing Unit (NPU), among others. Wherein, the different processing units may be independent devices or may be integrated in one or more processors. Wherein the controller can be the neural center and the command center of the cell phone 200. The controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution.
A memory may also be provided in processor 210 for storing instructions and data. In some embodiments, the memory in the processor 210 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 210. If the processor 210 needs to use the instruction or data again, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 210, thereby increasing the efficiency of the system.
The processor 210 may run the card text recognition method provided in the embodiment of the present application, so as to acquire a first image to be recognized of a card; detect the first image to be recognized to obtain at least one first text region, where the first text region represents the region where a text in the first image to be recognized is located; perform rotation correction on the first image to be recognized according to the first text region to obtain a second image to be recognized; detect the second image to be recognized to obtain at least one second text region, where the second text region represents the region where a text in the second image to be recognized is located; and recognize the image in the second text region to obtain a first target text corresponding to the second text region. The recognition accuracy of the text content is thereby improved, responses during detection and recognition are fast, power consumption is reduced, the problems of network disconnection and slow response caused by invoking the method on the cloud side are avoided, and user experience is improved. The processor 210 may include different devices; for example, when a CPU and a GPU are integrated, the CPU and the GPU may cooperate to execute the card text recognition method provided in the embodiment of the present application, for example, with part of the algorithms executed by the CPU and another part by the GPU, to obtain higher processing efficiency.
The display screen 294 is used to display images, video, and the like. The display screen 294 includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), and the like. In some embodiments, the cell phone 200 may include 1 or N display screens 294, N being a positive integer greater than 1. The display screen 294 may be used to display information input by or provided to the user as well as various graphical user interfaces (GUIs). For example, the display 294 may display a photograph, video, web page, or file. As another example, the display 294 may display a graphical user interface that includes a status bar, a hidden navigation bar, a time and weather widget, and application icons such as a browser icon. The status bar includes the name of the operator (e.g., China Mobile), the mobile network (e.g., 4G), the time, and the remaining power. The navigation bar includes a back key icon, a home key icon, and a forward key icon. Further, it will be appreciated that in some embodiments, a Bluetooth icon, a Wi-Fi icon, a peripheral icon, and the like may also be included in the status bar. It will also be appreciated that in other embodiments, a Dock bar may be included in the graphical user interface, and the Dock bar may include commonly used application icons. When the processor 210 detects a touch event of a user's finger (or a stylus, etc.) on an application icon, in response to the touch event, a user interface of the application corresponding to the icon is opened and displayed on the display 294.
In the embodiment of the present application, the display screen 294 may be an integrated flexible display screen, or a spliced display screen formed by two rigid screens and a flexible screen located between the two rigid screens may be adopted.
After the processor 210 runs the card text recognition method provided by the embodiment of the present application, the terminal device may establish a connection with another terminal device through the antenna 1, the antenna 2, or the USB interface, and control the display screen 294 to display a corresponding graphical user interface according to the card text recognition method provided by the embodiment of the present application.
The cameras 293 (front camera or rear camera, or one camera that can serve as both) are used for capturing still images or video. In general, the camera 293 may include a lens group and an image sensor; the lens group includes a plurality of lenses (convex or concave) for collecting the optical signal reflected by the object to be photographed and transferring the collected optical signal to the image sensor. The image sensor generates an original image of the object to be photographed according to the optical signal.
Internal memory 221 may be used to store computer-executable program code, including instructions. The processor 210 executes various functional applications and data processing of the cellular phone 200 by executing instructions stored in the internal memory 221. The internal memory 221 may include a program storage area and a data storage area. Wherein the storage program area may store an operating system, codes of application programs (such as a camera application, a WeChat application, etc.), and the like. The data storage area can store data (such as images, videos and the like acquired by a camera application) and the like created in the use process of the mobile phone 200.
The internal memory 221 may further store one or more computer programs 1310 corresponding to the card text recognition method provided by the embodiment of the present application. The one or more computer programs 1310 are stored in the memory 221 and configured to be executed by the one or more processors 210, and the one or more computer programs 1310 include instructions that may be used to perform the steps in the embodiments corresponding to fig. 2 to 5, fig. 8 to 9, and fig. 11 to 14. The computer programs 1310 may include an acquisition module 1501, a first detection module 1502, a rectification module 1503, a second detection module 1504, and an identification module 1505. The acquisition module 1501 is configured to acquire a first image to be recognized of a card; the first detection module 1502 is configured to detect the first image to be recognized to obtain at least one first text region, where the first text region represents the region where a text in the first image to be recognized is located; the rectification module 1503 is configured to perform rotation correction on the first image to be recognized according to the first text region to obtain a second image to be recognized; the second detection module 1504 is configured to detect the second image to be recognized to obtain at least one second text region, where the second text region represents the region where a text in the second image to be recognized is located; and the identification module 1505 is configured to recognize the image in the second text region to obtain a first target text corresponding to the second text region. When the code of the card text recognition method stored in the internal memory 221 is executed by the processor 210, the processor 210 may control the display screen to display the recognition result.
In addition, the internal memory 221 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (UFS), and the like.
Of course, the code of the card text recognition method provided in the embodiment of the present application may also be stored in the external memory. In this case, the processor 210 may execute the code of the card text recognition method stored in the external memory through the external memory interface 220.
The function of the sensor module 280 is described below.
The gyro sensor 280A can be used to determine the motion attitude of the cellular phone 200. In some embodiments, the angular velocity of the cell phone 200 about three axes (i.e., x, y, and z axes) may be determined by the gyro sensor 280A. I.e., the gyro sensor 280A may be used to detect the current state of motion of the handset 200, such as shaking or standing still.
When the display screen in the embodiment of the present application is a foldable screen, the gyro sensor 280A may be used to detect a folding or unfolding operation acting on the display screen 294. The gyro sensor 280A may report the detected folding operation or unfolding operation as an event to the processor 210 to determine the folded state or unfolded state of the display screen 294.
The acceleration sensor 280B can detect the magnitude of acceleration of the mobile phone 200 in various directions (typically three axes); that is, the acceleration sensor 280B may also be used to detect the current motion state of the mobile phone 200, such as shaking or being still. When the display screen in the embodiment of the present application is a foldable screen, the acceleration sensor 280B may be used to detect a folding or unfolding operation acting on the display screen 294. The acceleration sensor 280B may report the detected folding or unfolding operation as an event to the processor 210 to determine the folded or unfolded state of the display screen 294.
The proximity light sensor 280G may include, for example, a Light Emitting Diode (LED) and a light detector, such as a photodiode. The light emitting diode may be an infrared light emitting diode. The mobile phone emits infrared light outwards through the light emitting diode. The handset uses a photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the handset. When insufficient reflected light is detected, the handset can determine that there are no objects near the handset. When the display screen in the embodiment of the present application is a foldable display screen, the proximity light sensor 280G may be disposed on a first screen of the foldable display screen 294, and the proximity light sensor 280G may detect a folding angle or an unfolding angle of the first screen and the second screen according to an optical path difference of an infrared signal.
The gyro sensor 280A (or the acceleration sensor 280B) may transmit the detected motion state information (such as an angular velocity) to the processor 210. The processor 210 determines whether the mobile phone 200 is currently in the hand-held state or the tripod state (for example, when the angular velocity is not 0, it indicates that the mobile phone 200 is in the hand-held state) based on the motion state information.
The fingerprint sensor 280H is used to collect a fingerprint. The mobile phone 200 can utilize the collected fingerprint characteristics to realize fingerprint unlocking, access to an application lock, fingerprint photographing, fingerprint incoming call answering and the like.
The touch sensor 280K is also referred to as a "touch panel". The touch sensor 280K may be disposed on the display screen 294, and the touch sensor 280K and the display screen 294 form a touch screen, which is also called a "touch screen". The touch sensor 280K is used to detect a touch operation applied thereto or nearby. The touch sensor can communicate the detected touch operation to the application processor to determine the touch event type. Visual output associated with the touch operations may be provided through the display screen 294. In other embodiments, the touch sensor 280K can be disposed on the surface of the mobile phone 200, different from the position of the display screen 294.
Illustratively, the display screen 294 of the cell phone 200 displays a home interface that includes icons for a plurality of applications (e.g., a camera application, a WeChat application, etc.). The user clicks an icon of the camera application in the main interface through the touch sensor 280K, and the processor 210 is triggered to start the camera application and open the camera 293. Display screen 294 displays an interface, such as a viewfinder interface, for a camera application.
The wireless communication function of the mobile phone 200 can be realized by the antenna 1, the antenna 2, the mobile communication module 251, the wireless communication module 252, the modem processor, the baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the handset 200 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 251 can provide a solution including 2G/3G/4G/5G wireless communication applied to the handset 200. The mobile communication module 251 may include at least one filter, switch, power amplifier, Low Noise Amplifier (LNA), etc. The mobile communication module 251 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit the electromagnetic waves to the modem for demodulation. The mobile communication module 251 can also amplify the signal modulated by the modem processor, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave. In some embodiments, at least some of the functional modules of the mobile communication module 251 may be disposed in the processor 210. In some embodiments, at least some of the functional modules of the mobile communication module 251 may be disposed in the same device as at least some of the modules of the processor 210. In this embodiment, the mobile communication module 251 may also be used for information interaction with other terminal devices.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating a low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low frequency baseband signal to a baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 270A, the receiver 270B, etc.) or displays an image or video through the display screen 294. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 251 or other functional modules, independent of the processor 210.
The wireless communication module 252 may provide solutions for wireless communication applied to the mobile phone 200, including Wireless Local Area Networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), Bluetooth (BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and the like. The wireless communication module 252 may be one or more devices that integrate at least one communication processing module. The wireless communication module 252 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on electromagnetic wave signals, and transmits the processed signals to the processor 210. The wireless communication module 252 may also receive a signal to be transmitted from the processor 210, perform frequency modulation on the signal, amplify the signal, and convert the signal into electromagnetic waves via the antenna 2 to radiate the electromagnetic waves. In this embodiment, the wireless communication module 252 is configured to transmit data with other terminal devices under the control of the processor 210.
In addition, the mobile phone 200 can implement an audio function through the audio module 270, the speaker 270A, the receiver 270B, the microphone 270C, the earphone interface 270D, and the application processor. Such as music playing, recording, etc. The handset 200 may receive key 290 inputs, generating key signal inputs relating to user settings and function control of the handset 200. The cell phone 200 can generate a vibration alert (e.g., an incoming call vibration alert) using the motor 291. The indicator 292 in the mobile phone 200 may be an indicator light, and may be used to indicate a charging status, a power change, or an indication message, a missed call, a notification, or the like. The SIM card interface 295 in the handset 200 is used to connect a SIM card. The SIM card can be attached to and detached from the mobile phone 200 by being inserted into the SIM card interface 295 or being pulled out from the SIM card interface 295.
It should be understood that in practical applications, the mobile phone 200 may include more or less components than those shown in fig. 16, and the embodiment of the present application is not limited thereto. The illustrated handset 200 is merely an example, and the handset 200 may have more or fewer components than shown in the figures, may combine two or more components, or may have a different configuration of components. The various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
The software system of the terminal device may adopt a hierarchical architecture, an event-driven architecture, a micro-core architecture, a micro-service architecture, or a cloud architecture. The embodiment of the application takes an Android system with a layered architecture as an example, and exemplarily illustrates a software structure of a terminal device.
Fig. 17 is a software configuration block diagram of a terminal device according to an embodiment of the present application.
The layered architecture divides the software into several layers, each layer having a clear role and division of labor. The layers communicate with each other through a software interface. In some embodiments, the Android system is divided into four layers, an application layer, an application framework layer, an Android runtime (Android runtime) and system library, and a kernel layer from top to bottom.
The application layer may include a series of application packages.
As shown in fig. 17, the application package may include phone, camera, gallery, calendar, call, map, navigation, WLAN, bluetooth, music, video, short message, etc. applications.
The application framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer includes a number of predefined functions.
As shown in FIG. 17, the application framework layers may include a window manager, content provider, view system, phone manager, resource manager, notification manager, and the like.
The window manager is used for managing window programs. The window manager can obtain the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like.
The content provider is used to store and retrieve data and make it accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phone books, etc.
The view system includes visual controls such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
The telephone manager is used for providing a communication function of the terminal equipment. Such as management of call status (including on, off, etc.).
The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and the like.
The notification manager enables the application to display notification information in the status bar, can be used to convey notification-type messages, can disappear automatically after a brief dwell, and does not require user interaction. Such as a notification manager used to notify download completion, message alerts, etc. The notification manager may also be a notification that appears in the form of a chart or scroll bar text at the top status bar of the system, such as a notification of a background running application, or a notification that appears on the screen in the form of a dialog window. For example, text information is prompted in the status bar, a prompt tone is given, the terminal device vibrates, and an indicator light flashes.
The Android Runtime comprises a core library and a virtual machine. The Android runtime is responsible for scheduling and managing an Android system.
The core library comprises two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object life-cycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include a plurality of functional modules. For example: surface managers (surface managers), Media Libraries (Media Libraries), three-dimensional graphics processing Libraries (e.g., OpenGL ES), 2D graphics engines (e.g., SGL), and the like.
The surface manager is used to manage the display subsystem and provide fusion of 2D and 3D layers for multiple applications.
The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files and the like. The media library may support a variety of audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The inner core layer at least comprises a display driver, a camera driver, an audio driver and a sensor driver.
The embodiment of the application provides a card text recognition device, includes: a processor and a memory for storing processor-executable instructions; wherein the processor is configured to implement the above method when executing the instructions.
Embodiments of the present application provide a non-transitory computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
Embodiments of the present application provide a computer program product comprising computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when run in a processor of an electronic device, the processor in the electronic device performs the above method.
The computer-readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an erasable Programmable Read-Only Memory (EPROM or flash Memory), a Static Random Access Memory (SRAM), a portable Compact Disc Read-Only Memory (CD-ROM), a Digital Versatile Disc (DVD), a Memory stick, a floppy disk, a mechanical coding device, a punch card or an in-groove protrusion structure, for example, having instructions stored thereon, and any suitable combination of the foregoing.
The computer readable program instructions or code described herein may be downloaded to the respective computing/processing device from a computer readable storage medium, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present application may be assembler instructions, instruction set architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuits, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), can be personalized by utilizing state information of the computer-readable program instructions, and the electronic circuitry can execute the computer-readable program instructions to implement aspects of the present application.
Various aspects of the present application are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
It is also noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by hardware (e.g., an electronic Circuit or an ASIC (Application Specific Integrated Circuit)) for performing the corresponding functions or acts, or combinations of hardware and software, such as firmware.
While the invention has been described in connection with various embodiments, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a review of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the word "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
The foregoing description of the embodiments of the present application has been presented for purposes of illustration and description and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A card text recognition method, applied to a terminal device, the method comprising:
acquiring a first image to be recognized of a card;
detecting the first image to be recognized to obtain at least one first text region, wherein the first text region represents a region where text in the first image to be recognized is located;
performing rotation correction on the first image to be recognized according to the first text region to obtain a second image to be recognized;
detecting the second image to be recognized to obtain at least one second text region, wherein the second text region represents a region where text in the second image to be recognized is located; and
recognizing the image in the second text region to obtain a first target text corresponding to the second text region.
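By way of illustration only and not as part of the claims, the two-pass flow of claim 1 can be sketched as follows. Here `detect_fn` and `recognize_fn` are hypothetical stand-ins for the trained detection and recognition models, and the region format `((x, y, w, h), angle_deg)` with integer coordinates is an assumption made for the sketch:

```python
import cv2
import numpy as np

def recognize_card_text(image, detect_fn, recognize_fn):
    # Pass 1: detect the first text regions on the raw capture.
    first_regions = detect_fn(image)

    # Estimate a global skew from the detected regions and rotate the
    # whole image so that the text lines become roughly horizontal.
    skew = float(np.mean([angle for _, angle in first_regions]))
    h, w = image.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), skew, 1.0)
    second_image = cv2.warpAffine(image, m, (w, h))

    # Pass 2: detect the second text regions on the corrected image,
    # then recognize the image cropped from each region.
    results = []
    for (x, y, bw, bh), _ in detect_fn(second_image):
        crop = second_image[y:y + bh, x:x + bw]
        results.append(recognize_fn(crop))
    return results
```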
2. The card text recognition method according to claim 1, wherein performing rotation correction on the first image to be recognized according to the first text region to obtain the second image to be recognized comprises:
performing rotation correction on the first image to be recognized according to an average inclination angle of one or more longest text regions among the at least one first text region, to obtain the second image to be recognized.
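A minimal sketch of the averaging step in claim 2, assuming each detected region is reported as a `(length, angle_deg)` pair and that the three longest regions are used (the claim itself does not fix the count):

```python
import numpy as np

def average_skew_of_longest(regions, k=3):
    # Sort the detected first text regions by length, keep the k
    # longest, and average their inclination angles in degrees.
    longest = sorted(regions, key=lambda r: r[0], reverse=True)[:k]
    return float(np.mean([angle for _, angle in longest]))
```

The returned angle can then drive a whole-image rotation such as cv2.warpAffine, as in the sketch under claim 1.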
3. The card text recognition method according to claim 1 or 2, wherein detecting the second image to be recognized to obtain the at least one second text region comprises:
determining a horizontal slope of the second text region; and
correcting a left edge and a right edge of the second text region according to the horizontal slope of the second text region, wherein after the correction, the left edge and the right edge of the second text region are perpendicular to the upper edge and/or the lower edge of the second text region, respectively.
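One possible reading of this correction, sketched with corners given as `(x, y)` pairs in the order top-left, top-right, bottom-left, bottom-right (a layout assumed here, not stated in the claim): keep the top edge, and rebuild the side edges along the direction perpendicular to it.

```python
import numpy as np

def square_side_edges(tl, tr, bl, br):
    tl, tr, bl, br = (np.asarray(p, dtype=float) for p in (tl, tr, bl, br))
    top = tr - tl
    top /= np.linalg.norm(top)            # unit vector along the top edge
    normal = np.array([-top[1], top[0]])  # unit vector perpendicular to it
    # Keep each side's height (its projection onto the normal) and drop
    # the bottom corners along the normal, so the left and right edges
    # end up perpendicular to the top edge.
    new_bl = tl + normal * float(np.dot(bl - tl, normal))
    new_br = tr + normal * float(np.dot(br - tr, normal))
    return tl, tr, new_bl, new_br
```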
4. The card text recognition method according to any one of claims 1 to 3, wherein detecting the second image to be recognized to obtain the at least one second text region comprises:
determining a horizontal slope and a height of the second text region; and
extending the upper edge and the lower edge of the second text region toward both sides according to the horizontal slope of the second text region, wherein the extension distance is determined according to the height.
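An illustrative sketch of this extension, with the same assumed corner layout as above; the padding ratio of 0.3 is an arbitrary example value, since the claim only states that the distance is derived from the height:

```python
import numpy as np

def extend_region_edges(tl, tr, bl, br, ratio=0.3):
    tl, tr, bl, br = (np.asarray(p, dtype=float) for p in (tl, tr, bl, br))
    direction = tr - tl
    direction /= np.linalg.norm(direction)  # unit vector along the slope
    height = float(np.linalg.norm(bl - tl))
    pad = ratio * height * direction
    # Push the four corners outward along the horizontal slope so the
    # region gains margin on both sides, catching clipped characters.
    return tl - pad, tr + pad, bl - pad, br + pad
```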
5. The card text recognition method according to any one of claims 1 to 4, wherein recognizing the image in the second text region to obtain the first target text corresponding to the second text region comprises:
recognizing the image in the second text region to obtain a second target text corresponding to the second text region;
determining an attribute of the second target text;
filtering a connectionist temporal classification (CTC) sequence corresponding to the second text region according to the attribute of the second target text to obtain a filtered CTC sequence; and
obtaining the first target text according to the categories and corresponding confidences in the filtered CTC sequence.
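A sketch of one way to realize this filtering, assuming the recognition model emits a `(timesteps, num_classes)` CTC score matrix, `charset` maps class indices to characters, and the attribute determines an `allowed` character set (e.g., digits for a card-number field). The tensor layout and the greedy decoding are assumptions of the sketch:

```python
import numpy as np

def filter_and_decode_ctc(logits, charset, allowed, blank=0):
    # Softmax over classes at each timestep.
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)

    # Filtering step: zero out every class the attribute rules out,
    # keeping the blank class so that CTC decoding still works.
    keep = np.array([i == blank or charset[i] in allowed
                     for i in range(probs.shape[1])])
    probs[:, ~keep] = 0.0

    # Greedy CTC decoding: best class per step, collapse repeats,
    # drop blanks, and collect the surviving confidences.
    best, conf = probs.argmax(axis=1), probs.max(axis=1)
    chars, scores, prev = [], [], blank
    for cls, c in zip(best, conf):
        if cls != blank and cls != prev:
            chars.append(charset[cls])
            scores.append(float(c))
        prev = cls
    return "".join(chars), float(np.mean(scores)) if scores else 0.0
```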
6. The card text recognition method according to any one of claims 1 to 5, further comprising:
training a detection model and a recognition model according to training samples to obtain a trained detection model and a trained recognition model;
wherein the training samples comprise positive samples and negative samples in one-to-one correspondence, each positive sample comprises a card image sample containing a text region, and each corresponding negative sample comprises the card image sample obtained after the text region is covered; and
the trained detection model is used for detecting the first text region and the second text region, and the trained recognition model is used for recognizing the first target text and the second target text.
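A minimal sketch of how such a sample pair might be built, assuming axis-aligned text boxes in `(x, y, w, h)` form and masking by filling with a flat color (the claim only requires that the text region be covered):

```python
import numpy as np

def make_sample_pair(card_image: np.ndarray, text_boxes, fill=255):
    # Positive sample: the card image itself, with its text regions.
    positive = card_image
    # Negative sample: the same image with every text region covered.
    negative = card_image.copy()
    for x, y, w, h in text_boxes:
        negative[y:y + h, x:x + w] = fill
    return positive, negative
```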
7. A card text recognition apparatus, applied to a terminal device, the apparatus comprising:
an acquisition module, configured to acquire a first image to be recognized of a card;
a first detection module, configured to detect the first image to be recognized to obtain at least one first text region, wherein the first text region represents a region where text in the first image to be recognized is located;
a correction module, configured to perform rotation correction on the first image to be recognized according to the first text region to obtain a second image to be recognized;
a second detection module, configured to detect the second image to be recognized to obtain at least one second text region, wherein the second text region represents a region where text in the second image to be recognized is located; and
a recognition module, configured to recognize the image in the second text region to obtain a first target text corresponding to the second text region.
8. A card text recognition apparatus, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to implement the method of any one of claims 1 to 6 when executing the instructions.
9. A non-transitory computer readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method of any one of claims 1 to 6.
10. A computer program product comprising computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, wherein when the computer readable code runs in an electronic device, a processor in the electronic device performs the method of any one of claims 1 to 6.
CN202110213987.5A 2021-02-25 2021-02-25 Card text recognition method, device and storage medium Pending CN115050037A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110213987.5A CN115050037A (en) 2021-02-25 2021-02-25 Card text recognition method, device and storage medium
PCT/CN2022/077038 WO2022179471A1 (en) 2021-02-25 2022-02-21 Card text recognition method and apparatus, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110213987.5A CN115050037A (en) 2021-02-25 2021-02-25 Card text recognition method, device and storage medium

Publications (1)

Publication Number Publication Date
CN115050037A (en) 2022-09-13

Family

ID=83048674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110213987.5A Pending CN115050037A (en) 2021-02-25 2021-02-25 Card text recognition method, device and storage medium

Country Status (2)

Country Link
CN (1) CN115050037A (en)
WO (1) WO2022179471A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117975466B (en) * 2024-04-01 2024-06-25 山东浪潮科学研究院有限公司 Universal scene card identification system based on layout analysis

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8867828B2 (en) * 2011-03-04 2014-10-21 Qualcomm Incorporated Text region detection system and method
CN108694393A (en) * 2018-05-30 2018-10-23 深圳市思迪信息技术股份有限公司 Certificate image text region extraction method based on deep convolution
CN110136069B (en) * 2019-05-07 2023-05-16 语联网(武汉)信息技术有限公司 Text image correction method and device and electronic equipment
CN110647882A (en) * 2019-09-20 2020-01-03 上海眼控科技股份有限公司 Image correction method, device, equipment and storage medium
CN111444908B (en) * 2020-03-25 2024-02-02 腾讯科技(深圳)有限公司 Image recognition method, device, terminal and storage medium

Also Published As

Publication number Publication date
WO2022179471A1 (en) 2022-09-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination