WO2022198969A1 - *** text recognition method, apparatus, device, and computer-readable storage medium - Google Patents

*** text recognition method, apparatus, device, and computer-readable storage medium

Info

Publication number
WO2022198969A1
Authority
WO
WIPO (PCT)
Prior art keywords
seal
text
detection frame
seal detection
feature data
Prior art date
Application number
PCT/CN2021/121297
Other languages
English (en)
French (fr)
Inventor
张正夫
梁鼎
刘旭
Original Assignee
深圳市商汤科技有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市商汤科技有限公司
Publication of WO2022198969A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40: Document-oriented image-based pattern recognition
    • G06V30/41: Analysis of document content
    • G06V30/414: Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/146: Aligning or centring of the image pick-up or image-field
    • G06V30/1475: Inclination or skew detection or correction of characters or of image to be recognised
    • G06V30/1478: Inclination or skew detection or correction of characters or of image to be recognised of characters or characters lines

Definitions

  • The present disclosure relates to computer vision technology and, in particular, to a seal text recognition method, apparatus, device, and computer-readable storage medium.
  • Seal text detection and recognition has a wide range of application scenarios in the field of Optical Character Recognition (OCR). Because the text in a seal is usually arranged irregularly, seals appear in various orientations, and text lines may have large inclination angles, recognizing seal text is difficult.
  • the embodiments of the present disclosure provide a seal text recognition scheme.
  • A seal text recognition method, comprising: acquiring a seal detection result of an object to be processed, wherein the seal detection result includes a seal detection frame and angle information of the seal detection frame; obtaining an upright seal detection frame and a seal image in the seal detection frame based on the seal detection result; obtaining the outline of at least one text line in the seal image; cropping out at least one text line image according to the outline of the at least one text line; and performing text recognition on each of the text line images to obtain a text recognition result.
  • A seal detection network is used to perform seal detection on the object to be processed to obtain the seal detection result, wherein the classifier of the seal detection network includes a branch for identifying the angle of the seal detection frame.
  • The method further includes: training the seal detection network using a sample set including a plurality of sample seal images, wherein the sample seal images are annotated with the real detection frame of the seal and its direction information, and the direction information is determined according to the angle between the text reading direction of the text in the seal and the horizontal direction.
  • Obtaining an upright seal detection frame and a seal image in the seal detection frame based on the seal detection result includes: performing direction correction on the seal detection frame according to the angle information of the seal detection frame, and cropping the seal image from the direction-corrected seal detection frame.
  • Performing direction correction on the seal detection frame according to the angle information of the seal detection frame includes: performing a rotation transformation on the seal detection frame according to the angle information, so that the direction of the first side of the seal detection frame becomes the horizontal direction, wherein the first side is the side matching the reading direction of the characters in the seal detection frame.
  • Obtaining the outline of at least one text line in the seal image includes: performing text detection on the seal image to obtain the mask of at least one text line contained in the seal image and the cluster center of each text line; dividing connected domains according to the cluster center of each text line to obtain the initial region of each text line; expanding the initial region of each text line within the range of the masks of the text lines to determine the pixel region corresponding to each text line; and fitting polygons to the pixel regions corresponding to the text lines to obtain the outline of each text line.
  • The method further includes: for each text line image that is cropped out, setting the pixels outside the outline of the text line to zero.
  • Performing text recognition on each of the text line images to obtain a text recognition result includes: performing feature extraction on the text line image through a convolutional neural network to obtain first feature data of multiple channels, wherein the first feature data includes a height dimension and a width dimension; compressing the height dimension and the width dimension of the first feature data into one dimension to obtain second feature data; using an attention mechanism to perform weighted summation on the feature vectors of each position of the second feature data to obtain third feature data with the same value in each dimension as the first feature data; and obtaining a text recognition result according to the third feature data.
  • Using an attention mechanism to perform weighted summation on the feature vectors of each position of the second feature data to obtain the third feature data includes: determining the similarity between the feature vector of a first position of the second feature data and the feature vector of each position of the second feature data, where the first position is any position in the second feature data; performing weighted summation on the feature vectors of each position of the second feature data according to the similarity to obtain the updated feature vector of the first position; and obtaining the third feature data according to the updated feature vectors of each position in the second feature data.
  • A seal text recognition device, the device including: a first acquisition unit for acquiring a seal detection result of an object to be processed, wherein the seal detection result includes a seal detection frame and angle information of the seal detection frame; a second acquisition unit for obtaining an upright seal detection frame and a seal image in the seal detection frame based on the seal detection result; a third acquisition unit for obtaining the outline of at least one text line in the seal image and cropping out at least one text line image according to the outline of the at least one text line; and a recognition unit for performing text recognition on each of the text line images to obtain a text recognition result.
  • The first acquisition unit is specifically configured to perform seal detection on the object to be processed through a seal detection network to obtain the seal detection result, wherein the classifier of the seal detection network includes a branch for identifying the angle of the seal detection frame.
  • The apparatus further includes a training unit for training the seal detection network using a sample set including a plurality of sample seal images, wherein the sample seal images are annotated with the real detection frame of the seal and its direction information, and the direction information is determined according to the angle between the text reading direction of the text in the seal and the horizontal direction.
  • The second acquisition unit is specifically configured to perform direction correction on the seal detection frame according to the angle information of the seal detection frame, and crop the seal image from the direction-corrected seal detection frame.
  • When used to correct the direction of the seal detection frame according to the angle information of the seal detection frame, the second acquisition unit is specifically configured to perform a rotation transformation on the seal detection frame according to the angle information, so that the direction of the first side of the seal detection frame becomes the horizontal direction, wherein the first side is the side matching the reading direction of the text in the seal detection frame.
  • When used to obtain the outline of at least one text line in the seal image, the third acquisition unit is specifically configured to: perform text detection on the seal image to obtain the mask of at least one text line contained in the seal image and the cluster center of each text line; divide connected domains according to the cluster center of each text line to obtain the initial region of each text line; expand the initial region of each text line within the range of the masks of the text lines to determine the pixel region corresponding to each text line; and fit polygons to the pixel regions corresponding to the text lines to obtain the outline of each text line.
  • The apparatus further includes a zero-setting unit, configured to set the pixels outside the outline of the text line to zero for each text line image that is cropped out.
  • The recognition unit is specifically configured to: perform feature extraction on the text line image through a convolutional neural network to obtain first feature data of multiple channels, wherein the first feature data includes a height dimension and a width dimension; compress the height dimension and the width dimension of the first feature data into one dimension to obtain second feature data; use an attention mechanism to perform weighted summation on the feature vectors of each position of the second feature data to obtain third feature data with the same value in each dimension as the first feature data; and obtain a text recognition result according to the third feature data.
  • When used to perform weighted summation on the feature vectors of each position of the second feature data using the attention mechanism to obtain the third feature data, the recognition unit is specifically configured to: determine the similarity between the feature vector of a first position of the second feature data and the feature vector of each position of the second feature data, where the first position is any position in the second feature data; perform weighted summation on the feature vectors of each position of the second feature data according to the similarity to obtain the updated feature vector of the first position; and obtain the third feature data according to the updated feature vectors of each position in the second feature data.
  • An electronic device comprising a memory and a processor, the memory being configured to store computer instructions executable on the processor, and the processor being configured to implement the seal text recognition method according to any embodiment of the present disclosure when executing the computer instructions.
  • a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, implements the seal text recognition method according to any embodiment of the present disclosure.
  • The embodiments of the present disclosure utilize the angle information in the seal detection result to obtain an upright seal detection frame and a seal image in the seal detection frame, crop out at least one text line image according to the outline of at least one text line in the seal image, and perform text recognition on each text line image to obtain a text recognition result. In this way, the accuracy of text detection and subsequent text recognition is improved; and by performing text recognition on text line images cropped according to the outlines of text lines in the seal image, interference from non-seal areas is eliminated, further improving the accuracy and efficiency of seal text recognition.
  • FIG. 1 is a flowchart of a seal text recognition method proposed by at least one embodiment of the present disclosure;
  • FIG. 2 is a schematic diagram of the annotation of a circular seal image;
  • FIG. 3 is a flowchart of a method for obtaining the outline of a text line in a seal image provided by at least one embodiment of the present disclosure;
  • FIG. 4A is a schematic diagram of an oval seal image;
  • FIG. 4B is a mask diagram of three text lines in the seal image of FIG. 4A;
  • FIG. 4C shows the cluster centers of three text lines in the seal image of FIG. 4A;
  • FIG. 4D is a schematic diagram of the expanded initial areas corresponding to the text lines in the seal image of FIG. 4A;
  • FIG. 5 is a flowchart of a method for text recognition on a text line image proposed by at least one embodiment of the present disclosure;
  • FIG. 6 is a schematic diagram of a seal text recognition device proposed by at least one embodiment of the present disclosure;
  • FIG. 7 is a schematic structural diagram of an electronic device provided by at least one embodiment of the present disclosure.
  • Embodiments of the present disclosure may be applied to computer systems/servers that are operable with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well-known computing systems, environments, and/or configurations suitable for use with computer systems/servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor systems, set-top boxes, programmable consumer electronics, network personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments including any of the foregoing.
  • FIG. 1 is a flowchart of a method for recognizing seal text according to at least one embodiment of the present disclosure. As shown in FIG. 1 , the method includes steps 101 to 105 .
  • In step 101, a seal detection result of the object to be processed is acquired, wherein the seal detection result includes at least a seal detection frame and angle information of the seal detection frame.
  • The object to be processed is an image, an electronic document, or the like that may include a seal, such as a passport image, an invoice image, an order image, or a PDF document of a design drawing.
  • the obtained seal detection result includes a seal detection frame, and the seal detection frame has angle information.
  • the seal detection frame may be the circumscribed rectangle detection frame of the seal
  • The angle information may include the rotation angle of the circumscribed rectangle detection frame relative to the horizontal direction, where the horizontal direction is the horizontal direction in the image coordinate system of the object to be processed.
  • In step 102, based on the seal detection result, an upright seal detection frame and the seal image in the seal detection frame are obtained.
  • An upright seal detection frame means that the direction of the first side of the seal detection frame is the horizontal direction, where the horizontal direction is the horizontal direction in the image coordinate system of the seal image.
  • the first side is the side matching the reading direction of the text in the stamp detection frame.
  • the character reading direction may be determined according to the orientation of the characters in the text in the seal detection frame.
  • the direction parallel to the center line of the text is the text reading direction.
  • the overall reading direction of the text is determined according to the overall shape of the text.
  • the direction parallel to the text connecting lines at both ends of the text can be determined as the overall reading direction of the text.
  • the reading direction of the text in the seal detection frame may be determined in an appropriate manner according to the actual situation, which is not limited in the present disclosure.
  • the direction correction of the seal detection frame can be performed to obtain the direction-corrected seal detection frame.
  • Direction correction may be performed, according to the angle information of the seal detection frame, on the object to be processed containing the seal, such as a bill, so that the seal detection frame in the direction-corrected object is upright, and the seal image may then be cropped from the upright seal detection frame.
  • The seal image cropped from the upright seal detection frame is also upright, and may be referred to as an upright seal image hereinafter.
  • Alternatively, the image corresponding to the seal detection frame in the object to be processed may first be cropped out, and orientation correction may then be performed on the cropped image according to the angle information of the seal detection frame, in order to obtain the upright seal detection frame and the seal image in the seal detection frame.
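The rotation transformation described above can be sketched as rotating the frame's corners back by the detected angle. A minimal NumPy sketch, assuming the angle is measured in degrees from the horizontal axis and the rotation is about the frame's center (conventions the patent leaves open):

```python
import numpy as np

def upright_box(corners, angle_deg, center=None):
    """Rotate the four corners of a seal detection frame by -angle_deg
    around its center so that its first side becomes horizontal.
    `corners` is a (4, 2) array of (x, y) image coordinates; `angle_deg`
    is the frame's rotation relative to the horizontal direction."""
    corners = np.asarray(corners, dtype=float)
    if center is None:
        center = corners.mean(axis=0)
    theta = np.deg2rad(-angle_deg)          # rotate back to horizontal
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return (corners - center) @ rot.T + center

# A square frame tilted by 30 degrees becomes axis-aligned again.
box = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0]])
theta = np.deg2rad(30.0)
r = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
tilted = (box - box.mean(axis=0)) @ r.T + box.mean(axis=0)
restored = upright_box(tilted, 30.0)
```

In practice the same rotation would also be applied to the image pixels (for example via an affine warp) so that the cropped seal image itself is upright.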
  • In step 103, the outline of at least one text line in the seal image is acquired.
  • In step 104, at least one text line image is cropped according to the outline of the at least one text line.
  • The seal image may include multiple text lines, which may be arranged in straight lines, oblique lines, polylines, or curves.
  • A text line image may be cropped according to the outline of a single text line, or a text line image covering multiple text lines may be cropped according to the minimum circumscribed rectangle of the outlines of those text lines.
  • In step 105, text recognition is performed on the text line image to obtain a text recognition result.
  • In the embodiments of the present disclosure, an upright seal detection frame and a seal image in the seal detection frame are obtained based on the angle information of the seal detection frame, at least one text line image is cropped according to the outline of at least one text line in the seal image, and text recognition is performed on the text line images to obtain a text recognition result.
  • The seal detection frame can be rotated and transformed according to the angle information of the seal detection frame, so that the direction of the first side of the seal detection frame becomes the horizontal direction, thereby realizing orientation correction of the seal detection frame.
  • the first side is the side matching the character reading direction in the seal detection frame
  • the horizontal direction is the horizontal direction in the image coordinate system of the seal image.
  • seal detection may be performed on the object to be processed through a seal detection network to obtain a seal detection result.
  • the seal detection frame in the seal detection result may have angle information, coordinate information and classification information.
  • The seal detection network can be implemented by, for example, an RCNN (Region-based Convolutional Neural Network) or a Faster RCNN (Faster Region-based Convolutional Neural Network).
  • the seal detection network may be trained using a set of sample seal images.
  • the sample seal images in the sample seal image set are marked with the real detection frame of the included seal, and the marked information includes parameter information of the real detection frame.
  • the parameter information includes coordinates of four vertices of the real detection frame, and angle information between the real detection frame and the horizontal direction.
  • the loss function for training the seal detection network includes the difference between the angle information of the predicted seal detection frame and the angle information of the real detection frame.
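The angle term of the training loss can be sketched as follows. The smooth-L1 form, the normalization, and the weighting are illustrative assumptions: the patent only states that the loss includes the difference between the angle information of the predicted frame and that of the real frame.

```python
import numpy as np

def smooth_l1(x):
    """Elementwise smooth-L1 penalty, a common regression loss."""
    x = np.abs(x)
    return np.where(x < 1.0, 0.5 * x * x, x - 0.5)

def box_angle_loss(pred_corners, gt_corners, pred_angle, gt_angle, w_angle=1.0):
    """Toy detection loss: corner-coordinate regression plus an angle term.
    Angles are in degrees; the difference is wrapped into (-180, 180] so
    that, e.g., 359 degrees and -1 degree count as equal."""
    coord = smooth_l1(np.asarray(pred_corners, float)
                      - np.asarray(gt_corners, float)).mean()
    d = (pred_angle - gt_angle + 180.0) % 360.0 - 180.0
    angle = smooth_l1(np.array([d / 180.0]))[0]
    return coord + w_angle * angle
```

A perfect prediction yields zero loss, and any angle mismatch adds a positive penalty; the real network's loss would also include classification terms not shown here.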
  • Figure 2 shows an example of the annotation of a circular seal.
  • The included angle 203 between the character reading direction 202 of the seal and the horizontal direction 201 in the seal image coordinate system is marked as the angle of the circular seal.
  • For straight text, the direction parallel to the center line of the text is the text reading direction 202.
  • the overall reading direction of the text can be determined according to the shape of the text as a whole.
  • the direction parallel to the character connecting lines at both ends of the text can be determined as the overall reading direction 204 of the characters of the seal.
  • the text reading direction of the circular seal may be determined in an appropriate manner in actual conditions, which is not limited in the present disclosure.
  • angle information of the seal detection frame can be obtained to correct the direction of the seal detection frame, thereby improving the accuracy of seal detection and recognition.
  • Affine transformation may be performed on the seal detection frame in the to-be-processed image according to the angle information of the seal detection frame, so as to obtain a direction-corrected, upright seal detection frame and the seal image in the seal detection frame.
  • The embodiments of the present disclosure obtain an upright seal image through seal detection and an angle regression algorithm, and use it as the input for subsequent seal text detection.
  • Compared with directly using the whole image as the input for seal text detection, this has the following beneficial technical effects:
  • The interference of non-seal image areas on seal text detection is eliminated, improving detection accuracy; by correcting the direction of the seal detection frame, text detection and recognition are performed on an upright seal image, improving detection and recognition accuracy; and the input resolution and input volume of the seal text detection algorithm are reduced, improving the efficiency of seal text detection and recognition.
  • the outline of at least one text line in the stamp image can be obtained by the following method. As shown in FIG. 3 , the method for obtaining the outline of a text line in a seal image may include steps 301 to 304 .
  • In step 301, text detection is performed on the seal image to obtain a mask of at least one text line included in the seal image and a cluster center of each text line.
  • Feature extraction may be performed on the seal image first to obtain feature data of the seal image, and image segmentation may then be performed based on the feature data to obtain the mask of at least one text line contained in the seal image and the cluster center of each text line.
  • the above process of feature extraction and image segmentation based on feature data may be implemented through a text detection network.
  • the text detection network may include a feature extraction backbone network for performing feature extraction on the seal image, and an image segmentation network for performing image segmentation based on the feature data.
  • the feature extraction backbone network may be a convolutional neural network such as ResNet18.
  • the feature data extracted by the feature extraction backbone network may be multi-channel feature data, and the size of the feature data and the number of channels may be determined by the specific structure of the feature extraction backbone network.
  • The image segmentation network can be used to upsample the feature data of each channel and concatenate the upsampled features of each channel to obtain a concatenation result that fuses features of different scales, and then obtain the mask of the text area and the cluster center of each text line from the concatenation result.
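The upsample-and-concatenate step can be illustrated with NumPy. The scales, channel counts, and nearest-neighbour upsampling are assumptions for illustration; the actual network's shapes depend on the backbone:

```python
import numpy as np

def nearest_upsample(feat, factor):
    """Nearest-neighbour upsampling of a (c, h, w) feature map."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_features(feats, out_hw):
    """Upsample per-scale feature maps to a common resolution and
    concatenate them along the channel axis, fusing features of
    different scales as the segmentation network described above does.
    Assumes square maps whose sides divide the output side evenly."""
    ups = []
    for f in feats:
        factor = out_hw[0] // f.shape[1]
        ups.append(nearest_upsample(f, factor))
    return np.concatenate(ups, axis=0)   # fused multi-scale features

# e.g. three hypothetical scales: (64, 8, 8), (32, 16, 16), (16, 32, 32)
feats = [np.ones((64, 8, 8)), np.ones((32, 16, 16)), np.ones((16, 32, 32))]
fused = fuse_features(feats, (32, 32))
```

The mask and cluster-center maps would then be predicted from `fused` by further convolutional layers.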
  • When the seal image contains only one text line, a mask of that text line is obtained; when the seal contains multiple text lines, a mask of multiple text lines is obtained.
  • The resulting mask of multiple text lines, however, may not distinguish the individual mask corresponding to each text line.
  • FIG. 4A shows a schematic diagram of an oval stamp image, which includes a text line 411 arranged in an arc and two text lines 412 and 413 arranged in a straight line.
  • In step 302, connected domains are divided according to the cluster center of each text line to obtain the initial region of each text line.
  • The cluster center map of the text lines is a binary image; that is, the cluster centers of different text lines cannot be distinguished from the map itself. Therefore, in the embodiments of the present disclosure, connected domains are divided according to the cluster centers, yielding an independent, non-overlapping initial region for each text line. Since the cluster centers of different text lines belong to different connected domains, each text line can be distinguished.
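Dividing connected domains over the binary cluster-center map can be sketched with a simple 4-connected flood fill (in practice a library routine such as `scipy.ndimage.label` would typically be used):

```python
import numpy as np

def label_components(binary):
    """4-connected component labelling of a binary cluster-center map.
    Each connected domain receives a distinct positive label, separating
    the cluster centers of different text lines into per-line initial
    regions. Returns (label map, number of components)."""
    binary = np.asarray(binary).astype(bool)
    labels = np.zeros(binary.shape, dtype=int)
    current = 0
    for sy, sx in zip(*np.nonzero(binary)):
        if labels[sy, sx]:
            continue                      # already part of a component
        current += 1
        stack = [(sy, sx)]
        labels[sy, sx] = current
        while stack:                      # flood fill this component
            y, x = stack.pop()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < binary.shape[0] and 0 <= nx < binary.shape[1]
                        and binary[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = current
                    stack.append((ny, nx))
    return labels, current

# two separate cluster-center blobs -> two initial regions
centers = np.array([[1, 1, 0, 0, 0],
                    [1, 0, 0, 1, 1],
                    [0, 0, 0, 1, 0]])
labels, n = label_components(centers)
```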
  • In step 303, within the range of the mask of the at least one text line, the initial region of each text line is expanded to determine the pixel region corresponding to each text line.
  • When the expansion is complete, the expanded region of each text line is taken as the pixel region corresponding to that text line.
  • the pixel regions corresponding to each text line obtained by extending the initial region are also independent. As shown in FIG. 4D , the pixel regions 441 , 442 , and 443 corresponding to different text lines may be represented by different pixel values to distinguish them.
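The constrained expansion can be sketched as iterative one-pixel growth that never leaves the mask. The exact growing procedure is not specified by the patent, so this is an illustrative stand-in in which a pixel is claimed by the first region that reaches it, keeping competing regions disjoint:

```python
import numpy as np

def shift(a, dy, dx):
    """Shift a 2-D array by (dy, dx), filling vacated cells with zero."""
    out = np.zeros_like(a)
    h, w = a.shape
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        a[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out

def expand_regions(labels, mask, max_iters=1000):
    """Grow each labelled initial region one pixel per step (4-connected),
    but only into unclaimed pixels inside the text-line mask, until no
    further growth is possible."""
    labels = np.asarray(labels).copy()
    mask = np.asarray(mask).astype(bool)
    for _ in range(max_iters):
        grown = labels.copy()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            s = shift(labels, dy, dx)
            take = mask & (grown == 0) & (s > 0)
            grown[take] = s[take]
        if (grown == labels).all():
            break                          # expansion has converged
        labels = grown
    return labels

# one mask row with two seed regions at the ends
mask = np.ones((1, 6), dtype=bool)
seeds = np.array([[1, 0, 0, 0, 0, 2]])
regions = expand_regions(seeds, mask)
```

After convergence, every mask pixel belongs to exactly one text line's pixel region, matching the independence property described above.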
  • In step 304, a polygon is used to fit the pixel region corresponding to each text line to obtain the outline of each text line.
  • That is, a polygon is fitted to the pixel region corresponding to each text line obtained in step 303 to obtain the outline of the text line.
  • In the embodiments of the present disclosure, the initial region of each text line is obtained by dividing connected domains according to the cluster center of the text line, the initial region is expanded until it coincides with the mask of the text line to obtain the pixel region corresponding to the text line, and the outline of the text line is then determined by polygon fitting on the obtained pixel region. In this way, an accurate outline can be obtained for each text line, and each text line can be instance-segmented: even when two adjacent text lines are close together or have overlapping areas, the outline of each text line can be detected separately, which helps improve subsequent text recognition.
  • The minimum enclosing rectangle of the outline of the at least one text line may be cropped to obtain a text line image. Since the text line image may contain text adjacent to the text line, setting the pixels outside the outline of the text line to zero before performing text recognition restricts recognition to the text within the outline, eliminating interference from adjacent text and improving recognition accuracy.
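Zeroing the pixels outside the outline is a simple masking operation; a NumPy sketch with a hypothetical cropped patch and outline mask:

```python
import numpy as np

def mask_crop(image, outline_mask):
    """Zero out every pixel of a cropped text line image that falls
    outside the text line's outline, so adjacent text inside the same
    bounding rectangle cannot interfere with recognition."""
    out = image.copy()
    out[~outline_mask.astype(bool)] = 0
    return out

# hypothetical 3x4 crop and an outline mask covering its interior
crop = np.arange(1, 13).reshape(3, 4)
outline = np.array([[0, 1, 1, 0],
                    [1, 1, 1, 1],
                    [0, 1, 1, 0]], dtype=bool)
cleaned = mask_crop(crop, outline)
```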
  • Text recognition can be performed on the text line image by the following method to obtain a text recognition result.
  • the method for text recognition includes steps 501 to 504 .
  • In step 501, feature extraction is performed on the text line image through a convolutional neural network to obtain first feature data of multiple channels.
  • the first feature data includes a height dimension and a width dimension.
  • The number of channels and the size of the first feature data are determined according to the specific structure of the convolutional neural network. For example, the values of the height dimension, width dimension, and channel dimension of the first feature data are h, w, and c, respectively; that is, the shape of the feature map corresponding to the first feature data is h×w×c.
  • In step 502, the height dimension and the width dimension of the first feature data are compressed into one dimension to obtain second feature data.
  • The shape of the feature map corresponding to the second feature data may be represented as (h*w)×c.
  • the second feature data is in the The way of storing in the memory is unchanged from that of the first feature data.
  • an attention mechanism is used to perform a weighted summation process on the feature vector of each position of the second feature data, to obtain third feature data with the same value in each dimension as the first feature data. That is, the feature map corresponding to the third feature data has the same shape as the feature map corresponding to the first feature data.
  • the weighted summation of the feature vectors at each position can be performed in the following manner.
  • first, the similarity between the feature vector at a first position of the second feature data and the feature vectors at each position of the second feature data is determined; the first position may be any position in the second feature data.
  • next, a weighted summation is performed on the feature vectors at each position of the second feature data according to the similarities, to obtain an updated feature vector for the first position.
  • finally, the third feature data is obtained according to the updated feature vectors at the positions of the second feature data. Those skilled in the art should understand that the weighted summation of the feature vectors at each position may also be performed in other manners, which is not limited in the embodiments of the present disclosure.
  • the above process of performing weighted summation on the feature vector at each position may be repeated multiple times to obtain a final weighted summation result.
  • for the obtained weighted summation result, the values of the feature data in the height, width, and channel dimensions can be restored to h, w, and c respectively in height-first order; that is, the shape of the feature map is restored to h × w × c.
  • as in step 502, restoring the values of the feature data in the height, width, and channel dimensions to h, w, and c does not change the way the feature data is stored in memory.
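The claim that compressing h × w × c to (h*w) × c and restoring it leaves the in-memory layout untouched can be checked directly in numpy: for a C-contiguous (height-first) array, both reshapes are views over the same buffer.

```python
import numpy as np

h, w, c = 4, 8, 3
first = np.arange(h * w * c, dtype=np.float32).reshape(h, w, c)

second = first.reshape(h * w, c)     # "compressed" (h*w) x c view
restored = second.reshape(h, w, c)   # shape restored to h x w x c

# Both reshapes are views: no bytes move, only the shape metadata changes.
assert np.shares_memory(first, second)
assert np.shares_memory(first, restored)
assert np.array_equal(first, restored)
```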
  • in step 504, a text recognition result is obtained according to the third feature data.
  • the third feature data may be pooled in the height dimension, and a fully connected layer may be used to determine the character corresponding to each position in the width dimension to obtain a text recognition result.
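Putting steps 501 to 504 together, the shape manipulations above can be sketched as a toy numpy head (shapes and names are illustrative assumptions; a real network would learn attention projections and the classifier weights rather than use raw features):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def recognition_head(first, class_weights):
    """first: (h, w, c) feature map; class_weights: (c, num_classes)."""
    h, w, c = first.shape
    second = first.reshape(h * w, c)          # step 502: (h*w) x c
    sim = second @ second.T / np.sqrt(c)      # similarity of every position pair
    third = softmax(sim, axis=-1) @ second    # step 503: weighted summation
    third = third.reshape(h, w, c)            # restore the h x w x c shape
    pooled = third.mean(axis=0)               # step 504: pool height -> (w, c)
    logits = pooled @ class_weights           # "fully connected" classifier
    return logits.argmax(axis=-1)             # one character id per width position
```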
  • the above text recognition method can be implemented using a text recognition network.
  • the text recognition network may include a convolutional neural network that performs feature extraction on the text line images, a network that implements the attention mechanism, a pooling layer, and a fully connected layer.
  • the text recognition network can be trained using the CTC (Connectionist Temporal Classification) method.
  • by reshaping the features extracted by the convolutional neural network, performing a weighted summation using the attention mechanism, restoring the initial shape, and then classifying through the fully connected layer to obtain the final recognition result, the robustness of the text recognition network in recognizing text of different shapes can be improved.
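When a network is trained with CTC, the per-position class ids are commonly turned into a character sequence by a greedy CTC decode (collapse consecutive repeats, then drop blanks). The patent does not spell this step out, so treat the following as an assumed, standard companion to CTC training:

```python
def ctc_greedy_decode(per_step_ids, blank=0):
    """Collapse consecutive repeated ids, then remove blank symbols."""
    out, prev = [], None
    for idx in per_step_ids:
        if idx != prev and idx != blank:
            out.append(idx)
        prev = idx
    return out

# Example: blanks (0) separate repeated characters so "11" survives.
# ctc_greedy_decode([1, 1, 0, 1, 2, 2, 0, 0, 3]) -> [1, 1, 2, 3]
```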
  • FIG. 6 is a seal text recognition device proposed by at least one embodiment of the present disclosure.
  • the apparatus may include: a first obtaining unit 601, configured to obtain a seal detection result of an object to be processed, wherein the seal detection result includes a seal detection frame and angle information of the seal detection frame; a second obtaining unit 602, configured to obtain, based on the seal detection result, a forward-oriented seal detection frame and a seal image within the seal detection frame;
  • a third obtaining unit 603, configured to obtain the outline of at least one text line in the seal image and crop out at least one text line image according to the outline of the at least one text line;
  • and a recognition unit 604, configured to perform text recognition on each text line image to obtain a text recognition result.
  • the first obtaining unit is specifically configured to: perform seal detection on the object to be processed through a seal detection network to obtain a seal detection result, wherein the classifier of the seal detection network includes a branch for identifying the angle of the seal detection frame.
  • the apparatus further includes a training unit for training the seal detection network using a sample set including a plurality of sample seal images, wherein the sample seal images are annotated with direction information of the real detection frame of the seal, and the direction information is determined according to the angle between the text reading direction of the text in the seal and the horizontal direction.
  • the second obtaining unit is specifically configured to: perform direction correction on the seal detection frame according to the angle information of the seal detection frame, and crop out the seal image from the direction-corrected seal detection frame.
  • when configured to perform direction correction on the seal detection frame according to the angle information of the seal detection frame, the second obtaining unit is specifically configured to: rotate the seal detection frame according to the angle information so that the direction of a first side of the seal detection frame is horizontal, wherein the first side is the side that matches the text reading direction in the seal detection frame.
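Geometrically, the rotation step amounts to applying a 2-D rotation matrix to the frame's corner points so that the first side ends up horizontal. The following is an illustrative sketch (not the patent's code), assuming the annotated angle is measured between the first side and the horizontal axis:

```python
import numpy as np

def rotate_frame_to_horizontal(corners, angle_deg):
    """corners: (4, 2) array of (x, y) points; angle_deg: the annotated
    angle between the frame's first side and the horizontal direction.
    Rotates the frame by -angle_deg about its centre."""
    theta = np.deg2rad(-angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    centre = corners.mean(axis=0)
    return (corners - centre) @ rot.T + centre
```

In practice the same transform would be applied to the image pixels as well (e.g. an affine warp), not only to the frame corners.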
  • when configured to obtain the outline of at least one text line in the seal image, the third obtaining unit is specifically configured to: perform text detection on the seal image to obtain a mask of at least one text line contained in the seal image and the cluster center of each text line; perform connected-domain partitioning according to the cluster center of each text line to obtain an initial region of each text line; expand the initial region of each text line within the range of the masks to determine the pixel area corresponding to each text line; and fit the pixel area corresponding to each text line with a polygon to obtain the outline of each text line.
  • the apparatus further includes a zero-setting unit, configured to set, for each cropped text line image, the pixels outside the outline of the text line to zero.
  • the identifying unit is specifically configured to: perform feature extraction on the text line image through a convolutional neural network to obtain first feature data of multiple channels, wherein the first feature data includes a height dimension and width dimension; compress the height dimension and width dimension of the first feature data into one dimension to obtain the second feature data; use the attention mechanism to perform weighted summation processing on the feature vector of each position of the second feature data , obtain third feature data with the same value in each dimension as the first feature data; and obtain a text recognition result according to the third feature data.
  • when configured to use an attention mechanism to perform weighted summation on the feature vector at each position of the second feature data to obtain the third feature data, the recognition unit is specifically configured to: determine the similarity between the feature vector at a first position of the second feature data and the feature vectors at each position of the second feature data, the first position being any position in the second feature data; perform weighted summation on the feature vectors at each position of the second feature data according to the similarities, to obtain an updated feature vector for the first position; and obtain the third feature data according to the updated feature vectors at the positions of the second feature data.
  • the present disclosure also provides an electronic device; referring to FIG. 7, which shows the structure of the device, the device includes a memory and a processor, the memory being configured to store computer instructions executable on the processor, and the processor being configured to implement the method described in any embodiment of the present disclosure when executing the computer instructions.
  • the present disclosure also provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, implements the method described in any of the embodiments of the present disclosure.
  • one or more embodiments of this specification may be provided as a method, a system, or a computer program product. Accordingly, one or more embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.
  • Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in this specification and their structural equivalents, or in a combination of one or more of them.
  • Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus.
  • Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode and transmit information to a suitable receiver apparatus for execution by the data processing apparatus.
  • the computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of these.
  • the processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output.
  • the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, eg, an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
  • Computers suitable for the execution of a computer program include, for example, general and/or special purpose microprocessors, or any other type of central processing unit.
  • the central processing unit will receive instructions and data from read only memory and/or random access memory.
  • the basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data.
  • a computer will also include, or be operatively coupled to, one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, in order to receive data from them, transfer data to them, or both.
  • however, a computer need not have such devices.
  • a computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name a few.
  • Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
  • the processor and memory may be supplemented by or incorporated in special purpose logic circuitry.


Abstract

A seal text recognition method, apparatus, device, and computer-readable storage medium. The method includes: obtaining a seal detection result of an object to be processed, wherein the seal detection result includes a seal detection frame and angle information of the seal detection frame (101); obtaining, based on the seal detection result, a seal image within a forward-oriented seal detection frame (102); obtaining the outline of at least one text line in the seal image (103), and cropping out a text line image according to the outline of the at least one text line (104); and performing text recognition on the text line image to obtain a text recognition result (105).

Description

Seal text recognition method, apparatus, device, and computer-readable storage medium
Cross-reference to related applications
The present disclosure claims priority to the Chinese patent application No. 202110322157.6, filed on March 25, 2021 and entitled "Seal text recognition method, apparatus, device, and computer-readable storage medium", the entire disclosure of which is incorporated herein by reference.
Technical field
The present disclosure relates to computer vision technology, and in particular to a seal text recognition method, apparatus, device, and computer-readable storage medium.
Background
Seal text detection and recognition has a wide range of application scenarios in the field of optical character recognition (OCR). Since the text in a seal is usually irregularly arranged, seals may be oriented in various directions, and text lines may have large tilt angles, text recognition for seals is rather difficult.
Summary
Embodiments of the present disclosure provide a seal text recognition solution.
According to an aspect of the present disclosure, a seal text recognition method is provided, the method including: obtaining a seal detection result of an object to be processed, wherein the seal detection result includes a seal detection frame and angle information of the seal detection frame; obtaining, based on the seal detection result, a forward-oriented seal detection frame and a seal image within the seal detection frame; obtaining the outline of at least one text line in the seal image; cropping out at least one text line image according to the outline of the at least one text line; and performing text recognition on each text line image to obtain a text recognition result.
In combination with any implementation provided by the present disclosure, seal detection is performed on the object to be processed through a seal detection network to obtain the seal detection result, wherein the classifier of the seal detection network includes a branch for identifying the angle of the seal detection frame.
In combination with any implementation provided by the present disclosure, the method further includes: training the seal detection network using a sample set including a plurality of sample seal images, wherein the sample seal images are annotated with direction information of the real detection frame of the seal, and the direction information is determined according to the angle between the text reading direction of the text in the seal and the horizontal direction.
In combination with any implementation provided by the present disclosure, obtaining, based on the seal detection result, a forward-oriented seal detection frame and the seal image within the seal detection frame includes: performing direction correction on the seal detection frame according to the angle information of the seal detection frame, and cropping out the seal image from the direction-corrected seal detection frame.
In combination with any implementation provided by the present disclosure, performing direction correction on the seal detection frame according to the angle information of the seal detection frame includes: rotating the seal detection frame according to the angle information so that the direction of a first side of the seal detection frame is horizontal, wherein the first side is the side that matches the text reading direction in the seal detection frame.
In combination with any implementation provided by the present disclosure, obtaining the outline of at least one text line in the seal image includes: performing text detection on the seal image to obtain a mask of at least one text line contained in the seal image and the cluster center of each text line; performing connected-domain partitioning according to the cluster center of each text line to obtain an initial region of each text line; expanding the initial region of each text line within the range of the masks of the text lines to determine the pixel area corresponding to each text line; and fitting the pixel area corresponding to each text line with a polygon to obtain the outline of each text line.
In combination with any implementation provided by the present disclosure, the method further includes: for each cropped text line image, setting the pixels outside the outline of the text line to zero.
In combination with any implementation provided by the present disclosure, performing text recognition on each text image to obtain a text recognition result includes: performing feature extraction on the text line image through a convolutional neural network to obtain first feature data of multiple channels, wherein the first feature data includes a height dimension and a width dimension; compressing the height dimension and the width dimension of the first feature data into one dimension to obtain second feature data; performing weighted summation on the feature vector at each position of the second feature data using an attention mechanism, to obtain third feature data whose values in each dimension are the same as those of the first feature data; and obtaining a text recognition result according to the third feature data.
In combination with any implementation provided by the present disclosure, using the attention mechanism to perform weighted summation on the feature vector at each position of the second feature data to obtain the third feature data includes: determining the similarity between the feature vector at a first position of the second feature data and the feature vectors at each position of the second feature data, the first position being any position in the second feature data; performing weighted summation on the feature vectors at each position of the second feature data according to the similarities, to obtain an updated feature vector for the first position; and obtaining the third feature data according to the updated feature vectors at the positions of the second feature data.
According to an aspect of the present disclosure, a seal text recognition apparatus is provided, the apparatus including: a first obtaining unit, configured to obtain a seal detection result of an object to be processed, wherein the seal detection result includes a seal detection frame and angle information of the seal detection frame; a second obtaining unit, configured to obtain, based on the seal detection result, a forward-oriented seal detection frame and a seal image within the seal detection frame; a third obtaining unit, configured to obtain the outline of at least one text line in the seal image and crop out at least one text line image according to the outline of the at least one text line; and a recognition unit, configured to perform text recognition on each text line image to obtain a text recognition result.
In combination with any implementation provided by the present disclosure, the first obtaining unit is specifically configured to: perform seal detection on the object to be processed through a seal detection network to obtain a seal detection result, wherein the classifier of the seal detection network includes a branch for identifying the angle of the seal detection frame.
In combination with any implementation provided by the present disclosure, the apparatus further includes a training unit configured to train the seal detection network using a sample set including a plurality of sample seal images, wherein the sample seal images are annotated with direction information of the real detection frame of the seal, and the direction information is determined according to the angle between the text reading direction of the text in the seal and the horizontal direction.
In combination with any implementation provided by the present disclosure, the second obtaining unit is specifically configured to: perform direction correction on the seal detection frame according to the angle information of the seal detection frame, and crop out the seal image from the direction-corrected seal detection frame.
In combination with any implementation provided by the present disclosure, when configured to perform direction correction on the seal detection frame according to the angle information of the seal detection frame, the second obtaining unit is specifically configured to: rotate the seal detection frame according to the angle information so that the direction of a first side of the seal detection frame is horizontal, wherein the first side is the side that matches the text reading direction in the seal detection frame.
In combination with any implementation provided by the present disclosure, when configured to obtain the outline of at least one text line in the seal image, the third obtaining unit is specifically configured to: perform text detection on the seal image to obtain a mask of at least one text line contained in the seal image and the cluster center of each text line; perform connected-domain partitioning according to the cluster center of each text line to obtain an initial region of each text line; expand the initial region of each text line within the range of the masks of the text lines to determine the pixel area corresponding to each text line; and fit the pixel area corresponding to each text line with a polygon to obtain the outline of each text line.
In combination with any implementation provided by the present disclosure, the apparatus further includes a zero-setting unit configured to set, for each cropped text line image, the pixels outside the outline of the text line to zero.
In combination with any implementation provided by the present disclosure, the recognition unit is specifically configured to: perform feature extraction on the text line image through a convolutional neural network to obtain first feature data of multiple channels, wherein the first feature data includes a height dimension and a width dimension; compress the height dimension and the width dimension of the first feature data into one dimension to obtain second feature data; use an attention mechanism to perform weighted summation on the feature vector at each position of the second feature data, to obtain third feature data whose values in each dimension are the same as those of the first feature data; and obtain a text recognition result according to the third feature data.
In combination with any implementation provided by the present disclosure, when configured to use an attention mechanism to perform weighted summation on the feature vector at each position of the second feature data to obtain the third feature data, the recognition unit is specifically configured to: determine the similarity between the feature vector at a first position of the second feature data and the feature vectors at each position of the second feature data, the first position being any position in the second feature data; perform weighted summation on the feature vectors at each position of the second feature data according to the similarities, to obtain an updated feature vector for the first position; and obtain the third feature data according to the updated feature vectors at the positions of the second feature data.
According to an aspect of the present disclosure, an electronic device is provided, the device including a memory and a processor, the memory being configured to store computer instructions executable on the processor, and the processor being configured to implement the seal text recognition method described in any implementation of the present disclosure when executing the computer instructions.
According to an aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, and when the program is executed by a processor, the seal text recognition method described in any implementation of the present disclosure is implemented.
Embodiments of the present disclosure obtain, based on the angle information of the seal detection frame in the seal detection result, a forward-oriented seal detection frame and the seal image within the seal detection frame, crop out at least one text line image according to the outline of at least one text line in the seal image, and perform text recognition on each text line image to obtain a text recognition result. Performing text detection on the seal image in the forward-oriented seal detection frame improves the accuracy of text detection and of the subsequent text recognition; performing text recognition on text line images cropped according to the outlines of the text lines in the seal image excludes interference from non-seal regions, further improving the accuracy and efficiency of seal text recognition.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Brief description of the drawings
The accompanying drawings herein are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with this specification and, together with the specification, serve to explain the principles of this specification.
FIG. 1 is a flowchart of a seal text recognition method proposed by at least one embodiment of the present disclosure;
FIG. 2 is a schematic diagram of the annotation of a circular seal image;
FIG. 3 is a flowchart of a method for obtaining the outlines of text lines in a seal image proposed by at least one embodiment of the present disclosure;
FIG. 4A is a schematic diagram of an elliptical seal image;
FIG. 4B shows the masks of the three text lines in the seal image of FIG. 4A;
FIG. 4C shows the cluster centers of the three text lines in the seal image of FIG. 4A;
FIG. 4D is a schematic diagram of the expanded initial regions corresponding to the text lines in the seal image of FIG. 4A;
FIG. 5 is a flowchart of a method for performing text recognition on a text image proposed by at least one embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a seal text recognition apparatus proposed by at least one embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of an electronic device proposed by at least one embodiment of the present disclosure.
Detailed description
Exemplary embodiments are described in detail here, examples of which are shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
Embodiments of the present disclosure may be applied to a computer system/server, which can operate with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations suitable for use with the computer system/server include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments including any of the above systems, among others.
FIG. 1 is a flowchart of a seal text recognition method shown by at least one embodiment of the present disclosure. As shown in FIG. 1, the method includes steps 101 to 105.
In step 101, a seal detection result of an object to be processed is obtained, wherein the seal detection result includes at least a seal detection frame and angle information of the seal detection frame.
In the embodiments of the present disclosure, the object to be processed is an image, an electronic document, or the like that may contain a seal, such as a passport image, an invoice image, an order image, or a PDF document of a design drawing.
In the case where the object to be processed contains a seal, the obtained seal detection result contains a seal detection frame, and the seal detection frame has angle information. The seal detection frame may be a circumscribed rectangular detection frame of the seal, and the angle information may include the rotation angle of the circumscribed rectangular detection frame relative to the horizontal direction, the horizontal direction being the horizontal direction in the image coordinate system of the object to be processed.
In step 102, a forward-oriented seal detection frame and a seal image within the seal detection frame are obtained based on the seal detection result.
In the embodiments of the present disclosure, the seal detection frame being forward-oriented means that the direction of a first side of the seal detection frame is horizontal, the horizontal direction being the horizontal direction in the image coordinate system of the seal image. The first side is the side that matches the text reading direction in the seal detection frame. The text reading direction may be determined according to the orientation of the characters of the text in the seal detection frame. When the text is arranged in a straight line, the direction parallel to the center line of the characters is the text reading direction. When the text is arranged along a curve or polyline, the overall text reading direction is determined according to the overall shape of the text; for example, the direction parallel to the line connecting the characters at the two ends of the text may be determined as the overall text reading direction. The text reading direction in the seal detection frame may be determined in an appropriate manner according to the actual situation, which is not limited by the present disclosure.
According to the angle information of the seal detection frame in the seal detection result, direction correction can be performed on the seal detection frame to obtain a direction-corrected seal detection frame. In some embodiments, direction correction may be performed, according to the angle information of the seal detection frame, on the object to be processed containing the seal, such as a bill, so that the seal detection frame in the direction-corrected object is forward-oriented, and the seal image in the forward-oriented seal detection frame is cropped out. Since the seal detection frame is forward-oriented, the seal image cropped from it is also forward-oriented, and may hereinafter also be referred to as a forward-oriented seal image.
In some embodiments, before the direction correction, the image corresponding to the seal detection frame in the object to be processed may be cropped out, and direction correction may be performed on the cropped image according to the angle information of the seal detection frame, to obtain a forward-oriented seal detection frame and the seal image within it.
In step 103, the outline of at least one text line in the seal image is obtained.
In step 104, at least one text line image is cropped out according to the outline of the at least one text line.
In the embodiments of the present disclosure, the seal image may include multiple text lines, and the text lines may be arranged in straight lines, oblique lines, polylines, or curves. A text line image corresponding to a text line can be cropped out according to the outline of that text line itself; alternatively, the text line images corresponding to multiple text lines can be cropped out according to the minimum circumscribed rectangle of the outlines of the multiple text lines.
In step 105, text recognition is performed on the text line image to obtain a text recognition result.
In the embodiments of the present disclosure, a forward-oriented seal detection frame and the seal image within the seal detection frame are obtained based on the angle information of the seal detection frame, at least one text line image is cropped out according to the outline of at least one text line in the seal image, and text recognition is performed on the text line image to obtain a text recognition result. Performing text detection on the seal image in the forward-oriented seal detection frame improves the accuracy of text detection and of the subsequent text recognition. In addition, performing text recognition on the text line images cropped according to the outlines of the text lines in the seal image excludes interference from non-seal regions, further improving the accuracy and efficiency of seal text recognition.
In some embodiments, the seal detection frame may be rotated according to its angle information so that the direction of the first side of the seal detection frame is horizontal, thereby implementing the direction correction of the seal detection frame. The first side is the side that matches the text reading direction in the seal detection frame, and the horizontal direction is the horizontal direction in the image coordinate system of the seal image.
In some embodiments, seal detection may be performed on the object to be processed through a seal detection network to obtain the seal detection result, wherein the seal detection frame in the seal detection result may have angle information, coordinate information, and classification information.
In one example, the seal detection network may be obtained, for example, through an RCNN (Region-based Convolutional Neural Network) or a Faster RCNN (Faster Region-based Convolutional Neural Network) network. In the embodiments of the present disclosure, in addition to the branches that output the class confidence and the position offset of the detection frame, a branch for predicting the angle of the detection frame may be added to the classifier, so as to obtain the angle information of the seal detection frame through, for example, an angle regression algorithm.
In some embodiments, the seal detection network may be trained using a sample seal image set, wherein the sample seal images in the set are annotated with the real detection frames of the contained seals, and the annotation information includes parameter information of the real detection frames. The parameter information includes the coordinates of the four vertices of the real detection frame and the angle information between the real detection frame and the horizontal direction.
The loss function used for training the seal detection network includes the difference between the angle information of the predicted seal detection frame and the angle information of the real detection frame.
FIG. 2 shows an annotation example of a circular seal. As shown in FIG. 2, for a circular seal, the angle of the included angle 203 between the text reading direction 202 of the seal and the horizontal direction 201 in the seal image coordinate system is annotated as the angle of the circular seal.
For example, when the circular seal contains text arranged in a straight line, the direction parallel to the center line of the characters in the straight-line text is the text reading direction 202. When the circular seal contains text arranged along a curve, the overall text reading direction can be determined according to the overall shape of the text; for example, the direction parallel to the line connecting the characters at the two ends of the text may be determined as the overall text reading direction 204 of the seal. The text reading direction of the circular seal may be determined in an appropriate manner according to the actual situation, which is not limited by the present disclosure.
In the embodiments of the present disclosure, by adding an angle prediction branch to the seal detection network, the angle information of the seal detection frame can be obtained to perform direction correction on the seal detection frame, so that the accuracy of seal detection and recognition can be improved.
In some embodiments, an affine transformation may be applied to the seal detection frame in the image to be processed according to the angle information of the seal detection frame, to obtain a seal detection frame direction-corrected to the forward orientation and the seal image within the seal detection frame.
By performing seal detection and an angle regression algorithm, the embodiments of the present disclosure obtain a forward-oriented seal image as the input for subsequent seal text detection. Compared with the related art, which directly uses the entire image as the input for seal text detection, this has the following beneficial technical effects: interference from non-seal image regions with seal text detection is excluded, improving the accuracy of seal text detection; by performing direction correction on the seal detection frame, text detection and recognition can be performed on a forward-oriented seal image, improving the accuracy of seal text detection and recognition; and the input resolution and input volume of the seal text detection algorithm are reduced, improving the efficiency of seal text detection and recognition.
In some embodiments, the outline of at least one text line in the seal image can be obtained by the following method. As shown in FIG. 3, the method for obtaining the outlines of text lines in a seal image may include steps 301 to 304.
In step 301, text detection is performed on the seal image to obtain a mask of at least one text line contained in the seal image and the cluster center of each text line.
In the embodiments of the present disclosure, feature extraction may first be performed on the seal image to obtain feature data of the seal image, and image segmentation may be performed based on the feature data to obtain the mask of at least one text line contained in the seal image and the cluster center of each text line.
Optionally, the above processes of feature extraction and image segmentation based on the feature data may be implemented by a text detection network, wherein the text detection network may include a feature extraction backbone network for performing feature extraction on the seal image and an image segmentation network for performing image segmentation based on the feature data.
In one example, the feature extraction backbone network may be a convolutional neural network, such as ResNet18. The feature data extracted by the backbone network may be multi-channel feature data, and the size and the number of channels of the feature data may be determined by the specific structure of the backbone network.
In one example, the image segmentation network may be configured to: upsample the feature data of each channel, and concatenate the upsampled features of the channels to obtain a concatenation result that fuses features of different scales; and then obtain the mask of the text region and the cluster center of each text line according to the concatenation result. When the seal image contains only one text line, the mask of that text line is obtained; when the seal contains multiple text lines, the masks of the multiple text lines are obtained.
However, when two or more adjacent text lines are close to each other or overlap, or when the seal covers other text, it may be difficult to distinguish the masks corresponding to the individual text lines from the obtained masks of the multiple text lines.
FIG. 4A shows a schematic diagram of an elliptical seal image, which contains one text line 411 arranged along an arc and two text lines 412, 413 arranged in straight lines. By performing feature extraction on the seal image in FIG. 4A and image segmentation based on the feature data, the masks 421, 422, 423 of the three text lines contained in the seal image can be obtained, as shown in FIG. 4B, as well as the cluster centers 431, 432, 433 of the three text lines, as shown in FIG. 4C.
As can be seen from FIG. 4B, since the two ends of the arc-arranged text line 411 are close to the straight text line 412, the obtained masks 421, 422 of the two text lines overlap. As can be seen from FIG. 4C, since the image regions corresponding to the cluster centers 431, 432, 433 of the text lines are smaller than the masks 421, 422, 423 of the text lines, the cluster centers of the text lines are less likely to overlap.
In step 302, connected-domain partitioning is performed according to the cluster center of each text line to obtain an initial region of each text line.
The cluster center map of the text lines is a binary image, i.e., the cluster centers of multiple text lines cannot be distinguished from the cluster center map itself. Therefore, in the embodiments of the present disclosure, performing connected-domain partitioning according to the cluster center of each text line yields initial regions that are independent for each text line and have no overlapping regions. Since the text lines belong to different connected domains, they can thereby be distinguished from one another.
In step 303, the initial region of each text line is expanded within the range of the mask of the at least one text line, to determine the pixel area corresponding to each text line.
For each text line, the initial region is expanded within the range of the corresponding mask until it coincides with the mask of the text line, and the expanded region at that point is taken as the pixel area corresponding to the text line.
Since the initial regions of the text lines are independent, the pixel areas corresponding to the text lines obtained by expanding the initial regions are also independent. As shown in FIG. 4D, the pixel areas 441, 442, 443 corresponding to different text lines can be represented with different pixel values to distinguish them.
In step 304, the pixel area corresponding to each text line is fitted with a polygon to obtain the outline of each text line.
For each text line, the outline of the text line can be obtained by fitting the pixel area corresponding to the text line, obtained in step 303, with a polygon.
In the embodiments of the present disclosure, the initial region of a text line is obtained by performing connected-domain partitioning according to the cluster center of the text line, the initial region is expanded until it coincides with the mask of the text line to obtain the pixel area corresponding to the text line, and the outline of the text line can be determined by polygon fitting on the obtained pixel area. In this way, accurate text line outlines can be obtained and the text lines can be instance-segmented: even when two adjacent text lines are close together or have overlapping regions, the outline of each text line can be detected separately, which helps improve the effect of subsequent text recognition.
In some embodiments, the minimum circumscribed rectangle of the outline of the at least one text line may be cropped to obtain a text line image. Since the text line image may contain text adjacent to the text line, performing text recognition after setting the pixels outside the outline of the at least one text line to zero allows recognition to target only the text within the outline, eliminating interference from neighboring text and improving recognition accuracy.
In some embodiments, text recognition may be performed on the text image by the following method to obtain a text recognition result. As shown in FIG. 5, the text recognition method includes steps 501 to 504.
In step 501, feature extraction is performed on the text line image through a convolutional neural network to obtain first feature data of multiple channels, wherein the first feature data includes a height dimension and a width dimension. The number of channels and the size of the first feature data are determined by the specific structure of the convolutional neural network. For example, the values of the first feature data in the height, width, and channel dimensions are h, w, and c respectively, i.e., the shape of the feature map corresponding to the first feature data is h×w×c.
In step 502, the height dimension and the width dimension of the first feature data are compressed into one dimension to obtain second feature data. The shape of the feature map corresponding to the second feature data may be represented as (h*w)×c.
In the embodiments of the present disclosure, although the dimensions of the second feature data differ from those of the first feature data, i.e., the shape of the feature map has changed, the way the second feature data is stored in memory is unchanged from that of the first feature data. Converting the first feature data from a three-dimensional matrix into a two-dimensional matrix helps improve computational efficiency.
In step 503, an attention mechanism is used to perform weighted summation on the feature vector at each position of the second feature data, to obtain third feature data whose values in each dimension are the same as those of the first feature data; that is, the feature map corresponding to the third feature data has the same shape as the feature map corresponding to the first feature data.
In one example, the weighted summation of the feature vectors at each position can be performed in the following manner.
First, the similarity between the feature vector at a first position of the second feature data and the feature vectors at each position of the second feature data is determined, the first position being any position in the second feature data. Next, a weighted summation is performed on the feature vectors at each position of the second feature data according to the similarities, to obtain an updated feature vector for the first position. Finally, the third feature data is obtained according to the updated feature vectors at the positions of the second feature data. Those skilled in the art should understand that the weighted summation of the feature vectors at each position may also be performed in other manners, which is not limited by the embodiments of the present disclosure.
In some embodiments, the above process of performing weighted summation on the feature vector at each position may be repeated multiple times to obtain a final weighted summation result.
For the obtained weighted summation result, the values of the feature data in the height, width, and channel dimensions can be restored to h, w, and c respectively in height-first order, i.e., the shape of the feature map is restored to h×w×c.
Similar to step 502, restoring the values of the feature data in the height, width, and channel dimensions to h, w, and c does not change the way the feature data is stored in memory.
In step 504, a text recognition result is obtained according to the third feature data.
In one example, the third feature data may be pooled in the height dimension, and a fully connected layer may be used to determine the character corresponding to each position in the width dimension, to obtain the text recognition result.
In some embodiments, the above text recognition method may be implemented using a text recognition network, which may include a convolutional neural network that performs feature extraction on the text line images, a network that implements the attention mechanism, a pooling layer, and a fully connected layer. The text recognition network may be trained using the CTC (Connectionist Temporal Classification) method.
In the embodiments of the present disclosure, by reshaping the features extracted by the convolutional neural network, performing a weighted summation using the attention mechanism, restoring the initial shape, and then classifying through the fully connected layer to obtain the final recognition result, the robustness of the text recognition network in recognizing text of different shapes can be improved.
FIG. 6 shows a seal text recognition apparatus proposed by at least one embodiment of the present disclosure. As shown in FIG. 6, the apparatus may include: a first obtaining unit 601, configured to obtain a seal detection result of an object to be processed, wherein the seal detection result includes a seal detection frame and angle information of the seal detection frame; a second obtaining unit 602, configured to obtain, based on the seal detection result, a forward-oriented seal detection frame and a seal image within the seal detection frame; a third obtaining unit 603, configured to obtain the outline of at least one text line in the seal image and crop out at least one text line image according to the outline of the at least one text line; and a recognition unit 604, configured to perform text recognition on each text line image to obtain a text recognition result.
In some embodiments, the first obtaining unit is specifically configured to: perform seal detection on the object to be processed through a seal detection network to obtain a seal detection result, wherein the classifier of the seal detection network includes a branch for identifying the angle of the seal detection frame.
In some embodiments, the apparatus further includes a training unit configured to train the seal detection network using a sample set including a plurality of sample seal images, wherein the sample seal images are annotated with direction information of the real detection frame of the seal, and the direction information is determined according to the angle between the text reading direction of the text in the seal and the horizontal direction.
In some embodiments, the second obtaining unit is specifically configured to: perform direction correction on the seal detection frame according to the angle information of the seal detection frame, and crop out the seal image from the direction-corrected seal detection frame.
In some embodiments, when configured to perform direction correction on the seal detection frame according to the angle information of the seal detection frame, the second obtaining unit is specifically configured to: rotate the seal detection frame according to the angle information so that the direction of a first side of the seal detection frame is horizontal, wherein the first side is the side that matches the text reading direction in the seal detection frame.
In some embodiments, when configured to obtain the outline of at least one text line in the seal image, the third obtaining unit is specifically configured to: perform text detection on the seal image to obtain a mask of at least one text line contained in the seal image and the cluster center of each text line; perform connected-domain partitioning according to the cluster center of each text line to obtain an initial region of each text line; expand the initial region of each text line within the range of the masks of the text lines to determine the pixel area corresponding to each text line; and fit the pixel area corresponding to each text line with a polygon to obtain the outline of each text line.
In some embodiments, the apparatus further includes a zero-setting unit configured to set, for each cropped text line image, the pixels outside the outline of the text line to zero.
In some embodiments, the recognition unit is specifically configured to: perform feature extraction on the text line image through a convolutional neural network to obtain first feature data of multiple channels, wherein the first feature data includes a height dimension and a width dimension; compress the height dimension and the width dimension of the first feature data into one dimension to obtain second feature data; use an attention mechanism to perform weighted summation on the feature vector at each position of the second feature data, to obtain third feature data whose values in each dimension are the same as those of the first feature data; and obtain a text recognition result according to the third feature data.
In some embodiments, when configured to use an attention mechanism to perform weighted summation on the feature vector at each position of the second feature data to obtain the third feature data, the recognition unit is specifically configured to: determine the similarity between the feature vector at a first position of the second feature data and the feature vectors at each position of the second feature data, the first position being any position in the second feature data; perform weighted summation on the feature vectors at each position of the second feature data according to the similarities, to obtain an updated feature vector for the first position; and obtain the third feature data according to the updated feature vectors at the positions of the second feature data.
The present disclosure also provides an electronic device; referring to FIG. 7, which shows the structure of the device, the device includes a memory and a processor, the memory being configured to store computer instructions executable on the processor, and the processor being configured to implement the method described in any embodiment of the present disclosure when executing the computer instructions.
The present disclosure also provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the method described in any embodiment of the present disclosure is implemented.
Those skilled in the art should understand that one or more embodiments of this specification may be provided as a method, a system, or a computer program product. Accordingly, one or more embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.
The embodiments in this specification are described in a progressive manner; for the same or similar parts between embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the data processing device embodiments are basically similar to the method embodiments, their description is relatively brief, and reference may be made to the relevant parts of the description of the method embodiments.
Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the specific order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in this specification and their structural equivalents, or in a combination of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode and transmit information to a suitable receiver apparatus for execution by the data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform the corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by special-purpose logic circuitry, such as an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit), and an apparatus can also be implemented as special-purpose logic circuitry.
Computers suitable for executing a computer program include, for example, general-purpose and/or special-purpose microprocessors, or any other type of central processing unit. Generally, the central processing unit receives instructions and data from a read-only memory and/or a random access memory. The basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to, one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, in order to receive data from them, transfer data to them, or both. However, a computer need not have such devices. Furthermore, a computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name a few.
Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory may be supplemented by, or incorporated in, special-purpose logic circuitry.
Although this specification contains many specific implementation details, these should not be construed as limiting the scope of any invention or of what may be claimed, but rather as primarily describing the features of specific embodiments of particular inventions. Certain features described in this specification in the context of multiple embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may act in certain combinations as described above and may even be initially claimed as such, one or more features from a claimed combination can in some cases be removed from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.
Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve the desired results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, specific embodiments of the subject matter have been described. Other embodiments fall within the scope of the appended claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing may be advantageous.

Claims (12)

  1. A seal text recognition method, characterized in that the method includes:
    obtaining a seal detection result of an object to be processed, wherein the seal detection result includes a seal detection frame and angle information of the seal detection frame;
    obtaining, based on the seal detection result, a forward-oriented seal detection frame and a seal image within the seal detection frame;
    obtaining the outline of at least one text line in the seal image;
    cropping out at least one text line image according to the outline of the at least one text line;
    performing text recognition on each of the text line images to obtain a text recognition result.
  2. The method according to claim 1, characterized in that obtaining the seal detection result of the object to be processed includes:
    performing seal detection on the object to be processed through a seal detection network to obtain the seal detection result,
    wherein the classifier of the seal detection network includes a branch for identifying the angle of the seal detection frame.
  3. The method according to claim 2, characterized in that the method further includes:
    training the seal detection network using a sample set including a plurality of sample seal images,
    wherein the sample seal images are annotated with direction information of the real detection frame of the seal,
    wherein the direction information is determined according to the angle between the text reading direction of the text in the seal and the horizontal direction.
  4. The method according to any one of claims 1 to 3, characterized in that obtaining, based on the seal detection result, a forward-oriented seal detection frame and the seal image within the seal detection frame includes:
    performing direction correction on the seal detection frame according to the angle information of the seal detection frame, and
    cropping out the seal image from the direction-corrected seal detection frame.
  5. The method according to claim 4, characterized in that performing direction correction on the seal detection frame according to the angle information of the seal detection frame includes:
    rotating the seal detection frame according to the angle information of the seal detection frame so that the direction of a first side of the seal detection frame is horizontal, wherein the first side is the side that matches the text reading direction in the seal detection frame.
  6. The method according to any one of claims 1 to 5, characterized in that obtaining the outline of at least one text line in the seal image includes:
    performing text detection on the seal image to obtain a mask of at least one text line contained in the seal image and the cluster center of each of the text lines;
    performing connected-domain partitioning according to the cluster center of each of the text lines to obtain an initial region of each of the text lines;
    expanding the initial region of each of the text lines within the range of the masks of the text lines to determine the pixel area corresponding to each text line;
    fitting the pixel area corresponding to each text line with a polygon to obtain the outline of each of the text lines.
  7. The method according to any one of claims 1 to 6, characterized in that the method further includes:
    for each of the cropped text line images, setting the pixels outside the outline of each of the text lines to zero.
  8. The method according to any one of claims 1 to 7, characterized in that performing text recognition on each of the text images to obtain a text recognition result includes:
    performing feature extraction on the text line image through a convolutional neural network to obtain first feature data of multiple channels, wherein the first feature data includes a height dimension and a width dimension;
    compressing the height dimension and the width dimension of the first feature data into one dimension to obtain second feature data;
    performing weighted summation on the feature vector at each position of the second feature data using an attention mechanism, to obtain third feature data whose values in each dimension are the same as those of the first feature data;
    obtaining a text recognition result according to the third feature data.
  9. The method according to claim 8, characterized in that using an attention mechanism to perform weighted summation on the feature vector at each position of the second feature data to obtain the third feature data includes:
    determining the similarity between the feature vector at a first position of the second feature data and the feature vectors at each position of the second feature data, the first position being any position in the second feature data;
    performing weighted summation on the feature vectors at each position of the second feature data according to the similarity, to obtain an updated feature vector for the first position;
    obtaining the third feature data according to the updated feature vectors at the positions of the second feature data.
  10. A seal text recognition apparatus, characterized in that the apparatus includes:
    a first obtaining unit, configured to obtain a seal detection result of an object to be processed, wherein the seal detection result includes a seal detection frame and angle information of the seal detection frame;
    a second obtaining unit, configured to obtain, based on the seal detection result, a forward-oriented seal detection frame and a seal image within the seal detection frame;
    a third obtaining unit, configured to obtain the outline of at least one text line in the seal image and crop out at least one text line image according to the outline of the at least one text line;
    a recognition unit, configured to perform text recognition on each of the text line images to obtain a text recognition result.
  11. An electronic device, characterized in that the device includes a memory and a processor, the memory being configured to store computer instructions executable on the processor, and the processor being configured to implement the method according to any one of claims 1 to 9 when executing the computer instructions.
  12. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 9.
PCT/CN2021/121297 2021-03-25 2021-09-28 Seal text recognition method, apparatus, device and computer-readable storage medium WO2022198969A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110322157.6A 2021-03-25 2021-03-25 Seal text recognition method, apparatus, device and computer-readable storage medium
CN202110322157.6 2021-03-25

Publications (1)

Publication Number Publication Date
WO2022198969A1 true WO2022198969A1 (zh) 2022-09-29

Family

ID=76176081

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/121297 WO2022198969A1 (zh) 2021-03-25 2021-09-28 Seal text recognition method, apparatus, device and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN112926511A (zh)
WO (1) WO2022198969A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116416626A (zh) * 2023-06-12 2023-07-11 平安银行股份有限公司 Method, apparatus, device and storage medium for acquiring circular seal data
CN117671694A (zh) * 2023-12-04 2024-03-08 合肥大智慧财汇数据科技有限公司 Document seal preprocessing method based on detection and fusion
CN117901559A (zh) * 2024-03-18 2024-04-19 易签链(深圳)科技有限公司 Seal impression generation method based on data collection and analysis
CN117975492A (zh) * 2024-03-29 2024-05-03 南昌航空大学 Rectangular seal text recognition method

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112926511A (zh) 2021-03-25 2021-06-08 深圳市商汤科技有限公司 Seal text recognition method, apparatus, device and computer-readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190019020A1 (en) * 2017-07-17 2019-01-17 Open Text Corporation Systems and methods for image based content capture and extraction utilizing deep learning neural network and bounding box detection training techniques
CN111767911A (zh) * 2020-06-22 2020-10-13 平安科技(深圳)有限公司 Seal text detection and recognition method, apparatus and medium for complex environments
CN111950353A (zh) * 2020-06-30 2020-11-17 深圳市雄帝科技股份有限公司 Seal text recognition method, apparatus and electronic device
CN111950555A (zh) * 2020-08-17 2020-11-17 北京字节跳动网络技术有限公司 Text recognition method and apparatus, readable medium and electronic device
CN112507946A (zh) * 2020-12-18 2021-03-16 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for processing images
CN112926511A (zh) * 2021-03-25 2021-06-08 深圳市商汤科技有限公司 Seal text recognition method, apparatus, device and computer-readable storage medium


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116416626A (zh) * 2023-06-12 2023-07-11 平安银行股份有限公司 Method, apparatus, device, and storage medium for acquiring circular *** data
CN116416626B (zh) * 2023-06-12 2023-08-29 平安银行股份有限公司 Method, apparatus, device, and storage medium for acquiring circular *** data
CN117671694A (zh) * 2023-12-04 2024-03-08 合肥大智慧财汇数据科技有限公司 Document *** preprocessing method based on detection and fusion
CN117901559A (zh) * 2024-03-18 2024-04-19 易签链(深圳)科技有限公司 Seal impression generation method based on data collection and analysis
CN117901559B (zh) * 2024-03-18 2024-05-17 易签链(深圳)科技有限公司 Seal impression generation method based on data collection and analysis
CN117975492A (zh) * 2024-03-29 2024-05-03 南昌航空大学 Rectangular *** character recognition method
CN117975492B (zh) * 2024-03-29 2024-06-07 南昌航空大学 Rectangular *** character recognition method

Also Published As

Publication number Publication date
CN112926511A (zh) 2021-06-08

Similar Documents

Publication Publication Date Title
WO2022198969A1 (zh) *** text recognition method, apparatus, device, and computer-readable storage medium
TWI766855B (zh) Character recognition method and apparatus
CN111814794B (zh) Text detection method, apparatus, electronic device, and storage medium
US10289924B2 (en) System and method for scanned document correction
US9092697B2 (en) Image recognition system and method for identifying similarities in different images
CN111459269B (zh) Augmented reality display method, ***, and computer-readable storage medium
EP3791356B1 (en) Perspective distortion correction on faces
CN110738204B (zh) Method and apparatus for locating a certificate region
CN112613506A (zh) Method, apparatus, computer device, and storage medium for recognizing text in an image
US20150302270A1 (en) A method of providing a feature descriptor for describing at least one feature of an object representation
WO2022205816A1 (zh) Object detection method, apparatus, device, and computer-readable storage medium
Takezawa et al. Robust perspective rectification of camera-captured document images
JP2014134856A (ja) Subject identification apparatus, subject identification method, and subject identification program
KR102126722B1 (ko) Anti-spoofing method for three-dimensional object recognition
US10210414B2 (en) Object detection system and computer program product
EP3410389A1 (en) Image processing method and device
CN109785367B (zh) Outlier filtering method and apparatus for three-dimensional model tracking
TWI536280B (zh) Text region detection system and apparatus for street view images
JP3006560B2 (ja) Alignment apparatus and computer-readable recording medium storing an alignment program
US20230069608A1 (en) Object Tracking Apparatus and Method
CN107229935B (zh) Binary description method for triangular features
JP7485200B2 (ja) Image augmentation apparatus, control method, and program
US20230368576A1 (en) Image processing apparatus, image processing method, and non-transitory storage medium
Nakashima et al. SIFT feature point selection by using image segmentation
CN111325194B (zh) Character recognition method, apparatus, device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21932578

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the EP bulletin as the address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22.01.2024)