CN113673522B - Method, device and equipment for detecting inclination angle of text image and storage medium - Google Patents

Publication number
CN113673522B
Authority
CN
China
Prior art keywords
pixel point
position information
statistical value
candidate
angle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111228759.1A
Other languages
Chinese (zh)
Other versions
CN113673522A (en)
Inventor
刘永强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Century TAL Education Technology Co Ltd
Original Assignee
Beijing Century TAL Education Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Century TAL Education Technology Co Ltd filed Critical Beijing Century TAL Education Technology Co Ltd
Priority to CN202111228759.1A priority Critical patent/CN113673522B/en
Publication of CN113673522A publication Critical patent/CN113673522A/en
Application granted granted Critical
Publication of CN113673522B publication Critical patent/CN113673522B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Character Input (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to a method, a device, equipment and a storage medium for detecting an inclination angle of a text image, wherein the method comprises the following steps: acquiring first position information and pixel data of each pixel point in a text image to be detected; determining second position information of each pixel point after being respectively rotated by a plurality of candidate angles based on the first position information; for each candidate angle, determining a first statistical value of pixel data generated after each pixel point is projected in a first direction and a second statistical value of pixel data generated after each pixel point is projected in a second direction based on second position information and pixel data of each pixel point; and determining the inclination angle of the text image to be detected from a plurality of candidate angles based on the first statistical value and the second statistical value. According to the technical scheme, the accuracy of detecting the inclination angle of the text image can be improved.

Description

Method, device and equipment for detecting inclination angle of text image and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for detecting an inclination angle of a text image.
Background
Currently, OCR (Optical Character Recognition) technology is widely used. In OCR scenarios, the text lines in a text image are frequently inclined, and inclined text lines degrade the OCR recognition result. If the inclination angle of the text image can be detected, inclination correction can be performed on the text image, thereby improving the performance of the OCR system.
According to the inclination angle detection scheme in the related art, firstly, character segmentation is carried out on a text image, then clustering is carried out on text lines, and the main body orientation of the text lines is obtained through layout analysis, so that the inclination angle is determined.
Disclosure of Invention
According to an aspect of the present disclosure, there is provided a method for detecting an inclination angle of a text image, including:
acquiring first position information and pixel data of each pixel point in a text image to be detected;
determining second position information of each pixel point after the pixel point is respectively rotated by a plurality of candidate angles based on the first position information;
for each candidate angle, determining a first statistical value of the pixel data generated after each pixel point is projected in a first direction and a second statistical value of the pixel data generated after each pixel point is projected in a second direction based on second position information and pixel data of each pixel point;
and determining the inclination angle of the text image to be detected from the plurality of candidate angles based on the first statistical value and the second statistical value.
According to another aspect of the present disclosure, there is provided a tilt angle detection apparatus of a text image, including:
the acquisition module is used for acquiring first position information and pixel data of each pixel point in the text image to be detected;
a first determining module, configured to determine, based on the first position information, second position information obtained by respectively rotating each pixel point by multiple candidate angles;
a second determining module, configured to determine, for each candidate angle, a first statistical value of the pixel data generated after the projection of each pixel point in a first direction and a second statistical value of the pixel data generated after the projection of each pixel point in a second direction based on the second position information and the pixel data of each pixel point;
and the third determining module is used for determining the inclination angle of the text image to be detected from the candidate angles based on the first statistical value and the second statistical value.
According to another aspect of the present disclosure, there is provided an electronic device including: a processor; a memory for storing the processor-executable instructions; the processor is configured to read the executable instruction from the memory and execute the instruction to implement the method for detecting the tilt angle of the text image according to the above aspect.
According to another aspect of the present disclosure, there is provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the tilt angle detection method for a text image according to the above-described one aspect.
According to one or more technical solutions provided in the embodiments of the present disclosure, second position information of each pixel point after being respectively rotated by a plurality of candidate angles is determined based on first position information of each pixel point in a text image to be detected. For each candidate angle, a first statistical value of pixel data generated after each pixel point is projected in a first direction and a second statistical value of pixel data generated after each pixel point is projected in a second direction are determined based on the second position information and the pixel data of each pixel point. The inclination angle of the text image to be detected is then determined from the plurality of candidate angles based on the first statistical value and the second statistical value, so that the accuracy of detecting the inclination angle of the text image is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a schematic flow chart of a method for detecting an inclination angle of a text image according to an embodiment of the present disclosure;
Fig. 2 is a schematic flow chart of another method for detecting an inclination angle of a text image according to an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of a text image provided by an embodiment of the present disclosure;
Fig. 4 is a schematic diagram of another text image provided by an embodiment of the present disclosure;
Fig. 5 is a schematic structural diagram of an apparatus for detecting an inclination angle of a text image according to an embodiment of the present disclosure;
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
Aspects of the present disclosure are described below with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a method for detecting an inclination angle of a text image according to an embodiment of the present disclosure, and as shown in fig. 1, the method for detecting an inclination angle of a text image according to an embodiment of the present disclosure may include:
Step 101, obtaining first position information and pixel data of each pixel point in a text image to be detected.
The method of the embodiment of the disclosure is used for detecting the inclination angle of the text image.
In the embodiment of the disclosure, a text image to be detected can be obtained, and for all pixel points in the text image to be detected, the first position information of each pixel point and the pixel data of each pixel point are determined.
The first position information is used for representing the initial position of the pixel point, and the pixel data comprises gray data. As an example, a coordinate system is provided, and the first position information is coordinates (x, y) of the pixel point in the coordinate system, x represents an abscissa, and y represents an ordinate.
In practical applications, acquired text images generally differ in size. Therefore, in an embodiment of the present disclosure, the acquired text image may be normalized, and the normalized text image is used as the text image to be detected. For example, a length L is preset for the longest side of the text image to be detected. For an acquired text image of height h and width w, when h is greater than or equal to w, the height of the normalized image is L and its width is wL/h; when h is less than w, the width of the normalized image is L and its height is hL/w.
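The normalization rule above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `normalized_size` and the rounding to whole pixels are assumptions that do not appear in the source.

```python
def normalized_size(h, w, longest_side):
    """Return (height, width) after scaling so the longest side equals longest_side.

    h, w: height and width of the acquired text image.
    longest_side: the preset length L of the longest side.
    Rounding to integer pixels is an assumption made for this sketch.
    """
    if h <= 0 or w <= 0 or longest_side <= 0:
        raise ValueError("all sizes must be positive")
    if h >= w:
        # Height is the longest side: height becomes L, width becomes w * L / h.
        new_h = longest_side
        new_w = max(1, round(w * longest_side / h))
    else:
        # Width is the longest side: width becomes L, height becomes h * L / w.
        new_w = longest_side
        new_h = max(1, round(h * longest_side / w))
    return new_h, new_w
```

For instance, with L = 512 a 200 x 100 (height x width) image would be scaled to 512 x 256, which preserves the aspect ratio.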
Step 102, determining second position information of each pixel point after being respectively rotated by a plurality of candidate angles based on the first position information.
In the embodiment of the disclosure, for each candidate angle, based on the first position information of the pixel point in the text image to be detected, the position information of the pixel point rotated by the candidate angle can be determined and used as the second position information of the pixel point.
The candidate angles may be predetermined, for example, the candidate angles are 1°, 2° and 3°, and the rotation center is, for example, the center of the text image to be detected. As an example, for a candidate angle of 1°, based on the first position information (x, y) of the pixel point, the coordinate (x', y') of the pixel point rotated by 1° about the center of the text image to be detected is determined as the second position information of the pixel point corresponding to the candidate angle of 1°.
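As a hedged illustration of this step, the sketch below rotates one pixel coordinate about the image center for each candidate angle using the standard 2D rotation formula. The function names and the sign convention (positive angles rotate from the x axis toward the y axis) are assumptions made for the example; the source does not specify the direction of rotation.

```python
import math


def rotate_about_center(x, y, cx, cy, angle_deg):
    """Rotate the point (x, y) by angle_deg degrees about the center (cx, cy)."""
    theta = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    # Standard 2D rotation applied to the offset from the center.
    x_rot = cx + dx * math.cos(theta) - dy * math.sin(theta)
    y_rot = cy + dx * math.sin(theta) + dy * math.cos(theta)
    return x_rot, y_rot


def second_positions(x, y, cx, cy, candidate_angles):
    """Map each candidate angle to the rotated coordinate (x', y')."""
    return {a: rotate_about_center(x, y, cx, cy, a) for a in candidate_angles}
```

Calling `second_positions(x, y, cx, cy, [1, 2, 3])` would give one rotated coordinate per candidate angle for a single pixel point.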
Step 103, for each candidate angle, based on the second position information and the pixel data of each pixel point, determining a first statistical value of the pixel data generated after each pixel point is projected in the first direction, and a second statistical value of the pixel data generated after each pixel point is projected in the second direction.
In the embodiment of the disclosure, for each candidate angle, a first statistical value and a second statistical value corresponding to the candidate angle may be determined.
As an example, when each pixel point is projected in the first direction, there are a plurality of first projection positions, and the pixel points corresponding to each first projection position are determined based on the second position information of each pixel point. For each first projection position, a pixel data statistical value of the first projection position is generated according to the pixel data of the pixel points corresponding to the first projection position, and the first statistical value is then generated based on the pixel data statistical values of the first projection positions.
As another example, when each pixel point is projected in the second direction, there are a plurality of second projection positions, and the pixel points corresponding to each second projection position are determined based on the second position information of each pixel point. For each second projection position, a pixel data statistical value of the second projection position is generated according to the pixel data of the pixel points corresponding to the second projection position, and the second statistical value is then generated based on the pixel data statistical values of the second projection positions.
The first direction and the second direction are different directions, for example, the first direction is a horizontal direction, the second direction is a vertical direction, or the first direction is perpendicular to the second direction, or an included angle between the first direction and the second direction is a set angle, and this is not limited specifically here.
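The projection described above can be sketched as follows for one candidate angle. This is only an illustration under stated assumptions: the first direction is taken as horizontal (pixel points in the same rotated row share a projection position), the second as vertical, projection positions are obtained by rounding the rotated coordinates, and the per-position statistic is a plain sum. The name `projection_profiles` is hypothetical.

```python
import math
from collections import defaultdict


def projection_profiles(pixels, center, angle_deg):
    """Accumulate gray values per projection position for one candidate angle.

    pixels: iterable of (x, y, gray) tuples.
    center: (cx, cy) rotation center.
    Returns (first_profile, second_profile) as dicts mapping a projection
    position to the accumulated gray value at that position.
    """
    cx, cy = center
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    first = defaultdict(float)   # horizontal projection, keyed by rotated row
    second = defaultdict(float)  # vertical projection, keyed by rotated column
    for x, y, gray in pixels:
        dx, dy = x - cx, y - cy
        x_rot = cx + dx * cos_t - dy * sin_t
        y_rot = cy + dx * sin_t + dy * cos_t
        # Rounding to the nearest integer position is an assumption.
        first[round(y_rot)] += gray
        second[round(x_rot)] += gray
    return dict(first), dict(second)
```

A first or second statistical value could then be derived from the values of each returned profile, for example by summing them after an activation.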
Step 104, determining the inclination angle of the text image to be detected from the plurality of candidate angles based on the first statistical value and the second statistical value.
In the embodiment of the disclosure, the confidence of each candidate angle is determined according to the first statistical value and the second statistical value, the maximum confidence is determined from the confidence of the plurality of candidate angles, and the inclination angle is determined according to the candidate angle corresponding to the maximum confidence.
As an example, for each candidate angle, a corresponding first statistical value and a corresponding second statistical value may be determined, and according to a product of the first statistical value and the second statistical value corresponding to the candidate angle, a confidence of the candidate angle is determined, and then, a candidate angle corresponding to a maximum confidence is determined as the inclination angle of the text image to be detected.
It should be noted that the manner of determining the confidence based on the product of the first statistical value and the second statistical value is only an example, and is not limited in particular here.
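A minimal sketch of the product-based example follows. The function name `select_tilt_angle` and the list-based inputs are assumptions made for illustration; as noted above, the product is only one possible way of deriving the confidence.

```python
def select_tilt_angle(candidate_angles, first_stats, second_stats):
    """Pick the candidate angle whose product of statistical values is largest.

    candidate_angles, first_stats and second_stats are parallel sequences:
    entry i of each sequence belongs to the i-th candidate angle.
    Returns (best_angle, confidences).
    """
    if not (len(candidate_angles) == len(first_stats) == len(second_stats)):
        raise ValueError("inputs must have the same length")
    if not candidate_angles:
        raise ValueError("at least one candidate angle is required")
    # Confidence of each candidate angle: product of its two statistical values.
    confidences = [a * b for a, b in zip(first_stats, second_stats)]
    best = max(range(len(confidences)), key=confidences.__getitem__)
    return candidate_angles[best], confidences
```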
According to the technical scheme of the embodiment of the disclosure, first position information and pixel data of each pixel point in a text image to be detected are obtained, and second position information of each pixel point after being respectively rotated by a plurality of candidate angles is determined based on the first position information. For each candidate angle, a first statistical value of the pixel data generated after each pixel point is projected in a first direction and a second statistical value of the pixel data generated after each pixel point is projected in a second direction are determined based on the second position information and the pixel data of each pixel point. The inclination angle of the text image to be detected is then determined from the plurality of candidate angles based on the first statistical value and the second statistical value. The inclination angle is thus determined from the statistical values of the pixel data under each candidate angle, and because both the first statistical value and the second statistical value are used, a constraint between the first direction and the second direction is introduced. Compared with using a statistical value of pixel data generated by projecting the pixel points in a single direction, the method can accurately detect the inclination angle of text images with different text line directions, and improves the accuracy and robustness of detecting the inclination angle of the text image.
Based on the above embodiments, the method of the embodiments of the present disclosure may implement the detection of the inclination angle of the text image based on the neural network.
Fig. 2 is a schematic flow chart of another method for detecting an inclination angle of a text image according to an embodiment of the present disclosure, and as shown in fig. 2, the method for detecting an inclination angle of a text image includes:
Step 201, obtaining first position information and pixel data of each pixel point in a text image to be detected.
In the embodiment of the disclosure, the circumscribed circle of the text image to be detected can be determined, and the radius of the circumscribed circle is recorded as R. As an example, for each pixel point in the text image to be detected, two-dimensional coordinate information of the pixel point is obtained, for example, the initial coordinate is (x, y), and based on the radius R of the circumscribed circle of the text image to be detected, the two-dimensional coordinate information is converted into three-dimensional coordinate information, that is, the initial coordinate (x, y) is converted into three-dimensional coordinate (x, y, R) as the first position information.
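The coordinate conversion above can be sketched as follows. Two points are assumptions of this sketch rather than statements from the source: the circumscribed circle is taken to be the circle through the four corners of the image rectangle (radius equal to half the diagonal), and the (x, y) coordinates are expressed relative to the image center. The function names are hypothetical.

```python
import math


def circumscribed_radius(height, width):
    """Radius of the circle through the corners of a height x width rectangle."""
    return math.hypot(width, height) / 2.0


def first_position_info(height, width):
    """Return a height x width grid of (x, y, R) tuples, one per pixel point.

    Assumption: x and y are measured from the image center, so that a
    rotation about the center is a plain linear map of (x, y).
    """
    r = circumscribed_radius(height, width)
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    return [[(x - cx, y - cy, r) for x in range(width)] for y in range(height)]
```

With center-relative coordinates, every rotated coordinate lies in [-R, R], so adding R would shift it into the range [0, 2R].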
Step 202, inputting the first position information of each pixel point into a convolutional neural network for processing, so as to determine second position information of each pixel point after being respectively rotated by a plurality of candidate angles.
In the embodiment of the disclosure, the input of the convolutional neural network is a four-dimensional matrix [n, 3, H, W], where n represents the number of text images to be detected, 3 represents the three dimensions (abscissa x, ordinate y and R), H represents the height of the text images to be detected, and W represents the width of the text images to be detected. Input data suitable for the convolutional neural network is thus generated from the number of text images to be detected, the three-dimensional coordinate information, and the width and height of the text images to be detected, and a plurality of text images to be detected can be processed at one time as required, which improves detection efficiency. Optionally, the convolutional neural network is implemented by using a convolutional layer, which simplifies the neural network structure.
Optionally, a plurality of candidate angles are determined according to a preset angle step and a preset angle interval, a plurality of convolution kernels may be set, and the convolution kernel parameters of the convolutional neural network are determined according to the preset angle step and the preset angle interval.
For example, in view of the precision required for detecting the inclination angle of the text image, the preset angle step is 0.1°; considering that the general reading direction of a text image is from left to right, the preset angle interval is -45° to 45°. The number of candidate angles is then 901 (including 0°), and 1802 convolution kernels are set, of which 901 convolution kernels are used for determining the abscissa of each pixel point after it is rotated by each candidate angle, and the other 901 convolution kernels are used for determining the ordinate of each pixel point after it is rotated by each candidate angle. Taking 901 convolution kernels as an example, for the i-th convolution kernel, the first parameter of the convolution kernel is set to +cos(-45 + angle_s × i), the second parameter to -sin(-45 + angle_s × i), and the third parameter to 1.0, where angle_s is the preset angle step and i is the convolution kernel index representing the i-th candidate angle; in this example, i takes values from 0 to 900.
In this example, the output of the convolutional neural network consists of 1802 channel features, which represent the coordinate values x' and y' of each pixel point after it is rotated by the 901 candidate angles, and each channel includes the coordinate values corresponding to a plurality of pixel points. The output size is [1802, 2R], where 2R represents the projection positions.
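The kernel parameters in this example can be sketched in plain Python, treating each convolution kernel as a triple of weights applied to the (x, y, R) values of one pixel point. The abscissa kernels follow the parameters stated above. The ordinate kernels (sin, cos, 1.0) are an assumption based on the standard rotation formula, since the source does not list them, and the function names are hypothetical.

```python
import math


def build_rotation_kernels(angle_min=-45.0, angle_max=45.0, angle_s=0.1):
    """Return the abscissa kernels followed by the ordinate kernels.

    With the default step and interval this yields 901 + 901 = 1802 kernels.
    """
    n = round((angle_max - angle_min) / angle_s) + 1  # 901 candidate angles
    x_kernels, y_kernels = [], []
    for i in range(n):
        theta = math.radians(angle_min + angle_s * i)
        # Stated in the text: (+cos, -sin, 1.0) for the rotated abscissa.
        x_kernels.append((math.cos(theta), -math.sin(theta), 1.0))
        # Assumption: (sin, cos, 1.0) for the rotated ordinate.
        y_kernels.append((math.sin(theta), math.cos(theta), 1.0))
    return x_kernels + y_kernels


def apply_kernel(kernel, position):
    """Weighted sum of (x, y, R) with one kernel, as a 1 x 1 convolution would do."""
    return sum(k * p for k, p in zip(kernel, position))
```

Because the third parameter is 1.0, each output equals the rotated coordinate plus R. If x and y are measured from the image center (an assumption), the result falls in [0, 2R], which matches the 2R projection positions mentioned above.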
Step 203, determining a first statistical value of the pixel data generated after each pixel point is projected in the first direction and a second statistical value of the pixel data generated after each pixel point is projected in the second direction based on the second position information and the pixel data of each pixel point.
In the embodiment of the disclosure, based on the second position information and the pixel data of each pixel point, an accumulated value of the pixel data corresponding to each of the plurality of first projection positions in the first direction is determined, and a first statistical value is generated according to the accumulated value of the pixel data corresponding to each of the first projection positions. Optionally, the first direction is a horizontal direction.
And determining an accumulated value of the pixel data corresponding to each of the plurality of second projection positions in the second direction based on the second position information and the pixel data of each pixel point, and generating a second statistical value according to the accumulated value of the pixel data corresponding to each of the second projection positions. Optionally, the second direction is a vertical direction.
Determining the accumulated value of the pixel data corresponding to each projection position is only an example; the statistical value may also be determined by multiplying the pixel data corresponding to each projection position.
In one embodiment of the present disclosure, generating the first statistical value according to the accumulated value of the pixel data corresponding to each first projection position includes: and activating the accumulated value of the pixel data corresponding to each first projection position according to a preset activation function, and generating a first statistical value according to the activated accumulated value of the pixel data corresponding to each first projection position.
Generating a second statistical value according to the accumulated value of the corresponding pixel data of each second projection position, including: and activating the accumulated value of the pixel data corresponding to each second projection position according to a preset activation function, and generating a second statistical value according to the activated accumulated value of the pixel data corresponding to each second projection position.
Optionally, the preset activation function is the sqrt (square root) function. Performing the activation with the sqrt function reduces the influence of extreme values on the result, which further improves accuracy.
For example, gray data filling is performed on the output of the convolutional neural network according to the gray data and the second position information of each pixel point, generating a statistical tensor that reflects the variation of black and white pixels. The size of the statistical tensor is [1802, 2R]. Of its 1802 rows, 901 rows correspond to the projection in the first direction of the pixel points rotated by each of the 901 candidate angles, and the other 901 rows correspond to the projection in the second direction of the pixel points rotated by each of the 901 candidate angles; 2R represents the projection positions. The gray data of pixel points that belong to the same line in the first direction/the second direction after rotation are accumulated, so that the variation of black and white pixels between different lines can be reflected. A nonlinear activation operation is then performed on the statistical tensor.
Further, take one candidate angle as an example. When the pixel points are projected in the first direction, there are 2R first projection positions. The pixel points corresponding to each first projection position are determined based on the second position information of each pixel point, and the gray data of the pixel points corresponding to the first projection position are summed to generate the accumulated value of the gray data corresponding to that first projection position. In this way, 2R accumulated values are obtained for the 2R first projection positions, and the first statistical value is generated according to the sum of the 2R accumulated values. When the pixel points are projected in the second direction, there are 2R second projection positions. The pixel points corresponding to each second projection position are determined based on the second position information of each pixel point, and the gray data of the pixel points corresponding to the second projection position are summed to generate the accumulated value of the gray data corresponding to that second projection position. In this way, 2R accumulated values are obtained for the 2R second projection positions, and the second statistical value is generated according to the sum of the 2R accumulated values. For example, the activated statistical tensor is converted into a one-dimensional array with the size [1802].
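The accumulation, sqrt activation and summation for one candidate angle and one projection direction can be sketched as below. The function names, the rounding of coordinates to the nearest projection position, the clamping to the valid range, and the assumption that gray values are non-negative are all choices made for this illustration.

```python
import math


def accumulate_projection(coords, grays, n_positions):
    """Sum gray values of pixel points that fall on the same projection position.

    coords: rotated coordinate of each pixel point along the projection axis,
            assumed to lie roughly in [0, n_positions - 1].
    grays: gray value of each pixel point (assumed non-negative).
    n_positions: number of projection positions (2R in the text).
    """
    accumulated = [0.0] * n_positions
    for c, g in zip(coords, grays):
        # Round to the nearest position and clamp into range (assumption).
        idx = min(max(int(round(c)), 0), n_positions - 1)
        accumulated[idx] += g
    return accumulated


def activated_statistic(accumulated):
    """Apply the sqrt activation to each accumulated value, then sum."""
    return sum(math.sqrt(v) for v in accumulated)
```

Computing this once per direction for each of the 901 candidate angles would give the 1802 values of the one-dimensional array described above.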
Step 204, determining the confidence of each candidate angle according to the first statistical value and the second statistical value, determining the maximum confidence from the confidence of the plurality of candidate angles, and determining the inclination angle according to the candidate angle corresponding to the maximum confidence.
As an example, for each candidate angle, a first statistical value and a second statistical value corresponding to the candidate angle may be determined, a one-dimensional array may be generated according to a product of the first statistical value and the second statistical value corresponding to each candidate angle, and softmax operation may be performed on the one-dimensional array to determine a confidence of each candidate angle. Taking 901 candidate angles as an example, obtaining a one-dimensional array T with the size [1802] in the step 203, where the index of the one-dimensional array T is 0-1801, the indexes 0-900 respectively correspond to the case that the pixel points respectively rotated by 901 candidate angles are projected in the first direction, and the indexes 901-. In the example, confidence degrees of the candidate angles are determined through softmax operation, the magnitude relation among the confidence degrees can be more accurately represented, the confidence degrees are further applied to determining the inclination angle, and the detection accuracy of the inclination angle of the text image is improved.
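A minimal sketch of the product and softmax step is given below. It assumes the one-dimensional array T stores the first-direction values in its first half and the second-direction values in its second half, so that entries i and i + N belong to the same candidate angle; this pairing and the name `angle_confidences` are assumptions of the sketch.

```python
import math


def angle_confidences(t):
    """Return one softmax confidence per candidate angle.

    t: sequence of length 2 * N. Assumed layout: t[0:N] holds the
       first-direction statistical values and t[N:2N] the second-direction
       statistical values for the same N candidate angles.
    """
    if len(t) == 0 or len(t) % 2 != 0:
        raise ValueError("t must have a positive, even length")
    n = len(t) // 2
    products = [t[i] * t[i + n] for i in range(n)]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(products)
    exps = [math.exp(p - m) for p in products]
    total = sum(exps)
    return [e / total for e in exps]
```

The index of the largest confidence identifies the candidate angle, for example -45 + 0.1 × index degrees under the step and interval used in this example.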
In this embodiment, after the inclination angle of the text image to be detected is determined, the text image to be detected may be corrected based on the inclination angle. For example, fig. 3 is a text image to be detected, and fig. 4 is a corrected text image to be detected.
Optionally, determining the inclination angle according to the candidate angle corresponding to the maximum confidence coefficient includes: and under the condition that the maximum confidence coefficient is larger than the threshold value, determining the candidate angle corresponding to the maximum confidence coefficient as the inclination angle. In this example, a confidence threshold (e.g., 0.9) may be set, and when the maximum confidence is greater than the threshold, the candidate angle corresponding to the maximum confidence is determined as the tilt angle, so as to adjust the text image to be detected based on the tilt angle, otherwise, prompt information for indicating that the detection result is invalid is generated, and the text image to be detected is not adjusted, so that the detection result with the confidence smaller than or equal to the threshold can be discarded, and the tilt angle detection accuracy is further improved.
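The threshold check can be sketched as follows. The function name `decide_tilt` and the use of `None` to signal an invalid detection result are assumptions for illustration; the 0.9 default mirrors the example threshold above.

```python
def decide_tilt(candidate_angles, confidences, threshold=0.9):
    """Return the tilt angle, or None when the detection result is invalid.

    The candidate angle with the maximum confidence is accepted only if that
    confidence is strictly greater than the threshold.
    """
    if len(candidate_angles) != len(confidences) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")
    best = max(range(len(confidences)), key=confidences.__getitem__)
    if confidences[best] > threshold:
        return candidate_angles[best]
    # Confidence at or below the threshold: the result is discarded.
    return None
```

A caller would correct the text image only when a non-`None` angle is returned, and otherwise generate the prompt information indicating that the detection result is invalid.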
In this embodiment, the inclination angle detection of the text image is realized based on the neural network, the processing process of the convolutional neural network is accelerated by using a GPU (graphics processing unit), the speed of the inclination angle detection is improved, and the X-direction and Y-direction constraints are introduced, so that the inclination angle of the text image in the horizontal/vertical direction of the text line can be accurately detected, the accuracy of the inclination angle detection of the text image is improved, and the inclination angle detection of a large angle (for example, 25 ° to 45 °) can be supported.
Fig. 5 is a schematic structural diagram of a device for detecting the inclination angle of a text image according to an embodiment of the present disclosure. As shown in Fig. 5, the device includes an obtaining module 51, a first determining module 52, a second determining module 53, and a third determining module 54.
The obtaining module 51 is configured to obtain first position information and pixel data of each pixel point in the text image to be detected.
The first determining module 52 is configured to determine, based on the first position information, second position information of each pixel point respectively rotated by a plurality of candidate angles.
The second determining module 53 is configured to determine, for each candidate angle, a first statistical value of pixel data generated after each pixel point is projected in the first direction and a second statistical value of pixel data generated after each pixel point is projected in the second direction, based on the second position information and the pixel data of each pixel point.
The third determining module 54 is configured to determine, based on the first statistical value and the second statistical value, the inclination angle of the text image to be detected from the plurality of candidate angles.
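As a rough illustration of what the first determining module computes, the sketch below rotates pixel coordinates by one candidate angle using the standard 2D rotation formula. The disclosure performs this step with a convolutional neural network, which is not reproduced here. The function name `rotate_points`, the choice of rotation center, and the sign convention (counter-clockwise in mathematical axes) are assumptions.

```python
import math


def rotate_points(points, angle_deg, center=(0.0, 0.0)):
    """Rotate (x, y) points by angle_deg about center.

    The input plays the role of the first position information; the
    returned list plays the role of the second position information
    for a single candidate angle.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = center
    rotated = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        rotated.append((cx + cos_t * dx - sin_t * dy,
                        cy + sin_t * dx + cos_t * dy))
    return rotated


# One second-position list per candidate angle.
pixels = [(1.0, 0.0), (0.0, 2.0)]
second_positions = {angle: rotate_points(pixels, angle) for angle in (-0.1, 0.0, 0.1)}
```

Repeating the rotation for every candidate angle yields the inputs that the second determining module projects in the two directions.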
In an embodiment of the present disclosure, the first determining module 52 is specifically configured to: determining a plurality of candidate angles according to a preset angle step length and a preset angle interval; and inputting the first position information of each pixel point into a convolutional neural network for processing so as to determine second position information of each pixel point after being respectively rotated by a plurality of candidate angles.
In one embodiment of the present disclosure, the angle step is 0.1 °, and the angle interval is [ -45 °, 45 ° ].
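A short sketch of how the candidate angles could be enumerated from the step and interval is given below. The function name `candidate_angles` is hypothetical. Indexing by integer multiples of the step, rather than repeatedly adding 0.1, is a design choice made here to avoid accumulating floating-point error.

```python
def candidate_angles(step_deg=0.1, low_deg=-45.0, high_deg=45.0):
    """Enumerate candidate angles from low_deg to high_deg inclusive."""
    if step_deg <= 0 or high_deg < low_deg:
        raise ValueError("step must be positive and the interval non-empty")
    count = int(round((high_deg - low_deg) / step_deg)) + 1
    # Round each value to suppress binary floating-point noise such as 44.90000000000001.
    return [round(low_deg + i * step_deg, 10) for i in range(count)]


angles = candidate_angles()
num_candidates = len(angles)
```

With a step of 0.1° over [-45°, 45°], the interval spans 90° and therefore contains 90 / 0.1 + 1 = 901 candidate angles.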
In an embodiment of the present disclosure, the obtaining module 51 is specifically configured to: acquire two-dimensional coordinate information of each pixel point in the text image to be detected; and convert the two-dimensional coordinate information into three-dimensional coordinate information based on the radius of the circumscribed circle of the text image to be detected.
The first determining module 52 is specifically configured to: generate input data according to the number of text images to be detected, the three-dimensional coordinate information, and the width and height of the text image to be detected; and input the input data into the convolutional neural network for processing.
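This section does not spell out how the three-dimensional coordinates are built, so the sketch below covers only the one quantity that is stated: the radius of the circumscribed circle of the image, which for a rectangle is half its diagonal. The helper `projection_bins` is a hypothetical addition. It shows one plausible reason the radius matters: a pixel rotated about the image center never leaves that circle, so the projection positions stay bounded.

```python
import math


def circumscribed_radius(width, height):
    """Radius of the circle through the four corners of a width x height image."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return math.hypot(width, height) / 2.0


def projection_bins(width, height):
    """Hypothetical helper: integer positions needed to cover [-R, R].

    Assumes the rotation is about the image center, so every rotated
    pixel projects to a position no farther than R from the center.
    """
    radius = math.ceil(circumscribed_radius(width, height))
    return 2 * radius + 1


radius = circumscribed_radius(6, 8)
bins = projection_bins(6, 8)
```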
In one embodiment of the present disclosure, the second determining module 53 includes the following units.
The first determining unit is configured to determine, based on the second position information and the pixel data of each pixel point, an accumulated value of the pixel data corresponding to each of a plurality of first projection positions in the first direction.
The first generating unit is configured to generate the first statistical value according to the accumulated value of the pixel data corresponding to each first projection position.
The second determining unit is configured to determine, based on the second position information and the pixel data of each pixel point, an accumulated value of the pixel data corresponding to each of a plurality of second projection positions in the second direction.
The second generating unit is configured to generate the second statistical value according to the accumulated value of the pixel data corresponding to each second projection position.
In an embodiment of the disclosure, the first generating unit is specifically configured to: activating the accumulated value of the pixel data corresponding to each first projection position according to a preset activation function; and generating a first statistical value according to the accumulated value of the pixel data corresponding to each activated first projection position.
The second generating unit is specifically configured to: activating the accumulated value of the pixel data corresponding to each second projection position according to a preset activation function; and generating a second statistical value according to the accumulated value of the pixel data corresponding to each activated second projection position.
In one embodiment of the present disclosure, the activation function includes an sqrt function.
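The accumulate, activate and summarize sequence can be sketched as below. Two points are assumptions and do not come from this section: projection positions are taken as the rotated coordinate rounded to an integer, and the statistic is taken as the population variance of the activated profile, which is a common choice in projection-profile skew estimation. The function names are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def projection_profile(positions, pixel_values, axis):
    """Accumulate pixel values per projection position.

    positions: rotated (x, y) coordinates (second position information).
    axis: 0 groups pixels by x, 1 groups pixels by y.
    """
    sums = defaultdict(float)
    for pos, value in zip(positions, pixel_values):
        sums[round(pos[axis])] += value
    return dict(sums)


def activated_statistic(profile):
    """Apply sqrt to each accumulated value, then summarize the profile.

    Assumption: the summary is the population variance. Pixel values
    must be non-negative so that the square root is defined.
    """
    if not profile:
        raise ValueError("profile must not be empty")
    if any(v < 0 for v in profile.values()):
        raise ValueError("accumulated pixel values must be non-negative")
    activated = [math.sqrt(v) for v in profile.values()]
    return statistics.pvariance(activated)


positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [1.0, 3.0, 0.0, 0.0]
profile_by_row = projection_profile(positions, values, axis=1)
row_statistic = activated_statistic(profile_by_row)
```

The square root compresses large accumulated values, so a few very dense projection positions dominate the statistic less than they would without activation.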
In one embodiment of the present disclosure, the third determining module 54 includes the following units.
The third determining unit is configured to determine a confidence of each candidate angle according to the first statistical value and the second statistical value.
The fourth determining unit is configured to determine a maximum confidence from the confidences of the plurality of candidate angles.
The fifth determining unit is configured to determine the inclination angle according to the candidate angle corresponding to the maximum confidence.
In an embodiment of the disclosure, the fifth determining unit is specifically configured to: when the maximum confidence is greater than the threshold, determine the candidate angle corresponding to the maximum confidence as the inclination angle.
In an embodiment of the disclosure, the third determining unit is specifically configured to: generating a one-dimensional array according to the product of the first statistical value and the second statistical value of each candidate angle; and performing softmax operation on the one-dimensional array to generate confidence of each candidate angle.
The device for detecting the inclination angle of a text image provided by the embodiments of the present disclosure can execute any of the methods for detecting the inclination angle of a text image provided by the embodiments of the present disclosure, and has the functional modules and beneficial effects corresponding to the executed method. For details not described in the device embodiments, reference may be made to the description of any method embodiment of the present disclosure.
An exemplary embodiment of the present disclosure also provides an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor. When executed by the at least one processor, the computer program causes the electronic device to perform a method according to an embodiment of the present disclosure.
Exemplary embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor of a computer, causes the computer to perform a method according to an embodiment of the present disclosure.
The exemplary embodiments of the present disclosure also provide a computer program product comprising a computer program, wherein the computer program, when executed by a processor of a computer, is adapted to cause the computer to perform a method according to an embodiment of the present disclosure.
Referring to Fig. 6, a block diagram of the structure of an electronic device 600 will now be described. The electronic device 600 may be a server or a client of the present disclosure and is an example of a hardware device to which aspects of the present disclosure may be applied. The electronic device is intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in Fig. 6, the electronic device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603. The RAM 603 can also store various programs and data required for the operation of the electronic device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Various components in the electronic device 600 are connected to the I/O interface 605, including: an input unit 606, an output unit 607, a storage unit 608, and a communication unit 609. The input unit 606 may be any type of device capable of inputting information to the electronic device 600, and the input unit 606 may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device. Output unit 607 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 608 may include, but is not limited to, a magnetic disk, an optical disk. The communication unit 609 allows the electronic device 600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication transceiver, and/or a chipset, such as a bluetooth (TM) device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.
The computing unit 601 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the respective methods and processes described above. For example, in some embodiments, the method for detecting the inclination angle of a text image may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 600 via the ROM 602 and/or the communication unit 609. In some embodiments, the computing unit 601 may be configured to perform the method for detecting the inclination angle of a text image by any other suitable means (e.g., by means of firmware).
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As used in this disclosure, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It is noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. A method for detecting the inclination angle of a text image comprises the following steps:
acquiring first position information and gray data of each pixel point in a text image to be detected;
determining second position information of each pixel point after the pixel point is respectively rotated by a plurality of candidate angles based on the first position information;
for each candidate angle, determining a first statistical value of the gray data generated after projection of each pixel point in a first direction and a second statistical value of the gray data generated after projection of each pixel point in a second direction based on second position information and gray data of each pixel point, wherein an accumulated value of gray data corresponding to each of a plurality of first projection positions in the first direction is determined based on the second position information and gray data of each pixel point, the first statistical value is generated according to the accumulated value of gray data corresponding to each of the plurality of first projection positions, and the accumulated value of gray data corresponding to each of a plurality of second projection positions in the second direction is determined based on the second position information and gray data of each pixel point, generating the second statistical value according to the accumulated value of the corresponding gray data of each second projection position;
determining a confidence level of each candidate angle according to the first statistical value and the second statistical value of each candidate angle;
determining a maximum confidence level from the confidence levels of the plurality of candidate angles;
and determining the inclination angle of the text image to be detected according to the candidate angle corresponding to the maximum confidence coefficient.
2. The method of claim 1, wherein the determining second position information of each pixel point after being respectively rotated by a plurality of candidate angles based on the first position information comprises:
determining the plurality of candidate angles according to a preset angle step length and a preset angle interval;
and inputting the first position information of each pixel point into a convolutional neural network for processing so as to determine second position information of each pixel point after being respectively rotated by a plurality of candidate angles.
3. The method according to claim 2, wherein the angular step is 0.1 °, and the angular interval is [ -45 °, 45 ° ].
4. The method of claim 2, wherein the obtaining the first position information of each pixel point in the text image to be detected comprises:
acquiring two-dimensional coordinate information of each pixel point in the text image to be detected;
converting the two-dimensional coordinate information into three-dimensional coordinate information based on the radius of the circumscribed circle of the text image to be detected;
inputting the first position information of each pixel point into a convolutional neural network for processing, including:
generating input data according to the number of the text images to be detected, the three-dimensional coordinate information and the width and height of the text images to be detected;
and inputting the input data into the convolutional neural network for processing.
5. The method of claim 1, wherein the generating the first statistical value according to the accumulated value of the gray data corresponding to each first projection position comprises:
activating the accumulated value of the gray data corresponding to each first projection position according to a preset activation function;
generating the first statistical value according to the accumulated value of the gray data corresponding to each activated first projection position;
generating the second statistical value according to the accumulated value of the corresponding gray data of each second projection position, including:
activating the accumulated value of the corresponding gray data of each second projection position according to the preset activation function;
and generating the second statistical value according to the accumulated value of the gray data corresponding to each activated second projection position.
6. The method of claim 5, wherein the activation function comprises an sqrt function.
7. The method of claim 1, wherein the determining the tilt angle according to the candidate angle corresponding to the maximum confidence level comprises:
and determining the candidate angle corresponding to the maximum confidence coefficient as the inclination angle under the condition that the maximum confidence coefficient is greater than a threshold value.
8. The method of claim 1, wherein the determining the confidence level for each of the candidate angles from the first statistical value and the second statistical value for each of the candidate angles comprises:
generating a one-dimensional array according to the product of the first statistical value and the second statistical value of each candidate angle;
performing softmax operation on the one-dimensional array, and generating confidence of each candidate angle.
9. An inclination angle detection apparatus for a text image, comprising:
the acquisition module is used for acquiring first position information and gray data of each pixel point in the text image to be detected;
a first determining module, configured to determine, based on the first position information, second position information obtained by respectively rotating each pixel point by multiple candidate angles;
a second determining module, configured to determine, for each candidate angle, a first statistical value of the gray scale data generated after the projection of each pixel point in a first direction and a second statistical value of the gray scale data generated after the projection of each pixel point in a second direction based on the second position information and the gray scale data of each pixel point, wherein an accumulated value of the gray scale data corresponding to each of a plurality of first projection positions in the first direction is determined based on the second position information and the gray scale data of each pixel point, the first statistical value is generated according to the accumulated value of the gray scale data corresponding to each of the plurality of first projection positions, and the accumulated value of the gray scale data corresponding to each of a plurality of second projection positions in the second direction is determined based on the second position information and the gray scale data of each pixel point, generating the second statistical value according to the accumulated value of the corresponding gray data of each second projection position;
a third determining module, configured to determine a confidence level of each candidate angle according to the first statistical value and the second statistical value of each candidate angle; determining a maximum confidence level from the confidence levels of the plurality of candidate angles; and determining the inclination angle of the text image to be detected according to the candidate angle corresponding to the maximum confidence coefficient.
10. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method of any one of claims 1-8.
11. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when being executed by a processor, carries out the method of any of the preceding claims 1-8.
CN202111228759.1A 2021-10-21 2021-10-21 Method, device and equipment for detecting inclination angle of text image and storage medium Active CN113673522B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111228759.1A CN113673522B (en) 2021-10-21 2021-10-21 Method, device and equipment for detecting inclination angle of text image and storage medium

Publications (2)

Publication Number Publication Date
CN113673522A (en) 2021-11-19
CN113673522B (en) 2022-04-19

Family

ID=78550759

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111228759.1A Active CN113673522B (en) 2021-10-21 2021-10-21 Method, device and equipment for detecting inclination angle of text image and storage medium

Country Status (1)

Country Link
CN (1) CN113673522B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant