CN112789623A - Text detection method, device and storage medium - Google Patents

Text detection method, device and storage medium

Info

Publication number
CN112789623A
CN112789623A (application CN201880098360.6A)
Authority
CN
China
Prior art keywords
detection
detection frame
text
cutting
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201880098360.6A
Other languages
Chinese (zh)
Inventor
柯福全 (Ke Fuquan)
王喜顺 (Wang Xishun)
王俊 (Wang Jun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bitmain Technologies Inc
Original Assignee
Bitmain Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bitmain Technologies Inc filed Critical Bitmain Technologies Inc
Publication of CN112789623A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)

Abstract

The invention provides a text detection method, a text detection device and a storage medium. A mask map of a target image including a text region is obtained through a neural network model; a first detection frame of the text region is acquired based on the mask map; if the first detection frame meets a preset cutting condition, the first detection frame is cut to obtain a second detection frame; and the image corresponding to the second detection frame is taken as the text detection result. The method can handle long text boxes and curved text boxes, improving the accuracy of the obtained text detection boxes.

Description

Text detection method, device and storage medium
Technical Field
The embodiments of the invention relate to the technical field of image processing, and in particular to a text detection method, a text detection device and a storage medium.
Background
With the development of communication technology, users can conveniently capture images of interest with a smart terminal and obtain the text information they contain. The smart terminal can recognize the characters contained in an image and, based on the recognition result, convert them into editable text, enabling secondary editing and rapid sharing of the text information in the image.
Text detection is a prerequisite step for text recognition: it determines the regions of an image that contain characters. Current detection methods fall into two categories. One detects single characters and then merges the detection boxes; the other performs detection-box regression, in which a neural network outputs a number of candidate rectangular boxes and non-maximum suppression is applied to the candidates to screen out the final detection boxes.
Single-character detection requires heavy labeling work, so large-scale training data is difficult to obtain. The rectangular boxes selected by detection-box regression either overlap one another or fail to completely cover the original text regions, causing duplicate detections or missed detections.
Disclosure of Invention
The text detection method, text detection device and storage medium provided by the embodiments of the invention improve the accuracy of the obtained text detection boxes.
To this end, the invention provides the following technical solutions:
a first aspect of the present invention provides a text detection method, including:
acquiring, through a neural network model, a mask map of a target image including a text region;
acquiring a first detection frame of the text region based on the mask map;
if the first detection frame meets a preset cutting condition, cutting the first detection frame to obtain a second detection frame;
and taking the image corresponding to the second detection frame as the text detection result.
In a possible implementation manner, the neural network model is obtained by training a convolutional neural network with a U-Net structure on image data labeled with text truth boxes.
In a possible implementation manner, the acquiring a first detection frame of the text region based on the mask map includes:
extracting an outer contour of the mask map;
and fitting the outer contour to obtain the first detection frame of the text region.
In a possible implementation manner, the cutting the first detection frame to obtain a second detection frame if the first detection frame meets a preset cutting condition includes:
if the ratio of the area of the outer contour to the area of the first detection frame is smaller than a preset ratio and the aspect ratio of the first detection frame is larger than a preset aspect ratio, cutting the first detection frame to obtain the second detection frame.
In a possible implementation manner, the cutting the first detection frame to obtain a second detection frame includes:
and dividing the first detection frame in equal proportion according to the preset aspect ratio to obtain at least two second detection frames.
In a possible implementation manner, the taking the image corresponding to the second detection box as a text detection result includes:
judging whether the connecting line of the cutting points of the second detection frame cuts through a character, and if so, adjusting the positions of the cutting points;
and taking the image corresponding to the adjusted second detection frame as a text detection result.
In a possible implementation manner, the adjusting the position of the cutting point includes:
cropping a first image within a preset range of the cutting-point connecting line in the second detection frame;
acquiring an average gradient curve corresponding to the first image;
and determining a new cutting point position according to the average gradient curve.
In a possible implementation, the determining a new cutting point position according to the average gradient curve includes:
and taking the position of the first image corresponding to the minimum average gradient value in the average gradient curve as a new cutting point position.
A second aspect of the present invention provides a text detection apparatus, including:
the acquisition module is used for acquiring a mask map of a target image including a text region through a neural network model;
the acquisition module is further configured to acquire a first detection frame of the text region based on the mask map;
the cutting module is used for cutting the first detection frame to obtain a second detection frame if the first detection frame meets a preset cutting condition;
and the determining module is used for taking the image corresponding to the second detection frame as a text detection result.
A third aspect of the present invention provides a text detection apparatus, including:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the text detection method according to any one of the first aspect of the invention.
A fourth aspect of the invention provides a computer readable storage medium having stored thereon a computer program for execution by a processor to implement the text detection method according to any one of the first aspect of the invention.
The embodiments of the invention provide a text detection method, a text detection device and a storage medium. A mask map of a target image including a text region is obtained through a neural network model; a first detection frame of the text region is acquired based on the mask map; if the first detection frame meets a preset cutting condition, the first detection frame is cut to obtain a second detection frame; and the image corresponding to the second detection frame is taken as the text detection result. The method can handle long text boxes and curved text boxes, improving the accuracy of the obtained text detection boxes.
Drawings
To illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only exemplary embodiments, and those skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a schematic flowchart of a text detection method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a target image according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a mask map corresponding to a target image according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an outer contour of a white region in a mask map according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a first detection frame of a target image according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a second detection frame after cutting according to an embodiment of the present invention;
fig. 7 is a schematic flowchart of a text detection method according to another embodiment of the present invention;
fig. 8 is a schematic diagram illustrating an adjustment of the position of the cutting point of the second detection frame according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of a text detection apparatus according to an embodiment of the present invention;
fig. 10 is a schematic diagram of a hardware structure of a text detection apparatus according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all embodiments consistent with the present invention; rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The terms "comprising" and "having," and any variations thereof, in the description and claims of this invention are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
"and/or" in the present invention describes an association relationship of associated objects, and indicates that three relationships may exist, for example, a and/or B, and may indicate: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
Reference throughout this specification to "one embodiment" or "another embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in some embodiments" or "in this embodiment" in various places throughout this specification do not necessarily all refer to the same embodiment. It should be noted that the embodiments and the features of the embodiments may be combined with each other provided they do not conflict.
The text detection method provided by the embodiments of the invention provides a new way of generating the detection box: after a mask map of the text is generated through a neural network model, the final text detection box is determined through image processing based on the mask map, and the image corresponding to the text detection box is taken as the final text detection result for subsequent processing such as text recognition. Compared with the prior art, the text detection method of this embodiment can handle long text boxes and curved text boxes, and has high detection accuracy.
The technical solutions of the present invention are described in detail below with specific embodiments. The following specific embodiments may be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 1 is a schematic flowchart of a text detection method according to an embodiment of the present invention; fig. 2 is a schematic diagram of a target image according to an embodiment of the present invention; fig. 3 is a schematic diagram of the mask map corresponding to the target image according to an embodiment of the present invention; fig. 4 is a schematic diagram of the outer contour of a white region in the mask map according to an embodiment of the present invention; fig. 5 is a schematic diagram of the first detection frame of the target image according to an embodiment of the present invention; and fig. 6 is a schematic diagram of the second detection frame after cutting according to an embodiment of the present invention.
As shown in fig. 1, the text detection method provided in this embodiment includes the following steps:
s101, obtaining a mask map of a target image including a character area through a neural network model;
the target image of the embodiment is a color or black-and-white image shot by a user through an intelligent terminal, and the image comprises text information. For example, a user takes a child picture, and the target image includes a cartoon character and a text description, as shown in fig. 2.
It should be noted that, the text information in the target image captured by the user may be bent and deformed due to different capturing angles or different states of the target object, for example, when the user captures a child picture book, the text information in the captured target image may be bent and deformed due to the fact that the book itself is not flat. In contrast, the text detection method provided by the embodiment can accurately confirm the deformed character area in the image.
The neural network model in this embodiment is obtained by training a convolutional neural network with a U-Net structure on image data labeled with text truth boxes. The training process is as follows:
The characters in each sample image are labeled line by line: a truth box is drawn for each line of characters, and the labeled truth box is shrunk slightly (mainly to account for deformation). The sample images labeled with truth boxes are then input into the convolutional neural network U-Net structure for training. The U-Net structure in this embodiment actually solves a binary classification problem, in which the text boxes of the sample image are positive samples and the background is the negative sample. Because the samples are imbalanced, the neural network model is trained with the dice loss as the loss function.
U-Net is a variant of the convolutional neural network; its structure resembles the letter U, hence the name. U-Net improves on the FCN (Fully Convolutional Network), and with data augmentation it can be trained on relatively small amounts of sample data. The network mainly consists of two parts: a contracting path and an expanding path. The contracting path captures context information in the target image, while the symmetric expanding path precisely localizes the parts of the target image that need to be segmented.
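For illustration only (the patent provides no code), a dice loss of the kind described above can be sketched as follows; the PyTorch framing, the tensor shapes and the smoothing term eps are assumptions:

```python
import torch

def dice_loss(pred, target, eps=1e-6):
    """Dice loss for the binary text/background segmentation task.

    pred:   (N, 1, H, W) probabilities from the U-Net output (after sigmoid)
    target: (N, 1, H, W) binary ground-truth mask (1 = text, 0 = background)
    """
    pred = pred.reshape(pred.size(0), -1)
    target = target.reshape(target.size(0), -1)
    intersection = (pred * target).sum(dim=1)
    # Dice coefficient per sample; eps guards against empty masks.
    dice = (2.0 * intersection + eps) / (pred.sum(dim=1) + target.sum(dim=1) + eps)
    return 1.0 - dice.mean()
```

Because the dice coefficient is a ratio over the whole mask, this loss is far less sensitive to the imbalance between the small text regions and the large background than plain cross-entropy, which matches the reason given above for choosing it.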
In this step, the target image is input into the convolutional neural network model to obtain a mask map of the text regions in the target image, that is, the candidate regions corresponding to the text information in the target image and their positions.
The mask map is composed of black and white: the black areas of the mask map are the non-text regions of the target image, and the white areas are the text regions. As shown in fig. 3, the hatched portions indicate the black areas of the mask map, i.e., the non-text regions of the target image, and the white areas are the text regions.
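As a minimal sketch of this step, the model's probability map can be binarized into the black-and-white mask as below; the 0.5 threshold is an assumption, since the patent does not state one:

```python
import numpy as np

def to_mask(prob_map, thresh=0.5):
    """Binarize the network's probability map into the mask described above:
    white (255) = predicted text region, black (0) = background."""
    return (prob_map > thresh).astype(np.uint8) * 255
```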
S102, acquiring a first detection frame of the text region based on the mask map;
after the mask map including the text region in the target image is acquired in S101, the outer contour of the mask map, specifically, the outer contour of the white region in the mask map, is extracted based on the mask map, as shown by 3 dashed boxes in fig. 4.
And fitting the external contour to obtain a first detection frame of the character area, wherein a black solid line rectangular frame shown in fig. 5 is the first detection frame of the character area. It should be noted that the first detection frame is an initial detection frame of the text region in the target image.
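A sketch of S102 using OpenCV, on the assumption that the fitting step is a minimum-area (possibly rotated) rectangle; the patent only says the outer contour is fitted to obtain the frame:

```python
import cv2
import numpy as np

def first_detection_frames(mask):
    """Extract the outer contour of each white region in the mask (OpenCV 4.x)
    and fit a rectangle to it, yielding the first detection frames."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    frames = []
    for contour in contours:
        rect = cv2.minAreaRect(contour)                 # ((cx, cy), (w, h), angle)
        corners = cv2.boxPoints(rect).astype(np.int32)  # the 4 corner points
        # The contour area is kept because S103's cutting condition compares it
        # with the area of the fitted frame.
        frames.append((rect, corners, cv2.contourArea(contour)))
    return frames
```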
In general, there is more than one first detection frame in the acquired target image, so the acquired first detection frames may have intersecting regions. In addition, an acquired first detection frame may fail to cover the original text region, or may include too many non-text regions. The prior art does not solve these problems well. In this embodiment, further image processing is therefore performed on the acquired first detection frames to obtain more accurate detection frames; see S103 for details.
S103, if the first detection frame meets a preset cutting condition, cutting the first detection frame to obtain a second detection frame;
In this step, the preset cutting conditions include a first preset cutting condition and a second preset cutting condition. The first detection frame is cut only when it meets both the first preset cutting condition and the second preset cutting condition.
Specifically, if the ratio of the area of the outer contour to the area of the first detection frame is smaller than a preset ratio and the aspect ratio of the first detection frame is larger than the preset aspect ratio, the first detection frame is cut to obtain second detection frames.
It can be understood by those skilled in the art that if the text information in the target image captured by the user is distorted, the area of the outer contour of the text region extracted in S102 is necessarily smaller than the area of the fitted first detection frame.
Take the first detection frame '0' in fig. 5: the ratio of the area of its outer contour to the area of the frame is 0.6, smaller than the preset ratio (e.g., 0.8), so the first detection frame '0' meets the first preset cutting condition. In addition, the size of the first detection frame '0' is 24 × 2, i.e., 24 pixels in the length direction and 2 pixels in the width direction; with a preset aspect ratio of 8, its aspect ratio of 12 is larger than the preset aspect ratio, so the first detection frame '0' also meets the second preset cutting condition. The first detection frame '0' therefore needs to be cut. Similarly, the first detection frame '1' is determined to require cutting based on the preset cutting conditions.
Note that the size of the first detection frame '2' in fig. 5 is 28 × 2, giving an aspect ratio of 14, larger than the preset aspect ratio of 8; however, the ratio of the area of its outer contour to the area of the frame is 0.9, larger than the preset ratio of 0.8, which indicates that the frame already fully covers the text region in the target image. When only the second preset cutting condition is met, the frame is not cut further.
Besides the case of the first detection frame '2' in fig. 5, a first detection frame may also meet the first preset cutting condition but not the second, for example a shorter first detection frame with some deformation; in that case the frame is likewise not cut further.
In this embodiment, cutting the first detection frame means dividing it in equal proportion according to the preset aspect ratio to obtain at least two second detection frames. For example, the first detection frame '0' in fig. 5 has a size of 24 × 2 and the preset aspect ratio is 8, so the first detection frame '0' is cut into two parts of sizes 16 × 2 and 8 × 2, giving the second detection frames '3' and '4', as shown in fig. 6.
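A sketch of the cutting decision and the equal-proportion segmentation; the thresholds 0.8 and 8 come from the example above, and the splitting rule (slicing off segments of length preset_aspect × width until the remainder fits, which reproduces the 24 × 2 → 16 × 2 + 8 × 2 example) is one reading of the description:

```python
def should_cut(contour_area, length, width, preset_ratio=0.8, preset_aspect=8):
    """Both preset cutting conditions must hold: the outer contour fills the
    frame poorly AND the frame is long and thin."""
    box_area = length * width
    return (contour_area / box_area < preset_ratio and
            length / width > preset_aspect)

def equal_proportion_split(length, width, preset_aspect=8):
    """Slice segments of length preset_aspect * width off the frame until the
    remainder is no longer than one segment, e.g. (24, 2) -> [16, 8]."""
    seg = preset_aspect * width
    parts = []
    remaining = length
    while remaining > seg:
        parts.append(seg)
        remaining -= seg
    parts.append(remaining)
    return parts
```

With the figures from the example, should_cut(0.6 * 48, 24, 2) is True for frame '0' while should_cut(0.9 * 56, 28, 2) is False for frame '2', and equal_proportion_split(24, 2) returns [16, 8].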
S104, taking the image corresponding to the second detection frame as the text detection result.
After the second detection frame is determined, the image corresponding to the second detection frame is taken as the text detection result for subsequent processing such as text recognition.
According to the text detection method provided by the embodiment of the invention, a mask map of a target image including a text region is obtained through a neural network model; a first detection frame of the text region is acquired based on the mask map; if the first detection frame meets the preset cutting condition, the first detection frame is cut to obtain a second detection frame; and the image corresponding to the second detection frame is taken as the text detection result. The method can handle long text boxes and curved text boxes, improving the accuracy of the obtained text detection boxes.
Building on the foregoing embodiment, the text detection method provided in this embodiment mainly addresses a problem that arises when the first detection frame is cut as described above: because the first detection frame is cut in equal proportion, the connecting line of the cut points may pass through a character in the target image, which can cause text recognition to fail. The positions of the cutting points therefore need to be adjusted.
The text detection method provided in this embodiment is described in detail below with reference to the drawings.
Fig. 7 is a schematic flowchart of a text detection method according to another embodiment of the present invention, and fig. 8 is a schematic diagram of adjusting a position of a cut point of a second detection frame according to an embodiment of the present invention.
As shown in fig. 7, the text detection method provided in this embodiment includes the following steps:
s201, obtaining a mask map of a target image including a character area through a neural network model;
s202, acquiring a first detection box of the character area based on the mask image;
s203, if the first detection frame meets the preset cutting condition, cutting the first detection frame to obtain a second detection frame;
s201 to S203 in this embodiment are the same as S101 to S103 in the above embodiments, and the implementation principle and technical effect thereof are the same, which refer to the above embodiments specifically, and are not described herein again.
S204, judging whether the connecting line of the cutting points of the second detection frame cuts through a character, and if so, adjusting the positions of the cutting points;
In this embodiment, when it is determined that the connecting line of the cut points of the second detection frame cuts through a character in the target image, the positions of the cutting points need to be adjusted. The specific adjustment rule is as follows:
cropping a first image within a preset range of the cutting-point connecting line in the second detection frame;
acquiring an average gradient curve corresponding to the first image;
and determining the new cutting point position according to the average gradient curve. Specifically,
the position in the first image corresponding to the minimum average gradient value in the average gradient curve is taken as the new cutting point position.
As shown in fig. 8, after the first detection frame '0' in the above embodiment is cut, two second detection frames '3' and '4' are obtained, and the cut happens to pass through the character 'you' in the target image. The figure includes four cutting points p0, p1, p2 and p3, where the edge corresponding to the left second detection frame '3' is p1p2 and the edge corresponding to the right second detection frame '4' is p0p3. The adjustment process adjusts the positions of these two edges on the left and right sides of the cutting position, and comprises the following steps:
1) Taking the edge p1p2 on the left of the cutting point as the center, expand by h pixels along the horizontal axis on each side, where h is the height of the edge p1p2, i.e., the height of the second detection frame, to obtain a position-adjustment rectangular frame;
2) Crop the image on the original target image corresponding to the position-adjustment rectangular frame (i.e., the first image) and scale it to a preset height; for example, an image with an original height of 8 pixels is enlarged to a height of 32 pixels. Based on the scaled image, compute a gradient map of the image; then slide a small window of height 32 pixels and width 4 pixels along the horizontal direction of the image and compute the average gradient at every position, where the average gradient at a position equals the sum of the gradients of the pixels inside the sliding window at that position divided by the number of pixels in the window;
3) Take the position with the minimum average gradient as the new cutting point position, convert it back to the scale of the second detection frame according to the proportional relation to obtain the new cutting point positions of the second detection frames, and update the cutting points of the two second detection frames with these positions to obtain p0', p1', p2' and p3'.
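A sketch of steps 1) to 3), assuming a Sobel gradient magnitude (the description only says "gradient") and using the 32-pixel target height and 4-pixel window width from the example:

```python
import cv2
import numpy as np

def refine_cut_x(gray_strip, win_w=4, target_h=32):
    """gray_strip: grayscale crop of the position-adjustment rectangle around
    the cut line (the 'first image'). Returns the x offset, in the strip's
    original coordinates, of the column with the minimum average gradient."""
    scale = target_h / gray_strip.shape[0]
    strip = cv2.resize(gray_strip, None, fx=scale, fy=scale)
    gx = cv2.Sobel(strip, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(strip, cv2.CV_32F, 0, 1)
    grad = np.sqrt(gx * gx + gy * gy)   # per-pixel gradient magnitude
    width = grad.shape[1]
    # Average gradient of the sliding window at every horizontal position.
    avg = np.array([grad[:, x:x + win_w].mean() for x in range(width - win_w + 1)])
    best = int(np.argmin(avg)) + win_w // 2   # centre of the flattest window
    return int(best / scale)                  # map back to the original scale
```

A low average gradient marks a smooth column of background pixels, so a cut line moved there is unlikely to pass through a character stroke, which is exactly the failure the adjustment is meant to avoid.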
S205, taking the image corresponding to the adjusted second detection frame as the text detection result.
After the updated cutting point positions are determined, the adjusted second detection frames are obtained, and the image corresponding to the adjusted second detection frame is taken as the text detection result. The adjusted second detection frames obtained by the text detection method of this embodiment no longer cut through characters, which improves text detection accuracy.
According to the text detection method provided by the embodiment of the invention, a mask map of a target image including a text region is obtained through a neural network model; a first detection frame of the text region is acquired based on the mask map; if the first detection frame meets the preset cutting condition, the first detection frame is cut to obtain a second detection frame; when the connecting line of the cutting points of the second detection frame is determined to cut through a character, the positions of the cutting points are adjusted; and the image corresponding to the adjusted second detection frame is taken as the text detection result. Compared with the above embodiment, the text detection method of this embodiment has higher text detection accuracy.
Fig. 9 shows a text detection apparatus. Fig. 9 is only an illustration, and the embodiment of the invention is not limited thereto.
Fig. 9 is a schematic structural diagram of a text detection apparatus according to an embodiment of the present invention, and as shown in fig. 9, a text detection apparatus 30 according to this embodiment includes:
the acquisition module 31 is configured to acquire a mask map of a target image including a text region through a neural network model;
the acquisition module 31 is further configured to acquire a first detection frame of the text region based on the mask map;
the cutting module 33 is configured to cut the first detection frame to obtain a second detection frame if the first detection frame meets a preset cutting condition;
and the determining module 34 is configured to use the image corresponding to the second detection box as a text detection result.
The text detection apparatus provided by the embodiment of the invention includes an acquisition module, a cutting module and a determining module. The acquisition module is used for acquiring a mask map of a target image including a text region through a neural network model, and for acquiring a first detection frame of the text region based on the mask map; if the first detection frame meets a preset cutting condition, the cutting module is used for cutting the first detection frame to obtain a second detection frame; and the determining module is used for taking the image corresponding to the second detection frame as the text detection result. The text detection apparatus can handle long text boxes and curved text boxes, improving the accuracy of the obtained text detection boxes.
On the basis of the above embodiment, optionally, the neural network model is obtained by training the image data labeled with the text true value box by using a convolutional neural network U-Net structure.
Optionally, the acquisition module 31 is specifically configured to:
extracting an outer contour of the mask map;
and fitting the outer contour to obtain the first detection frame of the text region.
Optionally, the cutting module 33 is specifically configured to:
if the ratio of the area of the outer contour to the area of the first detection frame is smaller than a preset ratio and the aspect ratio of the first detection frame is larger than a preset aspect ratio, cutting the first detection frame to obtain the second detection frame.
Optionally, the cutting module 33 is specifically configured to:
and dividing the first detection frame in equal proportion according to the preset aspect ratio to obtain at least two second detection frames.
The determining module 34 is specifically configured to:
judging whether the connecting line of the cutting points of the second detection frame cuts through a character, and if so, adjusting the positions of the cutting points;
and taking the image corresponding to the adjusted second detection frame as a text detection result.
Optionally, the adjusting the position of the cutting point includes:
cropping a first image within a preset range of the cutting-point connecting line in the second detection frame;
acquiring an average gradient curve corresponding to the first image;
and determining a new cutting point position according to the average gradient curve.
Optionally, the determining a new cutting point position according to the average gradient curve includes:
and taking the position of the first image corresponding to the minimum average gradient value in the average gradient curve as a new cutting point position.
The text detection apparatus provided in this embodiment may implement the technical solutions of the above method embodiments, and the implementation principles and technical effects are similar, which are not described herein again.
Fig. 10 shows a text detection device. Fig. 10 is only an illustration, and the embodiment of the invention is not limited thereto.
Fig. 10 is a schematic diagram of a hardware structure of a text detection apparatus according to an embodiment of the present invention, and as shown in fig. 10, the text detection apparatus 40 according to the embodiment includes:
a memory 41;
a processor 42; and
a computer program;
wherein the computer program is stored in the memory 41 and configured to be executed by the processor 42 to implement the technical solution of any one of the foregoing method embodiments; the implementation principle and technical effect are similar and are not repeated here.
Alternatively, the memory 41 may be separate or integrated with the processor 42.
When the memory 41 is a device independent from the processor 42, the text detection apparatus 40 further includes:
a bus 43 for connecting the memory 41 and the processor 42.
Embodiments of the present invention also provide a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor 42 to implement the steps performed by the text detection apparatus 40 in the above method embodiments.
It should be understood that the processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of hardware and software modules within the processor.
The memory may comprise high-speed RAM and may further comprise non-volatile memory (NVM), such as at least one disk memory; it may also be a USB disk, a removable hard disk, a read-only memory, a magnetic disk or an optical disk.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an Application-Specific Integrated Circuit (ASIC). Alternatively, the processor and the storage medium may reside as discrete components in an electronic device or host device.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and such modifications or substitutions do not take the essence of the corresponding technical solutions outside the scope of the technical solutions of the embodiments of the present invention.

Claims (11)

  1. A text detection method, comprising:
    acquiring a mask map of a target image including a text region through a neural network model;
    acquiring a first detection frame of the text region based on the mask map;
    if the first detection frame meets a preset cutting condition, cutting the first detection frame to obtain a second detection frame;
    and taking the image corresponding to the second detection frame as a text detection result.
  2. The method of claim 1, wherein the neural network model is obtained by training a convolutional neural network with a U-Net structure on image data labeled with text truth boxes.
  3. The method of claim 1, wherein the obtaining a first detection box of the text region based on the mask map comprises:
    extracting an outer contour of the mask map;
    and fitting the outer contour to obtain the first detection frame of the text region.
  4. The method according to claim 3, wherein the cutting the first detection frame to obtain a second detection frame if the first detection frame meets a preset cutting condition comprises:
    if the ratio of the area of the outer contour to the area of the first detection frame is smaller than a preset ratio and the aspect ratio of the first detection frame is larger than a preset aspect ratio, cutting the first detection frame to obtain the second detection frame.
  5. The method of claim 4, wherein the cutting the first detection frame to obtain a second detection frame comprises:
    and dividing the first detection frame in equal proportion according to the preset aspect ratio to obtain at least two second detection frames.
  6. The method according to claim 1, wherein the taking the image corresponding to the second detection box as the text detection result comprises:
    judging whether the connecting line of the cutting points of the second detection frame cuts through a character, and if so, adjusting the positions of the cutting points;
    and taking the image corresponding to the adjusted second detection frame as a text detection result.
  7. The method of claim 6, wherein the adjusting the position of the cut point comprises:
    cropping a first image within a preset range of the cutting-point connecting line in the second detection frame;
    acquiring an average gradient curve corresponding to the first image;
    and determining a new cutting point position according to the average gradient curve.
  8. The method of claim 7, wherein determining a new cut point location from the mean gradient profile comprises:
    and taking the position of the first image corresponding to the minimum average gradient value in the average gradient curve as a new cutting point position.
  9. A text detection apparatus, comprising:
    the acquisition module is used for acquiring a mask map of a target image including a text region through a neural network model;
    the acquisition module is further configured to acquire a first detection frame of the text region based on the mask map;
    the cutting module is used for cutting the first detection frame to obtain a second detection frame if the first detection frame meets a preset cutting condition;
    and the determining module is used for taking the image corresponding to the second detection frame as a text detection result.
  10. A text detection apparatus, comprising:
    a memory;
    a processor; and
    a computer program;
    wherein the computer program is stored in the memory and configured to be executed by the processor to implement a text detection method as claimed in any one of claims 1 to 8.
  11. A computer-readable storage medium, having stored thereon a computer program for execution by a processor to implement a text detection method according to any one of claims 1 to 8.
CN201880098360.6A 2018-11-16 2018-11-16 Text detection method, device and storage medium Pending CN112789623A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/115874 WO2020097909A1 (en) 2018-11-16 2018-11-16 Text detection method and apparatus, and storage medium

Publications (1)

Publication Number Publication Date
CN112789623A 2021-05-11

Family

ID=70731920

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880098360.6A Pending CN112789623A (en) 2018-11-16 2018-11-16 Text detection method, device and storage medium

Country Status (2)

Country Link
CN (1) CN112789623A (en)
WO (1) WO2020097909A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111753812A (en) * 2020-07-30 2020-10-09 上海眼控科技股份有限公司 Text recognition method and equipment
CN111881050B (en) * 2020-07-31 2024-06-04 北京爱奇艺科技有限公司 Text layer cutting method and device and electronic equipment
CN112085010B (en) * 2020-10-28 2022-07-12 成都信息工程大学 Mask detection and deployment system and method based on image recognition
CN112528889B (en) * 2020-12-16 2024-02-06 中国平安财产保险股份有限公司 OCR information detection and correction method, device, terminal and storage medium
CN112651394B (en) * 2020-12-31 2023-11-14 北京一起教育科技有限责任公司 Image detection method and device and electronic equipment
CN112949642B (en) * 2021-02-23 2022-04-01 北京三快在线科技有限公司 Character generation method and device, storage medium and electronic equipment
CN112966678B (en) * 2021-03-11 2023-01-24 南昌航空大学 Text detection method and system
CN113033543B (en) * 2021-04-27 2024-04-05 中国平安人寿保险股份有限公司 Curve text recognition method, device, equipment and medium
CN113449724B (en) * 2021-06-09 2023-06-16 浙江大华技术股份有限公司 Image text correction method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130129222A1 (en) * 2011-11-21 2013-05-23 Nokia Corporation Methods and apparatuses for facilitating detection of text within an image
CN103699895A (en) * 2013-12-12 2014-04-02 天津大学 Method for detecting and extracting text in video
CN105574513A (en) * 2015-12-22 2016-05-11 北京旷视科技有限公司 Character detection method and device
CN107301414A (en) * 2017-06-23 2017-10-27 厦门商集企业咨询有限责任公司 Chinese positioning, segmentation and recognition methods in a kind of natural scene image
CN108549893A (en) * 2018-04-04 2018-09-18 华中科技大学 A kind of end-to-end recognition methods of the scene text of arbitrary shape

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10007863B1 (en) * 2015-06-05 2018-06-26 Gracenote, Inc. Logo recognition in images and videos
CN108520254B (en) * 2018-03-01 2022-05-10 腾讯科技(深圳)有限公司 Text detection method and device based on formatted image and related equipment

Also Published As

Publication number Publication date
WO2020097909A1 (en) 2020-05-22

Similar Documents

Publication Publication Date Title
CN112789623A (en) Text detection method, device and storage medium
CN110569899B (en) Dam face defect classification model training method and device
CN110348294B (en) Method and device for positioning chart in PDF document and computer equipment
CN109685055B (en) Method and device for detecting text area in image
CN110046529B (en) Two-dimensional code identification method, device and equipment
WO2020252917A1 (en) Fuzzy face image recognition method and apparatus, terminal device, and medium
ES2609953T3 (en) Text image cropping procedure
CN112132163B (en) Method, system and computer readable storage medium for identifying object edges
CN109255300B (en) Bill information extraction method, bill information extraction device, computer equipment and storage medium
CN110008961B (en) Text real-time identification method, text real-time identification device, computer equipment and storage medium
CN110647882A (en) Image correction method, device, equipment and storage medium
CN113486828B (en) Image processing method, device, equipment and storage medium
CN112561080A (en) Sample screening method, sample screening device and terminal equipment
CN112396047B (en) Training sample generation method and device, computer equipment and storage medium
CN109978044B (en) Training data generation method and device, and model training method and device
CN113129298B (en) Method for identifying definition of text image
CN113221718B (en) Formula identification method, device, storage medium and electronic equipment
US9483834B1 (en) Object boundary detection in an image
CN113837255B (en) Method, apparatus and medium for predicting cell-based antibody karyotype class
CN112861836B (en) Text image processing method, text and card image quality evaluation method and device
CN110782439B (en) Method and device for auxiliary detection of image annotation quality
CN114299509A (en) Method, device, equipment and medium for acquiring information
CN111402281B (en) Book edge detection method and device
CN114743205A (en) Image tampering detection method and device
CN114648751A (en) Method, device, terminal and storage medium for processing video subtitles

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination