CN114973288A - Non-commodity image text detection method, system and computer storage medium - Google Patents

Non-commodity image text detection method, system and computer storage medium

Info

Publication number
CN114973288A
CN114973288A (Application CN202210595852.4A)
Authority
CN
China
Prior art keywords
rectangular frame
white background
area
image
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210595852.4A
Other languages
Chinese (zh)
Inventor
周昌世 (Zhou Changshi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Renren Mutual Entertainment Technology Co ltd
Original Assignee
Chengdu Renren Mutual Entertainment Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Renren Mutual Entertainment Technology Co ltd filed Critical Chengdu Renren Mutual Entertainment Technology Co ltd
Priority to CN202210595852.4A priority Critical patent/CN114973288A/en
Publication of CN114973288A publication Critical patent/CN114973288A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 - Document-oriented image-based pattern recognition
    • G06V30/41 - Analysis of document content
    • G06V30/414 - Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/13 - Edge detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/60 - Analysis of geometric attributes
    • G06T7/62 - Analysis of geometric attributes of area, perimeter, diameter or volume
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/14 - Image acquisition
    • G06V30/148 - Segmentation of character regions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10004 - Still image; Photographic image
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Artificial Intelligence (AREA)
  • Character Input (AREA)

Abstract

The invention discloses a non-commodity image text detection method, system and computer storage medium. The method comprises the following steps: 100, acquiring an original picture and performing channel extraction; 200, synthesizing the extracted R, G and B channels into a white-background image and forming a grayscale image from the extracted Alpha channel; 300, performing text detection on the grayscale image, proceeding to 400 when no text is detected and to 800 otherwise; 400, performing contour detection on the grayscale image to obtain its contour regions and calculating their circumscribed rectangular frames; 500, performing pixel-segmentation-based text detection on the white-background image to obtain the rectangular frames of the text on it, mapping the circumscribed rectangular frames onto the white-background image, calculating the area intersection-over-union of the text rectangular frames and the circumscribed rectangular frames, and proceeding to 800 when the intersection-over-union is not less than a first preset threshold; and 800, judging the original picture to be an unqualified picture. The method greatly improves review efficiency over manual auditing and saves manpower and financial resources.

Description

Non-commodity image text detection method, system and computer storage medium
Technical Field
The invention relates to the technical field of computer vision, and in particular to a non-commodity image text detection method, a non-commodity image text detection system and a computer storage medium.
Background
With the development of the internet, online shopping has become an increasingly popular way of life, and e-commerce and digital consumption play a positive role in promoting the vigorous development of China's consumer market and expanding domestic demand.
At present, opening an online shopping mall such as Taobao, JD.com, Suning or Amazon, paying online after selecting satisfactory goods, and having a courier deliver the desired goods to one's home within two or three days is part of daily life for a considerable portion of people. In recent years, the growth of mobile-phone netizens and the development of mobile payment have made online shopping something that can be done anytime and anywhere. In the era of online shopping, a large number of enterprises and individuals have boarded the online-shopping express train to share the cake.
Each e-commerce platform recruits sellers of various goods, and sellers display their goods by uploading pictures of the items for sale, so that buyers can decide whether to buy based on real photos of the goods. However, the quality of uploaded pictures varies: some are taken carelessly, and some have a large amount of descriptive text about the goods added at random, which seriously affects the users' shopping experience; some descriptive text even conflicts with the platform's unified festival promotions. The e-commerce platform therefore has certain requirements for the pictures uploaded by merchants, for example that four-channel pictures must be uploaded, that only the main body of the goods is displayed, and that no descriptive text irrelevant to the goods is added, so that the platform can uniformly synthesize festival-themed product images. To avoid violations, a manual review method is usually adopted to review the pictures uploaded by sellers. Detection by a picture-review team working manually consumes a great deal of labor and time, and the efficiency of manual review can hardly meet the continuously growing demand for picture review.
Aiming at the problems in the prior art that detecting whether pictures uploaded by sellers contain descriptive text by manual review wastes a great deal of labor and time and is inefficient, no effective solution has yet been proposed.
Disclosure of Invention
The embodiments of the invention provide a non-commodity image text detection method, a non-commodity image text detection system and a computer storage medium, aiming to solve the prior-art problems that detecting whether pictures uploaded by sellers contain descriptive text by manual review wastes a great deal of labor and time and is inefficient.
To achieve the above object, in one aspect the invention provides a non-commodity image text detection method, comprising:
Step 100, acquiring an original picture and performing channel extraction on it;
Step 200, synthesizing the extracted R, G and B channels into a white-background image, and forming a grayscale image from the extracted Alpha channel;
Step 300, performing pixel-segmentation-based text detection on the grayscale image; proceeding to step 400 when no text is detected, otherwise to step 800;
Step 400, performing contour detection on the grayscale image to obtain its contour regions and calculating their circumscribed rectangular frames;
Step 500, performing pixel-segmentation-based text detection on the white-background image to obtain the rectangular frames of the text on it; mapping the circumscribed rectangular frames onto the white-background image and calculating the area intersection-over-union of the text rectangular frames and the circumscribed rectangular frames; proceeding to step 600 when the intersection-over-union is smaller than a first preset threshold, otherwise to step 800;
Step 600, summing the areas of all text rectangular frames within the current circumscribed rectangular frame and calculating the ratio of the summed area to the area of the current circumscribed rectangular frame;
Step 700, judging whether the area ratio is greater than a second preset threshold; if so, proceeding to step 800;
Step 800, judging the original picture to be an unqualified picture.
Optionally, when it is judged that the area ratio is not greater than the second preset threshold, step 900 is performed: calculating the width and height of the white-background image and of each text rectangular frame; and judging, from the width and height of each text rectangular frame and of the white-background image, whether the proportion of the white-background image occupied by a text rectangular frame is greater than a third preset threshold; if so, proceeding to step 800.
Optionally, acquiring an original picture and performing channel extraction on it includes: acquiring the original picture; judging whether the original picture is a four-channel transparent picture; and, if so, performing channel extraction on it.
Optionally, calculating the circumscribed rectangular frames of the contour regions includes: calculating the area of each contour region and filtering out contour regions whose area is smaller than a third preset threshold; and calculating the circumscribed rectangular frames of the remaining contour regions.
Optionally, the formula for the area intersection-over-union of the text rectangular frame and the circumscribed rectangular frame on the white-background image is:
x1 = max(x_a1, x_b1);
y1 = max(y_a1, y_b1);
x2 = min(x_a2, x_b2);
y2 = min(y_a2, y_b2);
intersection = max(x2 - x1 + 1.0, 0) * max(y2 - y1 + 1.0, 0);
SA = (x_a2 - x_a1 + 1.0) * (y_a2 - y_a1 + 1.0);
SB = (x_b2 - x_b1 + 1.0) * (y_b2 - y_b1 + 1.0);
Union = SA + SB - intersection;
iou = intersection / Union;
where the coordinates of the text rectangular frame are [x_a1, y_a1, x_a2, y_a2] and those of the circumscribed rectangular frame are [x_b1, y_b1, x_b2, y_b2]; (x1, y1) and (x2, y2) are the upper-left and lower-right corners of the intersection of the two frames; intersection is the area of that intersection; SA is the area of the text rectangular frame; SB is the area of the circumscribed rectangular frame; Union is the total area covered by the two frames; and iou is their area intersection-over-union.
Optionally, performing pixel-segmentation-based text detection on the grayscale image includes: scaling the grayscale image; and inputting the scaled grayscale image into a pre-trained text detection model for pixel-segmentation-based text detection.
In another aspect, the invention provides a non-commodity image text detection system, comprising: an acquisition unit for acquiring an original picture and performing channel extraction on it; a synthesis unit for synthesizing the extracted R, G and B channels into a white-background image and forming a grayscale image from the extracted Alpha channel; a first judging unit for performing pixel-segmentation-based text detection on the grayscale image, proceeding to the contour detection unit when no text is detected and to the determination unit otherwise; the contour detection unit, for performing contour detection on the grayscale image to obtain its contour regions and calculating their circumscribed rectangular frames; a second judging unit for performing pixel-segmentation-based text detection on the white-background image to obtain the rectangular frames of the text on it, mapping the circumscribed rectangular frames onto the white-background image, calculating the area intersection-over-union of the text rectangular frames and the circumscribed rectangular frames, proceeding to the area-ratio calculation unit when the intersection-over-union is smaller than a first preset threshold and to the determination unit otherwise; the area-ratio calculation unit, for summing the areas of all text rectangular frames within the current circumscribed rectangular frame and calculating the ratio of the summed area to the area of the current circumscribed rectangular frame; a third judging unit for judging whether the area ratio is greater than a second preset threshold and, if so, proceeding to the determination unit; and the determination unit, for judging the original picture to be an unqualified picture.
Optionally, when it is judged that the area ratio is not greater than the second preset threshold, a fourth judging unit is entered, which calculates the width and height of the white-background image and of each text rectangular frame, and judges, from these, whether the proportion of the white-background image occupied by a text rectangular frame is greater than a third preset threshold; if so, the determination unit is entered.
Optionally, the acquisition unit includes: an acquisition subunit for acquiring the original picture; and a judging subunit for judging whether the original picture is a four-channel transparent picture and, if so, performing channel extraction on it.
In another aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the non-commodity image text detection method described above.
The invention has the beneficial effects that:
the invention provides a non-commodity image text detection method, a non-commodity image text detection system and a computer storage medium, wherein the method carries out outline detection through a gray-scale image to obtain an outline area on the gray-scale image and calculate to obtain an external rectangular frame of the outline area; carrying out text detection based on pixel segmentation through a white background image to obtain a character rectangular frame of characters on the white background image; and mapping the circumscribed rectangle frame to the white background picture. By the method, the characters on the original picture are positioned, the problem that the characters of the commodity picture are detected and are judged as unqualified pictures by mistake due to the fact that the characters of the commodity picture are detected by directly using a text detection algorithm is effectively solved, the efficiency of manual examination and verification is greatly improved, and manpower and financial resources are saved.
Drawings
Fig. 1 is a flowchart of a non-commodity image text detection method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a non-commodity image text detection system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In online shopping malls such as Taobao, JD.com, Suning and Amazon, each e-commerce platform recruits sellers of various goods, and sellers display their goods by uploading pictures of the items for sale, so that buyers can decide whether to buy based on real photos of the goods. However, the quality of uploaded pictures varies: some are taken carelessly, and some have a large amount of descriptive text added at random, which seriously affects the users' shopping experience; some descriptive text even conflicts with the platform's unified festival promotions. The e-commerce platform therefore has certain requirements for the pictures uploaded by merchants, for example that four-channel pictures must be uploaded, that only the main body of the goods is displayed, and that no descriptive text irrelevant to the goods is added, so that the platform can uniformly synthesize festival-themed product images. To avoid violations, a manual review method is usually adopted to review the pictures uploaded by sellers. Detection by a picture-review team working manually consumes a great deal of labor and time, and the efficiency of manual review can hardly meet the continuously growing demand for picture review.
Therefore, the present invention provides a non-commodity image text detection method. Fig. 1 is a flowchart of the method provided by an embodiment of the present invention; as shown in Fig. 1, the method includes:
step 100, acquiring an original picture and carrying out channel extraction on the original picture;
in an alternative embodiment, the step 100 comprises:
Step 1001, acquiring an original picture;
Step 1002, judging whether the original picture is a four-channel transparent picture; and, if so, performing channel extraction on it.
Specifically, in an uncompressed 32-bit RGB image, each pixel is composed of four parts: one Alpha channel and three color channels, R, G and B. When the Alpha value is 0 the pixel is completely transparent; when it is 255 the pixel is completely opaque. When a merchant uploads pictures to the e-commerce platform, the platform generally requires, in order to uniformly manage the display effect for various festivals, a 32-bit, four-channel transparent picture with an Alpha value of 0. Therefore, channel judgment is first performed on the picture (original picture) uploaded by the merchant: if it is not a four-channel picture it is directly rejected and the merchant is asked to upload again; if it is a four-channel transparent picture, the channels are further separated and extracted, i.e. the R, G, B and Alpha channels are extracted.
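The channel check and extraction described above can be sketched in Python with NumPy (the helper name `split_channels` is illustrative, not from the disclosure; the image is assumed to be an H x W x C uint8 array, e.g. as loaded with OpenCV's IMREAD_UNCHANGED flag):

```python
import numpy as np

def split_channels(img):
    """Reject images that are not four-channel RGBA; otherwise split them.

    `img` is an H x W x C uint8 array. Returns (r, g, b, alpha) as
    H x W arrays, or None when the picture is not four-channel and
    should be sent back to the merchant for re-upload.
    """
    if img.ndim != 3 or img.shape[2] != 4:
        return None  # not a four-channel transparent picture: reject
    r, g, b, alpha = (img[:, :, i] for i in range(4))
    return r, g, b, alpha
```

A `None` result corresponds to the branch above where the picture is directly rejected.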
Step 200, synthesizing the extracted R, G and B channels into a white-background image, and forming a grayscale image from the extracted Alpha channel;
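A minimal sketch of step 200, assuming straightforward alpha compositing over a pure white background (the exact compositing used by the disclosure is not spelled out, and `synthesize` is a hypothetical helper):

```python
import numpy as np

def synthesize(r, g, b, alpha):
    """Composite the colour channels over white and keep the Alpha
    channel itself as the grayscale map (step 200 sketch)."""
    a = alpha.astype(np.float32) / 255.0
    white = np.full_like(r, 255, dtype=np.float32)

    def compose(c):
        # standard "over" compositing against a white background
        return (c.astype(np.float32) * a + white * (1.0 - a)).astype(np.uint8)

    white_bg = np.dstack([compose(r), compose(g), compose(b)])  # H x W x 3
    gray = alpha.copy()  # fully transparent -> 0, fully opaque -> 255
    return white_bg, gray
```

Fully transparent pixels thus become white in the white-background image, while opaque product/text pixels keep their colour, which is what the later text-detection steps rely on.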
step 300, carrying out text detection based on pixel segmentation on the gray level image, and entering step 400 when no text is detected; otherwise, go to step 800;
in an alternative embodiment, the conventional text detection method generally uses a manual feature extraction method to detect text, such as SWT, MSER, and the like, and then uses a template matching or model training method to identify the detected text. The traditional method has a plurality of limitations, however, the method adopts a text detection method based on pixel segmentation, and not only can detect horizontal texts and vertical texts, but also can accurately position texts in any shapes and irregular-shaped text examples. The robustness and detection performance of text detection are greatly improved.
Performing pixel-segmentation-based text detection on the grayscale image comprises:
scaling the grayscale image;
specifically, in order to prevent the situation that the memory of the display card is full when the individual image pixels are too large and the image is input into the network, the gray-scale image needs to be scaled (resize), that is, the gray-scale image with the image width and height larger than 1080 pixels needs to be scaled to 1080 pixels at the maximum, the small sides are scaled by the same times in equal proportion, and the gray-scale image with the image width and height smaller than 1080 pixels does not need to be processed.
And inputting the scaled gray-scale image into a pre-trained text detection model for text detection based on pixel segmentation.
Specifically, a pre-trained text detection model is read in and the scaled grayscale image is input into it. Features of the grayscale image are extracted through a backbone network, which outputs four feature maps; high-level and low-level features are then fused to obtain four feature layers (P2, P3, P4, P5), each with 256 channels. The four feature layers are then concatenated (concat) into F, which is fed into a Conv(3,3)-BN-ReLU layer, changing the number of channels to 256. F is then fed into multiple Conv(1,1)-Up-Sigmoid layers to obtain n segmentation results S1, S2, ..., Sn (in practice obtained by a 1x1 convolution with n output channels).
In the above, F = C(P2, P3, P4, P5) = P2 || Upx2(P3) || Upx4(P4) || Upx8(P5), where || denotes concat (merging), Up denotes upsampling, and Upx2, Upx4, Upx8 denote 2x, 4x and 8x upsampling respectively.
The network now outputs S1, S2, ..., Sn. Suppose n is 3, giving S1, S2, S3, where S1 represents the segmentation result of the smallest kernel. A connected-components (CC) operation on S1 yields, say, four connected domains C = {c1, c2, c3, c4} (at this small scale the margins between different text lines are large and they are easy to distinguish). The kernel in S2 is larger than, and contains, the kernel in S1. Pixels that belong to the kernel in S2 but not to the kernel in S1 are then assigned: each pixel of the found connected components is expanded upward, downward, left and right, one by one, in BFS fashion, gradually widening the text-line regions predicted in S1 (in other words, each kernel pixel in S2 is assigned to some connected component of S1). S3 is processed in the same way, and the connected components remaining after S3 are extracted as the final text-line detection result (i.e., whether text is detected).
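The BFS expansion described above can be sketched in pure Python/NumPy. This is a simplified single-step version of the progressive scale expansion: `expand_kernel` is an illustrative helper that grows the labelled components of one kernel map into the next larger one, and a full implementation would apply it successively for S2 through Sn:

```python
from collections import deque

import numpy as np

def expand_kernel(labels, larger_kernel):
    """Grow each labelled component of the smaller kernel map into the
    mask of the next larger kernel map via breadth-first search.

    `labels`: int map, 0 = background, 1..k = components found in S1.
    `larger_kernel`: boolean mask of the larger kernel (e.g. S2).
    """
    out = labels.copy()
    h, w = labels.shape
    # seed the queue with every already-labelled pixel
    q = deque((y, x) for y in range(h) for x in range(w) if labels[y, x] > 0)
    while q:
        y, x = q.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w
                    and larger_kernel[ny, nx] and out[ny, nx] == 0):
                out[ny, nx] = out[y, x]  # inherit the component label
                q.append((ny, nx))
    return out
```

Because the BFS advances from all components simultaneously, contested pixels between two nearby text lines go to whichever component reaches them first, which is the conflict rule used by this style of expansion.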
When no text is detected the process proceeds to step 400; otherwise the original picture is determined to be an unqualified picture (i.e. the picture uploaded by the merchant carries descriptive text).
Step 400, performing contour detection on the grayscale image to obtain its contour regions and calculating their circumscribed rectangular frames;
in an optional embodiment, the calculating a bounding rectangle of the outline region includes:
calculating the area of the contour region and filtering out the contour region with the area smaller than a third preset threshold value;
and calculating to obtain a circumscribed rectangle frame of the filtered outline area.
For example: some text is not written separately outside the commodity image but consists of only one line of characters (e.g. "3000W") inside a small circular area outside the commodity image; such text cannot be detected by applying pixel-segmentation-based text detection directly to the grayscale image. Further detection is therefore needed: contour detection is performed on the grayscale image to obtain its contour regions; the area of each contour region is calculated and regions smaller than the third preset threshold are filtered out, leaving the contour region of the commodity image and the small circular area; and the circumscribed rectangular frames of these two contour regions are computed.
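Given contour point lists (for example as returned by OpenCV's findContours), the area filter and circumscribed rectangles of step 400 might be sketched as follows. The helper name `filtered_boxes` and the `min_area` parameter are illustrative; polygon areas are computed with the shoelace formula:

```python
import numpy as np

def filtered_boxes(contours, min_area):
    """Drop contour regions whose polygon area is below the threshold
    and return the circumscribed (axis-aligned bounding) rectangle of
    each survivor as (x1, y1, x2, y2).

    `contours` is an iterable of N x 2 integer point arrays.
    """
    boxes = []
    for pts in contours:
        x, y = pts[:, 0], pts[:, 1]
        # shoelace formula for the area of the contour polygon
        area = 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))
        if area < min_area:
            continue  # small region: filtered out
        boxes.append((int(x.min()), int(y.min()), int(x.max()), int(y.max())))
    return boxes
```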
Step 500, performing pixel-segmentation-based text detection on the white-background image to obtain the rectangular frames of the text on it; mapping the circumscribed rectangular frames onto the white-background image and calculating the area intersection-over-union of the text rectangular frames and the circumscribed rectangular frames; proceeding to step 600 when the intersection-over-union is smaller than a first preset threshold, otherwise to step 800;
in an alternative embodiment, the white background map is image scaled;
specifically, in order to prevent the situation that the memory of the display card is full when the individual image pixels are too large and the image is input into the network, image scaling (resize) needs to be performed on the white background image, that is, the gray image with the image width and height larger than 1080 pixels is scaled, the maximum edge is scaled to 1080 pixels, the small edges are scaled by the same times in equal proportion, and the white background image with the image width and height smaller than 1080 pixels is not processed.
The scaled white-background image is input into a pre-trained text detection model (PSENet) for pixel-segmentation-based text detection, yielding the rectangular frames of the text on the white-background image; the circumscribed rectangular frames are then mapped onto the white-background image and the area intersection-over-union of the text rectangular frames and the circumscribed rectangular frames is calculated;
the calculation formula of the area intersection ratio of the character rectangular frame and the external rectangular frame on the white background picture is as follows:
x 1 =max(x a1 ,x b1 );
y 1 =max(y a1 ,y b1 );
x 2 =max(x a2 ,x b2 );
y 2 =max(y a2 ,y b2 );
intersection=max(x 2 -x 1 +1.0,0)*max(y 2 -y 1 +1.0,0);
SA=(x a2 -x a1 +1.0)*(y a2 -y a1 +1.0);
SB=(x b2 -x b1 +1.0)*(y b2 -y b1 +1.0);
Union=SA+SB-intersection;
iou=intersection/Union;
wherein the coordinate of the character rectangular box is [ x ] a1 ,y a1 ,x a2 ,y a2 ]The coordinate of the circumscribed rectangular frame is [ x ] b1 ,y b1 ,x b2 ,y b2 ],x 1 ,y 1 Coordinates of the upper left corner of the intersection part of the character rectangular frame and the external rectangular frame are obtained; x is the number of 2 ,y 2 Coordinates of the lower right corner of the intersection part of the character rectangular frame and the external rectangular frame; the intersections is the area of the intersection part of the character rectangular frame and the circumscribed rectangular frame; SA is the area of the character rectangular frame; SB is the area of the circumscribed rectangular frame; union is the character rectangular frame andthe total area of the circumscribed rectangular frame; iou is the area intersection ratio of the character rectangular frame and the external rectangular frame.
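The formula translates directly into Python (`box_iou` is an illustrative name; note that the lower-right corner of the intersection uses min, since it is the nearer of the two boxes' lower-right corners):

```python
def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes [x1, y1, x2, y2],
    using the +1.0 pixel-inclusive convention of the formula above."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])   # upper-left of overlap
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])   # lower-right of overlap
    intersection = max(x2 - x1 + 1.0, 0) * max(y2 - y1 + 1.0, 0)
    sa = (a[2] - a[0] + 1.0) * (a[3] - a[1] + 1.0)
    sb = (b[2] - b[0] + 1.0) * (b[3] - b[1] + 1.0)
    return intersection / (sa + sb - intersection)
```

Identical boxes give an IoU of 1.0 and disjoint boxes give 0.0, which is what the first-threshold comparison below relies on.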
When the area intersection-over-union is smaller than the first preset threshold the process proceeds to step 600; otherwise the original picture is determined to be an unqualified picture (i.e. the picture uploaded by the merchant carries descriptive text).
Step 600, summing the areas of all text rectangular frames within the current circumscribed rectangular frame and calculating the ratio of the summed area to the area of the current circumscribed rectangular frame;
in an alternative embodiment, there may be two, three, … … text rectangle boxes within a small circular area; therefore, on the basis of the method, the area of the character rectangular frames of all the lines in the small circular area needs to be summed to obtain Sc, and the area ratio A of the Sc to the circumscribed rectangular frame in the small circular area is calculated;
step 700, judging whether the area ratio is larger than a second preset threshold value, if so, entering step 800;
and judging whether the area ratio A is larger than a second preset threshold value, if so, regarding that the character region detected on the white background image is the character on the merchant P and is an irregular character region, and rejecting the image to ask the merchant to upload again. Otherwise, go to step 900.
Step 800, determining that the original picture is an unqualified picture.
Step 900, calculating the width and height of the white-background image and of each text rectangular frame; and judging, from the width and height of each text rectangular frame and of the white-background image, whether the proportion of the white-background image occupied by a text rectangular frame is greater than a third preset threshold; if so, proceeding to step 800.
In an alternative embodiment, for example: consider a picture in which the commodity image occupies 2/3 of the original picture and a text area occupying the remaining 1/3 is connected to the commodity image, with the text inside a small rectangular frame. Such a picture cannot be filtered out by the preceding steps. Pixel-segmentation-based text detection applied directly to the grayscale image sees only a solid region and cannot detect the text; contour detection on the grayscale image yields a single circumscribed rectangular frame enclosing both the commodity image and the text area (since the two are connected, they are detected as one region with one circumscribed rectangular frame). Pixel-segmentation-based text detection on the white-background image then yields the text rectangular frame, but the area intersection-over-union of the text rectangular frame and the circumscribed rectangular frame is smaller than the first preset threshold, and the ratio of the total area of the text rectangular frames within the circumscribed rectangular frame to the area of the circumscribed rectangular frame is still smaller than the second preset threshold; so this picture, too, escapes the preceding steps.
Therefore, the invention adds a further judgment rule: calculate the width and height of the white background image and the width and height of each character rectangular frame; then judge, from these values, whether the proportion of any character rectangular frame within the white background image exceeds the third preset threshold. If so, the picture uploaded by the merchant is regarded as carrying merchant-added descriptive text, and it is rejected so that the merchant can upload again. Otherwise, the original picture is regarded as a transparent picture meeting the platform requirement (a qualified picture).
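The judgment rule above can be sketched in Python; this is an illustrative sketch rather than the patent's implementation, boxes are assumed to be `[x1, y1, x2, y2]`, and the `0.5` default is a hypothetical stand-in for the third preset threshold:

```python
def box_too_large(box, img_w: int, img_h: int, threshold: float = 0.5) -> bool:
    # A character rectangular frame whose width or height takes up more
    # than `threshold` of the white background image is treated as
    # merchant-added descriptive text, and the picture is rejected.
    x1, y1, x2, y2 = box
    return (x2 - x1) / img_w > threshold or (y2 - y1) / img_h > threshold
```

For example, a frame 800 pixels wide on a 1000-pixel-wide white background image exceeds a 0.5 threshold and would be rejected.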
On the one hand, the method effectively performs text detection on commodity pictures uploaded by merchants and can locate character regions of any shape (curved, slanted, horizontal, or vertical). On the other hand, it effectively distinguishes text that is part of the commodity from non-commodity text (descriptive text photoshopped in by the merchant), avoiding the false detections that arise from applying text detection directly; in this scenario it achieves 99% accuracy and a 98% recall rate on e-commerce pictures.
Fig. 2 is a schematic structural diagram of a non-commodity picture text detection system according to an embodiment of the present invention. As shown in fig. 2, the system includes:
an obtaining unit 201, configured to obtain an original picture and perform channel extraction on the original picture;
in an optional implementation, the obtaining unit 201 includes:
an acquiring subunit 2011, configured to acquire an original picture;
a determining subunit 2012, configured to determine whether the original picture is a four-channel transparent picture; and if so, carrying out channel extraction on the original picture.
Specifically, in an uncompressed 32-bit RGBA image, each pixel is composed of four parts: one Alpha channel and three colour channels (R, G, B). When the Alpha value is 0 the pixel is completely transparent, and when the Alpha value is 255 the pixel is completely opaque. When merchants upload pictures to the e-commerce platform, the platform generally requires a 32-bit, four-channel transparent picture whose background Alpha value is 0, so that the display effect for the various festival promotions can be managed uniformly. Channel judgment is therefore first performed on the picture (original picture) uploaded by the merchant: if it is not a four-channel picture, it is rejected directly and the merchant is asked to upload again; if it is a four-channel transparent picture, the channels are further separated and extracted, i.e. the R channel, G channel, B channel and Alpha channel are extracted.
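The channel judgment and extraction can be sketched as follows. This is an illustrative sketch (not part of the patent text), assuming the picture has been decoded into a NumPy array the way OpenCV's `cv2.imread(path, cv2.IMREAD_UNCHANGED)` would produce it (channels ordered B, G, R, Alpha); the helper names are hypothetical:

```python
import numpy as np

def is_four_channel(img: np.ndarray) -> bool:
    # A four-channel (B, G, R, Alpha) image has shape (H, W, 4);
    # anything else is rejected and the merchant must re-upload.
    return img.ndim == 3 and img.shape[2] == 4

def split_channels(img: np.ndarray):
    # Separate and extract the B, G, R and Alpha planes.
    return img[..., 0], img[..., 1], img[..., 2], img[..., 3]
```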
A synthesizing unit 202 for synthesizing the extracted R channel, G channel, and B channel into a white background image, and synthesizing the extracted Alpha channel into a gray scale image;
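The synthesis step can be sketched as below; this is an assumed reading of the patent (compositing the colour planes over white via the alpha plane to obtain the white background image, and using the alpha plane itself as the grayscale image), not a confirmed implementation:

```python
import numpy as np

def synthesize(img: np.ndarray):
    # img: (H, W, 4) uint8, channels ordered (B, G, R, Alpha).
    # The white background image shows the commodity on pure white;
    # the grayscale image maps transparent pixels to 0, opaque to 255.
    bgr = img[..., :3].astype(np.float32)
    alpha = img[..., 3:4].astype(np.float32) / 255.0
    white_bg = (bgr * alpha + 255.0 * (1.0 - alpha)).astype(np.uint8)
    gray = img[..., 3]
    return white_bg, gray
```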
a first judging unit 203, configured to perform pixel-segmentation-based text detection on the grayscale image and, when no text is detected, enter the contour detection unit; otherwise, enter the determining unit;
in an alternative embodiment: conventional text detection methods generally rely on hand-crafted feature extraction, such as SWT and MSER, and then recognize the detected text by template matching or model training. These traditional methods have many limitations. The present method instead adopts pixel-segmentation-based text detection, which can detect not only horizontal and vertical text but also accurately locate text of arbitrary shape and irregularly shaped text instances, greatly improving the robustness and detection performance of text detection.
The pixel segmentation based text detection of the grayscale map comprises:
zooming the gray scale image;
specifically, in order to prevent the graphics-card memory from being exhausted when an unusually large image is fed into the network, the grayscale image must be scaled (resized): a grayscale image whose width or height exceeds 1080 pixels has its longest side scaled to 1080 pixels and its shorter side scaled proportionally by the same factor; grayscale images whose width and height are both within 1080 pixels are left unchanged.
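The scaling rule above can be sketched as a small helper (illustrative only; a real pipeline would pass the resulting size to e.g. `cv2.resize`):

```python
def scaled_size(w: int, h: int, max_side: int = 1080):
    # Shrink so the longest side becomes max_side, scaling the shorter
    # side by the same factor; images already within max_side are kept.
    longest = max(w, h)
    if longest <= max_side:
        return w, h
    scale = max_side / longest
    return round(w * scale), round(h * scale)
```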
And inputting the scaled gray-scale image into a pre-trained text detection model for text detection based on pixel segmentation.
Specifically, a pre-trained text detection model is loaded and the scaled grayscale image is fed into it. The backbone network extracts features of the grayscale image and outputs feature maps at four scales; high-level and low-level features are then fused to obtain four feature layers (P2, P3, P4, P5), each with 256 channels. The four feature layers are merged (concat) to give F, which is sent through a Conv(3,3)-BN-ReLU layer, keeping the number of channels at 256. F is then sent through multiple Conv(1,1)-Up-Sigmoid layers to obtain n segmentation results S1, S2, …, Sn (in practice produced by n 1×1 convolutions).
In the above, F = C(P2, P3, P4, P5) = P2 ‖ Up×2(P3) ‖ Up×4(P4) ‖ Up×8(P5); where ‖ denotes concat (channel-wise merging), Up denotes upsampling, and Up×2, Up×4, Up×8 denote 2-fold, 4-fold and 8-fold upsampling, respectively.
The network outputs S1, S2, …, Sn; suppose n = 3, giving S1, S2, S3. S1 is the segmentation result of the smallest kernel; running connected-component (CC) analysis on S1 yields, say, four connected components C = {c1, c2, c3, c4} (at this small scale the margins between different text lines are large, so they are easy to separate). The kernel of S2 is larger than, and contains, the kernel of S1. The pixels that belong to the kernel in S2 but not to the kernel in S1 are then assigned: each connected component found in S1 is expanded pixel by pixel upward, downward, leftward and rightward in BFS fashion, gradually widening the text-line regions predicted in S1 (equivalently, each kernel pixel in S2 is assigned to one of the connected components of S1). The same expansion is repeated with S3, and the connected components remaining after processing S3 are taken as the final text-line detection result (i.e. whether text is detected).
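The fusion F = P2 ‖ Up×2(P3) ‖ Up×4(P4) ‖ Up×8(P5) can be illustrated with a small NumPy sketch. Nearest-neighbour upsampling stands in for whatever interpolation the real network uses, and the feature maps here are dummy arrays, so this only demonstrates the shapes involved:

```python
import numpy as np

def upsample(x: np.ndarray, factor: int) -> np.ndarray:
    # Nearest-neighbour upsampling of a (C, H, W) feature map.
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse(p2, p3, p4, p5):
    # F = P2 || Up x2(P3) || Up x4(P4) || Up x8(P5),
    # where '||' is channel-wise concatenation.
    return np.concatenate(
        [p2, upsample(p3, 2), upsample(p4, 4), upsample(p5, 8)], axis=0)
```

With four 256-channel layers at strides 1, 2, 4 and 8 relative to P2, the fused F has 1024 channels at P2's spatial resolution, matching the text's description.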
When no text is detected, the method enters the contour detection unit 204; otherwise, the original picture is determined to be an unqualified picture (that is, the picture uploaded by the merchant carries descriptive text).
The contour detection unit 204 is configured to perform contour detection on the grayscale to obtain a contour region on the grayscale and calculate to obtain a circumscribed rectangular frame of the contour region;
in an optional embodiment, the calculating a bounding rectangle of the outline region includes:
calculating the area of the contour region and filtering out the contour region with the area smaller than a third preset threshold value;
and calculating to obtain a circumscribed rectangle frame of the filtered outline area.
For example: some descriptive text is not written separately outside the commodity image; instead, only one line of text (3000W) is written outside the commodity image, inside a small circular area, and pixel-segmentation-based text detection applied directly to the grayscale image cannot detect it. Further detection is therefore needed: contour detection is performed on the grayscale image to obtain its contour regions; the area of each contour region is calculated and regions whose area is smaller than the third preset threshold are filtered out, leaving the contour region of the commodity image and the small circular area; circumscribed rectangular frames are then computed for these two contour regions.
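The filter-then-box step can be sketched without OpenCV; `contour_area` and `bounding_rect` below mirror what `cv2.contourArea` and `cv2.boundingRect` compute, with contours given as (N, 2) point arrays. The helper names and the plain `[x1, y1, x2, y2]` box format are illustrative assumptions:

```python
import numpy as np

def contour_area(pts: np.ndarray) -> float:
    # Shoelace formula over the contour's vertices.
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def bounding_rect(pts: np.ndarray):
    # Axis-aligned circumscribed rectangle as (x1, y1, x2, y2).
    return (int(pts[:, 0].min()), int(pts[:, 1].min()),
            int(pts[:, 0].max()), int(pts[:, 1].max()))

def filter_and_box(contours, min_area: float):
    # Drop contour regions below min_area, then box each survivor.
    return [bounding_rect(c) for c in contours if contour_area(c) >= min_area]
```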
A second judging unit 205, configured to perform pixel-segmentation-based text detection on the white background image to obtain the character rectangular frames of the text on the white background image; map the circumscribed rectangular frame onto the white background image and calculate the area intersection ratio between each character rectangular frame and the circumscribed rectangular frame on the white background image; enter the area ratio calculation unit when the area intersection ratio is smaller than a first preset threshold; otherwise, enter the determining unit;
in an alternative embodiment, the white background map is image scaled;
specifically, in order to prevent the graphics-card memory from being exhausted when an unusually large image is fed into the network, the white background image must be scaled (resized): a white background image whose width or height exceeds 1080 pixels has its longest side scaled to 1080 pixels and its shorter side scaled proportionally by the same factor; white background images whose width and height are both within 1080 pixels are not processed.
Inputting the zoomed white background picture into a pre-trained text detection model (psenet) to perform text detection based on pixel segmentation to obtain a character rectangular frame of characters on the white background picture; mapping the circumscribed rectangle frame to the white background image, and calculating the area intersection ratio of the character rectangle frame and the circumscribed rectangle frame on the white background image;
the calculation formula of the area intersection ratio of the character rectangular frame and the external rectangular frame on the white background picture is as follows:
x1 = max(x_a1, x_b1);
y1 = max(y_a1, y_b1);
x2 = min(x_a2, x_b2);
y2 = min(y_a2, y_b2);
intersection = max(x2 - x1 + 1.0, 0) * max(y2 - y1 + 1.0, 0);
SA = (x_a2 - x_a1 + 1.0) * (y_a2 - y_a1 + 1.0);
SB = (x_b2 - x_b1 + 1.0) * (y_b2 - y_b1 + 1.0);
Union = SA + SB - intersection;
iou = intersection / Union;
wherein the coordinates of the character rectangular frame are [x_a1, y_a1, x_a2, y_a2] and the coordinates of the circumscribed rectangular frame are [x_b1, y_b1, x_b2, y_b2]; (x1, y1) are the coordinates of the upper-left corner of the intersection of the two frames; (x2, y2) are the coordinates of the lower-right corner of the intersection (hence the minimum of the two frames' lower-right corners); intersection is the area of the overlapping part of the character rectangular frame and the circumscribed rectangular frame; SA is the area of the character rectangular frame; SB is the area of the circumscribed rectangular frame; Union is the total area covered by the two frames; iou is their area intersection ratio.
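The area intersection ratio can be written directly from these formulas; the sketch below keeps the pixel-inclusive +1.0 convention and takes the overlap's lower-right corner as the element-wise minimum of the two boxes' lower-right corners:

```python
def iou(box_a, box_b) -> float:
    # box_a: character rectangular frame [xa1, ya1, xa2, ya2]
    # box_b: circumscribed rectangular frame [xb1, yb1, xb2, yb2]
    xa1, ya1, xa2, ya2 = box_a
    xb1, yb1, xb2, yb2 = box_b
    x1, y1 = max(xa1, xb1), max(ya1, yb1)   # upper-left of overlap
    x2, y2 = min(xa2, xb2), min(ya2, yb2)   # lower-right of overlap
    inter = max(x2 - x1 + 1.0, 0.0) * max(y2 - y1 + 1.0, 0.0)
    sa = (xa2 - xa1 + 1.0) * (ya2 - ya1 + 1.0)
    sb = (xb2 - xb1 + 1.0) * (yb2 - yb1 + 1.0)
    return inter / (sa + sb - inter)
```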
When the area intersection ratio is smaller than a first preset threshold, the method enters an area ratio calculation unit 206, otherwise, the original picture is determined to be an unqualified picture (that is, the picture uploaded by the merchant has descriptive characters).
An area ratio calculation unit 206, configured to sum up areas of all character rectangular frames in the current circumscribed rectangular frame and calculate an area ratio of the summed area in the current circumscribed rectangular frame;
in an alternative embodiment, there may be two, three, or more rows of character rectangular frames within the small circular area; therefore, building on the above, the areas of the character rectangular frames of all rows within the small circular area are summed to obtain Sc, and the area ratio A of Sc to the circumscribed rectangular frame of the small circular area is calculated;
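Summing Sc and forming the ratio A can be sketched as below (boxes given as `[x1, y1, x2, y2]`; the +1.0 pixel-inclusive convention used for the intersection ratio is omitted here for brevity, so this is an illustrative simplification):

```python
def area_ratio(char_boxes, outer_box) -> float:
    # Sum the areas of all character rectangular frames inside the
    # circumscribed rectangle (Sc), then divide by that rectangle's area.
    def area(b):
        x1, y1, x2, y2 = b
        return (x2 - x1) * (y2 - y1)
    sc = sum(area(b) for b in char_boxes)
    return sc / area(outer_box)
```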
a third determining unit 207, configured to determine whether the area ratio is greater than a second preset threshold, and if so, enter the determining unit; when the area ratio is judged to be not larger than a second preset threshold value, entering a fourth judgment unit;
and judging whether the area ratio A is larger than the second preset threshold. If so, the character region detected on the white background image is regarded as descriptive text added by the merchant through image editing (an irregularly shaped character region), and the picture is rejected so that the merchant can upload it again. Otherwise, the process proceeds to the fourth judging unit 209.
A determining unit 208, configured to determine that the original picture is a failed picture.
A fourth judging unit 209, configured to calculate the width and height of the white background image and the width and height of each character rectangular frame; and to judge, from the width and height of each character rectangular frame and the width and height of the white background image, whether the proportion of the frame within the white background image exceeds a third preset threshold, and if so, to enter the determining unit.
In an alternative embodiment, consider a picture in which the commodity image occupies 2/3 of the original picture and the text area occupies 1/3, the text area touches the commodity image, and the text sits inside a small rectangular frame. Such a picture cannot be filtered by the preceding units. Pixel-segmentation-based text detection applied directly to the grayscale image sees only an all-white region, so no text is detected. Contour detection on the grayscale image yields a single circumscribed rectangular frame covering both the commodity image and the text area (because the two are connected, only one contour region, and hence only one circumscribed rectangular frame, is detected). Pixel-segmentation-based text detection on the white background image then yields the character rectangular frames of the text, but the area intersection ratio of each character rectangular frame with the circumscribed rectangular frame is smaller than the first preset threshold, and the ratio of the summed character-frame area inside the circumscribed rectangular frame to the area of the circumscribed rectangular frame is still smaller than the second preset threshold, so the picture slips through.
Therefore, the invention adds a further judgment rule: calculate the width and height of the white background image and the width and height of each character rectangular frame; then judge, from these values, whether the proportion of any character rectangular frame within the white background image exceeds the third preset threshold. If so, the picture uploaded by the merchant is regarded as carrying merchant-added descriptive text, and it is rejected so that the merchant can upload again. Otherwise, the original picture is regarded as a transparent picture meeting the platform requirement (a qualified picture).
On the one hand, the method effectively performs text detection on commodity pictures uploaded by merchants and can locate character regions of any shape (curved, slanted, horizontal, or vertical). On the other hand, it effectively distinguishes text that is part of the commodity from non-commodity text (descriptive text photoshopped in by the merchant), avoiding the false detections that arise from applying text detection directly; in this scenario it achieves 99% accuracy and a 98% recall rate on e-commerce pictures.
The present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the non-commodity image text detection method described above.
The storage medium stores the software, and the storage medium includes but is not limited to: optical disks, floppy disks, hard disks, erasable memory, etc.
The invention has the beneficial effects that:
the invention provides a non-commodity image text detection method, system and computer storage medium. The method performs contour detection on a grayscale image to obtain the contour regions on the grayscale image and computes their circumscribed rectangular frames; performs pixel-segmentation-based text detection on a white background image to obtain the character rectangular frames of the text on the white background image; and maps the circumscribed rectangular frames onto the white background image. By locating the text on the original picture in this way, the method effectively avoids the problem that directly applying a text detection algorithm also detects the text belonging to the commodity image itself and wrongly judges the picture as unqualified; it greatly improves the efficiency of manual review and saves manpower and financial resources.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A non-commodity picture text detection method is characterized by comprising the following steps:
step 100, acquiring an original picture and carrying out channel extraction on the original picture;
step 200, synthesizing the extracted R channel, G channel and B channel into a white background image, and synthesizing the extracted Alpha channel into a gray image;
step 300, carrying out text detection based on pixel segmentation on the gray level image, and entering step 400 when no text is detected; otherwise, go to step 800;
step 400, carrying out contour detection on the gray-scale image to obtain a contour region on the gray-scale image and calculating to obtain a circumscribed rectangular frame of the contour region;
step 500, performing text detection based on pixel segmentation on the white background image to obtain a character rectangular box of characters on the white background image; mapping the circumscribed rectangle frame to the white background image, and calculating the area intersection ratio of the character rectangle frame and the circumscribed rectangle frame on the white background image; when the area intersection ratio is smaller than a first preset threshold, entering step 600; otherwise, go to step 800;
step 600, summing the areas of all character rectangular frames in the current circumscribed rectangular frame and calculating the area ratio of the summed area in the current circumscribed rectangular frame;
step 700, judging whether the area ratio is larger than a second preset threshold value, if so, entering step 800;
step 800, determining that the original picture is a non-conforming picture.
2. The method according to claim 1, wherein when it is determined that the area ratio is not greater than a second preset threshold, step 900 is entered;
step 900, calculating the width and height of the white background image and the width and height of each character rectangular box; and judging whether the ratio of the character rectangular frame in the white background map is larger than a third preset threshold value according to the width and the height of the character rectangular frame and the width and the height of the white background map, if so, entering the step 800.
3. The method of claim 1, wherein obtaining the original picture and performing channel extraction on the original picture comprises:
acquiring an original picture;
judging whether the original picture is a four-channel transparent picture or not; and if so, carrying out channel extraction on the original picture.
4. The method of claim 1, wherein said computing a bounding rectangle for the outline region comprises:
calculating the area of the contour region and filtering out the contour region with the area smaller than a third preset threshold value;
and calculating to obtain a circumscribed rectangle frame of the filtered outline area.
5. The method according to claim 1, wherein the calculation formula of the area intersection ratio of the character rectangular frame and the circumscribed rectangular frame on the white background map is as follows:
x1 = max(x_a1, x_b1);
y1 = max(y_a1, y_b1);
x2 = min(x_a2, x_b2);
y2 = min(y_a2, y_b2);
intersection = max(x2 - x1 + 1.0, 0) * max(y2 - y1 + 1.0, 0);
SA = (x_a2 - x_a1 + 1.0) * (y_a2 - y_a1 + 1.0);
SB = (x_b2 - x_b1 + 1.0) * (y_b2 - y_b1 + 1.0);
Union = SA + SB - intersection;
iou = intersection / Union;
wherein the coordinates of the character rectangular frame are [x_a1, y_a1, x_a2, y_a2] and the coordinates of the circumscribed rectangular frame are [x_b1, y_b1, x_b2, y_b2]; (x1, y1) are the coordinates of the upper-left corner of the intersection of the two frames; (x2, y2) are the coordinates of the lower-right corner of the intersection; intersection is the area of the overlapping part of the character rectangular frame and the circumscribed rectangular frame; SA is the area of the character rectangular frame; SB is the area of the circumscribed rectangular frame; Union is the total area covered by the two frames; iou is their area intersection ratio.
6. The method of claim 1, wherein the pixel segmentation based text detection of the grayscale map comprises:
zooming the gray scale image;
and inputting the scaled gray-scale image into a pre-trained text detection model for text detection based on pixel segmentation.
7. A non-commodity picture text detection system is characterized by comprising:
the device comprises an acquisition unit, a channel extraction unit and a processing unit, wherein the acquisition unit is used for acquiring an original picture and carrying out channel extraction on the original picture;
the synthesis unit is used for synthesizing the extracted R channel, G channel and B channel into a white background image and synthesizing the extracted Alpha channel into a gray image;
a first judging unit, which is used for carrying out text detection based on pixel division on the gray level image, and entering an outline detecting unit when no text is detected; otherwise, entering a judging unit;
the contour detection unit is used for carrying out contour detection on the gray-scale image to obtain a contour region on the gray-scale image and calculating to obtain a circumscribed rectangular frame of the contour region;
the second judgment unit is used for carrying out text detection based on pixel segmentation on the white background image to obtain a character rectangular frame of characters on the white background image; mapping the circumscribed rectangle frame to the white background image, and calculating the area intersection ratio of the character rectangle frame and the circumscribed rectangle frame on the white background image; when the area intersection ratio is smaller than a first preset threshold value, entering an area ratio calculation unit; otherwise, entering a judging unit;
the area ratio calculation unit is used for summing the areas of all character rectangular frames in the current circumscribed rectangular frame and calculating the area ratio of the summed area in the current circumscribed rectangular frame;
a third judging unit, configured to judge whether the area ratio is greater than a second preset threshold, and if so, enter the determining unit;
and the judging unit is used for judging that the original picture is an unqualified picture.
8. The system according to claim 7, characterized in that when the area ratio is judged not to be larger than a second preset threshold, a fourth judgment unit is entered;
the fourth judging unit is used for calculating the width and the height of the white background image and the width and the height of each character rectangular frame; and judging whether the ratio of the character rectangular frame to the white background map is larger than a third preset threshold value or not according to the width and the height of the character rectangular frame and the width and the height of the white background map, and if so, entering a judging unit.
9. The system of claim 7, wherein the obtaining unit comprises:
an obtaining subunit, configured to obtain an original picture;
the judging subunit is used for judging whether the original picture is a four-channel transparent picture; and if so, carrying out channel extraction on the original picture.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a non-commercial text detection method according to any one of claims 1 to 6.
CN202210595852.4A 2022-05-30 2022-05-30 Non-commodity image text detection method, system and computer storage medium Pending CN114973288A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210595852.4A CN114973288A (en) 2022-05-30 2022-05-30 Non-commodity image text detection method, system and computer storage medium

Publications (1)

Publication Number Publication Date
CN114973288A true CN114973288A (en) 2022-08-30

Family

ID=82957639



Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106780529A (en) * 2016-12-14 2017-05-31 贵州电网有限责任公司电力调度控制中心 TV news mosaic detection method based on boundary rectangle
CN111027554A (en) * 2019-12-27 2020-04-17 创新奇智(重庆)科技有限公司 System and method for accurately detecting and positioning commodity price tag characters
JP2021179971A (en) * 2020-05-27 2021-11-18 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Method and apparatus for detecting small target, electronic device, computer readable storage medium, and computer program
CN114332880A (en) * 2021-12-27 2022-04-12 南京三百云信息科技有限公司 Text detection method, device, equipment and storage medium
CN114495134A (en) * 2022-02-14 2022-05-13 成都人人互娱科技有限公司 Text detection method, system and computer storage medium based on segmentation and regression


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination