CN115482538A

CN115482538A - Material label extraction method and system based on Mask R-CNN

Info

Publication number: CN115482538A
Application number: CN202211420644.7A
Authority: CN
Inventors: 范柘
Original assignee: Wuxi Dingshi Technology Co ltd; Shanghai Aware Information Technology Co ltd
Current assignee: Wuxi Dingshi Technology Co ltd; Shanghai Aware Information Technology Co ltd
Priority date: 2022-11-15
Filing date: 2022-11-15
Publication date: 2022-12-16
Anticipated expiration: 2042-11-15
Also published as: CN115482538B

Abstract

The invention provides a material label extraction method and system based on Mask R-CNN, and belongs to the technical field of image recognition. According to the invention, a Mask R-CNN model is used for extracting a first rectangular detection frame of a material label, a plurality of pairs of first control points positioned on the boundary of a text area of the material label and a Mask of the material label, so that the correction transformation of a material label image is realized, and a more reasonable external rectangle is further determined based on a transformation matrix to cut the material label of the corrected and transformed material label image, so that a corresponding material label character can be accurately extracted.

Description

Material label extraction method and system based on Mask R-CNN

Technical Field

The invention relates to the technical field of image recognition, in particular to a material label extraction method and system based on Mask R-CNN, electronic equipment and a computer storage medium.

Background

With the development of artificial intelligence and computer vision technology, the automatic identification and sorting of materials become common technologies in logistics transportation, and the automatic identification and sorting of materials seriously depend on the character identification result of material labels. The character recognition technology for material labels mainly comprises a two-stage method, namely, a character region of interest (RoI) is obtained through a character detection network, and then the character RoI region is handed to a character recognition network for recognition. However, because many materials have various morphological characteristics such as circular cross section, the labels are irregularly arranged in an arch shape, a fan shape and the like during printing, and the detection of the label characters of the materials in the prior art has the following defects and shortcomings: 1) The material labels are arranged in an arch shape, and a rectangular RoI area generated by the existing character detection algorithm is difficult to tightly wrap the labels, so that a large amount of background interference is brought, and the identification result of a subsequent character identification algorithm is inaccurate; 2) The arched area where the label is located can be effectively segmented by the segmentation-based character detection method, but due to the lack of a corresponding correction means, the label is still irregularly arranged in the RoI area of the segmented character by only using minimum external rectangle cutting, and the character recognition network is difficult to accurately recognize.

Therefore, the material label extraction method in the prior art has the defects, and cannot meet the actual use requirement.

Disclosure of Invention

In order to at least solve the technical problems in the background art, the invention provides a material label extraction method, a material label extraction system, electronic equipment and a computer storage medium based on Mask R-CNN.

The invention provides a Mask R-CNN-based material label extraction method, which is characterized by comprising the following steps of:

s1, standardizing a first material picture to be detected, and performing character detection on the first material picture subjected to the standardized processing by adopting a Mask R-CNN model to obtain a first rectangular detection frame of a material label, a plurality of pairs of first control points positioned on the boundary of a character area of the material label and a Mask of the material label;

s2, correcting any one first rectangular detection frame in the step S1 to obtain a second rectangular detection frame;

s3, uniformly taking second control points with the same number as the first control points on the upper and lower boundaries of the second rectangular detection frame, and calculating the corrected coordinates of the second control points;

s4, calculating a transformation matrix corresponding to a transformation algorithm according to the first coordinate of the first control point and the second coordinate of the second control point;

s5, transforming the first material picture in the step S1 by using the transformation matrix obtained in the step S4 to obtain a corrected second material picture;

s6, calculating a first ordered point set of the circumscribed polygon of the Mask obtained in the step S1, and performing transformation processing on the first ordered point set by using the transformation matrix in the step S4 to obtain a second ordered point set;

s7, calculating coordinates of a circumscribed rectangle of the second ordered point set;

s8, cutting the second material picture according to the coordinates of the circumscribed rectangle to obtain a corrected material label character picture;

s9, performing character recognition on the material label character picture to obtain the material label;

and S10, circulating the step S2 to the step S9 until all detected material labels are traversed.

Further, the coordinates of the jth first control point in step S1

Relative distance by regression from the Mask R-CNN model

And the first rectangle detection frame is obtained by calculation, and the calculation formula is as follows:

wherein the first rectangular detection frame is composed ofCorner coordinates of upper left corner

Wide, wide

And height

By definition, the following are expressed:

。

further, the step S2 specifically includes:

taking any one of the first rectangular detection frames in the step S1, taking the average length of the upper and lower boundaries of the text region boundary as the width of a rectangular template, taking the average height of the text region boundary as the height of the rectangular template, and scaling the rectangular template to a preset size under the condition of keeping the aspect ratio to obtain the second rectangular detection frame.

Further, in step S3, uniformly taking the same number of second control points as the first control points on the upper and lower boundaries of the second rectangular detection frame, and calculating coordinates of the second control points after correction, includes:

if the logarithm of the first control points is k, the total number of the first control points is 2k, and the size of the second rectangular detection frame obtained in step S2 is

；

Uniformly taking k second control points from left to right on the upper boundary of the second rectangular detection frame obtained in the step S2, and also taking k second control points at corresponding positions on the lower boundary;

wherein the coordinates of the second control point on the upper boundary of the second rectangular detection frame are:

the coordinates of the second control point on the lower boundary of the second rectangular detection frame are:

。

further, the corner point at the upper left corner of the second rectangular detection frame is placed on the first material picture

Obtaining the coordinates of each second control point in the first material picture, wherein,

、

is an offset.

Further, the transformation algorithm in step S4 uses a thin-plate spline interpolation (TPS) transformation.

Further, in step S6, the Mask is subjected to connected region extraction using the findContours function of OpenCV.

The second aspect of the invention provides a material label extraction system based on Mask R-CNN, which comprises a shooting module, a processing module and a storage module; the processing module is connected with the shooting module and the storage module;

the storage module is used for storing executable computer program codes;

the shooting module is used for shooting a first material picture and transmitting the first material picture to the processing module;

the processing module is configured to execute the method according to any one of the preceding claims by calling the executable computer program code in the storage module.

A third aspect of the present invention provides an electronic device comprising: a memory storing executable program code; a processor coupled with the memory; the processor calls the executable program code stored in the memory to perform the method of any of the preceding claims.

A fourth aspect of the invention provides a computer storage medium having stored thereon a computer program which, when executed by a processor, performs the method as set out in any one of the preceding claims.

According to the scheme, aiming at the material labels which are arranged in a non-rectangular mode, a Mask R-CNN model is used for extracting a first rectangular detection frame of the material labels, a plurality of pairs of first control points located on the boundaries of character areas of the material labels and Mask masks of the material labels, accordingly, correction transformation of material label images is achieved, and more reasonable external rectangles are further determined based on a transformation matrix to cut the material labels of the corrected and transformed material label images, so that corresponding material label characters can be accurately extracted.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.

FIG. 1 is a schematic flow chart of a material label extraction method based on Mask R-CNN disclosed in the embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a Mask R-CNN model disclosed in an embodiment of the present invention;

fig. 3 is a schematic diagram of a result of performing text detection on a first material picture according to an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of a process of cutting a material label text picture according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a material label extraction system based on Mask R-CNN disclosed in the embodiment of the present invention;

fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.

The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the examples of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and "a plurality" typically includes at least two.

It should be understood that the term "and/or" as used herein is merely one type of association that describes an associated object, meaning that three relationships may exist, e.g., a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.

It should be understood that, although the terms first, second, third, etc. may be used in the embodiments of the present application to describe \8230; \8230, these \8230; should not be limited to these terms. These terms are used only to distinguish between 8230; and vice versa. For example, a first 8230; also referred to as a second 8230; without departing from the scope of embodiments of the present application, similarly, the second one (8230) \\8230; also known as the first one (8230); 8230).

The words "if", as used herein may be interpreted as "at \8230; \8230whenor" when 8230; \8230when or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.

It is also noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a good or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such good or system. Without further limitation, an element defined by the phrases "comprising one of \8230;" does not exclude the presence of additional like elements in an article or system comprising the element.

Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

Example one

Referring to fig. 1, fig. 1 is a schematic flow chart of a material label extraction method based on Mask R-CNN according to an embodiment of the present invention. As shown in fig. 1, the method for extracting a material label based on Mask R-CNN of this embodiment includes the following steps:

s2, carrying out correction processing on any one first rectangular detection frame in the step S1 to obtain a second rectangular detection frame;

s5, transforming the first material picture in the step S1 by using the transformation matrix obtained by calculation in the step S4 to obtain a corrected second material picture;

s8, cutting the second material picture according to the coordinates of the external rectangle to obtain a corrected material label character picture;

and S10, the steps S2 to S9 are circulated until all detected material labels are traversed.

In the embodiment of the invention, since the material labels in many cases are not in a standard rectangular form, for example, an arc-shaped structure as shown in fig. 2, the invention firstly uses a character detector (i.e., a trained Mask R-CNN model) to perform preliminary detection of a character region on an initial first material picture, thereby determining a first rectangular detection frame, a plurality of pairs of first control points and Mask masks of the material labels; then, correcting the first rectangular detection frames containing the material labels, specifically arranging the label characters from left to right to obtain a second rectangular detection frame; and then, taking the same number of second control points from the second rectangular detection frame, determining a transformation matrix according to the coordinates of the first control points and the second control points, and transforming the initial first material picture by using the transformation matrix to obtain a corrected second material picture. And finally, processing the originally obtained first ordered point set of the circumscribed polygon corresponding to the Mask based on the transformation matrix to obtain a second ordered point set, determining the coordinates of the circumscribed rectangle of the second ordered point set at the moment, and cutting the second material picture by using the coordinates to obtain a material label character picture, namely, extracting the material label region image, and obtaining the material label by using a proper character extraction algorithm.

It should be noted that, in the step S1, the standardizing the material picture to be detected may include: and dividing each pixel of the picture to be detected by 255 for normalization, and then subtracting the mean value and dividing by the square difference to obtain the standardized picture to be detected.

Further, the coordinates of the jth first control point in step S1

Relative distance by regression from the Mask R-CNN model

wherein the first rectangle detection frame is composed of coordinates of corner points at the upper left corner

Wide, wide

And height

By definition, the following are expressed:

。

in the embodiment of the invention, referring to fig. 3, the logarithm of the control points obtained by the Mask R-CNN model regression is 7 pairs, and the total number of the control points is 14, wherein the number of the control points is 7 for each upper text boundary and lower text boundary. The control points on the upper and lower boundaries are uniformly and orderly arranged, and the upper and lower boundary control points form a pair pairwise.

In particular, the relative distance

Can be calculated by the following formula:

in the formula (I), the compound is shown in the specification,

is composed of

Corresponding normalized weights.

As shown in fig. 3, the regression result of the control points in the Mask R-CNN model in the present invention is the normalized relative distance between each control point and the focus distance at the top left corner of the first rectangular detection frame. And, normalizing the weights

Can be preset manually, e.g.

，

。

Further, the Mask R-CNN model adopts the following loss function when training:

wherein, the first and the second end of the pipe are connected with each other,

representing regression box classification loss;

representing the regression loss of the rectangular detection box;

represents the regression loss for each of the control points (e.g., the 14 control points shown in FIG. 3);

representing the Mask segmentation loss.

In the embodiment of the present invention, it is,

a cross-entropy loss function may be used in particular,

the Smooth L1 loss function may be used specifically;

a Smooth L1 loss function may be specifically used;

a binary cross entropy loss function may be used in particular.

Further, the step S2 specifically includes:

In the embodiment of the invention, the upper edge of the text area boundary of the material label is calculated in the following way: and calculating first distance values between every two control points of the upper boundary of the character area boundary obtained in the step S1, and taking the sum of the first distance values as the length of the upper boundary. The lower bound calculation is: and calculating second distance values between every two control points of the lower boundary of the character region boundary obtained in the step S1, and taking the sum of the second distance values as the length of the lower boundary. The average height of the character area boundary is calculated in the following mode: and calculating third distance values of the plurality of pairs of control points obtained in the step S1, wherein the average value of the third distance values between each pair of control points is used as the average height.

Specifically, the scaling the rectangular template to a preset size to obtain the second rectangular detection frame includes:

set the target short side length to

The short side of the rectangular template is recorded as

Scaling the short side of the rectangular template to

；

Calculating a scaling ratio

And scaling the long side of the rectangular template according to the scaling ratio r to obtain the second rectangular detection frame.

Wherein the target short side length can be set to

。

Further, as shown in fig. 4, in step S3, uniformly taking the same number of second control points as the first control points on the upper and lower boundaries of the second rectangular detection frame, and calculating coordinates of the second control points after correction, includes:

；

wherein the coordinates of the second control point of the upper boundary of the second rectangular detection frame are:

。

in the embodiment of the invention, the second control points are calibrated in the corrected second rectangular detection frame according to the same number of corresponding first control points, and then the coordinates of the second control points on the upper/lower boundary of the second rectangular detection frame are obtained.

Further, placing the corner point at the upper left corner of the second rectangular detection frame on the first material picture

、

is an offset.

In the embodiment of the invention, since the limited control points cannot accurately represent the character outline, and therefore, part of characters can be out of the control points during conversion, the rectangular template is not placed at the boundary position, namely, the point at the upper left corner of the second rectangular detection frame is not placed at (0, 0), but is placed at the picture

Wherein the offset amount can be set to

、

。

At this time, as shown in fig. 4, the coordinates of the second control point on the upper boundary of the second rectangular detection frame are changed to:

the coordinates of the second control point of the lower boundary of the second rectangular detection frame are changed to:

。

Step 6, performing connected region extraction on the Mask by using a findContours function of OpenCV.

In the embodiment of the present invention, the first ordered point set of the circumscribed polygon is calculated by using the transformation matrix obtained in step S4, so as to obtain the transformed second ordered point set, and then the transformed second ordered point set can be calculated by using the bounding function of OpenCVThe bounding rectangle of the ordered set of points can be represented as

。

Further, in step S8, the cutting the second material picture to obtain a corrected material label text picture includes:

cutting out the second material image

Is at the upper left corner and has a width of

A height of

And obtaining the corrected material label character picture in the rectangular area.

Example two

Referring to fig. 5, fig. 5 is a schematic structural diagram of a material label extraction system based on Mask R-CNN according to an embodiment of the present invention. As shown in fig. 5, the system for extracting a material label based on Mask R-CNN of the present embodiment includes a shooting module 101, a processing module 102, and a storage module 103; the processing module 102 is connected with the shooting module 101 and the storage module 103;

the storage module 103 is used for storing executable computer program codes;

the shooting module 101 is used for shooting a first material picture and transmitting the first material picture to the processing module 102;

the processing module 102 is configured to execute the method according to the first embodiment by calling the executable computer program code in the storage module 103.

The specific functions of the Mask R-CNN-based material label extraction system in this embodiment refer to the first embodiment, and since the system in this embodiment adopts all technical solutions of the first embodiment, at least all beneficial effects brought by the technical solutions of the first embodiment are achieved, and details are not repeated herein.

EXAMPLE III

Referring to fig. 6, fig. 6 is an electronic device according to an embodiment of the present invention, including: a memory storing executable program code; a processor coupled with the memory; the processor calls the executable program code stored in the memory to execute the method according to the first embodiment.

Example four

The embodiment of the invention also discloses a computer storage medium, wherein a computer program is stored on the storage medium, and when the computer program is executed by a processor, the method in the first embodiment is executed.

Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, smalltalk, C + + or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

It should be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in more detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention.

Claims

1. A material label extraction method based on Mask R-CNN is characterized by comprising the following steps:

s1, standardizing a first material picture to be detected, and performing character detection on the first material picture after standardized processing by adopting a Mask R-CNN model to obtain a first rectangular detection frame of a material label, a plurality of pairs of first control points positioned on the boundary of a character area of the material label and a Mask of the material label;

s6, calculating a first ordered point set of a circumscribed polygon of the Mask obtained in the step S1, and performing transformation processing on the first ordered point set by using the transformation matrix in the step S4 to obtain a second ordered point set;

2. The Mask R-CNN-based material label extraction method as claimed in claim 1, wherein: coordinates of jth first control point in step S1

Relative distance regressed by the Mask R-CNN model

wherein the first rectangular detection frame is composed of the coordinates of the corner point at the upper left corner

Wide, wide

And height

By definition, we mean as follows:

。

3. the Mask R-CNN-based material label extraction method according to claim 1 or 2, characterized in that: the step S2 specifically includes:

4. The Mask R-CNN-based material label extraction method according to claim 3, characterized in that: in step S3, uniformly taking the same number of second control points as the first control points from the upper and lower boundaries of the second rectangular detection frame, and calculating coordinates of the second control points after correction, including:

；

。

5. the Mask R-CNN-based material label extraction method according to claim 4, characterized in that: placing the corner point at the upper left corner of the second rectangular detection frame on the first material picture

、

is an offset.

6. The material label extraction method based on Mask R-CNN as claimed in claim 1, 2, 4 or 5, wherein: the transformation algorithm in step S4 uses a thin-plate spline interpolation (TPS) transformation.

7. The Mask R-CNN-based material label extraction method as claimed in claim 1, 2, 4 or 5, wherein: in step S6, the Mask is subjected to connected region extraction using the findContours function of OpenCV.

8. A material label extraction system based on Mask R-CNN comprises a shooting module, a processing module and a storage module; the processing module is connected with the shooting module and the storage module;

the storage module is used for storing executable computer program codes;

the method is characterized in that: the processing module for performing the method of any one of claims 1-7 by invoking the executable computer program code in the storage module.

9. An electronic device, comprising:

a memory storing executable program code;

a processor coupled with the memory;

the method is characterized in that: the processor calls the executable program code stored in the memory to perform the method of any of claims 1-7.

10. A computer storage medium having a computer program stored thereon, characterized in that: the computer program, when executed by a processor, performs the method of any one of claims 1-7.