CN113191358B

CN113191358B - Metal part surface text detection method and system

Info

Publication number: CN113191358B
Application number: CN202110603294.7A
Authority: CN
Inventors: 谷朝臣; 官同坤; 王臻
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2021-05-31
Filing date: 2021-05-31
Publication date: 2023-01-24
Anticipated expiration: 2041-05-31
Also published as: CN113191358A

Abstract

The invention provides a method and a system for detecting text on the surface of a metal part. The method comprises: a preprocessing step of identifying a metal surface character image and performing image enhancement on it to obtain a preprocessed image; a foreground feature focusing step of highlighting, based on the preprocessed image, the image features of the text region through a deep convolutional network to obtain a saliency map; a multi-scale correction step of filtering out background text boxes at the different levels of the pyramid network using the pixel information of the saliency map, and evaluating and predicting the selected text boxes through a correction feature network to obtain corrected text boxes; and a post-processing step of calculating an instance score for each corrected text box and, in combination with the prediction score, applying a non-maximum suppression algorithm to obtain the final text box positions. The method addresses text detection against the complex backgrounds caused by the properties of metal and the industrial environment, automatically segments character images of metal parts, and outputs high-precision text bounding boxes, thereby improving detection accuracy.

Description

Metal part surface text detection method and system

Technical Field

The invention relates to the technical field of text detection, and in particular to a method and a system for detecting text on the surface of a metal part.

Background

Text is a key link in the information era. It appears in online electronic information, printed text, traffic signs, product trademarks and the like, and plays an increasingly important role, so research on optical character recognition (OCR) is important in fields such as intelligent automation, information processing and AI. OCR applications arising from enterprise resource planning scenarios have received a great deal of attention, for example gesture recognition, package print recognition and metal surface character recognition. Among these, tracking metal parts is the most challenging in many industrial scenarios.

Direct part marking is the main means of marking metal part products: the determined part information is printed directly on the product when the part is manufactured. It mainly takes three forms: laser engraving, pinhole marking and ink-jet marking. Applying OCR to the character marks on metal part surfaces allows information such as part model, production information and manufacturer to be identified quickly on the processing lines of all kinds of machines, prevents the errors that manual recognition makes through fatigue, and improves production efficiency.

Existing text detection methods mainly study the influence of the complexity of natural scenes. However, character data sets of metal part surfaces are difficult to collect, and text detection on metal parts faces problems such as strong reflection from the metal surface, large variation in metal texture, inconsistent character arrangement, poor contrast between foreground and background, and complex metal-texture backgrounds. As a result, text detection boxes are not positioned accurately enough, and character recognition for metal part tracking is difficult.

Patent document CN110222680A (application number: CN201910416098.1) discloses a text detection method for the outer packaging of municipal solid waste articles: collecting an image data set of the outer packaging of municipal waste articles, and labeling the text region of each image in the data set; generating, from the text region labels, a text score feature map and a multi-channel position feature map for each labeled image to form the training label of each image; dividing the images in the data set into a training set and a test set at a ratio of 9:1; constructing and training a fully convolutional neural network model; obtaining the predicted text regions of an image to be detected using the trained model; and applying a threshold screening stage and a non-maximum suppression stage to obtain the final text region detection result.

Disclosure of Invention

In view of the defects in the prior art, the object of the invention is to provide a method and a system for detecting text on the surface of a metal part.

The invention provides a metal part surface text detection method, which comprises the following steps:

a preprocessing step: identifying a metal surface character image, and performing image enhancement on it to obtain a preprocessed image;

a foreground feature focusing step: based on the preprocessed image, highlighting the image features of the text region through a deep convolutional network to obtain a saliency map;

a multi-scale correction step: filtering out background text boxes at the different levels of the pyramid network using the pixel information of the saliency map, and evaluating and predicting the selected text boxes through a correction feature network to obtain corrected text boxes;

a post-processing step: calculating an instance score for each corrected text box and, in combination with the prediction score, applying a non-maximum suppression algorithm to obtain the final text box positions.

Preferably, the preprocessing step comprises:

an image enhancement step: enhancing the local contrast of the metal surface character image by adaptive histogram equalization of the RGB image, and sharpening the image with a Laplacian operator so as to retain high-frequency information and highlight the details of the text characters, thereby obtaining the preprocessed image.

Preferably, the foreground feature focusing step comprises:

a semantic segmentation step: feeding the metal surface character image into a multi-level convolutional network, providing a parallel convolution structure and a channel attention mechanism to fuse the high-level features, adding an adaptive operator to highlight the difference between foreground and background in the low-level features, and combining the high-level and low-level features to obtain a saliency map;

a foreground focusing step: comparing the saliency map with a mask map carrying label information, and setting a segmentation threshold for regions whose contrast and recognizability are below a preset threshold, so as to obtain more foreground text features.

Preferably, the multi-scale correction step comprises:

a polygon selection step: binarizing the saliency map, using the binarization result as a mask to filter the text boxes generated by the convolutional networks at different levels and exclude text boxes in background regions, and screening to obtain multi-scale suggested text boxes;

a position correction step: encoding the suggested text boxes into a fixed shape using an ROI pooling model, extracting the ROI features, and feeding them into a classification and regression network to obtain the corrected text boxes.

Preferably, the post-processing step comprises:

a re-scoring step: calculating an instance score for each corrected text box from the binarized saliency map, and re-evaluating the score of each corrected text box in combination with its prediction score;

a non-maximum suppression step: filtering out duplicate text boxes using the NMS method to obtain the final text box positions.

The invention provides a metal part surface text detection system, which comprises:

a preprocessing module: identifying a metal surface character image, and performing image enhancement on it to obtain a preprocessed image;

a foreground feature focusing module: based on the preprocessed image, highlighting the image features of the text region through a deep convolutional network to obtain a saliency map;

a multi-scale correction module: filtering out background text boxes at the different levels of the pyramid network using the pixel information of the saliency map, and evaluating and predicting the selected text boxes through a correction feature network to obtain corrected text boxes;

a post-processing module: calculating an instance score for each corrected text box and, in combination with the prediction score, applying a non-maximum suppression algorithm to obtain the final text box positions.

Preferably, the preprocessing module comprises:

an image enhancement module: enhancing the local contrast of the metal surface character image by adaptive histogram equalization of the RGB image, and sharpening the image with a Laplacian operator so as to retain high-frequency information and highlight the details of the text characters, thereby obtaining the preprocessed image.

Preferably, the foreground feature focusing module comprises:

a semantic segmentation module: feeding the metal surface character image into a multi-level convolutional network, providing a parallel convolution structure and a channel attention mechanism to fuse the high-level features, adding an adaptive operator to highlight the difference between foreground and background in the low-level features, and combining the high-level and low-level features to obtain a saliency map;

a foreground focusing module: comparing the saliency map with a mask map carrying label information, and setting a segmentation threshold for regions whose contrast and recognizability are below a preset threshold, so as to obtain more foreground text features.

Preferably, the multi-scale correction module comprises:

a polygon selection module: binarizing the saliency map, using the binarization result as a mask to filter the text boxes generated by the convolutional networks at different levels and exclude text boxes in background regions, and screening to obtain multi-scale suggested text boxes;

a position correction module: encoding the suggested text boxes into a fixed shape using an ROI pooling model, extracting the ROI features, and feeding them into a classification and regression network to obtain the corrected text boxes.

Preferably, the post-processing module comprises:

a re-scoring module: calculating an instance score for each corrected text box from the binarized saliency map, and re-evaluating the score of each corrected text box in combination with its prediction score;

a non-maximum suppression module: filtering out duplicate text boxes using the NMS method to obtain the final text box positions.

Compared with the prior art, the invention has the following beneficial effects:

(1) The invention provides a metal part surface text detection method based on refined positions and classification features. To address the low contrast, strong reflection, uneven characters and complex textures of metal surfaces, it enhances the character contrast of the metal surface through adaptive histogram equalization and image sharpening, and designs a foreground-focused semantic segmentation method that highlights the text features of the character region;

(2) To address the inaccurate positioning of text on metal parts, the invention provides a fast and effective polygon selection algorithm that filters out background boxes and supplies more accurate foreground boxes to the correction network for regression, improving the positioning result;

(3) By incorporating the instance scores of the predicted boxes, the invention provides a re-scoring mechanism that not only yields accurately positioned detection boxes but also improves the comprehensive index of text detection.

Drawings

Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:

FIG. 1 is a flow chart of text detection on a surface of a metal part.

Detailed Description

The present invention will be described in detail with reference to specific examples. The following examples will help those skilled in the art to further understand the present invention, but are not intended to limit the invention in any manner. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of the present invention.

Embodiment:

The method preprocesses the RGB image using adaptive histogram equalization and image sharpening. To address the low contrast, strong reflection, character corrosion and similar characteristics of metal part surfaces, attention mechanisms are applied to the high-level features and the low-level features respectively to focus on text features. A foreground-based segmentation loss function is then used to detect more low-contrast, hard-to-distinguish text region features, further improving the segmentation result. Because the prediction boxes of the algorithm tend to be positioned inaccurately at text edges, better prediction boxes are selected according to the saliency map obtained by segmentation, and the suggested boxes then undergo a second correction. An instance score is calculated for each corrected box from its position and score, and all corrected boxes are screened by a re-scoring mechanism. Finally, the final prediction boxes are obtained by a non-maximum suppression algorithm, and a text detection evaluation system is established from indexes such as precision, recall and the comprehensive index, computed from the IoU of the prediction boxes, thereby providing a quantitative evaluation of the text detection algorithm.

The invention provides a metal part surface text detection method based on a refined position and classification characteristics, which comprises the following steps:

a preprocessing step: identifying the metal surface character image, and enhancing it through preprocessing to obtain a high-quality preprocessed image;

a foreground feature focusing step: taking the preprocessed metal surface character image as input, and highlighting the image features of the text region through a deep convolutional network to obtain a segmented saliency map;

a multi-scale correction step: filtering out background text boxes at the different levels of the pyramid network using the pixel information of the saliency map, and feeding the selected text boxes into the correction feature network for re-evaluation;

a post-processing step: extracting, from the image features of the saliency map, the instance score of each box predicted by the correction network; re-scoring the suggested boxes by combining the instance score with the classification score predicted by the correction feature network; and applying non-maximum suppression as post-processing to obtain the final corrected text boxes.

Specifically, the preprocessing step comprises:

an image enhancement step: enhancing the local contrast of the metal part image by adaptive histogram equalization of the RGB image, and sharpening the image with a Laplacian operator so as to retain high-frequency information and highlight the details of the text characters, thereby obtaining the preprocessed image.

Specifically, the foreground feature focusing step comprises:

a semantic segmentation step: feeding the metal part character image into a multi-level convolutional network, providing a parallel convolution structure and a channel attention mechanism to fuse the high-level features, adding an adaptive operator to highlight the difference between foreground and background in the low-level features, and combining the high-level and low-level features to obtain a saliency map;

a foreground focusing step: comparing the saliency map obtained in the semantic segmentation step with a mask map carrying label information, and setting segmentation thresholds for low-contrast, hard-to-identify regions in exchange for more foreground text features.

Specifically, the multi-scale correction step comprises:

a polygon selection step: using the binarization result of the semantically segmented saliency map as a mask to filter the text boxes generated by the convolutional networks at different levels, excluding text boxes in background regions, and screening to obtain multi-scale suggested text boxes;

a position correction step: encoding the suggested text boxes generated in the polygon selection step into a fixed shape using an RROI pooling model, extracting the ROI features, and feeding them into a classification and regression network to obtain the corrected text boxes.

Specifically, the post-processing step comprises:

a re-scoring step: calculating an instance score for each corrected text box from the binarized saliency map, and re-evaluating the score of each corrected text box in combination with its prediction score.

As shown in FIG. 1, the foreground-feature-focusing segmentation method, the multi-scale position correction method and the text box re-scoring mechanism of the invention process the enhanced metal part image to detect and locate text boxes, through the following steps:

Step 100: image preprocessing, comprising steps 110 and 120, in which adaptive histogram equalization and image sharpening are used to enhance the image, improve image quality and prepare for subsequent processing;

Step 110: enhancing the contrast of the metal part image. Based on adaptive histogram equalization, different enhancement schemes are applied to different parts of the image, so that contrast is enhanced while image details are preserved;
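The idea of applying a different enhancement to each part of the image can be sketched as tile-wise histogram equalization. This is a simplified illustration, not the invention's implementation: it works on a single channel stored as a list of rows, equalizes each tile independently, and omits the clip limit and inter-tile interpolation that full adaptive equalization schemes such as CLAHE use. The function names and the `tile` parameter are assumptions made for the example.

```python
def equalize_tile(tile, levels=256):
    """Histogram-equalize one tile (list of rows of ints in [0, levels))."""
    flat = [p for row in tile for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    # Cumulative distribution of intensities inside this tile.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:  # constant tile: nothing to stretch
        return [row[:] for row in tile]
    scale = (levels - 1) / (n - cdf_min)
    return [[round((cdf[p] - cdf_min) * scale) for p in row] for row in tile]


def adaptive_equalize(img, tile=8, levels=256):
    """Equalize each tile x tile block on its own (assumed simplification)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            block = [row[x0:x0 + tile] for row in img[y0:y0 + tile]]
            for dy, row in enumerate(equalize_tile(block, levels)):
                out[y0 + dy][x0:x0 + len(row)] = row
    return out


# A dark tile and a bright tile are each stretched to the full range.
enhanced = adaptive_equalize([[10, 11, 200, 201], [12, 13, 202, 203]], tile=2)
```

Because each tile gets its own mapping, a dim engraved region and a strongly reflecting region are both stretched to the full intensity range, which a single global mapping would not achieve.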

Step 120: sharpening the image with a Laplacian operator, which retains the high-frequency information of the image and highlights the detail features of the text region, wherein the Laplacian operator is as follows:
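A minimal sketch of Laplacian sharpening follows. It assumes the common 4-neighbour kernel with a positive centre; the kernel actually used by the invention is not reproduced in this text, so `KERNEL` is an assumption, as are the function name and the choice to leave border pixels unchanged.

```python
# Assumed 4-neighbour Laplacian kernel (positive centre).
KERNEL = [[0, -1, 0],
          [-1, 4, -1],
          [0, -1, 0]]


def laplacian_sharpen(img, max_val=255):
    """Add the Laplacian response to each interior pixel and clamp to [0, max_val]."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # border pixels are copied unchanged
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = sum(KERNEL[j][i] * img[y + j - 1][x + i - 1]
                      for j in range(3) for i in range(3))
            out[y][x] = min(max_val, max(0, img[y][x] + lap))
    return out


# A pixel brighter than its neighbours is pushed further from them.
sharp = laplacian_sharpen([[90, 90, 90], [90, 100, 90], [90, 90, 90]])
```

Flat regions give a zero response and stay unchanged, while edges such as engraved character strokes are amplified.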

Step 200: segmentation based on foreground feature focusing, comprising steps 210, 220 and 230. An attention mechanism is set up on the pyramid levels of a ResNet-FPN framework, multi-level features are fused to strengthen the correlation between levels, and high-level and low-level features are combined to obtain a saliency map; a foreground-focused loss function is then designed to improve the segmentation of low-contrast text images.

Step 210: the low-level network carries more detail but has poor semantic perception, whereas the high-level network carries more semantic information but fewer detail features. A parallel structure is designed for the high-level network: the multi-resolution subnets of the high-level feature maps are fused by exchanging information with one another, so that each level of the multi-resolution subnets contains higher-resolution feature information and the spatial features of the subnets are enriched. Each channel of the low-level network is weighted to highlight text features and capture more semantic information.
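Channel weighting of this kind can be illustrated with a simplified per-channel gate in the spirit of squeeze-and-excitation attention. The sketch below is an assumption-laden illustration, not the invention's CA module: `weights` and `biases` are hypothetical stand-ins for learned parameters, and a real attention block would usually mix information across channels with fully connected layers.

```python
import math


def channel_attention(feat, weights, biases):
    """Scale every channel by a sigmoid gate computed from its global mean.

    feat: list of C channels, each an H x W list of floats.
    weights, biases: one hypothetical scalar per channel (stand-ins for
    learned parameters).
    """
    out = []
    for channel, w, b in zip(feat, weights, biases):
        n = sum(len(row) for row in channel)
        mean = sum(sum(row) for row in channel) / n      # squeeze
        gate = 1.0 / (1.0 + math.exp(-(w * mean + b)))   # excitation
        out.append([[v * gate for v in row] for row in channel])
    return out


features = [[[2.0, 2.0]], [[4.0, 4.0]]]
weighted = channel_attention(features, weights=[0.0, 0.0], biases=[0.0, 0.0])
```

With zero weights and biases every gate equals 0.5, so each channel is simply halved; trained parameters would instead raise the gate for channels that respond to text.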

Specifically, let the pyramid network be divided into four layers $P_2, P_3, P_4, P_5$. Conv, DConv and UConv denote different convolution types, $c$ is the number of channels, $j$ denotes a pixel on the feature map, and CA is the channel attention mechanism. $P_2$ is selected as the low-level network and $P_3, P_4, P_5$ as the high-level network. The low-level features are processed as follows:

$$L_{map} = \mathrm{Conv}(P_2)$$

The high-level network has multi-scale subnets, and more detail features are obtained by the following processing:

The high-level feature maps and the low-level feature maps are then fused to form a supervised saliency map $S_{map}$.

Step 220: because characters on metal surfaces have low contrast, eliminating background noise during segmentation can also remove character features at the text edges, which easily leads to incorrect regression positions; the foreground therefore deserves more attention. First, the Dice loss $L_{dice}$ is introduced to address the imbalance between positive and negative samples. Saliency map segmentation is then encouraged according to two design principles for the segmentation loss: a) include as many text features as possible; b) keep the number of false detections as small as possible. Let the label be $S_{gt}$ and let $S_{Diff}$ be the difference between the label and $S_{map}$. The following loss functions are proposed:

$$L_a = (S_{Diff} \ge 1/2) \cdot (1 - F)$$

$$L_b = (-S_{Diff} \ge 1/2) \cdot F$$

where $\gamma$ denotes the balance parameter, $F$ denotes the result of binarizing $S_{map}$, and $\Delta$ denotes the upper limit of the false detection rate that is tolerated in exchange for detecting more text features in low-contrast, hard-to-distinguish regions.
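The two terms can be evaluated pixel by pixel. The sketch below assumes that the difference is taken as label minus prediction, that F is obtained by thresholding the saliency map at 0.5, and that the Dice loss has its usual form; how the balance parameter and the false-detection limit enter the final loss is not shown in this text, so they are left out. Function names are hypothetical.

```python
def dice_loss(pred, gt, eps=1e-6):
    """Standard Dice loss between two maps with values in [0, 1]."""
    inter = sum(p * g for pr, gr in zip(pred, gt) for p, g in zip(pr, gr))
    total = sum(p for row in pred for p in row) + sum(g for row in gt for g in row)
    return 1.0 - (2.0 * inter + eps) / (total + eps)


def foreground_terms(s_map, s_gt, bin_thresh=0.5):
    """Return per-pixel L_a (missed text) and L_b (false detection) maps."""
    l_a, l_b = [], []
    for map_row, gt_row in zip(s_map, s_gt):
        row_a, row_b = [], []
        for m, g in zip(map_row, gt_row):
            diff = g - m                      # assumed S_Diff = S_gt - S_map
            f = 1 if m >= bin_thresh else 0   # F: binarized S_map (assumed threshold)
            row_a.append(int(diff >= 0.5) * (1 - f))
            row_b.append(int(-diff >= 0.5) * f)
        l_a.append(row_a)
        l_b.append(row_b)
    return l_a, l_b


# Pixel 0 is missed text, pixel 1 is a false detection, pixel 2 is correct.
missed, false_pos = foreground_terms([[0.2, 0.9, 0.8]], [[1, 0, 1]])
```

Under these assumptions L_a marks text pixels that the saliency map fails to detect (principle a) and L_b marks background pixels wrongly detected as text (principle b).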

Step 230: the complex background of metal parts easily interferes with the subsequent classification and regression networks. The saliency map obtained by semantic segmentation is therefore applied to the subnet at each level, highlighting text features and suppressing background noise in the classification and regression networks, specifically as follows:

The classification network and the regression network are two subnetworks with the same structure but different parameters, each consisting of four 3 × 3 convolutional layers and one 3 × 5 convolutional layer. After being weighted by the saliency map, each layer of the pyramid network is used for the classification and regression tasks respectively. This effectively suppresses background noise while retaining as many text features as possible, and generates prediction boxes with high matching precision. Its drawback is that the positions of the prediction boxes may not cover the text effectively, which makes subsequent metal part tracking difficult; this problem is addressed by the subsequent saliency-map-based second regression.

Step 300: correction based on the multi-scale saliency map, comprising steps 310 and 320. Using the foreground-focused segmentation result, a polygon selection algorithm is applied and a refinement model recalibrates the suggested boxes.

Step 310: each layer of the pyramid network generates prediction boxes through the classification and regression subnetworks. However, the positions of these boxes are not accurate enough, especially for oblique text. Therefore, based on the saliency map obtained by segmentation, k = 500 prediction boxes that adhere to the foreground are selected and fed into the refinement feature network. The specific algorithm is as follows:
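The algorithm listing is not reproduced in this text. One way to realize the described selection, offered only as a hedged sketch, is to binarize the saliency map, keep the boxes whose area is mostly foreground, and retain the k highest-scoring ones. The sketch uses axis-aligned boxes `(x0, y0, x1, y1)` instead of the oblique polygons the method targets, and the `bin_thresh` and `min_ratio` parameters are assumptions.

```python
def foreground_ratio(box, mask):
    """Fraction of pixels inside box (x0, y0, x1, y1; x1/y1 exclusive) that are foreground."""
    x0, y0, x1, y1 = box
    area = (x1 - x0) * (y1 - y0)
    if area <= 0:
        return 0.0
    fg = sum(mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return fg / area


def select_boxes(boxes, scores, s_map, k=500, bin_thresh=0.5, min_ratio=0.5):
    """Drop background boxes using the binarized saliency map, keep the top k by score."""
    mask = [[1 if v >= bin_thresh else 0 for v in row] for row in s_map]
    kept = [(s, b) for b, s in zip(boxes, scores)
            if foreground_ratio(b, mask) >= min_ratio]
    kept.sort(key=lambda item: item[0], reverse=True)
    return [b for _, b in kept[:k]]


saliency = [[1.0, 1.0, 0.0, 0.0] for _ in range(4)]
candidates = [(0, 0, 2, 4), (2, 0, 4, 4), (1, 0, 3, 4)]
proposals = select_boxes(candidates, [0.9, 0.8, 0.7], saliency)
```

In the example the second candidate lies entirely on background and is filtered out, while the other two are passed on for refinement.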

Step 320: the suggested text boxes generated in the polygon selection step are encoded into a fixed shape using an RROI pooling model, and the ROI features are extracted from the ResNet-FPN framework and fed into a classification and regression network to obtain the positions and scores of the corrected text boxes.
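Encoding a box into a fixed shape can be illustrated with plain ROI max pooling. Note the substitution: the sketch handles axis-aligned boxes on a single-channel feature map, whereas the rotated variant (RROI) named above additionally accounts for the box angle. The function name and the 2 × 2 output size are assumptions.

```python
def roi_max_pool(feat, box, out_h=2, out_w=2):
    """Max-pool the region box = (x0, y0, x1, y1) of feat into an out_h x out_w grid."""
    x0, y0, x1, y1 = box
    h, w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        ys = y0 + (i * h) // out_h
        ye = max(ys + 1, y0 + ((i + 1) * h) // out_h)  # at least one row per cell
        row = []
        for j in range(out_w):
            xs = x0 + (j * w) // out_w
            xe = max(xs + 1, x0 + ((j + 1) * w) // out_w)
            row.append(max(feat[y][x] for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled


feature_map = [[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]]
fixed = roi_max_pool(feature_map, (0, 0, 4, 4))
```

Whatever the size of the input box, the output always has `out_h` × `out_w` entries, which is what lets boxes of different scales share one classification and regression head.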

Step 400: re-scoring mechanism, comprising steps 410 and 420. An instance score is calculated for each corrected text box from the binarized saliency map, and the score of each corrected text box is re-evaluated in combination with its prediction score.

Step 410: from the positions and scores of the text boxes generated by the correction network, together with the extracted saliency map, an instance score is calculated for each corrected text box and the score of each text box is re-evaluated. Let the classification score of a suggested text box be $S_c$ and its instance score be $S_I$. The specific process is:

$$P_V = \{\rho^1, \dots, \rho^n\}$$

where $\mu$ is set to 1/4 and $P_V$ is the set of pixels extracted from $S_{map}$ for the suggested box.
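A hedged sketch of the re-scoring idea follows. It takes the instance score to be the mean saliency over the pixel set of the box, which is an assumption. The rule that fuses the two scores and the role of the 1/4 parameter are not reproduced in this text, so `rescore` uses a geometric mean purely as a placeholder; it is not the invention's formula.

```python
def instance_score(box, s_map):
    """Mean saliency over the pixels of box = (x0, y0, x1, y1), x1/y1 exclusive."""
    x0, y0, x1, y1 = box
    pixels = [s_map[y][x] for y in range(y0, y1) for x in range(x0, x1)]  # P_V
    return sum(pixels) / len(pixels) if pixels else 0.0


def rescore(s_c, s_i):
    """Placeholder fusion of classification and instance score (geometric mean)."""
    return (s_c * s_i) ** 0.5


saliency = [[1.0, 1.0, 0.0, 0.0],
            [1.0, 1.0, 0.0, 0.0]]
on_text = instance_score((0, 0, 2, 2), saliency)
half_on_text = instance_score((1, 0, 3, 2), saliency)
```

A box that sits fully on salient text keeps its score, while a box that straddles the background is penalized even if its classification score is high.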

Step 420: the NMS (non-maximum suppression) algorithm is applied to remove duplicate text boxes, and a text detection evaluation system is established from indexes such as precision, recall and comprehensive score, which are computed from the IoU between the prediction boxes and the labels.
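Greedy non-maximum suppression with an IoU test can be sketched as follows. The sketch assumes axis-aligned boxes `(x0, y0, x1, y1)` and an IoU threshold of 0.5; neither the box representation nor the threshold is specified in this text.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.5):
    """Return indices of kept boxes, visiting boxes from highest to lowest score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


detections = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
kept = nms(detections, [0.9, 0.8, 0.7])
```

The second box overlaps the first with an IoU of about 0.68 and is suppressed; the third box does not overlap anything and survives.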

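The precision is calculated as:

$$\mathrm{Precision} = \frac{tp}{tp + fp}$$

The recall is calculated as:

$$\mathrm{Recall} = \frac{tp}{tp + fn}$$

The comprehensive score is calculated as:

$$F = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where tp, fp and fn denote the numbers of correctly detected text boxes, falsely detected text boxes and missed text boxes, respectively.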
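The three indexes can be computed directly from the counts. The sketch assumes the standard definitions of precision and recall and takes the comprehensive score to be their harmonic mean (the usual F-measure); the function name and the zero-division guards are additions for the example.

```python
def detection_metrics(tp, fp, fn):
    """Return (precision, recall, f_score) from hit, false and missed box counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# 8 correct boxes, 2 false detections, 2 missed boxes.
precision, recall, f_score = detection_metrics(tp=8, fp=2, fn=2)
```

A predicted box is typically counted as a hit when its IoU with a labeled box exceeds a chosen threshold; that threshold is not stated in this text.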

The method solves the problem of locating text information on the surface of metal parts in an industrial environment, which helps track and record metal parts on industrial production lines. By analyzing the character marks on metal part surfaces, information such as part model, size and manufacturer can be identified quickly on the processing lines of all kinds of machines, preventing the errors that manual recognition makes through fatigue and improving production efficiency.

Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus, and various modules thereof provided by the present invention in purely computer readable program code, the same procedures can be implemented entirely by logically programming method steps such that the systems, apparatus, and various modules thereof are provided in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.

The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims

1. A method for detecting text on the surface of a metal part, characterized by comprising the following steps:

a preprocessing step: identifying a metal surface character image, and performing image enhancement on the metal surface character image to obtain a preprocessed image;

a foreground feature focusing step: based on the preprocessed image, highlighting the image features of the text region through a deep convolutional network to obtain a saliency map;

a post-processing step: calculating an instance score of the corrected text box, and applying a non-maximum suppression algorithm in combination with the prediction score to obtain the final text box position;

the preprocessing step comprises:

an image enhancement step: enhancing the local contrast of the metal surface character image by adaptive histogram equalization of the RGB image, while sharpening the metal surface character image with a Laplacian operator so as to retain high-frequency information and highlight the details of the text characters, to obtain the preprocessed image;

the foreground feature focusing step comprises:

a foreground focusing step: comparing the saliency map with a mask map carrying label information, and setting a segmentation threshold for regions whose contrast and recognizability are below a preset threshold, to obtain more foreground text features;

the multi-scale correction step comprises:

a position correction step: encoding the suggested text box into a fixed shape by applying an ROI pooling model, extracting the ROI features, and feeding the ROI features into a classification and regression network to obtain the corrected text box;

the post-processing step comprises:

a re-scoring step: calculating an instance score of the corrected text box from the binarized saliency map, and re-evaluating the score of each corrected text box in combination with the prediction score of the corrected text box;

2. A metal part surface text detection system, comprising:

a foreground feature focusing module: based on the preprocessed image, highlighting the image features of the text region through a deep convolutional network to obtain a saliency map;

a post-processing module: calculating an instance score of the corrected text box, and applying a non-maximum suppression algorithm in combination with the prediction score to obtain the final text box position;

the preprocessing module comprises:

an image enhancement module: enhancing the local contrast of the metal surface character image by adaptive histogram equalization of the RGB image, while sharpening the metal surface character image with a Laplacian operator so as to retain high-frequency information and highlight the details of the text characters, to obtain the preprocessed image;

the foreground feature focusing module comprises:

a semantic segmentation module: feeding the metal surface character image into a multi-level convolutional network, providing a parallel convolution structure and a channel attention mechanism to fuse the high-level features, adding an adaptive operator to highlight the difference between foreground and background in the low-level features, and combining the high-level features and the low-level features to obtain a saliency map;

a foreground focusing module: comparing the saliency map with a mask map carrying label information, and setting a segmentation threshold for regions whose contrast and recognizability are below a preset threshold, to obtain more foreground text features;

the multi-scale correction module comprises:

a position correction module: encoding the suggested text box into a fixed shape by applying an ROI pooling model, extracting the ROI features, and feeding the ROI features into a classification and regression network to obtain the corrected text box;

the post-processing module comprises: