CN114266881A - Pointer type instrument automatic reading method based on improved semantic segmentation network - Google Patents


Info

Publication number
CN114266881A
CN114266881A (application CN202111365897.4A)
Authority
CN
China
Prior art keywords: image, target, pointer, scale, instrument
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111365897.4A
Other languages
Chinese (zh)
Inventor
徐望明
何钦
闫富海
黄酋淦
伍世虔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Science and Engineering WUSE
Original Assignee
Wuhan University of Science and Engineering WUSE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Science and Engineering WUSE filed Critical Wuhan University of Science and Engineering WUSE
Priority to CN202111365897.4A priority Critical patent/CN114266881A/en
Publication of CN114266881A publication Critical patent/CN114266881A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to an automatic reading method for pointer instruments based on an improved semantic segmentation network. The method comprises the following steps. S1, semantic segmentation: an original image of the target instrument is acquired and input into the target neural network model to obtain a semantic prediction map, from which binary maps of the scale lines, the pointer, and the range numbers are extracted. S2, image correction: an ellipse is fitted through the center points of the scale-line contours in the scale-line binary map, a perspective transformation between the ellipse and a standard circle is established, and the instrument image and the binary maps of the scale lines, pointer, and range numbers are projected onto the plane of the standard circle to obtain corrected images. S3, reading calculation: the corrected range-number binary map and instrument image are recognized to obtain the start and end scale readings; a polar-coordinate unrolled map of the corrected scale-line and pointer binary maps is obtained; the scale lines and pointer are located in the unrolled map and the scale lines are repaired; and the relative position of the pointer and the scale lines is calculated to obtain the final reading.

Description

Pointer type instrument automatic reading method based on improved semantic segmentation network
Technical Field
The invention relates to the field of computer image processing, and in particular to an automatic reading method for pointer instruments based on an improved semantic segmentation network.
Background
Pointer instruments are widely used in industries such as electric power, metallurgy, and chemical engineering because of their simple structure, low cost, and strong resistance to interference. When such instruments are deployed in large numbers, manual reading suffers from large reading errors and low working efficiency, and may even endanger personal safety, so a method for automatically identifying and accurately reading pointer instruments based on computer vision has clear practical value.
Existing meter identification methods mainly rely on traditional pattern recognition, realizing meter reading through a series of steps such as image preprocessing, target area detection, pointer identification, and reading calculation. In the document "Reading identification system of pointer instrument in inspection robot", the pointer area is extracted by a target segmentation method, the pointer is then located by Hough transform, the image is binarized, and the meter reading is finally identified from the angle of the pointer center line relative to the initial scale of the measuring range, together with the range itself.
However, under interference from imaging viewpoint, illumination, complex backgrounds, noise, and similar factors, traditional identification methods struggle to accurately extract the useful information on the dial. Accurate semantic extraction with a trained neural network model, performed before the reading calculation, therefore becomes necessary.
For example, the Chinese patent with application No. CN202010114874.5 describes a neural-network-based method for calibrating an instrument dial: it locates the scale numbers on the dial with a neural network, fits an ellipse through the number centers, determines feature points from the ellipse parameters, and then calibrates the dial by perspective transformation. In practice, however, stains on the dial can cause the extracted scale lines to be missing or spurious; since that method has no operation for repairing scale lines, its final result is seriously affected. Moreover, the network model it uses is large and therefore difficult to deploy widely on embedded systems.
Therefore, a method that accurately obtains the semantic information of the meter pointer, scale lines, and range numbers, and repairs the scale lines during automatic reading calculation to improve reading accuracy, becomes necessary.
Disclosure of Invention
To solve the above problems, the invention provides an automatic reading identification method for pointer instruments that accurately acquires the semantic information of the pointer, scale lines, and range numbers of the instrument, and repairs the scale lines during automatic reading calculation to improve reading accuracy.
The technical scheme of the invention comprises the following steps. S1, semantic segmentation: acquire an original image of the target instrument and input it into the target neural network model to obtain a semantic prediction map, from which binary maps of the scale lines, pointer, and range numbers are extracted. S2, image correction: fit an ellipse through the center-point coordinates of the scale-line contours in the scale-line binary map, establish a perspective transformation between the ellipse and a standard circle, and project the instrument image and the binary maps of the scale lines, pointer, and range numbers onto the plane of the standard circle to obtain corrected images. S3, reading calculation: recognize the corrected range-number binary map and instrument image to obtain the start and end scale readings; obtain a polar-coordinate unrolled map of the corrected scale-line and pointer binary maps; locate each scale line and the pointer in the unrolled map and repair the scale lines; and compute the relative position of the pointer and the scale lines to obtain the final meter reading.
Further, step S2 specifically includes: S21, performing a morphological erosion operation on the scale-line binary map, filtering out external noise points using the size and position characteristics of the contours, and fitting an ellipse by least squares to the center-point coordinates of the remaining contours; S22, obtaining the center coordinates of the ellipse, the lengths of the major and minor axes, and the axis end-point coordinates, and constructing a standard circle circumscribing the ellipse with the major axis as its diameter; S23, establishing the perspective transformation between the ellipse and the standard circle, and projecting the instrument image and the binary maps of the scale lines, pointer, and range numbers onto the plane of the standard circle to obtain the corresponding corrected images.
Further, recognizing the corrected range-number binary map and the meter image in step S3 specifically means: recognizing the corrected range-number binary map and the original meter image with an OCR algorithm to obtain the start and end scale readings of the image.
Further, obtaining the polar-coordinate unrolled map of the corrected scale-line and pointer binary maps in step S3 specifically includes: denoising the corrected scale-line and pointer images and merging the denoised binary maps into a single binary map; then unrolling this binary map into a rectangle by a polar-coordinate transformation to obtain the corresponding unrolled map.
Further, locating each scale line and the pointer position in the unrolled map and repairing the scale lines in step S3 specifically includes: extracting the center lines of the pointer and scale lines in the unrolled map with an image-thinning algorithm; performing statistical analysis on the set of distances between adjacent scale lines to obtain a reference value for the true scale-line spacing; and, according to this reference value, filling in missing scale lines or deleting redundant ones in the unrolled map.
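As an illustration of the repair rule described above, the following minimal Python sketch (the function name repair_ticks and the 0.5 tolerance are assumptions for illustration, not part of the patent) takes the x-positions of detected tick center lines in the unrolled image, uses the median adjacent spacing as the reference value, deletes ticks that crowd closer than half that spacing, and interpolates ticks into oversized gaps:

```python
import statistics

def repair_ticks(xs, tol=0.5):
    """Repair a sorted list of tick x-positions in the unrolled image:
    drop spurious ticks that sit too close together, then fill gaps
    where ticks are missing, using the median spacing as reference."""
    xs = sorted(xs)
    # Reference spacing: the median of adjacent distances is robust to
    # a few missing or spurious ticks, unlike the mean.
    ref = statistics.median(b - a for a, b in zip(xs, xs[1:]))
    # 1) Delete redundant ticks closer than tol * reference spacing.
    kept = [xs[0]]
    for x in xs[1:]:
        if x - kept[-1] >= tol * ref:
            kept.append(x)
    # 2) Fill missing ticks: a gap spanning roughly k spacings (k >= 2)
    #    receives k - 1 evenly interpolated ticks.
    repaired = [kept[0]]
    for x in kept[1:]:
        gap = x - repaired[-1]
        k = round(gap / ref)            # how many spacings this gap spans
        for _ in range(1, max(k, 1)):   # insert k - 1 interpolated ticks
            repaired.append(repaired[-1] + gap / k)
        repaired.append(x)
    return repaired
```

For example, a tick list with one missing tick, [0, 10, 30, 40], is repaired to [0, 10, 20, 30, 40], while a spurious tick at 11 in [0, 10, 11, 20] is removed.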
Further, before step S1 the method includes: collecting a number of meter pictures, annotating several semantic labels in each picture and generating the corresponding Gaussian-heatmap label maps, thereby constructing a meter image data set; and training the image semantic segmentation network on this data set with the error back-propagation algorithm. The semantic labels in a meter picture at least include the instrument's zero-scale position point, its mid-range number annotation point, its pointer fixing point, and its pointer-tip position point.
Further, annotating the semantic labels in the meter picture in the preceding step specifically includes: extracting the semantic labels of the scale lines, pointer, and range numbers in the picture, with each range-number area annotated and filled as a rectangle; binarizing the semantic labels, detecting the contour of each label, and computing the corresponding minimum bounding rectangles; and establishing the perspective transformation between each bounding rectangle and a target square, through which a standard Gaussian heatmap is projected onto each scale-line, pointer, and range-number label to obtain the annotated Gaussian-heatmap label map.
Further, the neural network model in step S1 includes: a first-stage subnetwork Stage1, for obtaining a feature map at 1/2 the downsampled size of the original image; a second-stage subnetwork Stage2 and a third-stage subnetwork Stage3, for obtaining feature maps at 1/4 and 1/8 of the original size, with Stage1, Stage2, and Stage3 connected in sequence; a Gaussian-heatmap regression network, for upsampling the feature map output by Stage1, matching its channel count to Stage3, mapping it into the Gaussian-heatmap value range [0, 1], and finally fusing it by weighting with the output feature map of Stage3 to obtain the feature map to be predicted; and a semantic classification network, for performing semantic prediction and classification on this feature map to obtain a semantic prediction map containing the scale-line, pointer, and range-number semantic information.
The invention has the beneficial effects that:
1. The improved semantic segmentation network model used by the invention is small and easy to deploy on embedded terminal devices.
2. An attention module and Gaussian heatmaps are added on top of the semantic segmentation network model: the Gaussian-heatmap labels train the attention module to learn spatial weights for the pixels of the instrument image's semantic elements, and these weights are fused with the feature maps of the original segmentation network, improving the model's segmentation precision and hence the accuracy with which the improved network extracts semantic information from the original image.
3. An image-thinning algorithm extracts the center lines of the pointer and scale lines in the corrected image, which preserves accuracy while reducing computational complexity compared with traditional Hough-transform pointer localization. Statistical analysis of the adjacent scale-line distances then conveniently yields a reference value for the true spacing, so that missing scale lines in the corrected image can be filled in and redundant ones deleted, improving the method's robustness to interference and the accuracy of the meter reading calculation.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly described below. The drawings described here cover only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the main steps of the method provided in the present application;
FIG. 2 is a general flow diagram of a method provided herein;
FIG. 3 is a diagram of a Gaussian thermodynamic diagram tag generation process for a meter picture of the present application;
FIG. 4 is a structural reference diagram of a neural network model of the method provided herein;
FIG. 5 is a flowchart of step S2 of the method provided herein;
FIG. 6-a is a diagram showing the ellipse fitting result of the dial plate in the example;
FIG. 6-b is a schematic diagram of perspective transformation in an embodiment;
FIG. 7 is a diagram illustrating the results of image correction and denoising in the embodiment;
fig. 8 is a polar coordinate transformation effect diagram of the target binary image in step S24 according to the present embodiment;
fig. 9 is a diagram illustrating the positioning and repairing effect of the scale lines and the pointer in step S3 according to the embodiment;
FIG. 10 is a graph of the result of an ablation contrast experiment of an image semantic segmentation model according to an embodiment;
fig. 11 is a table showing the results of the pointer instrument reading experiment of the example.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
In the following description, the terms "first" and "second" are used for descriptive purposes only and are not intended to indicate or imply relative importance. The following description provides examples, and does not limit the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements described without departing from the scope of the disclosure. Various examples may omit, substitute, or add various procedures or components as appropriate. For example, the described methods may be performed in an order different than the order described, and various steps may be added, omitted, or combined. Furthermore, features described with respect to some examples may be combined into other examples.
The embodiment of the application provides an automatic reading method for pointer instruments that can be applied to an image processing apparatus, which may be a stand-alone device or may be integrated into an electronic terminal or another device with image processing capability. Optionally, the operating system of the device or terminal integrating the image processing apparatus may be Windows, Android, or another operating system, which is not limited here. In addition, the semantic segmentation network used by the invention is not limited to the improved network of this embodiment; other networks with the same function can also be used.
Example one
Referring to fig. 1 and 2, in the embodiment of the present application, the method includes steps S1 to S3.
Step S1, semantic segmentation: acquire the original image of the target instrument and input it into the target neural network model to obtain a semantic prediction map, from which the binary maps of the scale lines, pointer, and range numbers are extracted.
In addition, before step S1 is executed, the method further includes: collecting a number of meter pictures, annotating several semantic labels in each picture and generating the corresponding Gaussian-heatmap label maps, thereby constructing a meter image data set; and training the image semantic segmentation network on this data set with the error back-propagation algorithm. The semantic labels in a meter picture at least include the instrument's zero-scale position point, its mid-range number annotation point, its pointer fixing point, and its pointer-tip position point.
As shown in fig. 3, annotating the semantic labels in a meter picture specifically includes: extracting the semantic labels of the scale lines, pointer, and range numbers to obtain the corresponding pixel-level semantic label map, with each range-number area annotated and filled as a closed quadrilateral; binarizing the pixel-level label map, detecting the contour of each label, and computing the corresponding minimum bounding rectangles; and establishing the perspective transformation between each bounding rectangle and a target square, through which a two-dimensional isotropic standard Gaussian heatmap is projected to obtain the Gaussian-heatmap label, and hence the annotated Gaussian-heatmap label map.
In this embodiment, Labelme software is used to annotate the semantic components of the instrument image at pixel level, giving a preliminary pixel-level semantic label map in which the value of each pixel encodes its semantic category. The semantic categories used by the invention are scale line, pointer, and range number, with all remaining pixels as background, for 4 categories in total. Whereas the pixel-level label only marks which pixels belong to an instrument semantic element, the Gaussian-heatmap label also expresses the importance of each such pixel, decreasing from the center outward. The Gaussian-heatmap labels can therefore be used to train the attention module to learn spatial weights for the pixels of the instrument image's semantic elements, and these weights are fused with the feature maps of the original semantic segmentation network to improve the model's segmentation precision.
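A standard isotropic Gaussian patch of the kind projected onto each label region could be generated as in the following numpy sketch (the function names, patch size, and sigma ratio are illustrative assumptions; the patent's actual projection onto each minimum bounding rectangle via perspective transform is omitted here):

```python
import numpy as np

def gaussian_patch(size=64, sigma_ratio=6.0):
    """2-D isotropic standard Gaussian on a size x size square, peaking
    at 1 in the center -- the 'template' that is then projected onto
    each label's minimum bounding rectangle by perspective transform."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    sigma = size / sigma_ratio
    return np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))

def paste_heatmap(heatmap, patch, x0, y0):
    """Paste a patch into the label map with an elementwise maximum,
    so overlapping labels keep their peak values."""
    h, w = patch.shape
    region = heatmap[y0:y0 + h, x0:x0 + w]
    np.maximum(region, patch, out=region)  # writes through the view
    return heatmap
```

The values decrease from 1 at the center toward the edges, matching the described property that the heatmap label encodes pixel importance falling off from the center.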
In addition, in this embodiment, when the image semantic segmentation network is trained by using the error back propagation algorithm, the total loss function used is as follows:
L = L_cls + λ·L_mse
where L denotes the total model loss, L_cls the loss function for the pixel-classification result, L_mse the loss function for the Gaussian-heatmap regression result, and λ the coefficient balancing the two losses.
Pixel classification uses the cross-entropy loss to evaluate the difference between the model's prediction and the semantic label:
L_cls = -(1/N) Σ_{i=1}^{N} Σ_{c=1}^{M} y_{i,c} · log(p_{i,c})
where N is the total number of samples and M the number of categories; y_{i,c} is a 0/1 indicator that is 1 when sample i and its label belong to the same class c and 0 otherwise, and p_{i,c} is the predicted probability that sample i belongs to class c. The smaller L_cls is, the more accurate the pixel classification.
The Gaussian-heatmap regression uses the mean-squared-error loss to evaluate how well the model fits the Gaussian-heatmap labels:
L_mse = (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)²
where N is the total number of samples, y_i is the model's predicted value, and ŷ_i is the label value. The smaller L_mse is, the more accurate the Gaussian-heatmap position regression.
As shown in fig. 4, the neural network model used in this embodiment is improved on the basis of the original CGNet: an attention mechanism and Gaussian-heatmap regression are introduced so that the detail features of the instrument image's semantic elements are enhanced by feature fusion, and the classification layers are suitably deepened for the final pixel-wise semantic prediction. The CGNet model is prior art, as are its first-stage subnetwork Stage1, second-stage subnetwork Stage2, third-stage subnetwork Stage3, and input injection mechanism, so they are not explained at length in this embodiment. As shown in fig. 4, the model structure of the present application specifically includes: the first-stage subnetwork Stage1, which produces a feature map at 1/2 the downsampled size of the original image; the second- and third-stage subnetworks Stage2 and Stage3, which produce feature maps at 1/4 and 1/8 of the original size, with Stage1, Stage2, and Stage3 connected in sequence; the Gaussian-heatmap regression network, which upsamples the feature map output by Stage1, matches its channel count to Stage3, maps it into the Gaussian-heatmap value range [0, 1], and finally fuses it by weighting with the output feature map of Stage3 to obtain the feature map to be predicted; and the semantic classification network, which performs semantic prediction and classification on this feature map to obtain a semantic prediction map containing the scale-line, pointer, and range-number semantic information.
Here the first-stage subnetwork Stage1 consists of an input layer followed by three 3×3 convolutional layers, an activation layer, and a batch-normalization layer, connected in sequence; the second- and third-stage subnetworks Stage2 and Stage3 consist of 3 and 21 stacked CG Block modules respectively. The inputs of Stage1, Stage2, and Stage3 are all connected to the same input injection mechanism, which channel-concatenates the input image, downsampled to the sizes used by Stage1 and Stage2, with their output feature maps before feeding Stage2 and Stage3, improving information flow in the network. The CG Block modules and the input injection mechanism come from the original CGNet model and are therefore not explained further.
In addition, the Gaussian-heatmap regression network in this embodiment comprises an attention module (SENet), two 1×1 convolutional layers, and the Gaussian heatmap, connected in sequence; the output of the first 1×1 convolutional layer is connected to a Sigmoid activation layer, whose output is fused with the output of the third-stage subnetwork Stage3 and then fed to the semantic classification network. The semantic classification network comprises three 3×3 convolutional layers, an activation layer, a batch-normalization layer, and an output layer connected in sequence.
As shown in fig. 4, the model in this embodiment retains the 3 subnetworks and the input injection mechanism of CGNet. Stage1 is composed of 3×3 convolutional layers and obtains a feature map at 1/2 the original downsampled size; Stage2 and Stage3 stack 3 and 21 CG Blocks respectively to obtain feature maps at 1/4 and 1/8 of the original size. The input injection mechanism additionally channel-concatenates the input image, downsampled by 1/2 and 1/4, with the output feature maps of the previous stage and feeds the results to Stage2 and Stage3, improving information flow in the network. To share the feature map output by the Stage1 subnetwork and reduce the number of model parameters, the lightweight attention module SENet is introduced: the Stage1 output feature map is upsampled 2× and sent to SENet to obtain an attention feature map; a 1×1 convolution converts its channel count to match the Stage3 output; the converted attention map is then mapped into the Gaussian-heatmap value range [0, 1] by a Sigmoid activation and fused with the 8×-upsampled Stage3 output feature map by elementwise multiplication. The original CGNet predicts pixel semantic classes with a single 1×1 convolutional layer; since suitably deepening the network can improve performance, the invention predicts with 3 convolutional layers, comprising 2 depthwise-separable convolutions (DSConv) and 1 standard convolution. Note that in fig. 4 each activation layer and batch-normalization layer follows its convolutional layer; figure-size limits prevent drawing these fully, so text annotations supplement the figure.
Although Stage2 and Stage3 of the original network already adopt an attention-mechanism idea, it mainly serves to learn joint local and global context features. The SENet attention branch added after Stage1 regresses the Gaussian heatmap; the feature map learned under supervision of this regression task serves as a spatial weight that is fused, by weighting, with the output feature map of the original network, further improving semantic segmentation precision. Since Stage1 is shallow relative to the higher-level Stage2 and Stage3, the added SENet attention module is also an effective complement to the attention mechanism of the original CGNet.
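Taken in isolation, the described fusion step reduces to an elementwise weighting of the Stage3 output by the sigmoid-mapped attention map. A minimal numpy sketch (the function name is an assumption; SENet, the 1×1 channel conversion, and the upsampling are all assumed to have been applied already) is:

```python
import numpy as np

def fuse_attention(attn_feat, stage3_feat):
    """Map the attention branch into [0, 1] with a sigmoid (the
    Gaussian-heatmap value range) and use it as a spatial weight
    multiplying the upsampled Stage3 feature map elementwise."""
    weight = 1.0 / (1.0 + np.exp(-attn_feat))  # sigmoid -> [0, 1]
    return weight * stage3_feat                # Hadamard (elementwise) product
```

A zero attention map yields a uniform weight of 0.5, while strongly positive activations pass the Stage3 features through nearly unchanged, which is the spatial-weighting behaviour the branch is trained for.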
Step S2, image correction: fit an ellipse through the center-point coordinates of the scale-line contours in the scale-line binary map, construct the standard circle corresponding to the ellipse, and project the instrument image and the binary maps of the scale lines, pointer, and range numbers onto the plane of the standard circle, thereby obtaining the corrected images.
Step S2 of the present embodiment will be explained with reference to fig. 5 and 6.
As shown in fig. 5, step S2 specifically includes:
S21, perform a morphological erosion operation on the scale-line binary map and filter out external noise points using the size and position characteristics of the contours, then fit an ellipse by least squares to the center-point coordinates of the remaining contours. S22, obtain the coordinates of the ellipse center, the lengths of the major and minor axes, and the axis end-point coordinates, and construct a standard circle circumscribing the ellipse with its major axis as the diameter. S23, establish the perspective transformation between the ellipse and the standard circle, and project the instrument image and the binary maps of the scale lines, pointer, and range numbers onto the plane of the standard circle to obtain the corrected images.
As shown in fig. 6, 4 pairs of corresponding points are required to establish the perspective transformation between the ellipse and the circle. Since scaling and rotation do not affect the reading of a pointer instrument, the invention adopts an efficient way of determining the corresponding points. Fig. 6-a shows the ellipse-fitting result and fig. 6-b shows the 4 pairs of corresponding points <E_i, C_i> (i = 0, 1, 2, 3) between the ellipse and the circle: the two ends of the major axis are E0 and E1 and the two ends of the minor axis are E2 and E3; the standard circle is centered at the ellipse center O with the major-axis length as its diameter, so it meets the ellipse at points C0 and C1 (coinciding with E0 and E1 respectively), and extending the minor-axis line E2E3 outward intersects the standard circle at points C2 and C3. The perspective-transformation matrix is computed from these 4 pairs of point coordinates, and each point of the obliquely distorted image is transformed onto the plane of the standard circle to realize image correction.
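The perspective-transformation matrix determined by the 4 point pairs can be obtained by solving the standard 8-equation linear system, as in the following numpy sketch (function names are illustrative; the result is equivalent in effect to OpenCV's getPerspectiveTransform):

```python
import numpy as np

def homography_from_points(src, dst):
    """Solve the 3x3 perspective-transform matrix H (with h33 fixed to 1)
    from 4 point correspondences src[i] -> dst[i], as used to map the
    fitted ellipse onto the standard circle."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_h(H, pt):
    """Apply H to a 2-D point in homogeneous coordinates."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return x / w, y / w
```

Mapping the unit square onto itself yields the identity matrix, and mapping it onto a square twice as large sends (0.5, 0.5) to (1, 1), as expected.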
Further, step S3 of the present embodiment will be described in detail with reference to fig. 7, 8 and 9.
Step S3: and reading calculation, namely identifying the corrected range digital binary image and the original meter image to obtain the reading of the start-stop scale, obtaining a polar coordinate expansion image of the corrected scale mark and the pointer binary image, positioning each scale mark and the pointer position in the expansion image, repairing the target scale mark, and calculating the relative position of the pointer and the scale mark to obtain the final reading of the meter.
In this embodiment, in step S3, the corrected target range-number binary image and the original meter image are recognized by an OCR algorithm to obtain the start and end scale readings of the target image. An OCR algorithm typically determines the positions and shapes of the characters in an image through preprocessing such as graying, binarization and filtering, then converts the characters into text using techniques such as template matching or feature classification, and finally outputs all the text to the computer in order.
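As an illustration of the template-matching step an OCR engine may use, the following sketch scores a character template against an image by normalized cross-correlation; the best-scoring template among all digit templates would identify the character (a real system would use an optimized routine such as cv2.matchTemplate):

```python
import numpy as np

def match_template_score(image, template):
    """Best normalized cross-correlation score of `template` over every
    valid offset in `image`; 1.0 means a perfect match somewhere."""
    th, tw = template.shape
    t = template - template.mean()
    best = -1.0
    for i in range(image.shape[0] - th + 1):
        for j in range(image.shape[1] - tw + 1):
            win = image[i:i + th, j:j + tw]
            w = win - win.mean()
            denom = np.sqrt((w * w).sum() * (t * t).sum())
            if denom > 0:  # skip constant windows
                best = max(best, float((w * t).sum() / denom))
    return best
```

By the Cauchy–Schwarz inequality the score never exceeds 1, so comparing scores across digit templates is well defined.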
In addition, in this embodiment, the step S3 of acquiring the polar coordinate expansion map of the corrected scale-mark and pointer binary images specifically includes: denoising the corrected binary images of the scale marks and the pointer, and merging the denoised binary images of the scale marks and the pointer into the same binary image; and expanding the target binary image into a rectangle by polar coordinate transformation to obtain the corresponding expanded map.
As shown in fig. 7, on the corrected image, noise caused by semantic segmentation inaccuracy is further removed using constraints such as circle geometry and contour area; the upper and lower sub-figures show, respectively, the corrected binary images of the scale marks, pointer and range numbers, and the corresponding denoising results. In this embodiment, the longest segment is retained as the pointer; as long as the pointer direction is correct, the calculation of the final reading is unaffected.
In step S3 of this embodiment, the denoised binary images of the scale marks and the pointer are merged onto the same binary image, and the target binary image is expanded into a rectangle by polar coordinate transformation. A Cartesian coordinate system is established with the center of the standard circle as the origin, the radius of the standard circle is chosen as the transformation radius, and the transformation proceeds clockwise from the positive half-axis of the Y axis so that the initial scale mark of the dial lies on the left side of the transformed image. The before-and-after effect of the polar transformation is shown in fig. 8.
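The polar-coordinate expansion can be sketched with nearest-neighbor sampling as below, assuming the center and radius of the corrected standard circle are known; columns sweep clockwise from the upward (positive Y) direction as described above. cv2.warpPolar offers an optimized equivalent:

```python
import numpy as np

def polar_unwrap(img, center, radius, n_theta=360, n_r=None):
    """Unwrap a binary image around `center` into a rectangle:
    columns sweep clockwise starting from the upward direction,
    rows run outward along the radius (nearest-neighbor sampling)."""
    n_r = n_r or int(radius)
    out = np.zeros((n_r, n_theta), dtype=img.dtype)
    cx, cy = center
    for j in range(n_theta):
        theta = 2 * np.pi * j / n_theta      # clockwise angle from "up"
        dx, dy = np.sin(theta), -np.cos(theta)  # image y grows downward
        for i in range(n_r):
            r = radius * i / n_r
            x, y = int(round(cx + r * dx)), int(round(cy + r * dy))
            if 0 <= y < img.shape[0] and 0 <= x < img.shape[1]:
                out[i, j] = img[y, x]
    return out
```

After unwrapping, the circular dial becomes a strip in which scale marks are vertical segments, which simplifies locating them by column projection.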
As shown in fig. 9, in step S3 the center lines of the pointer and the scale marks in the corrected image are extracted by the Zhang–Suen thinning algorithm; a reference value for the true scale-mark spacing is obtained by collecting and statistically analyzing the distances between adjacent scale marks, so that missing scale marks in the corrected image are filled in, or redundant ones deleted, according to the target reference value. Compared with the traditional Hough-transform method for locating the pointer, this approach maintains accuracy while reducing computational complexity, and the repaired scale marks make the subsequent pointer reading calculation more accurate.
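The scale-mark repair idea — estimate a reference spacing from adjacent-gap statistics, then fill missing ticks and drop spurious ones — can be sketched as follows (the tolerance value is an illustrative assumption, not a figure from the patent):

```python
import statistics

def repair_ticks(positions, tol=0.35):
    """Given sorted x-positions of detected scale marks in the unwrapped
    image, take the median adjacent gap as the reference spacing, then
    drop ticks that are implausibly close and interpolate missing ones."""
    positions = sorted(positions)
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    ref = statistics.median(gaps)
    repaired = [positions[0]]
    for p in positions[1:]:
        gap = p - repaired[-1]
        if gap < (1 - tol) * ref:
            continue  # too close to the previous tick: treat as noise
        n_missing = round(gap / ref) - 1  # ticks lost inside this gap
        for _ in range(n_missing):
            repaired.append(repaired[-1] + gap / (n_missing + 1))
        repaired.append(p)
    return repaired
```

Using the median rather than the mean keeps the reference spacing robust against a single oversized gap caused by a missing tick.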
In this embodiment, the reading of the pointer meter can be obtained from the relative position of the pointer with respect to the scale marks, together with the measuring range and the number of scale marks. Assuming the complete scale marks are indexed 0 to N and the pointer currently points between scale mark i and scale mark i+1 (i = 0, 1, ..., N−1), the normalized meter reading is calculated as:
R_norm = (i + d/D) / N

wherein d is the distance between the pointer and scale mark i, D is the distance between scale mark i+1 and scale mark i, and R_norm is the normalized indication value. With a and b being the range numbers corresponding to the identified start and end scales, respectively, the final reading of the pointer meter is: R = a + R_norm × (b − a).
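The reading calculation can be put together in a few lines (parameter names are illustrative):

```python
def meter_reading(i, d, D, N, a, b):
    """Final pointer-meter reading from the relative pointer position.
    i: index of the scale mark just before the pointer (marks 0..N)
    d: distance from scale mark i to the pointer in the unwrapped image
    D: distance from scale mark i to scale mark i+1
    N: number of scale intervals (marks indexed 0..N)
    a, b: range numbers at the start and end scale marks"""
    r_norm = (i + d / D) / N  # normalized indication in [0, 1]
    return a + r_norm * (b - a)
```

For example, a pointer halfway between marks 2 and 3 on a 10-interval dial reads a quarter of the way across the range.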
As shown in fig. 10 and fig. 11, this embodiment further provides experimental results and analysis for the method, including: (1) an ablation comparison experiment on the image semantic segmentation model; (2) a range-number recognition experiment; (3) a pointer-instrument reading experiment. The experimental results are provided to better demonstrate the practical effect of the application; the demonstration is not limited to the above experiments, and other experimental methods may be used in practice.
In the embodiment of the application, the mean intersection over union (mIoU) and the pixel accuracy (PA) are used as quantitative indexes to measure the semantic segmentation performance on meter images.
The mIoU is defined as the per-class average of the ratio between the intersection and the union of the ground-truth and predicted pixel sets:

mIoU = (1/K) × Σ_{k=1}^{K} TP_k / (TP_k + FP_k + FN_k)

where K is the number of label categories and, for class k, TP_k is the number of pixels of that class correctly predicted, FP_k is the number of pixels of other classes predicted as that class, and FN_k is the number of pixels of that class predicted as other classes.
PA is defined as the ratio of the number of correctly classified pixels in the test set to the total number of pixels:

PA = Σ_{i=1}^{K} p_ii / Σ_{i=1}^{K} Σ_{j=1}^{K} p_ij

where K denotes the number of label categories and p_ij denotes the number of pixels of class i classified as class j.
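Both metrics follow directly from a K×K confusion matrix; a small numpy sketch:

```python
import numpy as np

def miou(conf):
    """Mean IoU from a KxK confusion matrix with conf[i, j] =
    number of pixels of true class i predicted as class j."""
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp  # predicted as class k but not class k
    fn = conf.sum(axis=1) - tp  # class k but predicted otherwise
    return float(np.mean(tp / (tp + fp + fn)))

def pixel_accuracy(conf):
    """Pixel accuracy: correctly classified pixels over all pixels."""
    conf = np.asarray(conf, dtype=float)
    return float(np.trace(conf) / conf.sum())
```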
As shown in fig. 10, the image semantic segmentation model ablation contrast experiment of the present application is as follows:
In this embodiment, an ablation comparison experiment is conducted to verify the effectiveness of the CGNet improvement strategy of the present application. Comparison model A is CGNet improved by using 3 convolutional layers as the classification layer; comparison model B inserts an attention mechanism directly on top of model A; comparison model C introduces the attention mechanism via an added branch on top of model A but does not use Gaussian heat-map regression. As the comparison of figures in fig. 10 shows, the method of the present application, which introduces the attention mechanism via an added branch and adopts the Gaussian heat-map regression idea, effectively improves segmentation accuracy and obtains the highest mIoU and PA values.
The range number identification experiment of this example is as follows:
and according to the semantic segmentation prediction result of the instrument image, intercepting a sub-image corresponding to the range digital area from the corrected original instrument image, then carrying out graying and binarization, and further segmenting the number into single digital characters and decimal points by a vertical projection segmentation method. Single character recognition employs a trained convolutional neural network model.
The test set for this experiment contains 572 groups of range numbers comprising 934 single-digit characters. The recognition accuracy of single digits (0–9) reaches 99.79%, but blurring or noise interference degrades the binarization of a few images and causes recognition errors on a small number of digits and decimal points, so the overall group-level accuracy of range-number recognition is 97.88%. However, since most range numbers on the same dial are correctly recognized, the sorted values can be further verified using their arithmetic-progression relationship, ensuring that all range numbers are correct for the final reading calculation.
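The arithmetic-progression verification can be sketched as follows, assuming the dial's range numbers share a constant step; the repair strategy shown (replace outliers by extrapolating with the most frequent adjacent gap) is an illustrative assumption, not the patent's exact procedure:

```python
from collections import Counter

def repair_range_numbers(values):
    """Verify/repair a sorted list of recognized range numbers under the
    assumption that they form an arithmetic progression. The common
    difference is taken as the most frequent adjacent gap; values that
    break the progression are replaced by extrapolation."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    step = Counter(diffs).most_common(1)[0][0]
    repaired = list(values)
    for k in range(1, len(repaired)):
        if repaired[k] - repaired[k - 1] != step:
            repaired[k] = repaired[k - 1] + step
    return repaired
```

A single misread value is outvoted by the majority of correct gaps, which matches the observation that most range numbers on the same dial are recognized correctly.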
The reading experiment of the pointer instrument of the present example is as follows:
the experiment is to verify the applicability and stability of the method in automatic reading of a pointer meter, the method is used for reading and calculating the image of a centralized test meter, the result of manual reading is used as a true value, and the error between program reading and manual reading is calculated. Wherein the evaluation index is expressed by using a relative error:
Figure BDA0003360756460000141
wherein m is the meter reading value of the method of the embodiment, v is the manual reading value, and r is the meter measuring range.
The experimental results for a portion of representative test images are presented in fig. 11, where reading values are rounded to two decimal places. The results show that the method used in this embodiment has a small relative error in practical application and can meet the precision requirements of pointer-meter reading measurement.
In addition, the inference speed of the model used by the method reaches 27 FPS, meeting most real-time application requirements, and the model size of only 2.7 MB makes it very suitable for deployment on embedded terminal devices.
The above description is only an exemplary embodiment of the present disclosure, and the scope of the present disclosure should not be limited thereby. That is, all equivalent changes and modifications made in accordance with the teachings of the present disclosure are intended to be included within the scope of the present disclosure. Embodiments of the present disclosure will be readily apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (8)

1. A pointer instrument automatic reading method based on an improved semantic segmentation network is characterized by comprising the following steps: s1: semantic segmentation, namely acquiring original images of a target instrument and inputting the original images into a target neural network model to acquire a semantic prediction graph, and respectively acquiring binary graphs of scale lines, pointers and range numbers from the target semantic prediction graph; s2: image correction, fitting an ellipse through coordinates of the center point of each scale contour in the binary image of the target scale mark, establishing a standard circle corresponding to the ellipse, and projecting the target instrument image and the binary image of the target scale mark, the pointer and the range number onto the target standard circle to obtain a corrected image; s3: and reading calculation, namely identifying the corrected range digital binary image and the instrument image to obtain start-stop scale reading, obtaining a polar coordinate expansion image of the corrected scale mark and the pointer binary image, positioning the scale mark and the pointer in the expansion image, repairing the target scale mark, calculating the relative position of the pointer and the scale mark, and obtaining final reading.
2. The method according to claim 1, wherein the step S2 specifically includes: S21, performing a morphological erosion operation on the target scale-mark binary image, filtering out external noise points by using the size and position characteristics of the contours, and fitting an ellipse by least squares to the center-point coordinates of the remaining contours; S22, acquiring the coordinates of the center point of the ellipse, the lengths of the major and minor axes, and the endpoint coordinates, and establishing a standard circle enclosing the target ellipse with the major axis of the target ellipse as the diameter; and S23, establishing a perspective transformation to project the target instrument image and the binary images of the target scale marks, pointer and range numbers onto the target standard circle to obtain corrected images.
3. The method according to claim 1, wherein the step S3 of identifying the corrected range digital binary image and the meter original image includes: and recognizing the corrected target range digital binary image and the original meter image through an OCR algorithm to obtain the start-stop scale reading of the target image.
4. The method according to claim 1, wherein the step S3 of obtaining the polar expansion map of the corrected scale-mark and pointer binary images specifically includes: denoising the corrected binary images of the scale marks and the pointer, and merging the denoised binary images of the scale marks and the pointer into the same binary image; and expanding the target binary image into a rectangle by polar coordinate transformation to obtain the corresponding expanded map.
5. The method according to claim 1, wherein the step S3 of locating the respective graduation marks and the pointer position in the expanded view and repairing the target graduation marks includes: extracting the central lines of the pointer and the scale marks in the expanded image through an image thinning algorithm; and performing statistical analysis on the distance set of adjacent scale marks to obtain a reference value of the distance between the real scale marks, and filling missing scale marks in the expanded graph or deleting redundant scale marks according to the target reference value.
6. The method according to claim 1, wherein the step S1 is preceded by: collecting pictures of a plurality of meters, labeling a plurality of semantic labels in the meter pictures to obtain corresponding Gaussian heat-map label maps, and constructing a meter image data set; and training the image semantic segmentation network on the meter image data set using the error back-propagation algorithm; the semantic labels in a meter picture at least comprise the 0-scale position point of the meter, the middle range-number marking point of the meter, the pointer fixed point in the meter, and the pointer tip position point in the meter.
7. The method according to claim 6, wherein labeling the semantic labels in the meter picture specifically comprises: extracting the semantic labels of the scale marks, pointer and range numbers in the meter picture, and marking and filling the range-number region in rectangular form; binarizing the semantic labels of the meter picture, detecting the contour of each semantic label, and obtaining the corresponding minimum enclosing rectangles respectively; and establishing perspective transformations between each enclosing rectangle and a target square, and projecting standard Gaussian heat maps onto the semantic labels of the scale marks, pointer and range numbers through the perspective transformations to obtain the labeled Gaussian heat-map label maps.
8. The method according to claim 1, wherein the neural network model in step S1 includes: a first-stage sub-network Stage1 for acquiring a feature map at 1/2 the down-sampled size of the original image; a second-stage sub-network Stage2 and a third-stage sub-network Stage3 for acquiring feature maps at 1/4 and 1/8 the down-sampled size of the original image, wherein Stage1, Stage2 and Stage3 are connected in sequence; a Gaussian heat-map regression network for up-sampling the feature map output by Stage1, matching the channel number with Stage3, mapping to the corresponding Gaussian heat-map value domain [0, 1], and finally performing weighted fusion with the output feature map of Stage3 to obtain the feature map to be predicted; and a semantic classification network for performing semantic prediction and classification on the target feature map to be predicted to acquire a semantic prediction map containing semantic information such as scale marks, pointer and range numbers.
CN202111365897.4A 2021-11-18 2021-11-18 Pointer type instrument automatic reading method based on improved semantic segmentation network Pending CN114266881A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111365897.4A CN114266881A (en) 2021-11-18 2021-11-18 Pointer type instrument automatic reading method based on improved semantic segmentation network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111365897.4A CN114266881A (en) 2021-11-18 2021-11-18 Pointer type instrument automatic reading method based on improved semantic segmentation network

Publications (1)

Publication Number Publication Date
CN114266881A true CN114266881A (en) 2022-04-01

Family

ID=80825192

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111365897.4A Pending CN114266881A (en) 2021-11-18 2021-11-18 Pointer type instrument automatic reading method based on improved semantic segmentation network

Country Status (1)

Country Link
CN (1) CN114266881A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115861745A (en) * 2022-10-25 2023-03-28 中国交通信息科技集团有限公司 Two-dimensional image feature extraction method and system for generating three-dimensional model
CN115861745B (en) * 2022-10-25 2023-06-06 中国交通信息科技集团有限公司 Two-dimensional image feature extraction method and system for generating three-dimensional model
CN115601743A (en) * 2022-12-13 2023-01-13 南京瀚元科技有限公司(Cn) Power distribution room pointer type instrument registration automatic reading identification method
CN116310285A (en) * 2023-02-16 2023-06-23 武汉科技大学 Automatic pointer instrument reading method and system based on deep learning
CN116310285B (en) * 2023-02-16 2024-02-27 科大集智技术湖北有限公司 Automatic pointer instrument reading method and system based on deep learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination