CN113191361A - Shape recognition method - Google Patents
Shape recognition method
- Publication number
- CN113191361A CN113191361A CN202110418108.2A CN202110418108A CN113191361A CN 113191361 A CN113191361 A CN 113191361A CN 202110418108 A CN202110418108 A CN 202110418108A CN 113191361 A CN113191361 A CN 113191361A
- Authority
- CN
- China
- Prior art keywords
- shape
- segmentation
- layer
- points
- sub
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a shape recognition method, which comprises the steps of extracting contour key points of a shape sample; defining an approximate bias curvature value at each key point and judging the concavity and convexity at the key points to obtain candidate segmentation points; adjusting a curvature screening threshold to obtain shape segmentation points; performing shape segmentation at the minimum segmentation cost to obtain a plurality of sub-shape parts; constructing a topological structure of the shape sample; obtaining a feature expression image of each sub-shape part by using a full-scale visual representation of the shape; inputting each feature expression image into a convolutional neural network for training, and learning a feature vector for each sub-shape part; constructing a feature matrix of the shape sample; constructing a graph convolutional neural network; and training the graph convolutional neural network, obtaining the feature matrix and adjacency matrix of a test sample, and inputting them into the trained graph convolution network model to realize shape classification and recognition.
Description
Technical Field
The invention relates to a shape recognition method, and belongs to the technical field of shape recognition.
Background
Contour shape recognition is an important research direction in the field of machine vision. Object recognition using object shape features is a main research subject of machine vision, and the main line of this research is to fully extract object shape features for better similarity measurement, by improving shape matching algorithms or designing effective shape descriptors. It is widely applied in engineering across many fields, such as radar, infrared imaging detection, image and video matching and retrieval, automatic robot navigation, scene semantic segmentation, texture recognition, and data mining.
In general, the expression and retrieval of contour shapes is based on hand-designed shape descriptors that extract target contour features, such as Shape Contexts, Shape Vocabulary, and Bag of Contour Fragments. However, the shape information extracted by manual descriptors is usually incomplete, and such descriptors cannot be guaranteed to be robust to local changes, occlusion, overall deformation and other variations of the target shape. Moreover, designing too many descriptors leads to redundant feature extraction and higher computational complexity, so recognition accuracy and efficiency are low. In recent years, convolutional neural networks have been applied to shape recognition tasks, as they have achieved strong performance in image recognition tasks. However, because a contour shape lacks surface texture, color and similar information, directly applying a convolutional neural network yields a poor recognition effect.
In view of the above problems of shape recognition algorithms, how to provide a target recognition method capable of accurately classifying target contour shapes is a problem to be solved by those skilled in the art.
Disclosure of Invention
The invention is provided to solve the above problems in the prior art; the technical solution is as follows.
a method of shape recognition, the method comprising the steps of:
firstly, extracting outline key points of a shape sample;
step two, defining approximate bias curvature values at each key point and judging the curve concavity and convexity at the key points to obtain candidate shape segmentation points;
step three, adjusting a curvature screening threshold value to obtain shape division points;
fourthly, segmenting the shape based on the principle that the segmentation line segments are positioned in the shape and do not intersect with each other, and segmenting at the minimum segmentation cost to obtain a plurality of sub-shape parts;
step five, constructing a topological structure of the shape sample;
step six, obtaining a feature expression image of a corresponding sub-shape part by using a full-scale visual representation method of the shape;
inputting each feature expression image into a convolutional neural network for training, and learning to obtain a feature vector of each sub-shape part;
step eight, constructing a feature matrix of the shape sample;
constructing a graph convolution neural network;
step ten, training a graph convolution neural network, carrying out shape segmentation on the test sample, obtaining the feature vector of each sub-shape part, calculating the feature matrix and the adjacency matrix of the test sample, and inputting the feature matrix and the adjacency matrix into the trained graph convolution network model to realize shape classification and recognition.
Preferably, in the first step, the method for extracting the key points of the contour includes:
the contour of each shape sample is composed of a series of sampling points, and for any shape sample S, sampling n points on the contour results in:
S = {(p_x(i), p_y(i)) | i ∈ [1, n]},
wherein p_x(i), p_y(i) are the horizontal and vertical coordinates of the contour sampling point p(i) in the two-dimensional plane, and n is the contour length, i.e., the number of contour sampling points;
evolving the contour curve of the shape sample to extract key points, and deleting the point with the minimum contribution to target recognition in each evolution process, wherein the contribution of each point p(i) is defined as:
Con(i) = H_1(i) · H(i, i-1) · H(i, i+1) / (H(i, i-1) + H(i, i+1)),
wherein H(i, i-1) is the length of the curve between points p(i) and p(i-1), H(i, i+1) is the length of the curve between points p(i) and p(i+1), H_1(i) is the angle between segment p(i)p(i-1) and segment p(i)p(i+1), and the lengths H are normalized by the contour perimeter; a larger value of Con(i) indicates a larger contribution of the point p(i) to the shape feature;
the method introduces an adaptive termination function F (t) based on a region to overcome the problem of excessive or insufficient extraction of the key points of the contour:
wherein S0Is the area of the original shape, SiFor the shape area after i evolutions, n0The total number of points on the outline of the original shape; when the end function value F (t) exceeds the set threshold, the extraction of the contour key points is ended and n is obtained*And (4) contour key points.
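As a rough illustration, the contour-evolution key point extraction described above can be sketched in NumPy. The contribution measure follows the definition of Con(i) (angle at p(i) weighted by the normalized adjacent curve lengths); a fixed point-count stopping rule stands in for the adaptive termination function F(t), and all function names are illustrative:

```python
import numpy as np

def contribution(pts, i):
    """Con(i): relevance of contour point i, as in discrete curve evolution.

    Turn angle at p(i) weighted by the perimeter-normalized lengths of the
    two adjacent segments; collinear points get contribution 0.
    """
    n = len(pts)
    a = pts[i] - pts[(i - 1) % n]
    b = pts[(i + 1) % n] - pts[i]
    perim = np.sum(np.linalg.norm(np.roll(pts, -1, axis=0) - pts, axis=1))
    la = np.linalg.norm(a) / perim          # normalized by contour perimeter
    lb = np.linalg.norm(b) / perim
    cos_ang = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    ang = np.arccos(np.clip(cos_ang, -1.0, 1.0))
    return ang * la * lb / (la + lb + 1e-12)

def evolve(pts, keep):
    """Repeatedly delete the minimum-contribution point until `keep` remain."""
    pts = np.asarray(pts, dtype=float)
    while len(pts) > keep:
        cons = [contribution(pts, i) for i in range(len(pts))]
        pts = np.delete(pts, int(np.argmin(cons)), axis=0)
    return pts
```

A collinear midpoint on a polygon edge is deleted first, since its turn angle (and hence contribution) is zero.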
Further, in the second step, defining an approximate bias curvature value at each key point and judging the concavity and convexity of the curve at the key point to obtain candidate segmentation points comprises:
in order to calculate an approximate bias curvature value for any key point p(i) in the shape sample S, take the adjacent contour points p(i-ε) and p(i+ε) before and after p(i), where ε is an empirical value; because
cos H_ε(i) ∝ cur(p(i)),
where H_ε(i) is the angle between segment p(i)p(i-ε) and segment p(i)p(i+ε) and cur(p(i)) is the curvature at point p(i), the approximate bias curvature value cur~(p(i)) at point p(i) is defined as:
cur~(p(i)) = cos H_ε(i) + 1,
wherein cos H_ε(i) ranges from -1 to 1, so cur~(p(i)) ranges from 0 to 2;
according to a shape segmentation method conforming to visual naturalness, shape segmentation points are all located at concave parts of the contour curve; therefore, when screening candidate segmentation points for shape segmentation, a method for judging the concavity and convexity of the curve at a key point p(i) is defined:
for the binarized shape image, the pixel values inside the contour of shape sample S are all 255 and the pixel values outside the contour are all 0; sample the line segment p(i-ε)p(i+ε) at equal intervals to obtain R discrete points; if the pixel values at the R discrete points are all 255, the segment p(i-ε)p(i+ε) lies entirely inside the shape contour, indicating that the curve at p(i) is convex; if the pixel values at the R discrete points are all 0, the segment p(i-ε)p(i+ε) lies entirely outside the shape contour, indicating that the curve at p(i) is concave; record the key points p(i) at which the curve is concave as the candidate segmentation points P(j).
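The approximate bias curvature and the chord-based concavity test might be implemented as follows; `mask` is assumed to be the binarized shape image with interior pixels 255, and the names are illustrative:

```python
import numpy as np

def approx_bias_curvature(pts, i, eps):
    """cur~(p(i)) = cos H_eps(i) + 1, valued in [0, 2]."""
    n = len(pts)
    a = pts[(i - eps) % n] - pts[i]
    b = pts[(i + eps) % n] - pts[i]
    cos_h = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return cos_h + 1.0

def is_concave(mask, p_prev, p_next, R=10):
    """Sample the chord p(i-eps)p(i+eps) at R points; the curve at p(i) is
    deemed concave when the whole chord lies outside the shape (mask == 0)."""
    ts = np.linspace(0.0, 1.0, R)
    samples = [(1 - t) * np.asarray(p_prev, float) + t * np.asarray(p_next, float)
               for t in ts]
    vals = [mask[int(round(y)), int(round(x))] for x, y in samples]
    return all(v == 0 for v in vals)
```

On a straight contour run the angle H_ε(i) is π, so cur~ is 0; sharper concavities give larger values.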
Further, in the third step, the step of adjusting the curvature screening threshold Th and obtaining the shape segmentation point is as follows:
(1) for all candidate segmentation points P(j) obtained in step two, take the average approximate bias curvature value of the candidate segmentation points P(j) as the initial threshold Th_0:
Th_0 = (1/J) Σ_{j=1..J} cur~(P(j)),
wherein J is the total number of candidate segmentation points;
(2) for the threshold Th_τ at the τ-th adjustment, the candidate segmentation points can be divided into two classes according to the relation between each approximate bias curvature value and Th_τ: candidate segmentation points whose approximate bias curvature value is greater than Th_τ, and candidate segmentation points whose approximate bias curvature value is less than or equal to Th_τ; calculate and record the segmentation division degree D_τ under the current threshold:
D_τ = min_j d+_τ(j) - max_j d-_τ(j),
wherein d+_τ(j) and d-_τ(j) respectively denote the positive and negative curvature deviations of the candidate segmentation points P(j) under threshold Th_τ, min_j d+_τ(j) is the minimum of the positive curvature deviations of all candidate segmentation points, and max_j d-_τ(j) is the maximum of the negative curvature deviations of all candidate segmentation points;
judge whether there exists a candidate segmentation point whose approximate bias curvature value is greater than the threshold Th_τ; if no such point exists, stop adjusting and go to step (4); if such a candidate segmentation point exists, go to step (3) to continue adjusting the threshold;
(3) continue adjusting the threshold; the new threshold Th_{τ+1} is the minimum of the positive curvature deviations of all candidate segmentation points in the previous threshold adjustment, expressed by the formula:
Th_{τ+1} = min_j d+_τ(j);
according to the threshold Th_{τ+1}, calculate the positive and negative curvature deviations d+_{τ+1}(j), d-_{τ+1}(j) of each candidate segmentation point under the (τ+1)-th adjustment together with the division degree D_{τ+1}, and record them; judge whether there exists a candidate segmentation point whose approximate bias curvature value is greater than the threshold Th_{τ+1}; if no such point exists, stop adjusting and go to step (4); if such a candidate segmentation point exists, let τ = τ+1, repeat the current step and continue adjusting the threshold;
(4) the multiple threshold adjustments yield multiple division degrees; the threshold corresponding to the maximum division degree is the final curvature screening threshold Th, and the points whose approximate bias curvature values are smaller than the threshold Th are the final shape segmentation points.
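One possible reading of the threshold adjustment loop in steps (1) to (4), treating the "positive/negative curvature deviations" as the curvature values above/below the current threshold, is the sketch below; the function name is illustrative:

```python
import numpy as np

def screen_threshold(curvs):
    """Iteratively raise the curvature screening threshold.

    curvs: approximate bias curvature values of the candidate segmentation
    points. Returns the threshold whose division degree D is largest.
    """
    th = float(np.mean(curvs))               # Th_0: mean curvature value
    history = []                             # (D_tau, Th_tau) pairs
    while True:
        above = [c for c in curvs if c > th]
        below = [c for c in curvs if c <= th]
        if not above:                        # no point exceeds Th: stop
            break
        d = min(above) - max(below) if below else 0.0   # division degree
        history.append((d, th))
        th = min(above)                      # Th_{tau+1}: smallest value above
    if not history:
        return th
    return max(history)[1]                   # threshold with maximal D
```

Points whose curvature value falls below the returned threshold would then be kept as the final shape segmentation points.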
Further, in the fourth step, the specific method of segmenting the shape based on the principle that the segmentation line segments are located in the shape and do not intersect with each other to obtain a plurality of sub-shape portions by segmentation with the minimum segmentation cost is as follows:
(1) for any two shape segmentation points P(e_1), P(e_2), sample the segmentation line segment P(e_1)P(e_2) at equal intervals to obtain C discrete points; if any of the C discrete points has a pixel value of 0, part of the line segment P(e_1)P(e_2) lies outside the shape contour, and it is not selected as a segmentation line segment;
(2) for any two shape segmentation points P(e_3), P(e_4), if there is an existing segmentation line segment P(e_5)P(e_6) such that the line segment P(e_3)P(e_4) intersects the existing segment P(e_5)P(e_6), the line segment P(e_3)P(e_4) is not selected as a segmentation line segment;
(3) the set of segmentation line segments satisfying the above two principles is further screened, and three measurement indexes for evaluating the quality of a segmentation line segment are defined to realize segmentation at the minimum segmentation cost, wherein D*(u,v), L*(u,v), S*(u,v) are the three normalized segmentation measurement indexes of segmentation length, segmentation arc length and segmentation residual area, and u and v are the serial numbers of any two shape segmentation points;
for any shape segmentation line segment P(u)P(v), the three segmentation evaluation indexes are calculated as follows:
D*(u,v) = |P(u)P(v)| / D_max,
wherein D_max is the length of the longest segment among all segmentation line segments; D*(u,v) should range between 0 and 1, and a smaller value indicates a more significant segmentation effect;
L*(u,v) is computed from the length of the contour curve between the two points P(u) and P(v); L*(u,v) should range between 0 and 1, and a smaller value indicates a more significant segmentation effect;
S*(u,v) is computed from S_d, the shape area divided off by the line segment P(u)P(v), i.e., the area of the enclosed region formed by the line segment P(u)P(v) and the contour curve; S*(u,v) should range between 0 and 1, and a smaller value indicates a more significant segmentation effect;
calculate the segmentation cost Cost of the segmentation line segment P(u)P(v) from the above indexes:
Cost = αD*(u,v) + βL*(u,v) + γS*(u,v),
wherein α, β and γ are the weights of the measurement indexes;
calculate the segmentation cost of each segmentation line segment in the screened set; sort all the calculated costs from small to large, and finally, according to the number N of sub-shape parts set for the category of shape sample S, select the N-1 segmentation line segments with the smallest cost, realizing optimal segmentation and obtaining N sub-shape parts; the number N of divided sub-shape parts depends on the category to which the current shape sample S belongs, and the corresponding number of sub-shape parts is manually set for shapes of different categories.
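The cost-based selection of segmentation line segments could be sketched as follows; equal weights α = β = γ are assumed for illustration, and the function names are ours:

```python
import numpy as np

def seg_cost(d_star, l_star, s_star, alpha=1/3, beta=1/3, gamma=1/3):
    """Cost = alpha*D* + beta*L* + gamma*S* for one candidate segment.
    The patent leaves alpha, beta, gamma as tunable weights."""
    return alpha * d_star + beta * l_star + gamma * s_star

def select_segments(candidates, costs, n_parts):
    """Pick the N-1 cheapest segments.  `candidates` is a list of (u, v)
    index pairs that already satisfy the inside-the-shape and
    non-crossing principles; `costs` holds their segmentation costs."""
    order = np.argsort(costs)                 # ascending cost
    return [candidates[j] for j in order[: n_parts - 1]]
```

Cutting with the N-1 selected segments then yields the N sub-shape parts.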
Further, in step five, the specific method of constructing the topological structure of the shape sample is as follows: for the N sub-shape parts obtained by dividing any shape sample S, record the central shape part as the start vertex v_1, and sort the other adjacent shape parts in clockwise order as vertices {v_o | o ∈ [2, N]}; record the edge connecting v_1 to each remaining vertex v_o as (v_1, v_o), thereby forming a shape directed graph satisfying the topological order:
G_1 = (V_1, E_1),
wherein V_1 = {v_o | o ∈ [1, N]} and E_1 = {(v_1, v_o) | o ∈ [2, N]};
after all the training shape samples are optimally segmented, record the maximum number of sub-shape parts obtained by segmenting any training shape sample as N_max; for any shape sample S, its N_max × N_max adjacency matrix A is calculated by setting the entry corresponding to each directed edge (v_1, v_o) to 1 and all other entries to 0, with the rows and columns beyond the N actual vertices zero-padded.
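Under one reading of the construction above, the padded adjacency matrix of the star-shaped directed graph can be built as below; the zero-padding convention and function name are our assumptions:

```python
import numpy as np

def star_adjacency(n_parts, n_max):
    """Adjacency matrix of G1 = (V1, E1): directed edges from the central
    start vertex v1 to each of the other sub-shape vertices, zero-padded
    to the maximum part count n_max over the training set."""
    A = np.zeros((n_max, n_max), dtype=int)
    A[0, 1:n_parts] = 1          # edges (v1, vo) for o = 2..N
    return A
```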
further, in the sixth step, a specific method for obtaining the color feature expression image corresponding to the sub-shape part by using a full-scale visual representation method of the shape is as follows:
for a sub-shape part S_1 of any shape sample S:
S_1 = {(p¹_x(i), p¹_y(i)) | i ∈ [1, n_1]},
wherein p¹_x(i), p¹_y(i) are the abscissa and ordinate of the contour sampling point p_1(i) of the sub-shape part in the two-dimensional plane, and n_1 is the contour length, i.e., the number of contour sampling points;
the profile of the sub-shape portion S1 is first described using a feature function M composed of three shape descriptors:
M={sk(i),lk(i),ck(i)|k∈[1,m],i∈[1,n1]},
wherein s isk,lk,ckThree invariant parameters of normalized area s, arc length l and gravity center distance c in a scale k, wherein k is a scale label, and m is a total scale degree; these three shape invariant descriptors are defined separately:
taking the contour sampling point p_1(i) as the center, draw a preset circle C_1(i) with the initial radius r_1; the preset circle is the initial semi-global scale for calculating the parameters of the corresponding contour point; after obtaining the preset circle C_1(i) as above, the three shape descriptors at scale k = 1 are calculated as follows:
when calculating the s_1(i) descriptor, denote by Z_1(i) the region inside the preset circle C_1(i) that has a direct connection relationship with the target contour point p_1(i), where B(Z_1(i), z) is an indicator function marking whether a pixel z belongs to Z_1(i); the ratio of the area of Z_1(i) to the area of the preset circle C_1(i) is taken as the area parameter s_1(i) of the descriptor of the target contour point:
s_1(i) = Area(Z_1(i)) / Area(C_1(i)),
and s_1(i) should range between 0 and 1;
when calculating the c_1(i) descriptor, first calculate the center of gravity of the region having a direct connection relationship with the target contour point p_1(i), specifically the average of the coordinate values of all pixel points in the region, the result being the coordinate of the region's center of gravity, denoted w_1(i); then calculate the distance d_1(i) between the target contour point p_1(i) and the center of gravity w_1(i):
d_1(i) = ||p_1(i) - w_1(i)||;
finally, take the ratio of d_1(i) to the radius of the preset circle C_1(i) of the target contour point p_1(i) as the center-of-gravity distance parameter c_1(i) of the descriptor:
c_1(i) = d_1(i) / r_1,
wherein c_1(i) should range between 0 and 1;
when calculating the l_1(i) descriptor, record the length of the arc segment inside the preset circle C_1(i) that has a direct connection relationship with the target contour point p_1(i), and take the ratio of this arc length to the circumference of the preset circle C_1(i) as the arc length parameter l_1(i) of the descriptor of the target contour point, wherein l_1(i) should range between 0 and 1;
calculating according to the above steps gives the feature function M_1 of the sub-shape part S_1 of shape sample S at the semi-global scale with scale label k = 1 and initial radius r_1:
M_1 = {s_1(i), l_1(i), c_1(i) | i ∈ [1, n_1]};
because a digital image takes one pixel as its minimum unit, a single pixel is selected as the continuous scale-change interval over the full-scale space; that is, for the k-th scale label, the radius r_k of circle C_k is set to
r_k = r_1 - (k - 1),
in pixels; when the initial scale is k = 1 the radius is r_1, after which the radius r_k is reduced uniformly, one pixel at a time, m - 1 times until the minimum scale k = m; following the method of calculating the feature function M_1 at scale k = 1, calculate the feature functions at the other scales, finally obtaining the feature function of the sub-shape part S_1 of shape sample S over the full scale:
M = {s_k(i), l_k(i), c_k(i) | k ∈ [1, m], i ∈ [1, n_1]};
store the feature functions at all scales into matrices S_M, L_M, C_M respectively: S_M stores s_k(i), its k-th row and i-th column storing the area parameter s_k(i) of point p_1(i) at the k-th scale; L_M stores l_k(i), its k-th row and i-th column storing the arc length parameter l_k(i) of point p_1(i) at the k-th scale; C_M stores c_k(i), its k-th row and i-th column storing the center-of-gravity distance parameter c_k(i) of point p_1(i) at the k-th scale; S_M, L_M, C_M finally serve as the grayscale-map representation of the three shape features of the sub-shape part S_1 of shape sample S in the full-scale space:
GM_1 = {S_M, L_M, C_M},
wherein S_M, L_M, C_M are all matrices of size m × n_1, each representing a grayscale image; then, for the sub-shape part S_1, the three grayscale images are used as the three RGB channels to obtain a color image as the feature expression image T_1 of the sub-shape part S_1.
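Assembling the three grayscale matrices into the color feature expression image might look like this; the descriptor values are assumed already normalized to [0, 1], and the function name is illustrative:

```python
import numpy as np

def feature_expression_image(S_M, L_M, C_M):
    """Stack the three m x n1 grayscale feature matrices (area, arc length,
    center-of-gravity distance) as the R, G, B channels of the color
    feature expression image."""
    img = np.stack([S_M, L_M, C_M], axis=-1)      # shape (m, n1, 3)
    return (img * 255).astype(np.uint8)
```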
Further, in step seven, the feature expression image samples of the sub-shape parts of all training shape samples are input into a convolutional neural network, and the convolutional neural network model is trained; different sub-shape parts of each type of shape have different class labels; after the convolutional neural network is trained to convergence, for any shape sample S, the feature expression images {T_num | num ∈ [1, N]} corresponding to the N sub-shape parts formed by dividing the shape sample S are respectively input into the trained convolutional neural network, and the feature vector of the corresponding sub-shape part is output from the second fully-connected layer of the network, wherein Vec is the number of neurons in the second fully-connected layer;
the structure of the convolutional neural network comprises an input layer, a pre-training layer and fully-connected layers; the pre-training layer consists of the first 4 modules of the VGG16 network model, the parameters obtained by training these 4 modules on the ImageNet data set are used as initialization parameters, and three fully-connected layers follow the pre-training layer;
the 1st module in the pre-training layer comprises 2 convolutional layers and 1 max-pooling layer, with 64 convolution kernels of size 3 × 3 per convolutional layer and a pooling size of 2 × 2; the 2nd module comprises 2 convolutional layers and 1 max-pooling layer, with 128 convolution kernels of size 3 × 3 and a pooling size of 2 × 2; the 3rd module comprises 3 convolutional layers and 1 max-pooling layer, with 256 convolution kernels of size 3 × 3 and a pooling size of 2 × 2; the 4th module comprises 3 convolutional layers and 1 max-pooling layer, with 512 convolution kernels of size 3 × 3 and a pooling size of 2 × 2; the calculation formula of each convolutional layer is:
C_O = φ_relu(W_C · C_I + θ_C),
wherein θ_C is the bias vector of the convolutional layer, W_C is the weight of the convolutional layer, C_I is the input of the convolutional layer, and C_O is the output of the convolutional layer;
the fully-connected module comprises 3 fully-connected layers: the 1st fully-connected layer comprises 512 nodes, the 2nd comprises Vec nodes, and the 3rd comprises N_T nodes, where N_T is the sum of the numbers of split sub-shape parts over all shape classes; the calculation formula of the first 2 fully-connected layers is:
F_O = φ_tanh(W_F · F_I + θ_F),
wherein φ_tanh is the tanh activation function, θ_F is the bias vector of the fully-connected layer, W_F is the weight of the fully-connected layer, F_I is the input of the fully-connected layer, and F_O is the output of the fully-connected layer;
the last fully-connected layer is the output layer, whose output is calculated as:
Y_O = φ_softmax(W_Y · Y_I + θ_Y),
wherein φ_softmax is the softmax activation function, θ_Y is the bias vector of the output layer, each neuron of the output layer represents a corresponding sub-shape part class, W_Y is the weight of the output layer, Y_I is the input of the output layer, and Y_O is the output of the output layer.
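The fully-connected and output-layer formulas above can be written directly in NumPy; this is a minimal sketch of the layer equations, not the trained VGG16-based network itself:

```python
import numpy as np

def fc_tanh(W, x, theta):
    """F_O = tanh(W_F . F_I + theta_F): the first two fully-connected layers."""
    return np.tanh(W @ x + theta)

def out_softmax(W, x, theta):
    """Y_O = softmax(W_Y . Y_I + theta_Y): the classification output layer."""
    z = W @ x + theta
    z = z - z.max()                  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()
```

In the full network, the second `fc_tanh` layer's Vec-dimensional output is what serves as the sub-shape part's feature vector.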
Further, the specific method for constructing the feature matrix of the shape sample in the step eight is as follows:
for any shape sample S, the N sub-shape parts formed by dividing sample S are expressed by the corresponding feature matrix F, calculated as:
F_a = f_a for a ∈ [1, N], and F_a = 0_Vec for a ∈ (N, N_max],
wherein F_a is the a-th row vector of the matrix F, f_a is the feature vector of the a-th sub-shape part output in step seven, and 0_Vec is a zero vector of dimension Vec.
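A sketch of building the zero-padded feature matrix from the per-part CNN feature vectors; `n_max` stands for the maximum part count over the training set, and the names are ours:

```python
import numpy as np

def feature_matrix(part_vecs, n_max):
    """Stack the feature vectors of the N sub-shape parts row-wise and
    zero-pad the remaining n_max - N rows (one Vec-dimensional row each)."""
    vec = len(part_vecs[0])
    F = np.zeros((n_max, vec))
    F[: len(part_vecs)] = np.asarray(part_vecs, dtype=float)
    return F
```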
Further, in step nine, the structure of the graph convolutional neural network is constructed, comprising a preprocessing input layer, a hidden layer and a classification output layer; the preprocessing input layer applies normalization preprocessing to the adjacency matrix A, specifically:
Â = D̃^(-1/2) (A + I_N) D̃^(-1/2),
wherein I_N is an identity matrix, D̃ is the degree matrix of A + I_N, and Â is the adjacency matrix after normalization preprocessing;
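The preprocessing step is the standard symmetric normalization used by graph convolutional networks; a minimal NumPy sketch of Â = D̃^(-1/2)(A + I)D̃^(-1/2):

```python
import numpy as np

def normalize_adjacency(A):
    """Add self-loops via the identity, then apply symmetric degree
    normalization: A_hat = D~^(-1/2) (A + I) D~^(-1/2)."""
    A_tilde = A + np.eye(A.shape[0])
    deg = A_tilde.sum(axis=1)                   # degree matrix diagonal
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    return d_inv_sqrt @ A_tilde @ d_inv_sqrt
```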
The hidden layer comprises 2 graph convolution layers, and the calculation formula of each graph convolution layer is as follows:
wherein the content of the first and second substances,is the weight of the graph convolution layer; hIIs the input of the graph convolution layer, the input of the 1 st convolution layer being the feature matrix of the shape sampleHOIs the output of the graph convolution layer;
the calculation formula of the classification output layer is:
G_O = φ_softmax(Â · G_I · G_W),
wherein φ_softmax is the softmax activation function, G_I is the input of the output layer, i.e., the output of the second graph convolution layer, G_W is the weight of the output layer, and G_O is the output of the output layer; each neuron of the output layer represents a corresponding shape class.
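Putting the graph convolution layers and the softmax output together, a toy forward pass might read as follows; the ReLU activations, the weight shapes, and the flattening before the output layer are our assumptions:

```python
import numpy as np

def gcn_forward(A_hat, F, W1, W2, W_out):
    """Two graph convolution layers followed by a softmax classifier.

    A_hat: normalized adjacency matrix; F: node feature matrix;
    W1, W2: graph-convolution weights; W_out: output-layer weight.
    """
    relu = lambda x: np.maximum(x, 0.0)
    H1 = relu(A_hat @ F @ W1)            # 1st graph convolution layer
    H2 = relu(A_hat @ H1 @ W2)           # 2nd graph convolution layer
    z = H2.reshape(-1) @ W_out           # flatten node features for output
    e = np.exp(z - z.max())              # softmax over shape classes
    return e / e.sum()
```

The index of the maximum entry of the returned vector would be taken as the predicted shape class.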
Further, the specific method of realizing contour shape classification and recognition in step ten is as follows: train the graph convolutional neural network model until convergence; for any test shape sample, first extract the shape contour key points, calculate the curvature values of the key points and judge their concavity and convexity to obtain candidate segmentation points, then adjust the curvature screening threshold to obtain the shape segmentation points; obtain the set of segmentation line segments according to principles (1) and (2) of step four, and calculate the segmentation cost of each segmentation line segment in the set; if the number of segmentation line segments is less than N - 1, all the segmentation line segments are used to segment the shape; otherwise, the N - 1 segmentation line segments with the minimum segmentation cost are used to segment the shape; calculate the color feature expression image of each sub-shape part and input it into the trained convolutional neural network, taking the output of the second fully-connected layer of the convolutional neural network as the feature vector of the sub-shape part; construct the shape directed graph of the test shape sample, calculate its adjacency matrix and feature matrix, input them into the trained graph convolutional neural network model, and judge the shape class corresponding to the maximum value in the output vector as the shape class of the test sample, realizing shape classification and recognition.
The invention provides a new shape recognition method and designs a new shape classification scheme using a graph convolutional neural network. The proposed topological-graph expression of shape features is a directed graph structure constructed based on shape segmentation, which not only distinguishes shape hierarchies but also makes full use of the stable topological feature relationships among the hierarchical parts of a shape in place of geometric position relationships. Compared with the methods in the background art, which only calculate and compare corresponding salient-point features for matching, the method is more robust to interferences such as shape articulation, partial occlusion and rigid-body transformation. The full-scale visual representation of the shape can comprehensively express all the information of each sub-shape part, after which the features of each part in the full-scale space are extracted by the successive convolution calculations of the neural network. Compared with directly applying a convolutional neural network, the designed graph convolutional neural network greatly reduces the training parameters and has higher computational efficiency.
Drawings
FIG. 1 is a flow chart of the operation of a shape recognition method of the present invention.
FIG. 2 is a partial sample schematic of a target shape in a shape sample set.
Fig. 3 is a schematic diagram of the segmentation of a shape sample.
Fig. 4 is a schematic diagram of a full scale space.
Fig. 5 is a schematic diagram of the target shape after being cut by a preset scale.
Fig. 6 is a schematic diagram of the target shape after being segmented by the preset scale.
FIG. 7 is a schematic diagram of a feature function of a sub-shape portion of a target shape at a single scale.
FIG. 8 is a schematic diagram of a feature matrix of a sub-shape portion of an object shape in full scale space.
Fig. 9 is a schematic diagram of three gray-scale images calculated from a sub-shape portion of the target shape and a synthesized color image.
FIG. 10 is a diagram of a convolutional neural network structure for training each of the sub-shape portion feature representation images.
Fig. 11 is a characteristic configuration diagram of each sub-shape portion of the target shape.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, a shape recognition method includes the following steps:
1. The shape sample set totals 1400 samples over 70 shape classes, with 20 shape samples per class. Fig. 2 shows partial samples of target shapes in the shape sample set. Half of the samples in each shape class are randomly assigned to a training set and the remaining half to a test set, giving 700 training samples and 700 test samples. Each shape sample is sampled to obtain 100 contour points; taking a shape sample S as an example:
S={px(i),py(i)|i∈[1,100]},
where px(i), py(i) are the abscissa and ordinate of the contour sampling point p(i) in the two-dimensional plane.
The contour curve of the shape sample is evolved to extract key points, deleting at each evolution step the point that contributes least to target identification. The contribution of each point p(i) is defined (following the standard discrete-curve-evolution relevance measure) as:
Con(i) = H1(i) · H(i, i-1) · H(i, i+1) / (H(i, i-1) + H(i, i+1)),
where H(i, i-1) is the length of the curve between points p(i) and p(i-1), H(i, i+1) is the length of the curve between points p(i) and p(i+1), and H1(i) is the angle between segment p(i)p(i-1) and segment p(i)p(i+1); the lengths H are normalized by the contour perimeter. A larger value of Con(i) indicates a larger contribution of the point p(i) to the shape feature.
To overcome the problem of over- or under-extraction of contour key points, the method introduces a region-based adaptive termination function F(t):
where S0 is the area of the original shape, Si is the area after i evolutions, and n0 is the total number of points on the contour of the original shape. When the termination function value F(t) exceeds the set threshold, the extraction of contour key points ends. For the shape sample S shown in Fig. 3, 24 contour key points are obtained.
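The key-point extraction step above can be sketched in Python. The exact contribution formula appears only as a figure in the source, so the standard discrete-curve-evolution relevance measure is assumed, with chord lengths standing in for the curve lengths H, and a fixed point count standing in for the area-based termination function F(t):

```python
import math

def contribution(prev, cur, nxt, perimeter):
    # chord lengths normalized by the contour perimeter (stand-in for the curve lengths H)
    h1 = math.dist(cur, prev) / perimeter
    h2 = math.dist(cur, nxt) / perimeter
    a = (prev[0] - cur[0], prev[1] - cur[1])
    b = (nxt[0] - cur[0], nxt[1] - cur[1])
    dot = a[0] * b[0] + a[1] * b[1]
    norm = (math.hypot(*a) * math.hypot(*b)) or 1e-12
    # turning angle at p(i): 0 for collinear points, larger for sharper corners
    turn = math.pi - math.acos(max(-1.0, min(1.0, dot / norm)))
    return turn * h1 * h2 / (h1 + h2 + 1e-12)

def evolve(points, keep):
    # repeatedly delete the point with the smallest contribution Con(i)
    pts = list(points)
    while len(pts) > keep:
        per = sum(math.dist(pts[i], pts[(i + 1) % len(pts)]) for i in range(len(pts)))
        scores = [contribution(pts[i - 1], pts[i], pts[(i + 1) % len(pts)], per)
                  for i in range(len(pts))]
        del pts[scores.index(min(scores))]
    return pts
```

Collinear contour points contribute nothing and are removed first, so a square sampled with edge midpoints evolves back to its four corners.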
2. The approximate bias curvature value and the curve concavity/convexity at each key point of the shape sample are calculated. Taking the shape sample S as an example, the approximate bias curvature value cur~(p(i)) of a contour key point p(i) is calculated as:
cur~(p(i))=cosHε(i)+1,
where Hε(i) is the angle between line segment p(i)p(i-ε) and line segment p(i)p(i+ε), and ε = 3.
The concavity/convexity of the curve at a contour key point p(i) is judged as follows:
Sample the line segment p(i-ε)p(i+ε) at equal intervals to obtain R discrete points. If the pixel values of all R discrete points are 255, the segment p(i-ε)p(i+ε) lies entirely inside the shape contour, i.e. the curve at p(i) is convex; if the pixel values of all R discrete points are 0, the segment lies entirely outside the shape contour, i.e. the curve at p(i) is concave. The key points p(i) at which the curve is concave are taken as candidate segmentation points p(j). For the shape sample S, 11 candidate segmentation points are extracted in total.
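A minimal sketch of the curvature and concavity tests, assuming the binarized image is exposed as a callable `inside(x, y)` (truthy where the pixel value is 255); the name and signature are illustrative:

```python
import math

def approx_curvature(pts, i, eps=3):
    # cur~(p(i)) = cos(H_eps(i)) + 1, where H_eps(i) is the angle between
    # segments p(i)p(i-eps) and p(i)p(i+eps); the result lies in [0, 2]
    n = len(pts)
    p, a, b = pts[i], pts[(i - eps) % n], pts[(i + eps) % n]
    u = (a[0] - p[0], a[1] - p[1])
    v = (b[0] - p[0], b[1] - p[1])
    cosang = (u[0] * v[0] + u[1] * v[1]) / ((math.hypot(*u) * math.hypot(*v)) or 1e-12)
    return max(-1.0, min(1.0, cosang)) + 1.0

def is_concave(pts, i, inside, eps=3, R=8):
    # sample the chord p(i-eps)p(i+eps) at R points; the curve at p(i) is
    # concave when every sample falls outside the shape
    n = len(pts)
    a, b = pts[(i - eps) % n], pts[(i + eps) % n]
    samples = ((a[0] + (b[0] - a[0]) * t / (R + 1), a[1] + (b[1] - a[1]) * t / (R + 1))
               for t in range(1, R + 1))
    return all(not inside(x, y) for x, y in samples)
```

For a point on a convex contour such as a circle, the chord lies inside the shape, so the test correctly reports it as non-concave.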
3. The curvature screening threshold Th is adjusted to obtain the shape segmentation points. For the 11 candidate segmentation points p(j) of the shape sample S, their average approximate bias curvature value is taken as the initial threshold Th0:
where the approximate bias curvature values cur~(P(j)) of the 11 candidate segmentation points P(j) are 0.1, 0.2, 0.25, 0.35, 0.4, 0.5, 0.5, 0.64, 0.7, 0.7 and 0.8, respectively.
The threshold is sequentially increased according to the following method:
(1) For the threshold Thτ at the τ-th adjustment, the candidate segmentation points P(j) are divided into two classes according to the relationship between their approximate bias curvature values and Thτ: those whose approximate bias curvature value is greater than Thτ, and those whose approximate bias curvature value is less than or equal to Thτ. The segmentation division degree Dτ under the current threshold is calculated and recorded:
where, under the threshold Thτ, each candidate segmentation point P(j) has a positive and a negative curvature deviation; the minimum of the positive curvature deviations over all candidate segmentation points and the maximum of the negative curvature deviations over all candidate segmentation points enter the computation of Dτ.
It is then judged whether any candidate segmentation point has an approximate bias curvature value greater than the threshold Thτ. If none exists, the threshold is no longer adjusted and the procedure goes to step (3); if such a point exists, go to step (2) to continue adjusting the threshold.
(2) Continue adjusting the threshold: the new threshold Thτ+1 is determined by the minimum of the positive curvature deviations of all candidate segmentation points in the previous threshold adjustment, formulated as follows:
According to the threshold Thτ+1, the positive and negative curvature deviations of each candidate segmentation point under the (τ+1)-th adjustment and the division degree Dτ+1 are calculated and recorded. It is then judged whether any candidate segmentation point has an approximate bias curvature value greater than the threshold Thτ+1. If none exists, the threshold is no longer adjusted and the procedure goes to step (3); if such a point exists, let τ = τ+1, repeat the current step and continue adjusting the threshold.
(3) The successive threshold adjustments yield a series of division degrees. The threshold corresponding to the maximum division degree is the final curvature screening threshold Th, and the points whose approximate bias curvature values are smaller than Th are the final shape segmentation points.
For the shape sample S, the division degrees and thresholds recorded during the four threshold adjustments are, respectively:
therefore, the maximum segmentation degree D1Corresponding threshold Th1A threshold value was selected for the final curvature, i.e., Th ═ 0.5. The 5 candidate segmentation points with the approximate bias curvature values smaller than Th are the final shape segmentation points, and the corresponding approximate bias curvatures are 0.1,0.2,0.25,0.35 and 0.4 respectively.
4. For the shape sample S, the 5 shape segmentation points are connected pairwise to form 10 line segments, and the 7 segments that lie inside the shape and do not intersect one another are retained as the segmentation line segment set. The segmentation Cost of each segmentation line segment is calculated from the metric indices as follows:
where D*(u,v), L*(u,v), S*(u,v) are the three normalized segmentation metrics of segmentation length, segmentation arc length and segmentation residual area, and u and v are the serial numbers of any two shape segmentation points, indexed over the total number of segmentation points.
For any shape segmentation line segment P(u)P(v), the three segmentation evaluation metrics are calculated as follows:
where Dmax is the length of the longest of all the segmentation line segments; D*(u,v) ranges between 0 and 1, and a smaller value indicates a more significant segmentation effect.
where the arc length is that of the contour curve between the two points P(u) and P(v); L*(u,v) ranges between 0 and 1, and a smaller value indicates a more significant segmentation effect.
where Sd is the shape area cut off by the segment P(u)P(v), i.e. the area of the closed region formed by the segment P(u)P(v) and the contour curve between the two points; S*(u,v) ranges between 0 and 1, and a smaller value indicates a more significant segmentation effect.
The segmentation Cost of the segmentation line segment P(u)P(v) is then calculated according to the above steps:
Cost=αD*(u,v)+βL*(u,v)+γS*(u,v),
where α, β, γ are the weights of the metrics.
As shown in Fig. 3, for the shape sample S, the 2 segmentation line segments with the smallest and second-smallest segmentation costs are selected as the final optimal segmentation line segments, yielding 3 sub-shape parts.
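The individual metric formulas are given only as images, so this sketch assumes straightforward normalizations: segment length over the longest candidate, contour arc length over the perimeter, and cut-off area over the total shape area; the function name and signature are illustrative:

```python
import math

def segmentation_cost(seg, segments, arc_len, perimeter, cut_area, area,
                      alpha=1.0, beta=1.0, gamma=1.0):
    # D*: length of this segment over the longest candidate segment
    d_max = max(math.dist(a, b) for a, b in segments)
    d_star = math.dist(*seg) / d_max
    # L*: contour arc length between the two split points over the perimeter (assumed)
    l_star = arc_len / perimeter
    # S*: area cut off by the segment over the total shape area (assumed)
    s_star = cut_area / area
    # Cost = alpha*D* + beta*L* + gamma*S*
    return alpha * d_star + beta * l_star + gamma * s_star
```

All three terms lie in [0, 1], so with unit weights the cost lies in [0, 3] and the cheapest segments are selected first.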
5. For the shape sample S, the central shape part is taken as the starting vertex v1, and the other 2 adjacent shape parts are sorted clockwise and recorded as vertices {v2, v3}. The directed edges connecting v1 to the vertices v2 and v3 are denoted (v1, v2) and (v1, v3), respectively, forming a shape directed graph that satisfies the topological order:
G1=(V1,E1),
where V1 = {v1, v2, v3}, E1 = {(v1, v2), (v1, v3)}.
Since the maximum number of sub-shape parts obtained by segmenting the training samples of the contour shape set is 11, the adjacency matrix of the shape sample S is expressed as:
where a ∈ [1, 11], b ∈ [1, 11].
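For the simple star topology of the example (directed edges from the central part v1 to each adjacent part), the zero-padded 11 × 11 adjacency matrix can be built as follows; deeper part hierarchies would add further edges:

```python
def shape_digraph_adjacency(num_parts, max_parts=11):
    # A[a][b] = 1 iff there is a directed edge from vertex a+1 to vertex b+1;
    # padded with zeros up to the largest part count in the training set
    A = [[0] * max_parts for _ in range(max_parts)]
    for b in range(1, num_parts):
        A[0][b] = 1  # edge (v1, v_{b+1}) from the central part
    return A
```

Padding every sample to the same 11 × 11 size lets samples with different part counts share one graph-convolutional network input shape.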
6. Full-scale visual representation is performed on each of the 3 sub-shape parts obtained by segmentation. The specific method of full-scale visual representation is as follows:
(1) For any sub-shape part contour, sampling the contour yields 100 contour sampling points. As shown in Fig. 4, the total number of scales in the full-scale space is set to 100, and the normalized area, arc length and center-of-gravity distance of each contour point at each scale layer are calculated from the coordinates of the 100 contour sampling points. Taking the sub-shape part S1 as an example, the specific calculation is as follows:
With a contour sampling point p1(i) of the sub-shape part S1 as the center and the initial radius, a preset circle C1(i) is drawn; this preset circle is the initial semi-global scale for calculating the parameters of the corresponding contour point. Once the preset circle C1(i) is obtained, some portion of the target shape necessarily falls within it, as shown schematically in Fig. 5. If the portion of the target shape falling within the preset circle is a single region, that region is the region having a direct connection relationship with the target contour point p1(i), denoted Z1(i). If the portion of the target shape falling within the preset circle is divided into several mutually disconnected regions, such as region A and region B in Fig. 5, then the region on whose contour the target contour point p1(i) lies is the region having a direct connection relationship with p1(i), i.e. region A in Fig. 5, denoted Z1(i). On this basis, the area of the region Z1(i) having a direct connection relationship with the target contour point p1(i) within the preset circle C1(i) satisfies:
where B(Z1(i), z) is an indicator function defined as
The ratio of the area of Z1(i) to the area of the preset circle C1(i) is taken as the area parameter s1(i) of the descriptor of the target contour point p1(i):
s1(i) should range between 0 and 1.
When calculating the center of gravity of the region having a direct connection relationship with the target contour point p1(i), the coordinate values of all pixel points in the region are averaged; the result is the coordinate of the center of gravity of the region, which can be expressed as:
where w1(i) is the center of gravity of the above region.
The distance between the target contour point and the center of gravity w1(i) is then calculated, which can be expressed as:
The ratio of this distance to the radius of the preset circle of the target contour point is taken as the center-of-gravity distance parameter c1(i) of the descriptor of the target contour point p1(i):
c1(i) should range between 0 and 1.
After the contour of the target shape is cut by the preset circle, one or more arc segments are inevitably formed within the preset circle, as shown in Fig. 6. If only one arc segment of the target shape falls within the preset circle, that arc segment is determined to be the arc segment having a direct connection relationship with the target contour point p1(i). If several arc segments of the target shape fall within the preset circle, such as arc segments A, B and C in Fig. 6, then the arc segment on which the target contour point p1(i) lies is the one having a direct connection relationship with p1(i), i.e. arc segment A in Fig. 6. On this basis, the length of the arc segment within the preset circle C1(i) having a direct connection relationship with the target contour point p1(i) is recorded, and its ratio to the circumference of the preset circle C1(i) is taken as the arc length parameter l1(i) of the descriptor of the target contour point:
where l1(i) should range between 0 and 1.
Calculating according to the above steps with the initial radius at scale label k = 1 gives the feature function M1 of the sub-shape part S1 of the shape sample S at this semi-global scale:
M1={s1(i),l1(i),c1(i)|i∈[1,100]},
As shown in Fig. 7, the feature functions at each of the 100 scales in the full-scale space are calculated separately, where for the k-th scale label the radius rk of the preset circle Ck is set as:
That is, starting from the initial scale k = 1, the radius rk is then reduced in equal steps of one pixel 99 times, down to the minimum scale k = 100. The feature function of the sub-shape part S1 of the shape sample S over the whole scale space is thus obtained:
M={sk(i),lk(i),ck(i)|k∈[1,100],i∈[1,100]},
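A simplified sketch of the area parameter s and the center-of-gravity distance parameter c at one scale, with the binarized shape exposed as a callable `inside(x, y)` (an assumed interface); for brevity it grid-samples the preset circle and skips the connected-region test and the arc-length parameter l:

```python
import math

def area_and_centroid_params(inside, center, radius, step=0.05):
    # fraction of the preset circle covered by the shape (parameter s) and the
    # normalized distance from the contour point to the covered region's
    # center of gravity (parameter c); both lie in [0, 1]
    covered, total = 0, 0
    sx = sy = 0.0
    r2 = radius * radius
    steps = int(2 * radius / step) + 1
    for ix in range(steps):
        x = center[0] - radius + ix * step
        for iy in range(steps):
            y = center[1] - radius + iy * step
            if (x - center[0]) ** 2 + (y - center[1]) ** 2 <= r2:
                total += 1
                if inside(x, y):
                    covered += 1
                    sx += x
                    sy += y
    s = covered / total
    c = math.dist(center, (sx / covered, sy / covered)) / radius if covered else 0.0
    return s, c
```

For a contour point of a half-plane shape, the circle is half covered and the centroid of the covered half-disk sits about 4/(3π) ≈ 0.424 radii from the center, which the grid approximation recovers.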
(2) As shown in Fig. 8, the feature functions of the sub-shape part S1 at the 100 scales in the full-scale space are combined, in scale order, into three feature matrices in the full-scale space:
GM1={SM,LM,CM},
where SM, LM and CM each represent a grayscale image, i.e. a grayscale matrix of size m × n.
(3) As shown in Fig. 9, the three grayscale images of the sub-shape part S1 are used as the three RGB channels to synthesize a color image, which serves as the feature expression image of the sub-shape part S1.
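Synthesizing the color feature expression image amounts to stacking the three full-scale grayscale matrices as the R, G and B channels; a minimal sketch, assuming matrix values in [0, 1] mapped to 8-bit intensities:

```python
def synthesize_feature_image(SM, LM, CM):
    # one (R, G, B) triple per pixel: R from the area matrix, G from the
    # arc-length matrix, B from the centroid-distance matrix
    rows, cols = len(SM), len(SM[0])
    return [[(round(255 * SM[i][j]), round(255 * LM[i][j]), round(255 * CM[i][j]))
             for j in range(cols)] for i in range(rows)]
```

The resulting 100 × 100 three-channel image is what each sub-shape part feeds to the convolutional network in the next step.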
7. A convolutional neural network is constructed, comprising an input layer, a pre-training layer and fully connected layers. The feature expression image samples of all sub-shape parts of the training shape samples are input into the convolutional neural network to train the model. Different sub-shape parts of each shape class have different class labels. After the convolutional neural network is trained to convergence, taking the shape sample S as an example, the feature expression images of size 100 × 100 corresponding to the 3 sub-shape parts of S, {Tnum | num ∈ [1, 3]}, are each input into the trained network, and the feature vector of the corresponding sub-shape part is output from the second fully connected layer of the network, where Vec, the number of neurons in the second fully connected layer, is set to 200.
The invention uses the SGD optimizer with the learning rate set to 0.001 and the decay rate set to 1e-6; cross entropy is chosen as the loss function, and the batch size is 128. As shown in Fig. 10, the pre-training layer consists of the first 4 modules of the VGG16 network model, with the parameters obtained by training these 4 modules on the ImageNet dataset used as initialization parameters; three fully connected layers follow the pre-training layer.
The 1st module of the pre-training layer comprises 2 convolutional layers and 1 max-pooling layer, the convolutional layers having 64 kernels of size 3 × 3 and the pooling layer a size of 2 × 2. The 2nd module comprises 2 convolutional layers and 1 max-pooling layer, with 128 kernels of size 3 × 3 and a 2 × 2 pooling layer. The 3rd module comprises 3 convolutional layers and 1 max-pooling layer, with 256 kernels of size 3 × 3 and a 2 × 2 pooling layer. The 4th module comprises 3 convolutional layers and 1 max-pooling layer, with 512 kernels of size 3 × 3 and a 2 × 2 pooling layer. Each convolutional layer is calculated as:
CO=φrelu(WC·CI+θC),
where θC is the bias vector of the convolutional layer, WC is the weight of the convolutional layer, CI is the input of the convolutional layer, and CO is the output of the convolutional layer.
The fully connected module comprises 3 fully connected layers: the 1st fully connected layer has 512 nodes, the 2nd has 200 nodes, and the 3rd has 770 nodes. The first 2 fully connected layers are calculated as:
FO=φtanh(WF·FI+θF),
where φtanh is the tanh activation function, θF is the bias vector of the fully connected layer, WF is its weight, FI is its input, and FO is its output.
the last full-connection layer is an output layer, and the output calculation formula is as follows:
YO=φsoftmax(WY·YI+θY),
where φsoftmax is the softmax activation function, θY is the bias vector of the output layer, each neuron of which represents a corresponding sub-shape part class, WY is the weight of the output layer, YI is its input, and YO is its output.
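The fully connected and output-layer formulas above each reduce to an affine map followed by an activation; a dependency-free sketch (weights as plain nested lists, not the trained VGG16 parameters):

```python
import math

def dense(W, theta, x, activation):
    # F_O = activation(W . F_I + theta)
    z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, theta)]
    return activation(z)

def tanh_act(z):
    return [math.tanh(v) for v in z]

def softmax(z):
    # numerically stable softmax for the output layer
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]
```

The first two fully connected layers would use `tanh_act` and the 770-way output layer `softmax`; the feature vector taken in step 7 is the 200-dimensional `tanh` output of the second layer.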
8. As shown in Fig. 11, the feature matrix of the shape sample is constructed from its 3 sub-shape feature vectors:
where Fa, the a-th row vector of the matrix F, is the feature vector of the a-th sub-shape part output by the above step; the remaining rows are zero vectors of dimension 200.
9. A graph convolutional neural network is constructed, comprising a preprocessing input layer, a hidden layer and a classification output layer. The adjacency matrix and the feature matrix of the shape sample topological graph are input into the graph convolutional network model for training. The invention uses the SGD optimizer with the learning rate set to 0.001 and the decay rate set to 1e-6; cross entropy is chosen as the loss function, and the batch size is 128.
In the preprocessing input layer the adjacency matrix is normalized; the normalization preprocessing is specifically:
where the two matrices are, respectively, the identity matrix and the degree matrix; the normalized adjacency matrix is obtained after this preprocessing.
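The normalization step is the standard graph-convolutional preprocessing; assuming the usual symmetric form D^(-1/2)(A + I)D^(-1/2) (the source shows it only as an image), it can be sketched as:

```python
import math

def normalize_adjacency(A):
    # add self-loops, then symmetrically normalize by the degree matrix:
    # A_hat = D^{-1/2} (A + I) D^{-1/2}
    n = len(A)
    At = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in At]
    return [[d_inv_sqrt[i] * At[i][j] * d_inv_sqrt[j] for j in range(n)]
            for i in range(n)]
```

The self-loops keep each vertex's own features in the aggregation, and the symmetric scaling prevents high-degree vertices from dominating.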
The hidden layer comprises 2 graph convolution layers, and the calculation formula of each graph convolution layer is as follows:
where the weight matrix is that of the graph convolutional layer; HI is the input of the graph convolutional layer, the input of the 1st graph convolutional layer being the feature matrix of the shape sample; HO is the output of the graph convolutional layer.
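One graph convolutional layer then multiplies the normalized adjacency matrix, the input features and the layer weight, followed by an activation (ReLU is assumed here, as the source does not name it):

```python
def graph_conv(A_hat, H, W):
    # H_O = relu(A_hat . H_I . W)
    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
                 for j in range(len(Y[0]))] for i in range(len(X))]
    return [[max(0.0, v) for v in row] for row in matmul(matmul(A_hat, H), W)]
```

Each row of the output mixes a vertex's features with those of its neighbors, weighted by the normalized adjacency entries.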
the calculation formula of the classification output layer is as follows:
where φsoftmax is the softmax activation function, GI is the input of the output layer, i.e. the output of the second graph convolutional layer, GW is the weight of the output layer, and GO is the output of the output layer; each neuron of the output layer represents a corresponding shape class.
10. All training samples are input into the graph convolutional neural network to train the model. For any test shape sample, first the shape contour key points are extracted, their curvature values calculated and their concavity/convexity judged to obtain candidate segmentation points; the curvature screening threshold is then adjusted to obtain the shape segmentation points. The shape segmentation points are connected pairwise to form segmentation line segments, those lying inside the shape and not crossing one another are retained as the segmentation line segment set, and the segmentation cost of each segment in the set is calculated. If the number of segmentation line segments is less than 10, all of them are used to segment the shape; otherwise the shape is segmented by the 10 segments with the minimum segmentation cost. The color feature expression image of each sub-shape part is calculated and input into the trained convolutional neural network, and the output of the second fully connected layer is taken as the feature vector of the sub-shape part. The shape directed graph of the test sample is constructed, its adjacency matrix and feature matrix are calculated and input into the trained graph convolutional network model, and the shape class corresponding to the maximum value of the output vector is judged to be the class of the test sample, realizing shape classification and identification.
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may still be made to the embodiments, and that some features may be replaced by equivalents, without departing from the spirit and scope of the invention.
Claims (11)
1. A shape recognition method, characterized in that the method comprises the following steps:
firstly, extracting outline key points of a shape sample;
step two, defining approximate bias curvature values at each key point and judging the curve concavity and convexity at the key points to obtain candidate shape segmentation points;
step three, adjusting a curvature screening threshold value to obtain shape division points;
fourthly, segmenting the shape based on the principle that the segmentation line segments are positioned in the shape and do not intersect with each other, and segmenting at the minimum segmentation cost to obtain a plurality of sub-shape parts;
step five, constructing a topological structure of the shape sample;
step six, obtaining a feature expression image of a corresponding sub-shape part by using a full-scale visual representation method of the shape;
inputting each feature expression image into a convolutional neural network for training, and learning to obtain a feature vector of each sub-shape part;
step eight, constructing a feature matrix of the shape sample;
constructing a graph convolution neural network;
step ten, training a graph convolution neural network, carrying out shape segmentation on the test sample, obtaining the feature vector of each sub-shape part, calculating the feature matrix and the adjacency matrix of the test sample, and inputting the feature matrix and the adjacency matrix into the trained graph convolution network model to realize shape classification and recognition.
2. A shape recognition method according to claim 1, characterized in that: in the first step, the method for extracting the key points of the contour comprises the following steps:
the contour of each shape sample is composed of a series of sampling points, and for any shape sample S, sampling n points on the contour results in:
S={(px(i),py(i))|i∈[1,n]},
where px(i), py(i) are the horizontal and vertical coordinates of the contour sampling point p(i) in the two-dimensional plane, and n is the contour length, i.e. the number of contour sampling points;
evolving the contour curve of the shape sample to extract key points, deleting in each evolution step the point contributing least to target identification, wherein the contribution of each point p(i) is defined (following the standard discrete-curve-evolution relevance measure) as:
Con(i) = H1(i) · H(i, i-1) · H(i, i+1) / (H(i, i-1) + H(i, i+1)),
where H(i, i-1) is the length of the curve between points p(i) and p(i-1), H(i, i+1) is the length of the curve between points p(i) and p(i+1), and H1(i) is the angle between segment p(i)p(i-1) and segment p(i)p(i+1); the lengths H are normalized by the contour perimeter; a larger value of Con(i) indicates a larger contribution of the point p(i) to the shape feature;
the method introduces an adaptive termination function F (t) based on a region to overcome the problem of excessive or insufficient extraction of the key points of the contour:
where S0 is the area of the original shape, Si is the shape area after i evolutions, and n0 is the total number of points on the contour of the original shape; when the termination function value F(t) exceeds the set threshold, extraction of contour key points ends and n* contour key points are obtained.
3. A shape recognition method according to claim 2, wherein: in step two, the approximate bias curvature value at each key point is defined and the curve concavity/convexity at the key point is judged to obtain the candidate segmentation points; the specific method is as follows:
in order to calculate the approximate bias curvature value of any key point p(i) in the shape sample S, the adjacent contour points p(i-ε) and p(i+ε) before and after p(i) are taken, where ε is an empirical value; because:
cosHε(i)∝cur(p(i)),
where Hε(i) is the angle between line segment p(i)p(i-ε) and line segment p(i)p(i+ε), and cur(p(i)) is the curvature at point p(i);
the approximate bias curvature value cur~(p(i)) at point p(i) is defined as:
cur~(p(i))=cosHε(i)+1,
where Hε(i) is the angle between line segment p(i)p(i-ε) and line segment p(i)p(i+ε); cosHε(i) ranges between -1 and 1, so cur~(p(i)) ranges between 0 and 2;
according to a shape segmentation approach conforming to visual naturalness, shape segmentation points all lie at concave parts of the contour; therefore, when screening candidate segmentation points for shape segmentation, a method for judging the concavity/convexity of the curve at a key point p(i) is defined:
for the binarized shape image, the pixel values inside the contour of the shape sample S are all 255 and those outside are all 0; the line segment p(i-ε)p(i+ε) is sampled at equal intervals to obtain R discrete points; if the pixel values of all R discrete points are 255, the segment p(i-ε)p(i+ε) lies entirely inside the shape contour, i.e. the curve at p(i) is convex; if the pixel values of all R discrete points are 0, the segment lies entirely outside the shape contour, i.e. the curve at p(i) is concave; the key points p(i) at which the curve is concave are taken as the candidate segmentation points p(j).
4. A shape recognition method according to claim 3, wherein: in the third step, the step of adjusting the curvature screening threshold Th and obtaining the shape segmentation point is as follows:
(1) for all candidate segmentation points P(j) obtained in step two, their average approximate bias curvature value is taken as the initial threshold Th0:
Wherein J is the total number of the candidate segmentation points;
(2) for the threshold Thτ at the τ-th adjustment, the candidate segmentation points P(j) are divided into two classes according to the relationship between their approximate bias curvature values and Thτ: those whose approximate bias curvature value is greater than Thτ and those whose approximate bias curvature value is less than or equal to Thτ; the segmentation division degree Dτ under the current threshold is calculated and recorded:
where, under the threshold Thτ, each candidate segmentation point P(j) has a positive and a negative curvature deviation; the minimum of the positive curvature deviations over all candidate segmentation points and the maximum of the negative curvature deviations over all candidate segmentation points enter the computation of Dτ;
it is judged whether any candidate segmentation point has an approximate bias curvature value greater than the threshold Thτ; if none exists, the threshold is no longer adjusted and the procedure goes to step (4); if such a point exists, go to step (3) to continue adjusting the threshold;
(3) continuing to adjust the threshold, the new threshold Thτ+1 is the minimum of the positive curvature deviations of all candidate segmentation points in the previous threshold adjustment, formulated as follows:
according to the threshold Thτ+1, the positive and negative curvature deviations of each candidate segmentation point under the (τ+1)-th adjustment and the division degree Dτ+1 are calculated and recorded; it is judged whether any candidate segmentation point has an approximate bias curvature value greater than the threshold Thτ+1; if none exists, the threshold is no longer adjusted and the procedure goes to step (4); if such a point exists, let τ = τ+1, repeat the current step and continue adjusting the threshold;
(4) the successive threshold adjustments yield several division degrees; the threshold corresponding to the maximum division degree is the final curvature screening threshold Th, and the points whose approximate bias curvature values are smaller than Th are the final shape segmentation points.
5. A shape recognition method according to claim 4, wherein in step four, the specific method of segmenting the shape, based on the principles that segmentation line segments lie inside the shape and do not intersect each other, and obtaining a plurality of sub-shape parts at minimum segmentation cost, is as follows:
(1) For any two shape segmentation points P(e_1), P(e_2), sample the segmentation line segment P(e_1)P(e_2) at equal intervals to obtain C discrete points; if any of the C discrete points has a pixel value of 0, part of the line segment P(e_1)P(e_2) lies outside the shape contour, and it is not selected as a segmentation line segment;
(2) For any two shape segmentation points P(e_3), P(e_4), if there is an existing segmentation line segment P(e_5)P(e_6) such that the line segment P(e_3)P(e_4) intersects the existing segment P(e_5)P(e_6), then the line segment P(e_3)P(e_4) is not selected as a segmentation line segment;
(3) The set of segmentation line segments satisfying the above two principles is screened further; three metrics for evaluating the quality of a segmentation line segment are defined to achieve segmentation at minimum segmentation cost:
wherein D*(u,v), L*(u,v), S*(u,v) are the three normalized segmentation metrics of segmentation length, segmentation arc length and segmentation residual area, and u and v are the serial numbers of any two shape segmentation points out of the total number of segmentation points;
For any segmentation line segment P(u)P(v), the three segmentation evaluation metrics are calculated as follows:
wherein D_max is the length of the longest of all segmentation line segments; D*(u,v) ranges between 0 and 1, and the smaller its value, the more significant the segmentation effect;
wherein the length of the contour curve between the two points P(u) and P(v) is used; L*(u,v) ranges between 0 and 1, and the smaller its value, the more significant the segmentation effect;
wherein S_d is the shape area cut off by the line segment P(u)P(v), i.e. the area of the closed region formed by the line segment P(u)P(v) and the contour curve; S*(u,v) ranges between 0 and 1, and the smaller its value, the more significant the segmentation effect;
The segmentation cost Cost of the segmentation line segment P(u)P(v) is then calculated as:
Cost = αD*(u,v) + βL*(u,v) + γS*(u,v),
wherein α, β and γ are the weights of the respective metrics;
Calculate the segmentation cost Cost of every segmentation line segment in the screened set; sort all the computed costs from smallest to largest and, according to the number N of sub-shape parts set for the class of shape sample S, select the N-1 segmentation line segments with the smallest cost, thereby achieving optimal segmentation and obtaining N sub-shape parts. The number N of sub-shape parts depends on the class to which the current shape sample S belongs; the corresponding number of sub-shape parts is set manually for shapes of different classes.
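The two validity principles and the weighted cost above can be sketched as follows. The helper names, the mask convention (nonzero pixels inside the shape) and the cross-product intersection test are assumptions for illustration; the claim does not specify them.

```python
import numpy as np

def inside_shape(p1, p2, mask, C=50):
    """Principle (1): sample C equally spaced points on segment p1-p2;
    reject the segment if any sample falls on a zero pixel of the mask."""
    ts = np.linspace(0.0, 1.0, C)
    pts = np.rint(np.outer(1 - ts, p1) + np.outer(ts, p2)).astype(int)
    return all(mask[y, x] > 0 for x, y in pts)

def cross(o, a, b):
    # z-component of the cross product (a - o) x (b - o)
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def intersects(a, b, c, d):
    """Principle (2): proper intersection test for segments ab and cd."""
    return (cross(a, b, c) * cross(a, b, d) < 0 and
            cross(c, d, a) * cross(c, d, b) < 0)

def cost(D, L, S, alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted segmentation cost over the three normalized metrics."""
    return alpha * D + beta * L + gamma * S
```

A candidate segment would be kept only if `inside_shape` is true and `intersects` is false against every already selected segment; the surviving candidates are then sorted by `cost` and the N-1 cheapest chosen.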
6. A shape recognition method according to claim 5, wherein in step five, the specific method of constructing the topological structure of a shape sample is as follows: for the N sub-shape parts obtained by segmenting any shape sample S, record the central shape part as the start vertex v_1, and sort the other adjacent shape parts clockwise as vertices {v_o | o ∈ [2, N]}; record the edge connecting v_1 to each remaining vertex v_o as (v_1, v_o), thereby forming a shape directed graph satisfying topological order:
G_1 = (V_1, E_1),
wherein V_1 = {v_o | o ∈ [1, N]}, E_1 = {(v_1, v_o) | o ∈ [2, N]};
After all training shape samples have been optimally segmented, record the maximum number of sub-shape parts obtained over all training shape samples; for any shape sample S, its adjacency matrix is then calculated as follows:
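Under a star-graph reading of G_1 = (V_1, E_1), the adjacency matrix padded to the maximum sub-shape count over the training set can be sketched as below; the zero-padding convention is an assumption, since the formula itself is not legible in the source.

```python
import numpy as np

def star_adjacency(n_parts, n_max):
    """Adjacency matrix of the directed star graph G_1 = (V_1, E_1):
    edges run from the central start vertex v_1 to each remaining
    vertex v_o, o in [2, N]; rows and columns beyond N are zero padding
    up to the maximum sub-shape count n_max."""
    A = np.zeros((n_max, n_max), dtype=int)
    A[0, 1:n_parts] = 1   # edges (v_1, v_o) for o = 2..N
    return A
```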
7. A shape recognition method according to claim 6, wherein in step six, the specific method of obtaining the color feature expression image corresponding to a sub-shape part by the full-scale visual representation of the shape is as follows:
For a sub-shape part S_1 of any shape sample S:
wherein the abscissa and ordinate of each contour sampling point p_1(i) of the sub-shape part in the two-dimensional plane are given, and n_1 is the contour length, i.e. the number of contour sampling points;
First, the contour of the sub-shape part S_1 is described by a feature function M composed of three shape descriptors:
M = {s_k(i), l_k(i), c_k(i) | k ∈ [1, m], i ∈ [1, n_1]},
wherein s_k, l_k, c_k are the three invariant parameters of normalized area s, arc length l and center-of-gravity distance c at scale k, k is the scale label, and m is the total number of scales; the three shape-invariant descriptors are defined separately as follows:
Taking the contour sampling point p_1(i) as the center and using the initial radius, draw a preset circle C_1(i); this preset circle is the initial semi-global scale for calculating the parameters of the corresponding contour point. After obtaining the preset circle C_1(i) as above, the three shape descriptors at scale k = 1 are calculated as follows:
When calculating the s_1(i) descriptor, the region inside the preset circle C_1(i) that is directly connected with the target contour point p_1(i) is denoted Z_1(i); then:
wherein B(Z_1(i), z) is an indicator function defined as follows:
The ratio of the area of Z_1(i) to the area of the preset circle C_1(i) is taken as the area parameter s_1(i) of the descriptor of the target contour point:
s_1(i) ranges between 0 and 1;
When calculating the c_1(i) descriptor, first calculate the center of gravity of the region directly connected with the target contour point p_1(i), specifically the average of the coordinate values of all pixel points in the region; the result is the coordinate of the region's center of gravity, which can be expressed as:
wherein w_1(i) is the center of gravity of the region;
Then calculate the distance between the target contour point p_1(i) and the center of gravity w_1(i), which can be expressed as:
Finally, take the ratio of this distance to the radius of the preset circle C_1(i) of the target contour point p_1(i) as the center-of-gravity distance parameter c_1(i) of the descriptor of the target contour point:
c_1(i) ranges between 0 and 1;
When calculating the l_1(i) descriptor, record the length of the arc segment inside the preset circle C_1(i) that is directly connected with the target contour point p_1(i), and take its ratio to the circumference of the preset circle C_1(i) as the arc length parameter l_1(i) of the descriptor of the target contour point:
wherein l_1(i) ranges between 0 and 1;
Calculating as above yields the feature function M_1 of the sub-shape part S_1 of shape sample S at the initial semi-global scale with scale label k = 1:
M_1 = {s_1(i), l_1(i), c_1(i) | i ∈ [1, n_1]},
Since a digital image takes one pixel as its minimum unit, a single pixel is selected as the continuous scale-change interval in the full-scale space; i.e. for the k-th scale label, the radius r_k of the circle C_k is set as:
That is, at the initial scale k = 1 the radius takes its initial value; thereafter the radius r_k is reduced in constant steps of one pixel, m-1 times in total, until the minimum scale k = m. Following the method used to calculate the feature function M_1 at scale k = 1, the feature functions at the other scales are calculated, finally yielding the feature function of the sub-shape part S_1 of shape sample S over the full scale space:
M = {s_k(i), l_k(i), c_k(i) | k ∈ [1, m], i ∈ [1, n_1]},
The feature functions at all scales are stored in the matrices S_M, L_M and C_M respectively. S_M stores s_k(i): the element in row k, column i of S_M is the area parameter s_k(i) of point p_1(i) at the k-th scale. L_M stores l_k(i): the element in row k, column i of L_M is the arc length parameter l_k(i) of point p_1(i) at the k-th scale. C_M stores c_k(i): the element in row k, column i of C_M is the center-of-gravity distance parameter c_k(i) of point p_1(i) at the k-th scale. S_M, L_M and C_M finally serve as the grayscale-image representation of the three shape features of the sub-shape part S_1 of shape sample S in the full-scale space:
GM_1 = {S_M, L_M, C_M},
wherein S_M, L_M and C_M are all matrices of size m × n_1, each representing a grayscale image;
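A simplified sketch of the full-scale computation follows, covering only the area and center-of-gravity descriptors (the arc-length descriptor additionally needs the contour parametrization and is omitted here). The 4-connected flood fill used for the "directly connected" region and the default initial radius are assumptions for illustration.

```python
import numpy as np
from collections import deque

def region_connected(shape_mask, disk_mask, seed):
    """Pixels of shape ∩ disk that are 4-connected to the seed point."""
    inside = shape_mask & disk_mask
    h, w = inside.shape
    out = np.zeros_like(inside)
    q = deque([seed])
    while q:
        y, x = q.popleft()
        if 0 <= y < h and 0 <= x < w and inside[y, x] and not out[y, x]:
            out[y, x] = True
            q.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])
    return out

def descriptors(shape_mask, contour, m=3, r0=None):
    """Full-scale matrices S_M and C_M for one sub-shape part (sketch).

    contour: list of (y, x) contour sampling points; row k of each
    matrix holds the descriptor at the k-th scale, column i the point."""
    n1 = len(contour)
    ys, xs = np.mgrid[0:shape_mask.shape[0], 0:shape_mask.shape[1]]
    r0 = r0 if r0 is not None else m  # initial radius: assumed default
    S_M = np.zeros((m, n1))
    C_M = np.zeros((m, n1))
    for k in range(m):
        r = r0 - k  # radius shrinks by one pixel per scale
        for i, (y, x) in enumerate(contour):
            disk = (ys - y) ** 2 + (xs - x) ** 2 <= r * r
            Z = region_connected(shape_mask.astype(bool), disk, (y, x))
            S_M[k, i] = Z.sum() / (np.pi * r * r)       # area ratio s_k(i)
            cy, cx = ys[Z].mean(), xs[Z].mean()         # center of gravity w
            C_M[k, i] = np.hypot(cy - y, cx - x) / r    # distance ratio c_k(i)
    return S_M, C_M
```

Stacking S_M, L_M and C_M then gives the grayscale maps that form the feature expression image of the sub-shape part.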
8. A shape recognition method according to claim 7, wherein in step seven, the feature expression image samples of all sub-shape parts of the training shape samples are input into a convolutional neural network, and a convolutional neural network model is trained; different sub-shape parts of each shape class have different class labels. After the convolutional neural network has been trained to convergence, for any shape sample S, the feature expression images {T_num | num ∈ [1, N]} corresponding to the N sub-shape parts obtained by segmenting it are input into the trained convolutional neural network, and the feature vector of each corresponding sub-shape part is output from the second fully-connected layer of the network, where Vec is the number of neurons in the second fully-connected layer;
The structure of the convolutional neural network comprises an input layer, a pre-training layer and fully-connected layers; the pre-training layer consists of the first 4 modules of the VGG16 network model, the parameters obtained by training these 4 modules on the ImageNet dataset are used as initialization parameters, and three fully-connected layers follow the pre-training layer;
The 1st module of the pre-training layer comprises 2 convolutional layers and 1 max-pooling layer, the convolutional layers having 64 convolution kernels of size 3 × 3 and the pooling layer having size 2 × 2; the 2nd module comprises 2 convolutional layers and 1 max-pooling layer, the convolutional layers having 128 convolution kernels of size 3 × 3 and the pooling layer having size 2 × 2; the 3rd module comprises 3 convolutional layers and 1 max-pooling layer, the convolutional layers having 256 convolution kernels of size 3 × 3 and the pooling layer having size 2 × 2; the 4th module comprises 3 convolutional layers and 1 max-pooling layer, the convolutional layers having 512 convolution kernels of size 3 × 3 and the pooling layer having size 2 × 2. The calculation formula of each convolutional layer is:
C_O = φ_relu(W_C · C_I + θ_C),
wherein θ_C is the bias vector of the convolutional layer, W_C is the weight of the convolutional layer, C_I is the input of the convolutional layer, and C_O is the output of the convolutional layer;
The fully-connected module comprises 3 fully-connected layers: the 1st fully-connected layer contains 512 nodes, the 2nd contains Vec nodes, and the 3rd contains N_T nodes, where N_T is the sum of the numbers of segmented sub-shape parts over all shape classes. The calculation formula of the first 2 fully-connected layers is:
F_O = φ_tanh(W_F · F_I + θ_F),
wherein φ_tanh is the tanh activation function, θ_F is the bias vector of the fully-connected layer, W_F is the weight of the fully-connected layer, F_I is the input of the fully-connected layer, and F_O is the output of the fully-connected layer;
The last fully-connected layer is the output layer, whose output is calculated as:
Y_O = φ_softmax(W_Y · Y_I + θ_Y),
wherein φ_softmax is the softmax activation function, θ_Y is the bias vector of the output layer, each neuron of the output layer represents a corresponding sub-shape-part class, W_Y is the weight of the output layer, Y_I is the input of the output layer, and Y_O is the output of the output layer.
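The two fully-connected layer equations can be sketched directly in numpy; the toy layer sizes below and the numerically stabilized softmax are illustrative assumptions, not part of the claim.

```python
import numpy as np

def fc_tanh(W, x, theta):
    """First two fully-connected layers: F_O = tanh(W · F_I + θ_F)."""
    return np.tanh(W @ x + theta)

def fc_softmax(W, x, theta):
    """Output layer: Y_O = softmax(W · Y_I + θ_Y), one neuron per
    sub-shape-part class."""
    z = W @ x + theta
    e = np.exp(z - z.max())   # numerically stable softmax
    return e / e.sum()

# toy usage with assumed sizes: 4 inputs -> 3 hidden units -> 2 classes
rng = np.random.default_rng(0)
x = rng.standard_normal(4)
h = fc_tanh(rng.standard_normal((3, 4)), x, np.zeros(3))
y = fc_softmax(rng.standard_normal((2, 3)), h, np.zeros(2))
assert abs(y.sum() - 1.0) < 1e-9 and (y > 0).all()
```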
9. A shape recognition method according to claim 8, wherein the specific method of constructing the feature matrix of a shape sample in step eight is as follows: for any shape sample S, the N sub-shape parts obtained by segmenting it are expressed by the corresponding feature matrix, which is calculated as follows:
10. A shape recognition method according to claim 9, wherein in step nine, the structure of the graph convolutional neural network is constructed, comprising a preprocessing input layer, a hidden layer and a classification output layer; the preprocessing input layer performs normalization preprocessing on the adjacency matrix, specifically:
wherein I_N is the identity matrix and the degree matrix is computed from the self-loop-augmented adjacency matrix; the normalized adjacency matrix is obtained after the preprocessing.
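Reading the preprocessing as the standard graph-convolutional normalization (add self-loops via I_N, then scale symmetrically by the degree matrix), a minimal sketch under that assumption:

```python
import numpy as np

def normalize_adjacency(A):
    """GCN preprocessing: D^{-1/2} (A + I) D^{-1/2}, where D is the
    degree matrix of the self-loop-augmented adjacency matrix A + I."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)                 # degrees with self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt
```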
The hidden layer comprises 2 graph convolution layers, and the calculation formula of each graph convolution layer is as follows:
wherein the weight of the graph convolution layer is used; H_I is the input of the graph convolution layer, the input of the 1st graph convolution layer being the feature matrix of the shape sample; H_O is the output of the graph convolution layer;
The calculation formula of the classification output layer is:
wherein φ_softmax is the softmax activation function, G_I is the input of the output layer, i.e. the output of the second graph convolution layer, G_W is the weight of the output layer, and G_O is the output of the output layer; each neuron of the output layer represents a corresponding shape class.
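A sketch of the two-layer forward pass through the hidden and output layers; the ReLU activation of the graph convolution layers and the mean pooling over vertices before the softmax output are assumptions, since the claim does not fix them.

```python
import numpy as np

def gcn_forward(A_hat, X, W1, W2, W_out):
    """Two graph convolution layers on the normalized adjacency A_hat
    and feature matrix X, followed by a softmax classification layer
    with one output neuron per shape class."""
    H1 = np.maximum(A_hat @ X @ W1, 0)    # graph conv layer 1 (ReLU assumed)
    H2 = np.maximum(A_hat @ H1 @ W2, 0)   # graph conv layer 2
    g = H2.mean(axis=0)                   # pool vertex features (assumed)
    z = g @ W_out                         # classification output layer
    e = np.exp(z - z.max())
    return e / e.sum()                    # class probabilities
```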
11. A shape recognition method according to claim 10, wherein the specific method of achieving contour shape classification and recognition in step ten is as follows: train the graph convolutional neural network model until convergence. For any test shape sample, first extract the shape contour key points, calculate the curvature values of the key points, judge their concavity and convexity to obtain candidate segmentation points, and then adjust the curvature screening threshold to obtain the shape segmentation points. Obtain the set of segmentation line segments according to the two principles (1) and (2) of step five, and calculate the segmentation cost of the segmentation line segments in the set; if the number of segmentation line segments is less than the required number, use all the segmentation line segments to segment the shape; otherwise, segment the shape with the segmentation line segments of minimum segmentation cost. Calculate the color feature expression image of each sub-shape part, input it into the trained convolutional neural network, and take the output of the second fully-connected layer of the convolutional neural network as the feature vector of the sub-shape part. Construct the shape directed graph of the test shape sample, calculate its adjacency matrix and feature matrix, input them into the trained graph convolutional neural network model, and judge the shape class corresponding to the maximum value in the output vector as the shape class of the test sample, thereby achieving shape classification and recognition.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110418108.2A CN113191361B (en) | 2021-04-19 | 2021-04-19 | Shape recognition method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113191361A true CN113191361A (en) | 2021-07-30 |
CN113191361B CN113191361B (en) | 2023-08-01 |
Family
ID=76977535
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110418108.2A Active CN113191361B (en) | 2021-04-19 | 2021-04-19 | Shape recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113191361B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104834922A (en) * | 2015-05-27 | 2015-08-12 | 电子科技大学 | Hybrid neural network-based gesture recognition method |
CN106934419A (en) * | 2017-03-09 | 2017-07-07 | 西安电子科技大学 | Classification of Polarimetric SAR Image method based on plural profile ripple convolutional neural networks |
CN108139334A (en) * | 2015-08-28 | 2018-06-08 | 株式会社佐竹 | Has the device of optical unit |
WO2020199468A1 (en) * | 2019-04-04 | 2020-10-08 | 平安科技(深圳)有限公司 | Image classification method and device, and computer readable storage medium |
CN111898621A (en) * | 2020-08-05 | 2020-11-06 | 苏州大学 | Outline shape recognition method |
CN112464942A (en) * | 2020-10-27 | 2021-03-09 | 南京理工大学 | Computer vision-based overlapped tobacco leaf intelligent grading method |
Non-Patent Citations (2)
Title |
---|
YANG Bing et al.: "3D palmprint recognition fusing local features and deep learning", Journal of Zhejiang University (Engineering Science), vol. 54, no. 03, pages 540-545 *
YANG Jianyu et al.: "Chord angle feature description for occluded shape matching", Optics and Precision Engineering, vol. 23, no. 06, pages 1758-1767 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113392819A (en) * | 2021-08-17 | 2021-09-14 | 北京航空航天大学 | Batch academic image automatic segmentation and labeling device and method |
CN113392819B (en) * | 2021-08-17 | 2022-03-08 | 北京航空航天大学 | Batch academic image automatic segmentation and labeling device and method |
CN116486265A (en) * | 2023-04-26 | 2023-07-25 | 北京卫星信息工程研究所 | Airplane fine granularity identification method based on target segmentation and graph classification |
CN116486265B (en) * | 2023-04-26 | 2023-12-19 | 北京卫星信息工程研究所 | Airplane fine granularity identification method based on target segmentation and graph classification |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107506761B (en) | Brain image segmentation method and system based on significance learning convolutional neural network | |
CN108154192B (en) | High-resolution SAR terrain classification method based on multi-scale convolution and feature fusion | |
CN110334765B (en) | Remote sensing image classification method based on attention mechanism multi-scale deep learning | |
CN106529447B (en) | Method for identifying face of thumbnail | |
US20190228268A1 (en) | Method and system for cell image segmentation using multi-stage convolutional neural networks | |
CN106951825B (en) | Face image quality evaluation system and implementation method | |
CN109063724B (en) | Enhanced generation type countermeasure network and target sample identification method | |
CN111898621B (en) | Contour shape recognition method | |
CN109784204B (en) | Method for identifying and extracting main fruit stalks of stacked cluster fruits for parallel robot | |
CN110321967B (en) | Image classification improvement method based on convolutional neural network | |
CN110032925B (en) | Gesture image segmentation and recognition method based on improved capsule network and algorithm | |
CN109033994B (en) | Facial expression recognition method based on convolutional neural network | |
JP2017157138A (en) | Image recognition device, image recognition method and program | |
CN110598692B (en) | Ellipse identification method based on deep learning | |
CN111986125A (en) | Method for multi-target task instance segmentation | |
CN110414616B (en) | Remote sensing image dictionary learning and classifying method utilizing spatial relationship | |
JP2010134957A (en) | Pattern recognition method | |
CN110223310B (en) | Line structure light center line and box edge detection method based on deep learning | |
CN113191361A (en) | Shape recognition method | |
CN113221956B (en) | Target identification method and device based on improved multi-scale depth model | |
CN112488128A (en) | Bezier curve-based detection method for any distorted image line segment | |
CN111968124B (en) | Shoulder musculoskeletal ultrasonic structure segmentation method based on semi-supervised semantic segmentation | |
Lin et al. | Determination of the varieties of rice kernels based on machine vision and deep learning technology | |
CN115170805A (en) | Image segmentation method combining super-pixel and multi-scale hierarchical feature recognition | |
CN111310820A (en) | Foundation meteorological cloud chart classification method based on cross validation depth CNN feature integration |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||