CN113657375B - Bottled object text detection method based on 3D point cloud

Publication number: CN113657375B
Authority: CN (China)
Legal status: Active
Application number: CN202110769157.0A
Other languages: Chinese (zh)
Other versions: CN113657375A
Inventors: 赵凡 (Zhao Fan), 李海宁 (Li Haining), 闻治泉 (Wen Zhiquan), 景翠宁 (Jing Cuining)
Current Assignee: Xi'an University of Technology
Original Assignee: Xi'an University of Technology
Application filed by Xi'an University of Technology; priority to CN202110769157.0A
Publication of CN113657375A and CN113657375B; application granted

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering


Abstract

The invention discloses a bottled object text detection method based on a 3D point cloud. Three-dimensional reconstruction is performed on an acquired image sequence of a bottled product to generate 3D point cloud data of the curved-surface bottled product. To improve the expressive power of the text features, RGB color features and SWT stroke-width features, which are highly discriminative on bottled products, are fused with the 3D point cloud spatial coordinate features. To apply existing image segmentation techniques to the 3D point cloud, the 3D point cloud data are mapped onto a pseudo image by a graph-drawing technique, and text instance segmentation is then performed on the pseudo image with a U-Net network. The method can accurately detect the curved text on bottled objects commonly found in pharmacies, supermarkets, cosmetic shops and the like, and experimental results demonstrate its accuracy in detecting text on curved-surface bottled products.

Description

Bottled object text detection method based on 3D point cloud
Technical Field
The invention belongs to the technical field of image processing, and relates to a bottled object text detection method based on a 3D point cloud.
Background
With the development of deep learning theory and computer vision technology, text detection in natural scenes is widely applied to automatic navigation, product identification, language translation and the like. Existing scene text detection methods can, to a certain extent, accurately detect text of arbitrary direction, size and shape in natural scenes, but they perform poorly on bottled products commonly found in pharmacies, supermarkets, cosmetic shops and the like. Because existing scene text detection methods cannot accurately detect curved text using only the 2D information of an image, a bottled object text detection method based on a 3D point cloud is needed so that the curved text on bottled objects can be detected effectively.
Disclosure of Invention
The invention aims to provide a bottled object text detection method based on a 3D point cloud, which solves the problem that existing scene text detection methods cannot accurately detect the text on curved bottled objects using only the 2D information of an image, and improves the performance of text detection algorithms on curved bottled products.
The technical scheme adopted by the invention is that the bottled object text detection method based on the 3D point cloud specifically comprises the following steps:
Step 1, define a variable N_obj for the total number of bottled objects, a pseudo-image set variable Pimg and a bottled-object counter variable n_obj; initialize Pimg to the empty set, Pimg = NULL, and set n_obj to 1, i.e. n_obj = 1;
Step 2, for the n_obj-th bottled object obj_{n_obj}, perform multi-view image acquisition to obtain a curved-surface scene image sequence Img = {I_1, ..., I_k, ..., I_K}, where K is the number of acquired multi-view images;
Step 3, apply a 3D point cloud generation method (OpenMVG + PMVS) to the curved-surface scene image sequence Img to generate 3D point cloud data PS_1 = {p_1, ..., p_{N_1}}, where N_1 is the number of 3D points in PS_1; at the same time obtain the projection relation matrix H_k between PS_1 and each image I_k of Img, forming a projection relation matrix set HS = {H_1, ..., H_k, ..., H_K};
Step 4, down-sample the point cloud data PS_1 to obtain sampled point cloud data PS_2 = {p_1, ..., p_{n2}, ..., p_{N_2}} and label samples in PS_2, where N_2 is the number of 3D points in PS_2; the spatial position feature of a point p_{n2} is sp_{n2} = (x_{n2}, y_{n2}, z_{n2}), where x_{n2}, y_{n2} and z_{n2} are the x, y and z coordinate values of the 3D point p_{n2};
Step 5, randomly extract an image I_k from Img and obtain the 2D point set PI_k = {pi_1, ..., pi_{n2}, ..., pi_{N_2}} corresponding to the point cloud data PS_2 in the 2D image I_k according to the projection relation matrix H_k;
Step 6, compute, for each point pi_{n2} of the 2D point set PI_k in image I_k, the RGB color feature rgb_{n2} = (R_{n2}, G_{n2}, B_{n2}) and the stroke width feature sw_{n2}, where R_{n2}, G_{n2} and B_{n2} are the R, G and B channel values of the point pi_{n2};
Step 7, fuse the spatial position feature sp_{n2} of the 3D point p_{n2} with the RGB color feature rgb_{n2} and the stroke width feature sw_{n2} of the 2D point pi_{n2}, generating the fused feature f_{n2} = (x_{n2}, y_{n2}, z_{n2}, R_{n2}, G_{n2}, B_{n2}, sw_{n2}) of the point p_{n2};
Step 8, call the library function spring_layout() of the Networkx package in the Python programming language to map the point cloud data PS_2, i.e. map the points of PS_2 onto a 2D grid pseudo image PImg_{n_obj}, in which the feature of each pixel is the fused feature f_{n2} of the corresponding 3D point; append PImg_{n_obj} to the pseudo-image set Pimg, i.e. Pimg = Pimg ∪ {PImg_{n_obj}};
Step 9, judge whether n_obj is greater than or equal to N_obj; if n_obj ≥ N_obj, go to step 10; otherwise set n_obj = n_obj + 1 and return to step 2;
Step 10, take the pseudo-image set Pimg as input and train a multi-scale U-Net network to obtain the MSUnet network model M_MSUnet;
Step 11, input a bottled object obj', execute step 2 to collect K' multi-view images of obj' and obtain a curved-surface scene image sequence Img' = {I'_1, ..., I'_{k'}, ..., I'_{K'}}; execute step 3, i.e. apply the OpenMVG + PMVS method to Img', to generate the 3D point cloud data PS'_1 of obj' and the projection relation matrix set HS' between PS'_1 and Img', HS' = {H'_1, ..., H'_{k'}, ..., H'_{K'}};
Step 12, take PS'_1, Img' and HS' as input and execute steps 4-8 to obtain a pseudo image PImg';
Step 13, send the pseudo image PImg' into the MSUnet network model M_MSUnet and output all text instance classification results CL = {cl_1, ..., cl_c, ..., cl_C} and text instance classification scores Score = {sc_1, ..., sc_c, ..., sc_C}, where C is the total number of text instances, cl_c = {q_1, ..., q_{nc}, ..., q_{Nc}} with Nc the number of 3D points in class c of CL and q_{nc} the nc-th 3D point of cl_c, and sc_c holds the classification score of each 3D point of cl_c;
Step 14, refine CL according to a refinement-adjustment mechanism to obtain the adjusted point cloud classification result CL' = {cl'_1, ..., cl'_c, ..., cl'_C}, where Nc' is the number of 3D points in class c of CL';
Step 15, define an image number counter k' and initialize k' = 1;
Step 16, according to H'_{k'} in the projection relation matrix set HS', compute the 2D classification point set CP = {cp_1, ..., cp_c, ..., cp_C} corresponding to CL' in the image I'_{k'}, where the 2D point set cp_c is computed as cp_c = H'_{k'} × cl'_c;
Step 17, execute a text filling algorithm: perform text filling on the 2D classification point set CP in the image I'_{k'} to obtain the text instance classification result of the image I'_{k'}, and at the same time output the set Poly_{k'} of all text instance circumscribing polygon boxes;
Step 18, judge whether k' is less than K'; if k' < K', set k' = k' + 1 and return to step 16; otherwise, end the procedure.
The invention is also characterized in that:
The specific process of the step 4 is as follows:
The specific process of the down-sampling in step 4 is as follows: open the point cloud processing software CloudCompare v2.6.3, click the Open file button on the toolbar and load the 3D point cloud data PS_1; click the Delete button on the toolbar and manually remove the irrelevant background points that do not belong to the bottled object from the 3D point cloud data PS_1, obtaining the point cloud data of interest PS' = {p_1, ..., p_{N'_1}}, where N'_1 is the number of 3D points in PS'; click the Clean button on the toolbar, set the filter parameters mean distance and nSigma in its pull-down menu and perform the SOR filtering operation; click the Subsample button on the toolbar, set the spatial sampling distance parameter space and the number of sampling points N_2 in its pull-down menu and perform the point cloud down-sampling operation to obtain the point cloud data PS_2 = {p_1, ..., p_{n2}, ..., p_{N_2}}, where the point p_{n2} is the n2-th sampling point, 1 ≤ n2 ≤ N_2, and its spatial feature is sp_{n2} = (x_{n2}, y_{n2}, z_{n2});
In step 4, the specific process of sample labeling of the 3D point cloud data PS_2 is as follows: click the Segment button on the toolbar of the point cloud processing software CloudCompare v2.6.3 and, in order from top to bottom and from left to right, manually box-select the point cloud of each text instance in the point cloud data PS_2 with the mouse; click the Add constant SF button on the toolbar and add a label value label to the box-selected text instance point cloud data; after all text instances in the point cloud data PS_2 have been box-selected and labeled, click the Merge multiple clouds button on the toolbar and merge all box-selected text instance point cloud data in PS_2, together with the non-text point cloud data on the bottled object, into the labeled point cloud data PS_LA = {PS_0, PS_1, ..., PS_l, ..., PS_L}, where PS_0 is the non-text point cloud data on the bottled object, PS_l is the l-th text instance point cloud data, L is the total number of text instances in the 3D point cloud data PS_2, and the label value of PS_l is label = l.
The specific process of step 5 is as follows: randomly extract an image I_k from the image sequence Img and, according to the matrix H_k corresponding to image I_k in the projection relation matrix set HS, compute the 2D point set PI_k = {pi_1, ..., pi_{n2}, ..., pi_{N_2}} corresponding to the point cloud data PS_2 in the image I_k. The calculation formula is

d · (u_{n2}, v_{n2}, 1)^T = H_k · (x_{n2}, y_{n2}, z_{n2}, 1)^T

where (u_{n2}, v_{n2}) are the pixel coordinates of the 2D point pi_{n2} and d is the distance from the 3D point p_{n2} to the camera.
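As an illustration of the projection in step 5, the following sketch assumes H_k is the 3 × 4 projection matrix returned by the reconstruction stage and that the homogeneous coordinate is divided out to obtain pixel coordinates; the array and function names are illustrative only:

import numpy as np

def project_points(ps2_xyz, H_k):
    """Project N2 x 3 world points into image I_k with the 3 x 4 matrix H_k."""
    n2 = ps2_xyz.shape[0]
    homog = np.hstack([ps2_xyz, np.ones((n2, 1))])   # N2 x 4 homogeneous points
    proj = (H_k @ homog.T).T                         # N2 x 3, rows are (d*u, d*v, d)
    d = proj[:, 2:3]                                 # distance of each 3D point to the camera
    uv = proj[:, :2] / d                             # pixel coordinates (u, v)
    return uv, d.ravel()

# usage sketch: PI_k, depth = project_points(PS2[:, :3], HS[k])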
The specific steps of the step 6 are as follows:
Step 6.1, for each 2D point pi_{n2} in image I_k, call the image library function of the PIL package in the Python programming language to extract the R, G, B channel values at pixel (u_{n2}, v_{n2}) as the RGB color feature rgb_{n2} = (R_{n2}, G_{n2}, B_{n2}) of the point, where u_{n2} and v_{n2} are the abscissa and ordinate of the 2D point pi_{n2};
Step 6.2, call the stroke width transform library function swttransform() of the SWTloc package in the Python programming language on the image I_k to obtain the stroke width values of all pixels in I_k; the stroke width value at the 2D point pi_{n2} is its SWT stroke width feature sw_{n2};
The specific process of step 7 is as follows: for any point p_{n2} in the 3D point cloud data PS_2, concatenate its coordinate feature sp_{n2} = (x_{n2}, y_{n2}, z_{n2}), RGB color feature rgb_{n2} = (R_{n2}, G_{n2}, B_{n2}) and SWT stroke width feature sw_{n2} column-wise to obtain the fused feature f_{n2} = (x_{n2}, y_{n2}, z_{n2}, R_{n2}, G_{n2}, B_{n2}, sw_{n2}) of the point p_{n2};
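The column-wise serial fusion of step 7 amounts to stacking the 3D coordinates, RGB values and stroke widths into one 7-dimensional vector per point; a minimal sketch with assumed array shapes:

import numpy as np

# assumed shapes: sp (N2, 3) spatial coordinates, rgb (N2, 3) colour values, sw (N2, 1) stroke widths
def fuse_features(sp, rgb, sw):
    """Serially fuse the per-point features into an N2 x 7 matrix (x, y, z, R, G, B, sw)."""
    return np.hstack([sp, rgb, sw])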
The specific steps of the step 8 are as follows:
Step 8.1, take the 3D point cloud data PS_2 as input, set the number of clusters NL and the number of clustering iterations IT, call the clustering library function KMeans() of the Scikit-learn package in the Python programming language, and perform initial clustering of the point cloud data PS_2 to obtain NL initial cluster centers Cen' = {ce'_1, ..., ce'_nl, ..., ce'_NL} and the distance matrix Dist from each point to each cluster center;
Step 8.2, take PS_2, Dist, N_2 and NL as input and refine the initial clustering result with a graph_cut algorithm, obtaining the divided cluster center coordinate set Cen = {ce_1, ..., ce_nl, ..., ce_NL} and the point set Cint_nl = {ci_1, ..., ci_knl, ..., ci_Knl} within the nl-th class, where ce_nl is the center coordinate of the nl-th class, ci_knl is the knl-th point of the nl-th class and Knl is the number of points in Cint_nl;
Step 8.3, call the distance function pdist() of the Scipy package in the Python programming language to compute the Euclidean distances {Dis_1, ..., Dis_{NL×NL}} between the cluster center coordinates in Cen, and call the squareform() function of the Scipy package to convert {Dis_1, ..., Dis_{NL×NL}} into matrix form, obtaining the cluster-center distance matrix Dcc_{NL×NL};
Step 8.4, take Dcc_{NL×NL} as input, construct an undirected graph G_c and perform first-level graph drawing on G_c to generate a first-level 2D grid map Grid_4 of size Wg × Wg, Grid_4 = {g_1, ..., g_nl, ..., g_NL}, where g_nl = (gx_nl, gy_nl) is the nl-th grid point of Grid_4 and gx_nl and gy_nl are its abscissa and ordinate in the 2D grid map Grid_4;
Step 8.5, call the distance function pdist() of the Scipy package in the Python programming language to compute the Euclidean distances between the points of the point set Cint_nl within the nl-th class, and call the squareform() function of the Scipy package to convert them into matrix form, obtaining the intra-cluster point distance matrix Dcc'_nl;
Step 8.6, take Dcc'_nl as input and, following the method of step 8.4, generate a second-level 2D grid map Grid'_nl of size Wg × Wg, Grid'_nl = {g'_1, ..., g'_inl, ..., g'_Inl}, where g'_inl is the inl-th grid point of the nl-th class and Inl is the number of points in the nl-th class;
Step 8.7, call the OpenCV library function cv2.resize() to enlarge each grid point g_nl of Grid_4 into a block Block_nl of size Wg × Wg, and assign the enlarged Grid_4 to Grid_5; Grid_5 consists of Wg × Wg blocks, each of size Wg × Wg;
Step 8.8, embed the second-level 2D grid map Grid'_nl of the nl-th class into the corresponding block Block_nl of Grid_5 in order; the resulting Grid_5 is the 2D pseudo image PImg_{n_obj} of the n_obj-th bottled object, i.e. PImg_{n_obj} = Grid_5.
The specific steps of step 10 are as follows:
Step 10.1, designing MSUnet a network structure;
Step 10.2, define the loss function of the MSUnet network multi-classification task:

L = -(1/N) Σ_{i=1}^{N} Σ_{c=1}^{C} y_ic · log(p_ic)

where N is the number of training samples, C is the number of classes, y_ic is the sample class indicator (y_ic = 1 if the class of the i-th sample is c, otherwise y_ic = 0) and p_ic is the probability that the i-th sample is predicted to be of class c;
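The multi-class cross-entropy above can be evaluated directly; a small sketch, assuming p is an N × C matrix of predicted class probabilities and y holds the integer class label of each sample:

import numpy as np

def multiclass_cross_entropy(p, y, eps=1e-12):
    """L = -(1/N) * sum_i sum_c y_ic * log(p_ic), with one-hot y built from integer labels."""
    n, c = p.shape
    y_onehot = np.eye(c)[y]                  # y_ic = 1 iff sample i belongs to class c
    return -np.sum(y_onehot * np.log(p + eps)) / n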
Step 10.3, label the MSUnet network samples: the label of each non-blank pixel of the 2D grid pseudo image PImg_{n_obj} is the class label value label of the corresponding point p_{n2} in the point cloud data PS_2, and the label value of a pixel at a blank position is label = 0;
Step 10.4, training MSUnet the network.
The specific steps of step 15 are as follows:
Step 15.1, define an adjusted text instance classification result CL' and a text instance classification result counter c; initialize CL' to the empty set, CL' = NULL, and initialize c to 1, c = 1;
Step 15.2, take the c-th classification result cl_c = {x_1, ..., x_i, ..., x_NP} out of the text instance classification result CL = {cl_1, ..., cl_c, ..., cl_C}, where x_i is the i-th point in cl_c and NP is the total number of points in cl_c;
Step 15.3, for each point x_i in cl_c, compute the distance from x_i to the other points in cl_c:
d_ij = ||x_i - x_j||_2, x_i ∈ cl_c, x_j ∈ cl_c, i ≠ j
Step 15.4, set the hyper-parameter km, select the km smallest distances from x_i to the other points to form a set D_i, and compute the mean d_i of all elements of D_i; all the d_i form the set {d_1, ..., d_i, ..., d_NP};
Step 15.5, compute the mean value mean and the standard deviation stddev of {d_1, ..., d_i, ..., d_NP};
Step 15.6, set the hyper-parameter λ and compute the threshold:
thre = mean + λ × stddev
Step 15.7, judge whether x_i is an outlier: if d_i > thre, x_i is an outlier and is removed from the set cl_c; after all outliers have been removed, assign cl_c to cl'_c;
Step 15.8, judge whether c is greater than C; if c > C, keep CL'; otherwise set c = c + 1, add cl'_c to CL', i.e. CL' = CL' + cl'_c, and return to step 15.2.
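A sketch of the refinement mechanism of steps 15.2 to 15.7 for a single class cl_c, assuming cl_c is an NP × 3 array of 3D points; km and lam stand for the hyper-parameters km and λ:

import numpy as np
from scipy.spatial.distance import cdist

def refine_class(cl_c, km=200, lam=1.5):
    """Remove outliers whose mean distance to their km nearest neighbours exceeds mean + lam * stddev."""
    dists = cdist(cl_c, cl_c)                  # d_ij = ||x_i - x_j||_2
    np.fill_diagonal(dists, np.inf)            # exclude i == j
    nearest = np.sort(dists, axis=1)[:, :km]   # km smallest distances per point
    d = nearest.mean(axis=1)                   # d_i
    thre = d.mean() + lam * d.std()            # thre = mean + lambda * stddev
    return cl_c[d <= thre]                     # keep inliers only, giving cl'_c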
The specific steps of step 17 are as follows:
Step 17.1, input the image I'_{k'}, the classification point set CP = {cp_1, ..., cp_c, ..., cp_C} and the text score set Score = {sc_1, ..., sc_c, ..., sc_C} of the points corresponding to CP, and define a threshold variable T, where cp_c = {cq_1, ..., cq_{nz_c}, ..., cq_{NZ_c}}, cq_{nz_c} is the nz_c-th point of cp_c, sc_c holds the score of each point of cp_c and NZ_c is the number of points in cp_c;
Step 17.2, filter out the points of CP with a low text score according to the threshold T: if the score of a point cq_{nz_c} is less than T, delete the point cq_{nz_c} from cp_c; assign the CP obtained after filtering out all points with a low text score to the variable TR, TR = {tr_1, ..., tr_c, ..., tr_C}, where tr_c = {tq_1, ..., tq_{ntr_c}, ..., tq_{NTR_c}}, tq_{ntr_c} is the ntr_c-th point of tr_c and NTR_c is the number of points in tr_c;
Step 17.3, compute the center point of each class in TR to form the center point set TC, TC = {cen_1, ..., cen_c, ..., cen_C}, where cen_c = mean(tr_c) and mean() is the mean function;
Step 17.4, define the set Poly_{k'} of text polygon bounding boxes in the image I'_{k'} and initialize it to the empty set, Poly_{k'} = NULL; define an image B_k of the same size as the image I'_{k'} with all its pixels assigned the value 0; initialize the text instance class counter c to 1, c = 1;
Step 17.5, with cen_c as the initial seed point, call the flood fill library function floodFill() of OpenCV to fill tr_c and obtain the filled point set trfill_c, and assign the value 1 to the pixels of B_k corresponding to the points of trfill_c;
Step 17.6, call the OpenCV library function cv2.morphologyEx() to apply 5 opening operations to B_k to obtain the image Mopen, and call the OpenCV library function cv2.dilate() to apply 10 dilation operations to the image Mopen to obtain the image Mdilate;
Step 17.7, call the OpenCV library function cv2.connectedComponentsWithStats() to obtain the connected region ConectedRegion of the image Mdilate, call the OpenCV library function cv2.findContours() to obtain the contour ContourPS of the connected region ConectedRegion, and call the OpenCV library function cv2.convexHull() to obtain the convex hull vertex set of the contour ContourPS, i.e. the polygon vertex set CNT_c = {pt_1, ..., pt_npl, ..., pt_NPL} of the c-th text instance, which constitutes the circumscribing polygon box pbox_c of the c-th text instance in image I'_{k'}; NPL is the number of vertices in CNT_c, and each vertex of CNT_c is drawn in the image I'_{k'};
Step 17.8, judge whether c ≤ C; if c ≤ C, set c = c + 1, add pbox_c to Poly_{k'}, i.e. Poly_{k'} = Poly_{k'} ∪ {pbox_c}, and return to step 17.5; otherwise go to step 17.9;
Step 17.9, output the set Poly_{k'} of all text polygon bounding boxes in the image I'_{k'}, Poly_{k'} = {pbox_1, ..., pbox_c, ..., pbox_C}.
The beneficial effects of the invention are as follows: existing scene text detection methods can, to a certain extent, accurately detect text of arbitrary direction, size and shape in natural scenes, but they detect the curved text on bottled objects commonly found in pharmacies, supermarkets, cosmetic shops and the like poorly; by introducing 3D point cloud information and fusing color and stroke width features with the spatial coordinate features, the method of the invention detects such curved text accurately.
Drawings
FIG. 1 is a schematic flow chart of a bottled object text detection method based on a 3D point cloud;
FIG. 2 is a schematic diagram of a 2D pseudo image generation flow based on drawing in a bottled object text detection method based on 3D point cloud;
FIG. 3 is a schematic diagram of a process of embedding a secondary grid pattern into a primary grid pattern in the drawing of the method for detecting the characters of the bottled object based on the 3D point cloud;
Fig. 4 is a schematic diagram of a network structure of MSUnet in a method for detecting characters of a bottled object based on a 3D point cloud according to the present invention;
FIG. 5 is a schematic diagram of a MSUnet network training process in a 3D point cloud-based bottled object text detection method according to the present invention;
FIG. 6 is a schematic diagram of a 3D point cloud refinement adjustment flow in a bottled object text detection method based on a 3D point cloud;
FIG. 7 is a schematic flow chart of a text filling algorithm in a bottled object text detection method based on 3D point cloud;
FIG. 8 is an image of a bottled product in an embodiment of a 3D point cloud based bottled object text detection method of the present invention;
FIG. 9 is an image of another bottled product in an embodiment of a 3D point cloud based bottled object text detection method of the present invention;
FIG. 10 is a diagram showing the results of text detection in an image of the bottled object of FIG. 8 using the method of the present invention, with white boxes being text boxes;
Fig. 11 shows a display of the result of text detection in an image of the bottled object of fig. 9 using the method of the present invention, with white boxes being text boxes.
Detailed Description
The invention will be described in detail below with reference to the drawings and the detailed description.
The invention discloses a bottled object text detection method based on a 3D point cloud, which specifically comprises the following steps:
Step 1, define a variable N_obj for the total number of bottled objects, a pseudo-image set variable Pimg and a bottled-object counter variable n_obj; initialize Pimg to the empty set, Pimg = NULL, and set n_obj to 1, i.e. n_obj = 1;
Step 2, for the n_obj-th bottled object obj_{n_obj}, perform multi-view image acquisition to obtain a curved-surface scene image sequence Img = {I_1, ..., I_k, ..., I_K}, where K is the number of acquired multi-view images; K = 90 in the embodiment of the invention;
Step 3, apply to the curved-surface scene image sequence Img the 3D point cloud generation method OpenMVG + PMVS proposed by Moulon P., Monasse P. et al in the paper "OpenMVG: Open multiple view geometry" at the 2016 International Workshop on Reproducible Research in Pattern Recognition (IWRRPR), generating 3D point cloud data PS_1 = {p_1, ..., p_{N_1}}, where N_1 is the number of 3D points in PS_1; at the same time obtain the projection relation matrix H_k between PS_1 and each image I_k of Img, forming the projection relation matrix set HS = {H_1, ..., H_k, ..., H_K};
The specific process of the down-sampling in step 4 is as follows: open the point cloud processing software CloudCompare v2.6.3, click the Open file button on the toolbar and load the 3D point cloud data PS_1; click the Delete button on the toolbar and manually remove the irrelevant background points that do not belong to the bottled object from the 3D point cloud data PS_1, obtaining the point cloud data of interest PS' = {p_1, ..., p_{N'_1}}, where N'_1 is the number of 3D points in PS'; click the Clean button on the toolbar, set the filter parameters mean distance and nSigma in its pull-down menu and perform the SOR filtering operation, with mean distance = 8 and nSigma = 1.5 in the embodiment of the invention; click the Subsample button on the toolbar, set the spatial sampling distance parameter space and the number of sampling points N_2 in its pull-down menu and perform the point cloud down-sampling operation to obtain the point cloud data PS_2 = {p_1, ..., p_{n2}, ..., p_{N_2}}, where the point p_{n2} is the n2-th sampling point, 1 ≤ n2 ≤ N_2, and its spatial feature is sp_{n2} = (x_{n2}, y_{n2}, z_{n2}); space = 1.585 and N_2 = 8192 in the embodiment of the invention;
In step 4, the specific process of sample labeling of the 3D point cloud data PS_2 is as follows: click the Segment button on the toolbar of the point cloud processing software CloudCompare v2.6.3 and, in order from top to bottom and from left to right, manually box-select the point cloud of each text instance in the point cloud data PS_2 with the mouse; click the Add constant SF button on the toolbar and add a label value label to the box-selected text instance point cloud data; after all text instances in the point cloud data PS_2 have been box-selected and labeled, click the Merge multiple clouds button on the toolbar and merge all box-selected text instance point cloud data in PS_2, together with the non-text point cloud data on the bottled object, into the labeled point cloud data PS_LA = {PS_0, PS_1, ..., PS_l, ..., PS_L}, where PS_0 is the non-text point cloud data on the bottled object, PS_l is the l-th text instance point cloud data, L is the total number of text instances in the 3D point cloud data PS_2, and the label value of PS_l is label = l;
The specific process of step 5 is as follows: randomly extract an image I_k (1 ≤ k ≤ K) from the image sequence Img and, according to the matrix H_k corresponding to image I_k in the projection relation matrix set HS, compute the 2D point set PI_k = {pi_1, ..., pi_{n2}, ..., pi_{N_2}} corresponding to the point cloud data PS_2 in the image I_k. The calculation formula is

d · (u_{n2}, v_{n2}, 1)^T = H_k · (x_{n2}, y_{n2}, z_{n2}, 1)^T

where (u_{n2}, v_{n2}) are the pixel coordinates of the 2D point pi_{n2} and d is the distance from the 3D point p_{n2} to the camera;
The specific steps of the step 6 are as follows:
Step 6.1, for each 2D point pi_{n2} in image I_k, call the image library function of the PIL package in the Python programming language to extract the R, G, B channel values at pixel (u_{n2}, v_{n2}) as the RGB color feature rgb_{n2} = (R_{n2}, G_{n2}, B_{n2}) of the point, where u_{n2} and v_{n2} are the abscissa and ordinate of the 2D point pi_{n2};
Step 6.2, obtain the stroke width values of all pixels in image I_k using the stroke-width-transform-based text detection method proposed by B. Epshtein, E. Ofek et al in the paper "Detecting text in natural scenes with stroke width transform" at the 2010 Computer Society Conference on Computer Vision and Pattern Recognition (CSCCVPR); the stroke width value at the 2D point pi_{n2} is its SWT stroke width feature sw_{n2};
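A sketch of the per-point feature extraction of step 6, using PIL for the colour values; the exact SWTloc call is not reproduced here, so swt_map is assumed to be a per-pixel stroke-width array already obtained from the stroke width transform, and the function and variable names are illustrative:

import numpy as np
from PIL import Image

def point_features(image_path, pi_k, swt_map):
    """Return (N2, 3) RGB features and (N2, 1) stroke-width features for the 2D points PI_k.

    pi_k    : (N2, 2) array of (u, v) pixel coordinates in image I_k
    swt_map : (H, W) array of per-pixel stroke widths from the stroke width transform
    """
    img = Image.open(image_path).convert("RGB")
    rgb = np.array([img.getpixel((int(u), int(v))) for u, v in pi_k], dtype=np.float32)
    sw = np.array([[swt_map[int(v), int(u)]] for u, v in pi_k], dtype=np.float32)
    return rgb, sw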
The specific process of step 7 is as follows: for any point p_{n2} in the 3D point cloud data PS_2, concatenate its coordinate feature sp_{n2} = (x_{n2}, y_{n2}, z_{n2}), RGB color feature rgb_{n2} = (R_{n2}, G_{n2}, B_{n2}) and SWT stroke width feature sw_{n2} column-wise to obtain the fused feature f_{n2} = (x_{n2}, y_{n2}, z_{n2}, R_{n2}, G_{n2}, B_{n2}, sw_{n2}) of the point p_{n2};
The drawing process of the step 8 is shown in fig. 2, and the specific implementation steps are as follows:
Step 8.1, take the 3D point cloud data PS_2 as input, set the number of clusters NL and the number of clustering iterations IT, call the clustering library function KMeans() of the Scikit-learn package in the Python programming language, and perform initial clustering of the point cloud data PS_2 to obtain NL initial cluster centers Cen' = {ce'_1, ..., ce'_nl, ..., ce'_NL} and the distance matrix Dist from each point to each cluster center; NL = 128 and IT = 100 in the embodiment of the invention;
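A sketch of the initial clustering of step 8.1 with Scikit-learn, using the embodiment values NL = 128 and IT = 100 and assuming the clustering is run on the point coordinates; KMeans.transform() yields the point-to-center distance matrix Dist:

from sklearn.cluster import KMeans

def initial_clustering(ps2_xyz, NL=128, IT=100):
    """K-means clustering of the sampled point cloud; returns centers Cen' and the distance matrix Dist."""
    km = KMeans(n_clusters=NL, max_iter=IT, n_init=10, random_state=0).fit(ps2_xyz)
    cen = km.cluster_centers_        # NL initial cluster centers Cen'
    dist = km.transform(ps2_xyz)     # N2 x NL distances from each point to each cluster center
    return cen, dist, km.labels_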
Step 8.2, take PS_2, Dist, N_2 and NL as input and refine the initial clustering result with a graph_cut algorithm, obtaining the divided cluster center coordinate set Cen = {ce_1, ..., ce_nl, ..., ce_NL} and the point set Cint_nl = {ci_1, ..., ci_knl, ..., ci_Knl} within the nl-th class, where ce_nl is the center coordinate of the nl-th class, ci_knl is the knl-th point of the nl-th class, Knl is the number of points in Cint_nl, 1 ≤ nl ≤ NL and knl ≤ Knl;
Step 8.3, call the distance function pdist() of the Scipy package in the Python programming language to compute the Euclidean distances {Dis_1, ..., Dis_{NL×NL}} between the cluster center coordinates in Cen, and call the squareform() function of the Scipy package to convert {Dis_1, ..., Dis_{NL×NL}} into matrix form, obtaining the cluster-center distance matrix Dcc_{NL×NL};
Step 8.4, take Dcc_{NL×NL} as input, construct an undirected graph G_c and perform first-level graph drawing on G_c to generate a first-level 2D grid map Grid_4 of size Wg × Wg, Grid_4 = {g_1, ..., g_nl, ..., g_NL}, where g_nl = (gx_nl, gy_nl) is the nl-th grid point of Grid_4 and gx_nl and gy_nl are its abscissa and ordinate in the 2D grid map Grid_4;
Step 8.4.1, take Dcc_{NL×NL} as input and call the graph construction function from_numpy_matrix() of the Networkx library in the Python programming language to construct an undirected graph G_c = (V, E), where V represents the cluster center coordinate set Cen and E represents the distance matrix Dcc_{NL×NL} between the vertices;
Step 8.4.2, take the graph G_c as input and call the graph drawing function spring_layout() of the Networkx library in the Python programming language to obtain the 2D grid map Grid_1 of the graph G_c, Grid_1 = {a_1, ..., a_nl, ..., a_NL}, where a_nl = (ax_nl, ay_nl) is the nl-th grid point of Grid_1 and ax_nl and ay_nl are its abscissa and ordinate in the 2D grid map Grid_1;
Step 8.4.3, scale the 2D grid map Grid_1 by the scale factor Scale into the grid map Grid_2 of size Wg × Wg, Grid_2 = {b_1, ..., b_nl, ..., b_NL}, where b_nl = (bx_nl, by_nl) is the nl-th grid point of Grid_2 and bx_nl and by_nl are its abscissa and ordinate in the 2D grid map Grid_2. The scale factor Scale is computed as follows: compute the Euclidean distance Disg_ij between every pair of grid points a_i and a_j of Grid_1 and let DisG = {Disg_ij} be the Euclidean distance matrix between the grid points of Grid_1; then scale1 = 1 / min(DisG), scalex = (Wg - 2) / (max(Grid_1.x) - min(Grid_1.x)), scaley = (Wg - 2) / (max(Grid_1.y) - min(Grid_1.y)) and Scale = min(scale1, scalex, scaley), where max() and min() are the maximum and minimum functions; Wg = 16 in the embodiment of the invention;
Step 8.4.4, round the coordinates of the grid points of the 2D grid map Grid_2 to obtain the grid map Grid_3 with integer coordinates, i.e. Grid_3 = {(int(bx_nl), int(by_nl))}, where int() is the rounding function;
Step 8.4.5, adjust the positions of the grid points of Grid_3 whose coordinates coincide to generate the final grid map Grid_4; specifically, if two grid points have coincident coordinates, take one of them out and place it on an unassigned neighbouring grid point;
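A sketch of the first-level graph drawing of steps 8.4.1 to 8.4.5 with Networkx, assuming dcc is the NL × NL cluster-center distance matrix; from_numpy_array() is the name used by recent Networkx versions (older versions call it from_numpy_matrix()), and coinciding cells after rounding are pushed to the next free cell in row-major order:

import numpy as np
import networkx as nx
from scipy.spatial.distance import pdist

def draw_level1_grid(dcc, wg=16, seed=0):
    """Map NL cluster centers onto a wg x wg integer grid via a spring layout."""
    g = nx.from_numpy_array(dcc)                  # undirected graph weighted by center distances
    pos = nx.spring_layout(g, seed=seed)          # Grid_1: 2D layout coordinate per node
    xy = np.array([pos[i] for i in range(len(pos))])
    scale1 = 1.0 / pdist(xy).min()
    scalex = (wg - 2) / (xy[:, 0].max() - xy[:, 0].min())
    scaley = (wg - 2) / (xy[:, 1].max() - xy[:, 1].min())
    xy = xy * min(scale1, scalex, scaley)         # Grid_2: scaled layout
    xy = xy - xy.min(axis=0)                      # shift into the positive quadrant
    grid = np.floor(xy).astype(int)               # Grid_3: integer coordinates
    taken = set()
    for i, (gx, gy) in enumerate(grid):           # Grid_4: resolve coinciding cells
        while (gx, gy) in taken:                  # occupied: advance to the next cell, row-major
            gx = (gx + 1) % wg
            if gx == 0:
                gy = (gy + 1) % wg
        grid[i] = gx, gy
        taken.add((gx, gy))
    return grid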
Step 8.5, call the distance function pdist() of the Scipy package in the Python programming language to compute the Euclidean distances between the points of the point set Cint_nl within the nl-th class, and call the squareform() function of the Scipy package to convert them into matrix form, obtaining the intra-cluster point distance matrix Dcc'_nl;
Step 8.6, take Dcc'_nl as input and, following the method of step 8.4, generate a second-level 2D grid map Grid'_nl of size Wg × Wg, Grid'_nl = {g'_1, ..., g'_inl, ..., g'_Inl}, where g'_inl is the inl-th grid point of the nl-th class and Inl is the number of points in the nl-th class;
Step 8.7, call the OpenCV library function cv2.resize() to enlarge each grid point g_nl of Grid_4 into a block Block_nl of size Wg × Wg, and assign the enlarged Grid_4 to Grid_5; Grid_5 consists of Wg × Wg blocks, each of size Wg × Wg;
Step 8.8, embed the second-level 2D grid map Grid'_nl of the nl-th class into the corresponding block Block_nl of Grid_5 in order; the resulting Grid_5 is the 2D pseudo image PImg_{n_obj} of the n_obj-th bottled object, i.e. PImg_{n_obj} = Grid_5;
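A sketch of the two-level embedding of steps 8.7 and 8.8, assuming level1[nl] holds the (gx, gy) cell of cluster nl in Grid_4, level2[nl] the intra-cluster grid cells, and feats[nl] the fused 7-dimensional features of the cluster's points, so that with Wg = 16 the pseudo image has size 256 × 256 × 7:

import numpy as np

def build_pseudo_image(level1, level2, feats, wg=16, dim=7):
    """Embed each cluster's second-level grid into its block of Grid_5 to form the 2D pseudo image."""
    pimg = np.zeros((wg * wg, wg * wg, dim), dtype=np.float32)
    for nl, (gx, gy) in enumerate(level1):             # block position of cluster nl in Grid_4
        for (sx, sy), f in zip(level2[nl], feats[nl]):  # cell of each point inside the block
            pimg[gx * wg + sx, gy * wg + sy, :] = f     # pixel feature = fused feature of the 3D point
    return pimg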
Step 9, judge whether n_obj is greater than or equal to N_obj; if n_obj ≥ N_obj, go to step 10; otherwise set n_obj = n_obj + 1 and return to step 2;
The specific steps for training using a Multi-scale U-Net network (MSUnet) in step 10 are as follows:
Step 10.1, design the MSUnet network structure, as shown in Fig. 4: the MSUnet network structure has 18 layers in total, comprising 1 input layer, 5 convolution layers, 6 concatenation layers, 2 max pooling layers, 2 up-sampling layers, 1 fully connected layer and 1 output layer; the specific connection order of the MSUnet network structure is: input layer - convolution layer 1 - concatenation layer 1 - max pooling layer 1 - convolution layer 2 - concatenation layer 2 - max pooling layer 2 - fully connected layer - up-sampling layer 1 - concatenation layer 3 - convolution layer 3 - concatenation layer 4 - up-sampling layer 2 - concatenation layer 5 - convolution layer 4 - concatenation layer 6 - convolution layer 5 - output layer; the categories and numbers of the network layers of the MSUnet network structure are listed in Table 1, where S is the category of a network layer and n is the number of layers of that category;
Table 1. Categories and numbers of network layers in the MSUnet network structure
The input layer is the 2D grid pseudo image PImg_{n_obj} generated in step 8; the pseudo image is of size 256 × 256 × 7;
Convolution layer 1 extracts features from the pseudo image in parallel with convolution kernels of size 1 × 1 and 3 × 3 and outputs two feature maps of size 256 × 256 × 64;
Concatenation layer 1 splices the two feature maps from convolution layer 1 together and outputs a feature map of size 256 × 256 × 128;
Max pooling layer 1 spatially down-samples the features from concatenation layer 1 and outputs a feature map of size 16 × 16 × 128;
Convolution layer 2 extracts features from max pooling layer 1 in parallel with convolution kernels of size 1 × 1 and 3 × 3 and outputs two feature maps of size 16 × 16 × 128;
Concatenation layer 2 splices the two feature maps from convolution layer 2 together and outputs a feature map of size 16 × 16 × 256;
Max pooling layer 2 spatially down-samples the features from concatenation layer 2 and outputs a feature map of size 1 × 1 × 256;
In the implementation, the activation function used by the fully connected layer and by each convolution layer is the rectified linear unit (Rectified Linear Unit, ReLU); ReLU() is a piecewise linear function that sets all negative values to 0 and leaves positive values unchanged, and is an open-source activation function commonly used in artificial neural networks;
The input of the fully connected layer is the output feature of max pooling layer 2, and it outputs a feature map of size 1 × 1 × 256;
Up-sampling layer 1 linearly interpolates the feature map from the fully connected layer and outputs a feature map of size 16 × 16 × 256;
Concatenation layer 3 splices the feature map from concatenation layer 2 with the feature map of up-sampling layer 1 and outputs a feature map of size 16 × 16 × 512;
Convolution layer 3 extracts features from concatenation layer 3 in parallel with convolution kernels of size 1 × 1 and 3 × 3 and outputs two feature maps of size 16 × 16 × 128;
Concatenation layer 4 splices the two feature maps from convolution layer 3 together and outputs a feature map of size 16 × 16 × 256;
Up-sampling layer 2 linearly interpolates the feature map from concatenation layer 4 and outputs a feature map of size 256 × 256 × 256;
Concatenation layer 5 splices the feature map from concatenation layer 1 with the feature map of up-sampling layer 2 and outputs a feature map of size 256 × 256 × 384;
Convolution layer 4 extracts features from concatenation layer 5 in parallel with convolution kernels of size 1 × 1 and 3 × 3 and outputs two feature maps of size 256 × 256 × 64;
Concatenation layer 6 splices the two feature maps from convolution layer 4 together and outputs a feature map of size 256 × 256 × 128;
Convolution layer 5 performs a convolution with a kernel of size 1 × 1 and outputs a feature map of size 256 × 256 × 50;
The output layer activates the feature map from convolution layer 5 with a softmax activation function and outputs a feature map of size 256 × 256 × 50;
The feature map sizes of the input and output of each network layer of the MSUnet network structure are listed in Table 2:
Table 2. Network layer parameters of MSUnet
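A sketch of the basic MSUnet building block described above, i.e. parallel 1 × 1 and 3 × 3 convolutions followed by channel-wise concatenation, written in PyTorch under the assumption of ReLU activation and 'same' padding; the full 18-layer network is not reproduced:

import torch
import torch.nn as nn

class MultiScaleConv(nn.Module):
    """Parallel 1x1 and 3x3 convolutions whose outputs are concatenated along the channel axis."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.conv3 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return torch.cat([self.act(self.conv1(x)), self.act(self.conv3(x))], dim=1)

# e.g. convolution layer 1 + concatenation layer 1: 7 input channels -> two 64-channel maps -> 128 channels
block1 = MultiScaleConv(7, 64)
out = block1(torch.randn(1, 7, 256, 256))   # out.shape == (1, 128, 256, 256)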
Step 10.2, define the loss function of the MSUnet network multi-classification task as

L = -(1/N) Σ_{i=1}^{N} Σ_{c=1}^{C} y_ic · log(p_ic)

where N is the number of training samples, C is the number of classes, y_ic is the sample class indicator (y_ic = 1 if the class of the i-th sample is c, otherwise y_ic = 0) and p_ic is the probability that the i-th sample is predicted to be of class c;
Step 10.3, label the MSUnet network samples: the label of each non-blank pixel of the 2D grid pseudo image PImg_{n_obj} is the class label value label (1 ≤ label ≤ C) of the corresponding point p_{n2} in the point cloud data PS_2, and the label value of a pixel at a blank position is label = 0;
Step 10.4, MSUnet network training is shown in FIG. 5, and the specific steps are as follows:
Step 10.4.1, inputting a pseudo-image set Pimg;
Step 10.4.2, set the MSUnet network model training parameters: the learning rate variable lr, the total number of training iterations variable epoch, the batch size variable batch and the training iteration counter variable step; the specific settings used in the implementation are listed in Table 3:
Table 3. MSUnet network model training parameter settings

Parameter | Description                                                  | Value
lr        | learning rate                                                | 0.0001
display   | number of iterations between displays of the loss function  | 20
batch     | size of each data batch                                      | 4
epoch     | total number of training iterations                          | 200
step      | initial value of the training iteration counter              | 1
Step 10.4.3, randomly extract batch pseudo images from the pseudo image set Pimg each time and feed them into the MSUnet network for training;
Step 10.4.4, compute the absolute difference Dif of the loss function L between two successive iterations of MSUnet network training; if (Dif < Th_1) | (step > epoch), where Th_1 = 0.002 in the embodiment of the invention, the model has converged, so save the MSUnet network model M_MSUnet and end the training; otherwise set step = step + 1, use the Adam optimizer proposed by Diederik P. Kingma, Jimmy Ba et al in the paper "Adam: A method for stochastic optimization" at the 2015 International Conference on Learning Representations (ICLR) to back-propagate and correct the weight coefficients of each network layer of the training model, and return to step 10.4.3;
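A sketch of the training loop of step 10.4 with the Adam optimizer, using the Table 3 values (lr = 0.0001, epoch = 200, Th_1 = 0.002) and assuming a PyTorch model and a DataLoader over the pseudo-image set already exist; msunet, loader and the tensor shapes are illustrative assumptions, and the network is assumed to return raw logits:

import torch
import torch.nn as nn

def train_msunet(msunet, loader, lr=1e-4, max_steps=200, th1=0.002, device="cpu"):
    """Train until the loss change between iterations falls below th1 or the step budget is used up."""
    msunet.to(device).train()
    optimizer = torch.optim.Adam(msunet.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()              # multi-class cross entropy on raw logits
    prev_loss = None
    for step in range(1, max_steps + 1):
        for pimg, labels in loader:                # pimg: (batch, 7, 256, 256), labels: (batch, 256, 256)
            optimizer.zero_grad()
            loss = criterion(msunet(pimg.to(device)), labels.to(device))
            loss.backward()                        # back-propagate and correct the layer weights
            optimizer.step()
        if prev_loss is not None and abs(prev_loss - loss.item()) < th1:
            break                                  # converged: Dif < Th_1
        prev_loss = loss.item()
    torch.save(msunet.state_dict(), "M_MSUnet.pth")   # save the trained model M_MSUnet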
step 11, defining an image number counter k ', and initializing k' =1;
Step 12, input a bottled object obj' that is common in daily life scenes, perform the multi-view image acquisition of step 2 on obj' to obtain a curved-surface scene image sequence Img' = {I'_1, ..., I'_{k'}, ..., I'_{K'}}, and perform the OpenMVG + PMVS method of step 3 on Img' to generate the 3D point cloud data PS'_1 and the projection relation matrix set HS' = {H'_1, ..., H'_{k'}, ..., H'_{K'}}; K' = 90 in the embodiment of the invention;
Step 13, execute steps 4-8 on PS'_1 to obtain a pseudo image PImg';
Step 14, send PImg' into the MSUnet network model M_MSUnet and output all text instance classification results CL = {cl_1, ..., cl_c, ..., cl_C} and text instance classification scores Score = {sc_1, ..., sc_c, ..., sc_C}, where C is the total number of text instances, cl_c = {q_1, ..., q_{nc}, ..., q_{Nc}} with Nc the number of points in class c of CL and q_{nc} the nc-th point of cl_c, and sc_c holds the classification score of each point of cl_c;
The refinement and adjustment flow of the point cloud segmentation result in the step 15 is shown in fig. 6, and the specific steps are as follows:
Step 15.1, define an adjusted text instance classification result CL' and a text instance classification result counter c; initialize CL' to the empty set, CL' = NULL, and initialize c to 1, c = 1;
Step 15.2, take the c-th classification result cl_c = {x_1, ..., x_i, ..., x_NP} out of the text instance classification result CL = {cl_1, ..., cl_c, ..., cl_C}, where x_i is the i-th point in cl_c and NP is the total number of points in cl_c;
Step 15.3, for each point x_i in cl_c, compute its distance to the other points, d_ij = ||x_i - x_j||_2, x_i ∈ cl_c, x_j ∈ cl_c, i ≠ j;
Step 15.4, set the hyper-parameter km, select the km smallest distances from x_i to the other points in cl_c to form a set D_i, and compute the mean d_i of all elements of D_i; all the d_i form the set {d_1, ..., d_i, ..., d_NP}; km = 200 in the embodiment of the invention;
Step 15.5, compute the mean value mean and the standard deviation stddev of {d_1, ..., d_i, ..., d_NP};
Step 15.6, set the hyper-parameter λ and compute the threshold thre = mean + λ × stddev; λ = 1.5 in the embodiment of the invention;
Step 15.7, judge whether x_i is an outlier: if d_i > thre, x_i is an outlier and is removed from the set cl_c; after all outliers have been removed, assign cl_c to cl'_c;
Step 15.8, judge whether c is greater than C; if c > C, keep CL'; otherwise set c = c + 1, add cl'_c to CL', i.e. CL' = CL' + cl'_c, and return to step 15.2;
Step 16, according to H'_{k'} in the projection relation matrix set HS', compute the classification point set CP = {cp_1, ..., cp_c, ..., cp_C} corresponding to CL' in the image I'_{k'}, where the point set cp_c is computed as cp_c = H'_{k'} × cl'_c;
the text filling algorithm flow in step 17 is shown in fig. 7, and the specific steps are as follows:
Step 17.1, input the image I'_{k'}, the classification point set CP = {cp_1, ..., cp_c, ..., cp_C} and the text score set Score = {sc_1, ..., sc_c, ..., sc_C} of the points corresponding to CP, and define a threshold variable T, where cp_c = {cq_1, ..., cq_{nz_c}, ..., cq_{NZ_c}}, cq_{nz_c} is the nz_c-th point of cp_c, sc_c holds the score of each point of cp_c and NZ_c is the number of points in cp_c;
Step 17.2, filter out the points of CP with a low text score according to the threshold T: if the score of a point cq_{nz_c} is less than T, delete the point cq_{nz_c} from cp_c; assign the CP obtained after filtering out all points with a low text score to the variable TR, TR = {tr_1, ..., tr_c, ..., tr_C}, where tr_c = {tq_1, ..., tq_{ntr_c}, ..., tq_{NTR_c}}, tq_{ntr_c} is the ntr_c-th point of tr_c and NTR_c is the number of points in tr_c; T = 0.75 in the embodiment of the invention;
Step 17.3, compute the center point of each class in TR to form the center point set TC, TC = {cen_1, ..., cen_c, ..., cen_C}, where cen_c = mean(tr_c) and mean() is the mean function;
Step 17.4, define the set Poly_{k'} of text polygon bounding boxes in the image I'_{k'} and initialize it to the empty set, Poly_{k'} = NULL; define an image B_k of the same size as the image I'_{k'} with all its pixels assigned the value 0; initialize the text instance class counter c to 1, c = 1;
Step 17.5, with cen_c as the initial seed point, call the flood fill library function floodFill() of OpenCV to fill tr_c and obtain the filled point set trfill_c, and assign the value 1 to the pixels of B_k corresponding to the points of trfill_c;
Step 17.6, call the OpenCV library function cv2.morphologyEx() to apply 5 opening operations to B_k to obtain the image Mopen, and call the OpenCV library function cv2.dilate() to apply 10 dilation operations to the image Mopen to obtain the image Mdilate;
Step 17.7, call the OpenCV library function cv2.connectedComponentsWithStats() to obtain the connected region ConectedRegion of the image Mdilate, call the OpenCV library function cv2.findContours() to obtain the contour ContourPS of the connected region ConectedRegion, and call the OpenCV library function cv2.convexHull() to obtain the convex hull vertex set of the contour ContourPS, i.e. the polygon vertex set CNT_c = {pt_1, ..., pt_npl, ..., pt_NPL} of the c-th text instance, which constitutes the circumscribing polygon box pbox_c of the c-th text instance in image I'_{k'}; NPL is the number of vertices in CNT_c, and each vertex of CNT_c is drawn in the image I'_{k'};
Step 17.8, judge whether c ≤ C; if c ≤ C, set c = c + 1, add pbox_c to Poly_{k'}, i.e. Poly_{k'} = Poly_{k'} ∪ {pbox_c}, and return to step 17.5; otherwise go to step 17.9;
Step 17.9, output the set Poly_{k'} of all text polygon bounding boxes in the image I'_{k'}, Poly_{k'} = {pbox_1, ..., pbox_c, ..., pbox_C};
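A sketch of steps 17.6 and 17.7 with OpenCV, assuming the flood-filled binary image B_k of step 17.5 is available as the uint8 array bk (pixels of the filled instance set to 1); the kernel size and OpenCV flags are illustrative choices:

import cv2
import numpy as np

def instance_polygon(bk, open_iter=5, dilate_iter=10):
    """Morphologically clean the filled instance mask B_k and return its convex-hull polygon CNT_c."""
    kernel = np.ones((3, 3), np.uint8)
    mopen = cv2.morphologyEx(bk, cv2.MORPH_OPEN, kernel, iterations=open_iter)   # 5 opening passes
    mdilate = cv2.dilate(mopen, kernel, iterations=dilate_iter)                  # 10 dilation passes
    n_regions, labels, stats, _ = cv2.connectedComponentsWithStats(mdilate)      # connected region statistics
    contours, _ = cv2.findContours(mdilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    hull = cv2.convexHull(max(contours, key=cv2.contourArea))                    # convex hull of the region contour
    return hull.reshape(-1, 2)                                                   # polygon vertex set CNT_c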
Step 18, judge whether k' is less than K'; if k' < K', set k' = k' + 1 and return to step 16; otherwise, end the procedure.
The bottled object text detection method based on a 3D point cloud disclosed by the invention addresses the problem that existing scene text detection methods cannot accurately detect curved text using only the 2D information of an image, and introduces 3D point cloud information into the text detection algorithm; 3D information is used for curved text detection for the first time. On the basis of the 3D point cloud spatial coordinate features, the colour features and stroke width features that are highly discriminative on bottled products are fused to generate fused features with strong discriminability. The 3D point cloud data are mapped onto a 2D grid map with a graph drawing technique, the channel features of the grid points are represented by the fused features of the point cloud, and a pseudo image is generated on which target segmentation can be performed with existing image segmentation algorithms. Experimental results show that the curved text on bottled products can be detected accurately with the method.
Examples
In the embodiment of the invention, curved text positioning effect test is carried out on the curved bottled object images which are common in life, and subjective and objective evaluation is carried out on the test results respectively.
The subjective effect diagram of text positioning in the embodiment of the invention is shown in fig. 10 and 11:
Inputting any common bottled object in life, and using the method of the invention to detect and test the text instance on the bottled object. Fig. 8 shows an image of a bottled object, and fig. 10 shows a display of the result of text detection of the bottled object in fig. 8 in the image using the method of the present invention, with white boxes being text boxes; fig. 9 shows an image of another bottled object, and fig. 11 shows a display of the result of text detection of the bottled object in fig. 9 in the image using the method of the present invention, with white boxes being text boxes.
As can be seen from the text detection results of Fig. 10 and Fig. 11, for the curved text instances "pleasant and alive" and "milk" in Fig. 8, the method of the invention detects the text boundaries accurately, and the detected boundaries are smooth and close to the text content, in line with human visual perception of text boundaries.
In the embodiment of the invention, 50 common bottled objects in life are collected, characters on the collected bottled objects are detected and tested by adopting the method, and the detection result is objectively evaluated by adopting the following indexes:
① Precision (P). The precision is the ratio of the number of correctly detected targets to the total number of detected targets.
② Recall (R). The recall is the ratio of the number of correctly detected targets to the total number of labelled truth boxes.
③ Harmonic mean (F-measure, F). The harmonic mean is a weighted average of recall and precision, so the F-measure is a comprehensive measure of the performance of a detection algorithm; the higher its value, the better the performance of the algorithm. Its calculation expression is

F = 2 × P × R / (P + R)
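A short sketch of the three evaluation indices, assuming tp is the number of correctly detected targets, det the total number of detections and gt the total number of labelled truth boxes:

def evaluate(tp, det, gt):
    """Precision, recall and their harmonic mean (F-measure)."""
    p = tp / det                 # precision: correct detections / all detections
    r = tp / gt                  # recall: correct detections / all labelled truth boxes
    f = 2 * p * r / (p + r)      # harmonic mean
    return p, r, f

# example: evaluate(tp=85, det=100, gt=110) ≈ (0.85, 0.77, 0.81)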
The text detection performance on the bottled objects is shown in Table 4:
Table 4. Detection performance of text instances on bottled objects

Test object    | Precision | Recall | Harmonic mean
Bottled object | 85.9%     | 77.5%  | 81.5%
As can be seen from Table 4, the average precision, recall and harmonic mean of text instance detection on the 50 collected bottled objects with the method of the invention are 85.9%, 77.5% and 81.5%, respectively; the objective evaluation results of Table 4 demonstrate the effectiveness of the method of the invention for text instances on bottled objects.
The subjective and objective results show that the method can detect the curved text instances on bottled objects well, and the detection results demonstrate its effectiveness in detecting text instances of arbitrary shape, size and direction.

Claims (9)

1. A bottled object text detection method based on 3D point cloud is characterized in that: the method specifically comprises the following steps:
Step 1, define a variable N_obj for the total number of bottled objects, a pseudo-image set variable Pimg and a bottled-object counter variable n_obj; initialize Pimg to the empty set, Pimg = NULL, and set n_obj to 1, i.e. n_obj = 1;
Step 2, for the n_obj-th bottled object obj_{n_obj}, perform multi-view image acquisition to obtain a curved-surface scene image sequence Img = {I_1, ..., I_k, ..., I_K}, where K is the number of acquired multi-view images;
Step 3, apply a 3D point cloud generation method OpenMVG + PMVS to the curved-surface scene image sequence Img to generate 3D point cloud data PS_1 = {p_1, ..., p_{N_1}}, where N_1 is the number of 3D points in PS_1; at the same time obtain the projection relation matrix H_k between PS_1 and each image I_k of Img, forming a projection relation matrix set HS = {H_1, ..., H_k, ..., H_K};
Step 4, down-sample the point cloud data PS_1 to obtain sampled point cloud data PS_2 = {p_1, ..., p_{n2}, ..., p_{N_2}} and label samples in PS_2, where N_2 is the number of 3D points in PS_2; the spatial position feature of a point p_{n2} is sp_{n2} = (x_{n2}, y_{n2}, z_{n2}), where x_{n2}, y_{n2} and z_{n2} are the x, y and z coordinate values of the 3D point p_{n2};
Step 5, randomly extract an image I_k from Img and obtain the 2D point set PI_k = {pi_1, ..., pi_{n2}, ..., pi_{N_2}} corresponding to the point cloud data PS_2 in the 2D image I_k according to the projection relation matrix H_k;
Step 6, compute, for each point pi_{n2} of the 2D point set PI_k in image I_k, the RGB color feature rgb_{n2} = (R_{n2}, G_{n2}, B_{n2}) and the stroke width feature sw_{n2}, where R_{n2}, G_{n2} and B_{n2} are the R, G and B channel values of the point pi_{n2};
Step 7, fuse the spatial position feature sp_{n2} of the 3D point p_{n2} with the RGB color feature rgb_{n2} and the stroke width feature sw_{n2} of the 2D point pi_{n2}, generating the fused feature f_{n2} = (x_{n2}, y_{n2}, z_{n2}, R_{n2}, G_{n2}, B_{n2}, sw_{n2}) of the point p_{n2};
Step 8, call the library function spring_layout() of the Networkx package in the Python programming language to map the point cloud data PS_2, i.e. map the points of PS_2 onto a 2D grid pseudo image PImg_{n_obj}, in which the feature of each pixel is the fused feature f_{n2} of the corresponding 3D point; append PImg_{n_obj} to the pseudo-image set Pimg, i.e. Pimg = Pimg ∪ {PImg_{n_obj}};
Step 9, judge whether n_obj is greater than or equal to N_obj; if n_obj ≥ N_obj, go to step 10; otherwise set n_obj = n_obj + 1 and return to step 2;
Step 10, take the pseudo-image set Pimg as input and train a multi-scale U-Net network to obtain the MSUnet network model M_MSUnet;
Step 11, input a bottled object obj', execute step 2 to collect K' multi-view images of obj' and obtain a curved-surface scene image sequence Img' = {I'_1, ..., I'_{k'}, ..., I'_{K'}}; execute step 3, i.e. apply the OpenMVG + PMVS method to Img', to generate the 3D point cloud data PS'_1 of obj' and the projection relation matrix set HS' between PS'_1 and Img', HS' = {H'_1, ..., H'_{k'}, ..., H'_{K'}};
Step 12, take PS'_1, Img' and HS' as input and execute steps 4-8 to obtain a pseudo image PImg';
Step 13, send the pseudo image PImg' into the MSUnet network model M_MSUnet and output all text instance classification results CL = {cl_1, ..., cl_c, ..., cl_C} and text instance classification scores Score = {sc_1, ..., sc_c, ..., sc_C}, where C is the total number of text instances, cl_c = {q_1, ..., q_{nc}, ..., q_{Nc}} with Nc the number of 3D points in class c of CL and q_{nc} the nc-th 3D point of cl_c, and sc_c holds the classification score of each 3D point of cl_c;
Step 14, refine CL according to a refinement-adjustment mechanism to obtain the adjusted point cloud classification result CL' = {cl'_1, ..., cl'_c, ..., cl'_C}, where Nc' is the number of 3D points in class c of CL';
Step 15, define an image number counter k' and initialize k' = 1;
Step 16, according to H'_{k'} in the projection relation matrix set HS', compute the 2D classification point set CP = {cp_1, ..., cp_c, ..., cp_C} corresponding to CL' in the image I'_{k'}, where the 2D point set cp_c is computed as cp_c = H'_{k'} × cl'_c;
Step 17, execute a text filling algorithm: perform text filling on the 2D classification point set CP in the image I'_{k'} to obtain the text instance classification result of the image I'_{k'}, and at the same time output the set Poly_{k'} of all text instance circumscribing polygon boxes;
Step 18, judge whether k' is less than K'; if k' < K', set k' = k' + 1 and return to step 16; otherwise, end the procedure.
2. The bottled object text detection method based on 3D point cloud as claimed in claim 1, wherein the method is characterized in that: the specific process of the step 4 is as follows:
The specific process of the down-sampling in step 4 is as follows: open the point cloud processing software CloudCompare v2.6.3, click the Open file button on the toolbar and load the 3D point cloud data PS_1; click the Delete button on the toolbar and manually remove the irrelevant background points that do not belong to the bottled object from the 3D point cloud data PS_1, obtaining the point cloud data of interest PS' = {p_1, ..., p_{N'_1}}, where N'_1 is the number of 3D points in PS'; click the Clean button on the toolbar, set the filter parameters mean distance and nSigma in its pull-down menu and perform the SOR filtering operation; click the Subsample button on the toolbar, set the spatial sampling distance parameter space and the number of sampling points N_2 in its pull-down menu and perform the point cloud down-sampling operation to obtain the point cloud data PS_2 = {p_1, ..., p_{n2}, ..., p_{N_2}}, where the point p_{n2} is the n2-th sampling point, 1 ≤ n2 ≤ N_2, and its spatial feature is sp_{n2} = (x_{n2}, y_{n2}, z_{n2});
In step 4, the specific process of sample labeling of the 3D point cloud data PS_2 is as follows: click the Segment button on the toolbar of the point cloud processing software CloudCompare v2.6.3 and, in order from top to bottom and from left to right, manually box-select the point cloud of each text instance in the point cloud data PS_2 with the mouse; click the Add constant SF button on the toolbar and add a label value label to the box-selected text instance point cloud data; after all text instances in the point cloud data PS_2 have been box-selected and labeled, click the Merge multiple clouds button on the toolbar and merge all box-selected text instance point cloud data in PS_2, together with the non-text point cloud data on the bottled object, into the labeled point cloud data PS_LA = {PS_0, PS_1, ..., PS_l, ..., PS_L}, where PS_0 is the non-text point cloud data on the bottled object, PS_l is the l-th text instance point cloud data, L is the total number of text instances in the 3D point cloud data PS_2, and the label value of PS_l is label = l.
3. The bottled object text detection method based on the 3D point cloud according to claim 2, wherein the method is characterized in that the specific process of step 5 is as follows: randomly extracting an image I_k from the image sequence Img, and calculating the 2D point set Q_k = {q_1, …, q_{n2}, …, q_{N2}} corresponding to the point cloud data PS_2 in the image I_k according to H_k in the projection relation matrix set HS corresponding to the image I_k, with the specific calculation formula:
q_{n2} = (1/d) × H_k × p_{n2}
wherein d represents the distance from the 3D point p_{n2} to the camera.
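A small NumPy sketch of this projection, assuming H_k is a 3x4 projection matrix applied to homogeneous 3D coordinates and d is the depth term produced by the projection; the function and variable names are illustrative.

import numpy as np

def project_point_cloud(H_k, pts_3d):
    """Project an (N, 3) point cloud into image I_k with a 3x4 matrix H_k.
    Returns the (N, 2) pixel coordinates and the per-point depth d."""
    homo = np.hstack([pts_3d, np.ones((pts_3d.shape[0], 1))])   # (N, 4) homogeneous points
    proj = homo @ H_k.T                                         # (N, 3)
    d = proj[:, 2:3]                                            # depth / distance term
    return proj[:, :2] / d, d.ravel()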
4. The bottled object text detection method based on 3D point cloud as claimed in claim 3, wherein: the specific steps of the step 6 are as follows:
Step 6.1, for each 2D point q_{n2} in the image I_k, invoking the image access function of the PIL package in the Python programming language to extract the R, G, B channel values at point q_{n2} as the RGB color feature f_rgb_{n2} = (r_{n2}, g_{n2}, b_{n2}) of that point, wherein u_{n2} and v_{n2} respectively represent the abscissa and the ordinate of the 2D point q_{n2};
Step 6.2, calling the stroke width transform library function swttransform() of the SWTloc package in the Python programming language on the image I_k to obtain the stroke width values of all pixels in I_k; the stroke width value at the 2D point q_{n2} is taken as the SWT stroke width feature f_swt_{n2} of that point;
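A hedged Python sketch of steps 6.1-6.2. The claim names PIL and SWTloc's swttransform(); since the exact calls are not fully legible in this text, the sketch assumes the stroke width map has already been computed into an (H, W) array swt_map, and uses PIL's getpixel() as the image-access function.

import numpy as np
from PIL import Image

def sample_point_features(image_path, uv, swt_map):
    """For each projected 2D point (u, v) in image I_k, read the RGB colour
    feature and look up the SWT stroke width feature from swt_map."""
    img = Image.open(image_path).convert("RGB")
    rgb_feats, swt_feats = [], []
    for u, v in uv:
        x, y = int(round(u)), int(round(v))
        r, g, b = img.getpixel((x, y))        # RGB colour feature of the point
        rgb_feats.append((r, g, b))
        swt_feats.append(swt_map[y, x])       # SWT stroke width feature
    return np.array(rgb_feats), np.array(swt_feats)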
5. The bottled object text detection method based on the 3D point cloud according to claim 4, characterized in that the specific process of step 7 is as follows: for any point p_{n2} in the 3D point cloud data PS_2, its coordinate feature f_xyz_{n2}, RGB color feature f_rgb_{n2} and SWT stroke width feature f_swt_{n2} are serially fused by columns to obtain the fused feature f_{n2} = [f_xyz_{n2}, f_rgb_{n2}, f_swt_{n2}] of the point p_{n2}.
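The column-wise (serial) fusion of step 7 is a plain concatenation; a one-line NumPy sketch with illustrative names:

import numpy as np

def fuse_features(xyz, rgb, swt):
    """Concatenate per-point features into [x y z | r g b | swt], shape (N2, 7)."""
    return np.hstack([xyz, rgb, np.asarray(swt).reshape(-1, 1)])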
6. The bottled object text detection method based on 3D point cloud as claimed in claim 5, wherein the method is characterized in that: the specific steps of the step 8 are as follows:
Step 8.1, taking the 3D point cloud data PS_2 as input, setting the number of clusters NL and the number of clustering iterations IT, calling the clustering library function KMeans() of the Scikit-learn package in the Python programming language, and initially clustering PS_2 to obtain NL initial cluster centers Cen' = {ce'_1, …, ce'_nl, …, ce'_NL} and the distance matrix Dist from each point to the cluster center points;
Step 8.2, taking PS_2, Dist, N_2 and NL as input, adopting the graph_cut algorithm to refine the initial clustering result, and obtaining the refined cluster center point coordinate set Cen = {ce_1, …, ce_nl, …, ce_NL} and the point set Cint_nl of the nl-th class, wherein ce_nl denotes the center point coordinates of the nl-th class, the knl-th element of Cint_nl is the knl-th point of the nl-th class, and Knl represents the number of points in Cint_nl;
Step 8.3, calling the distance function pdist() of the Scipy package in the Python programming language to calculate the Euclidean distances {Dis_1, …, Dis_{NL×NL}} between the cluster center point coordinates in Cen, and calling the squareform() function of the Scipy package to convert {Dis_1, …, Dis_{NL×NL}} into matrix form, obtaining the cluster center point distance matrix Dcc_{NL×NL};
Step 8.4, taking Dcc_{NL×NL} as input, constructing an undirected graph G_c and performing first-level graph drawing on G_c to generate a first-level 2D grid map Grid_4 of size Wg×Wg, Grid_4 = {g_1, …, g_nl, …, g_NL}, wherein g_nl = (gx_nl, gy_nl) represents the nl-th grid point in Grid_4, and gx_nl and gy_nl respectively represent the abscissa and the ordinate of the nl-th grid point in the 2D grid map Grid_4;
Step 8.5, calling the distance function pdist() of the Scipy package in the Python programming language to calculate the Euclidean distances {Dis'_1, …, Dis'_{Knl×Knl}} between the points of the point set Cint_nl of the nl-th class, and calling the squareform() function of the Scipy package to convert them into matrix form, obtaining the within-cluster point distance matrix Dcc'_{Knl×Knl};
Step 8.6, taking Dcc'_{Knl×Knl} as input, generating a second-level 2D grid map of size Wg×Wg for the nl-th class according to the method of step 8.4, wherein the inl-th grid point corresponds to the inl-th point of the nl-th class, and Inl represents the number of points in the nl-th class;
Step 8.7, calling the OpenCV library function cv2.resize() to enlarge each grid point in Grid_4 into a block of size Wg×Wg, and assigning the enlarged Grid_4 to Grid_5; Grid_5 thus consists of Wg×Wg blocks, each block being of size Wg×Wg;
Step 8.8, embedding the second-level 2D grid map of the nl-th class into the corresponding block of Grid_5 in sequence, and defining the resulting Grid_5 as the 2D pseudo image of the nobj-th bottled object.
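A condensed Python sketch of the two-level grid mapping in step 8. It assumes NetworkX's force-directed layout as the "graph drawing" step and a simple snap-to-grid; the feature channels stored at each cell and all parameter choices are illustrative simplifications, not the patent's exact procedure, and every cluster is assumed to hold at least two points.

import numpy as np
import networkx as nx
from scipy.spatial.distance import pdist, squareform

def grid_layout(dist_matrix, Wg):
    """Lay out the nodes of a complete weighted graph on a Wg x Wg grid
    (spring_layout stands in for the graph drawing of steps 8.4/8.6)."""
    G = nx.from_numpy_array(dist_matrix)
    pos = nx.spring_layout(G, seed=0)
    xy = np.array([pos[i] for i in range(len(pos))])
    xy = (xy - xy.min(axis=0)) / (np.ptp(xy, axis=0) + 1e-9)    # normalise to [0, 1]
    return np.clip((xy * (Wg - 1)).round().astype(int), 0, Wg - 1)

def build_pseudo_image(centers, clusters, feats, Wg):
    """First level: cluster centres (NL x 3) on a coarse Wg x Wg grid;
    second level: each cluster's points on a fine Wg x Wg grid inside its block.
    `clusters` is a list of point-index lists, `feats` an (N2, 7) feature matrix."""
    pseudo = np.zeros((Wg * Wg, Wg * Wg, feats.shape[1]))
    coarse = grid_layout(squareform(pdist(centers)), Wg)              # steps 8.3/8.4
    for nl, members in enumerate(clusters):
        fine = grid_layout(squareform(pdist(feats[members, :3])), Wg) # steps 8.5/8.6
        gx, gy = coarse[nl]
        for (fx, fy), idx in zip(fine, members):                      # steps 8.7/8.8
            pseudo[gx * Wg + fx, gy * Wg + fy] = feats[idx]
    return pseudo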
7. The bottled object text detection method based on the 3D point cloud, according to claim 6, is characterized in that: the specific steps of the step 10 are as follows:
Step 10.1, designing the MSUnet network structure;
Step 10.2, defining the loss function of the MSUnet network multi-classification task:
Loss = -(1/N) × Σ_{i=1}^{N} Σ_{c=1}^{C} y_ic × log(p_ic)
wherein N represents the number of training samples, C represents the number of categories, y_ic is the sample class indicator (if the class of the i-th sample is c then y_ic = 1, otherwise y_ic = 0), and p_ic represents the probability that the i-th sample is predicted to be of class c;
Step 10.3, labeling the MSUnet network training samples: in the 2D grid pseudo image, the label of each pixel at a non-blank position is the category label value label of the corresponding point p_{n2} in the point cloud data PS_2, and the pixel label value of a blank position is label = 0;
Step 10.4, training the MSUnet network.
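The reconstructed formula above is the standard multi-class cross-entropy. A NumPy sketch, with p as an (N, C) matrix of predicted probabilities and y as the one-hot labels (names are illustrative):

import numpy as np

def multiclass_ce_loss(p, y, eps=1e-12):
    """Loss = -(1/N) * sum_i sum_c y_ic * log(p_ic); eps guards against log(0)."""
    return float(-np.mean(np.sum(y * np.log(p + eps), axis=1)))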
8. The bottled object text detection method based on the 3D point cloud according to claim 7, characterized in that the specific steps of step 15 are as follows:
Step 15.1, defining the adjusted text instance classification result CL' and a text instance classification result counter c; initializing CL' to the empty set, CL' = NULL, and initializing c = 1;
Step 15.2, taking out the c-th classification result cl_c = {x_1, …, x_i, …, x_NP} from the text instance classification result CL = {cl_1, …, cl_c, …, cl_C}, wherein x_i is the i-th point in cl_c and NP is the total number of points in cl_c;
Step 15.3, for each point x_i in cl_c, calculating the distance from point x_i to the other points in cl_c:
d_ij = ||x_i - x_j||_2, x_i ∈ cl_c, x_j ∈ cl_c, i ≠ j
Step 15.4, setting the hyperparameter km, selecting the km smallest distances between x_i and the other points, and taking the mean value d_i of these km distances; all d_i constitute the set {d_1, …, d_i, …, d_NP};
Step 15.5, calculating the mean mean and the standard deviation stddev of {d_1, …, d_i, …, d_NP}:
mean = (1/NP) × Σ_{i=1}^{NP} d_i, stddev = sqrt((1/NP) × Σ_{i=1}^{NP} (d_i - mean)^2)
Step 15.6, setting the hyperparameter λ and calculating the threshold:
thre = mean + λ × stddev
Step 15.7, judging whether x_i is an outlier: if d_i > thre, x_i is an outlier and is eliminated from the set cl_c; after all outliers have been eliminated, assigning cl_c to cl'_c;
Step 15.8, judging whether c is greater than C: if c > C, preserving CL'; otherwise, setting c = c + 1, adding cl'_c to CL', i.e. CL' = CL' + cl'_c, and returning to step 15.2.
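Steps 15.2-15.7 can be sketched in a few lines of NumPy/SciPy; km and λ are hyperparameters in the claim, so the default values used here are placeholders only.

import numpy as np
from scipy.spatial.distance import cdist

def refine_instance(points, km=5, lam=2.0):
    """Drop outliers from one text-instance point set (steps 15.2-15.7)."""
    pts = np.asarray(points, dtype=float)
    D = cdist(pts, pts)                                  # d_ij = ||x_i - x_j||_2
    np.fill_diagonal(D, np.inf)                          # ignore self-distance
    k = min(km, max(len(pts) - 1, 1))                    # guard for tiny instances
    d = np.sort(D, axis=1)[:, :k].mean(axis=1)           # mean of km nearest distances
    thre = d.mean() + lam * d.std()                      # thre = mean + lambda * stddev
    return pts[d <= thre]                                # keep inliers only

def refine_all(CL, km=5, lam=2.0):
    """Step 15.8: apply the refinement to every class, giving CL'."""
    return [refine_instance(cl, km, lam) for cl in CL]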
9. The bottled object text detection method based on the 3D point cloud, according to claim 8, is characterized in that: the specific steps of the step 17 are as follows:
Step 17.1, inputting the image I'_k', the classification point set CP = {cp_1, …, cp_c, …, cp_C} and the text score set Score = {sc_1, …, sc_c, …, sc_C} of the points corresponding to CP, and defining a threshold variable T, wherein the nz_c-th element of cp_c is the nz_c-th point of cp_c, the nz_c-th element of sc_c is the score of that point, and NZ_c represents the number of points in cp_c;
Step 17.2, filtering out the points with low text scores in CP according to the threshold T, namely: if the score of a point in cp_c is smaller than T, deleting that point from cp_c; assigning the CP from which all low-score points have been filtered out to the variable TR, TR = {tr_1, …, tr_c, …, tr_C}, wherein the ntr_c-th element of tr_c is the ntr_c-th point of tr_c, and NTR_c represents the number of points in tr_c;
Step 17.3, calculating the center point of each category in TR to form the center point set TC = {cen_1, …, cen_c, …, cen_C}, wherein cen_c = mean(tr_c) and mean() denotes the mean function;
Step 17.4, defining the set of text polygon bounding frames Poly_k' in the image I'_k' and initializing it to be empty, Poly_k' = NULL; defining an image B_k of the same size as I'_k' with all pixels assigned the value 0; and initializing the text instance class counter c = 1;
Step 17.5, using cen_c as the initial seed point, calling the flood filling library function floodFill() in OpenCV to fill tr_c and obtain the filled point set trfill_c, and assigning the value 1 to the pixels in B_k corresponding to the points in trfill_c;
Step 17.6, calling the OpenCV library function cv2.morphologyEx() to perform 5 iterations of the opening operation on B_k to obtain the image Mopen, and calling the OpenCV library function cv2.dilate() to perform 10 iterations of dilation on Mopen to obtain the image Mdilate;
Step 17.7, obtaining the connected region ConectedRegion of the image Mdilate with the OpenCV library function cv2.connectedComponentsWithStats(), obtaining the contour ContourPS of the connected region ConectedRegion with the OpenCV library function cv2.findContours(), and obtaining the convex hull vertex set of the contour ContourPS with the OpenCV library function cv2.convexHull(), namely the polygon vertex set CNT_c of the c-th text instance, which constitutes the polygonal bounding frame poly_c of the c-th text instance in the image I'_k', wherein NPL represents the number of vertices in CNT_c; each vertex in CNT_c is drawn in the image I'_k';
Step 17.8, judging whether c is less than or equal to C: if c ≤ C, setting c = c + 1, adding poly_c to Poly_k', i.e. Poly_k' = Poly_k' + poly_c, and returning to step 17.5; otherwise, entering step 17.9;
Step 17.9, outputting the set Poly_k' of all text polygon bounding frames on the image I'_k', Poly_k' = {poly_1, …, poly_c, …, poly_C}.
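A hedged OpenCV sketch of the text filling in steps 17.4-17.9. The claim names floodFill, morphologyEx, dilate, connectedComponentsWithStats, findContours and convexHull; the flood-fill tolerance, the kernel size and the way the filled region is merged with the projected points are assumptions of this sketch, and it keeps only the largest contour per instance instead of calling connectedComponentsWithStats explicitly.

import cv2
import numpy as np

def fill_text_instances(image, TR, TC, tol=8):
    """Grow a dense mask per text instance, clean it with open/dilate,
    and return one convex polygon (vertex array) per instance."""
    h, w = image.shape[:2]
    kernel = np.ones((3, 3), np.uint8)
    polys = []
    for tr_c, cen_c in zip(TR, TC):
        mask = np.zeros((h + 2, w + 2), np.uint8)
        seed = tuple(int(round(v)) for v in cen_c)                   # instance centre
        flags = 4 | cv2.FLOODFILL_MASK_ONLY | (255 << 8)
        cv2.floodFill(image.copy(), mask, seed, 0,
                      (tol,) * 3, (tol,) * 3, flags)                 # step 17.5
        B = mask[1:-1, 1:-1]
        pts = np.round(np.asarray(tr_c)).astype(int)
        B[pts[:, 1].clip(0, h - 1), pts[:, 0].clip(0, w - 1)] = 255  # add projected points
        Mopen = cv2.morphologyEx(B, cv2.MORPH_OPEN, kernel, iterations=5)   # step 17.6
        Mdilate = cv2.dilate(Mopen, kernel, iterations=10)
        contours, _ = cv2.findContours(Mdilate, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)      # step 17.7
        if contours:
            cnt = max(contours, key=cv2.contourArea)
            polys.append(cv2.convexHull(cnt).reshape(-1, 2))         # CNT_c vertices
    return polys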
CN202110769157.0A 2021-07-07 2021-07-07 Bottled object text detection method based on 3D point cloud Active CN113657375B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110769157.0A CN113657375B (en) 2021-07-07 2021-07-07 Bottled object text detection method based on 3D point cloud

Publications (2)

Publication Number Publication Date
CN113657375A CN113657375A (en) 2021-11-16
CN113657375B CN113657375B (en) 2024-04-19

Family

ID=78489167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110769157.0A Active CN113657375B (en) 2021-07-07 2021-07-07 Bottled object text detection method based on 3D point cloud

Country Status (1)

Country Link
CN (1) CN113657375B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114782464B (en) * 2022-04-07 2023-04-07 中国人民解放军国防科技大学 Reflection chromatography laser radar image segmentation method based on local enhancement of target region

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009301411A (en) * 2008-06-16 2009-12-24 Kobe Steel Ltd Image processing method and image processing device for sampling embossed characters
CN104598885A (en) * 2015-01-23 2015-05-06 西安理工大学 Method for detecting and locating text sign in street view image
CN110287960A (en) * 2019-07-02 2019-09-27 中国科学院信息工程研究所 The detection recognition method of curve text in natural scene image
CN113033247A (en) * 2019-12-09 2021-06-25 Oppo广东移动通信有限公司 Image identification method and device and computer readable storage medium
CN112070082A (en) * 2020-08-24 2020-12-11 西安理工大学 Curve character positioning method based on instance perception component merging network
CN113052835A (en) * 2021-04-20 2021-06-29 江苏迅捷装具科技有限公司 Medicine box detection method and detection system based on three-dimensional point cloud and image data fusion
CN113033543A (en) * 2021-04-27 2021-06-25 中国平安人寿保险股份有限公司 Curved text recognition method, device, equipment and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"A Straightforward and Efficient Instance-Aware Curved Text Detector";Fan Zhao;《sensors》;20210310;全文 *
"一种直接高效的自然场景汉字逼近定位方法";赵凡;《计算机工程与应用》;20201231;第57卷(第6期);全文 *

Also Published As

Publication number Publication date
CN113657375A (en) 2021-11-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant