CN111340873A - Method for measuring and calculating object minimum outer envelope size of multi-view image - Google Patents

Method for measuring and calculating object minimum outer envelope size of multi-view image

Info

Publication number: CN111340873A
Application number: CN202010128916.0A
Authority: CN (China)
Prior art keywords: feature points; confidence; points; point cloud; feature
Legal status: Granted; active
Other languages: Chinese (zh)
Other versions: CN111340873B (en)
Inventors: He Li (何力), Zhu Lei (朱蕾), Chen Weinan (陈炜楠), Guan Yisheng (管贻生)
Original and current assignee: Guangdong University of Technology
Application filed by Guangdong University of Technology on 2020-02-28; priority to CN202010128916.0A
Publication of CN111340873A: 2020-06-26
Application granted; publication of CN111340873B: 2023-05-23

Classifications

    • G06T 7/62: Image analysis; analysis of geometric attributes of area, perimeter, diameter or volume
    • G01B 11/002: Measuring arrangements using optical techniques for measuring two or more coordinates
    • G01B 11/24: Measuring arrangements using optical techniques for measuring contours or curvatures
    • G06F 18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/02, G06N 3/08: Neural networks; learning methods
    • G06T 7/10: Image analysis; segmentation, edge detection
    • G06T 7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters (camera calibration)
    • G06T 2207/10028: Image acquisition modality: range image, depth image, 3D point clouds
    • G06T 2207/20081: Special algorithmic details: training, learning
    • G06T 2207/20084: Special algorithmic details: artificial neural networks [ANN]
    • Y02P 90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method for measuring and calculating the minimum object outer envelope size from a multi-view image, which comprises the following steps: images of an object to be measured are shot from two different angles with a terminal device, and the spatial coordinates of pixel points in the shot images in a world coordinate system are calculated to obtain a three-dimensional point cloud model of the images; the point cloud of the object is separated and extracted from the three-dimensional point cloud model of the images, completing the point cloud extraction of the object; and for the point cloud of the object, an oriented envelope box is generated from the vertices of the object surface, and its eigenvectors are obtained through principal component analysis, thereby obtaining the minimum outer envelope of the object and hence its size. By acquiring multi-view images of the object and extracting a three-dimensional point cloud model to estimate the size, the method better meets the practical requirement of the logistics industry for acquiring parcel information; the process is automated, requires no manual operation apart from shooting, and is widely applicable and convenient to use.

Description

Method for measuring and calculating object minimum outer envelope size of multi-view image
Technical Field
The application relates to the field of logistics and image processing, in particular to a method for measuring and calculating the minimum outer envelope size of an object of a multi-view image.
Background
The wide application of regional economic integration and Internet technology has brought revolutionary development to the logistics industry. Refined distribution and intelligent operation are basic requirements of modern logistics, and the weight and volume of packages are its most basic elements. At present, acquiring parcel weight information is very convenient, while acquiring volume information suffers from complicated operation, low efficiency and similar problems, which limits the development of logistics intelligence to a certain extent.
With the popularization of mobile phones and advances in image processing technology, logistics practitioners can measure the length, width and height of packages with a mobile phone camera, simplifying the sorting and delivery of packages and improving logistics efficiency. However, processing parcel image information and measuring the minimum outer envelope of irregularly shaped parcels (such as tables and chairs) remain two major difficulties.
Disclosure of Invention
The application aims to provide a method for measuring and calculating the minimum outer envelope size of an object in a multi-view image, which acquires the outer envelope of an object from multi-view images shot by a terminal device to obtain the object's size information.
To achieve this aim, the application adopts the following technical solution:
A method for measuring and calculating the minimum object outer envelope size of a multi-view image comprises the following steps:
respectively shooting images of an object to be measured from two different angles by using terminal equipment, and calculating spatial coordinates of pixel points in the shot images in a world coordinate system so as to obtain a three-dimensional point cloud model of the images;
separating and extracting the point cloud of the object from the three-dimensional point cloud model of the image to complete the point cloud extraction of the object;
and for the point cloud of the object, generating an oriented envelope box from the vertices of the object surface and obtaining the eigenvectors through principal component analysis, thereby obtaining the minimum outer envelope of the object and the size of the object.
Further, calculating the spatial coordinates of pixel points in the shot images in a world coordinate system to obtain a three-dimensional point cloud model of the images includes:
if the origins of the camera coordinate system and the world coordinate system coincide, the same object has the same depth in the two coordinate systems; therefore, the relation between a pixel point p(u, v) in the image shot by the camera and its spatial coordinate (X_w, Y_w, Z_w) in the world coordinate system is:

$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f/d_x & 0 & u_0 \\ 0 & f/d_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix}$$

where f is the focal length of the camera, (u_0, v_0) is the pixel coordinate of the image center, d_x and d_y are the physical sizes of each pixel of the camera's light-sensing device, and Z_c is the Z-axis coordinate of the object in the camera coordinate system;
the point (x, y) in the camera coordinate system corresponding to the pixel point p(u, v) in the image satisfies the relation:

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} 1/d_x & 0 & u_0 \\ 0 & 1/d_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$
let the coordinate of a point P on the object in the world coordinate system be (X_w, Y_w, Z_w); from the two shots taken at different angles, X_w and Y_w can be calculated from the intersection of the spatial rays through the corresponding pixels in the two images, and P lies on the ray O_C p from the optical center;
denote by O_C and O_C′ the lens centers of the two shots of the object taken from different angles; then P is the intersection of the rays O_C p and O_C′ p′, i.e. the point satisfying the epipolar geometric constraint that minimizes the reprojection error:
C(p, p′) = d(P, p)² + d(P, p′)²
where C(p, p′) is the reprojection error value, P represents a point on the object, p and p′ are the projections of the same point in the two images taken by the camera, and d(·)² is the squared Euclidean distance function;
by solving the above equation, Z_w is obtained; by the same method the spatial coordinates of all pixel points in the shot images are calculated, and these coordinate points form the three-dimensional point cloud model of the images, completing the three-dimensional point cloud expression of the images.
Further, the separating and extracting the point cloud of the object from the three-dimensional point cloud model of the image to complete the point cloud extraction of the object includes:
extracting features from the collected point cloud data, discriminating and classifying the feature expressions of the extracted feature points, and dividing the feature points into two classes, high confidence and low confidence;
for the high-confidence feature points, taking the category corresponding to the maximum value in a feature point's feature vector as that point's category; for the low-confidence feature points, establishing a similarity expression matrix between the feature points using the position information between them;
taking the similarity expression matrix as the association between the low-confidence feature points and classifying each low-confidence feature point into the category of the high-confidence feature point with which it has the greatest association;
and aggregating the feature points divided into the same category to realize semantic segmentation of the point cloud.
Further, the discriminating and classifying of the feature expressions of the extracted feature points to divide the feature points into high-confidence and low-confidence classes includes:
discriminating the confidence of a feature expression using a threshold n on the ratio of the largest to the second-largest value in the feature point's feature vector: feature points whose ratio is greater than n are taken as high-confidence feature points, and those whose ratio is not greater than n as low-confidence feature points.
Further, for the low-confidence feature points, establishing a similarity expression matrix between the feature points using the position information between them includes:
constructing a feature correlation matrix M_fv between the feature points from the feature expression vectors of the low-confidence feature points; normalizing the coordinates of the feature points and constructing a distance correlation matrix M_dm between the feature points from the distances between their normalized coordinates; the similarity expression matrix between the feature points is then M = |M_fv − M_dm|.
Further, in the feature correlation matrix M_fv, the similarity of the feature expression vectors V_i, V_j of any two feature points is:

$$\mathrm{sim}(V_i, V_j) = \frac{\mathrm{cov}(V_i, V_j)}{\sqrt{D(V_i)}\,\sqrt{D(V_j)}}$$

where cov(V_i, V_j) is the covariance of V_i and V_j, and D(V_i), D(V_j) are the variances of V_i and V_j respectively.
Further, in the distance correlation matrix M_dm, the distance between any two feature points P, Q is:

$$d(P, Q) = \sqrt{(x_P - x_Q)^2 + (y_P - y_Q)^2 + (z_P - z_Q)^2}$$

where (x_P, y_P, z_P), (x_Q, y_Q, z_Q) are the normalized coordinates of the feature points P and Q.
Further, taking the similarity expression matrix as the association between the feature points and classifying each low-confidence feature point into the category of the high-confidence feature point with which it has the greatest association includes:
constructing a network graph using graph theory and, in combination with the similarity expression matrix, classifying the low-confidence feature points into the categories of the high-confidence feature points with which they have the greatest association.
Further, the constructing of a network graph using graph theory and classifying, in combination with the similarity expression matrix, the low-confidence feature points into the categories of the high-confidence feature points with the greatest association includes:
establishing a network graph, taking all feature points as its vertices and the associations between adjacent feature points as its edges, the weight of each edge being defined by the similarity between the feature points as given by the similarity expression matrix; then calculating, from the edge weights, the probability of each low-confidence feature point reaching each high-confidence feature point, and assigning the low-confidence feature point to the category of the high-confidence feature point reached with the greatest probability.
The application provides a terminal device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method for measuring and calculating the minimum outer envelope size of an object in a multi-view image according to the first aspect when executing the computer program.
The present application provides a computer readable storage medium storing a computer program, which when executed by a processor implements the steps of the method for object minimum outer envelope dimension estimation for multi-view images of the aforementioned first aspect.
The application has the following technical characteristics:
1. The application provides a method in which a terminal device such as a mobile phone acquires multi-view images of an object and extracts a three-dimensional point cloud model to estimate its size, better meeting the practical requirement of the logistics industry for acquiring parcel information; the process is automated, requires no manual operation apart from shooting, and is widely applicable and convenient to use.
2. The method simplifies the sorting, classification and delivery of packages, improves logistics efficiency, and helps promote the development of intelligent operation in modern logistics.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present application;
FIG. 2 is a schematic diagram of a pinhole model of a camera;
FIG. 3 is a schematic diagram of calculating object space coordinates from different perspectives;
FIGS. 4 (a), (b), (c) are schematic diagrams of point clouds illustrating an environment and an object from three different angles, respectively;
fig. 5 (a), (b), and (c) are point cloud schematic diagrams showing chairs separated from the three-dimensional point cloud environment at three different angles, respectively.
Detailed Description
In view of problems such as complicated operation and low efficiency in acquiring the volume information of parcels (particularly irregularly shaped parcels) in the logistics industry, a method for measuring and calculating the minimum object outer envelope size from multi-view images is provided. The method expresses parcels shot from multiple viewing angles as point clouds; performs semantic segmentation on the multi-angle images acquired by a mobile phone camera using a deep convolutional neural network, taking the planes on which logistics parcels are placed, such as desktops and floors, as segmentation targets, thereby separating a parcel from the plane on which it rests; and obtains the length, width and height of the minimum circumscribed cuboid from the segmented parcel point cloud.
As shown in fig. 1, the present application provides a method for measuring and calculating the minimum object outer envelope size of a multi-view image, comprising the following steps:
and S1, respectively shooting images of the object to be measured from two different angles by using the terminal equipment, and calculating the space coordinates of pixel points in the shot images in a world coordinate system, thereby obtaining a three-dimensional point cloud model of the images.
In the embodiment of the application, the terminal device is a mobile phone, and images of the object are acquired from its left and right sides through the phone's rear camera. In the pinhole model of the camera, the camera's intrinsic parameters are known quantities related to the camera's own properties, including the focal length f, the pixel coordinate (u_0, v_0) of the image center, and the physical sizes d_x and d_y of each pixel of the camera's light-sensing device.
If the origins of the camera coordinate system and the world coordinate system coincide, one and the same object has the same depth in the two coordinate systems, so that the relation between a pixel point p(u, v) in the image shot by the camera and its spatial coordinate (X_w, Y_w, Z_w) in the world coordinate system is:

$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f/d_x & 0 & u_0 \\ 0 & f/d_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix}$$

where Z_c represents the Z-axis coordinate of the object in the camera coordinate system.
The point (x, y) in the camera coordinate system corresponding to the pixel point p(u, v) in the image satisfies the relation:

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} 1/d_x & 0 & u_0 \\ 0 & 1/d_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$
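For illustration, the two relations above map directly to code. The following is a minimal Python sketch assuming the intrinsic parameters f, d_x, d_y, u_0 and v_0 are known; the function names are illustrative, not from the patent:

```python
import numpy as np

def intrinsic_matrix(f, dx, dy, u0, v0):
    """Pinhole intrinsic matrix built from the parameters named in the text:
    focal length f, pixel pitches dx and dy, and image center (u0, v0)."""
    return np.array([[f / dx, 0.0,    u0],
                     [0.0,    f / dy, v0],
                     [0.0,    0.0,    1.0]])

def pixel_to_image_plane(u, v, dx, dy, u0, v0):
    """Invert u = x/dx + u0, v = y/dy + v0 to recover the physical
    image-plane point (x, y) of pixel p(u, v)."""
    return (u - u0) * dx, (v - v0) * dy
```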
As shown in FIG. 2, O_C X_C Y_C Z_C is the camera coordinate system and O_w X_w Y_w Z_w is the world coordinate system. Let the coordinate of a point P on the object in the world coordinate system be (X_w, Y_w, Z_w). From the two shots taken at different angles, X_w and Y_w can be calculated from the intersection of the spatial rays through the corresponding pixels in the two images, as shown in FIG. 3, and P lies on the ray O_C p from the optical center.
In the imaging model shown in FIG. 2, the point P of unknown depth Z_w lies on the ray that starts at the center of the camera lens and passes through the point p on the image plane; Z_w is therefore solved as shown in FIG. 3:
O_C and O_C′ are the lens centers of the two shots of the object taken from different angles; then P is the intersection of the rays O_C p and O_C′ p′, i.e. the point satisfying the epipolar geometric constraint that minimizes the reprojection error:
C(p, p′) = d(P, p)² + d(P, p′)²
where C(p, p′) is the reprojection error value, P represents a point on the object, p and p′ are the projections of the same point in the two images, and d(·)² is the squared Euclidean distance, i.e. the squared pixel distance between the reprojection of P into an image and the observed pixel:

$$d(P, p)^2 = (u_P - u)^2 + (v_P - v)^2$$

where (u_P, v_P) is the pixel obtained by projecting P into the image containing p(u, v). Therefore, after the two shots at different angles, the formula C(p, p′) = d(P, p)² + d(P, p′)² contains only the single unknown Z_w, from which Z_w is solved.
By the same method, the spatial coordinates of all pixel points in the shot images are calculated, and these coordinate points form the three-dimensional point cloud model of the images, completing the three-dimensional point cloud expression of the images.
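For illustration, the following is a minimal sketch of recovering (X_w, Y_w, Z_w) from the two shots. It assumes the intrinsic matrix K (as assembled above) and the relative pose (R, t) between the two camera positions are known, and it uses standard linear (DLT) triangulation, which minimizes an algebraic proxy for the reprojection error C(p, p′) rather than the exact quantity; a nonlinear refinement could follow:

```python
import numpy as np

def triangulate_point(p1, p2, K, R, t):
    """Linear (DLT) two-view triangulation sketch. p1, p2: pixel (u, v) of
    the same object point in the two shots; K: intrinsic matrix; (R, t):
    pose of the second camera relative to the first."""
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])  # first view at origin
    P2 = K @ np.hstack([R, t.reshape(3, 1)])           # second view
    A = np.vstack([
        p1[0] * P1[2] - P1[0],
        p1[1] * P1[2] - P1[1],
        p2[0] * P2[2] - P2[0],
        p2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # homogeneous solution dehomogenized to (Xw, Yw, Zw)
```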
S2, separating and extracting the point cloud of the object from the three-dimensional point cloud model of the images using a deep neural network, completing the point cloud extraction of the object.
The application combines the feature description of the point cloud with the distances between point pairs to construct the similarity expression of the feature points. The feature description is the Pearson similarity of the feature points under a pre-trained point cloud classification; the distance between point pairs is expressed by the Euclidean distance. A random walk algorithm from the field of two-dimensional image segmentation is introduced into three-dimensional point cloud processing, and the two-dimensional similarity matrix is classified, thereby realizing efficient segmentation of the three-dimensional point cloud. The steps are as follows:
and S2.1, extracting the characteristics of the three-dimensional point cloud model, judging and classifying the characteristic expression of the extracted characteristic points, and classifying the characteristic points into two types of high confidence degree and low confidence degree.
Specifically, the confidence of a feature expression is discriminated using a threshold n on the ratio of the largest to the second-largest value in the feature point's feature vector: feature points whose ratio is greater than n are taken as high-confidence feature points, and those whose ratio is not greater than n as low-confidence feature points. n is an adjustable constant.
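For illustration, a minimal sketch of this split, assuming each feature point carries a vector of non-negative class scores (e.g. softmax outputs of the pre-trained classifier); the value n = 2.0 is only a placeholder for the adjustable threshold:

```python
import numpy as np

def split_by_confidence(scores, n=2.0):
    """Sketch of the confidence split. scores: (num_points, num_classes)
    non-negative class scores per feature point (an assumption); n is the
    adjustable ratio threshold from the text."""
    top2 = np.sort(scores, axis=1)[:, -2:]             # second-largest, largest
    ratio = top2[:, 1] / np.maximum(top2[:, 0], 1e-12)
    high = ratio > n                                   # high-confidence mask
    labels = np.full(len(scores), -1, dtype=int)       # -1 = still unlabeled
    labels[high] = np.argmax(scores[high], axis=1)     # class of the max value
    return high, labels
```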
S2.2, for the high-confidence feature points, taking the category corresponding to the maximum value in a feature point's feature vector as that point's category; for the low-confidence feature points, establishing a similarity expression matrix between the feature points using the position information between them, as follows:
constructing a feature correlation matrix M_fv between the feature points from the feature expression vectors of the low-confidence feature points; normalizing the coordinates of the feature points and constructing a distance correlation matrix M_dm between the feature points from the distances between their normalized coordinates; the similarity expression matrix between the feature points is then M = |M_fv − M_dm|.
In the feature correlation matrix M_fv, the similarity of the feature expression vectors V_i, V_j of any two feature points is:

$$\mathrm{sim}(V_i, V_j) = \frac{\mathrm{cov}(V_i, V_j)}{\sqrt{D(V_i)}\,\sqrt{D(V_j)}}$$

where cov(V_i, V_j) is the covariance of V_i and V_j, and D(V_i), D(V_j) are the variances of V_i and V_j respectively.
In the distance correlation matrix M_dm, the distance between any two feature points P, Q is:

$$d(P, Q) = \sqrt{(x_P - x_Q)^2 + (y_P - y_Q)^2 + (z_P - z_Q)^2}$$

where (x_P, y_P, z_P), (x_Q, y_Q, z_Q) are the normalized coordinates of the feature points P and Q.
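For illustration, a minimal sketch of assembling M = |M_fv − M_dm| for the low-confidence points; min-max normalization of the coordinates is an assumption, since the text only states that the coordinates are normalized:

```python
import numpy as np

def similarity_expression_matrix(feat, xyz):
    """Sketch of M = |M_fv - M_dm| for the low-confidence points.
    feat: (m, d) feature expression vectors; xyz: (m, 3) point coordinates."""
    M_fv = np.corrcoef(feat)                  # Pearson: cov / sqrt(D_i * D_j)
    lo, hi = xyz.min(axis=0), xyz.max(axis=0)
    norm = (xyz - lo) / np.where(hi > lo, hi - lo, 1.0)   # assumed min-max scaling
    diff = norm[:, None, :] - norm[None, :, :]
    M_dm = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise Euclidean distances
    return np.abs(M_fv - M_dm)
```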
S2.3, taking the similarity expression matrix as the association between the low-confidence feature points, and classifying each low-confidence feature point into the category of the high-confidence feature point with which it has the greatest association, including:
constructing a network graph using graph theory and, in combination with the similarity expression matrix, classifying the low-confidence feature points into the categories of the high-confidence feature points with which they have the greatest association; specifically:
establishing a network graph, taking all feature points as its vertices and the associations between adjacent feature points as its edges, the weight of each edge being defined by the similarity between the feature points as given by the similarity expression matrix; then calculating, from the edge weights, the probability of each low-confidence feature point reaching each high-confidence feature point, and assigning the low-confidence feature point to the category of the high-confidence feature point reached with the greatest probability. A sketch of this step is given below.
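For illustration, a heavily simplified sketch of this assignment: the exponential kernel turning the similarity expression matrix into edge weights is an assumption, and the random walk is collapsed to a single step rather than a full walk over the graph:

```python
import numpy as np

def assign_low_confidence(M, labels, high, beta=1.0):
    """Sketch of the graph step under simplifying assumptions: each
    low-confidence point adopts the label of the high-confidence point it
    reaches with the highest one-step probability."""
    W = np.exp(-beta * M)                     # assumed kernel: smaller M entry -> stronger edge
    np.fill_diagonal(W, 0.0)
    high_idx = np.flatnonzero(high)
    out = labels.copy()
    for i in np.flatnonzero(~high):
        p = W[i, high_idx] / W[i, high_idx].sum()   # reach probabilities
        out[i] = labels[high_idx[np.argmax(p)]]     # most probable class wins
    return out
```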
S2.4, aggregating the feature points divided into the same category realizes semantic segmentation of the point cloud and extracts the point cloud of the object.
As shown in FIG. 4, for a chair whose outer envelope is to be extracted, diagrams (a) to (c) of FIG. 4 show the environment containing the chair from three different angles, and diagrams (a) to (c) of FIG. 5 are point cloud schematic diagrams showing the chair separated from the three-dimensional point cloud environment at the same three angles.
S3, for the point cloud of the object, an oriented bounding box (OBB) is generated from the vertices of the object surface, and the eigenvectors, i.e. the principal axes of the OBB, are obtained through principal component analysis (PCA). Principal component analysis computes the covariance matrix formed by the covariances between any two variables, judges the magnitude of the correlation between the variables, and extracts a set of linearly uncorrelated variables as principal components; the eigenvectors of the covariance matrix give the directions of the OBB envelope box, thereby obtaining the minimum outer envelope of the object and the length, width and height of the object, which facilitates the stacking and storage of goods in practical applications. A sketch of this computation follows.
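For illustration, a minimal sketch of the PCA-based oriented bounding box described above:

```python
import numpy as np

def obb_size(points):
    """Sketch of the PCA-based oriented bounding box: the eigenvectors of
    the point-cloud covariance matrix give the OBB principal axes, and the
    extent of the points along each axis gives length, width and height."""
    centered = points - points.mean(axis=0)
    cov = np.cov(centered, rowvar=False)       # 3x3 covariance matrix
    eigvals, axes = np.linalg.eigh(cov)        # columns of `axes` = OBB axes
    proj = centered @ axes                     # express points in the OBB frame
    extents = proj.max(axis=0) - proj.min(axis=0)
    order = np.argsort(extents)[::-1]          # sort so length >= width >= height
    return axes[:, order], extents[order]
```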
The above embodiments are only used to illustrate the technical solutions of the present application, not to limit them; although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A method for measuring and calculating the minimum object outer envelope size of a multi-view image is characterized by comprising the following steps:
respectively shooting images of an object to be measured from two different angles by using terminal equipment, and calculating spatial coordinates of pixel points in the shot images in a world coordinate system so as to obtain a three-dimensional point cloud model of the images;
separating and extracting the point cloud of the object from the three-dimensional point cloud model of the image to complete the point cloud extraction of the object;
and for the point cloud of the object, generating an oriented envelope box from the vertices of the object surface and obtaining the eigenvectors through principal component analysis, thereby obtaining the minimum outer envelope of the object and the size of the object.
2. The method for measuring and calculating the minimum object outer envelope size of a multi-view image according to claim 1, wherein the calculating of the spatial coordinates of pixel points in the shot images in a world coordinate system to obtain a three-dimensional point cloud model of the images comprises:
if the origins of the camera coordinate system and the world coordinate system coincide, the same object has the same depth in the two coordinate systems; therefore, the relation between a pixel point p(u, v) in the image shot by the camera and its spatial coordinate (X_w, Y_w, Z_w) in the world coordinate system is:

$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f/d_x & 0 & u_0 \\ 0 & f/d_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix}$$

where f is the focal length of the camera, (u_0, v_0) is the pixel coordinate of the image center, d_x and d_y are the physical sizes of each pixel of the camera's light-sensing device, and Z_c is the Z-axis coordinate of the object in the camera coordinate system;
the point (x, y) in the camera coordinate system corresponding to the pixel point p(u, v) in the image satisfies the relation:

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} 1/d_x & 0 & u_0 \\ 0 & 1/d_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

let the coordinate of a point P on the object in the world coordinate system be (X_w, Y_w, Z_w); from the two shots taken at different angles, X_w and Y_w can be calculated from the intersection of the spatial rays through the corresponding pixels in the two images, and P lies on the ray O_C p from the optical center;
denote by O_C and O_C′ the lens centers of the two shots of the object taken from different angles; then P is the intersection of the rays O_C p and O_C′ p′, i.e. the point satisfying the epipolar geometric constraint that minimizes the reprojection error:
C(p, p′) = d(P, p)² + d(P, p′)²
where C(p, p′) is the reprojection error value, P represents a point on the object, p and p′ are the projections of the same point in the two images taken by the camera, and d(·)² is the squared Euclidean distance function;
by solving the above equation, Z_w is obtained; by the same method the spatial coordinates of all pixel points in the shot images are calculated, and these coordinate points form the three-dimensional point cloud model of the images, completing the three-dimensional point cloud expression of the images.
3. The method for measuring and calculating the minimum object outer envelope size of a multi-view image according to claim 1, wherein the separating and extracting of the point cloud of the object from the three-dimensional point cloud model of the images comprises:
extracting features from the collected point cloud data, discriminating and classifying the feature expressions of the extracted feature points, and dividing the feature points into two classes, high confidence and low confidence;
for the high-confidence feature points, taking the category corresponding to the maximum value in a feature point's feature vector as that point's category; for the low-confidence feature points, establishing a similarity expression matrix between the feature points using the position information between them;
taking the similarity expression matrix as the association between the low-confidence feature points and classifying each low-confidence feature point into the category of the high-confidence feature point with which it has the greatest association;
and aggregating the feature points divided into the same category to realize semantic segmentation of the point cloud.
4. The method for measuring and calculating the minimum object outer envelope size of a multi-view image according to claim 3, wherein the discriminating and classifying of the feature expressions of the extracted feature points to divide the feature points into high-confidence and low-confidence classes comprises:
discriminating the confidence of a feature expression using a threshold n on the ratio of the largest to the second-largest value in the feature point's feature vector: feature points whose ratio is greater than n are taken as high-confidence feature points, and those whose ratio is not greater than n as low-confidence feature points.
5. The method for measuring and calculating the minimum object outer envelope size of a multi-view image according to claim 3, wherein for the low-confidence feature points, establishing a similarity expression matrix between the feature points using the position information between them comprises:
constructing a feature correlation matrix M_fv between the feature points from the feature expression vectors of the low-confidence feature points; normalizing the coordinates of the feature points and constructing a distance correlation matrix M_dm between the feature points from the distances between their normalized coordinates; the similarity expression matrix between the feature points is then M = |M_fv − M_dm|.
6. The method for measuring and calculating the minimum object outer envelope size of a multi-view image according to claim 5, wherein in the feature correlation matrix M_fv, the similarity of the feature expression vectors V_i, V_j of any two feature points is:

$$\mathrm{sim}(V_i, V_j) = \frac{\mathrm{cov}(V_i, V_j)}{\sqrt{D(V_i)}\,\sqrt{D(V_j)}}$$

where cov(V_i, V_j) is the covariance of V_i and V_j, and D(V_i), D(V_j) are the variances of V_i and V_j respectively;
and in the distance correlation matrix M_dm, the distance between any two feature points P, Q is:

$$d(P, Q) = \sqrt{(x_P - x_Q)^2 + (y_P - y_Q)^2 + (z_P - z_Q)^2}$$

where (x_P, y_P, z_P), (x_Q, y_Q, z_Q) are the normalized coordinates of the feature points P and Q.
7. The method for measuring and calculating the minimum object outer envelope size of a multi-view image according to claim 3, wherein taking the similarity expression matrix as the association between the feature points and classifying each low-confidence feature point into the category of the high-confidence feature point with which it has the greatest association comprises:
constructing a network graph using graph theory and, in combination with the similarity expression matrix, classifying the low-confidence feature points into the categories of the high-confidence feature points with which they have the greatest association.
8. The method for measuring and calculating the minimum object outer envelope size of a multi-view image according to claim 7, wherein the constructing of a network graph using graph theory and classifying, in combination with the similarity expression matrix, the low-confidence feature points into the categories of the high-confidence feature points with the greatest association comprises:
establishing a network graph, taking all feature points as its vertices and the associations between adjacent feature points as its edges, the weight of each edge being defined by the similarity between the feature points as given by the similarity expression matrix; then calculating, from the edge weights, the probability of each low-confidence feature point reaching each high-confidence feature point, and assigning the low-confidence feature point to the category of the high-confidence feature point reached with the greatest probability.
9. A terminal device comprising a memory, a processor and a computer program stored in said memory and executable on said processor, characterized in that the steps of the method according to any of claims 1-8 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
CN202010128916.0A 2020-02-28 2020-02-28 Object minimum outer envelope size measuring and calculating method for multi-view image Active CN111340873B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010128916.0A CN111340873B (en) 2020-02-28 2020-02-28 Object minimum outer envelope size measuring and calculating method for multi-view image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010128916.0A CN111340873B (en) 2020-02-28 2020-02-28 Object minimum outer envelope size measuring and calculating method for multi-view image

Publications (2)

Publication Number Publication Date
CN111340873A 2020-06-26
CN111340873B (en) 2023-05-23

Family

ID=71187181

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010128916.0A Active CN111340873B (en) 2020-02-28 2020-02-28 Object minimum outer envelope size measuring and calculating method for multi-view image

Country Status (1)

Country Link
CN (1) CN111340873B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112001412A (en) * 2020-07-10 2020-11-27 浙江大华技术股份有限公司 Multi-view vehicle association method and related device
CN112308912A (en) * 2020-11-03 2021-02-02 长安大学 System, device and method for acquiring road surface disease homologous multi-characteristic image
CN112738393A (en) * 2020-12-25 2021-04-30 珠海西山居移动游戏科技有限公司 Focusing method and device
CN113034695A (en) * 2021-04-16 2021-06-25 广东工业大学 Wasserstein distance-based object envelope multi-view reconstruction and optimization method
CN113160414A (en) * 2021-01-25 2021-07-23 北京豆牛网络科技有限公司 Automatic identification method and device for remaining amount of goods, electronic equipment and computer readable medium
CN113888618A (en) * 2021-09-30 2022-01-04 北京工业大学 Object single piece separation method and system
CN114998665A (en) * 2022-08-04 2022-09-02 创新奇智(广州)科技有限公司 Image category identification method and device, electronic equipment and storage medium
CN115410135A (en) * 2022-11-01 2022-11-29 中国民航大学 Autonomous-type-carried aviation luggage feature perception reconstruction method and system and application thereof

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104751146A (en) * 2015-04-13 2015-07-01 中国科学技术大学 Indoor human body detection method based on 3D (three-dimensional) point cloud image
CN107025685A (en) * 2017-04-11 2017-08-08 南京林业大学 Airborne building summit cloud modeling method under topology ambiguity
CN109255813A (en) * 2018-09-06 2019-01-22 大连理工大学 A kind of hand-held object pose real-time detection method towards man-machine collaboration
CN109785379A (en) * 2018-12-17 2019-05-21 中国科学院长春光学精密机械与物理研究所 The measurement method and measuring system of a kind of symmetric objects size and weight
CN110148169A (en) * 2019-03-19 2019-08-20 长安大学 A kind of vehicle target 3 D information obtaining method based on PTZ holder camera
CN110402399A (en) * 2017-01-03 2019-11-01 创新科技有限公司 Laser radar system and method for the object that detects and classify
CN110543850A (en) * 2019-08-30 2019-12-06 上海商汤临港智能科技有限公司 Target detection method and device and neural network training method and device

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104751146A (en) * 2015-04-13 2015-07-01 中国科学技术大学 Indoor human body detection method based on 3D (three-dimensional) point cloud image
CN110402399A (en) * 2017-01-03 2019-11-01 创新科技有限公司 Laser radar system and method for the object that detects and classify
CN107025685A (en) * 2017-04-11 2017-08-08 南京林业大学 Airborne building summit cloud modeling method under topology ambiguity
CN109255813A (en) * 2018-09-06 2019-01-22 大连理工大学 A kind of hand-held object pose real-time detection method towards man-machine collaboration
CN109785379A (en) * 2018-12-17 2019-05-21 中国科学院长春光学精密机械与物理研究所 The measurement method and measuring system of a kind of symmetric objects size and weight
CN110148169A (en) * 2019-03-19 2019-08-20 长安大学 A kind of vehicle target 3 D information obtaining method based on PTZ holder camera
CN110543850A (en) * 2019-08-30 2019-12-06 上海商汤临港智能科技有限公司 Target detection method and device and neural network training method and device

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112001412A (en) * 2020-07-10 2020-11-27 浙江大华技术股份有限公司 Multi-view vehicle association method and related device
CN112308912A (en) * 2020-11-03 2021-02-02 长安大学 System, device and method for acquiring road surface disease homologous multi-characteristic image
CN112308912B (en) * 2020-11-03 2023-09-15 长安大学 System, device and method for obtaining homologous multi-feature image of pavement disease
CN112738393A (en) * 2020-12-25 2021-04-30 珠海西山居移动游戏科技有限公司 Focusing method and device
CN112738393B (en) * 2020-12-25 2022-08-09 珠海西山居移动游戏科技有限公司 Focusing method and device
CN113160414A (en) * 2021-01-25 2021-07-23 北京豆牛网络科技有限公司 Automatic identification method and device for remaining amount of goods, electronic equipment and computer readable medium
CN113160414B (en) * 2021-01-25 2024-06-07 北京豆牛网络科技有限公司 Automatic goods allowance recognition method, device, electronic equipment and computer readable medium
CN113034695A (en) * 2021-04-16 2021-06-25 广东工业大学 Wasserstein distance-based object envelope multi-view reconstruction and optimization method
CN113888618A (en) * 2021-09-30 2022-01-04 北京工业大学 Object single piece separation method and system
CN113888618B (en) * 2021-09-30 2024-05-10 北京工业大学 Object single-piece separation method and system
CN114998665A (en) * 2022-08-04 2022-09-02 创新奇智(广州)科技有限公司 Image category identification method and device, electronic equipment and storage medium
CN115410135A (en) * 2022-11-01 2022-11-29 中国民航大学 Autonomous-type-carried aviation luggage feature perception reconstruction method and system and application thereof

Also Published As

Publication number Publication date
CN111340873B (en) 2023-05-23

Similar Documents

Publication Publication Date Title
CN111340873B (en) Object minimum outer envelope size measuring and calculating method for multi-view image
US10528616B2 (en) Systems and methods for automatically generating metadata for media documents
CN109829398B (en) Target detection method in video based on three-dimensional convolution network
CN109903331B (en) Convolutional neural network target detection method based on RGB-D camera
US8175412B2 (en) Method and apparatus for matching portions of input images
WO2022170844A1 (en) Video annotation method, apparatus and device, and computer readable storage medium
US10558844B2 (en) Lightweight 3D vision camera with intelligent segmentation engine for machine vision and auto identification
CN112927353B (en) Three-dimensional scene reconstruction method, storage medium and terminal based on two-dimensional target detection and model alignment
CN106570480B (en) A kind of human action classification method based on gesture recognition
JP2014081347A (en) Method for recognition and pose determination of 3d object in 3d scene
CN108229416A (en) Robot SLAM methods based on semantic segmentation technology
CN109272577B (en) Kinect-based visual SLAM method
AU2011301774A1 (en) A method for enhancing depth maps
CN107358189B (en) Object detection method in indoor environment based on multi-view target extraction
WO2023024766A1 (en) Object size identification method, readable storage medium and object size identification system
CN112329662B (en) Multi-view saliency estimation method based on unsupervised learning
CN113501167A (en) Binocular vision-based small traditional Chinese medicine package positioning method
CN116958434A (en) Multi-view three-dimensional reconstruction method, measurement method and system
JP2014199559A (en) Viewpoint estimation device and sorter learning method therefor
CN106778763B (en) Image representation method based on attribute graph
Proenca et al. SHREC'15 Track: Retrieval of objects captured with Kinect One camera
Zhang et al. Kinect-based universal range sensor for laboratory experiments
Asif et al. Model-free segmentation and grasp selection of unknown stacked objects
Le et al. Geometry-Based 3D Object Fitting and Localizing in Grasping Aid for Visually Impaired
Zhang et al. Deep Learning Based Cross-View Human Detection System

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information
     Inventor after: Guan Yisheng; Zhu Lei; Chen Weinan; He Li
     Inventor before: He Li; Zhu Lei; Chen Weinan; Guan Yisheng