CN116129037A - Visual touch sensor, three-dimensional reconstruction method, system, equipment and storage medium thereof - Google Patents

Publication number: CN116129037A
Authority: CN (China)
Prior art keywords: matching, image, reference image, points, point
Legal status: Granted
Application number: CN202211596160.8A
Other languages: Chinese (zh)
Other versions: CN116129037B (en)
Inventors: 叶姗姗, 张勇, 陈宇
Assignee: Zhuhai Shixi Technology Co Ltd (current and original)
Application filed by Zhuhai Shixi Technology Co Ltd; priority to CN202211596160.8A; granted and published as CN116129037B. Legal status: Active.

Classifications

    • G06T17/00 Three-dimensional [3D] modelling, e.g. data description of 3D objects (G Physics; G06 Computing; G06T Image data processing or generation)
    • G06T7/13 Edge detection (under G06T7/00 Image analysis; G06T7/10 Segmentation; edge detection)
    • G06T2207/10012 Stereo images (under G06T2207/00 Indexing scheme for image analysis or enhancement; G06T2207/10 Image acquisition modality; G06T2207/10004 Still image)
    • Y02T10/40 Engine management systems (under Y02T Climate change mitigation technologies related to transportation; Y02T10/10 ICE-based vehicles)


Abstract

The application provides a visual touch sensor and a three-dimensional reconstruction method, system, equipment and storage medium thereof, and relates to the technical field of sensors. The method comprises the following steps: calibrating the binocular camera in the touch sensor and processing the two images it captures into a reference image and a matching image; dividing the reference image and the matching image into an edge point area and a non-edge point area corresponding to the reference image, and an edge point area and a non-edge point area corresponding to the matching image; calculating depth values of all matching points in the reference image and the matching image by combining the parallax corresponding to the matching points in the edge point area with the parallax corresponding to the matching points in the non-edge point area; and performing three-dimensional reconstruction of the object surface captured in the binocular images using the depth values of all the matching points in the reference image and the matching image.

Description

Visual touch sensor, three-dimensional reconstruction method, system, equipment and storage medium thereof
Technical Field
The application relates to the technical field of sensors, in particular to a visual touch sensor, a three-dimensional reconstruction method, a three-dimensional reconstruction system, three-dimensional reconstruction equipment and a storage medium thereof.
Background
In recent years, researchers have developed touch sensors based on different principles (piezoresistive, magnetic-field, photoelectric, etc.) for perception and manipulation tasks in various robots. However, the signals these sensors generate under stress are very weak, and after amplification by a signal amplifying circuit the data often contain substantial interference, which is a major obstacle for research. Compared with the traditional pressure-array touch sensor, the visual touch sensor captures the colloid deformation produced by contact through a camera system and then models it into different touch characteristics with computer vision algorithms. It offers high spatial resolution, low cost and rich sensor information, and has gradually become a hotspot in the field of tactile sensor research.
Currently, the main types are GelSight-type visual tactile sensors, depth-camera tactile sensors and binocular tactile sensors. The GelSight-type visual touch sensor combines a monocular camera with RGB color light sources and senses the contact's three-dimensional geometry with high precision through a photometric stereo algorithm, but the light-source structure remains complex to design and calibrate and the approach is difficult to extend to multi-curvature contact-surface scenes. The depth-camera tactile sensor can capture a tactile image and acquire a contact depth map of the sensor surface, but a certain imaging distance must be guaranteed, which makes its structural design redundant. The binocular tactile sensor reconstructs the contact surface in three dimensions with a binocular stereo matching algorithm, which effectively relaxes the light-source design requirements and allows the sensing principle to be extended to multi-curvature contact surfaces, though the binocular multi-medium light-refraction propagation error must be modeled additionally. However, the existing binocular stereo matching algorithms achieve unsatisfactory depth-value accuracy, are unsuited to acquiring depth information for objects with uniform surface texture, and have high stereo-matching computational complexity and a large amount of calculation, making surface reconstruction difficult.
Disclosure of Invention
In view of the above problems, the present application provides a visual tactile sensor and a three-dimensional reconstruction method, system, device and storage medium thereof that overcome or at least partially solve the above problems. They address the unsatisfactory accuracy of the depth values obtained by binocular stereo matching algorithms in the related art, so as to better complete the three-dimensional reconstruction of multi-curvature objects. The technical scheme is as follows:
in a first aspect, a three-dimensional reconstruction method of a visual tactile sensor is provided, the visual tactile sensor capturing images using a binocular camera, the method comprising:
calibrating a binocular camera in the visual touch sensor, and processing two images shot by the binocular camera into a reference image and a matching image;
respectively carrying out edge detection on the reference image and the matching image, and dividing the reference image and the matching image into an edge point area and a non-edge point area corresponding to the reference image and an edge point area and a non-edge point area corresponding to the matching image;
performing point-by-point matching on the edge point areas corresponding to the reference image and the matching image by using mark points preset on the visual touch sensor, and calculating parallax corresponding to the matching points in the edge point areas by using pixel point coordinates obtained by matching;
performing point-by-point matching on the non-edge point areas corresponding to the reference image and the matching image by using a matching algorithm, and calculating the parallax corresponding to the matching points in the non-edge point areas by using the pixel point coordinates obtained by matching;
and calculating depth values of all matching points in the reference image and the matching image by combining the parallax corresponding to the matching points in the edge point area and the parallax corresponding to the matching points in the non-edge point area, and performing three-dimensional reconstruction of the object surface captured in the binocular images by using the depth values of all the matching points in the reference image and the matching image.
In a second aspect, there is provided a three-dimensional reconstruction system for a visual tactile sensor that captures images using a binocular camera, the system comprising:
the calibration unit is used for performing calibration processing on the binocular camera in the visual touch sensor, and processing two images shot by the binocular camera into a reference image and a matching image;
the dividing unit is used for respectively carrying out edge detection on the reference image and the matching image and dividing the reference image and the matching image into an edge point area and a non-edge point area corresponding to the reference image and an edge point area and a non-edge point area corresponding to the matching image;
the first matching unit is used for carrying out point-by-point matching on the edge point areas corresponding to the reference image and the matching image by utilizing mark points which are preset on the visual touch sensor, and calculating parallax corresponding to the matching points in the edge point areas by utilizing pixel point coordinates obtained by matching;
the second matching unit is used for carrying out point-by-point matching on the non-edge point areas corresponding to the reference image and the matching image by utilizing a matching algorithm, and calculating the parallax corresponding to the matching points in the non-edge point area by utilizing the pixel point coordinates obtained by matching;
and the reconstruction unit is used for calculating depth values of all the matching points in the reference image and the matching image by combining the parallax corresponding to the matching points in the edge point area and the parallax corresponding to the matching points in the non-edge point area, and performing three-dimensional reconstruction of the object surface captured in the binocular images by using the depth values of all the matching points.
In a third aspect, there is provided a visual tactile sensor comprising: multiple layers of soft silica gel, a binocular camera, a support body and a light source, wherein the binocular camera consists of two RGB cameras and is used for acquiring reference-image pixel points and the corresponding pixel points in the matching image;
the module structure of the visual touch sensor is fingertip-shaped, consisting of a hemisphere and a cylinder; the parallax between corresponding binocular pixels captured by the binocular camera is calculated, and the three-dimensional reconstruction method of the visual tactile sensor described above is applied to the surface of a multi-curvature object using that parallax.
In a fourth aspect, a computer device is provided, comprising a memory storing a computer program and a processor implementing the steps of the three-dimensional reconstruction method of a visual tactile sensor described above when the computer program is executed.
In a fifth aspect, a computer readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, implements the steps of the three-dimensional reconstruction method of a visual tactile sensor described above.
By means of the above technical scheme, and compared with the stereo-matching three-dimensional reconstruction used in existing approaches, the visual tactile sensor and the three-dimensional reconstruction method, system, device and storage medium thereof perform edge detection on the reference image and the matching image respectively, dividing them into an edge point area and a non-edge point area corresponding to the reference image and an edge point area and a non-edge point area corresponding to the matching image. The edge point areas corresponding to the reference image and the matching image are matched point by point using the marker points preset on the visual touch sensor, and the parallax corresponding to the matching points in the edge point area is calculated from the pixel coordinates obtained by matching; the non-edge point areas corresponding to the reference image and the matching image are matched point by point using a matching algorithm, and the parallax corresponding to the matching points in the non-edge point area is calculated from the pixel coordinates obtained by matching. The stereo matching process can thus be carried out by area, which improves the image matching speed. The parallax of the matching points in the edge point area and the parallax of the matching points in the non-edge point area are then combined to calculate the depth values of all matching points in the reference image and the matching image, and those depth values are used to reconstruct the object surface captured in the binocular images in three dimensions. The images obtained by the binocular camera thereby achieve accurate pixel matching, the three-dimensional reconstruction of multi-curvature objects is better completed, and the reconstruction effect is improved.
The foregoing description is only an overview of the technical solutions of the present application. In order that the technical means of the present application may be more clearly understood and implemented according to the content of the specification, and in order to make the above and other objects, features and advantages of the present application more apparent, the detailed description of the present application follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic flow chart of a three-dimensional reconstruction method of a visual touch sensor according to an embodiment of the present application;
FIG. 2 is a flow chart of step 101 of FIG. 1;
fig. 3 is a schematic diagram of mapping relationships between corresponding coordinate systems of camera imaging models provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a coordinate transformation process according to an embodiment of the present application;
FIG. 5 is a schematic illustration of the calibration process for epipolar correction provided by an embodiment of the application;
FIG. 6 is a schematic diagram of a binocular ranging model provided in an embodiment of the application;
FIG. 7 is a schematic flow chart of median-filter image smoothing provided in an embodiment of the application;
FIG. 8 is a flow chart of step 103 of FIG. 1;
FIG. 9 is a flow chart of step 104 of FIG. 1;
FIG. 10 is a flow chart of another three-dimensional reconstruction method for a visual tactile sensor according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a visual tactile sensor according to an embodiment of the present application;
fig. 12 is a block diagram of a three-dimensional reconstruction device for a visual tactile sensor according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that such uses may be interchanged where appropriate so that embodiments of the present application described herein may be implemented in sequences other than those illustrated or described herein. Furthermore, the terms "include" and variations thereof are to be interpreted as open-ended terms meaning "including, but not limited to".
It should be noted that the three-dimensional reconstruction method of the visual touch sensor provided by the application can be applied to a terminal, for example a commercial tablet, a mobile phone, a consumer tablet, a smart television or a portable computer, or to a fixed terminal such as a desktop computer. For convenience of explanation, the terminal is taken as the execution body in this application.
Embodiments of the present application may be applied to computer systems/servers that are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the computer system/server include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, small computer systems, mainframe computer systems, and distributed cloud computing technology environments that include any of the foregoing, and the like.
A computer system/server may be described in the general context of computer-system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc., that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computing system storage media including memory storage devices.
The embodiment of the application provides a three-dimensional reconstruction method for a visual touch sensor in which the visual touch sensor captures images with a binocular camera. It can solve the problem that the accuracy of depth values obtained by binocular stereo matching algorithms in the related art is not ideal, and thus better complete the three-dimensional reconstruction of multi-curvature objects. As shown in fig. 1, the method may comprise the following steps:
101. and calibrating the binocular camera in the visual touch sensor, and processing two images shot by the binocular camera into a reference image and a matching image.
The calibration processing comprises camera calibration and epipolar-correction calibration. Camera calibration mainly standardizes the imaging parameters of the binocular camera so that the captured images can be placed in a world coordinate system, which facilitates subsequent image matching. Epipolar-correction calibration mainly aligns the two cameras' fields of view to the same observation plane so that the pixel rows of the cameras are strictly aligned and the corresponding imaging planes of the binocular camera are parallel and row-aligned; matching points then lie on the same row and need only be searched for along that row, which saves considerable time.
Specifically, in the camera calibration process, the camera imaging model determines the relation between the world coordinates of a point on the visible surface of an object in the scene and the image coordinates of the corresponding point in the image. Camera calibration computes the camera parameters involved in this imaging model and then uses them. These parameters comprise an intrinsic matrix and an extrinsic matrix: the intrinsic matrix captures the camera's internal geometric and optical characteristics, and the extrinsic matrix captures the three-dimensional orientation and position of the camera coordinate system relative to the world coordinate system.
In general, camera calibration requires a three-dimensional target in the world coordinate system, with the camera parameters solved by linear algebra. A checkerboard calibration method can be used instead: by photographing a checkerboard from multiple views, a two-dimensional target replaces the three-dimensional one, reducing the amount of calibration computation, and the distinctive checkerboard corners greatly strengthen the accuracy of the calibration result. In a specific checkerboard calibration procedure, a standard checkerboard and a suitable plane are prepared and multi-view images are captured; checkerboard corners are extracted from the images; the five intrinsic parameters and six extrinsic parameters of the ideal case are estimated; the distortion parameters under actual radial distortion are estimated by least squares and the estimate refined by maximum likelihood; and the calibration matrices of the camera, comprising the intrinsic matrix and the extrinsic matrix, are determined together with the distortion parameters.
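For illustration only (this code is not part of the patent), the checkerboard procedure described above can be sketched with OpenCV; the board dimensions, square size and image paths are assumptions:

```python
import glob
import cv2
import numpy as np

# Assumed 9x6 inner-corner checkerboard with 25 mm squares; paths are hypothetical.
pattern, square_mm = (9, 6), 25.0
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square_mm

obj_pts, img_pts = [], []
for path in glob.glob("calib/left_*.png"):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        # Refine the detected corners to sub-pixel accuracy.
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_pts.append(objp)
        img_pts.append(corners)

# Least-squares / maximum-likelihood estimation of the intrinsic matrix,
# distortion coefficients, and per-view extrinsics (rotation, translation).
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_pts, img_pts, gray.shape[::-1], None, None)
print("RMS reprojection error:", rms)
```

For the binocular pair, cv2.stereoCalibrate can then relate the two calibrated cameras, yielding the rotation R and translation T between them used in the epipolar-correction step below.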
Specifically, in the calibration process for epipolar correction, a transformation matrix is computed that corrects the two camera planes to be horizontal, and a transformation matrix that brings the centers of the two parallel planes into agreement; that is, the two coordinate systems are transformed together until the baseline is parallel to the X axis. Although the camera coordinate systems are then corrected, the intrinsic parameters of the left and right cameras still differ and the image coordinate systems are not aligned, so the intrinsic parameters must be changed and the focal lengths of the left and right cameras set to be the same.
It can be understood that after the calibration of epipolar correction, the two camera planes of the binocular camera become a single plane and each row of pixels corresponds across the two images; transforming the left and right images captured by the binocular camera makes their epipolar lines parallel. That is, every point in a given row (or column) of the left image appears in the corresponding row (or column) of the right image, which greatly reduces the amount of computation in stereo matching.
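A minimal epipolar-rectification sketch in the same vein, with placeholder calibration values standing in for the real intrinsics, distortion coefficients and stereo extrinsics:

```python
import cv2
import numpy as np

# Placeholder calibration results; in practice these come from the
# calibration step above (values here are illustrative assumptions).
image_size = (640, 480)
Kl = Kr = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
dl = dr = np.zeros(5)
R = np.eye(3)                      # rotation between the two cameras
T = np.array([[-50.0], [0], [0]])  # 50 mm baseline along X
left_raw = right_raw = np.zeros((480, 640), np.uint8)  # stand-in frames

# Rotate both cameras onto a common observation plane with equal focal lengths.
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
    Kl, dl, Kr, dr, image_size, R, T, alpha=0)

# Per-camera remap tables; after remapping, epipolar lines are horizontal
# and matching points lie on the same pixel row.
map1x, map1y = cv2.initUndistortRectifyMap(Kl, dl, R1, P1, image_size, cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(Kr, dr, R2, P2, image_size, cv2.CV_32FC1)
left_rect = cv2.remap(left_raw, map1x, map1y, cv2.INTER_LINEAR)
right_rect = cv2.remap(right_raw, map2x, map2y, cv2.INTER_LINEAR)
```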
102. And respectively carrying out edge detection on the reference image and the matching image, and dividing the reference image and the matching image into an edge point area and a non-edge point area corresponding to the reference image and an edge point area and a non-edge point area corresponding to the matching image.
It can be appreciated that existing stereo matching algorithms mainly comprise global, semi-global and local matching algorithms. However, they suffer typical stereo matching problems, such as sensitivity to external factors like illumination and noise, and poor matching in parallax-discontinuity areas, texture-less/weak-texture/repeated-texture areas and occlusion areas. Because the images shot by the binocular camera through the visual touch sensor fuse the deformation degree of the silica gel, these problems can be well mitigated.
103. And carrying out point-by-point matching on the edge point areas corresponding to the reference image and the matching image by using mark points which are preset on the visual touch sensor, and calculating parallax corresponding to the matching points in the edge point areas by using pixel point coordinates obtained by matching.
Considering that the edge point area is strongly affected by noise and illumination, existing approaches very easily produce depth-discontinuity errors and abrupt feature-point changes, and their calculation is slow. To improve the matching precision of the edge point area, the stereo vision information provided by the binocular camera and the deformation degree of the multi-layer soft silica gel are used to match the pixels one by one, which greatly improves the matching accuracy and calculation speed and meets real-time requirements.
Specifically, for the marker points preset on the visual touch sensor that fall within the edge point area, rough matching can first be completed using those marker points. Constraint conditions of the matching range are then set, which define the range of matching points within the edge point areas corresponding to the reference image and the matching image respectively; a unique matching point is then determined within that range, and the parallax corresponding to the matching point in the edge point area is calculated from the unique matching point's pixel coordinates.
104. And carrying out point-by-point matching on the reference image and a non-edge point area corresponding to the matched image by using a matching algorithm, and calculating parallax corresponding to the matched point in the non-edge point area by using pixel point coordinates obtained by matching.
For the non-edge point region, the matching error rate of weak-texture and repeated-texture areas is extremely high, so an adaptive-weight AD-Census algorithm is used as the cost function, a cross-support adaptive window as the filtering window for cost aggregation, a WTA strategy for parallax selection, and left-right consistency detection, depth-discontinuity edge correction and sub-pixel refinement for multi-stage parallax optimization.
Specifically, a cost function is constructed for the center pixel of an image: a first cost function is built from the Hamming distance of the Census transform values of the center pixel, and a second cost function from the absolute gray-level difference of the center pixel. The first and second cost functions are then weighted and combined through a fusion function to obtain the matching cost. A cross-support adaptive window is constructed under pre-configured conditions and used for cost aggregation to obtain a parallax range; within that range a WTA algorithm selects the pixel with the lowest cost value as the matched pixel, and the parallax corresponding to the matching point in the non-edge point area is calculated from the matched pixel's coordinates.
Further, to optimize the parallax values, erroneous parallaxes caused by noise and occlusion can be removed using the left-right consistency check, and sub-pixel refinement can be performed.
105. And calculating depth values of all matching points in the reference image and the matching image by combining the parallax corresponding to the matching points in the edge point area and the parallax corresponding to the matching points in the non-edge point area, and performing three-dimensional reconstruction of the object surface captured in the binocular images by using the depth values of all the matching points.
It can be understood that the parallax and the depth are in inverse proportion, and the depth of the matching point can be calculated through the parallax, so that the three-dimensional coordinates of the matching point can be calculated, and the three-dimensional coordinates of the matching point are further used for carrying out three-dimensional reconstruction on the surface of the object obtained through shooting.
Compared with the stereo-matching three-dimensional reconstruction used in existing approaches, the three-dimensional reconstruction method of the visual touch sensor performs edge detection on the reference image and the matching image respectively and divides them into an edge point area and a non-edge point area corresponding to the reference image and an edge point area and a non-edge point area corresponding to the matching image. The edge point areas of the reference image and the matching image are then matched point by point using the marker points preset on the visual touch sensor, and the parallax corresponding to the matching points in the edge point area is calculated from the pixel coordinates obtained by matching; the non-edge point areas of the reference image and the matching image are matched point by point using a matching algorithm, and the parallax corresponding to the matching points in the non-edge point area is calculated from the pixel coordinates obtained by matching. The stereo matching process is thus carried out by area, improving the image matching speed. The parallax of the matching points in the edge point area and the parallax of the matching points in the non-edge point area are further combined to calculate the depth values of all matching points in the reference image and the matching image, and these depth values are used to reconstruct the object surface captured in the binocular images in three dimensions, so that accurate pixel matching is achieved for the images obtained by the binocular camera, the three-dimensional reconstruction of the multi-curvature object is better completed, and the reconstruction effect is improved.
As an implementation manner in this embodiment, specifically, in a process of calibrating a binocular camera in a visual touch sensor and processing two images captured by the binocular camera into a reference image and a matching image, as shown in fig. 2, the method may include:
201. and (3) calibrating the binocular camera in the visual touch sensor to obtain a first parameter related to the imaging process of the binocular camera, and processing two images shot by the binocular camera into a reference image and a matching image in a world coordinate system by using the first parameter.
202. And performing epipolar-correction calibration processing on the binocular camera in the visual touch sensor to obtain a second parameter related to the imaging process of the binocular camera, and processing the two images shot by the binocular camera into a reference image and a matching image on the same observation plane by using the second parameter.
It can be understood that in the process of calibrating the binocular camera, not only must the internal and external parameters of the left and right cameras be obtained and distortion corrected to complete camera calibration, but the positional relationship between the left and right cameras must also be calibrated to complete the calibration of epipolar correction.
In the camera calibration process, camera calibration can be performed on the binocular camera in the visual touch sensor to obtain the intrinsic and extrinsic matrices involved in its imaging process; correspondence-estimation parameters between image feature points are determined with a calibration tool; the intrinsic matrix is optimized with these relation-estimation parameters; and the image coordinate systems of the two captured images are converted to the world coordinate system using the optimized intrinsic matrix and the extrinsic matrix, yielding the reference image and matching image in the world coordinate system.
In a practical application scenario, the camera imaging model involves four coordinate systems. The first is the image pixel coordinate system (μ, v), built on the image, with its origin usually at the first pixel of the image. The second is the image physical coordinate system O-xy, with its origin usually at the center pixel of the image. The third is the camera coordinate system O_c-X_cY_cZ_c, whose origin is usually the optical center of the left camera. The fourth is the world coordinate system O_ω-X_ωY_ωZ_ω, whose origin may be any location. Let the origin of the image coordinate system lie at pixel (μ_0, v_0), in mm, and let dx and dy denote how many mm each column and each row represent, i.e. 1 pixel = dx mm. The mapping between the coordinate systems of the camera imaging model is shown in fig. 3, and the transformation between the pixel coordinate system and the world coordinate system can be expressed as:

$$ Z_c \begin{bmatrix} \mu \\ v \\ 1 \end{bmatrix} = M \begin{bmatrix} R_{3\times3} & T_{3\times1} \end{bmatrix} \begin{bmatrix} X_\omega \\ Y_\omega \\ Z_\omega \\ 1 \end{bmatrix} = M\,P \begin{bmatrix} X_\omega \\ Y_\omega \\ Z_\omega \\ 1 \end{bmatrix} $$

where M is the intrinsic matrix, R_{3×3} is the rotation matrix, and the translation vector T_{3×1} describes the movement of the object from the world coordinate system to the camera coordinate system; these are determined by the orientation and position of the camera relative to the scene and constitute the extrinsic parameters, with P also called the extrinsic matrix.
In the embodiment of the invention, the camera calibration process yields the intrinsic matrix and the distortion coefficients, so distortion correction can be performed: the pixel coordinate system of the source image is converted to the camera coordinate system through the intrinsic matrix, the camera coordinates of the image are corrected with the distortion parameters, the corrected camera coordinate system is converted back to the image coordinate system through the intrinsic matrix, and the new image coordinates are assigned pixel values from the source image coordinates by interpolation. The overall coordinate-system conversion process is shown in fig. 4: image coordinates are obtained from pixel coordinates by a secondary conversion, camera coordinates are obtained from image coordinates through perspective projection, and world coordinates are obtained from camera coordinates through rigid transformation.
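A small continuation of the earlier calibration sketch for the distortion-correction step; K, dist and the frame are illustrative stand-ins:

```python
import cv2
import numpy as np

# Illustrative stand-ins; in practice K and dist come from cv2.calibrateCamera.
raw = np.zeros((480, 640, 3), np.uint8)
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
dist = np.array([-0.1, 0.01, 0.0, 0.0, 0.0])  # k1, k2, p1, p2, k3

# Inverse mapping with interpolation: each output pixel samples the source
# image at its distorted location, as described above.
undistorted = cv2.undistort(raw, K, dist)
```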
Specifically, in the calibration process of epipolar correction, epipolar correction can be performed on the binocular camera in the visual touch sensor: the two imaging planes obtained after the binocular camera's rotation matrices are computed, with the pole of one camera's imaging plane sent to infinity; the scale of the coordinate system containing the pixels of the two imaging planes is then adjusted so that the two adjusted imaging planes are parallel and aligned; and the two captured images are processed with the adjustment scales to obtain the reference image and matching image on the same observation plane.
In a practical application scenario, the calibration process of epipolar correction is shown in fig. 5. Assume the two cameras in the binocular pair rotate around their respective optical centers until their focal planes are coplanar, converting the original image plane R_o into R_n. First, a rotation matrix R_rec is used to rotate the left camera so that the pole of the left imaging plane lies at infinity; the same rotation matrix R_rec is then used to rotate the right camera, which is rotated further by the R from the extrinsic parameters; finally the scale of the coordinate system is adjusted. The two-dimensional matching search over the two adjusted images thus becomes one-dimensional, saving computation and eliminating false matching points.
Further, after the epipolar calibration, one camera can be selected from the binocular pair as the reference camera, for example the left camera; the image it captures is the reference image, the image captured by the other camera is the matching image, and the overall camera coordinate system is that of the reference camera. Assume P is a point on an object in space whose imaging points in the two cameras are p_l and p_r, with abscissas x_l and x_r. O_l and O_r are the optical centers of the two cameras, and b is the distance between them, also called the baseline distance. Let the difference of the horizontal coordinates of P's mapping points in the two images be the parallax d = x_l − x_r, let Z be the depth value, and let f be the focal length of the two cameras. The binocular ranging model is shown in fig. 6, and the similar-triangle theorem further gives:
$$ \frac{b - (x_l - x_r)}{Z - f} = \frac{b}{Z} \quad\Longrightarrow\quad Z = \frac{f\,b}{d} $$
As can be seen from the above formula, parallax and depth are inversely proportional: when the parallax approaches 0, a small change in parallax causes a large change in depth. The binocular ranging model is therefore more accurate for objects at smaller distances from the two cameras. Through binocular stereo matching and parallax-to-depth conversion, the images captured by the cameras are converted from two dimensions to three.
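A short numeric sketch of the parallax-to-depth relation Z = f·b/d, under an assumed focal length and baseline:

```python
import numpy as np

# Illustrative parameters (assumptions): focal length f in pixels,
# baseline b in mm, and a small disparity map d in pixels.
f, b = 500.0, 50.0
disparity = np.array([[25.0, 50.0], [12.5, 0.0]])

# Z = f * b / d; zero disparity means no match, mark the depth invalid (inf).
with np.errstate(divide="ignore"):
    depth = np.where(disparity > 0, f * b / disparity, np.inf)
print(depth)  # smaller disparity -> larger depth: [[1000., 500.], [2000., inf]]
```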
Before edge detection, image preprocessing can be applied to the reference image and the matching image, and edge detection is then performed on the preprocessed reference image and matching image. Image preprocessing comprises one or more of image graying, image smoothing filtering and image edge-feature extraction; it improves the visual effect and definition of the image on the one hand, and makes the image easier for a computer to process on the other, facilitating the various feature analyses. Image edge-feature extraction specifically comprises: filtering the Gaussian noise in the reference image and the matching image with a Gaussian filter, and acquiring the image edge information corresponding to the reference image and the matching image by non-maximum suppression.
For the image graying process, the RGB image may be converted into a gray image using a weighted average.
For image smoothing filtering, nonlinear median filtering can be used: the value of a point in the digital image is replaced by the median of the point values in its neighborhood, so that pixels differing greatly from the gray values of their surroundings take values close to the surrounding pixel values, eliminating isolated noise points. The flow of median-filter image smoothing is shown in fig. 7. Because it is not a simple averaging, median filtering introduces less blurring while filtering out high-frequency noise and random noise in the image.
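A minimal sketch of the graying and median-filtering steps; the random frame stands in for a captured tactile image:

```python
import cv2
import numpy as np

# Synthetic stand-in for a captured tactile frame (assumption).
img = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # weighted-average graying
smooth = cv2.medianBlur(gray, 3)              # 3x3 median: removes isolated noise points
```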
For image edge-feature extraction, a Canny operator can be used. The idea is to select a suitable Gaussian filter to remove Gaussian noise from the image, process the resulting image edge information by non-maximum suppression, and finally apply double-threshold detection to determine real and potential edges. The specific implementation steps are as follows:
First, the image is smoothed with a Gaussian filter to filter out Gaussian noise, expressed as follows:
$$ G(x, y) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right) $$

where σ is the standard deviation, G(x, y) is the Gaussian function, and (x, y) are the pixel coordinates of the image.
Then, the intensity and direction of the gradient are calculated by utilizing the finite difference of the first-order partial derivative, and the used first-order difference convolution template is as follows:
Figure BDA0003997386830000122
Figure BDA0003997386830000123
where f (x, y) represents the gray value of the image,
Figure BDA0003997386830000124
the convolution operation is represented by:
Figure BDA0003997386830000125
wherein phi (x, y), theta φ Respectively its magnitude and its corresponding direction.
Then, non-maximum suppression is applied to the gradient magnitude. After the global gradient is obtained, edge information cannot yet be fully determined: the local gradient maxima must be kept while non-maxima are suppressed, i.e. points that are not local maxima are zeroed, to obtain finer edge information. That is, the gradient intensity of the current pixel is compared with the two pixels along the positive and negative gradient directions; if the current pixel's gradient intensity is the largest of the three, the pixel is kept as an edge point, otherwise it is suppressed.
Finally, after non-maximum suppression, the remaining pixels represent the actual edges in the image more accurately; however, some edge pixels caused by noise and color variation still remain. To remove these spurious responses, edge pixels with weak gradient values must be filtered out while edge pixels with high gradient values are kept. This is done by selecting a low and a high threshold T_1 < T_2, whose values depend on the content of the input image. With φ denoting the gradient value of an edge pixel, let:

$$ \text{pixel}(x, y) = \begin{cases} \text{strong edge}, & \varphi \ge T_2 \\ \text{weak edge}, & T_1 \le \varphi < T_2 \\ \text{suppressed}, & \varphi < T_1 \end{cases} $$

so that the edges are as closed as possible.
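A compact sketch of the whole pipeline using OpenCV, whose cv2.Canny internally performs the gradient computation, non-maximum suppression and double-threshold hysteresis described above; the smoothing kernel and the thresholds T1 = 50, T2 = 150 are assumptions:

```python
import cv2
import numpy as np

gray = np.random.randint(0, 256, (480, 640), dtype=np.uint8)  # stand-in image

# Gaussian smoothing, then Canny with low/high thresholds T1 < T2.
blurred = cv2.GaussianBlur(gray, (5, 5), sigmaX=1.4)
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)

# Split the image into edge and non-edge point areas for the two-branch matching.
edge_mask = edges > 0
non_edge_mask = ~edge_mask
```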
As an implementation manner in this embodiment, specifically, in a process of performing point-by-point matching on edge point areas corresponding to the reference image and the matching image by using mark points set on the visual touch sensor in advance, and calculating parallax corresponding to the matching points in the edge point areas by using pixel point coordinates obtained by matching, as shown in fig. 8, the method may include:
301. and acquiring the mark points distributed on the edge point area by using the mark points preset on the visual touch sensor, and matching the mark points distributed on the edge point area according to the coordinate sequence, so that the mark points on the reference image and the matched image are pixel points with the same sequence.
302. And using preset constraint conditions of the matching range, respectively defining the matching ranges corresponding to the marker points in the edge point areas of the reference image and the matching image.
303. And carrying out mutual matching operation on each pixel point in the reference image and the matched image by utilizing the matching range corresponding to the mark point, and determining the unique matching point in the reference image and the matched image.
304. And calculating parallax corresponding to the matching points in the edge point area according to the pixel point coordinates corresponding to the unique matching points in the reference image and the matching image.
The constraint conditions at least comprise: the abscissa of the pixel corresponding to the matching point lies within a preset parallax range; the ordinate of the pixel corresponding to the matching point lies within a preset epipolar bandwidth; the gray-level difference intensity similarity of the pixels corresponding to the matching points is within a first similarity threshold; and the gray-level gradient direction similarity of the pixels corresponding to the matching points is within a second similarity threshold. The first similarity threshold measures the degree of similarity of the gray-level difference intensity values of two pixels, the second measures the degree of similarity of their gray-level gradient direction values, and both thresholds can be chosen according to the actual situation.
It can be appreciated that, to facilitate matching of the pixels in the reference image and the matching image, a Fibonacci grid can be used to select a preset number of grid points on the multi-layer soft silica gel in the visual touch sensor; these grid points serve as the marker points set on the visual touch sensor. Taking the optical centers corresponding to the reference image and the matching image as origins, and combining the design parameters given by the visual touch sensor, the initial depth of each marker point when the multi-layer soft silica gel is undeformed is calculated. Specifically, N uniformly distributed marker points are arranged on the multi-layer soft silica gel using the grid, where N can be set according to actual requirements; with the optical centers of the two cameras as respective origins, the initial depth of each marker point on the undeformed silica gel is calculated from the design parameters of the visual touch sensor, completing program initialization. Canny edge detection is then performed on the reference image and the matching image shot by the binocular camera, dividing each into an edge point area and a non-edge point area; finally the edge point area is matched using the optimization principle of a mathematical method, and the non-edge point area is matched using adaptive-weight AD+Census as the cost function.
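A sketch of one way to lay out the N marker points with a Fibonacci grid on the hemispherical part of the silica gel; the point count and dome radius are assumptions, and the patent's own layout may differ:

```python
import numpy as np

def fibonacci_hemisphere(n: int, radius: float) -> np.ndarray:
    """Near-uniform marker layout on a hemisphere via a Fibonacci lattice."""
    golden = (1 + 5 ** 0.5) / 2
    k = np.arange(n)
    z = (k + 0.5) / n                 # z in (0, 1]: upper hemisphere only
    theta = 2 * np.pi * k / golden    # golden-angle steps in longitude
    r = np.sqrt(1 - z ** 2)
    return radius * np.stack([r * np.cos(theta), r * np.sin(theta), z], axis=1)

markers = fibonacci_hemisphere(n=200, radius=10.0)  # N=200, 10 mm dome: assumptions
```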
In the process of matching the corresponding pixels, let the number of marker points falling in the edge area be M (M is smaller than N); these marker points are matched in one-to-one correspondence. The edge points can then be matched according to the coordinate-order consistency constraint of the M points: pixels keep the same order in the reference image as in the matching image, which greatly reduces the search range.
In setting the constraint conditions of the matching range, let the pixel coordinates of an edge point in the reference image be (μ, v), and let the candidate matching edge points in the matching image have pixel coordinates (μ_n, v_n), n = 1, 2, …, satisfying the following horizontal parallax and epipolar bandwidth constraints:

$$ \{(\mu_n, v_n) \;|\; \mu - H_{\max} \le \mu_n \le \mu + H_{\max},\; v - V_{\max} \le v_n \le v + V_{\max}\} $$

where H_max is the maximum horizontal parallax and V_max is the epipolar bandwidth, determined by the pitch of the marker points.

The gray-level difference intensity similarity between corresponding edge pixels of the reference image and the matching image is:

$$ |f_l(\mu, v) - f_r(\mu_n, v_n)| < \varepsilon $$

where f_l(μ, v) and f_r(μ_n, v_n) are the gray-level difference intensity values of the edge points in the reference image and the matching image, and ε is the threshold measuring the degree of similarity of the gray-level difference intensity values of the two images.

The gray-level gradient direction similarity between corresponding edge pixels of the reference image and the matching image is:

$$ |\theta_l(\mu, v) - \theta_r(\mu_n, v_n)| < \varepsilon $$

where θ_l(μ, v) and θ_r(μ_n, v_n) are the gradient direction values of the edge points in the reference image and the matching image, and ε is the threshold measuring the degree of similarity of the gradient direction values of the two images.
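A minimal sketch of these four constraints as a candidate filter; all thresholds (H_max, V_max and the two ε values) are illustrative assumptions:

```python
import numpy as np

def candidate_matches(p, ref_feat, cand_pts, cand_feat,
                      h_max=40, v_max=2, eps_int=10.0, eps_dir=0.2):
    """Filter candidate edge points in the matching image for a reference
    edge point p = (mu, v). Thresholds are illustrative assumptions.

    ref_feat: (intensity, direction) of p; cand_pts: (K, 2) array of
    (mu_n, v_n); cand_feat: (K, 2) array of (intensity, direction).
    """
    mu, v = p
    ok = (
        (np.abs(cand_pts[:, 0] - mu) <= h_max)               # horizontal parallax range
        & (np.abs(cand_pts[:, 1] - v) <= v_max)              # epipolar bandwidth
        & (np.abs(cand_feat[:, 0] - ref_feat[0]) < eps_int)  # gray-difference intensity
        & (np.abs(cand_feat[:, 1] - ref_feat[1]) < eps_dir)  # gradient direction
    )
    return cand_pts[ok]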
Specifically, using the pre-established constraint conditions of the matching range, the matching ranges corresponding to the pixels in the edge point areas of the reference image and the matching image are defined respectively. Each pixel in the reference image and the matching image is then subjected to a mutual matching operation over its corresponding matching range, a matching point set is acquired in the reference image and in the matching image respectively, secondary matching is performed on these matching point sets through a small-neighborhood continuity constraint, and the unique matching points in the reference image and the matching image are determined. The mutual matching operation specifically comprises: for each pixel in the reference image, a first matching area is defined in the matching image using the pixel's matching range, and each pixel in the first matching area is acquired; for each pixel in that matching area, a second matching area is defined in the reference image using the pixel's matching range, and each pixel in the second matching area is acquired.
When the constraint conditions of the matching range are established, region screening can be performed by combining the horizontal parallax and epipolar bandwidth constraints, the threshold on the similarity of the gray-level difference intensity values of the reference image and the matching image, and the threshold on the similarity of their gradient direction values. For each edge point (x, y) in the reference image, the matching set of candidate corresponding edge points satisfying the constraints in the matching image is determined as S_l = {(μ_m′, v_m′), m = 1, 2, …}. Matching is then refined with a small-neighborhood continuity constraint. For a pair of matched marker points, denote the one in the reference image as A and the one in the matching image as A′. Let U_A be a neighborhood centered on A; select an edge point a ∈ U_A, whose matching set is S_l^a = {(μ_n′, v_n′), n = 1, 2, …}; then select an edge pixel b ∈ S_l^a of the matching image, whose matching set is S_r^b = {(μ_m′, v_m′), m = 1, 2, …}.
A continuity operator T is then defined on d_{AA′}, the parallax of A and A′, and d_{ab}, the parallax of a and b. According to the continuity constraint, T = 1 when the parallaxes of the pixels adjacent to the matched pixel pair remain continuous with the pair's parallax, in which case a unique matching point corresponds; otherwise T < 1.
An objective function over the continuity operator can be further defined, and the point (c, d) in the matching image that maximizes it is taken as the matching point uniquely corresponding to (a, b), thereby completing the matching process of the edge area.
In this embodiment, specifically, in a process of performing point-by-point matching on the reference image and the non-edge point area corresponding to the matching image by using a matching algorithm, and calculating the parallax corresponding to the matching point in the non-edge point area by using the pixel point coordinates obtained by the matching, as shown in fig. 9, the method may include:
401. and constructing a matching cost function corresponding to the reference image and the matching image aiming at the two center pixel points of the reference image and the matching image.
402. And carrying out cost aggregation on the matching cost functions corresponding to the reference image and the matching image by utilizing a pre-constructed cross support self-adaptive window, determining a parallax range formed by the reference image and the matching image, and selecting parallax corresponding to a central pixel point with the lowest cost value in the parallax range as parallax corresponding to a matching point in a non-edge point area.
The matching cost functions corresponding to the reference image and the matching image comprise a first matching cost function and a second matching cost function. Specifically, for the two center pixels of the reference image and the matching image, an adaptive-weight method can be used to define the first and second matching cost functions at a preset parallax: the first matching cost function is the Hamming distance of the Census transform codes of the two center pixels, and the second is the absolute difference of the gray values of the two center pixels. The matching cost functions corresponding to the reference image and the matching image are constructed from these first and second matching cost functions at the preset parallax.
For a center point pixel (u, v), a neighborhood window with a size of m×n is selected (m, n are all odd numbers), and the Census conversion code construction process is as follows:
Figure BDA0003997386830000161
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure BDA0003997386830000162
representing successive bit concatenation operations; m 'and n' are maximum integers no greater than half of m and n, respectively; i (u, v) represents the neighborhood window center pixel gray value; i (u+i, v+j) represents the pixel gray value in the neighborhood window; ζ (x, y) represents the magnitude relation of the gray values of the two pixels; the calculation method is as follows:
Figure BDA0003997386830000163
further defining a first matching cost function as the hamming distance C of Census transformed values of two pixels corresponding to the reference image and the matching image cen (u,v,d)=aming(C cenl (u,v), cenr (u-d, v)) and taking the hamming distance of the pixel point and the Census change code of the pixel point as an initial matching cost, C cen (u, v, d) represents the matching cost value of the center pixel at parallax d, wherein C cenl (u, v) Census transform code representing the center pixel point (u, v) of the reference image, C cenr (u-d, v) represents the Census-transformed code matching the center pixel (u-d, v) of the image.
The second matching cost function is further defined as the absolute difference of the gray values:

$$C_{AD}(u,v,d) = \big|\, I_L(u,v) - I_R(u-d,\,v) \,\big|$$

where I_L(u, v) is the gray value of pixel (u, v) in the reference image and I_R(u−d, v) is the gray value of the corresponding pixel (u−d, v) in the matching image at parallax d.
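The two initial costs can then be sketched as follows for a single parallax d; the function names and the use of np.inf to mark pixels without a valid correspondence are assumptions:

import numpy as np

def hamming_cost(census_l: np.ndarray, census_r: np.ndarray, d: int) -> np.ndarray:
    """C_cen(u, v, d): popcount of the XOR of the codes of (u, v) and (u-d, v)."""
    h, w = census_l.shape
    cost = np.full((h, w), np.inf)
    x = np.bitwise_xor(census_l[:, d:], census_r[:, :w - d])
    cost[:, d:] = np.vectorize(lambda v: bin(int(v)).count("1"))(x)
    return cost

def ad_cost(gray_l: np.ndarray, gray_r: np.ndarray, d: int) -> np.ndarray:
    """C_AD(u, v, d) = |I_L(u, v) - I_R(u - d, v)|."""
    h, w = gray_l.shape
    cost = np.full((h, w), np.inf)
    cost[:, d:] = np.abs(gray_l[:, d:].astype(np.float64)
                         - gray_r[:, :w - d].astype(np.float64))
    return cost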
A fusion function is then defined:

$$\rho(C,\lambda) = 1 - \exp\!\left(-\frac{C}{\lambda}\right)$$

The first matching cost function and the second matching cost function are fused through this function, where C is a matching cost value and λ is a control parameter. The fused cost C is calculated as:

$$C(u,v,d) = \rho\big(C_{AD}(u,v,d),\,\lambda_{AD}\big) + \rho\big(C_{cen}(u,v,d),\,\lambda_{cen}\big)$$

where λ_AD and λ_cen are the control parameters of the cost functions C_AD(u, v, d) and C_cen(u, v, d), respectively.
It can be understood that the Census transform handles an overall brightness difference between the reference image and the matching image well, and its window-based form is more robust to image noise and to weak texture, so it reflects the true correlation more reliably; its drawback is ambiguity on repeated textures. The term based on the single-pixel brightness difference, in contrast, alleviates the repeated-texture ambiguity to some extent. However, the characteristics of each image differ, and the above calculation of C fixes the weights of the Census term and the single-pixel brightness-difference term; to set adjustable weights for the two terms, the calculation of C can therefore be adjusted as follows:
$$C(u,v,d) = \alpha\,\rho\big(C_{AD}(u,v,d),\,\lambda_{AD}\big) + (1-\alpha)\,\rho\big(C_{cen}(u,v,d),\,\lambda_{cen}\big)$$
When a pixel lies in a smooth image area, α is given a lower weight so that the Census transform has a larger influence on the matching cost calculation for that area; when the image texture is rich, i.e., in an edge area, α is given a higher weight so that the term based on the single-pixel brightness difference has a larger influence on the matching cost calculation.
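A minimal sketch of the fusion follows, assuming the λ values, the two α levels, and a Canny-style edge mask as stand-ins for the smooth/edge decision described above:

import numpy as np

def rho(cost: np.ndarray, lam: float) -> np.ndarray:
    # Robust normalization: maps any non-negative cost into [0, 1).
    return 1.0 - np.exp(-cost / lam)

def fused_cost(c_ad, c_cen, edge_mask, lam_ad=10.0, lam_cen=30.0,
               alpha_edge=0.7, alpha_smooth=0.3):
    # Higher alpha in edge (textured) areas favors the AD term; lower alpha
    # in smooth areas lets the Census term dominate, as in the text above.
    alpha = np.where(edge_mask, alpha_edge, alpha_smooth)
    return alpha * rho(c_ad, lam_ad) + (1.0 - alpha) * rho(c_cen, lam_cen)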
More specifically, a cross-support adaptive window for aggregating the cost can be constructed by extending four arms from each pixel p under the following conditions:

$$\begin{cases} D(p, p_i) < \tau_1, & D_s(p, p_i) \le l_2 \\ D(p, p_i) < \tau_2, & l_2 < D_s(p, p_i) \le l_1 \end{cases}$$

where D(p, p_i) is the gray-value difference between pixels p(u, v) and p_i(u_i, v_i), D_s(p, p_i) is their spatial distance along the arm, τ_2 < τ_1 are gray-difference thresholds, and l_2 < l_1 are arm-length limits determined by experiment.
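As a sketch of the arm construction, the following grows one arm of the cross from a pixel, switching to the stricter threshold τ2 once the arm exceeds l2; all numeric values are assumptions in the spirit of typical cross-based aggregation settings:

def arm_length(gray, u, v, du, dv, tau1=20.0, tau2=6.0, l1=34, l2=17):
    """Extend from pixel (v, u) in direction (dv, du) while the rule holds."""
    h, w = gray.shape
    p = float(gray[v, u])
    length = 0
    for step in range(1, l1 + 1):
        uu, vv = u + du * step, v + dv * step
        if not (0 <= uu < w and 0 <= vv < h):
            break                                  # stop at the image border
        tau = tau1 if step <= l2 else tau2         # stricter for long arms
        if abs(float(gray[vv, uu]) - p) >= tau:
            break
        length = step
    return length

Calling arm_length for the four directions (du, dv) in {(1, 0), (-1, 0), (0, 1), (0, -1)} yields the cross of each pixel, over which the fused costs are summed.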
Finally, the WTA (winner-takes-all) strategy is adopted to select the pixel point with the lowest cost value within the parallax range as the best matching point in the non-edge area, and the parallax corresponding to the best matching point is calculated.
It will be appreciated that, to ensure parallax accuracy, a left-right consistency check may also be used to reject false parallaxes caused by noise and occlusion, and sub-pixel refinement may be performed.
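Both steps can be sketched compactly; the disparity-major cost-volume layout and the 1-pixel tolerance of the consistency check are assumptions:

import numpy as np

def wta(cost_volume: np.ndarray) -> np.ndarray:
    """cost_volume has shape (D, H, W); return the per-pixel argmin disparity."""
    return np.argmin(cost_volume, axis=0)

def lr_check(disp_l: np.ndarray, disp_r: np.ndarray, tol: int = 1) -> np.ndarray:
    """Invalidate (set to -1) pixels whose left and right disparities disagree."""
    h, w = disp_l.shape
    out = disp_l.copy()
    cols = np.arange(w)[None, :].repeat(h, axis=0)
    rows = np.arange(h)[:, None].repeat(w, axis=1)
    target = cols - disp_l                 # matching column in the right view
    valid = target >= 0
    target = np.clip(target, 0, w - 1)
    agree = np.abs(disp_l - disp_r[rows, target]) <= tol
    out[~(valid & agree)] = -1
    return out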
In an actual application scene, fig. 10 shows a flow diagram of another three-dimensional reconstruction method of a visual touch sensor. In the overall architecture, the visual touch sensor is designed first and the binocular camera is calibrated, including camera calibration and epipolar correction. The captured images are then preprocessed, including graying, median filtering to smooth the images, and edge detection with the Canny algorithm. Binocular stereo matching is then performed on the images captured by the binocular camera: on the one hand, the edge point area is matched with the optimized marker-point procedure; on the other hand, the non-edge area is matched, specifically by combining the adaptive-weight AD-Census algorithm as the cost function, aggregating the cost with the cross-support adaptive window, and selecting the WTA strategy to calculate and optimize the parallax. The parallax corresponding to the matching points in the image is thereby obtained, and three-dimensional reconstruction is performed using it.
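The preprocessing stage of this flow can be sketched with OpenCV as follows; the file names, the median kernel size, and the Canny thresholds are assumptions. The resulting edge mask is what routes pixels either to the marker-based edge matching or to the AD-Census non-edge matching described above:

import cv2

ref = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)    # reference image
mat = cv2.imread("matching.png", cv2.IMREAD_GRAYSCALE)     # matching image

ref_s = cv2.medianBlur(ref, 5)         # median filtering smooths sensor noise
mat_s = cv2.medianBlur(mat, 5)

edges_ref = cv2.Canny(ref_s, 50, 150)  # Canny splits edge / non-edge areas
edges_mat = cv2.Canny(mat_s, 50, 150)
edge_area = edges_ref > 0              # True: edge point area of the reference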
Further, based on the same inventive concept, as shown in fig. 11, the embodiment of the present application further provides a visual touch sensor, specifically a visual touch sensor based on a binocular camera, comprising: multiple layers of soft silica gel, a binocular camera, a supporting body and a light source, wherein the binocular camera consists of two RGB cameras and is used for acquiring reference image pixel points and the corresponding pixel points in the matching image. The module structure corresponding to the visual touch sensor is fingertip-shaped, composed of a hemisphere and a cylinder; it is used for calculating the parallax between the corresponding binocular pixel points captured by the binocular camera, and performing the above three-dimensional reconstruction method of the visual touch sensor on the surface of a multi-curvature object by using the parallax.
It can be appreciated that existing stereo matching algorithms mainly comprise global, semi-global and local matching algorithms. However, typical stereo matching problems remain, such as sensitivity to external conditions like illumination and noise, and poor matching in parallax-discontinuous areas, texture-less/weak-texture/repeated-texture areas and occluded areas. Because the image captured through the visual touch sensor fuses the deformation of the silica gel, these problems can be well alleviated.
Further, based on the same inventive concept, the embodiment of the present application further provides a three-dimensional reconstruction system of a visual touch sensor, where the visual touch sensor captures an image using a binocular camera, as shown in fig. 12, the system includes: the calibration unit 51, the dividing unit 52, the first matching unit 53, the second matching unit 54, and the reconstruction unit 55.
A calibration unit 51, configured to perform calibration processing on the binocular camera in the visual touch sensor, and process two images captured by the binocular camera into a reference image and a matching image;
a dividing unit 52, configured to perform edge detection on the reference image and the matching image, and divide the reference image and the matching image into an edge point area and a non-edge point area corresponding to the reference image, and an edge point area and a non-edge point area corresponding to the matching image;
a first matching unit 53, configured to perform point-by-point matching on the edge point area corresponding to the reference image and the matching image by using a mark point set on the visual touch sensor in advance, and calculate parallax corresponding to a matching point in the edge point area by using pixel point coordinates obtained by the matching;
the second matching unit 54 is configured to perform point-by-point matching on the reference image and a non-edge point area corresponding to the matching image by using a matching algorithm, and calculate parallax corresponding to a matching point in the non-edge point area by using pixel point coordinates obtained by matching;
and a reconstruction unit 55, configured to calculate depth values of all matching points in the reference image and the matching image in combination with the parallax corresponding to the matching points in the edge point area and the parallax corresponding to the matching points in the non-edge point area, and perform three-dimensional reconstruction on the surface of the binocular image by using the depth values of all matching points in the reference image and the matching image.
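As a sketch of the final step of the reconstruction unit, depth can be recovered from the fused disparity map of a rectified pair through the standard relation Z = f·B/d; the focal length and baseline below are illustrative assumptions:

import numpy as np

def depth_from_disparity(disp: np.ndarray, f: float = 600.0, B: float = 40.0) -> np.ndarray:
    """Depth in the units of B wherever the disparity is valid (> 0), else 0."""
    depth = np.zeros_like(disp, dtype=np.float64)
    valid = disp > 0
    depth[valid] = f * B / disp[valid]
    return depth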
In a specific application scenario, the calibration unit 51 includes:
the first calibration module is used for performing camera calibration processing on the binocular camera in the visual touch sensor to obtain a first parameter related to the imaging process of the binocular camera, and processing two images shot by the binocular camera into a reference image and a matching image in a world coordinate system by using the first parameter;
and the second calibration module is used for carrying out polar line correction calibration processing on the binocular camera in the visual touch sensor to obtain a second parameter related to the imaging process of the binocular camera, and using the second parameter to process two images shot by the binocular camera into a reference image and a matching image on the same observation plane.
In a specific application scene, the first calibration module is specifically configured to perform calibration processing on a binocular camera in the visual touch sensor to obtain an internal reference matrix and an external reference matrix related to an imaging process of the binocular camera; determining corresponding relation estimation parameters among the image feature points by using a calibration tool, and optimizing the internal reference matrix by using the relation estimation parameters; and converting an image coordinate system corresponding to the two images shot by the binocular camera into a world coordinate system by using the optimized internal reference matrix and the external reference matrix to obtain a reference image and a matching image in the world coordinate system.
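A sketch of this step with OpenCV's checkerboard calibration tool; the pattern size, square size, and image paths are assumptions. The reprojection error returned by cv2.calibrateCamera is what allows the internal reference matrix to be iteratively refined:

import cv2
import glob
import numpy as np

pattern = (9, 6)                                  # inner checkerboard corners
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * 25.0  # mm

obj_pts, img_pts = [], []
for path in glob.glob("calib/*.png"):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_pts.append(objp)
        img_pts.append(corners)

assert obj_pts, "no checkerboard views found"
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_pts, img_pts, gray.shape[::-1], None, None)
# K is the internal reference (intrinsic) matrix; rvecs/tvecs are the
# external (extrinsic) parameters of each view.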
In a specific application scene, the second calibration module is specifically configured to perform polar line correction calibration processing on the binocular camera in the visual touch sensor, and obtain the two imaging planes produced after the binocular camera is rotated by a rotation matrix, where the epipole of the imaging plane corresponding to one camera of the binocular camera is moved to infinity; perform scale adjustment on the coordinate systems of the pixel points in the two imaging planes so that the two adjusted imaging planes are parallel and aligned; and process the two images captured by the binocular camera by using the adjusted scale to obtain a reference image and a matching image on the same observation plane.
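The rectification itself can be sketched as below; the intrinsics, distortion, rotation, and translation are placeholder values standing in for a prior stereo-calibration result:

import cv2
import numpy as np

image_size = (640, 480)                           # (width, height), assumed
K1 = K2 = np.array([[600.0, 0.0, 320.0],
                    [0.0, 600.0, 240.0],
                    [0.0, 0.0, 1.0]])
dist1 = dist2 = np.zeros(5)
R = np.eye(3)                                     # rotation between cameras
T = np.array([-40.0, 0.0, 0.0])                   # baseline along x, in mm

# Rotate both image planes so that they become parallel and row-aligned.
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
    K1, dist1, K2, dist2, image_size, R, T)
m1x, m1y = cv2.initUndistortRectifyMap(K1, dist1, R1, P1, image_size, cv2.CV_32FC1)
m2x, m2y = cv2.initUndistortRectifyMap(K2, dist2, R2, P2, image_size, cv2.CV_32FC1)

left = np.zeros((480, 640), np.uint8)             # stand-ins for the images
right = np.zeros((480, 640), np.uint8)
ref_img = cv2.remap(left, m1x, m1y, cv2.INTER_LINEAR)
match_img = cv2.remap(right, m2x, m2y, cv2.INTER_LINEAR)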
In a specific application scenario, the system further includes:
a preprocessing unit, configured to, after the binocular camera in the visual touch sensor is calibrated and the two images captured by the binocular camera are processed into the reference image and the matching image, perform image preprocessing on the reference image and the matching image, and then perform edge detection on the reference image and the matching image after the image preprocessing;
the image preprocessing comprises one or more of image graying processing, image smoothing filtering and image edge feature extraction;
The image edge feature extraction specifically comprises the following steps: and filtering Gaussian noise in the reference image and the matched image by using a Gaussian filter, and acquiring image edge information corresponding to the reference image and the matched image in a non-maximum suppression mode.
In a specific application scenario, the system further includes:
a setting unit, configured to, before the edge detection is performed on the reference image and the matching image and they are divided into the edge point area and non-edge point area corresponding to the reference image and the edge point area and non-edge point area corresponding to the matching image, select a preset number of grid points on the visual touch sensor as the mark points set on the visual touch sensor, take the optical centers corresponding to the reference image and the matching image as origins, and calculate, in combination with the design parameters given by the visual touch sensor, the initial depth corresponding to the mark points when the multi-layer soft silica gel is not deformed.
In a specific application scenario, the first matching unit 53 includes:
the acquisition module is used for acquiring the mark points distributed on the edge point area by using the mark points preset on the visual touch sensor, and matching the mark points distributed on the edge point area according to the coordinate sequence so as to enable the mark points on the reference image and the matched image to be pixel points with the same sequence;
a defining module, configured to define, using preset constraint conditions of the matching range, the matching range corresponding to the pixel points in the edge point areas corresponding to the reference image and the matching image respectively, wherein the constraint conditions at least comprise: the abscissa of the pixel corresponding to the matching point is within a preset parallax range, the ordinate of the pixel corresponding to the matching point is within a preset epipolar bandwidth range, the gray differential intensity similarity value of the pixel corresponding to the matching point is within a first similarity threshold, and the gray gradient direction similarity value of the pixel corresponding to the matching point is within a second similarity threshold;
the matching module is used for carrying out mutual matching operation on each pixel point in the reference image and the matching image by utilizing the matching range corresponding to the pixel point, and determining a unique matching point in the reference image and the matching image;
and the calculating module is used for calculating parallax corresponding to the matching points in the edge point area according to the pixel point coordinates corresponding to the unique matching points in the reference image and the matching image.
In a specific application scenario, the matching module is specifically configured to perform the mutual matching operation on each pixel point in the reference image and the matching image by using the matching range corresponding to the pixel point, obtaining a matching point set in the reference image and in the matching image respectively; and to perform secondary matching on the matching point sets acquired from the reference image and the matching image through a small-neighborhood continuity constraint, determining the unique matching points in the reference image and the matching image. The mutual matching operation specifically comprises: for each pixel point in the reference image, defining a first matching area in the matching image by using the matching range corresponding to the pixel point, and acquiring each pixel point in the first matching area; and for each pixel point in the first matching area, defining a second matching area in the reference image by using the matching range corresponding to that pixel point, and acquiring each pixel point in the second matching area.
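The mutual matching can be sketched as a cross-check over candidate sets; the window function below is a simplified stand-in for the parallax-range, epipolar-bandwidth, and similarity constraints, and the uniqueness test stands in for the small-neighborhood continuity constraint:

def window(p, q, max_dx=64, max_dy=2):
    """True if q is inside the match window of reference point p = (u, v)."""
    return 0 <= p[0] - q[0] <= max_dx and abs(p[1] - q[1]) <= max_dy

def mutual_match(ref_pts, mat_pts):
    """Keep a pair only when each point is the unique candidate of the other."""
    pairs = []
    for p in ref_pts:
        cand = [q for q in mat_pts if window(p, q)]
        if len(cand) == 1 and sum(window(r, cand[0]) for r in ref_pts) == 1:
            pairs.append((p, cand[0]))
    return pairs

ref_pts = [(120, 80), (200, 80)]                  # illustrative marker pixels
mat_pts = [(100, 80), (180, 81)]
print(mutual_match(ref_pts, mat_pts))             # both pairs survive the check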
In a specific application scenario, the second matching unit 54 includes:
the construction module is used for constructing a matching cost function corresponding to the reference image and the matching image aiming at the two center pixel points of the reference image and the matching image;
the determining module is used for carrying out cost aggregation on the matching cost functions corresponding to the reference image and the matching image, determining a parallax range formed by the reference image and the matching image, and selecting the parallax corresponding to the central pixel point with the lowest cost value in the parallax range as the parallax corresponding to the matching point in the non-edge point area.
In a specific application scenario, the construction module is specifically configured to define a first matching cost function and a second matching cost function of the two center pixel points when parallax is preset by using an adaptive weight method for the two center pixel points of the reference image and the matching image, where the first matching cost function is a hamming distance of a conversion code corresponding to the two center pixel points, and the second matching cost function is an absolute difference of gray values corresponding to the two center pixel points; and constructing matching cost functions corresponding to the reference image and the matching image according to the first matching cost function and the second matching cost function of the two central pixel points when parallax is preset.
Based on the same inventive concept, the embodiments of the present application further provide a computer readable storage medium having a computer program stored therein, wherein the computer program is configured to perform the three-dimensional reconstruction method of the visual tactile sensor of any one of the embodiments described above when running.
It will be clear to those skilled in the art that the specific working processes of the above-described systems, devices and modules may refer to the corresponding processes in the foregoing method embodiments, and are not described herein for brevity.
The above embodiments are only intended to illustrate the technical solution of the present application, not to limit it. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, within the spirit and principles of the present application; such modifications and substitutions do not depart from the scope of the present application.

Claims (14)

1. A method of three-dimensional reconstruction of a visual tactile sensor, wherein the visual tactile sensor captures images using a binocular camera, the method comprising:
Calibrating a binocular camera in the visual touch sensor, and processing two images shot by the binocular camera into a reference image and a matching image;
respectively carrying out edge detection on the reference image and the matching image, and dividing the reference image and the matching image into an edge point area and a non-edge point area corresponding to the reference image and an edge point area and a non-edge point area corresponding to the matching image;
performing point-by-point matching on the edge point areas corresponding to the reference image and the matching image by using mark points preset on the visual touch sensor, and calculating parallax corresponding to the matching points in the edge point areas by using pixel point coordinates obtained by matching;
performing point-by-point matching on the reference image and a non-edge point area corresponding to the matching image by using a matching algorithm, and calculating parallax corresponding to a matching point in the non-edge point area by using pixel point coordinates obtained by matching;
and calculating depth values of all matching points in the reference image and the matching image by combining the parallax corresponding to the matching points in the edge point area and the parallax corresponding to the matching points in the non-edge point area, and performing three-dimensional reconstruction on the surface of the binocular image by using the depth values of all the matching points in the reference image and the matching image.
2. The method according to claim 1, wherein the calibrating the binocular camera in the visual touch sensor and processing the two images captured by the binocular camera into the reference image and the matching image comprises:
performing camera calibration processing on a binocular camera in a visual touch sensor to obtain a first parameter related to the imaging process of the binocular camera, and processing two images shot by the binocular camera into a reference image and a matching image in a world coordinate system by using the first parameter;
and performing polar line correction calibration processing on the binocular camera in the visual touch sensor to obtain a second parameter related to the imaging process of the binocular camera, and processing two images shot by the binocular camera into a reference image and a matching image on the same observation plane by using the second parameter.
3. The method according to claim 2, wherein the performing camera calibration processing on the binocular camera in the visual touch sensor to obtain a first parameter related to the imaging process of the binocular camera, and processing the two images captured by the binocular camera into a reference image and a matching image in a world coordinate system by using the first parameter, comprises:
Calibrating a camera of a binocular camera in the visual touch sensor to obtain an internal reference matrix and an external reference matrix related to the imaging process of the binocular camera;
determining corresponding relation estimation parameters among the image feature points by using a calibration tool, and optimizing the internal reference matrix by using the relation estimation parameters;
and converting an image coordinate system corresponding to the two images shot by the binocular camera into a world coordinate system by using the optimized internal reference matrix and the external reference matrix to obtain a reference image and a matching image in the world coordinate system.
4. The method according to claim 2, wherein the performing polar line correction calibration processing on the binocular camera in the visual touch sensor to obtain a second parameter related to the imaging process of the binocular camera, and processing the two images captured by the binocular camera into a reference image and a matching image on the same observation plane by using the second parameter, comprises:
performing polar line correction calibration processing on the binocular camera in the visual touch sensor, and obtaining the two imaging planes produced after the binocular camera is rotated by a rotation matrix, wherein the epipole of the imaging plane corresponding to one camera of the binocular camera is moved to infinity;
Performing scale adjustment on a coordinate system where pixel points in the two imaging planes are located, so that the two adjusted imaging planes are parallel and aligned;
and processing the two images shot by the binocular camera by using the adjustment scale to obtain a reference image and a matching image on the same observation plane.
5. The method according to claim 1, wherein after the calibration process of the binocular camera in the visual tactile sensor, the two images captured by the binocular camera are processed into the reference image and the matching image, the method further comprises:
and carrying out image preprocessing on the reference image and the matched image, and respectively carrying out edge detection on the reference image and the matched image after the image preprocessing.
6. The method according to claim 1, wherein before the edge detection is performed on the reference image and the matching image respectively and they are divided into the edge point area and non-edge point area corresponding to the reference image and the edge point area and non-edge point area corresponding to the matching image, the method further comprises:
selecting a preset number of grid points on the visual touch sensor, using the preset number of grid points as marking points arranged on the visual touch sensor, taking optical centers corresponding to the reference image and the matched image as original points, and calculating initial depth corresponding to the marking points when the multi-layer soft silica gel is not deformed by combining design parameters given by the visual touch sensor.
7. The method according to claim 1, wherein the performing point-by-point matching on the edge point areas corresponding to the reference image and the matching image by using the mark points set on the visual touch sensor in advance, and calculating the parallax corresponding to the matching points in the edge point areas by using the pixel point coordinates obtained by the matching, includes:
obtaining mark points distributed on an edge point area by using mark points preset on a visual touch sensor, and matching the mark points distributed on the edge point area according to a coordinate sequence, so that the mark points on the reference image and the matched image are pixel points with the same sequence;
using constraint conditions of pre-setting a matching range, and respectively defining the matching range corresponding to the pixel points in the edge point areas corresponding to the reference image and the matching image;
performing mutual matching operation on each pixel point in the reference image and the matched image by utilizing the matching range corresponding to the pixel point, and determining a unique matching point in the reference image and the matched image;
and calculating parallax corresponding to the matching points in the edge point area according to the pixel point coordinates corresponding to the unique matching points in the reference image and the matching image.
8. The method of claim 7, wherein the performing a mutual matching operation on each pixel point in the reference image and the matching image using the matching range corresponding to the pixel point, and determining a unique matching point in the reference image and the matching image comprises:
performing mutual matching operation on each pixel point in the reference image and the matched image by utilizing the matching range corresponding to the pixel point, and respectively acquiring a matched point set in the reference image and the matched image;
and carrying out secondary matching on the matching point set obtained in the reference image and the matching image, and determining unique matching points in the reference image and the matching image.
9. The method according to claim 1, wherein the performing point-by-point matching on the reference image and the non-edge point region corresponding to the matched image by using a matching algorithm, and calculating the parallax corresponding to the matched point in the non-edge point region by using the pixel point coordinates obtained by the matching, includes:
constructing a matching cost function corresponding to the reference image and the matching image aiming at two center pixel points of the reference image and the matching image;
and carrying out cost aggregation on the matching cost functions corresponding to the reference image and the matching image, determining a parallax range formed by the reference image and the matching image, and selecting parallax corresponding to a central pixel point with the lowest cost value in the parallax range as parallax corresponding to a matching point in a non-edge point region.
10. The method according to claim 9, wherein the constructing a matching cost function corresponding to the reference image and the matching image for the two center pixel points of the reference image and the matching image comprises:
respectively defining a first matching cost function and a second matching cost function of the two central pixel points when parallax is preset aiming at the two central pixel points of the reference image and the matching image;
and constructing matching cost functions corresponding to the reference image and the matching image according to the first matching cost function and the second matching cost function of the two central pixel points when parallax is preset.
11. A three-dimensional reconstruction system for a visual tactile sensor that captures images using a binocular camera, the system comprising:
the calibration unit is used for performing calibration processing on the binocular camera in the visual touch sensor, and processing two images shot by the binocular camera into a reference image and a matching image;
the dividing unit is used for respectively carrying out edge detection on the reference image and the matching image and dividing the reference image and the matching image into an edge point area and a non-edge point area corresponding to the reference image and an edge point area and a non-edge point area corresponding to the matching image;
The first matching unit is used for carrying out point-by-point matching on the edge point areas corresponding to the reference image and the matching image by utilizing mark points which are preset on the visual touch sensor, and calculating parallax corresponding to the matching points in the edge point areas by utilizing pixel point coordinates obtained by matching;
the second matching unit is used for carrying out point-by-point matching on the reference image and the non-edge point area corresponding to the matching image by utilizing a matching algorithm, and calculating parallax corresponding to the matching point in the non-edge point area by utilizing the pixel point coordinates obtained by matching;
and the reconstruction unit is used for calculating depth values of all the matching points in the reference image and the matching image by combining the parallax corresponding to the matching points in the edge point area and the parallax corresponding to the matching points in the non-edge point area, and performing three-dimensional reconstruction on the surface of the binocular image by using the depth values of all the matching points in the reference image and the matching image.
12. A visual tactile sensor comprising: the system comprises a plurality of layers of soft silica gel, a binocular camera, a supporting body and a light source, wherein the binocular camera is two RGB cameras and is used for acquiring reference image pixel points and corresponding pixel points in a matched image;
the module structure corresponding to the visual touch sensor is fingertip-shaped and consists of a hemisphere and a cylinder, and is used for calculating the parallax between the binocular corresponding pixel points captured by the binocular camera, and performing the three-dimensional reconstruction method of the visual touch sensor according to any one of claims 1-10 on the surface of a multi-curvature object by using the parallax.
13. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the three-dimensional reconstruction method of a visual tactile sensor according to any one of claims 1 to 10.
14. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the three-dimensional reconstruction method of a visual tactile sensor according to any one of claims 1 to 10.
CN202211596160.8A 2022-12-13 2022-12-13 Visual touch sensor, three-dimensional reconstruction method, system, equipment and storage medium thereof Active CN116129037B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211596160.8A CN116129037B (en) 2022-12-13 2022-12-13 Visual touch sensor, three-dimensional reconstruction method, system, equipment and storage medium thereof

Publications (2)

Publication Number Publication Date
CN116129037A true CN116129037A (en) 2023-05-16
CN116129037B CN116129037B (en) 2023-10-31

Family

ID=86294754

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211596160.8A Active CN116129037B (en) 2022-12-13 2022-12-13 Visual touch sensor, three-dimensional reconstruction method, system, equipment and storage medium thereof

Country Status (1)

Country Link
CN (1) CN116129037B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101908230A (en) * 2010-07-23 2010-12-08 东南大学 Regional depth edge detection and binocular stereo matching-based three-dimensional reconstruction method
WO2014044126A1 (en) * 2012-09-21 2014-03-27 Wei Yiqun Coordinate acquisition device, system and method for real-time 3d reconstruction, and stereoscopic interactive device
WO2015024407A1 * 2013-08-19 2015-02-26 国家电网公司 Binocular vision navigation system and method based on power robot
CN105894499A (en) * 2016-03-25 2016-08-24 华南理工大学 Binocular-vision-based rapid detection method for three-dimensional information of space object
CN106910222A (en) * 2017-02-15 2017-06-30 中国科学院半导体研究所 Face three-dimensional rebuilding method based on binocular stereo vision
CN111508068A (en) * 2020-04-20 2020-08-07 华中科技大学 Three-dimensional reconstruction method and system applied to binocular endoscope image
WO2022021912A1 (en) * 2020-07-31 2022-02-03 南方科技大学 Low power consumption stereo matching system and method for obtaining depth information
CN114255286A (en) * 2022-02-28 2022-03-29 常州罗博斯特机器人有限公司 Target size measuring method based on multi-view binocular vision perception

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
王宣银; 潘锋; 向桂山; 梁冬泰: "Three-dimensional reconstruction method for specific human faces based on the Snake model", 机械工程学报 (Journal of Mechanical Engineering), no. 07
翟志强; 杜岳峰; 朱忠祥; 郎健; 毛恩荣: "Three-dimensional reconstruction method for farmland scenes based on Rank transform", 农业工程学报 (Transactions of the Chinese Society of Agricultural Engineering), no. 20
贾贝贝; 阮秋琦: "Three-dimensional face reconstruction method based on binocular stereo vision", 智能系统学报 (CAAI Transactions on Intelligent Systems), no. 06

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116503570A (en) * 2023-06-29 2023-07-28 聚时科技(深圳)有限公司 Three-dimensional reconstruction method and related device for image
CN117115242A (en) * 2023-10-17 2023-11-24 湖南视比特机器人有限公司 Identification method of mark point, computer storage medium and terminal equipment
CN117115242B (en) * 2023-10-17 2024-01-23 湖南视比特机器人有限公司 Identification method of mark point, computer storage medium and terminal equipment

Also Published As

Publication number Publication date
CN116129037B (en) 2023-10-31

Similar Documents

Publication Publication Date Title
CN110276808B (en) Method for measuring unevenness of glass plate by combining single camera with two-dimensional code
CN109949899B (en) Image three-dimensional measurement method, electronic device, storage medium, and program product
WO2018127007A1 (en) Depth image acquisition method and system
WO2019170164A1 (en) Depth camera-based three-dimensional reconstruction method and apparatus, device, and storage medium
CN116129037B (en) Visual touch sensor, three-dimensional reconstruction method, system, equipment and storage medium thereof
CN107833181B (en) Three-dimensional panoramic image generation method based on zoom stereo vision
CN114399554B (en) Calibration method and system of multi-camera system
CN113330486A (en) Depth estimation
CN105654547B (en) Three-dimensional rebuilding method
WO2020119467A1 (en) High-precision dense depth image generation method and device
CN109859137B (en) Wide-angle camera irregular distortion global correction method
CN113643414B (en) Three-dimensional image generation method and device, electronic equipment and storage medium
CN111981982A (en) Multi-directional cooperative target optical measurement method based on weighted SFM algorithm
CN115375842A (en) Plant three-dimensional reconstruction method, terminal and storage medium
Perdigoto et al. Calibration of mirror position and extrinsic parameters in axial non-central catadioptric systems
CN112381847A (en) Pipeline end head space pose measuring method and system
CN116309880A (en) Object pose determining method, device, equipment and medium based on three-dimensional reconstruction
CN112652020A (en) Visual SLAM method based on AdaLAM algorithm
Sun et al. A fast underwater calibration method based on vanishing point optimization of two orthogonal parallel lines
CN117197333A (en) Space target reconstruction and pose estimation method and system based on multi-view vision
CN117152330A (en) Point cloud 3D model mapping method and device based on deep learning
CN116883590A (en) Three-dimensional face point cloud optimization method, medium and system
CN116402904A (en) Combined calibration method based on laser radar inter-camera and monocular camera
CN113808070B (en) Binocular digital speckle image related parallax measurement method
CN113436269B (en) Image dense stereo matching method, device and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant