CN117953229A - Binocular image feature extraction, matching and depth solving hardware acceleration method and system based on FPGA


Info

Publication number
CN117953229A
Authority
CN
China
Prior art keywords
matching, FPGA, image, hardware acceleration, point
Prior art date
2024-02-07
Legal status
Pending
Application number
CN202410173554.5A
Other languages
Chinese (zh)
Inventor
王栋
翁睿
张立宪
陈超级
梁鲁
杨嘉楠
朱益民
Current Assignee
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date
2024-02-07
Filing date
2024-02-07
Publication date
2024-04-30
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN202410173554.5A
Publication of CN117953229A

Classifications

    • G06V10/40 Extraction of image or video features
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G06T1/60 Memory management
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention provides an FPGA-based hardware acceleration method and system for binocular image feature extraction, matching and depth calculation, belonging to the technical field of image processing hardware acceleration. It addresses the problems that traditional algorithms rely on CPU computing power and perform frequent memory read-write operations, so that CPU resources are excessively occupied and real-time mapping performance is poor. The invention exploits the FPGA's suitability for pipelined algorithms of a certain depth to process the image data from a binocular camera synchronously and in real time, greatly improving the performance of the SLAM algorithm; in particular, it solves the problem that front-end feature extraction, detection and stereoscopic depth calculation in the SLAM algorithm occupy excessive CPU resources and thereby degrade the overall performance of the algorithm.

Description

Binocular image feature extraction, matching and depth solving hardware acceleration method and system based on FPGA
Technical Field
The invention relates to the technical field of image processing hardware acceleration, in particular to a hardware acceleration method and system for binocular image feature extraction, matching and depth calculation based on an FPGA.
Background
Image features are a digital representation of image information and correspond to distinctive places in an image, including its corner points, edges and blocks. Among these, corner points are more distinguishable than edges and blocks, so an intuitive approach is to identify the corner points in different images and confirm their correspondence; in this class of methods, the corner points of an image are typically taken as its features. Corner extraction algorithms include the Harris corner, FAST corner, GFTT corner and BRISK corner detectors, among others. Image features can be divided into key points and descriptors; in most applications the corner points of one image are extracted to match against the corner points of other images. However, a corner extraction algorithm can only confirm which points in an image are corner points and gives no further description of them, so the corner points extracted this way can only serve as key points of the image and cannot be used for further feature matching. Descriptors can therefore be generated on top of a corner extraction algorithm through SIFT, SURF, ORB and other algorithms, so that complete image features can be extracted and subsequent feature matching can be performed.
Image feature matching searches for the closest correspondences between two sets of image feature points on the basis of the extracted image features. The simplest method is brute-force matching (Brute-Force Matcher): the distance between the descriptor of each feature point x_t^m and those of all feature points x_{t+1}^n is measured, the distances are sorted, and the nearest point is taken as the matching point. The distance between descriptors represents the similarity of two feature points, and different metric norms can be chosen in practice: for floating-point descriptors the Euclidean distance ||x_t^m - x_{t+1}^n|| is usually used, while for binary descriptors the Hamming distance, i.e. the number of differing bits between two binary strings, is commonly used as the metric.
In addition, cameras map coordinate points in the three-dimensional world onto a two-dimensional image plane, thereby providing spatial information to a computer. Common camera models include the monocular, binocular and RGB-D camera models, but the binocular model has advantages the other two cannot match. The monocular camera model cannot provide the specific spatial location of a pixel, because every point on the line from the camera's optical center through the normalization plane projects to the same pixel. The RGB-D camera model actively measures the depth of each pixel by emitting light toward the target, but such an emit-and-receive measurement is easily disturbed by sunlight or by infrared light emitted from other sensors, and is therefore ill-suited to outdoor use. The binocular camera model can estimate the depth of every pixel by capturing left and right images synchronously and computing the parallax between them. The formula for computing depth from parallax is very simple; the difficulty lies in determining the correspondence between left-eye and right-eye image pixels. When ranging with image features, the correspondence of binocular image features can be obtained with an image feature matching algorithm, and on that basis the depth of the feature points can be calculated from the parallax.
The FPGA (Field Programmable Gate Array) is a further development of programmable devices such as programmable array logic and generic array logic, and is customizable and reconfigurable. In high-performance computing applications the FPGA is well suited to pipelined algorithms of a certain depth: data pass through the FPGA's computing units in a fixed order and the computation results emerge naturally, while all computing modules of the FPGA operate simultaneously and independently, realizing parallel computation.
Feature extraction, matching and measurement of binocular images are often used in SLAM (simultaneous localization and mapping) as part of the front-end visual odometry, providing data for subsequent optimization algorithms. Because these algorithms involve every pixel of the image, running them on a CPU requires waiting for a complete frame of image data and traversing the image multiple times, which occupies a large amount of CPU time and thereby reduces the efficiency and accuracy of the overall SLAM algorithm.
Disclosure of Invention
The invention aims to solve the technical problems that:
the method aims to solve the problems that the traditional algorithm relies on CPU computing power and memory read-write operation is frequently performed, so that CPU resources are excessively occupied, and the real-time performance of drawing is poor.
The invention adopts the technical scheme for solving the technical problems:
The invention provides a binocular image feature extraction, matching and depth calculation hardware acceleration method based on an FPGA, which comprises the following steps:
S1, gray conversion: after the FPGA receives the RGB format data sent by the binocular camera, it converts the RGB format data into YCbCr format data and extracts Y as the brightness information of each pixel;
S2, line caching: the brightness information obtained in S1 is passed through line buffers to obtain a plurality of first-in first-out queues for subsequent feature extraction;
S3, FAST corner detection: using the image window intercepted from the data cached in S2, whether a pixel is a corner point is judged by the FAST corner detection method;
S4, Harris corner response value calculation: for the corner points detected in S3, pixel gradients are calculated with the Sobel operator, and the Harris corner response values are calculated in combination with a Gaussian smoothing window;
S5, ORB descriptor extraction: for the feature points retained in S4, the brightness values of 256 preset point pairs satisfying a Gaussian distribution are compared to obtain the corresponding 256-bit descriptor of each feature point;
S6, feature point caching: the extracted feature points are stored during the transmission of each frame of image, and the cache queue is kept sorted through a mask-based hardware insertion sorting method to obtain the final cache array;
S7, feature point matching: after the transmission of each frame of image is finished, brute-force matching is performed on the left-eye and right-eye image features, the matching results are stored through hash coding, and finally the left and right x-axis coordinates of the matched feature points, the baseline and the focal length are obtained and used to calculate the depth values of the corresponding feature points.
Further, in step S1, the conversion formula for extracting the luminance Y when converting the RGB format into the YCbCr format is
Y = 0.299R' + 0.587G + 0.114B
wherein R', G and B respectively represent the red, green and blue chromaticity values of the pixel.
Further, in step S2, for line buffering, let the width of each frame of image be W, the height H, and the bit width of each pixel 8; m-1 line buffers are provided, each of which is a first-in first-out module of size W×8;
By means of line buffering, each clock simultaneously yields pixel data from m consecutive lines in the same column, and by buffering for m clocks an m×m image window is obtained.
Further, in step S3, a 7×7 image window is intercepted by means of data buffering during FAST corner detection.
Further, in step S4, the Harris corner response value R is calculated as follows,
R = det(M) - k·trace(M)²
M = Σ_(x,y) w(x,y) · [ I_x², I_x·I_y ; I_x·I_y, I_y² ]
where det(M) is the determinant of the matrix M, k is an empirical coefficient, trace(M) is the trace of M, w(x, y) is the two-dimensional Gaussian window coefficient, and I_x and I_y are the horizontal and vertical gradient values at (x, y), respectively, which can be obtained by Sobel operator convolution.
Further, in step S5, when the ORB descriptor is extracted, the center of gravity of each window must first be calculated and the main direction of the window redetermined; after rotation, the relationship between a new pixel point (x', y') and the original pixel point (x, y) is
x' = x·cosθ - y·sinθ
y' = x·sinθ + y·cosθ
wherein θ is the rotation angle.
Further, in step S6, when caching the feature points, the Harris corner response values of the extracted feature points must be sorted to decide whether each point is stored; if new feature point information enters the feature point cache module at a clock rising edge, the position of the feature point in the array is determined according to its Harris corner response value and it is decided whether the feature point is discarded, comprising:
(1) for each received new feature point, its response value R_cur is compared with the response values (R_0, R_1, ..., R_99) of all feature points in the array through the operators > and <=, and the results are stored in two 100-bit binary arrays compare_high[99:0] and compare_low[99:0], respectively;
(2) whether the response value of each received new feature point lies in the range R_cur > R_0 && R_cur < R_1 is judged and the comparison result is stored in compare_insert[0]; the results for the ranges R_cur > R_n && R_cur < R_{n+1} are stored one by one in compare_insert[n], yielding the binary array compare_insert[99:0];
(3) each bit of compare_high, compare_low and compare_insert is expanded to 352 bits to form masks with the same bit width as the cache array, and the new feature point is repeated 100 times to form P_cur-extent with the same bit width as the cache array;
(4) the new cache array is obtained by combining the masks bitwise with the cache array and P_cur-extent: entries whose response values are larger than R_cur are kept in place through the compare_high mask, the new feature point is written at its insertion position through the compare_insert mask, and the remaining entries are shifted by one slot through the compare_low mask.
Further, in step S7, after one frame of data from the binocular camera has been transmitted, the feature point matching module receives an enabling signal and performs feature point matching by measuring Hamming distances; a match is successful when the Hamming distance is smaller than the threshold.
A system using the FPGA-based binocular image feature extraction, matching and depth calculation hardware acceleration method, comprising:
The gray conversion module is used for converting RGB format data sent to the FPGA by the binocular camera into YCbCr format data and extracting pixel brightness information;
the line buffer module is used for storing pixel brightness information into a plurality of first-in first-out queues;
the FAST corner detection module is used for judging whether the pixels stored by the line cache module are corner points or not;
the Harris angular point response value calculation module is used for calculating and sequencing the response values of the angular points and reserving a corresponding number of characteristic points with relatively high response values;
the ORB descriptor extraction module is used for carrying out ORB descriptor extraction on the reserved characteristic points;
The feature point cache module is used for sorting the extracted feature points by their Harris corner response values, evaluating the Harris corner response value of each new feature point, and updating whether the new feature point is stored, until the final cache array is obtained;
and the feature point matching module is used for performing brute-force matching on the left-eye and right-eye image features after the transmission of each frame of image is finished, saving the matching results through hash coding, and finally obtaining the left and right x-axis coordinates, baseline and focal length of the matched feature points and calculating the feature point depth.
A computer-readable storage medium storing a computer program configured to implement the steps of the FPGA-based binocular image feature extraction, matching and depth calculation hardware acceleration method when invoked by a processor.
Compared with the prior art, the invention has the beneficial effects that:
The FPGA-based binocular image feature extraction, matching and depth calculation hardware acceleration method greatly improves the performance of the SLAM algorithm; in particular it solves the problem that front-end feature extraction, detection and stereoscopic depth calculation in the SLAM algorithm occupy excessive CPU resources and degrade the overall performance of the algorithm:
(1) The invention implements in hardware a series of algorithms such as feature extraction and exploits the FPGA's suitability for pipelined algorithms of a certain depth, greatly improving the real-time performance and execution speed of the algorithms and thus the overall performance of the SLAM algorithm;
(2) The binocular camera transmits data directly to the FPGA, and the CPU receives already-processed image data that include depth information; the occupation of CPU resources is thus greatly reduced, most of the CPU's computing power can be devoted to more complex computing tasks, and CPU utilization efficiency is improved;
(3) Compared with a CPU, the FPGA consumes less energy, which reduces thermal-noise interference with the hardware, improves device reliability, and can markedly extend the battery life of the equipment.
Drawings
FIG. 1 is a flowchart of the FPGA-based binocular image feature extraction, matching and depth calculation hardware acceleration method in an embodiment of the invention;
FIG. 2 is a flowchart illustrating the operation of the line buffer module in an embodiment of the invention;
FIG. 3 is a schematic diagram of FAST corner detection in an embodiment of the invention;
FIG. 4 is a flowchart illustrating the operation of the feature point cache module in an embodiment of the invention;
FIG. 5 is a schematic diagram of the hash coding used to store feature matching results in an embodiment of the invention;
FIG. 6 is a diagram of actual image data acquired by the binocular camera in an embodiment of the invention;
FIG. 7 is a waveform diagram of key signals in the simulation environment in an embodiment of the invention;
FIG. 8 is a diagram of the extraction and matching results for binocular image feature points in an embodiment of the invention.
Detailed Description
In the description of the present invention, it should be noted that orientation terms such as "upper", "lower", "front", "rear", "left" and "right" in the embodiments are based on the positional relationships shown in the drawings and are used only to simplify the description; they do not mean that the elements and devices referred to must operate in the specific orientations described, and such orientation terms do not constitute limitations of the present invention.
In order that the above objects, features and advantages of the invention will be readily understood, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings.
Specific embodiment I: referring to FIGS. 1 to 5, the invention provides an FPGA-based binocular image feature extraction, matching and depth calculation hardware acceleration method, comprising the following steps:
S1, gray level conversion
The binocular camera sends RGB format data to the FPGA, one pixel per camera clock, 24 bits in total (8 bits per channel). After receiving the RGB data, the FPGA first converts them into YCbCr format data through the gray conversion module, extracts Y as the 8-bit brightness information of the pixel, and sends it to the line buffer module;
The feature extraction of an image is usually based on its gray values, which reflect the "brightness" of each pixel. Since the data sent by the camera are usually in RGB format, the RGB format is first converted into the YCbCr format, in which Y represents the brightness information of the pixel; the conversion formula is
Y = 0.299R' + 0.587G + 0.114B
wherein R', G and B respectively represent the red, green and blue chromaticity values of the pixel;
Because the FPGA does not natively support floating-point operations, in the implementation all coefficients are multiplied by 2^8 and converted into integers, and the conversion result is shifted right by 8 bits to obtain the final result;
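As an illustration of this fixed-point scheme, the following Verilog sketch converts one RGB pixel to its luma value with the standard BT.601 coefficients pre-scaled by 2^8 (77, 150 and 29, which sum to 256); the module and signal names are assumptions for illustration, not identifiers taken from the patent:

    // Minimal sketch, assuming 8-bit channels and BT.601 luma weights.
    module rgb_to_y (
        input            clk,
        input      [7:0] r, g, b,  // colour channels from the camera
        output reg [7:0] y         // 8-bit luma sent to the line buffer
    );
        // 0.299/0.587/0.114 scaled by 2^8; the worst case 255*256 fits in 16 bits
        wire [15:0] acc = r * 8'd77 + g * 8'd150 + b * 8'd29;
        always @(posedge clk)
            y <= acc >> 8;         // shift right by 8 bits to undo the scaling
    endmodule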
S2, line buffering
The line buffer module is composed of several first-in first-out queues (FIFOs), each of which stores one line of image data; through the line buffer module, the subsequent algorithm receives pixel data from the same column across several consecutive lines on the same rising edge, providing the basis for subsequent modules to extract windows of a specific size from the image;
This is necessary because a window of a specific size must be fetched when extracting image features (in practice the window size is 39×39, determined by the range required for ORB descriptor extraction), whereas the image data arrive at only one pixel per clock; a line buffer design is therefore introduced in order to obtain an m×m window at one time;
Let the width of each frame of image be W, the height H, and the bit width of each pixel 8; then m-1 line buffer modules are required, each of which is a first-in first-out module composed of BRAM or flip-flops integrated in the FPGA;
It follows that the size of each first-in first-out module is W×8, the same as one row of pixels. By means of line buffering, each clock simultaneously yields pixel data from m consecutive lines in the same column, and by buffering for m clocks an m×m image window is obtained; this window is used for subsequent feature extraction;
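A minimal sketch of such a line buffer follows, written with simple inferred memories rather than vendor FIFO primitives; the module name, parameter values and port names are illustrative assumptions (the window column is undefined until the first m-1 lines have been buffered):

    module line_buffer #(
        parameter W = 640,             // image width in pixels (assumed)
        parameter M = 7                // window height
    )(
        input                clk,
        input                pix_valid,
        input        [7:0]   pix_in,   // one 8-bit luma pixel per clock
        output reg [8*M-1:0] col_out   // M same-column pixels, newest in bits [7:0]
    );
        reg [7:0] line [0:M-2][0:W-1]; // M-1 line memories (BRAM or flip-flops)
        reg [$clog2(W)-1:0] wr_ptr = 0;
        integer i;
        always @(posedge clk) if (pix_valid) begin
            col_out[7:0] <= pix_in;    // current row
            for (i = 0; i < M-1; i = i + 1)
                col_out[8*(i+1) +: 8] <= line[i][wr_ptr];  // rows above
            line[0][wr_ptr] <= pix_in; // cascade each column entry one line down
            for (i = 1; i < M-1; i = i + 1)
                line[i][wr_ptr] <= line[i-1][wr_ptr];
            wr_ptr <= (wr_ptr == W-1) ? 0 : wr_ptr + 1;
        end
    endmodule

Registering the column output for m consecutive clocks then yields the full m×m window.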
S3, FAST corner detection
The FAST corner detection module receives the input of the line buffer module from step S2, intercepts a 7×7 image window by means of data buffering, judges whether a pixel is a corner point through the FAST corner detection algorithm, and sends the judgment result to the subsequent modules;
FAST is a corner extraction algorithm: if a pixel differs greatly from enough pixels in its surrounding neighborhood, it is judged to be a corner point;
As shown in FIG. 3, if 9 consecutive of the 16 pixels on the ring have values greater or smaller than the central pixel by more than a certain threshold (the threshold can be set to 50), the central point is determined to be a corner point;
In the specific hardware algorithm, a 7×7 window is first obtained through the line buffer, and the 16 pixels on the ring in FIG. 3 are compared with the central pixel, yielding two 16-bit binary arrays (one for "brighter", one for "darker"); the central pixel is judged to be a corner point of the image if and only if one of the arrays contains 9 consecutive high bits;
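The consecutive-bits test maps naturally onto combinational logic. In the sketch below (module and signal names assumed for illustration), the 16-bit comparison vector is extended circularly and every 9-bit run is AND-reduced; the module would be instantiated twice, once for the "brighter" array and once for the "darker" one:

    module fast_segment9 (
        input  [15:0] ring_bits,   // 1 where a ring pixel differs from the center by > threshold
        output        is_corner
    );
        wire [23:0] ring = {ring_bits[7:0], ring_bits};  // wrap for circular runs
        wire [15:0] run_hit;
        genvar k;
        generate
            for (k = 0; k < 16; k = k + 1) begin : runs
                assign run_hit[k] = &ring[k +: 9];  // 9 consecutive ones from position k
            end
        endgenerate
        assign is_corner = |run_hit;  // any start position with a 9-long run
    endmodule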
S4, Harris corner response value calculation
The Harris corner response value calculation module captures a 9×9 image window, calculates the pixel gradients with the Sobel operator, and computes the Harris corner response value in combination with a Gaussian smoothing window; the calculated response values are used for feature point sorting: the higher the response value, the more likely the point is a corner and the more it should be retained by subsequent algorithms;
This is necessary because data lengths in the FPGA are fixed and there is no variable-length array data format as in high-level programming languages; if more feature points were extracted from a frame than the array length, they would be difficult to store. Therefore the Harris response value is used as the sorting key: the extracted feature points are sorted according to their Harris corner response values, and only the feature points with higher response values are stored;
The Harris corner response R is calculated as follows,
R = det(M) - k·trace(M)²
M = Σ_(x,y) w(x,y) · [ I_x², I_x·I_y ; I_x·I_y, I_y² ]
wherein det(M) is the determinant of the matrix M, k is an empirical coefficient, trace(M) is the trace of M, w(x, y) is the two-dimensional Gaussian window coefficient, and I_x and I_y are the horizontal and vertical gradient values at (x, y), respectively, obtained by convolution with the Sobel operator;
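A hedged sketch of the response computation is given below. It assumes the Gaussian-weighted window sums of the gradient products have already been accumulated by upstream pipeline stages, and it approximates the empirical coefficient k ≈ 0.04 with the shift-only fraction 1/16 - 1/64 - 1/256 ≈ 0.043, since the patent does not state its fixed-point encoding of k:

    module harris_response #(
        parameter W = 24                  // width of the accumulated window sums (assumed)
    )(
        input  signed [W-1:0]   sxx,      // sum of w(x,y)*Ix*Ix over the window
        input  signed [W-1:0]   syy,      // sum of w(x,y)*Iy*Iy over the window
        input  signed [W-1:0]   sxy,      // sum of w(x,y)*Ix*Iy over the window
        output signed [2*W+2:0] r         // R = det(M) - k*trace(M)^2
    );
        wire signed [2*W:0]   det = sxx * syy - sxy * sxy;
        wire signed [W:0]     tr  = sxx + syy;
        wire signed [2*W+1:0] tr2 = tr * tr;
        // k*trace^2 realised with arithmetic shifts only, so no extra multiplier is needed
        assign r = det - ((tr2 >>> 4) - (tr2 >>> 6) - (tr2 >>> 8));
    endmodule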
S5, ORB descriptor extraction
The image feature points consist of key points and descriptors; the key points of the image are determined through the FAST detection and Harris corner response calculation above, and the descriptors of the key points must be extracted in addition. The ORB descriptor is a binary descriptor, represented in Verilog HDL by a reg type variable of length 256;
The ORB descriptor extraction module compares the brightness values of 256 preset point pairs satisfying a Gaussian distribution, thereby obtaining the 256-bit descriptor. Because it takes the rotation of the image into account, the ORB descriptor has rotation invariance compared with the BRIEF descriptor and is more effective;
The ORB descriptor randomly selects point pairs (x_0, y_0) and (x_1, y_1) in a 31×31 window around the key point and sets the binary result to 0 or 1 according to the comparison of their gray values; randomly selecting 256 point pairs generates a 256-bit binary array, which is the descriptor of the key point;
It should be noted that for each window the center of gravity must first be calculated and the main direction of the window redetermined, which guarantees the rotation invariance of the descriptor;
After rotation, the relationship between a new pixel point (x', y') and the original pixel point (x, y) is
x' = x·cosθ - y·sinθ
y' = x·sinθ + y·cosθ
wherein θ is the rotation angle;
As in step S1, since the FPGA is not suited to floating-point operations, sinθ and cosθ must be scaled by an appropriate factor into integers, and the calculation result is shifted right to obtain the final result;
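A sketch of this scaled rotation is shown below; the Q8 scale (2^8) and the lookup-table source of the sine/cosine values are assumptions made for illustration. Sampling offsets within the 31×31 window stay within roughly ±22 after rotation, so an 8-bit signed result is sufficient:

    module rotate_point (
        input  signed [7:0] x, y,            // original sampling offset
        input  signed [9:0] cos_q8, sin_q8,  // cos(theta)*256, sin(theta)*256 from a LUT
        output signed [7:0] xr, yr           // rotated sampling offset
    );
        // x' = x*cos - y*sin, y' = x*sin + y*cos, all in Q8 fixed point
        wire signed [18:0] xt = x * cos_q8 - y * sin_q8;
        wire signed [18:0] yt = x * sin_q8 + y * cos_q8;
        assign xr = xt >>> 8;                // shift right to undo the Q8 scaling
        assign yr = yt >>> 8;
    endmodule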
S6, caching feature points
The feature point cache module stores the feature points extracted during the transmission of each frame of image and can store at most 100 feature points; each feature point is encoded in 352 bits (32-bit coordinates + 64-bit response value + 256-bit descriptor), and the cache queue is kept sorted through a mask-based hardware insertion sorting algorithm so that only the 100 feature points with the largest Harris response values are saved. The more feature points are kept, the greater the demand on FPGA memory resources; 100 feature points were settled on in view of these resource limits, and in actual experiments 100 feature points satisfy the requirements of image matching;
Because FPGA data formats are fixed, the Harris corner response values of the extracted feature points must be sorted and it must be decided whether each point is stored. Assume that at a certain clock rising edge new feature point information enters the feature point cache module; the module determines the position of the feature point in the array according to its Harris corner response value, and if the response value is smaller than the response values of all feature points in the array, the feature point is discarded. The process is shown in FIG. 4;
Assuming that the cache array (buffer) can store at most 100 feature point records, each occupying 352 bits (32-bit coordinates + 64-bit response value + 256-bit descriptor), the method comprises the following steps:
(1) for each received new feature point, its response value R_cur is compared with the response values (R_0, R_1, ..., R_99) of all feature points in the array through the operators > and <=, and the results are stored in two 100-bit binary arrays compare_high[99:0] and compare_low[99:0], respectively;
(2) whether the response value of each received new feature point lies in the range R_cur > R_0 && R_cur < R_1 is judged and the comparison result is stored in compare_insert[0]; the results for the ranges R_cur > R_n && R_cur < R_{n+1} are stored one by one in compare_insert[n], yielding the binary array compare_insert[99:0];
(3) each bit of compare_high, compare_low and compare_insert is expanded to 352 bits to form masks with the same bit width as the cache array, and the new feature point is repeated 100 times to form P_cur-extent with the same bit width as the cache array;
(4) the new cache array is obtained by combining the masks bitwise with the cache array and P_cur-extent: entries whose response values are larger than R_cur are kept in place through the compare_high mask, the new feature point is written at its insertion position through the compare_insert mask, and the remaining entries are shifted by one slot through the compare_low mask;
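A reduced sketch of the mask-and-merge step in (3) and (4) follows. It assumes a descending-sorted buffer with the largest response in slot 0, assumes the three per-slot masks are mutually exclusive, and uses N = 4 entries instead of 100 so the slot logic stays readable:

    module feature_insert #(
        parameter N = 4,                // 100 in the text; reduced here for readability
        parameter E = 352               // 32-bit coords + 64-bit response + 256-bit descriptor
    )(
        input  [N*E-1:0] buffer,        // sorted cache, slot 0 holds the largest response
        input  [N-1:0]   keep_mask,     // compare_high: buffered R >  R_cur
        input  [N-1:0]   insert_mask,   // compare_insert: one-hot insertion slot
        input  [N-1:0]   shift_mask,    // compare_low minus the insert slot: R <= R_cur
        input  [E-1:0]   p_cur,         // the new feature point record
        output [N*E-1:0] buffer_next
    );
        genvar i;
        generate
            for (i = 0; i < N; i = i + 1) begin : slots
                wire [E-1:0] shifted;   // the entry one slot above, or zero at the top
                if (i == 0) begin : top
                    assign shifted = {E{1'b0}};
                end else begin : below
                    assign shifted = buffer[(i-1)*E +: E];
                end
                // replicate each mask bit across the 352-bit entry, then merge
                assign buffer_next[i*E +: E] =
                      ({E{keep_mask[i]}}   & buffer[i*E +: E])  // keep in place
                    | ({E{insert_mask[i]}} & p_cur)             // write the new point
                    | ({E{shift_mask[i]}}  & shifted);          // shift down one slot
            end
        endgenerate
    endmodule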
S7, feature point matching
The feature point matching module performs brute-force matching on the left-eye and right-eye image features after the transmission of each frame of image is finished and saves the matching results through hash coding;
The feature point matching adopts the brute-force matching method (Brute-Force Matcher): when one frame of data from the binocular camera has been transmitted, the feature point matching module receives an enabling signal and performs feature point matching. Because the ORB descriptor is a binary (BRIEF-type) descriptor, the Hamming distance is used as the metric; in practice the threshold is set to 40, and a match is successful if the Hamming distance is smaller than the threshold;
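A sketch of the distance check for one descriptor pair is shown below; the synthesis tool maps the bit-count loop onto an adder tree, and the names are illustrative assumptions:

    module hamming_match #(
        parameter THRESH = 40              // matching threshold from the text
    )(
        input  [255:0] desc_a, desc_b,     // two 256-bit ORB descriptors
        output         matched
    );
        wire [255:0] diff = desc_a ^ desc_b;  // 1 wherever the descriptors differ
        reg  [8:0]   dist;                    // 0..256 fits in 9 bits
        integer i;
        always @* begin
            dist = 0;
            for (i = 0; i < 256; i = i + 1)
                dist = dist + diff[i];        // population count of the XOR
        end
        assign matched = (dist < THRESH);
    endmodule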
Since the left-eye and right-eye cameras each provide 100 feature points per frame of data, brute-force matching covers 100 × 100 = 10000 matching combinations. For the matching results the algorithm adopts a hash-coding method: the matching result of the x-th feature point of the left-eye camera and the y-th feature point of the right-eye camera is stored at position 100x + y, as shown in FIG. 5;
The matching result of each pair of feature points is encoded in 65 bits (16 × 4 + 1: left-eye u_L + left-eye v_L + right-eye u_R + right-eye v_R + flag bit), and the results of all matching combinations are transmitted to the next module for processing;
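The sketch below illustrates this addressing scheme, with the multiplication by 100 decomposed into shifts (100 = 64 + 32 + 4) so no multiplier is needed; the memory and port names are assumptions for illustration:

    module match_store (
        input             clk,
        input             we,
        input      [6:0]  x, y,            // left/right feature indices, 0..99
        input      [15:0] ul, vl, ur, vr,  // matched pixel coordinates
        input             hit,             // flag: Hamming distance below threshold
        input      [13:0] rd_addr,
        output reg [64:0] record_out
    );
        reg [64:0] table_mem [0:9999];     // one 65-bit record per combination
        wire [13:0] wr_addr = (x << 6) + (x << 5) + (x << 2) + y;  // 100*x + y
        always @(posedge clk) begin
            if (we) table_mem[wr_addr] <= {ul, vl, ur, vr, hit};   // 16*4+1 = 65 bits
            record_out <= table_mem[rd_addr];
        end
    endmodule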
S8, depth calculation
The depth calculation module calculates the depth of each feature point from the matched feature points' left and right x-axis coordinates, the baseline and the focal length, according to the binocular camera model;
The depth calculation module receives the result of the feature point matching module; if the flag bit of a matching combination is 1, indicating that the two feature points were matched successfully, it calculates the feature point depth according to the binocular camera model with the following formula,
z = f_x · b / (u_L - u_R)
where f_x is the focal length, b is the baseline length of the left and right cameras, and u_L - u_R is the disparity of the matched feature point pair.
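The sketch below computes this quotient for one matched pair. The parameter values are illustrative placeholders rather than calibration data from the patent, and a practical pipeline would replace the bare '/' with a pipelined divider core:

    module stereo_depth #(
        parameter [15:0] FX_PIX  = 16'd458,  // focal length in pixels (assumed value)
        parameter [15:0] BASE_MM = 16'd110   // baseline in millimetres (assumed value)
    )(
        input  [15:0] ul, ur,                // matched left/right x-coordinates, ul > ur
        output [31:0] depth_mm               // z = fx * b / (uL - uR)
    );
        wire [15:0] disparity = ul - ur;
        assign depth_mm = (disparity == 0) ? 32'd0
                        : (FX_PIX * BASE_MM) / disparity;
    endmodule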
Specific embodiment II: the invention provides a system using the FPGA-based binocular image feature extraction, matching and depth calculation hardware acceleration method, comprising:
The gray conversion module is used for converting RGB format data sent to the FPGA by the binocular camera into YCbCr format data and extracting pixel brightness information;
the line buffer module is used for storing pixel brightness information into a plurality of first-in first-out queues;
the FAST corner detection module is used for judging whether the pixels stored by the line cache module are corner points or not;
the Harris angular point response value calculation module is used for calculating and sequencing the response values of the angular points and reserving a corresponding number of characteristic points with relatively high response values;
the ORB descriptor extraction module is used for carrying out ORB descriptor extraction on the reserved characteristic points;
The feature point cache module is used for sorting the extracted feature points by their Harris corner response values, evaluating the Harris corner response value of each new feature point, and updating whether the new feature point is stored, until the final cache array is obtained;
and the feature point matching module is used for performing brute-force matching on the left-eye and right-eye image features after the transmission of each frame of image is finished, saving the matching results through hash coding, and finally obtaining the left and right x-axis coordinates, baseline and focal length of the matched feature points and calculating the feature point depth.
Other combinations and connection relationships of this embodiment are the same as those of the first embodiment.
Specific embodiment III: the invention provides a computer-readable storage medium storing a computer program configured to implement the steps of the FPGA-based binocular image feature extraction, matching and depth calculation hardware acceleration method when invoked by a processor.
Other combinations and connection relationships of this embodiment are the same as those of the first embodiment.
Simulation experiment
The experiment uses an indoor unmanned aerial vehicle binocular visual-inertial dataset produced by ETH Zurich (the Swiss Federal Institute of Technology in Zurich); a factory environment from the dataset is selected, as shown in FIG. 6. The left-eye and right-eye images of the dataset are read in the simulation environment, and the waveforms of the digital signals are examined with EDA tools; the result is shown in FIG. 7. The feature point matching result for the binocular images, finally obtained as shown in FIG. 8, shows that the algorithm achieves excellent feature point matching; with the camera parameters (focal length and baseline length) determined, the depth calculation module can compute the depth of a successfully matched point pair one clock period later.
In the simulation experiment the algorithm is described in hardware with Verilog HDL and verified in a simulation environment. The experiment demonstrates that the algorithm greatly improves real-time computing performance while preserving operational precision, thereby freeing CPU resources and effectively improving the overall performance of the SLAM algorithm.
Although the present disclosure is described above, its scope of protection is not limited thereto. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the disclosure, and such changes and modifications fall within the scope of protection of the disclosure.

Claims (10)

1. An FPGA-based binocular image feature extraction, matching and depth calculation hardware acceleration method, characterized by comprising the following steps:
S1, gray conversion: after the FPGA receives the RGB format data sent by the binocular camera, it converts the RGB format data into YCbCr format data and extracts Y as the brightness information of each pixel;
S2, line caching: the brightness information obtained in S1 is passed through line buffers to obtain a plurality of first-in first-out queues for subsequent feature extraction;
S3, FAST corner detection: using the image window intercepted from the data cached in S2, whether a pixel is a corner point is judged by the FAST corner detection method;
S4, Harris corner response value calculation: for the corner points detected in S3, pixel gradients are calculated with the Sobel operator, and the Harris corner response values are calculated in combination with a Gaussian smoothing window;
S5, ORB descriptor extraction: for the feature points retained in S4, the brightness values of 256 preset point pairs satisfying a Gaussian distribution are compared to obtain the corresponding 256-bit descriptor of each feature point;
S6, feature point caching: the extracted feature points are stored during the transmission of each frame of image, and the cache queue is kept sorted through a mask-based hardware insertion sorting method to obtain the final cache array;
S7, feature point matching: after the transmission of each frame of image is finished, brute-force matching is performed on the left-eye and right-eye image features, the matching results are stored through hash coding, and finally the left and right x-axis coordinates of the matched feature points, the baseline and the focal length are obtained and used to calculate the depth values of the corresponding feature points.
2. The FPGA-based binocular image feature extraction, matching and depth calculation hardware acceleration method of claim 1, wherein: in step S1, the conversion formula for extracting the luminance Y when converting the RGB format into the YCbCr format is
Y = 0.299R' + 0.587G + 0.114B
wherein R', G and B respectively represent the red, green and blue chromaticity values of the pixel.
3. The FPGA-based binocular image feature extraction, matching and depth calculation hardware acceleration method of claim 2, wherein: in step S2, for line buffering, let the width of each frame of image be W, the height H, and the bit width of each pixel 8; m-1 line buffers are provided, each of which is a first-in first-out module of size W×8;
By means of line buffering, each clock simultaneously yields pixel data from m consecutive lines in the same column, and by buffering for m clocks an m×m image window is obtained.
4. The FPGA-based binocular image feature extraction, matching and depth calculation hardware acceleration method of claim 3, wherein: in step S3, a 7×7 image window is intercepted by means of data buffering during FAST corner detection.
5. The FPGA-based binocular image feature extraction, matching and depth calculation hardware acceleration method of claim 4, wherein: in step S4, the Harris corner response value R is calculated as follows,
R = det(M) - k·trace(M)²
M = Σ_(x,y) w(x,y) · [ I_x², I_x·I_y ; I_x·I_y, I_y² ]
where det(M) is the determinant of the matrix M, k is an empirical coefficient, trace(M) is the trace of M, w(x, y) is the two-dimensional Gaussian window coefficient, and I_x and I_y are the horizontal and vertical gradient values at (x, y), respectively, which can be obtained by Sobel operator convolution.
6. The FPGA-based binocular image feature extraction, matching and depth calculation hardware acceleration method of claim 5, wherein: in step S5, when the ORB descriptor is extracted, the center of gravity of each window must first be calculated and the main direction of the window redetermined; after rotation, the relationship between a new pixel point (x', y') and the original pixel point (x, y) is
x' = x·cosθ - y·sinθ
y' = x·sinθ + y·cosθ
wherein θ is the rotation angle.
7. The FPGA-based binocular image feature extraction, matching and depth calculation hardware acceleration method of claim 6, wherein: in step S6, when caching the feature points, the Harris corner response values of the extracted feature points must be sorted to decide whether each point is stored; if new feature point information enters the feature point cache module at a clock rising edge, the position of the feature point in the array is determined according to its Harris corner response value and it is decided whether the feature point is discarded, comprising:
(1) for each received new feature point, its response value R_cur is compared with the response values (R_0, R_1, ..., R_99) of all feature points in the array through the operators > and <=, and the results are stored in two 100-bit binary arrays compare_high[99:0] and compare_low[99:0], respectively;
(2) whether the response value of each received new feature point lies in the range R_cur > R_0 && R_cur < R_1 is judged and the comparison result is stored in compare_insert[0]; the results for the ranges R_cur > R_n && R_cur < R_{n+1} are stored one by one in compare_insert[n], yielding the binary array compare_insert[99:0];
(3) each bit of compare_high, compare_low and compare_insert is expanded to 352 bits to form masks with the same bit width as the cache array, and the new feature point is repeated 100 times to form P_cur-extent with the same bit width as the cache array;
(4) the new cache array is obtained by combining the masks bitwise with the cache array and P_cur-extent: entries whose response values are larger than R_cur are kept in place through the compare_high mask, the new feature point is written at its insertion position through the compare_insert mask, and the remaining entries are shifted by one slot through the compare_low mask.
8. The FPGA-based binocular image feature extraction, matching and depth calculation hardware acceleration method of claim 7, wherein: in step S7, after one frame of data from the binocular camera has been transmitted, the feature point matching module receives an enabling signal and performs feature point matching by measuring Hamming distances; the matching is successful if the Hamming distance is smaller than the threshold.
9. A system utilizing the FPGA-based binocular image feature extraction, matching and depth calculation hardware acceleration method of any one of claims 1-8, characterized by comprising:
The gray conversion module is used for converting RGB format data sent to the FPGA by the binocular camera into YCbCr format data and extracting pixel brightness information;
the line buffer module is used for storing pixel brightness information into a plurality of first-in first-out queues;
the FAST corner detection module is used for judging whether the pixels stored by the line cache module are corner points or not;
the Harris angular point response value calculation module is used for calculating and sequencing the response values of the angular points and reserving a corresponding number of characteristic points with relatively high response values;
the ORB descriptor extraction module is used for carrying out ORB descriptor extraction on the reserved characteristic points;
The feature point cache module is used for sorting the extracted feature points by their Harris corner response values, evaluating the Harris corner response value of each new feature point, and updating whether the new feature point is stored, until the final cache array is obtained;
and the feature point matching module is used for performing brute-force matching on the left-eye and right-eye image features after the transmission of each frame of image is finished, saving the matching results through hash coding, and finally obtaining the left and right x-axis coordinates, baseline and focal length of the matched feature points and calculating the feature point depth.
10. A computer-readable storage medium, characterized in that: the computer-readable storage medium stores a computer program configured to implement the steps of the FPGA-based binocular image feature extraction, matching and depth calculation hardware acceleration method of any one of claims 1-8 when invoked by a processor.
CN202410173554.5A 2024-02-07 2024-02-07 Binocular image feature extraction, matching and depth solving hardware acceleration method and system based on FPGA Pending CN117953229A (en)

Priority Applications (1)

Application Number: CN202410173554.5A; Priority Date: 2024-02-07; Filing Date: 2024-02-07; Title: Binocular image feature extraction, matching and depth solving hardware acceleration method and system based on FPGA

Publications (1)

Publication Number: CN117953229A; Publication Date: 2024-04-30

Family ID: 90799617



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination