CN116416278A - Visual tracking method, device, apparatus, medium and program product - Google Patents

Visual tracking method, device, apparatus, medium and program product

Info

Publication number
CN116416278A
Authority
CN
China
Prior art keywords
tracking
image
frame image
target
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310119922.3A
Other languages
Chinese (zh)
Inventor
杨剑
张超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Navinfo Information Technology Co ltd
Original Assignee
Xi'an Navinfo Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Navinfo Information Technology Co ltd filed Critical Xi'an Navinfo Information Technology Co ltd
Priority to CN202310119922.3A priority Critical patent/CN116416278A/en
Publication of CN116416278A publication Critical patent/CN116416278A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/20 - Analysis of motion
    • G06T 7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/269 - Analysis of motion using gradient-based methods
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30248 - Vehicle exterior or interior
    • G06T 2207/30252 - Vehicle exterior; Vicinity of vehicle
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The embodiment of the specification discloses a visual tracking method, apparatus, device, medium and program product. In this scheme, the first confidence of the tracking result in the Nth frame image reflects the complexity of the current tracking scene, and this confidence is used as a prior to determine the values of a first parameter set of the visual tracking algorithm for visual tracking in the (N+1)th frame image. The computing power required by the visual tracking algorithm is thus adjusted dynamically according to the complexity of the current tracking scene while tracking accuracy is maintained. Compared with conventional visual tracking algorithms, which use the same control parameters in every scene, the visual tracking method of the embodiments of this specification reduces the computing power consumed in most scenes without sacrificing tracking accuracy, thereby increasing the frame rate of the visual tracking algorithm and enabling real-time tracking of the target on low-computing-power embedded devices.

Description

Visual tracking method, device, apparatus, medium and program product
Technical Field
The present disclosure relates to the field of vision tracking technologies, and in particular, to a vision tracking method, apparatus, device, medium, and program product.
Background
Crowd-sourced visual mapping means that vehicles with environment-perception capability, other than professional survey vehicles, collect road information data while driving and upload it to the cloud, and the cloud builds a high-fidelity, real-time-updated driving map from the returned data. Crowd-sourced visual mapping mainly involves algorithms such as detection and recognition, visual tracking, positioning, and three-dimensional reconstruction; the performance of the tracking algorithm directly affects mapping quality, making visual tracking a key module of crowd-sourced visual mapping.
Crowd-sourced visual mapping mainly runs on embedded devices with low computing power, yet existing visual tracking algorithms use uniform control parameters throughout the tracking process, so the computing power they require is high. As a result, existing visual tracking algorithms have a low frame rate and cannot achieve real-time tracking on low-computing-power embedded devices.
There is therefore a need for a visual tracking algorithm with low computing-power requirements.
Disclosure of Invention
In order to solve the above technical problems, embodiments of the present disclosure provide a visual tracking method, apparatus, device, medium and program product, so as to achieve real-time tracking on low-computing-power embedded devices.
The embodiment of the specification provides a visual tracking method, which comprises the following steps:
acquiring a first confidence coefficient for performing visual tracking on a tracking target in an Nth frame image by using a visual tracking algorithm; the first confidence is used for indicating the credibility of the tracking result in the Nth frame of image;
determining the value of a first parameter set of the visual tracking algorithm in the process of visually tracking the tracking target in the (N+1) th frame of image according to the value range corresponding to the first confidence coefficient; wherein the first parameter set is used for adjusting the computational power consumed by the vision tracking algorithm; the range of the first confidence coefficient is inversely related to the computational power consumed by running the visual tracking algorithm in the (N+1) th frame image;
and based on the first parameter set, performing visual tracking on the tracking target in the (N+1) th frame image by using the visual tracking algorithm to obtain a target image area of the tracking target in the (N+1) th frame image.
The embodiment of the present specification provides a visual tracking device, including:
the acquisition module is used for acquiring a first confidence coefficient for performing visual tracking on a tracking target in an N frame image by using a visual tracking algorithm; the first confidence is used for indicating the credibility of the tracking result in the Nth frame of image;
The determining module is used for determining the value of a first parameter set of the visual tracking algorithm in the process of visually tracking the tracking target in the (N+1) th frame of image according to the range of the first confidence coefficient; wherein the first parameter set is used for adjusting the computational power consumed by the vision tracking algorithm; the range of the first confidence coefficient is inversely related to the computational power consumed by running the visual tracking algorithm in the (N+1) th frame image;
and the tracking module is used for visually tracking the tracking target in the (N+1)th frame image by utilizing the visual tracking algorithm based on the first parameter set, to obtain a target image area of the tracking target in the (N+1)th frame image.
The embodiment of the specification provides a visual tracking device, which comprises a memory, a processor, and a computer program stored on the memory, wherein the processor, when executing the computer program, implements the steps of the visual tracking method.
Embodiments of the present description provide a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the visual tracking method.
A computer program product provided by embodiments of the present description comprises a computer program/instruction which, when executed by a processor, implements the steps of the visual tracking method.
At least one of the technical solutions adopted in the embodiments of the present specification can achieve the following beneficial effects:
the embodiment of the specification discloses a visual tracking method, a device, equipment, a medium and a program, wherein the scheme comprises the following steps: and the complexity degree of the current tracking scene is reflected by adopting the first confidence coefficient of the N-th frame image tracking result, and the value of a first parameter set of the visual tracking algorithm when the visual tracking is carried out in the N+1-th frame image is determined by taking the first confidence coefficient as a priori. Therefore, the dynamic adjustment of the calculation force required by the visual tracking algorithm is realized according to the complexity of the current tracking scene on the premise of ensuring the tracking precision; compared with the technical scheme that the traditional vision tracking algorithm adopts uniform control parameters in any scene, the vision tracking method in the embodiment of the specification can reduce the calculation power consumed by the vision tracking algorithm on the premise of ensuring the tracking precision in most scenes, thereby improving the frame rate of the vision tracking algorithm and realizing real-time tracking of the tracking target on low-calculation-power embedded equipment.
Drawings
In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some of the embodiments described in the present application, and a person of ordinary skill in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart of a visual tracking method according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of an effective tracking area;
FIG. 3 is a flow chart of another visual tracking method according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural view of a visual tracking device corresponding to FIG. 1 according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of a visual tracking device corresponding to fig. 1 according to an embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are obviously only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without inventive effort shall fall within the scope of the present application.
In order to solve the drawbacks of the prior art, the present solution provides the following embodiments:
fig. 1 is a schematic flow chart of a visual tracking method according to an embodiment of the present disclosure.
From the program perspective, the execution subject of the process may be an embedded device or a general computer device, or may be an application program installed on the embedded device or the general computer device. As shown in fig. 1, the process may include the steps of:
step 101: acquiring a first confidence coefficient for performing visual tracking on a tracking target in an Nth frame image by using a visual tracking algorithm; the first confidence is used for indicating the credibility of the tracking result in the Nth frame of image.
In the embodiment of the present disclosure, the visual tracking algorithm may be a median flow tracking algorithm (MedianFlow), or may be an LK (Lucas-Kanade) optical flow method, a pyramid optical flow tracking algorithm, or the like.
In this embodiment of the present disclosure, the first confidence is used to reflect the complexity of the current tracking scene. Generally, a larger first confidence indicates a better tracking result in the Nth frame image, and the current tracking scene can be considered relatively simple; conversely, a smaller first confidence indicates a less satisfactory tracking result in the Nth frame image, and the current tracking scene can be considered relatively complex.
Step 103: determining the value of a first parameter set of the visual tracking algorithm in the process of visually tracking the tracking target in the (N+1) th frame of image according to the value range corresponding to the first confidence coefficient; wherein the first parameter set is used for adjusting the computational power consumed by the vision tracking algorithm; the range within which the first confidence level is located is inversely related to the computational effort consumed by running the visual tracking algorithm in the n+1st frame image.
In the embodiment of the present disclosure, the (N+1)th frame image may be the next image on which visual tracking is performed after the Nth frame image; the Nth frame image and the (N+1)th frame image may be consecutive video frames acquired by an image acquisition device, or there may be several other video frames acquired by the image acquisition device between the Nth frame image and the (N+1)th frame image.
In the embodiment of the present specification, the number of tracking targets may be one or more.
In this embodiment of the present disclosure, the range of values of the first confidence may be divided into a plurality of consecutive value ranges. Correspondingly, the first parameter set may be configured in gears, one gear per value range of the first confidence. The higher the value range in which the first confidence falls, the smaller the computing power consumed by running the visual tracking algorithm in the (N+1)th frame image; conversely, the lower the value range, the greater the computing power consumed.
In the embodiment of the present specification, the first parameter set may include parameters that influence both the tracking accuracy of the visual tracking algorithm and the computing power it consumes. The first parameter set may include one or more of: the maximum number of layers of the image pyramid, the maximum number of iterations for iteratively tracking any sampling point, the window size for calculating the normalized correlation coefficient, and the number of sampling points sampled in the initial image area in the Nth frame image.
In the embodiment of the present specification, the first parameter set may correspond to the visual tracking algorithm. For example, when the visual tracking algorithm is a median flow tracking algorithm, the first parameter set may include one or more of: the maximum number of layers of the image pyramid, the maximum number of iterations, the window size, and the number of sampling points. For another example, when the visual tracking algorithm is a pyramid optical flow tracking algorithm, since that algorithm does not calculate the normalized correlation coefficient (Normalized Cross Correlation, NCC) of the pixel rectangles centered on the sampling points in the Nth frame image and the (N+1)th frame image, the first parameter set may include one or more of: the maximum number of layers of the image pyramid, the maximum number of iterations, and the number of sampling points.
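As a minimal sketch of how such a parameter set might be represented (the field names are hypothetical, not taken from the patent):

```python
from dataclasses import dataclass

@dataclass
class TrackerParams:
    """Hypothetical container for the first parameter set described above."""
    max_pyramid_levels: int   # maximum number of layers of the image pyramid
    max_iterations: int       # maximum iterations when iteratively tracking one sampling point
    ncc_window: int           # side length of the window for the normalized correlation coefficient
    num_sample_points: int    # number of points sampled in the initial image region
```

For a pyramid optical flow tracker, the `ncc_window` field would simply be omitted, matching the text above.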
In this embodiment of the present disclosure, the first parameter set may be further determined according to the first confidence coefficient and the computing power consumed for performing visual tracking in the nth frame image.
In this embodiment of the present disclosure, the first parameter set may further be determined according to the first confidence coefficient and the current remaining computing power of the execution subject such as the embedded device.
Step 105: and based on the first parameter set, performing visual tracking on the tracking target in the (N+1) th frame image by using the visual tracking algorithm to obtain a target image area of the tracking target in the (N+1) th frame image.
In this embodiment of the present disclosure, the first confidence of the tracking result in the Nth frame image reflects the complexity of the current tracking scene, and this confidence is used as a prior to determine the values of the first parameter set of the visual tracking algorithm for visual tracking in the (N+1)th frame image. The computing power required by the visual tracking algorithm is thus adjusted dynamically according to the complexity of the current tracking scene while tracking accuracy is maintained. Compared with conventional visual tracking algorithms, which use the same control parameters in every scene, the visual tracking method of the embodiments of this specification reduces the computing power consumed in most scenes without sacrificing tracking accuracy, thereby increasing the frame rate of the visual tracking algorithm and enabling real-time tracking of the target on low-computing-power embedded devices.
Based on the method in fig. 1, the examples of the present specification also provide some specific embodiments of the method, as described below.
Optionally, determining, according to the range in which the first confidence coefficient is located, a value of a first parameter set of the visual tracking algorithm in the process of visually tracking the tracking target in the n+1st frame image may specifically include:
and determining the maximum layer number of the image pyramid in the process of visually tracking the tracking target in the (N+1) th frame image according to the range of the first confidence coefficient.
The performing, based on the first parameter set, visual tracking on the tracking target in the n+1st frame image by using the visual tracking algorithm may specifically include:
and establishing an image pyramid corresponding to the N-th frame image and an image pyramid corresponding to the (n+1) -th frame image according to the maximum layer number of the image pyramid.
Based on a pyramid optical flow tracking algorithm, the tracking target is subjected to visual tracking by utilizing an image pyramid corresponding to the N-th frame image and an image pyramid corresponding to the (N+1) -th frame image, and a target image area of the tracking target in the (N+1) -th frame image is obtained.
In the embodiment of the present disclosure, the visual tracking algorithm may be a pyramid optical flow tracking algorithm, and may also be a median flow tracking algorithm based on the pyramid optical flow tracking algorithm.
In the embodiment of the present specification, the image pyramid is a set of sub-images of one image at a plurality of different resolutions, arranged by resolution. The image pyramid corresponding to the Nth frame image may be the image pyramid of the whole Nth frame image, or the image pyramid of a part of the Nth frame image. Correspondingly, the image pyramid corresponding to the (N+1)th frame image may be the image pyramid of the whole (N+1)th frame image, or the image pyramid of a part of the (N+1)th frame image.
In this embodiment of the present disclosure, performing visual tracking on the tracking target based on a pyramid optical flow tracking algorithm, using the image pyramid corresponding to the Nth frame image and the image pyramid corresponding to the (N+1)th frame image, may specifically include: performing visual tracking directly with the pyramid optical flow tracking algorithm; or performing visual tracking with a median flow tracking algorithm that is based on the pyramid optical flow tracking algorithm.
In the embodiment of the specification, when visual tracking is performed based on a pyramid optical flow tracking algorithm, the optical flow of a sampling point in the top image layer is obtained by minimizing the sum of matching errors over the point's neighborhood in the top image layer; the optical flow in each upper image layer is then taken as the optical flow estimate for the image layer below it, and the residual optical flow of the sampling point in that layer is obtained by minimizing the sum of matching errors over the point's neighborhood in that layer; this continues until the optical flow estimate and the residual optical flow of the sampling point in the bottom image layer are determined, thereby obtaining the optical flow of the sampling point.
In the embodiment of the specification, the maximum layer number of the image pyramid is determined according to the range of the first confidence coefficient, so that the calculation power consumed in tracking the sampling points of the image pyramid is adjusted according to the complexity of the current tracking scene, and the calculation power required by the visual tracking algorithm is dynamically adjusted on the premise of ensuring the tracking precision.
In another embodiment of the present disclosure, the maximum number of layers of the image pyramid may also be determined according to the size of the bottom image layer. Since each image layer in the pyramid has half the length and width of the layer below it, an excessively large maximum number of layers makes the top image layer too small to contribute to the visual tracking result. Determining the maximum number of layers according to the size of the bottom image layer can therefore reduce the computing power required by the visual tracking algorithm while maintaining tracking accuracy.
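One plausible way to cap the number of layers by the bottom-layer size is to require a minimum side length for the top layer. A sketch, where `min_top` and `hard_cap` are illustrative assumptions rather than values from the patent:

```python
import math

def max_pyramid_levels(width: int, height: int, min_top: int = 16, hard_cap: int = 8) -> int:
    """Largest Lm such that the top layer (downscaled by 2**(Lm-1))
    keeps its shorter side >= min_top pixels."""
    shorter = min(width, height)
    if shorter < min_top:
        return 1
    lm = int(math.log2(shorter / min_top)) + 1
    return max(1, min(lm, hard_cap))
```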
In practical application, existing visual tracking algorithms take the whole image as input, so the computing power they require is high, their frame rate is low, and real-time tracking cannot be achieved on embedded devices with low computing power.
Based on this, the establishing the image pyramid corresponding to the nth frame image and the image pyramid corresponding to the n+1th frame image according to the maximum layer number of the image pyramid may specifically include:
and determining an effective tracking area of the tracking target in the (N+1) th frame image according to the maximum layer number of the image pyramid and the initial image area of the tracking target in the (N) th frame image.
And cropping, according to the effective tracking area, an Nth frame effective image of the tracking target from the Nth frame image.
And cropping, according to the effective tracking area, an (N+1)th frame effective image of the tracking target from the (N+1)th frame image.
And establishing the image pyramid of the N-th frame effective image and the image pyramid of the (n+1) -th frame effective image according to the maximum layer number of the image pyramid.
In this embodiment of the present disclosure, the effective tracking area may refer to a position where the tracking target may appear in the n+1st frame image.
In this embodiment of the present disclosure, the n+1st frame effective image may be a partial image of the effective tracking area in the n+1st frame image. The nth frame of effective image may be a partial image of the effective tracking area in the nth frame of image.
Optionally, the determining, according to the maximum layer number of the image pyramid and the initial image area of the tracking target in the nth frame image, an effective tracking area of the tracking target in the (n+1) th frame image may specifically include:
and calculating the expansion distance according to the maximum layer number of the image pyramid.
And extending the initial image area to the periphery by the expansion distance to obtain an effective tracking area of the tracking target in the (N+1) th frame image.
Fig. 2 is a schematic diagram of an effective tracking area. In the figure, the dashed box is the effective tracking area (X_roi, Y_roi, W_roi, H_roi) of the tracking target in the (N+1)th frame image; the point (X_roi, Y_roi) is the upper-left corner of the effective tracking area, W_roi is the width of the effective tracking area, and H_roi is its height. The solid box is the initial image area of the tracking target in the Nth frame image; the point (X_pre, Y_pre) is the upper-left corner of the initial image area, W_pre is the width of the initial image area, and H_pre is its height. offset_x is the expansion distance in the X-axis direction, and offset_y is the expansion distance in the Y-axis direction.
In this embodiment of the present disclosure, the expansion distances are calculated as:
offset_x = 2^Lm + 0.5 * s * W_pre (1)
offset_y = 2^Lm + 0.5 * s * H_pre (2)
In the above formulas, Lm is the maximum number of layers of the image pyramid; s is the maximum magnification of the tracking target between adjacent frames, which may be set empirically to 1.1-1.2.
In the embodiment of the present disclosure, the effective tracking area (X_roi, Y_roi, W_roi, H_roi) is calculated as:
X_roi = X_pre - offset_x (3)
Y_roi = Y_pre - offset_y (4)
W_roi = W_pre + 2 * offset_x (5)
H_roi = H_pre + 2 * offset_y (6)
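A direct transcription of equations (1) to (6) into code; in practice the resulting ROI would also be clipped to the image borders, which the text does not spell out:

```python
def effective_tracking_area(x_pre, y_pre, w_pre, h_pre, lm, s=1.15):
    """Effective tracking ROI from the initial box, per equations (1)-(6).

    lm: maximum number of layers of the image pyramid.
    s:  maximum inter-frame magnification of the target (empirically 1.1-1.2).
    """
    offset_x = 2 ** lm + 0.5 * s * w_pre   # equation (1)
    offset_y = 2 ** lm + 0.5 * s * h_pre   # equation (2)
    x_roi = x_pre - offset_x               # equation (3)
    y_roi = y_pre - offset_y               # equation (4)
    w_roi = w_pre + 2 * offset_x           # equation (5)
    h_roi = h_pre + 2 * offset_y           # equation (6)
    return x_roi, y_roi, w_roi, h_roi
```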
optionally, determining, according to the range in which the first confidence coefficient is located, a value of a first parameter set of the visual tracking algorithm in the process of visually tracking the tracking target in the n+1st frame image may specifically include:
determining, according to the range in which the first confidence falls, the maximum number of iterations for iteratively tracking any sampling point within one image layer of the image pyramid; the image pyramid here means the image pyramid corresponding to the Nth frame image and the image pyramid corresponding to the (N+1)th frame image; the sampling point is obtained by sampling the initial image area of the tracking target in the Nth frame image.
The performing, based on the first parameter set, visual tracking on the tracking target in the n+1st frame image by using the visual tracking algorithm may specifically include:
In each image layer of the image pyramid corresponding to the Nth frame image and of the image pyramid corresponding to the (N+1)th frame image, iteratively tracking the sampling points with Newton's method by minimizing the sum of matching errors within the neighborhood of each sampling point, so as to obtain the iterative optical flow of the sampling points in that image layer.
If the number of iterations of the iterative tracking reaches the maximum number of iterations, taking the optical flow of the last iteration as the optical flow of the sampling point in that image layer.
Determining the tracking point corresponding to the sampling point in the (N+1)th frame image according to the optical flow of the sampling point in each image layer.
Determining the target image area of the tracking target in the (N+1)th frame image based on all the sampling points and their corresponding tracking points.
In the embodiment of the present specification, the visual tracking algorithm may be a median flow tracking algorithm, and may also be a pyramid optical flow tracking algorithm.
In the examples herein, the Newton's method is also called Newton-Raphson method.
In the embodiment of the present disclosure, during the iterative tracking, if the sum of the matching errors within the neighborhood of the sampling point falls below a threshold, the optical flow corresponding to that matching-error sum is taken as the optical flow of the sampling point in that image layer.
In the embodiment of the specification, the maximum number of iterations of the iterative tracking is determined according to the range in which the first confidence falls, so that the computing power required for iteratively tracking the sampling points in the image layers of the image pyramid is adjusted according to the complexity of the current tracking scene, and the computing power required by the visual tracking algorithm is dynamically adjusted while tracking accuracy is maintained.
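For reference, OpenCV's pyramidal LK implementation exposes both controls discussed above, the maximum iteration count and the error threshold, through its termination criteria. A sketch, under the assumption that OpenCV's built-in refinement is an acceptable stand-in for the Newton iteration the patent describes:

```python
import cv2
import numpy as np

def track_points(img_prev, img_next, pts, max_levels=5, max_iter=15, eps=0.01, win=21):
    """Forward-track sampling points with pyramidal LK optical flow.

    max_iter bounds the per-point iterative refinement; eps is the
    error threshold that may stop the iteration early.
    """
    p0 = np.asarray(pts, dtype=np.float32).reshape(-1, 1, 2)
    criteria = (cv2.TERM_CRITERIA_COUNT | cv2.TERM_CRITERIA_EPS, max_iter, eps)
    p1, status, _err = cv2.calcOpticalFlowPyrLK(
        img_prev, img_next, p0, None,
        winSize=(win, win), maxLevel=max_levels - 1, criteria=criteria)
    return p1.reshape(-1, 2), status.ravel().astype(bool)
```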
Optionally, determining, according to the range in which the first confidence coefficient is located, a value of a first parameter set of the visual tracking algorithm in the process of visually tracking the tracking target in the n+1st frame image may specifically include:
and determining the window size for calculating the normalized correlation coefficient according to the range of the first confidence coefficient.
The performing, based on the first parameter set, visual tracking on the tracking target in the n+1st frame image by using the visual tracking algorithm may specifically include:
for any sampling point, tracking by using an optical flow tracking algorithm to obtain a tracking point of the sampling point in the (N+1) th frame image; the sampling points are obtained by sampling an initial image area of the tracking target in the Nth frame of image.
And extracting a first rectangular image which takes the sampling point as a center and has the size of the window from the Nth frame image.
And extracting a second rectangular image which takes the tracking point as a center and has a window size from the (N+1) th frame image.
And calculating normalized correlation coefficients of the sampling points and the tracking points according to the first rectangular image and the second rectangular image.
And calculating the median value of the normalized correlation coefficients of all the sampling points.
And screening the tracking points according to the median, and taking the tracking points with the normalized correlation coefficient larger than the median as stable tracking points.
And determining the target image area of the tracking target in the (N+1) th frame image according to the stable tracking point.
In the embodiment of the present specification, the visual tracking algorithm may be a median flow tracking algorithm.
In the embodiment of the present specification, the normalized correlation coefficient (Normalized Cross Correlation, NCC) is used to represent the similarity of the first rectangular image and the second rectangular image.
In this embodiment of the present disclosure, the calculating a median value of normalized correlation coefficients of all sampling points may specifically include:
And sequencing the normalized correlation coefficients of all the sampling points according to the size, and taking the normalized correlation coefficient at the middle position as the median of the normalized correlation coefficients.
In the embodiment of the specification, the window size for calculating the normalized correlation coefficient is determined according to the range in which the first confidence falls; the computing power consumed in calculating the normalized correlation coefficient is thereby adjusted according to the complexity of the current tracking scene, and the computing power required by the visual tracking algorithm is dynamically adjusted while tracking accuracy is maintained.
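A sketch of the NCC computation and the median-based screening described above; extracting the window-sized patches around each sampling and tracking point is assumed to have been done already:

```python
import numpy as np

def ncc(patch_a: np.ndarray, patch_b: np.ndarray) -> float:
    """Normalized correlation coefficient of two equally sized grayscale patches."""
    a = patch_a.astype(np.float64) - patch_a.mean()
    b = patch_b.astype(np.float64) - patch_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def stable_point_indices(nccs: np.ndarray) -> np.ndarray:
    """Indices of tracking points whose NCC exceeds the median, per the text."""
    return np.flatnonzero(nccs > np.median(nccs))
```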
Optionally, the visual tracking method may further include:
calculating a second confidence coefficient for visually tracking the tracking target in the (N+1) th frame image according to the number of the stable tracking points; the second confidence is used for indicating the credibility of the tracking result in the (N+1) th frame image.
In this embodiment of the present disclosure, the stable tracking point may be a tracking point with a better tracking effect, and specifically may be a tracking point obtained by screening according to a median value of the normalized correlation coefficient and/or a median value of a Forward-Backward Error (FB-Error).
In this embodiment of the present disclosure, the second confidence may be calculated as:
conf_curr = sin((π/2) * min(n, thresh) / thresh) (7)
where conf_curr is the second confidence; n is the number of stable tracking points obtained by tracking in the (N+1)th frame image; thresh is the threshold on the number of stable tracking points: when n is larger than thresh, the tracking is considered highly reliable and the confidence is 1.0; sin is the sine function; min is the minimum function.
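Under the reconstruction of formula (7) above, the second confidence reduces to a one-line function:

```python
import math

def second_confidence(n: int, thresh: int) -> float:
    """Formula (7): rises with the number of stable points n and
    saturates at 1.0 once n >= thresh."""
    return math.sin((math.pi / 2) * min(n, thresh) / thresh)
```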
Optionally, determining, according to the range in which the first confidence coefficient is located, a value of a first parameter set of the visual tracking algorithm in the process of visually tracking the tracking target in the n+1st frame image may specifically include:
and determining the sampling point number sampled in the initial image area of the tracking target in the N frame image according to the range of the first confidence.
The performing, based on the first parameter set, visual tracking on the tracking target in the n+1st frame image by using the visual tracking algorithm may specifically include:
and uniformly sampling in the initial image area based on the sampling points to obtain the sampling points of the tracking target.
And based on the sampling points, performing visual tracking on the tracking target in the (n+1) th frame image by using an optical flow tracking algorithm.
In the embodiment of the present disclosure, the number of sampling points refers to how many sampling points are taken; determining the number of sampling points may include determining the total number of sampling points, determining the density of sampling points, or determining the interval between sampling points.
In the embodiment of the present specification, the uniform sampling may refer to sampling in a manner of uniformly scattering points in a grid.
In the embodiment of the present disclosure, the number of sampling points in the initial image area is determined according to the range in which the first confidence falls; the computing power required by the visual tracking algorithm is thereby dynamically adjusted according to the complexity of the current tracking scene while tracking accuracy is maintained.
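A sketch of the uniform grid sampling described above, placing one sampling point at the center of each grid cell of the initial image area (x, y, w, h):

```python
import numpy as np

def uniform_grid_points(x, y, w, h, rows=8, cols=8):
    """Uniformly scatter rows x cols sampling points inside the region."""
    xs = x + (np.arange(cols) + 0.5) * w / cols
    ys = y + (np.arange(rows) + 0.5) * h / rows
    gx, gy = np.meshgrid(xs, ys)
    return np.stack([gx.ravel(), gy.ravel()], axis=1)  # shape (rows*cols, 2)
```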
Based on the same ideas as the solution shown in fig. 1, the embodiment of the present specification also provides another visual tracking method. Fig. 3 is a flow chart of another visual tracking method according to an embodiment of the present disclosure. As shown in fig. 3, when the visual tracking algorithm is a median flow tracking algorithm, the process may include the steps of:
step 301: acquiring a first confidence coefficient for performing visual tracking on a tracking target in an Nth frame image by using a visual tracking algorithm and an initial image area of the tracking target in the Nth frame image; the first confidence is used for indicating the credibility of the tracking result in the Nth frame of image.
Table 1: first parameter group selection table
Figure BDA0004080157830000111
The correspondence between the value range of the first confidence and the first parameter set is given in table 1.
Step 302: based on Table 1, the value of the first parameter set of the visual tracking algorithm for visually tracking the tracking target in the (N+1)th frame image can be determined according to the value range in which the first confidence falls. For example, when the first confidence is 0.6, the corresponding value range is [0.5, 0.9), and the first parameter set takes the following values: the maximum number of layers of the image pyramid is 5, the maximum number of iterations for iteratively tracking any sampling point is 15, the window size for calculating the normalized correlation coefficient is 20 × 20 pixels, and the number of sampling points sampled in the initial image area of the Nth frame image is 8 rows × 8 columns, i.e., 64.
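A sketch of the gear selection that Table 1 implies. Only the [0.5, 0.9) row is given in step 302; the other rows below are illustrative placeholders, not values from the patent:

```python
# (lower bound, upper bound, parameter values) per confidence range.
PARAM_TABLE = [
    (0.9, float("inf"), dict(max_levels=3, max_iter=10, ncc_win=10, points=5 * 5)),   # assumed
    (0.5, 0.9,          dict(max_levels=5, max_iter=15, ncc_win=20, points=8 * 8)),   # from step 302
    (0.0, 0.5,          dict(max_levels=7, max_iter=25, ncc_win=30, points=12 * 12)), # assumed
]

def select_params(confidence: float) -> dict:
    """Pick the first parameter set whose value range contains the confidence."""
    for lo, hi, params in PARAM_TABLE:
        if lo <= confidence < hi:
            return params
    return PARAM_TABLE[-1][2]  # lowest gear if the confidence is out of range
```

Note that higher confidence selects cheaper parameters, matching the inverse relation stated in step 103.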
Step 303: calculate the expansion distances from the maximum number of layers of the image pyramid using formulas (1) and (2); calculate the effective tracking area of the tracking target in the (N+1)th frame image from the initial image area and the expansion distances using formulas (3) to (6); and, according to the effective tracking area, crop the Nth frame effective image I of the tracking target from the Nth frame image and the (N+1)th frame effective image J of the tracking target from the (N+1)th frame image.
Step 304: and based on the first parameter set, performing visual tracking on the tracking target in the (N+1) th frame image by using the visual tracking algorithm to obtain a target image area of the tracking target in the (N+1) th frame image. The method specifically comprises the following steps:
step 30401: and sampling is carried out in the initial image area according to the sampling points to obtain sampling points for visual tracking.
Step 30402: track any sampling point A forward using the pyramid LK optical flow method to obtain its forward tracking point B in the (N+1)th frame image; the process may include:
establishing, according to the maximum number of layers Lm of the image pyramid, the image pyramid {I^L}, L = 0, ..., Lm-1, of the Nth frame effective image I and the image pyramid {J^L}, L = 0, ..., Lm-1, of the (N+1)th frame effective image J. That is, the Nth frame effective image I or the (N+1)th frame effective image J serves as the layer-0 image layer (the bottom image layer); the image whose width and height are reduced by a factor of 2^L serves as the Lth image layer; and the image whose width and height are reduced by a factor of 2^(Lm-1) serves as the top image layer. Arranged from largest to smallest, these images form the pyramid shape of the image pyramid.
From the image pyramids {I^L} and {J^L}, for any sampling point A, iteratively tracking the sampling point A with Newton's method by minimizing the sum of matching errors of the sampling point A within its neighborhood in the top image layer, until the number of iterations reaches the maximum number of iterations or the sum of matching errors falls below a threshold, thereby obtaining the residual optical flow d^(Lm-1) of the sampling point A in the top image layer; the optical flow estimate g^(Lm-1) of each point in the top image layer is 0.
Starting from the sub-top (layer Lm-2) image layer, the optical flow estimate g^L of a sampling point in the Lth image layer is determined from its optical flow estimate g^(L+1) and residual optical flow d^(L+1) in the upper (layer L+1) image layer:
g^L = 2 * (g^(L+1) + d^(L+1))
Then, based on the LK optical flow method, iteratively tracking the sampling point with Newton's method by minimizing the sum of matching errors of the sampling point A within its neighborhood in the Lth image layer, until the number of iterations reaches the maximum number of iterations or the sum of matching errors falls below a threshold, thereby obtaining the residual optical flow d^L of the sampling point A in the Lth image layer.
Repeating the above steps until the optical flow estimate g^0 and residual optical flow d^0 in the bottom image layer are obtained, which yields the optical flow of the sampling point A in the image pyramid:
d = g^0 + d^0
The forward tracking point B of the sampling point A in the (N+1)th frame image is then obtained from the position of the sampling point A and its optical flow d.
Step 30403: using the pyramid LK optical flow method again, track the tracking point B backward to obtain a backward tracking point C of the tracking point B in the Nth frame effective image I.
Step 30404: calculate the Euclidean distance between the sampling point A and the backward tracking point C to obtain the forward-backward error (Forward-Backward Error, FB-Error) of the tracking point B.
Step 30405: calculate the median of the forward-backward errors FB-Error of all the sampling points to obtain the threshold for judging whether a forward-backward error meets the condition.
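Steps 30402 to 30405 amount to forward-backward tracking. A sketch, which reuses the `track_points` helper from the earlier sketch (any tracker with the same return convention can be passed as `track_fn`):

```python
import numpy as np

def forward_backward_errors(img_n, img_n1, pts_a, track_fn):
    """Track A forward to B (frame N+1), then B backward to C (frame N);
    the FB-Error of each point is the Euclidean distance |A - C|."""
    pts_b, ok_fwd = track_fn(img_n, img_n1, pts_a)
    pts_c, ok_bwd = track_fn(img_n1, img_n, pts_b)
    fb_error = np.linalg.norm(pts_a - pts_c, axis=1)
    return pts_b, fb_error, ok_fwd & ok_bwd
```

The screening threshold of step 30405 is then simply `np.median(fb_error)` over the successfully tracked points.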
Step 30406: according to the window size used for calculating the normalized correlation coefficient (NCC window size for short), extract from the Nth frame effective image I a first rectangular image centered on the sampling point A with the NCC window size, and extract from the (N+1)th frame effective image J a second rectangular image centered on the tracking point B with the NCC window size.
Step 30407: calculate the normalized correlation coefficient (Normalized Cross Correlation, NCC) of the sampling point A and the tracking point B according to the similarity of the first rectangular image and the second rectangular image.
Step 30408: calculate the median of the normalized correlation coefficients NCC of all the sampling points to obtain the threshold for judging whether a normalized correlation coefficient NCC meets the condition.
Step 30409: screen all the sampling points according to the FB-Error median and the NCC median, retaining the tracking points whose forward-backward error FB-Error is smaller than the FB-Error median and whose normalized correlation coefficient NCC is larger than the NCC median, to obtain the stable tracking points.
Step 30410: calculate the image area of the tracking target in the (N+1)th frame effective image according to the positional relation between the stable tracking points and their corresponding sampling points.
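The patent does not spell out how the new box is derived from the point correspondences in step 30410; the standard median-flow update (median displacement for translation, median ratio of pairwise point distances for scale) is one plausible reading, sketched below:

```python
from itertools import combinations

import numpy as np

def update_box(box, pts_a, pts_b):
    """Median-flow style box update. box = (x, y, w, h);
    pts_a are stable sampling points, pts_b their tracked counterparts."""
    x, y, w, h = box
    dx = float(np.median(pts_b[:, 0] - pts_a[:, 0]))  # median translation
    dy = float(np.median(pts_b[:, 1] - pts_a[:, 1]))
    ratios = []
    for i, j in combinations(range(len(pts_a)), 2):
        d_a = np.linalg.norm(pts_a[i] - pts_a[j])
        if d_a > 0:
            ratios.append(np.linalg.norm(pts_b[i] - pts_b[j]) / d_a)
    s = float(np.median(ratios)) if ratios else 1.0    # median scale change
    new_w, new_h = w * s, h * s
    return (x + dx - (new_w - w) / 2, y + dy - (new_h - h) / 2, new_w, new_h)
```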
Step 305: and obtaining a target image area of the tracking target in the (N+1) -th frame image according to the corresponding relation between the (N+1) -th frame effective image and the (N+1) -th frame image (namely the position of the effective tracking area in the (N+1) -th frame image).
Step 306: calculate, using formula (7), a second confidence for visually tracking the tracking target in the (N+1)th frame image according to the number of the stable tracking points.
Based on the same thought, the embodiment of the specification also provides a device corresponding to the method.
Fig. 4 is a schematic structural diagram of a visual tracking device corresponding to fig. 1 according to an embodiment of the present disclosure. As shown in fig. 4, the apparatus may include:
an obtaining module 401, configured to obtain a first confidence level of performing visual tracking on a tracking target in an nth frame image by using a visual tracking algorithm; the first confidence level may be used to indicate a confidence level of a tracking result in the nth frame image.
The determining module 403 may be configured to determine, according to a range in which the first confidence coefficient is located, a value of a first parameter set of the visual tracking algorithm in a process of performing visual tracking on the tracking target in an n+1st frame image; wherein the first set of parameters may be used to adjust the computational effort consumed by the vision tracking algorithm; the range within which the first confidence level is located is inversely related to the computational effort consumed by running the visual tracking algorithm in the n+1st frame image.
The tracking module 405 may be configured to perform, based on the first parameter set, visual tracking on the tracking target in the (N+1)th frame image by using the visual tracking algorithm, to obtain a target image area of the tracking target in the (N+1)th frame image.
The present examples also provide some embodiments of the method based on the apparatus of fig. 4, as described below.
Optionally, the determining module 403 may specifically be configured to:
and determining the maximum layer number of the image pyramid in the process of visually tracking the tracking target in the (N+1) th frame image according to the range of the first confidence coefficient.
Correspondingly, the tracking module 405 may specifically include:
The pyramid establishment unit can be used for establishing the image pyramid corresponding to the N-th frame image and the image pyramid corresponding to the (n+1) -th frame image according to the maximum layer number of the image pyramid.
The pyramid tracking unit can be used for carrying out visual tracking on the tracking target by utilizing an image pyramid corresponding to the N-th frame image and an image pyramid corresponding to the (n+1) -th frame image based on a pyramid optical flow tracking algorithm to obtain a target image area of the tracking target in the (n+1) -th frame image.
Optionally, the pyramid building unit may specifically include:
and the effective tracking area determining subunit is used for determining the effective tracking area of the tracking target in the (n+1) th frame image according to the maximum layer number of the image pyramid and the initial image area of the tracking target in the (N) th frame image.
And the Nth frame effective image cropping subunit can be used for cropping the Nth frame effective image of the tracking target from the Nth frame image according to the effective tracking area.
And the (N+1)th frame effective image cropping subunit can be used for cropping the (N+1)th frame effective image of the tracking target from the (N+1)th frame image according to the effective tracking area.
And the pyramid establishment subunit is used for establishing the image pyramid of the N-th frame effective image and the image pyramid of the (n+1) -th frame effective image according to the maximum layer number of the image pyramid.
Optionally, the effective tracking area determining subunit may specifically be configured to:
and calculating the expansion distance according to the maximum layer number of the image pyramid.
And extending the initial image area to the periphery by the expansion distance to obtain an effective tracking area of the tracking target in the (N+1) th frame image.
Optionally, the determining module 403 may specifically be configured to:
determining the maximum iteration number of iterative tracking of any sampling point in one image layering of the image pyramid according to the range of the first confidence coefficient; the image pyramid is an image pyramid corresponding to the Nth frame image and an image pyramid corresponding to the (n+1) th frame image; the sampling point is obtained by sampling an initial image area of the tracking target in the Nth frame image.
Correspondingly, the tracking module 405 may specifically be configured to:
and in the image layering of the image pyramid corresponding to the N-th frame image and the image pyramid corresponding to the (n+1) -th frame image, carrying out iterative tracking on the sampling points by utilizing a Newton iteration method by minimizing the sum of matching errors in the neighborhood range of each sampling point, so as to obtain the iterative optical flow of the sampling points in the image layering.
And if the iteration number of the iteration tracking reaches the maximum iteration number, taking the optical flow of the last iteration as the optical flow of the sampling point in the image layering.
And determining a tracking point corresponding to the sampling point in the (n+1) th frame image according to the optical flow of the sampling point in the image layering.
And determining the target image area of the tracking target in the (N+1) th frame image based on all sampling points and the corresponding tracking points.
Optionally, the determining module 403 may specifically be configured to:
and determining the window size which can be used for calculating the normalized correlation coefficient according to the range of the first confidence coefficient.
Correspondingly, the tracking module 405 may specifically be configured to:
for any sampling point, tracking by using an optical flow tracking algorithm to obtain a tracking point of the sampling point in the (N+1) th frame image; the sampling points are obtained by sampling an initial image area of the tracking target in the Nth frame of image.
And extracting a first rectangular image which takes the sampling point as a center and has the size of the window from the Nth frame image.
And extracting a second rectangular image which takes the tracking point as a center and has a window size from the (N+1) th frame image.
And calculating normalized correlation coefficients of the sampling points and the tracking points according to the first rectangular image and the second rectangular image.
And calculating the median value of the normalized correlation coefficients of all the sampling points.
And screening the tracking points according to the median, and taking the tracking points with the normalized correlation coefficient larger than the median as stable tracking points.
And determining the target image area of the tracking target in the (N+1) th frame image according to the stable tracking point.
Optionally, the visual tracking device may further include:
the confidence coefficient calculation module is used for calculating a second confidence coefficient for carrying out visual tracking on the tracking target in the (N+1) th frame image according to the number of the stable tracking points; wherein the second confidence level may be used to indicate a confidence level of the tracking result in the n+1st frame image.
Optionally, the determining module 403 may specifically be configured to:
and determining the sampling point number sampled in the initial image area of the tracking target in the N frame image according to the range of the first confidence.
Correspondingly, the tracking module 405 may specifically be configured to:
and uniformly sampling in the initial image area based on the sampling points to obtain the sampling points of the tracking target.
And based on the sampling points, performing visual tracking on the tracking target in the (n+1) th frame image by using an optical flow tracking algorithm.
Based on the same thought, the embodiment of the specification also provides equipment corresponding to the method.
Fig. 5 is a schematic structural diagram of a visual tracking device corresponding to fig. 1 according to an embodiment of the present disclosure. As shown in fig. 5, the apparatus 500 may include:
at least one processor 510; the method comprises the steps of,
a memory 530 communicatively coupled to the at least one processor; wherein,,
the memory 530 stores instructions 520 executable by the at least one processor 510, the instructions being executable by the at least one processor 510 to enable the at least one processor 510 to:
acquiring a first confidence coefficient for performing visual tracking on a tracking target in an Nth frame image by using a visual tracking algorithm; the first confidence is used for indicating the credibility of the tracking result in the Nth frame of image.
Determining the value of a first parameter set of the visual tracking algorithm in the process of visually tracking the tracking target in the (N+1) th frame of image according to the value range corresponding to the first confidence coefficient; wherein the first parameter set is used for adjusting the computational power consumed by the vision tracking algorithm; the range within which the first confidence level is located is inversely related to the computational effort consumed by running the visual tracking algorithm in the n+1st frame image.
And based on the first parameter set, performing visual tracking on the tracking target in the (N+1) th frame image by using the visual tracking algorithm to obtain a target image area of the tracking target in the (N+1) th frame image.
Based on the same considerations, the present description embodiments provide a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the visual tracking method.
Based on the same insight, the present description embodiments provide a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the visual tracking method.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the apparatus, the storage medium, and the program in the embodiments of the present specification, since they are substantially similar to the method embodiments, the description is relatively simple, and the relevant points are referred to in the partial description of the method embodiments.
In the 1990s, an improvement to a technology could be clearly distinguished as an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, transistor or switch) or an improvement in software (an improvement to a method flow). However, as technology has developed, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (e.g., a field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a single PLD, without needing a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually manufacturing integrated circuit chips, this programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the original code to be compiled must be written in a specific programming language called a hardware description language (Hardware Description Language, HDL), of which there is not just one but many kinds, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language), among which VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing a logic method flow can easily be obtained merely by slightly logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing a controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing various functions may also be regarded as structures within the hardware component. Indeed, means for performing various functions may even be regarded both as software modules implementing a method and as structures within a hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described with their functions divided into various units. Of course, when the present application is implemented, the functions of the units may be implemented in one or more pieces of software and/or hardware.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory, random access memory (RAM) and/or non-volatile memory in the form of computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape/magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a/an ..." does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (12)

1. A method of vision tracking, comprising:
acquiring a first confidence of visually tracking a tracking target in an N-th frame image by using a visual tracking algorithm; wherein the first confidence is used for indicating the credibility of the tracking result in the N-th frame image;
determining, according to the value range within which the first confidence falls, the value of a first parameter set of the visual tracking algorithm for visually tracking the tracking target in the (N+1)-th frame image; wherein the first parameter set is used for adjusting the computing power consumed by the visual tracking algorithm, and the value range within which the first confidence falls is inversely related to the computing power consumed by running the visual tracking algorithm on the (N+1)-th frame image; and
based on the first parameter set, visually tracking the tracking target in the (N+1)-th frame image by using the visual tracking algorithm to obtain a target image area of the tracking target in the (N+1)-th frame image.
2. The method of claim 1, wherein the determining, according to the value range within which the first confidence falls, the value of the first parameter set of the visual tracking algorithm for visually tracking the tracking target in the (N+1)-th frame image specifically includes:
determining, according to the value range within which the first confidence falls, the maximum number of layers of an image pyramid used when visually tracking the tracking target in the (N+1)-th frame image;
and wherein the visually tracking the tracking target in the (N+1)-th frame image by using the visual tracking algorithm based on the first parameter set specifically includes:
establishing an image pyramid corresponding to the N-th frame image and an image pyramid corresponding to the (N+1)-th frame image according to the maximum number of layers;
performing, based on a pyramid optical flow tracking algorithm, visual tracking on the tracking target by using the image pyramid corresponding to the N-th frame image and the image pyramid corresponding to the (N+1)-th frame image, to obtain the target image area of the tracking target in the (N+1)-th frame image.
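For claim 2, a minimal sketch of a pyramidal Lucas-Kanade step in which the maximum number of pyramid layers is the tuned parameter. The use of OpenCV's calcOpticalFlowPyrLK and the 21×21 matching window are assumptions of this sketch, not requirements of the claim.

```python
import cv2
import numpy as np

def pyramid_lk_track(prev_gray, next_gray, prev_pts, max_pyramid_levels):
    """Track prev_pts from frame N to frame N+1 with an image pyramid
    whose depth (maxLevel) comes from the first parameter set."""
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray,
        prev_pts.astype(np.float32).reshape(-1, 1, 2), None,
        winSize=(21, 21),             # illustrative matching window
        maxLevel=max_pyramid_levels,  # fewer layers -> less compute
    )
    return next_pts.reshape(-1, 2), status.ravel().astype(bool)
```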
3. The method of claim 2, wherein the establishing the image pyramid corresponding to the N-th frame image and the image pyramid corresponding to the (N+1)-th frame image according to the maximum number of layers specifically includes:
determining an effective tracking area of the tracking target in the (N+1)-th frame image according to the maximum number of layers and an initial image area of the tracking target in the N-th frame image;
cropping an N-th-frame effective image of the tracking target from the N-th frame image according to the effective tracking area;
cropping an (N+1)-th-frame effective image of the tracking target from the (N+1)-th frame image according to the effective tracking area;
establishing the image pyramid of the N-th-frame effective image and the image pyramid of the (N+1)-th-frame effective image according to the maximum number of layers.
4. The method of claim 3, wherein the determining the effective tracking area of the tracking target in the (N+1)-th frame image according to the maximum number of layers and the initial image area of the tracking target in the N-th frame image specifically includes:
calculating an expansion distance according to the maximum number of layers of the image pyramid;
expanding the initial image area outward on all sides by the expansion distance to obtain the effective tracking area of the tracking target in the (N+1)-th frame image.
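For claims 3 and 4, a sketch of one plausible expansion rule: since each pyramid layer halves the image, a per-layer search radius of w pixels covers roughly w·2^L pixels at full resolution. The claims only state that the distance is calculated from the maximum number of layers, so the formula below is an assumption of this sketch.

```python
def effective_tracking_area(initial_box, max_levels, img_h, img_w, win=21):
    """Expand frame N's initial image area outward by an expansion
    distance derived from the pyramid depth, clipped to the image."""
    x, y, w, h = initial_box
    d = win * (2 ** max_levels)        # assumed expansion distance
    x0, y0 = max(0, x - d), max(0, y - d)
    x1, y1 = min(img_w, x + w + d), min(img_h, y + h + d)
    return x0, y0, x1 - x0, y1 - y0

# Both effective images are then cropped with the same region, e.g.:
# x0, y0, rw, rh = effective_tracking_area(box, levels, H, W)
# roi_n   = frame_n[y0:y0 + rh, x0:x0 + rw]
# roi_np1 = frame_np1[y0:y0 + rh, x0:x0 + rw]
```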
5. The method of claim 1, wherein the determining, according to the value range within which the first confidence falls, the value of the first parameter set of the visual tracking algorithm for visually tracking the tracking target in the (N+1)-th frame image specifically includes:
determining, according to the value range within which the first confidence falls, the maximum number of iterations for iteratively tracking any sampling point within one image layer of the image pyramids; wherein the image pyramids are the image pyramid corresponding to the N-th frame image and the image pyramid corresponding to the (N+1)-th frame image, and the sampling points are obtained by sampling an initial image area of the tracking target in the N-th frame image;
and wherein the visually tracking the tracking target in the (N+1)-th frame image by using the visual tracking algorithm based on the first parameter set specifically includes:
within the image layers of the image pyramid corresponding to the N-th frame image and the image pyramid corresponding to the (N+1)-th frame image, iteratively tracking each sampling point by using Newton's iteration method to minimize the sum of matching errors within the neighborhood of the sampling point, to obtain the iterative optical flow of the sampling point in the image layer;
if the number of iterations reaches the maximum number of iterations, taking the optical flow of the last iteration as the optical flow of the sampling point in the image layer;
determining a corresponding tracking point of the sampling point in the (N+1)-th frame image according to the optical flow of the sampling point in each image layer;
determining the target image area of the tracking target in the (N+1)-th frame image based on all the sampling points and the corresponding tracking points.
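For claim 5, the maximum number of iterations can be imposed through the optical-flow solver's termination criteria. A sketch using OpenCV's TermCriteria follows; the epsilon value of 0.01 is an illustrative assumption.

```python
import cv2

def lk_termination(max_iterations, eps=0.01):
    """Cap the per-layer Newton iterations of the iterative LK solver.
    The solver stops at max_iterations even without convergence, in
    which case the last iteration's optical flow is kept."""
    return (cv2.TERM_CRITERIA_COUNT | cv2.TERM_CRITERIA_EPS,
            max_iterations, eps)

# usage (illustrative):
# cv2.calcOpticalFlowPyrLK(prev, nxt, pts, None,
#                          criteria=lk_termination(params["max_iterations"]))
```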
6. The method of claim 1, wherein the determining, according to the value range within which the first confidence falls, the value of the first parameter set of the visual tracking algorithm for visually tracking the tracking target in the (N+1)-th frame image specifically includes:
determining, according to the value range within which the first confidence falls, a window size for calculating a normalized correlation coefficient;
and wherein the visually tracking the tracking target in the (N+1)-th frame image by using the visual tracking algorithm based on the first parameter set specifically includes:
for any sampling point, tracking by using an optical flow tracking algorithm to obtain a tracking point of the sampling point in the (N+1)-th frame image; wherein the sampling points are obtained by sampling an initial image area of the tracking target in the N-th frame image;
extracting, from the N-th frame image, a first rectangular image centered on the sampling point and having the window size;
extracting, from the (N+1)-th frame image, a second rectangular image centered on the tracking point and having the window size;
calculating the normalized correlation coefficient of the sampling point and the tracking point according to the first rectangular image and the second rectangular image;
calculating the median of the normalized correlation coefficients of all the sampling points;
screening the tracking points according to the median, and taking the tracking points whose normalized correlation coefficients are larger than the median as stable tracking points;
determining the target image area of the tracking target in the (N+1)-th frame image according to the stable tracking points.
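For claim 6, a sketch of the NCC-plus-median screening. The strict "larger than the median" test follows the claim wording; the helper names and border handling are assumptions of this sketch.

```python
import numpy as np

def ncc(patch_a, patch_b):
    """Normalized correlation coefficient of two equal-sized patches."""
    a = patch_a.astype(np.float32) - patch_a.mean()
    b = patch_b.astype(np.float32) - patch_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def screen_stable_points(frame_n, frame_np1, pts_n, pts_np1, win):
    """Keep tracking points whose NCC exceeds the median NCC; `win` is
    the confidence-dependent window size from the first parameter set."""
    r = win // 2
    full = (2 * r + 1, 2 * r + 1)
    scores = []
    for (x0, y0), (x1, y1) in zip(pts_n.astype(int), pts_np1.astype(int)):
        pa = frame_n[y0 - r:y0 + r + 1, x0 - r:x0 + r + 1]
        pb = frame_np1[y1 - r:y1 + r + 1, x1 - r:x1 + r + 1]
        # points too close to the image border get a sentinel score
        ok = pa.shape == full and pb.shape == full
        scores.append(ncc(pa, pb) if ok else -1.0)
    scores = np.asarray(scores)
    keep = scores > np.median(scores)
    return pts_np1[keep], keep
```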
7. The method of claim 6, further comprising:
calculating, according to the number of the stable tracking points, a second confidence of visually tracking the tracking target in the (N+1)-th frame image; wherein the second confidence is used for indicating the credibility of the tracking result in the (N+1)-th frame image.
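For claim 7, the claim does not fix the formula; one plausible reading, assumed for this sketch, is the fraction of sampled points that survived the median-NCC screening.

```python
def second_confidence(num_stable_points, num_sampled_points):
    """Assumed form: the surviving fraction of sampling points serves
    as the confidence of the tracking result in frame N+1."""
    if num_sampled_points == 0:
        return 0.0
    return num_stable_points / num_sampled_points
```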
8. The method of claim 1, wherein the determining, according to the value range within which the first confidence falls, the value of the first parameter set of the visual tracking algorithm for visually tracking the tracking target in the (N+1)-th frame image specifically includes:
determining, according to the value range within which the first confidence falls, the number of sampling points to be sampled in an initial image area of the tracking target in the N-th frame image;
and wherein the visually tracking the tracking target in the (N+1)-th frame image by using the visual tracking algorithm based on the first parameter set specifically includes:
uniformly sampling in the initial image area based on the number of sampling points to obtain the sampling points of the tracking target;
visually tracking, based on the sampling points, the tracking target in the (N+1)-th frame image by using an optical flow tracking algorithm.
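For claim 8, a sketch of uniform sampling in the initial image area, where the number of points is the confidence-dependent parameter; the square k×k grid layout is an assumption of this sketch.

```python
import numpy as np

def uniform_sample_points(initial_box, num_points):
    """Place roughly num_points points on a uniform grid inside frame
    N's initial image area of the tracking target."""
    x, y, w, h = initial_box
    k = max(2, int(round(np.sqrt(num_points))))  # assumed k x k layout
    xs = np.linspace(x, x + w - 1, k)
    ys = np.linspace(y, y + h - 1, k)
    gx, gy = np.meshgrid(xs, ys)
    return np.stack([gx.ravel(), gy.ravel()], axis=1).astype(np.float32)
```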
9. A visual tracking device, comprising:
an acquisition module, configured to acquire a first confidence of visually tracking a tracking target in an N-th frame image by using a visual tracking algorithm; wherein the first confidence is used for indicating the credibility of the tracking result in the N-th frame image;
a determining module, configured to determine, according to the value range within which the first confidence falls, the value of a first parameter set of the visual tracking algorithm for visually tracking the tracking target in the (N+1)-th frame image; wherein the first parameter set is used for adjusting the computing power consumed by the visual tracking algorithm, and the value range within which the first confidence falls is inversely related to the computing power consumed by running the visual tracking algorithm on the (N+1)-th frame image;
a tracking module, configured to visually track, based on the first parameter set, the tracking target in the (N+1)-th frame image by using the visual tracking algorithm, to obtain a target image area of the tracking target in the (N+1)-th frame image.
10. A visual tracking apparatus comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to implement the steps of the method of any one of claims 1 to 8.
11. A computer-readable storage medium having stored thereon a computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the steps of the method of any one of claims 1 to 8.
12. A computer program product comprising a computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the steps of the method of any one of claims 1 to 8.
CN202310119922.3A 2023-02-15 2023-02-15 Visual tracking method, device, apparatus, medium and program product Pending CN116416278A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310119922.3A CN116416278A (en) 2023-02-15 2023-02-15 Visual tracking method, device, apparatus, medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310119922.3A CN116416278A (en) 2023-02-15 2023-02-15 Visual tracking method, device, apparatus, medium and program product

Publications (1)

Publication Number Publication Date
CN116416278A true CN116416278A (en) 2023-07-11

Family

ID=87055554

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310119922.3A Pending CN116416278A (en) 2023-02-15 2023-02-15 Visual tracking method, device, apparatus, medium and program product

Country Status (1)

Country Link
CN (1) CN116416278A (en)

Similar Documents

Publication Publication Date Title
US10489897B2 (en) Apparatus and methods for artifact detection and removal using frame interpolation techniques
TWI694381B (en) Image processing method and device
WO2022141178A1 (en) Image processing method and apparatus
WO2021046715A1 (en) Exposure time calculation method, device, and storage medium
WO2022127225A1 (en) Image stitching method and apparatus, and device and storage medium
US20230100776A1 (en) Visual positioning based on a plurality of image frames
CN112784857B (en) Model training and image processing method and device
WO2019222889A1 (en) Image feature extraction method and device
CN117197781B (en) Traffic sign recognition method and device, storage medium and electronic equipment
CN116342888B (en) Method and device for training segmentation model based on sparse labeling
CN115880685B (en) Three-dimensional target detection method and system based on volntet model
CN116416278A (en) Visual tracking method, device, apparatus, medium and program product
CN115358962B (en) End-to-end visual odometer method and device
CN116728436A (en) Control method and control device of playing robot
CN111798489B (en) Feature point tracking method, device, medium and unmanned equipment
CN112184901B (en) Depth map determining method and device
CN117765171B (en) Three-dimensional model reconstruction method and device, storage medium and electronic equipment
CN113888611B (en) Method and device for determining image depth and storage medium
CN116740197B (en) External parameter calibration method and device, storage medium and electronic equipment
CN114173058B (en) Video image stabilization processing method, device and equipment
CN117078985B (en) Scene matching method and device, storage medium and electronic equipment
CN118097359B (en) Model training method and device, storage medium and electronic equipment
CN116740114B (en) Object boundary fitting method and device based on convex hull detection
CN113640823B (en) Method and device for map drawing based on laser reflectivity base map
CN117314764A (en) Image deblurring method, device, equipment, system and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination