CN116385915A

CN116385915A - Water surface floater target detection and tracking method based on space-time information fusion

Info

Publication number: CN116385915A
Application number: CN202211098977.2A
Authority: CN
Inventors: 陈任飞; 彭勇; 吴剑
Original assignee: Dalian University of Technology
Current assignee: Dalian University of Technology
Priority date: 2022-09-06
Filing date: 2022-09-06
Publication date: 2023-07-04

Abstract

A method for detecting and tracking a water surface floater target based on space-time information fusion. First, video datasets of water surface floaters are acquired at different times, places, seasons and weather conditions. Second, in single-frame detection, the video data are input and the water surface floater target detection frame of the current frame is obtained by an improved SSD detection algorithm. Third, in multi-frame filtering, the target detection frame of the first video frame is taken as the input of an improved KCF tracking algorithm, which tracks the target and yields the water surface floater target tracking frame for the current video frame. Finally, after a fixed number of frames has been tracked, the SSD detection algorithm is applied again to the next video frame to obtain a new detection frame of the floater target, and a candidate frame selection strategy compares the overlap between the new detection frame and the old tracking frame to make the tracking decision. If the overlap satisfies the condition, tracking of the current floater target continues; if it does not, the floater is regarded as a new target, is output, and is used to initialize the KCF tracking algorithm. The invention uses this space-time fusion strategy to decide whether the detection and tracking information belongs to a new floater target, which reduces the false detection rate and the missed detection rate and improves the detection and tracking accuracy for floaters.

Description

Water surface floater target detection and tracking method based on space-time information fusion

Technical Field

The invention belongs to the fields of machine learning and target detection and tracking, and relates to a water surface floater target detection and tracking method based on temporal and spatial information fusion.

Background

Floaters are an important source of river and lake pollution and seriously damage the water surface landscape and the ecological environment. Visual supervision of rivers and lakes by camera is therefore carried out to improve their surface appearance. Existing floater detection and tracking techniques are hampered by the difficulty of sampling floaters, by shape changes, environmental changes and frequent occlusion, and in particular by water wave disturbance, dynamic light and shadow, and strong light reflection, so that existing methods can hardly meet the practical requirements of water surface floater management. Finding an efficient and fast detection and tracking method for water surface floaters is therefore one of the urgent problems in water pollution control.

At present, water surface floater detection techniques fall into two groups: traditional image processing and deep learning-based methods. Traditional image processing identifies the foreground and background features of the image and relies on filtering theory; it is fast, but its false detection rate and missed detection rate are high and it does not meet robustness requirements. With the rapid development of artificial intelligence and deep learning, target detection in complex water surface environments has been achieved by extracting floater target features and training a model with a multilayer convolutional neural network. Because of its accuracy and speed, the SSD model is currently the mainstream method for detecting and identifying water surface floater targets. However, when the floater target is small, moves slowly, or the background complexity changes greatly, the detection error rate rises steadily; in addition, the algorithm depends on high-power experimental equipment and is difficult to port to embedded devices. Most current water surface floater detection techniques work on single frames. Because they do not make full use of the temporal correlation between video frames or of the motion state of the floater target, they easily miss targets when processing video.

Target tracking places higher demands on real-time performance than target detection. Floater tracking techniques are mainly divided into traditional target tracking methods, deep learning algorithms and kernel correlation filtering. The traditional techniques, such as particle filters, Kalman filters, optical flow algorithms and intelligent particle filters, degrade significantly when the tracking background is complex. Deep learning-based tracking algorithms, such as ECO, MDNet, SANet, BranchOut, DaSiamRPN and SPM-Tracker, use a multilayer convolutional neural network to extract target features in place of the hand-crafted features originally used in the tracking framework. Extracting features with a deep network usually requires many operations, which slows the tracking algorithm and makes it difficult to meet the real-time requirements of a monitoring system. Correlation filter tracking based on time-frequency transformation is computationally efficient and has attracted wide attention in recent years. The kernel correlation filtering algorithm improves floater tracking in complex river scenes through classifier training, target detection and model updating, but dynamic changes in floater scale often produce tracking drift, and the algorithm cannot handle the scale change caused by a floater moving nearer to or farther from a fixed monitoring camera. Target tracking algorithms mainly process the target using the temporal information in the video sequence; they determine spatial information such as the exact position and scale of the target with low accuracy, so errors accumulate during tracking and the target drifts, which causes tracking failure.

Measured against practical water surface floater detection and tracking requirements, current research results in China and abroad are limited in accuracy and speed and are affected by complex water surface environments, so they cannot meet those requirements. Building on the continuing development of deep learning, the invention provides a water surface floater target detection and tracking method based on space-time information fusion to achieve accurate detection and tracking of water surface floaters. In the single-frame detection stage, the method obtains the spatial information of small-scale floaters by deleting the deep low-resolution detection layers of the SSD network and enhancing the shallow high-resolution feature layer, which addresses the detection of small-scale floaters. In the multi-frame filtering stage, correlation filter tracking is used to compute the temporal correlation of gradient features between video frames, which reduces the missed detection rate. In the information fusion stage, the information obtained by the improved SSD detection network and by the kernel correlation filtering algorithm is fused through feature comparison, which safeguards both the speed and the accuracy of detection and tracking.

Disclosure of Invention

The invention aims to provide a method for detecting and tracking a water surface floater target based on temporal and spatial information fusion, which aims to solve the problems of precision, speed and scale of the existing target detection and tracking algorithm and reduce the influence of external illumination, shielding and deformation on target detection and tracking.

In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:

A method for detecting and tracking a water surface floater target based on space-time information fusion comprises the following steps:

S1: Acquiring, through a camera, a video dataset of water surface floaters at different times, places, seasons and weather conditions;

S2: In single-frame detection, inputting the water surface floater video data and acquiring the water surface floater target detection frame of the current frame through an improved SSD detection algorithm; specifically:

The improved SSD target detection algorithm mainly adjusts the structure of the deep low-resolution detection layers and the shallow high-resolution detection layer. First, because detection layers with a resolution of 5*5 and below cannot extract the important features of small-scale floater targets, the technical scheme deletes the detection layers with resolutions of 5*5, 3*3 and 1*1 from the SSD network. Second, the shallow high-resolution feature layer is enhanced by feature summation (Add). To do so, the feature layer F0 with a resolution of 76 x 76 and the feature layer F1 with a resolution of 38 x 38 are combined: a 1*1 pointwise convolution reduces the dimension of the F1 layer, and up-sampling then yields $F0^{a}$, so that $F0^{a}$ has the same resolution and number of channels as F0. Next, the $F0^{a}$ layer is added to the F0 layer pixel by pixel, and a convolution layer smooths the result to complete the feature fusion and obtain $F0^{o}$. Acquiring the video target detection frame with the improved SSD detection algorithm mainly comprises the following steps:

(2.1) A feature pyramid network in the improved SSD target detection algorithm generates default frames with the same area and different aspect ratios on feature maps of different scales of the video frame.

(2.2) The default frames of step (2.1) are input as abstract features into the convolution predictor of the improved SSD target detection algorithm for training, and the trained convolution predictor predicts and classifies the position offset of the floater target to obtain target detection frames.

(2.3) Among all target detection frames obtained in step (2.2), the detection frames with confidence lower than 0.7 are deleted, and redundant and repeated detection frames are then deleted by non-maximum suppression; these two rounds of screening give all target detection frames of the water surface floater targets in the i-th frame (i = 1, 2, 3, ..., X) of the video.
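As an illustrative, non-limiting sketch, the two rounds of screening in step (2.3) may be expressed as follows. The 0.7 confidence threshold comes from the text; the box format `(x1, y1, x2, y2)`, the NMS overlap threshold of 0.5 and the function names are assumptions made for illustration only.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def screen_detections(dets, conf_thresh=0.7, nms_iou=0.5):
    """Screen detections given as a list of (box, confidence) pairs.

    Round 1 drops boxes whose confidence is below conf_thresh (0.7 in the text).
    Round 2 is greedy non-maximum suppression; nms_iou is an assumed value.
    """
    confident = sorted((d for d in dets if d[1] >= conf_thresh),
                       key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in confident:
        # Keep a box only if it does not overlap a higher-scoring kept box.
        if all(box_iou(box, k[0]) <= nms_iou for k in kept):
            kept.append((box, score))
    return kept


dets = [((0, 0, 10, 10), 0.90),       # kept
        ((1, 1, 11, 11), 0.80),       # suppressed: overlaps the first box
        ((50, 50, 60, 60), 0.75),     # kept
        ((100, 100, 110, 110), 0.60)] # dropped: confidence below 0.7
kept = screen_detections(dets)
```

Sorting by confidence before suppression means the highest-scoring frame of each overlapping group is the one that survives.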

S3: In multi-frame filtering, the target detection frame of the first frame from step S2 is taken as the input of an improved KCF tracking algorithm, and target tracking is performed to obtain the water surface floater target tracking frame for the current video frame; specifically:

The improved KCF tracking algorithm consists of position estimation and scale estimation. First, the technical scheme extracts the fast histogram of oriented gradients (FHOG) features of the floater target and trains the KCF to obtain a feature response map, from which the position of the target is determined. Second, the technical scheme adopts a pyramid sampling scale estimation strategy: pyramid multi-scale sampling is performed around the floater target position, and a scale filter is trained on the sampled images to determine the optimal scale of the floater target. Acquiring the video target tracking frame with the improved KCF tracking algorithm mainly comprises the following steps:

(3.1) N water surface floater target detection frames are obtained in the 1st video frame through step S2. The target detection frames with confidence lower than 0.7 are deleted first, and highly redundant and repeated detection frames are then deleted by the non-maximum suppression algorithm, leaving M target detection frames.

(3.2) The coordinate values of the M floater target detection frames of step (3.1) are input into the improved KCF tracking algorithm; the position filter template and the scale filter template of the improved KCF are initialized, a circulant matrix is constructed, and the position filter of the improved KCF is trained with FHOG features. The scale filter of the improved KCF is then constructed, and a scale-adaptive strategy selects the optimal scale of the floater target near the position produced by the position filter.

(3.3) The fixed number of video frames to be tracked is set to T (T < X), and steps (3.1) - (3.3) are repeated for this fixed number of frames until the water surface floater target tracking frame in the T-th frame is obtained.

S4: In information fusion, after tracking of the fixed number of frames T has been completed, step S2 is repeated to obtain a new detection frame of the floater target in the next video frame, and a candidate frame selection strategy compares the overlap between the new detection frame and the old tracking frame to make the tracking decision. If the overlap satisfies the condition, tracking of the current floater target continues; if it does not, the floater is regarded as a new target and is output, and the improved KCF tracking algorithm is initialized to track the new target; specifically:

(4.1) The fixed number of frames T (T < X) is tracked through step S3 to obtain the water surface floater target tracking frame;

(4.2) After tracking of the fixed number of frames T has been completed, the improved SSD detection algorithm is applied again to the next video frame to obtain a new detection frame of the floater target. The overlap between the new detection frame and the old tracking frame of step S3 is calculated to judge whether detection and tracking refer to the same target. If the overlap is less than or equal to 0.4, the floater in the video frame is set as a new target, the improved KCF algorithm is initialized, and step S3 is repeated; if the overlap is greater than 0.4, the new detection frame and the old tracking frame are regarded as the same target, the confidence of the new detection frame is compared with the normalized response of the old tracking frame, and the candidate frame with the larger value is taken as the output.

(4.3) With the total number of video frames set to X, steps (4.1) - (4.3) are repeated until the water surface floater target tracking frame in the X-th frame is obtained.

Compared with the prior art, the invention has the following beneficial effects:

(1) The invention comprises three parts: single-frame detection, multi-frame filtering and space-time information fusion. An improved SSD detection algorithm obtains the spatial position information of the floater target, an improved KCF tracking algorithm obtains its temporal and motion information, and a space-time fusion strategy decides whether the detection and tracking information belongs to a new floater target. This reduces the false detection rate and the missed detection rate and improves the detection and tracking accuracy for floaters.

(2) The invention improves the conventional SSD detection algorithm: by adjusting the structure of the deep low-resolution detection layers and the shallow high-resolution detection layer, it improves the detection accuracy of the SSD detection algorithm for small-scale floater targets.

(3) The invention improves the conventional KCF tracking algorithm by introducing a more robust pyramid sampling scale estimation algorithm for floaters. This alleviates the loss of tracking performance caused by the inability of the KCF algorithm to adapt its scale, so that the floater target can be tracked and located accurately when complex conditions such as scale change and occlusion arise during tracking.

Drawings

FIG. 1 is a flow chart of a method for detecting and tracking a water surface floater target based on temporal and spatial information fusion.

Fig. 2 is a schematic diagram of an improved SSD detection algorithm.

FIG. 3 is a schematic diagram of feature fusion.

Fig. 4 is a schematic diagram of an improved KCF tracking algorithm.

Fig. 5 is a schematic diagram of a temporal-spatial information fusion strategy.

FIG. 6 is a schematic diagram comparing the performance of the method of the present invention with other detection and tracking algorithms; FIG. 6 (a) is the detection and tracking result for frame 15; FIG. 6 (b) is the result for frame 232; FIG. 6 (c) is the result for frame 467; FIG. 6 (d) is the result for frame 23; FIG. 6 (e) is the result for frame 145; FIG. 6 (f) is the result for frame 379.

Detailed Description

The detailed description and mode of carrying out the invention will be presented so that those skilled in the art can more clearly understand the invention. It is to be understood that the invention is not limited to the specific embodiments, but is intended to cover modifications within the spirit and scope of the present invention as defined by the appended claims.

As shown in fig. 1, a method for detecting and tracking a water surface floater target based on space-time information fusion is implemented as follows:

S1: Complete video frames of water surface floaters are recorded over 70 days by 5 cameras;

S2: The aspect ratios of the floater targets in the video frame dataset of step S1 are processed with a K-means cluster analysis algorithm to obtain n cluster centers, which serve as the initial values of the default frames of the target detection algorithm. The feature pyramid network in the improved SSD target detection algorithm generates default frames with the same area and different aspect ratios on feature maps of different scales of the video frame; the default frames are input into a convolution predictor for training, and the position offset of the floater target is predicted and classified to obtain target detection frames. Detection frames with confidence lower than 0.7 are deleted, and redundant and repeated detection frames are removed by non-maximum suppression, which gives the detection frame of the water surface floater target in the i-th frame (i = 1, 2, 3, ..., X) of the video. The initial default frames are generated as follows:

$$s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1), \quad k \in [1, m]$$

where $s_k$ denotes the default frame area on the k-th feature map, m denotes the number of feature maps, $s_{max}$ denotes the maximum area of the default frame, $s_{min}$ denotes the minimum area of the default frame, and k is the index of the feature map.
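A minimal sketch of default frame generation under stated assumptions: the per-layer value follows the linear rule of the original SSD design, which matches the symbols defined above; $s_k$ is treated as a normalized area, as the text describes it; and the values `s_min=0.2`, `s_max=0.9`, `m=4` and the aspect ratios are illustrative placeholders (in the method the aspect ratios would come from the K-means cluster centers).

```python
import math


def default_frame_areas(m, s_min=0.2, s_max=0.9):
    """Linearly spaced per-layer values s_k for k = 1..m (assumed defaults)."""
    if m == 1:
        return [s_min]
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def frames_for_layer(s_k, aspect_ratios):
    """Return (width, height) pairs that share the area s_k.

    For aspect ratio a = width / height: width = sqrt(s_k * a) and
    height = sqrt(s_k / a), so width * height == s_k for every ratio.
    """
    return [(math.sqrt(s_k * a), math.sqrt(s_k / a)) for a in aspect_ratios]


areas = default_frame_areas(m=4)
ratios = (0.5, 1.0, 2.0)  # hypothetical stand-ins for the K-means cluster centers
layer_frames = [frames_for_layer(s, ratios) for s in areas]
```

Each feature map therefore receives frames of equal area whose shapes differ only in aspect ratio, which is the property stated in step S2.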

S3: The spatial coordinate information of the detection frame of the i-th frame (i = 1, 2, 3, ..., T, T < X) is taken as the initial value of the improved KCF tracking algorithm, a circulant matrix of the floater target is constructed, and ridge regression is performed on the samples in kernel space with a Gaussian kernel function to solve for the classifier, as follows:

$$\omega = (X^{T}X + \lambda I)^{-1} X^{T} y$$

where X denotes the circulant sample matrix, which is diagonalized by the Fourier transform, $X^{T}X$ denotes the kernel matrix of the kernel space, $\lambda$ denotes the regularization parameter, y denotes the target label vector, and the hat symbol (^) denotes the Fourier transform operation. Using the properties of the circulant matrix, the solution of the kernel ridge regression is obtained as:

$$\alpha = (K + \lambda I)^{-1} y$$

where K denotes the kernel matrix, which is circulant and is therefore determined by its first row. For the sample z of the next video frame, features are extracted and the kernel circulant matrix is constructed to obtain the new sample, and the response to all test samples in the Fourier domain is:

$$\hat{f}(z) = \hat{k}^{z} \odot \hat{\alpha}$$

where $k^{z}$ denotes the first row of the kernel matrix $K^{z}$, $K^{z}$ is the circulant matrix formed from the training sample set x and the test sample set z, and the position of the maximum value in the response map f(z) is the position of the floater target.
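The training and detection steps above can be sketched in NumPy under simplifying assumptions: a one-dimensional raw signal stands in for the two-dimensional FHOG feature map, and the kernel bandwidth `sigma`, the regularization `lam`, the label width and all function names are illustrative choices, not values from the text. Because the kernel matrix is circulant, the inverse in $\alpha = (K + \lambda I)^{-1} y$ becomes an element-wise division in the Fourier domain.

```python
import numpy as np


def gaussian_kernel_correlation(x, z, sigma=0.5):
    """Entry i is the Gaussian kernel between z and x cyclically shifted by i."""
    # Circular cross-correlation of x and z computed with the FFT.
    cross = np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(z)).real
    sq_dist = (x @ x + z @ z - 2.0 * cross) / x.size
    return np.exp(-np.maximum(sq_dist, 0.0) / sigma ** 2)


def train(x, y, lam=1e-4):
    """Kernel ridge regression over all cyclic shifts of x, solved element-wise."""
    k_xx = gaussian_kernel_correlation(x, x)
    return np.fft.fft(y) / (np.fft.fft(k_xx) + lam)  # alpha in the Fourier domain


def detect(alpha_f, x, z):
    """Response over all cyclic shifts of z; the argmax is the target shift."""
    k_xz = gaussian_kernel_correlation(x, z)
    return np.fft.ifft(np.fft.fft(k_xz) * alpha_f).real


rng = np.random.default_rng(0)
n = 64
x = rng.standard_normal(n)                # training sample (template)
idx = np.arange(n)
circ_dist = np.minimum(idx, n - idx)      # circular distance to index 0
y = np.exp(-0.5 * (circ_dist / 2.0) ** 2) # Gaussian label peaked at zero shift
alpha_f = train(x, y)

z = np.roll(x, 5)                         # next-frame sample: target moved by 5
response = detect(alpha_f, x, z)
shift = int(np.argmax(response))
```

In this sketch the peak of `response` falls at the displacement applied to the template, which corresponds to reading the floater position from the maximum of f(z).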

A scale filter is then constructed. With the floater target position obtained from the response f(z) by the convolution operation, the scale filter is trained by minimizing an objective function. The loss function is as follows:

where the filter term denotes the i-th feature channel of the scale filter, r denotes the training samples of the scale filter, $y_s$ denotes the desired output value, and $d^{s}$ denotes the number of feature channels.
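The loss expression itself is not reproduced in this text. As a hedged reconstruction only, assuming the scale filter follows the usual correlation filter formulation (a regularized least-squares loss over feature channels, as in DSST-style scale filters), it would take the following form, where $h^{i}$ is a placeholder name for the i-th channel of the scale filter, $r^{i}$ is the i-th channel of the training sample r, $\star$ denotes correlation, and $\lambda$ is a regularization weight:

$$\varepsilon = \left\| \sum_{i=1}^{d^{s}} h^{i} \star r^{i} - y_{s} \right\|^{2} + \lambda \sum_{i=1}^{d^{s}} \left\| h^{i} \right\|^{2}$$

The first term penalizes the difference between the filter output and the desired output $y_s$; the second term regularizes the filter.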

The scale filter is updated and applied to the test sample after dimension-reduction compression to calculate scale correlation scores. The scale with the highest correlation score is taken as the final scale of the floater target, which realizes adaptive estimation of the floater scale.
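A minimal sketch of the scale search, assuming a geometric pyramid of candidate sizes centered on the current size. The number of scales (33) and the step factor (1.02) are assumed values of the kind commonly used with scale filters and are not stated in the text; the correlation scores are supplied by the caller because computing them requires the trained scale filter.

```python
def scale_pyramid(width, height, num_scales=33, step=1.02):
    """Candidate (width, height) sizes; the middle entry is the current size."""
    half = (num_scales - 1) // 2
    return [(width * step ** (n - half), height * step ** (n - half))
            for n in range(num_scales)]


def best_scale(sizes, scores):
    """Return the candidate size with the highest scale correlation score."""
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return sizes[best_index]


sizes = scale_pyramid(40, 20, num_scales=5, step=1.1)
scores = [0.1, 0.2, 0.9, 0.3, 0.1]  # hypothetical scale correlation scores
chosen = best_scale(sizes, scores)
```

Scaling width and height by the same factor keeps the aspect ratio of the tracking frame fixed while its size adapts.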

S4: Based on the target tracking result for the fixed number of frames T, step S2 is repeated: the improved SSD detection algorithm is applied again to obtain a new detection frame $S_i$ of the floater target in the next video frame, and the overlap between the new detection frame $S_i$ and the old tracking frame $K_i$ of step S3 is calculated. The intersection over union (IOU) is used as the overlap criterion and is calculated as follows:

$$IOU = \frac{|S_i \cap K_i|}{|S_i \cup K_i|}$$

where $S_i \cap K_i$ denotes the intersection of the new detection frame $S_i$ and the old tracking frame $K_i$, and $S_i \cup K_i$ denotes their union. If $IOU \le 0.4$, the floater in the video frame is set as a new target and takes part in the initialization of the tracking algorithm; if $IOU > 0.4$, the new detection frame $S_i$ and the old tracking frame $K_i$ are regarded as the same target, the detection frame confidence $conf(S_i)$ of the improved SSD detection algorithm is compared with the tracking frame confidence $conf(K_i)$ of the improved KCF tracking algorithm, and the candidate frame with the larger confidence is taken as the floater target tracking frame.
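The candidate frame selection rule may be sketched as follows. The 0.4 threshold and the comparison of the two confidence values come from the text; the box format `(x, y, w, h)`, the return convention and the function names are assumptions made for illustration.

```python
def overlap(a, b):
    """Intersection over union of two frames given as (x, y, w, h)."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def select_candidate(det_frame, det_conf, trk_frame, trk_conf, threshold=0.4):
    """Return (is_new_target, output_frame) for one re-detection step.

    IOU <= threshold: the detection is a new target and re-initializes tracking.
    IOU >  threshold: same target; the more confident candidate is output.
    """
    if overlap(det_frame, trk_frame) <= threshold:
        return True, det_frame
    return False, det_frame if det_conf >= trk_conf else trk_frame


far = select_candidate((0, 0, 10, 10), 0.8, (100, 100, 10, 10), 0.9)
near_trk = select_candidate((0, 0, 10, 10), 0.8, (1, 1, 10, 10), 0.9)
near_det = select_candidate((0, 0, 10, 10), 0.95, (1, 1, 10, 10), 0.9)
```

When the two confidence values are equal, this sketch prefers the detection frame; the text does not specify a tie-breaking rule.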

The improved SSD target detection algorithm of the invention serves as the water surface floater target detection tool and aims to identify information such as the type, number and scale of floaters. The main structure of the improved SSD target detection algorithm in step S2 is shown in fig. 2. First, the deep low-resolution detection layers are deleted. In an input image with a resolution of 300×300, the water surface floater targets include low-pixel, small-scale targets that occupy fewer than 10×10 pixels; such targets have low resolution, blurred appearance, little information and much noise. The detection layer F4 has a resolution of 5*5, which is 58 times lower than the original input image, so floaters in this layer are blurred and their shape and appearance information is greatly reduced. Detection layers with a resolution of 5*5 and below therefore cannot extract the important features of small-scale floater targets, and the technical scheme deletes the detection layers with resolutions of 5*5, 3*3 and 1*1 from the SSD network. Second, the shallow high-resolution detection layer is enhanced. The technical scheme adopts a 76 x 76 high-resolution feature layer, but a feature layer this shallow contains insufficient semantic information, which makes targets difficult to distinguish, so its features need to be enhanced. The technical scheme enhances the shallow high-resolution feature layer by feature summation (Add) in feature fusion, which saves parameters and computation compared with feature concatenation (Concat) fusion. The feature fusion process is shown in fig. 3. To begin, the feature layer F0 with a resolution of 76 x 76 and the feature layer F1 with a resolution of 38 x 38 are combined: a 1*1 pointwise convolution reduces the dimension of the F1 layer, and up-sampling then yields $F0^{a}$, so that $F0^{a}$ has the same resolution and number of channels as F0. Next, the $F0^{a}$ layer is added to the F0 layer pixel by pixel, and a convolution layer smooths the result to complete the feature fusion and obtain $F0^{o}$.
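The Add-style fusion can be illustrated with a small NumPy sketch. It makes several assumptions that are not in the text: the channel counts (8 and 16) are arbitrary, nearest-neighbour up-sampling is used, and a fixed 3x3 mean filter stands in for the learned smoothing convolution. It is meant only to show how the shapes line up, not to reproduce the trained network.

```python
import numpy as np


def pointwise_conv(feat, weight):
    """1x1 convolution: feat is (C_in, H, W), weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, feat)


def upsample_2x(feat):
    """Nearest-neighbour up-sampling that doubles height and width."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)


def smooth_3x3(feat):
    """3x3 mean filter with zero padding (stand-in for a learned convolution)."""
    _, h, w = feat.shape
    padded = np.pad(feat, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(feat, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / 9.0


def add_fusion(f0, f1, weight):
    """Fuse shallow layer f0 with deeper layer f1 by element-wise addition."""
    f0_a = upsample_2x(pointwise_conv(f1, weight))  # match f0 in shape
    if f0_a.shape != f0.shape:
        raise ValueError("up-sampled F1 must match F0 in resolution and channels")
    return smooth_3x3(f0 + f0_a)


rng = np.random.default_rng(0)
f0 = rng.standard_normal((8, 76, 76))    # shallow layer, 76 x 76
f1 = rng.standard_normal((16, 38, 38))   # deeper layer, 38 x 38
weight = rng.standard_normal((8, 16))    # 1x1 kernel reducing 16 channels to 8
fused = add_fusion(f0, f1, weight)
```

Because addition keeps the channel count of F0, the layers after the fusion need no extra parameters, whereas concatenation would increase the channel count.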

The main structure of the improved KCF target tracking algorithm in step S3 is shown in fig. 4. The technical scheme provides a scale-adaptive floater target tracking method based on multi-feature fusion, which comprises position estimation and scale estimation. In the position estimation stage, the fast histogram of oriented gradients (FHOG) features of the floater target are extracted, and the KCF is trained to obtain a feature response map that determines the position of the target. In the scale estimation stage, a pyramid sampling scale estimation strategy performs pyramid multi-scale sampling around the floater target position, and a scale filter trained on the sampled images determines the optimal scale of the floater target.

The space-time information fusion strategy of step S4 is shown in fig. 5. During fusion, the improved SSD detection algorithm obtains a detection candidate frame $S_i$ in the first frame image and determines the spatial position of the target. The position of the target in the first frame is then used as the input of the improved KCF tracking algorithm, and in the subsequent frames the improved KCF algorithm tracks the target to obtain a tracking candidate frame $K_j$. After a fixed number of frames has been tracked, a re-detection mechanism is run, in which the improved SSD detection algorithm safeguards the accuracy of continued detection and tracking. The intersection over union (IOU) of $S_i$ and $K_j$ is calculated to judge whether detection and tracking refer to the same floater target.
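The alternation between detection and tracking can be sketched as a simple schedule. The text does not give a value for the fixed number of frames T, so `T=2` below is purely illustrative, and the mode labels and function name are hypothetical.

```python
def frame_modes(total_frames, T):
    """Label each frame with the operation applied under the fusion strategy.

    Frame 0 is detected with the improved SSD, the next T frames are tracked
    with the improved KCF, and the frame after that is re-detected and fused
    with the old tracking frame; the cycle then repeats.
    """
    modes = []
    tracked_since_detection = 0
    for index in range(total_frames):
        if index == 0:
            modes.append("detect")
            tracked_since_detection = 0
        elif tracked_since_detection == T:
            modes.append("redetect_and_fuse")
            tracked_since_detection = 0
        else:
            modes.append("track")
            tracked_since_detection += 1
    return modes


schedule = frame_modes(total_frames=7, T=2)
```

A larger T lowers the share of frames that need the detector, at the cost of letting tracking drift accumulate for longer before it is corrected.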

In order to verify the feasibility and effectiveness of the invention, the invention is further described below with reference to examples.

The experiments are run on the Ubuntu 18.04 LTS operating system. The computer is configured with an Intel i7 CPU, 32 GB of memory and an RTX 3080 graphics card, and the algorithm platform is the PyTorch 1.10 deep learning framework with Python 3.8. The performance of the technique is evaluated by center location error (CLE), overlap ratio (OR) and detection and tracking precision (DP).
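The text names the metrics without defining them. The sketch below assumes the definitions commonly used in tracking benchmarks: CLE as the Euclidean distance between the centers of the predicted and ground-truth frames, and DP as the fraction of frames whose CLE does not exceed a pixel threshold. The 20-pixel threshold, the `(x, y, w, h)` frame format and the function names are assumptions; OR is usually computed as the intersection over union already used in step S4.

```python
import math


def center_location_error(pred, truth):
    """Euclidean distance in pixels between the centers of two (x, y, w, h) frames."""
    px, py = pred[0] + pred[2] / 2.0, pred[1] + pred[3] / 2.0
    tx, ty = truth[0] + truth[2] / 2.0, truth[1] + truth[3] / 2.0
    return math.hypot(px - tx, py - ty)


def detection_precision(preds, truths, max_error=20.0):
    """Fraction of frames whose CLE is at most max_error (assumed threshold)."""
    if not preds:
        return 0.0
    hits = sum(center_location_error(p, t) <= max_error
               for p, t in zip(preds, truths))
    return hits / len(preds)


preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
truths = [(3, 4, 10, 10), (30, 40, 10, 10)]
errors = [center_location_error(p, t) for p, t in zip(preds, truths)]
precision = detection_precision(preds, truths)
```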

FIG. 6 compares the performance of the method of the present invention with other detection and tracking algorithms. In fig. 6 (a), the conventional SSD detection algorithm detects the large-scale floater but fails to detect the small-scale floater and produces a false detection. The improved SSD detection algorithm obtains more semantic information by enhancing the shallow high-resolution 76 x 76 feature layer and detects the small-scale floater target; the background of the region where the floater moves is relatively simple, and all of the detection and tracking algorithms track accurately. In fig. 6 (b), the floater enters a large area of reflections and changing light and shadow; the light and shadow on the water surface vary widely and the color contrast between the floater and the reflection area decreases. Although the improved SSD detection algorithm can detect the floater target, its tracking frame deviates from the center of the target. The improved KCF tracking algorithm automatically deletes floaters that deviate from the filter center position on the basis of the history before frame 232 and is robust to light and shadow to a certain degree. In fig. 6 (c), several strip-shaped ghost images appear in the regional background; the conventional SSD detection algorithm produces a false detection and detects two small-scale floaters as one detection frame, while the improved SSD algorithm and the improved KCF algorithm maintain good detection and tracking results but are less accurate than the fusion algorithm. The conventional SSD algorithm also misses detections in each of fig. 6 (d), fig. 6 (e) and fig. 6 (f), whereas the improved SSD algorithm, with its optimized deep low-resolution and shallow high-resolution feature layer structure, effectively identifies small-scale water surface floaters. In the first 23 frames the tracking results of the improved SSD algorithm and the improved KCF algorithm differ little and their tracking frames largely overlap, but their tracking accuracy is lower than that of the fusion algorithm. At frame 145 the improved SSD detection algorithm falsely detects a reflective object as a floater target, while the improved KCF algorithm adaptively adjusts the tracking scale and avoids drift of the tracking frame. At frame 379 the strong reflection in the region has weakened and the tracking frames largely overlap. In a complex water surface environment, fusing the small-scale detection algorithm with the KCF tracking algorithm gives better detection and tracking performance.

The examples described above represent only embodiments of the invention and are not to be understood as limiting the scope of the patent of the invention, it being pointed out that several variants and modifications may be made by those skilled in the art without departing from the concept of the invention, which fall within the scope of protection of the invention.

Claims

1. The method for detecting and tracking the object of the water surface floater based on the space-time information fusion is characterized by comprising the following steps of:

(2.1) generating default frames with the same area and different aspect ratios on feature maps of different scales of the video frames by adopting a feature pyramid network in an improved SSD target detection algorithm;

(2.2) inputting the default frame in the step (2.1) as abstract features into a convolution predictor in an improved SSD target detection algorithm for training, and predicting and classifying the target position offset of the floater by using the trained convolution predictor to obtain a target detection frame;

(2.3) deleting the detection frames with the confidence coefficient lower than 0.7 from all the target detection frames obtained in the step (2.2), deleting redundant and repeated detection frames by adopting a non-maximum value inhibition method, and obtaining all the target detection frames of the i-th frame (i=1, 2,3 … … X) water surface floater targets in the video by screening twice;

(3.1) obtaining N water surface floater target detection frames in the 1st video frame through step S2; first deleting those of the N target detection frames whose confidence is lower than 0.7, and then using the non-maximum suppression algorithm to delete highly redundant and repeated detection frames, obtaining M target detection frames;

(3.2) inputting the coordinate values of the M floater target detection frames from step (3.1) into the improved KCF tracking algorithm, initializing the position filter template and the scale filter template of the improved KCF, constructing a circulant matrix, and training the position filter of the improved KCF with FHOG features; then constructing the scale filter of the improved KCF, and using a scale-adaptive strategy to select the optimal scale of the floater target near the position produced by the position filter;

setting the fixed number of tracking frames of the video to T (T < X), and repeating steps (3.1) - (3.3) over this fixed number of frames until the water surface floater target tracking frame at frame T is obtained;
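For reference, the position filter training of step (3.2) can be written in the standard kernelized correlation filter form. This is the general KCF formulation, assumed here for illustration; the claim does not state the exact formulas and the symbols are not from the original. Here x is the FHOG feature patch of the target, y the desired response (typically Gaussian-shaped), λ a regularization constant, k^{xx} the kernel autocorrelation of x, and k^{xz} the kernel cross-correlation of x with a candidate patch z; hats denote the discrete Fourier transform, and the division and the product ⊙ are element-wise.

```latex
\hat{\boldsymbol{\alpha}} = \frac{\hat{\mathbf{y}}}{\hat{\mathbf{k}}^{\mathbf{xx}} + \lambda},
\qquad
f(\mathbf{z}) = \mathcal{F}^{-1}\!\left( \hat{\mathbf{k}}^{\mathbf{xz}} \odot \hat{\boldsymbol{\alpha}} \right)
```

The circulant structure mentioned in step (3.2) is what allows both expressions to be evaluated element-wise in the Fourier domain; the new target position is taken at the maximum of the response map f(z).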

S4: in the information fusion, after tracking over the fixed number of frames T is completed, repeating step S2 to obtain a new floater target detection frame in the next video frame, and using a candidate frame selection strategy to compare the overlap ratio of the new detection frame and the old tracking frame for tracking judgment; if the overlap ratio meets the condition, continuing to track the current floater target; if the overlap ratio does not meet the condition, regarding the floater target as a new target, outputting it, and initializing the improved KCF tracking algorithm to track the new target; specifically:

(4.2) after tracking over the fixed number of frames T is completed, reintroducing the improved SSD detection algorithm in the next video frame to obtain a new floater target detection frame; calculating the overlap ratio of the new detection frame and the old tracking frame from step S3, and judging whether detection and tracking refer to the same target; if the overlap ratio is less than or equal to 0.4, setting the floater in the video frame as a new target, initializing the improved KCF algorithm, and repeating step S3; if the overlap ratio is greater than 0.4, treating the new detection frame and the old tracking frame as the same target, comparing the confidence of the new detection frame with the normalized response of the old tracking frame, and taking the candidate frame with the larger value as output;
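The candidate frame selection of step (4.2) can be sketched as follows. The 0.4 threshold and the comparison of detection confidence against the normalized tracking response come from the claim; treating the overlap ratio as intersection over union, the (x1, y1, x2, y2) box layout, the function names and the tie-breaking in favor of the detection are assumptions made for illustration.

```python
def overlap_ratio(a, b):
    """Overlap of two (x1, y1, x2, y2) boxes, assumed here to be IoU."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def select_candidate(det_box, det_conf, trk_box, trk_resp, thresh=0.4):
    """Decide between a new detection and an old tracking frame.

    Returns (label, box). "new_target" means the tracker should be
    re-initialized on `box`; "same_target" means tracking continues
    and `box` is the frame chosen as output.
    """
    if overlap_ratio(det_box, trk_box) <= thresh:
        # Low overlap: the detection is regarded as a new floater target.
        return "new_target", det_box
    # High overlap: same target, output the more confident candidate.
    # On a tie the detection is preferred (an assumption).
    if det_conf >= trk_resp:
        return "same_target", det_box
    return "same_target", trk_box
```

A detection far from the tracked box is returned as a new target, while a detection that overlaps the tracked box by more than 0.4 only replaces it when its confidence is at least the tracker's normalized response.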

2. The method for detecting and tracking water surface floater targets based on space-time information fusion according to claim 1, characterized in that in step S2 the SSD target detection algorithm is improved mainly by adjusting the structures of the deep low-resolution detection layers and the shallow high-resolution detection layers; first, the detection layers with resolutions of 5×5, 3×3 and 1×1 in the original SSD network are deleted; second, the shallow high-resolution feature layer is enhanced by feature summation: the feature layer F0 with a resolution of 76×76 and the feature layer F1 with a resolution of 38×38 are summed, where a 1×1 point convolution reduces the dimension of the F1 layer and up-sampling then yields F0^a, ensuring that F1 and F0^a have the same resolution and channel number; next, the F0^a layer is added to the F0 layer pixel by pixel, and feature fusion is carried out through convolution layer smoothing to obtain F0^o.
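The feature summation described in this claim can be illustrated with a small NumPy sketch. The order of operations (1×1 point convolution on F1, up-sampling, pixel-wise addition to F0, smoothing convolution) follows the claim; nearest-neighbour up-sampling, a 3×3 smoothing kernel shared across channels, and all function and parameter names are assumptions, since the claim does not specify them.

```python
import numpy as np


def fuse_shallow_features(f0, f1, w_reduce, smooth_kernel):
    """Fuse a shallow map f0 with the next deeper map f1.

    f0: (C0, 2H, 2W), f1: (C1, H, W), w_reduce: (C0, C1),
    smooth_kernel: (3, 3). Returns the fused map, shape (C0, 2H, 2W).
    """
    # 1x1 point convolution: a per-pixel linear map over channels (C1 -> C0).
    f1_reduced = np.einsum("oc,chw->ohw", w_reduce, f1)
    # 2x nearest-neighbour up-sampling to the resolution of f0 (assumed mode).
    f0_a = f1_reduced.repeat(2, axis=1).repeat(2, axis=2)
    # Pixel-wise addition of the up-sampled map and the shallow map.
    summed = f0 + f0_a
    # Smoothing: the same 3x3 kernel applied to every channel, zero padded
    # (an assumed, simplified stand-in for the smoothing convolution layer).
    padded = np.pad(summed, ((0, 0), (1, 1), (1, 1)))
    h, w = summed.shape[1:]
    f0_o = np.zeros(summed.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            f0_o += smooth_kernel[dy, dx] * padded[:, dy:dy + h, dx:dx + w]
    return f0_o
```

With the resolutions named in the claim, f0 would have spatial size 76×76 and f1 38×38; in a trained network the weights w_reduce and smooth_kernel would be learned rather than supplied by hand.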

3. The method for detecting and tracking water surface floater targets based on space-time information fusion according to claim 1, characterized in that in step S3 the improved KCF tracking algorithm consists of position estimation and scale estimation; first, fast histogram of oriented gradients (FHOG) features of the floater target are extracted and used to train the KCF, and the resulting feature response map determines the position of the target; second, a pyramid sampling scale estimation strategy performs pyramid multi-scale sampling around the position of the floater, and a scale filter trained on the sampled images determines the optimal scale of the floater target.
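The pyramid multi-scale sampling of this claim can be sketched as below. The idea of sampling candidate sizes around the current target and picking the one with the strongest scale filter response follows the claim; the geometric scale factors, the defaults of 33 scales with a step of 1.02 (values often seen in DSST-style scale estimation), and the function names are assumptions rather than values stated in the original.

```python
def scale_pyramid(width, height, num_scales=33, step=1.02):
    """Candidate (factor, width, height) triples around the current size.

    Factors are step**n for n in [-(S-1)/2, ..., (S-1)/2], so the middle
    entry keeps the current size unchanged. num_scales should be odd.
    """
    half = (num_scales - 1) // 2
    sizes = []
    for n in range(-half, half + 1):
        factor = step ** n
        sizes.append((factor, width * factor, height * factor))
    return sizes


def pick_best_scale(responses, sizes):
    """Return the candidate whose scale filter response is largest.

    `responses` holds one score per candidate, assumed to come from the
    scale filter evaluated on patches sampled at each candidate size.
    """
    best = max(range(len(sizes)), key=lambda i: responses[i])
    return sizes[best]
```

In use, one patch per candidate size would be cropped around the position given by the position filter, scored by the scale filter, and the winning size adopted for the next frame.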