CN116721132B - Multi-target tracking method, system and equipment for industrially cultivated fishes - Google Patents


Publication number
CN116721132B
CN116721132B (application CN202310728939.9A)
Authority
CN
China
Prior art keywords
target
fish
tracking
track
result
Prior art date
Legal status
Active
Application number
CN202310728939.9A
Other languages
Chinese (zh)
Other versions
CN116721132A
Inventor
段青玲 (Duan Qingling)
刘怡然 (Liu Yiran)
李备备 (Li Beibei)
李道亮 (Li Daoliang)
周新辉 (Zhou Xinhui)
Current Assignee
China Agricultural University
Original Assignee
China Agricultural University
Priority date
Filing date
Publication date
Application filed by China Agricultural University
Priority claimed from CN202310728939.9A
Publication of CN116721132A
Application granted
Publication of CN116721132B
Legal status: Active

Classifications

    • G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06V10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/62: Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; pattern tracking
    • G06V10/757: Matching configurations of points or features
    • G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/776: Validation; performance evaluation
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/49: Segmenting video sequences
    • G06T2207/10016: Video; image sequence
    • G06T2207/30241: Trajectory
    • G06V2201/07: Target detection
    • Y02A40/81: Aquaculture, e.g. of fish

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a multi-target tracking method, system and device for industrially cultured fish, in the field of multi-target tracking based on computer vision. According to the invention, a joint fish multi-target tracking model is constructed whose target output comprises a detection target frame, a target displacement prediction result and a target appearance representation result. This output is used to linearly match detected targets to motion tracks; when no matching track can be obtained from the displacement prediction result, identity is recovered through the appearance representation result. This effectively solves the loss of tracking accuracy caused by foam occlusion, water-ripple interference and fish deformation, and allows identity to be recovered after long occlusions. In addition, because the invention obtains the displacement prediction result and the appearance representation result of the fish movement simultaneously from shared features, inference time is greatly reduced, enabling online multi-target tracking of fish in an industrial aquaculture environment.

Description

Multi-target tracking method, system and equipment for industrially cultivated fishes
Technical Field
The invention relates to the field of multi-target tracking based on computer vision, and in particular to a multi-target tracking method, system and device for industrially cultured fish.
Background
Multi-target tracking aims to obtain the trajectories of a set of targets; the behavior of each target can then be judged from its trajectory, so the technique is widely applied in video surveillance, autonomous driving and behavior research. Industrial aquaculture is the development trend of fish farming: the culture environment is monitored in the aquaculture workshop by intelligent monitoring technology, and abnormal states and behaviors of the fish are analyzed from video data, which is of great significance for healthy farming. The trajectory information produced by tracking fish can be used to mine fish behavior patterns, assess fish health and give early warning of water-quality anomalies, so tracking and identifying individual fish is a key step in analyzing fish behavior with computer vision.
In recent years, much research on multi-target tracking in video has focused on pedestrians and vehicles for autonomous driving, with some work tracking other targets in industrial production to assist safe production. Detection-based tracking is the current mainstream approach: after the targets in each frame are detected, the identity of each object must be recovered across frames according to its appearance features or motion pattern. Detection-based tracking methods can be divided into separate (two-stage) models and joint models, depending on whether the step that acquires appearance and motion features is separated from the detection step. Separate models tend to be more accurate, since each model is optimized for a single task; joint models share features and can reduce inference time.
In industrial aquaculture there has been research on tracking fish and obtaining their activity trajectories. Fish tracking differs from person tracking in several respects. First, fish look highly similar: differences in color, shape and texture between individuals of the same species are not obvious, so better representations must be trained to identify individual fish. Second, when the aerator is switched on, fish in the culture environment suffer foam occlusion and water-surface reflection, causing long occlusions during which the motion model is likely to fail. Third, the aerator also causes water ripples, so the appearance of the same fish can change drastically and the appearance model is likely to fail. Given these three characteristics, both an appearance model and a motion model must be established for fish tracking. However, most current fish-tracking models are separate models, which are likely to suffer reduced efficiency and extensibility.
Disclosure of Invention
The invention aims to provide a multi-target tracking method, system and device for industrially cultured fish, which solve the problems of low tracking accuracy and of failing to detect abnormal fish behavior and abnormal water quality in time.
In order to achieve the above object, the present invention provides the following solutions:
a multi-target tracking method for industrially cultured fishes comprises the following steps:
acquiring fish videos and constructing a multi-scale tracking joint model; the multi-scale tracking joint model is a fish multi-target tracking network trained by adopting a data set;
acquiring a fish video, and inputting the fish video into the multi-scale tracking joint model to obtain target output; the target output includes: detecting a target frame, a target displacement prediction result and a target appearance representation result;
performing offset correction on the detection result through a target displacement prediction result to obtain a correction result, and dividing a high-score target and a low-score target according to a set condition;
adopting a Jonker-Volgenant algorithm, and linearly distributing a target track in a target displacement prediction result and the Gao Fenmu target by taking a GIoU distance as a cost matrix;
in the process of linearly distributing the target track in the target displacement prediction result and the Gao Fenmu target, judging whether all target tracks are matched to obtain a first judgment result;
when the first judgment result is negative, carrying out linear distribution on the target track which is not obtained to be matched with the low-resolution target;
In the process of linearly distributing the target track which is not matched with the low-resolution target, judging whether all the target tracks are matched to obtain a second judging result;
when the second judging result is negative, taking the target track which does not obtain the matching as an inactive track, and discarding the low-score targets which fail to match;
linearly distributing the inactive track and the target appearance representation result, recovering the inactive track matched with the target appearance representation result into an active track, and generating a fish multi-target tracking track;
and when the first judging result or the second judging result is yes, the fish target detection frame is incorporated into the tracking track, and a fish multi-target tracking track is generated.
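The matching cascade in the steps above can be sketched as a small, self-contained toy. This is a hedged illustration, not the patent's implementation: a hypothetical `greedy_assign` stands in for the Jonker-Volgenant linear assignment, and scalar positions stand in for GIoU costs over boxes.

```python
def greedy_assign(tracks, dets, max_dist):
    """Greedy one-to-one assignment; a stand-in for the Jonker-Volgenant
    algorithm with a GIoU cost matrix used in the patent."""
    matches, free_dets, free_tracks = [], list(dets), []
    for tid, tpos in tracks:
        cand = min(free_dets, key=lambda d: abs(d[1] - tpos), default=None)
        if cand is not None and abs(cand[1] - tpos) <= max_dist:
            matches.append((tid, cand[0]))
            free_dets.remove(cand)
        else:
            free_tracks.append((tid, tpos))
    return matches, free_tracks, free_dets

def cascade(tracks, detections, hi=0.9, lo=0.4, max_dist=1.0):
    """Two-stage cascade: tracks vs high-score detections first, leftovers vs
    low-score detections; still-unmatched tracks become inactive and would go
    on to appearance-based re-identification. Detections are (id, pos, score)."""
    high = [(i, p) for i, p, s in detections if s >= hi]
    low = [(i, p) for i, p, s in detections if lo <= s < hi]
    m1, tracks, high = greedy_assign(tracks, high, max_dist)   # stage 1
    m2, inactive, _ = greedy_assign(tracks, low, max_dist)     # stage 2
    new_tracks = high   # unmatched high-score detections start new tracks
    return m1 + m2, inactive, new_tracks

print(cascade([(0, 1.0), (1, 5.0)],
              [(0, 1.2, 0.95), (1, 9.0, 0.95), (2, 5.3, 0.6)]))
# → ([(0, 0), (1, 2)], [], [(1, 9.0)])
```

In the toy run, track 0 matches high-score detection 0, track 1 falls through to low-score detection 2 in the second stage, and the far-away high-score detection 1 starts a new track.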
Optionally, the constructed multi-scale tracking joint model comprises: a dual-encoder structure and a dual-decoder structure;
the first encoder in the dual-encoder structure extracts multi-scale image features;
the second encoder in the dual-encoder structure is a spatio-temporal information encoder that encodes the historical positions of the fish in the video sequence into a query embedding, fusing the spatio-temporal information of the fish positions;
the dual-decoder structure comprises a tracking decoder for motion-trajectory prediction and a re-identification decoder for appearance modeling;
an iterative query embedding is arranged between the encoders and the decoders to transmit the fused fish position history.
Optionally, the extraction of multi-scale image features by the first encoder in the dual-encoder structure includes:
converting a feature map of any scale into a sequence with a block-cutting-and-embedding (patch embedding) module;
and inputting the sequence into a spatial-reduction attention layer, which performs a spatial-reduction operation and generates the multi-scale image features.
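The block-cutting-and-embedding step can be sketched in numpy as follows; this is a minimal illustration, with a random matrix standing in for the learned projection weight.

```python
import numpy as np

def patch_embed(feat, patch=4, dim=64, seed=0):
    """Cut an H x W x C feature map into non-overlapping patch blocks and
    linearly project each flattened block to a token embedding."""
    H, W, C = feat.shape
    assert H % patch == 0 and W % patch == 0
    blocks = (feat.reshape(H // patch, patch, W // patch, patch, C)
                  .transpose(0, 2, 1, 3, 4)
                  .reshape(-1, patch * patch * C))   # one row per block
    rng = np.random.default_rng(seed)
    W_proj = rng.standard_normal((patch * patch * C, dim)) * 0.02  # assumed learned weight
    return blocks @ W_proj   # sequence of (H/patch * W/patch) tokens

tokens = patch_embed(np.ones((32, 32, 3)))
print(tokens.shape)   # (64, 64)
```

A 32 x 32 x 3 map with 4 x 4 patches yields a sequence of 64 tokens, which is the form consumed by the attention layers.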
Optionally, the operation of the spatial-reduction attention layer is:

SRA(Q, K, V) = Concat(head_0, …, head_{N_i}) W^O
head_j = Attention(Q W_j^Q, SR(K) W_j^K, SR(V) W_j^V)
SR(X) = Norm(Reshape(X, R_i) W^S)

wherein SRA(·) is the spatial-reduction attention function; Q is the query, K is the key and V is the value; Concat(·) is the concatenation operation; W^O, W_j^Q, W_j^K, W_j^V and W^S are parameters of the linear mappings; C_i is the number of channels of the feature map and d_head the mapping dimension of each attention head; head_j is the j-th attention head, j ∈ [0, N_i], N_i being the number of heads of the spatial-reduction attention layer of the i-th stage; Attention(·) is the function that maps a query and a set of key-value pairs to an output; SR(·) is the spatial-reduction operation, X the input and R_i the reduction ratio of the attention layer of the i-th stage; Reshape(·) changes the shape of the feature map and Norm(·) is layer normalization.
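A single-head numpy sketch of the spatial-reduction idea: keys and values are compressed by a factor R² before the attention product. This is a hedged toy, not PVT itself: mean pooling stands in for the learned Reshape-plus-W^S projection, and the multi-head split and Norm are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sra(Q, K, V, R=2):
    """Single-head spatial-reduction attention: keys and values are spatially
    compressed by a factor R*R before the attention product, cutting the cost
    from O(N^2) to O(N^2 / R^2)."""
    N, C = K.shape
    # SR(.): merge groups of R*R neighboring tokens (mean pooling stands in
    # for the learned Reshape + W^S projection)
    K_r = K.reshape(N // (R * R), R * R, C).mean(axis=1)
    V_r = V.reshape(N // (R * R), R * R, C).mean(axis=1)
    attn = softmax(Q @ K_r.T / np.sqrt(C))   # (N, N / R^2) attention weights
    return attn @ V_r                        # (N, C) output tokens

X = np.random.default_rng(0).standard_normal((64, 32))
print(sra(X, X, X).shape)   # (64, 32)
```

With 64 tokens and R = 2, each query attends over only 16 reduced key-value pairs instead of 64, which is the memory saving the layer is designed for.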
Optionally, the process by which the second encoder in the dual-encoder structure encodes the historical fish positions in the video sequence into a query embedding includes:
performing bilinear-interpolation sampling on the 1/4 W x H-scale feature map of the previous frame of the fish video sequence at the positions of the target fish center points, generating sampled fish spatial-position features;
normalizing the sampled positions and applying a multi-head self-attention operation to generate attention weights;
weighting and summing the fish spatial-position feature maps by the attention weights, followed by normalization;
and inputting the normalized feature map into a feed-forward network and performing a residual addition, so that the fish position information in the fish video sequence is encoded into the query embedding.
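The bilinear-interpolation sampling step can be illustrated with a small numpy function; the feature map and center coordinates below are toy values, not data from the patent.

```python
import numpy as np

def bilinear_sample(feat, centers):
    """Sample feature vectors at sub-pixel target-center positions from an
    H x W x C feature map by bilinear interpolation."""
    H, W, _ = feat.shape
    out = []
    for x, y in centers:                       # (x, y) in feature-map pixels
        x0, y0 = int(np.floor(x)), int(np.floor(y))
        x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
        dx, dy = x - x0, y - y0
        v = ((1 - dx) * (1 - dy) * feat[y0, x0] + dx * (1 - dy) * feat[y0, x1]
             + (1 - dx) * dy * feat[y1, x0] + dx * dy * feat[y1, x1])
        out.append(v)
    return np.stack(out)                       # one feature vector per fish

feat = np.arange(16, dtype=float).reshape(4, 4, 1)   # toy 4x4 single-channel map
q = bilinear_sample(feat, [(1.5, 2.5)])
print(q)   # [[11.5]]
```

Sampling at (1.5, 2.5) averages the four surrounding cells (9, 10, 13, 14) to 11.5, giving a sub-pixel-accurate feature for the fish center.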
Optionally, the operation of the multi-head self-attention mechanism is:

MHA(Q, K, V) = Concat(head_0, …, head_N) W^O
head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)
Attention(Q, K, V) = Softmax(Q K^T / √d_head) V

wherein MHA(·) is the multi-head attention function; Q is the query, K is the key and V is the value; W^O, W_i^Q, W_i^K and W_i^V are parameters of the linear mappings, i being the index of the attention head; head_N is the last of the N attention heads, N being the number of heads; d_head is the mapping dimension of each attention head; Softmax(·) normalizes the attention scores into weights.
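A compact numpy sketch of the multi-head mechanism: the learned mappings W_i^Q, W_i^K, W_i^V and W^O are deliberately omitted so the head split and concatenation themselves are visible. A sketch only, not the model's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mha(Q, K, V, n_heads=4):
    """Multi-head attention: split the channel dimension into n_heads
    independent heads, apply scaled dot-product attention per head, then
    concatenate the head outputs."""
    N, C = Q.shape
    d = C // n_heads                           # d_head
    heads = []
    for h in range(n_heads):
        q, k, v = (M[:, h * d:(h + 1) * d] for M in (Q, K, V))
        heads.append(softmax(q @ k.T / np.sqrt(d)) @ v)   # Attention(q, k, v)
    return np.concatenate(heads, axis=-1)      # Concat(head_0, ..., head_N)

X = np.random.default_rng(1).standard_normal((10, 32))
print(mha(X, X, X).shape)   # (10, 32)
```

Each head attends over a 32 / 4 = 8-dimensional slice, so different heads can specialize in different aspects of the fish position history.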
According to the specific embodiments provided by the invention, the invention discloses the following technical effects. With the multi-target tracking method for industrially cultured fish provided by the invention, a joint fish multi-target tracking model is constructed whose target output comprises the detection target frame, the target displacement prediction result and the target appearance representation result. This output is used to linearly match detected targets to motion tracks; when no matching track can be obtained from the displacement prediction result, identity is recovered through the appearance representation result. This effectively solves the loss of tracking accuracy caused by foam occlusion, water-ripple interference and fish deformation, and identity can be recovered after long occlusions. In addition, because the displacement prediction result and the appearance representation result of the fish movement are obtained simultaneously from shared features, inference time is greatly reduced and online multi-target tracking of fish in an industrial aquaculture environment is achieved.
In addition, the invention also provides the following three implementation structures for the above multi-target tracking method for industrially cultured fish.
A multi-target tracking system for industrially cultured fish, applying the above multi-target tracking method; the system comprises:
a model construction module for acquiring fish videos and constructing the multi-scale tracking joint model; the multi-scale tracking joint model is a fish multi-target tracking network trained on a data set;
a target output module for inputting the fish video images into the multi-scale tracking joint model to obtain the target output; the target output includes: the detection target frame, the target displacement prediction result and the target appearance representation result;
an offset correction module for performing offset correction on the detection result using the target displacement prediction result to obtain a corrected result, and dividing the detections into high-score targets and low-score targets according to a set condition;
a linear assignment module for linearly assigning the target tracks in the target displacement prediction result to the high-score targets with the Jonker-Volgenant algorithm, using the GIoU distance as the cost matrix;
a first judging module for judging, during the linear assignment of the target tracks to the high-score targets, whether every target track has been matched, obtaining a first judgment result; and, when the first judgment result is negative, linearly assigning the target tracks that obtained no match to the low-score targets;
a second judging module for judging, during the linear assignment of the unmatched target tracks to the low-score targets, whether every target track has been matched, obtaining a second judgment result; and, when the second judgment result is negative, marking the still-unmatched target tracks as inactive tracks and discarding the low-score targets that failed to match;
an identity recovery module for linearly assigning the inactive tracks to the target appearance representation result, restoring the inactive tracks that obtain a match to active tracks, and generating the fish multi-target tracking trajectories;
and, when the first or second judgment result is positive, incorporating the fish target detection frames into the tracking trajectories and generating the fish multi-target tracking trajectories.
An electronic device comprises a memory and a processor; the memory stores a computer program, and the processor runs the computer program to cause the electronic device to execute the above multi-target tracking method for industrially cultured fish.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the above multi-target tracking method for industrially cultured fish.
The technical effects achieved by the three implementation structures provided by the invention are the same as those achieved by the method provided by the invention, and are therefore not repeated here.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings needed in the embodiments are briefly described below. The drawings in the following description are obviously only some embodiments of the present invention; a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a multi-target tracking method for industrially cultivated fish provided by the invention;
FIG. 2 is a diagram of the overall structure provided by the present invention;
FIG. 3 is a schematic diagram of the operation of an iterative query in the present invention;
FIG. 4 is a flowchart of another method for tracking multiple targets of industrially cultivated fish according to the present invention;
FIG. 5 is a diagram of the multi-target tracking effect provided by the present invention; FIG. 5 (a) shows the multi-target tracking effect at time t; FIG. 5 (b) at time t+10; FIG. 5 (c) at time t+20; FIG. 5 (d) at time t+30.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide a multi-target tracking method, system and device for industrially cultured fish that improve tracking accuracy and allow abnormal fish behavior and abnormal water quality to be found in time.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Example 1
As shown in fig. 1-2, the present invention provides a multi-target tracking method for industrially cultured fish, comprising:
step 100: and obtaining fish videos and constructing a multi-scale tracking joint model. The multi-scale tracking joint model is a fish multi-target tracking network trained by adopting a data set.
The multi-scale tracking joint model constructed by the method comprises a dual-encoder structure and a dual-decoder structure. The first encoder in the dual-encoder structure extracts multi-scale image features. The second encoder is a spatio-temporal information encoder that encodes the historical positions of the fish in the video sequence into a query embedding, fusing the spatio-temporal information of the fish positions. The dual-decoder structure includes a tracking decoder for motion-trajectory prediction and a re-identification decoder for appearance modeling. An iterative query embedding is arranged between the encoders and the decoders to transmit the fused fish position history.
In practice, the invention uses two sequential encoders, with PVT (Pyramid Vision Transformer) as the backbone network of the first, using a progressive shrinking strategy to generate the feature maps. A feature map of any scale is first converted into a sequence by a patch embedding; the sequence then passes through a spatial-reduction attention (SRA) layer, which performs a spatial-reduction operation before the attention operation, reducing computation and memory usage.
The second encoder (the spatio-temporal information encoder) encodes the historical positions of the fish in the video sequence into a query embedding through a self-attention mechanism over the position information. Bilinear-interpolation sampling is performed on the 1/4 W x H feature map of frame t-1 at the target center-point positions, followed by normalization and a self-attention operation. Finally, the attention weights are added to the input and normalized, fed into a feed-forward network, and a residual addition is performed again, fusing the historical position information.
In practical applications, the tracking decoder and the re-identification decoder have the same structure. The tracking decoder associates the multi-scale features of the current frame with the query embedding, yielding an association between the positions of the tracked targets in previous images and the features of the current image, which is used for the subsequent tracking prediction.
Both the tracking decoder and the re-identification decoder also contain a multi-head self-attention structure.
In the tracking decoder, the query comes from the historical target-position information in the spatio-temporal information encoder; the values and keys are the multi-scale features output by the PVT.
In the re-identification decoder, the query comes from samples of the 1/4 W x H feature map extracted by the PVT. After passing through a self-attention mechanism, the query generates an attention-weight matrix used to weight the feature map, and a residual is added (i.e. the self-attended query and the original query are summed, the generated attention weights having been applied to the feature map); a cross-attention operation with the keys and values then follows.
The cross-attention mechanism fuses the spatio-temporal information of the target positions with the current image features; after a residual operation, the resulting weight matrix is fed into a feed-forward network to obtain the final output. The cross-attention operation in both decoders is implemented with a multi-scale deformable-attention module.
Step 101: and inputting the fish video into the multi-scale tracking joint model to obtain target output. The target output includes: detecting a target frame, a target displacement prediction result and a target appearance representation result.
In practical application, inputting the target fish position in the characteristic diagram of the previous frame and the continuous two-frame characteristic diagram in the video to be detected into a fish multi-target tracking joint model to obtain the detection target frame with scores, the predicted target displacement and the appearance characteristics of the target.
Step 102: and carrying out offset correction on the detection result through the target displacement prediction result to obtain a correction result, and dividing the high-score target and the low-score target according to the set condition.
Step 103: and linearly distributing the target track and the high-resolution target in the target displacement prediction result by using a Jonker-Volgenant algorithm and using the GIoU distance as a cost matrix.
Step 104: and in the process of linearly distributing the target track in the target displacement prediction result and the high-resolution target, judging whether all the target tracks are matched to obtain a first judgment result.
Step 105: and when the first judgment result is negative, performing linear distribution on the target track which does not obtain the matching and the low-resolution target.
Step 106: and in the process of linearly distributing the target track which is not matched with the low-resolution target, judging whether all the target tracks are matched, and obtaining a second judging result.
Step 107: and when the second judging result is negative, taking the target track which does not obtain the matching as an inactive track, and discarding the low-score targets which fail to match.
Step 108: and linearly distributing the inactive track and the target appearance representation result, recovering the inactive track matched with the target appearance representation result into an active track, and generating the fish multi-target tracking track.
Step 109: and when the first judging result or the second judging result is yes, the fish target detection frame is incorporated into the tracking track, and a fish multi-target tracking track is generated.
Further, in the actual application process, steps 102 to 108 use the GIoU distance as the cost matrix after offset correction by the predicted target displacement, and linearly assign the target tracks to the detected high-score targets (detection threshold >= 0.9) with the Jonker-Volgenant algorithm. The tracks that obtain no match undergo a second linear matching with the low-score targets (0.4 <= detection threshold < 0.9); if matching still fails, the track is set inactive and the detection targets that failed to match are discarded. A new track is created for each detection target that obtained no match. For the inactive tracks, linear assignment is performed with the appearance features; if a match is obtained, the track is restored to active.
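The GIoU distance used as the cost matrix can be computed per track/detection pair as below; the full cost matrix would then be fed to a Jonker-Volgenant linear-assignment solver. This is a sketch of the standard GIoU formula, not the patent's exact implementation.

```python
def giou_dist(a, b):
    """GIoU distance (1 - GIoU) between two [x1, y1, x2, y2] boxes:
    0 for identical boxes, approaching 2 for distant ones."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # smallest enclosing box penalizes non-overlapping pairs, so the cost
    # stays informative even when IoU alone would be zero
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    c_area = (cx2 - cx1) * (cy2 - cy1)
    giou = inter / union - (c_area - union) / c_area
    return 1.0 - giou

print(giou_dist([0, 0, 2, 2], [0, 0, 2, 2]))   # 0.0
print(giou_dist([0, 0, 1, 1], [1, 0, 2, 1]))   # 1.0 (touching, non-overlapping)
```

Unlike plain 1 - IoU, this distance still distinguishes a nearly touching pair from a far-apart pair, which is why it is preferred as an assignment cost for fast-moving fish.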
During target tracking, the tracking threshold is set to 0.3, the matching threshold to 0.9, and the minimum IoU distance threshold to 0.1; a lost track that does not reappear within 60 frames is regarded as disappeared.
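As a sketch of this matching stage, the GIoU distance and the score-based split can be written in plain numpy as follows (box format [x1, y1, x2, y2]; the example scores are illustrative, and the Jonker-Volgenant assignment step itself is omitted):

```python
import numpy as np

def giou_distance(boxes_a, boxes_b):
    """Pairwise GIoU distance (1 - GIoU) between two sets of
    [x1, y1, x2, y2] boxes: 0 for identical boxes, approaching 2
    for distant, disjoint boxes."""
    dist = np.zeros((len(boxes_a), len(boxes_b)))
    for i, a in enumerate(boxes_a):
        for j, b in enumerate(boxes_b):
            # intersection and union areas
            iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
            ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
            inter = iw * ih
            area_a = (a[2] - a[0]) * (a[3] - a[1])
            area_b = (b[2] - b[0]) * (b[3] - b[1])
            union = area_a + area_b - inter
            # smallest enclosing (hull) box
            hull = ((max(a[2], b[2]) - min(a[0], b[0]))
                    * (max(a[3], b[3]) - min(a[1], b[1])))
            giou = inter / union - (hull - union) / hull
            dist[i, j] = 1.0 - giou
    return dist

# split detections into high-score and low-score pools (thresholds from the text)
scores = np.array([0.95, 0.60, 0.20])
high_mask = scores >= 0.9
low_mask = (scores >= 0.4) & (scores < 0.9)
cost = giou_distance([[0, 0, 10, 10]], [[0, 0, 10, 10], [20, 0, 30, 10]])
```

The cost matrix would then be passed to a linear-assignment solver for the two-stage track-detection matching described above.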
Further, to improve the prediction accuracy of the multi-scale tracking joint model, the initially constructed network structure needs to be trained and tested.
In the invention, a camera is installed above a culture pond of an aquaculture factory, and fish video data are collected from a top-down angle. During collection, the full range of fish movement is kept within the lens coverage as much as possible. The video is then segmented into short clips of 10-50 s for convenient annotation. Finally, the collected videos are annotated frame by frame in MOT format, and the annotated videos are divided into a training set and a test set.
Specifically, the camera is installed 1.5 m directly above the culture pond, where it can capture all fish individuals in the whole pond. The camera used to capture the video was a Hikvision 3T86FWDV2-I3S (8 megapixels, 4 mm focal length). The acquired video resolution is 1920 × 2560 at a frame rate of 20 fps.
In practical application, the annotated fish video frames are sequentially input to the Patch Embedding module of the PVT backbone network and encoded into image patches. The encoded patches are then fed into a spatial-reduction attention layer for the spatial-reduction operation, generating multi-scale image features.
In practical application, for the previous frame's feature map in a fish video sequence, bilinear interpolation sampling is performed at the target fish centre-point positions to generate sampled features. After normalization, a multi-head self-attention operation is applied to the sampled positions to generate attention weights. The attention weights are added to the previous frame's feature map and the result is normalized. The normalized feature map is input to a forward network and a residual addition is performed, fusing the fish position information of the previous frame's feature map with that of the current feature map. Repeating this fusion along the fish video sequence encodes the fish position information of the whole sequence into a query embedding.
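The bilinear interpolation sampling step described above can be sketched in numpy as follows (the (H, W, C) feature-map layout and the helper name are assumptions for illustration):

```python
import numpy as np

def bilinear_sample(feat, points):
    """Sample an (H, W, C) feature map at float (x, y) centre points
    with bilinear interpolation, as done on the previous frame's
    feature map at the target fish centre positions."""
    h, w, _ = feat.shape
    out = []
    for x, y in points:
        x0, y0 = int(np.floor(x)), int(np.floor(y))
        x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
        dx, dy = x - x0, y - y0
        # weighted combination of the four surrounding grid cells
        val = (feat[y0, x0] * (1 - dx) * (1 - dy)
               + feat[y0, x1] * dx * (1 - dy)
               + feat[y1, x0] * (1 - dx) * dy
               + feat[y1, x1] * dx * dy)
        out.append(val)
    return np.stack(out)

# ramp feature map: channel value equals the x coordinate
feat = np.zeros((4, 4, 1))
feat[:, :, 0] = np.arange(4)[None, :]
sampled = bilinear_sample(feat, [(1.5, 2.0)])
```

Sampling the ramp at x = 1.5 interpolates halfway between columns 1 and 2, illustrating how sub-pixel centre points are handled.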
In the model training part, annotated fish video frames are first input, specifically the frame-t image, the frame t-1 image and their corresponding labels; the working principle is shown in figure 3.
During model training, the backbone network PVT uses a progressive shrinking strategy to generate feature maps; producing feature maps at 4 different scales requires 4 stages. At each scale, a Patch Embedding step first converts the picture (video frame) into a sequence, which then passes through a spatial-reduction attention layer. The spatial-reduction attention layer can be expressed by the following formulas.
SRA(Q, K, V) = Concat(head_0, …, head_{N_i}) W^O (1)
head_j = Attention(Q W_j^Q, SR(K) W_j^K, SR(V) W_j^V) (2)
SR(X) = Norm(Reshape(X, R_i) W^S) (3)
Wherein SRA(·) represents the spatial-reduction attention function; Q represents the query, K the key and V the value; Concat(·) is a concatenation operation; W^O, W_j^Q, W_j^K, W_j^V and W^S are parameters of the linear mapping; C_i is the number of channels of the feature map; d_head is the mapping dimension of an attention head; head_j is an attention head, j ∈ [0, N_i], where N_i is the number of attention heads in the attention layer of the i-th stage; SR(·) is the spatial-reduction operation; X is the input variable; R_i represents the reduction ratio of the attention layer of the i-th stage; Reshape(·) changes the feature-map shape; and Attention(·) is a function that maps a query and a set of key-value pairs to an output, where the query, keys, values and output are all vectors.
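The effect of the SR(·) operation can be sketched in plain numpy (single attention head, the Norm step omitted; the 1-D token grouping, shapes and weight names are illustrative assumptions, whereas the actual layer reduces r × r spatial patches of the 2-D map):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sra(x, w_q, w_k, w_v, w_s, r):
    """Single-head spatial-reduction attention: K and V are computed
    from a spatially reduced sequence of length n / r**2, so attention
    cost drops while Q keeps full resolution."""
    n, c = x.shape
    # SR(X): group r*r consecutive tokens and project back to C channels
    x_r = x.reshape(n // (r * r), r * r * c) @ w_s   # (n / r**2, C)
    q = x @ w_q
    k = x_r @ w_k
    v = x_r @ w_v
    attn = softmax(q @ k.T / np.sqrt(c))
    return attn @ v

rng = np.random.default_rng(0)
n, c, r = 16, 4, 2
x = rng.standard_normal((n, c))
w_q = rng.standard_normal((c, c))
w_k = rng.standard_normal((c, c))
w_v = rng.standard_normal((c, c))
w_s = rng.standard_normal((r * r * c, c))
out = sra(x, w_q, w_k, w_v, w_s, r)
```

Here the key/value sequence shrinks from 16 tokens to 4 while the output keeps the full 16-token resolution, which is what makes the attention affordable on high-resolution feature maps.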
In the model training process, the spatio-temporal information encoder is used to fuse the spatio-temporal information of fish positions: bilinear interpolation sampling is performed on the W/4 × H/4 feature map of frame t-1 at the target centre-point positions, followed by normalization and a multi-head self-attention operation. Finally, the attention weights are added to the input and normalized, fed into a forward network, and a residual addition is performed again, i.e., the historical position information is fused. The multi-head attention mechanism (MHA) can be expressed by the following formulas:
MHA(Q, K, V) = Concat(head_0, …, head_N) W^O (4)
head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V) (5)
Attention(Q, K, V) = Softmax(Q K^T / √d_head) V (6)
Wherein MHA(·) is the multi-head attention mechanism function; Concat(·) is a concatenation operation; head_i is an attention head; W^O, W_i^Q, W_i^K and W_i^V are parameters of the linear mapping; N is the number of attention heads; d_head is the mapping dimension of an attention head; and Softmax(·) is the normalized exponential function.
The output of the spatio-temporal information encoder is the iterative query (ITQ), which can be expressed by equations (7) and (8).
Output_encoder2 = Concat(Q_{t-1}, ITQ) (7)
FFN(x) = Norm(ReLU(x W_1 + b_1) W_2 + b_2) (8)
Wherein Output_encoder2 represents the output of the spatio-temporal encoder; Concat(·) is a concatenation operation; Q_{t-1} is the query at time t-1; ITQ represents the iterative query; FFN(·) represents the forward-propagation network layer; Norm(·) is intra-layer normalization; ReLU(·) is the activation function; W_1 and W_2 are weights; and b_1 and b_2 are biases.
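Equations (7) and (8) can be sketched numerically as follows (dimensions and weight initialization are illustrative; Norm is taken to be layer normalization):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # normalize each row to zero mean and unit variance
    mu = x.mean(axis=-1, keepdims=True)
    sd = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sd + eps)

def ffn(x, w1, b1, w2, b2):
    # Norm(ReLU(x W1 + b1) W2 + b2), as in equation (8)
    return layer_norm(np.maximum(0.0, x @ w1 + b1) @ w2 + b2)

rng = np.random.default_rng(1)
q_prev = rng.standard_normal((5, 8))   # query at time t-1
itq = rng.standard_normal((5, 8))      # iterative query
out = np.concatenate([q_prev, itq], axis=-1)      # equation (7)
y = ffn(out, rng.standard_normal((16, 16)), np.zeros(16),
        rng.standard_normal((16, 16)), np.zeros(16))
```

The concatenated encoder output doubles the channel dimension, and the forward network leaves each row normalized, ready for the residual addition described above.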
In the model training process, there are 3 tasks: a detection task, a tracking task and a re-identification task. As shown in fig. 4, each task is trained with its own loss function, and the loss functions are combined and minimized jointly by gradient descent to obtain the optimal solution for each task objective. The combination of the task loss functions is shown in equation (9).
L_combine = w_det × L_det + w_track × L_track + w_reid × L_reid (9)
Wherein L_combine is the loss function of the combined tasks; L_det, L_track and L_reid are the loss functions of the detection, tracking and re-identification tasks, respectively; and w_det, w_track and w_reid are their respective weights.
The detection task actually consists of 3 regression tasks: predicting the target centre-point position, predicting the width and height of the target frame, and predicting the target centre-point offset. The tracking task is also a regression task, which uses a sparse regression loss (Sparse Regression Loss, SRL). The loss is computed only where a target centre point is present.
SRL = (1/K) Σ_{(x,y): M_xy = 1} |T_xy − T̂_xy| (10)
M_xy = G((x, y), (x_k, y_k), σ) (11)
Wherein SRL is the sparse regression loss function; T_xy is the ground-truth position and T̂_xy the predicted position; K is the number of target-frame centres; M_xy is the actual response heat map at the target-frame centre position (x, y); G(·) is a Gaussian kernel function; (x_k, y_k) represents the ground-truth annotation of the k-th centre; and σ is the kernel size.
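A hedged numerical sketch of the response heat map and the sparse regression loss. The exact functional forms are assumptions reconstructed from the surrounding description: a Gaussian response around each annotated centre, and an L1 error averaged over the K annotated centre points only:

```python
import numpy as np

def gaussian_heatmap(shape, centers, sigma):
    """Response map M: Gaussian kernel around each annotated
    centre (x_k, y_k); peaks equal 1 at the centres."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    m = np.zeros(shape)
    for xk, yk in centers:
        g = np.exp(-((xs - xk) ** 2 + (ys - yk) ** 2) / (2 * sigma ** 2))
        m = np.maximum(m, g)
    return m

def sparse_regression_loss(pred, target, centers):
    """SRL evaluated only at the K annotated centre points
    (assumed L1 form)."""
    k = len(centers)
    return sum(abs(pred[yk, xk] - target[yk, xk]) for xk, yk in centers) / k

m = gaussian_heatmap((16, 16), [(4, 4)], 2.0)
pred = np.zeros((16, 16))
target = np.zeros((16, 16))
pred[4, 4] = 2.0
loss = sparse_regression_loss(pred, target, [(4, 4)])
```

Because the loss touches only the centre cells, background regions contribute nothing, matching the statement that the loss is computed only where a target centre point is present.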
The re-identification task employs a triplet loss function, which makes the distance between features of the same identity as small as possible and the distance between features of different identities as large as possible, as shown in equation (12).
L_reid = Σ_i max(0, ‖r_i − p_i‖ − ‖r_i − n_i‖ + margin) (12)
Wherein L_reid is the re-identification loss function; r_i represents the appearance feature representation of object i; p_i is a positive sample with the same identity as r_i; n_i is a negative sample with a different identity from r_i; and margin is a given constant.
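A minimal sketch of the triplet loss for a single anchor (Euclidean distance and the margin value are illustrative assumptions):

```python
import numpy as np

def triplet_loss(r, p, n, margin=0.3):
    """Triplet loss: pull same-identity embeddings (r, p) together
    and push different-identity embeddings (r, n) apart by at
    least `margin`."""
    d_pos = np.linalg.norm(r - p)   # distance to the positive sample
    d_neg = np.linalg.norm(r - n)   # distance to the negative sample
    return max(0.0, d_pos - d_neg + margin)

loss = triplet_loss(np.array([0.0, 0.0]),
                    np.array([1.0, 0.0]),
                    np.array([1.1, 0.0]))
```

When the negative is already farther than the positive by more than the margin, the hinge clips the loss to zero and the triplet contributes no gradient.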
In the model training process, the model is first initialized with PVT parameters pre-trained on the COCO data set. The input picture size is set to 960 × 1280, the batch size to 1, the learning rate to 2 × 10^-5, dropout to 0.1 and weight decay to 10^-4. A total of 60 epochs were trained, with weight decay starting at the 50th epoch.
After training, the fish multi-target tracking joint model is obtained. During tracking, the trained joint model yields, for each detected target in the video, its category (since there may be multiple fish species), confidence, detection-frame centre position and size, target displacement prediction and target appearance features, as shown in fig. 5.
To demonstrate the effect of the present invention, related experiments were performed. The spatio-temporal information encoder (STE) and the iterative query (IQ) mechanism cooperate to realize spatio-temporal fusion of fish positions, improving tracking accuracy. The Re-ID branch obtains a good-quality representation, improving identification accuracy, as shown in table 1. For indices marked with an upward arrow, larger is better; for those marked with a downward arrow, smaller is better.
Table 1 ablation experiment results table
As can be seen from table 1, when the spatio-temporal information encoder, the iterative query and the decoupled re-identification decoder presented herein are all present, MOTA, MT and IDF1 reach 94.8%, 93.3% and 82.5%, respectively. IDS, FM and MOTP decrease to 98, 257 and 0.147, respectively, indicating that the number of ID switches during tracking, the number of track fragmentations and the tracking position error are all reduced compared with omitting any module.
The present invention was compared with 10 other multi-target tracking methods, as shown in table 2.
TABLE 2 fish tracking comparative experiment results table
The invention is a complete encoder-decoder structure, with an iterative query embedding arranged between the encoder and the decoder for transmitting the fused historical fish position information.
In the sequential double-encoder structure, the first encoder adopts PVT as the backbone network to extract multi-scale image features; the second encoder, the spatio-temporal information encoder, encodes position information: it encodes the historical information of fish positions in the video sequence into a query embedding to fuse the spatio-temporal information of fish positions.
A parallel double-decoder structure is adopted, namely a tracking decoder for motion-trajectory prediction and a re-identification decoder for appearance modelling. During tracking, linear assignment is first performed using motion features to reduce the interference of occlusion and deformation on tracking. When a matching track cannot be obtained through the motion model, the identity is recovered through the appearance model, which copes with long-term occlusion.
The method can realize multi-target online tracking of fish, acquire complete fish motion trajectories, support quantitative analysis of fish motion laws and provide a basis for analysing abnormal fish behaviour. It can not only discover abnormal fish behaviour and abnormal culture water quality in time, reducing potential economic loss, but is also of great significance for fish behavioural research. The specific effects are as follows:
1. The proposed fish multi-target tracking method obtains fish target positions and appearance features simultaneously, end to end, through joint training, saving inference time and realizing multi-target online tracking of fish in an industrial aquaculture environment.
2. The method includes an encoding module for position information, in which the historical information of fish positions in a video sequence is encoded into a query embedding, and the query embedding fuses the spatio-temporal information of fish positions iteratively. Through this encoding module and the iterative query mode, historical information is selected and discarded automatically, avoiding subjective spatio-temporal feature selection.
3. By fusing the spatio-temporal information of fish positions, the method can accurately track the trajectory of the target fish, avoiding failure of the appearance model caused by ripple interference and fish deformation.
4. The parallel double-decoder structure decodes the tracking prediction and the identity representation separately, avoiding mutual interference between them and yielding a better appearance model.
5. During tracking, linear assignment is first performed using motion features to reduce the interference of occlusion and deformation on tracking. When a matching track cannot be obtained through the motion model, the identity is recovered through the appearance model, coping with long-term occlusion.
Example two
To execute the method corresponding to embodiment one and realize the corresponding functions and technical effects, a multi-target tracking system for industrially cultured fish is provided below. The system comprises:
the model construction module is used for acquiring fish videos and constructing a multi-scale tracking joint model. The multi-scale tracking joint model is a fish multi-target tracking network trained by adopting a data set.
And the target output module is used for inputting the fish video image into the multi-scale tracking joint model to obtain target output. The target output includes: a detection target frame, a target displacement prediction result and a target appearance representation result.
And the offset correction module is used for carrying out offset correction on the detection result through the target displacement prediction result to obtain a correction result, and acquiring a high-score target and a low-score target according to the set condition.
And the linear assignment module is used for linearly assigning the target tracks in the target displacement prediction result to the high-score targets using the Jonker-Volgenant algorithm, with the GIoU distance as the cost matrix.
And the first judging module is used for judging, in the process of linearly assigning the target tracks in the target displacement prediction result to the high-score targets, whether all target tracks are matched, to obtain a first judgment result. When the first judgment result is negative, the target tracks that obtained no match are linearly assigned to the low-score targets.
And the second judging module is used for judging, in the process of linearly assigning the unmatched target tracks to the low-score targets, whether all target tracks are matched, to obtain a second judgment result. When the second judgment result is negative, the target tracks that obtained no match are set as inactive tracks, and the low-score targets that failed to match are discarded.
And the identity recovery module is used for linearly distributing the inactive track and the target appearance representation result, recovering the inactive track matched with the target appearance representation result into an active track, and generating a fish multi-target tracking track.
When the first judgment result or the second judgment result is affirmative, the fish target detection frames are incorporated into the tracking tracks, and the fish multi-target tracking track is generated.
According to the method, fish tracking videos are first collected from a top-down angle, and a fish multi-target tracking data set is constructed. Second, a fish multi-target tracking model based on spatio-temporal information fusion is constructed. Finally, multi-target online tracking of the fish is performed based on the model, and the fish motion trajectories are drawn.
The fish multi-target tracking model based on space-time information fusion has the following 3 characteristics:
(1) The model is a Transformer-based end-to-end joint framework with 3 branches, namely target detection, trajectory prediction and target re-identification, and can model fish motion and appearance simultaneously, realizing online multi-target tracking of fish.
(2) The method comprises the steps of encoding historical information of fish positions in a video sequence into a query embedding by an encoder module aiming at the position information, wherein the query embedding adopts an iterative mode to fuse space-time information of the fish positions for predicting the fish positions.
(3) The parallel double decoder structure is adopted to decode the fish motion information and the appearance information respectively, so that mutual interference of the fish motion information and the appearance information is avoided, the motion characteristics are firstly utilized for linear distribution during tracking, and when the matched track cannot be acquired through the motion model, the identity is recovered through the appearance model.
The method can solve the problem of reduced tracking accuracy caused by foam occlusion, water-ripple interference and fish deformation, and can recover identities after long-term occlusion, acquiring relatively complete fish motion trajectories. This supports quantitative analysis of fish motion laws and provides a basis for analysing abnormal fish behaviour. It can not only discover abnormal fish behaviour and abnormal aquaculture water quality in time, reducing potential economic loss, but is also of great significance for fish behavioural research.
Example III
The embodiment of the invention provides electronic equipment which comprises a memory and a processor, wherein the memory is used for storing a computer program, and the processor runs the computer program to enable the electronic equipment to execute the industrial aquaculture fish multi-target tracking method provided in the embodiment one.
In practical applications, the electronic device may be a server.
In practical applications, the electronic device includes: at least one processor (processor), memory (memory), bus, and communication interface (Communications Interface).
Wherein: the processor, communication interface, and memory communicate with each other via a communication bus.
And the communication interface is used for communicating with other devices.
And a processor, configured to execute a program, and specifically may execute the method described in the foregoing embodiment.
In particular, the program may include program code including computer-operating instructions.
The processor may be a central processing unit, CPU, or specific integrated circuit ASIC (Application Specific Integrated Circuit), or one or more integrated circuits configured to implement embodiments of the present application. The one or more processors comprised by the electronic device may be the same type of processor, such as one or more CPUs. But may also be different types of processors such as one or more CPUs and one or more ASICs.
And the memory is used for storing programs. The memory may comprise high-speed RAM memory or may further comprise non-volatile memory, such as at least one disk memory.
Based on the description of the above embodiments, an embodiment of the present application provides a computer-readable storage medium having stored thereon computer program instructions executable by a processor to implement the method of any of the above embodiments.
The industrial cultured fish multi-target tracking system provided by the embodiment of the application exists in various forms, including but not limited to:
(1) A mobile communication device: such devices are characterized by mobile communication capabilities and are primarily aimed at providing voice, data communications. Such terminals include: smart phones (e.g., iPhone), multimedia phones, functional phones, and low-end phones, etc.
(2) Ultra mobile personal computer device: such devices are in the category of personal computers, having computing and processing functions, and generally having mobile internet access capabilities. Such terminals include: PDA, MID, and UMPC devices, etc., such as iPad.
(3) Portable entertainment device: such devices may display and play multimedia content. The device comprises: audio, video players (e.g., iPod), palm game consoles, electronic books, and smart toys and portable car navigation devices.
(4) Other electronic devices with data interaction functions.
Thus, particular embodiments of the present subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may be advantageous.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in the same piece or pieces of software and/or hardware when implementing the present application. It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular transactions or implement particular abstract data types. The application may also be practiced in distributed computing environments where transactions are performed by remote processing devices that are connected through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the system disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to assist in understanding the methods of the present invention and the core ideas thereof; also, it is within the scope of the present invention to be modified by those of ordinary skill in the art in light of the present teachings. In view of the foregoing, this description should not be construed as limiting the invention.

Claims (9)

1. The multi-target tracking method for the industrially cultured fishes is characterized by comprising the following steps of:
acquiring fish videos and constructing a multi-scale tracking joint model; the multi-scale tracking joint model is a fish multi-target tracking network trained by adopting a data set;
acquiring a fish video, and inputting the fish video into the multi-scale tracking joint model to obtain target output; the target output includes: detecting a target frame, a target displacement prediction result and a target appearance representation result;
performing offset correction on the detection result through the target displacement prediction result to obtain a correction result, and dividing high-score targets and low-score targets according to a set condition;
adopting a Jonker-Volgenant algorithm, and linearly assigning target tracks in the target displacement prediction result to the high-score targets with a GIoU distance as a cost matrix;
in the process of linearly assigning the target tracks in the target displacement prediction result to the high-score targets, judging whether all target tracks are matched, to obtain a first judgment result;
when the first judgment result is negative, linearly assigning the target tracks that obtained no match to the low-score targets;
in the process of linearly assigning the unmatched target tracks to the low-score targets, judging whether all target tracks are matched, to obtain a second judgment result;
when the second judgment result is negative, setting the target tracks that obtained no match as inactive tracks, and discarding the low-score targets that failed to match;
linearly assigning the inactive tracks to the target appearance representation result, restoring any inactive track matched with the target appearance representation result to an active track, and generating a fish multi-target tracking track;
and when the first judgment result or the second judgment result is affirmative, incorporating the fish target detection frames into the tracking tracks, and generating the fish multi-target tracking track.
2. The method for multi-objective tracking of industrially cultivated fish according to claim 1, wherein the multi-scale tracking joint model constructed by the method comprises: a dual encoder structure and a dual decoder structure;
the first encoder in the dual encoder structure is used for extracting multi-scale image features;
the second encoder in the double encoder structure is a space-time information encoder and is used for encoding historical information of fish positions in a video sequence into a query embedding and fusing the space-time information of the fish positions;
the double decoder structure comprises a tracking decoder for motion trail prediction and a re-identification decoder for appearance modeling;
an iterative query embedding is arranged between the encoder and the decoder for transmitting the fused historical fish position information.
3. The method of claim 2, wherein the process of extracting multi-scale image features by the first encoder in the dual encoder structure comprises:
converting the feature map of any scale into a sequence by adopting a patch embedding module;
and inputting the sequence into a spatial-reduction attention layer to perform the spatial-reduction operation, generating multi-scale image features.
4. A method of multi-objective tracking of industrially cultivated fish according to claim 3, wherein the spatial reduction attention layer operates as:
SRA(Q, K, V) = Concat(head_0, …, head_{N_i}) W^O
head_j = Attention(Q W_j^Q, SR(K) W_j^K, SR(V) W_j^V)
Attention(Q, K, V) = Softmax(Q K^T / √d_head) V
SR(X) = Norm(Reshape(X, R_i) W^S)
wherein SRA(·) is the spatial reduction attention function; Q is the query, K is the key, and V is the value; Concat(·) is a splicing operation; W^O, W_j^Q, W_j^K, W_j^V and W^S are parameters of linear mappings; C_i is the number of channels of the feature map; d_head is the mapping dimension of an attention head; head_j is the j-th attention head, j ∈ [0, N_i], where N_i is the number of attention heads of the spatial reduction attention layer at the i-th stage; Attention(·) is the function mapping the query and a set of key-value pairs to an output; SR(·) is the spatial reduction operation, X is the input variable, and R_i is the compression ratio of the spatial reduction attention layer at the i-th stage; Reshape(·) changes the feature map shape.
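A minimal NumPy sketch of spatial reduction attention of the kind claim 4 describes (the PVT-style mechanism that the translated term "spatial thumbnail attention" appears to denote): keys and values are computed on a sequence shrunk by merging each R×R patch of tokens. All weight shapes, the LayerNorm placement, and the token layout are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def spatial_reduction(x, H, W, R, Ws):
    """SR(X) = Norm(Reshape(X, R) W_S): merge each RxR patch of tokens into one token."""
    N, C = x.shape                                   # N = H*W tokens
    x = x.reshape(H // R, R, W // R, R, C).transpose(0, 2, 1, 3, 4)
    x = x.reshape((H // R) * (W // R), R * R * C)    # Reshape(X, R)
    return layer_norm(x @ Ws)                        # Ws: (R*R*C, C)

def sra(x, H, W, R, n_heads, Wq, Wk, Wv, Ws, Wo):
    """Spatial reduction attention: K and V come from the reduced sequence."""
    C = x.shape[1]
    d = C // n_heads                                 # d_head
    kv = spatial_reduction(x, H, W, R, Ws)
    # project, then split into heads: (heads, tokens, d_head)
    Q = (x @ Wq).reshape(-1, n_heads, d).transpose(1, 0, 2)
    K = (kv @ Wk).reshape(-1, n_heads, d).transpose(1, 0, 2)
    V = (kv @ Wv).reshape(-1, n_heads, d).transpose(1, 0, 2)
    att = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d))   # Softmax(QK^T / sqrt(d_head))
    out = (att @ V).transpose(1, 0, 2).reshape(-1, C)      # Concat(head_0, ..., head_N)
    return out @ Wo
```

With R = 2 a 4×4 token grid is reduced to 2×2 before attention, cutting the attention cost by a factor of R² — the point of the spatial reduction operation.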
5. The method of claim 2, wherein the encoding of historical information of fish locations in the video sequence into a query embedding by the second encoder in the dual encoder structure comprises:
performing bilinear interpolation sampling on the 1/4 (W×H)-scale feature map of the previous frame in the fish video sequence, according to the position of the target fish center point, to generate a sampled fish spatial position feature;
normalizing the sampled positions and then performing a multi-head self-attention mechanism operation to generate attention weights;
weighting and summing the fish spatial position feature maps according to the attention weights, and performing standardization processing;
and inputting the normalized feature map into a forward network and performing a residual addition operation, so that the fish position information in the fish video sequence is encoded into the query embedding.
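The bilinear interpolation sampling step of claim 5 — reading a feature vector at a sub-pixel target center point — can be sketched as below. The `(C, H, W)` memory layout and the border clamping are assumptions for illustration, not details given in the claim.

```python
import numpy as np

def bilinear_sample(fmap, cx, cy):
    """Sample a C-channel feature map of shape (C, H, W) at sub-pixel point (cx, cy)."""
    C, H, W = fmap.shape
    x0, y0 = int(np.floor(cx)), int(np.floor(cy))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)  # clamp at the border
    ax, ay = cx - x0, cy - y0                        # fractional offsets
    top = (1 - ax) * fmap[:, y0, x0] + ax * fmap[:, y0, x1]
    bot = (1 - ax) * fmap[:, y1, x0] + ax * fmap[:, y1, x1]
    return (1 - ay) * top + ay * bot                 # (C,) feature vector
```

Sampling each tracked fish's center point this way yields the per-target spatial position features that the spatio-temporal encoder then attends over.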
6. The method of claim 5, wherein the multi-head self-attention mechanism operates as:
MHA(Q, K, V) = Concat(head_0, …, head_N) W^O
head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)
Attention(Q, K, V) = Softmax(Q K^T / √d_head) V
wherein MHA(·) is the multi-head attention function; Q is the query, K is the key, and V is the value; W^O, W_i^Q, W_i^K and W_i^V are parameters of linear mappings, and i is the attention-head index; head_N is the N-th attention head, and N is the number of attention heads; d_head is the mapping dimension of an attention head; Softmax(·) is the normalized exponential function.
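A self-contained NumPy sketch of the multi-head self-attention formula in claim 6. The weight shapes and the head-splitting layout are illustrative assumptions; only the formula itself comes from the claim.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def mha(Q, K, V, Wq, Wk, Wv, Wo, n_heads):
    """MHA(Q, K, V) = Concat(head_0, ..., head_N) W_O, with per-head projections."""
    d_model = Q.shape[1]
    d_head = d_model // n_heads

    def split(X, W):
        # project, then split into heads: (n_heads, tokens, d_head)
        return (X @ W).reshape(X.shape[0], n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(Q, Wq), split(K, Wk), split(V, Wv)
    att = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))  # Softmax(QK^T / sqrt(d_head))
    heads = (att @ v).transpose(1, 0, 2).reshape(Q.shape[0], d_model)  # Concat
    return heads @ Wo
```

For self-attention over the sampled fish position features, the same sequence is passed as Q, K and V.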
7. A multi-target tracking system for industrially cultured fish, characterized by being applied to the multi-target tracking method for industrially cultured fish according to any one of claims 1 to 6; the system comprises:
The model construction module is used for acquiring fish videos and constructing a multi-scale tracking joint model; the multi-scale tracking joint model is a fish multi-target tracking network trained by adopting a data set;
the target output module is used for inputting the fish video image into the multi-scale tracking joint model to obtain target output; the target output includes: a target detection frame, a target displacement prediction result and a target appearance representation result;
the offset correction module is used for carrying out offset correction on the detection result through the target displacement prediction result to obtain a correction result, and acquiring a high-score target and a low-score target according to a set condition;
the linear distribution module is used for linearly distributing the target tracks in the target displacement prediction result and the high-score targets by adopting a Jonker-Volgenant algorithm with the GIoU distance as the cost matrix;
the first judging module is used for judging whether all target tracks are matched in the process of linearly distributing the target tracks in the target displacement prediction result and the high-score targets, so as to obtain a first judging result; when the first judging result is negative, linearly distributing the target tracks which have not obtained a match with the low-score targets;
the second judging module is used for judging whether all the target tracks are matched in the process of linearly distributing the unmatched target tracks with the low-score targets, so as to obtain a second judging result; when the second judging result is negative, taking the target tracks which have not obtained a match as inactive tracks, and discarding the low-score targets which fail to match;
the identity recovery module is used for carrying out linear distribution on the inactive track and the target appearance representation result, recovering the inactive track matched with the target appearance representation result into an active track, and generating a fish multi-target tracking track;
and when the first judging result or the second judging result is yes, the fish target detection frame is included in the tracking track, and a fish multi-target tracking track is generated.
8. An electronic device comprising a memory and a processor, the memory configured to store a computer program, the processor configured to execute the computer program to cause the electronic device to perform the multi-objective tracking method of industrially farmed fish according to any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the multi-objective tracking method of industrially farmed fish according to any one of claims 1 to 6.
CN202310728939.9A 2023-06-20 2023-06-20 Multi-target tracking method, system and equipment for industrially cultivated fishes Active CN116721132B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310728939.9A CN116721132B (en) 2023-06-20 2023-06-20 Multi-target tracking method, system and equipment for industrially cultivated fishes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310728939.9A CN116721132B (en) 2023-06-20 2023-06-20 Multi-target tracking method, system and equipment for industrially cultivated fishes

Publications (2)

Publication Number Publication Date
CN116721132A CN116721132A (en) 2023-09-08
CN116721132B true CN116721132B (en) 2023-11-24

Family

ID=87873100

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310728939.9A Active CN116721132B (en) 2023-06-20 2023-06-20 Multi-target tracking method, system and equipment for industrially cultivated fishes

Country Status (1)

Country Link
CN (1) CN116721132B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111784746A (en) * 2020-08-10 2020-10-16 上海高重信息科技有限公司 Multi-target pedestrian tracking method and device under fisheye lens and computer system
CN112598713A (en) * 2021-03-03 2021-04-02 浙江大学 Offshore submarine fish detection and tracking statistical method based on deep learning
CN113592896A (en) * 2020-04-30 2021-11-02 中国农业大学 Fish feeding method, system, equipment and storage medium based on image processing
CN113706579A (en) * 2021-08-09 2021-11-26 华北理工大学 Prawn multi-target tracking system and method based on industrial culture
CN114202563A (en) * 2021-12-15 2022-03-18 中国农业大学 Fish multi-target tracking method based on balance joint network
CN114359341A (en) * 2021-12-29 2022-04-15 湖南国科微电子股份有限公司 Multi-target tracking method and device, terminal equipment and readable storage medium
CN114882344A (en) * 2022-05-23 2022-08-09 海南大学 Small-sample underwater fish body tracking method based on semi-supervision and attention mechanism
WO2022217840A1 (en) * 2021-04-15 2022-10-20 南京莱斯电子设备有限公司 Method for high-precision multi-target tracking against complex background
CN115937251A (en) * 2022-11-03 2023-04-07 中国农业大学 Multi-target tracking method for shrimps

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Ren-Jie Huang; Yi-Chung Lai. "Applying convolutional networks to underwater tracking without training". 2018 IEEE International Conference on Applied System Invention (ICASI). 2018, full text. *
Research on a fish-school motion monitoring method based on image processing; Yuan Yongming; Shi; South China Fisheries Science (Issue 05); full text *
Chen Haitao; Ma Jun; Li Feng; Lu Ming; Lu Xiaotian. "Multi-target tracking technology for video satellites". Chinese Space Science and Technology. 2022, full text. *

Also Published As

Publication number Publication date
CN116721132A (en) 2023-09-08

Similar Documents

Publication Publication Date Title
Fan et al. Point 4d transformer networks for spatio-temporal modeling in point cloud videos
Li et al. Deep neural network for structural prediction and lane detection in traffic scene
CN109711463B (en) Attention-based important object detection method
CN113792113A (en) Visual language model obtaining and task processing method, device, equipment and medium
CN110555387B (en) Behavior identification method based on space-time volume of local joint point track in skeleton sequence
CN111523378B (en) Human behavior prediction method based on deep learning
CN113221787B (en) Pedestrian multi-target tracking method based on multi-element difference fusion
CN112801047B (en) Defect detection method and device, electronic equipment and readable storage medium
CN111783712A (en) Video processing method, device, equipment and medium
CN113393496A (en) Target tracking method based on space-time attention mechanism
CN111652181A (en) Target tracking method and device and electronic equipment
CN115188066A (en) Moving target detection system and method based on cooperative attention and multi-scale fusion
CN111353429A (en) Interest degree method and system based on eyeball turning
Shang et al. Cattle behavior recognition based on feature fusion under a dual attention mechanism
CN108875555B (en) Video interest area and salient object extracting and positioning system based on neural network
Huang et al. Efficient Detection Method of Pig‐Posture Behavior Based on Multiple Attention Mechanism
CN116721132B (en) Multi-target tracking method, system and equipment for industrially cultivated fishes
CN113837977B (en) Object tracking method, multi-target tracking model training method and related equipment
CN116311504A (en) Small sample behavior recognition method, system and equipment
CN110659576A (en) Pedestrian searching method and device based on joint judgment and generation learning
CN115359550A (en) Gait emotion recognition method and device based on Transformer, electronic device and storage medium
Lu et al. Lightweight green citrus fruit detection method for practical environmental applications
CN114140718A (en) Target tracking method, device, equipment and storage medium
CN114140524A (en) Closed loop detection system and method for multi-scale feature fusion
CN114399648A (en) Behavior recognition method and apparatus, storage medium, and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant