CN113706579A - Prawn multi-target tracking system and method based on industrial culture - Google Patents

Prawn multi-target tracking system and method based on industrial culture

Info

Publication number
CN113706579A
CN113706579A (application CN202110909169.9A)
Authority
CN
China
Prior art keywords
target
tracking
detection
prawn
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110909169.9A
Other languages
Chinese (zh)
Inventor
刘利平
乔乐乐
孙建
何航宇
石义
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China University of Science and Technology
Original Assignee
North China University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China University of Science and Technology filed Critical North China University of Science and Technology
Priority to CN202110909169.9A
Publication of CN113706579A
Current legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/20 - Analysis of motion
    • G06T 7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 50/00 - Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q 50/02 - Agriculture; Fishing; Forestry; Mining
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Animal Husbandry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Multimedia (AREA)
  • Agronomy & Crop Science (AREA)
  • Evolutionary Biology (AREA)
  • Marine Sciences & Fisheries (AREA)
  • Mining & Mineral Resources (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a prawn multi-target tracking system and method based on industrial culture, belonging to the technical field of target tracking. On the basis of the YOLOv3 model, the invention introduces the Mish function into the Darknet53 backbone network and adds a Focus module, a spatial pyramid pooling module and a feature pyramid module, providing a multi-target tracking method based on the improved Yolov3. The improved Yolov3 target detection is fused with the Deepsort multi-target prawn tracking model, enabling real-time multi-target prawn tracking in an actual culture environment and providing solid technical support for the accurate management of large-scale prawn culture.

Description

Prawn multi-target tracking system and method based on industrial culture
Technical Field
The invention belongs to the technical field of target tracking, and particularly relates to a multi-target prawn tracking system and method based on industrial aquaculture.
Background
Aquaculture is an important component of agricultural production, and in recent decades the Chinese aquaculture industry has developed rapidly and attracted wide attention. Computer vision is a non-invasive observation technology with good stability that has gradually developed on the basis of image processing, artificial intelligence, pattern recognition and other techniques; its principle is to collect video sequence images of a monitored area with an imaging system such as a camera, and to detect and track the moving targets in the images by image processing so as to obtain the parameters of the targets. It is being adopted by many researchers studying aquaculture behavior. Delcourt et al. studied the tracking of individual behaviors within fish schools using a computer vision system, tracking multiple targets in fish-school movement with good robustness and reliability. In another study, juvenile swordfish on their river migration were counted with an acoustic camera: the static background and a series of impurity interferences were removed from the acoustic images, moving targets were extracted and then tracked with a Kalman filter, the swordfish were distinguished from debris by the targets' different directions of motion, and the migrating swordfish were counted after the debris interference was removed.
Multi-target tracking algorithms can be divided into two categories. 1) Two-step methods solve the problem with two independent models: a detection model first locates the objects of interest with bounding boxes in the image, and an association model then extracts re-identification (Re-ID) features for each bounding box and links it to one of the existing trajectories according to a metric defined on the features. In recent years significant progress has been made in object detection and Re-ID separately, which in turn improves tracking performance; however, these methods cannot run at video rate because the two networks do not share features. 2) As multi-task learning has matured, one-step methods that jointly detect objects and learn Re-ID features have begun to receive much attention; since the two tasks share most of the network, they have the potential to significantly reduce inference time. However, compared with two-step methods, the accuracy of one-step methods is usually significantly lower. Therefore, a multi-target prawn tracking system and method based on industrial aquaculture is urgently needed to solve the above problems.
Disclosure of Invention
The invention aims to provide a multi-target prawn tracking system and method based on industrial culture, in order to solve the problem that the accuracy of one-step methods is usually significantly reduced.
In order to achieve the purpose, the invention adopts the following technical scheme:
a multi-target prawn tracking method based on industrial breeding comprises the following steps:
s101, acquiring a plurality of continuous video images of a target through a camera, wherein the target is a continuous video image comprising multiple targets in an environment;
s102, determining characteristic information in continuous video images, and determining video sub-images, wherein the sub-images comprise key frames of all targets;
s103, performing target detection and extraction through a Yolov3 detector based on the feature information of the key frames of the sub-images;
and S104, matching the extracted target, and completing target motion trajectory tracking detection through real-time input of a Deepsort tracker.
As a further description of the above technical solution:
the method specifically comprises the steps of establishing a Yolov3 model system to realize the detection and tracking capacity for improving small scale and shielding targets, and the training method specifically comprises the following steps:
s201, screening and removing the video images of the invalid segments with no targets and lens pollution at night;
s202, 3 data sets are constructed and respectively used for training a target detection model, a re-identification model and verifying a multi-target tracking effect;
s203, extracting key frames of the target detection data set by using ffmpeg, and labeling 6024 (1920 pixels multiplied by 1080 pixels) acquired prawn images by using a LabelImg labeling tool to manufacture data in a PASCAL VOC standard data set format;
s204, dividing the training set into a training set and a testing set according to the ratio of 4: 1;
s205, in order to improve the accuracy of the re-recognition result, the prawn individual is ensured to exist only by manually screening video data, then the DarkLabel is used for labeling the video, different individuals are distinguished according to different labels in the labeling process, and finally a re-recognition data set is constructed according to the format of the Market-1501 data set.
As a further description of the above technical solution:
the method specifically comprises the steps of matching extracted targets by using a multi-target tracking algorithm to perform track tracking check, improving the tracking effect of the multiple targets by extracting depth apparent features, taking detection results, bounding box, confidence and feature as input based on the existing accurate detection result, wherein the bounding box is mainly used for screening detection frames, the bounding box and the feature (ReID) are used for matching calculation with a tracker, the prediction module utilizes a Kalman filter, and the update module partially utilizes an IOU to perform matching of the Hungarian algorithm.
As a further description of the above technical solution:
the system adopts improved Yolov3 as a target detection module of a prawn multi-target tracking model, and the specific improvement comprises the addition of a Mosaic data enhancement module, a Focus module, a CSP module, an FPN + PAN module and the introduction of a Mish function to enhance the generalization capability of the model, and the introduction of a GIOU loss function to optimize an intersection-to-parallel ratio loss function.
As a further description of the above technical solution:
the specific enhancement method for the Mosaic data enhancement comprises the step of splicing 4 pictures in a random scaling, random cutting and random arrangement mode.
As a further description of the above technical solution:
the enhancement of the Focus module comprises the following steps of slicing an image, taking values of every other pixel in the image, inputting an original 608 × 608 × 3 image into the Focus module, changing the original 608 × 608 × 3 image into a 304 × 304 × 12 feature map by adopting slicing operation, changing the original 608 × 608 × 3 feature map into a 304 × 304 × 32 feature map through 32 convolution operations of convolution kernels, and obtaining a downsampling feature map without information loss.
As a further description of the above technical solution:
the CSP module enhancement comprises a CSP module which comprises convolution, batch normalization, a leak relu activation function and X residual error units and is used for dividing feature mapping of a basic layer into two parts, and then through cross-stage hierarchical structure combination, the accuracy rate can be guaranteed while the calculated amount is reduced.
As a further description of the above technical solution:
the spatial pyramid pooling module belongs to multi-scale fusion, and performs three times of maximal pooling operations by using a spatial pyramid pooling SPP module to input the characteristic Fin∈RC×H×WMaximum pooling of 5 × 5, 9 × 9 and 13 × 13 is respectively carried out, the feature map size is kept by complementing 0 around the feature map, and then the feature maps subjected to three-time pooling are spliced in channel dimension to complete feature fusion.
As a further description of the above technical solution:
the FPN + PAN comprises a feature pyramid which is added behind an FPN layer from bottom to top and comprises two PAN structures, wherein the FPN layer conveys strong semantic features from top to bottom, the feature pyramid conveys strong positioning features from bottom to top, and parameter aggregation is carried out on different detection layers from different stem layers.
As a further description of the above technical solution:
An intersection over union (IoU) is introduced to quantify the degree of fit between the predicted box and the real box in the detection task; the threshold is set to 0.5, and if IoU is greater than 0.5 the detection is considered correct, otherwise it is considered a false detection.
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
according to the invention, on the basis of a YOLOv3 model, a Mish function is introduced into a backbone network Darknet53 of the model, a Focus module, a space pyramid pooling module and a feature pyramid module are added, a multi-target tracking method based on improved Yolov3 is provided, the improved Yolov3 target detection and Deepsort multi-target prawn tracking model are fused, the multi-target real-time prawn tracking under the actual culture environment can be realized, and good technical support can be provided for realizing accurate management of large-scale prawn culture.
Drawings
FIG. 1 is a network structure diagram of the Yolov3 detection algorithm of the present invention;
FIG. 2 is a diagram of the Focus network architecture of the present invention;
FIG. 3 is a CSP network architecture of the present invention;
FIG. 4 is a diagram of the SPP network structure of the present invention;
FIG. 5 is a diagram of the FPN + PAN network architecture of the present invention;
FIG. 6 is a Deepsort tracking flow chart of the present invention;
FIG. 7 is a diagram of partial prawn detection results of the present invention;
FIG. 8 shows the tracking results of the prawn of the present invention;
FIG. 9 is a flow chart of the prawn tracking algorithm of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1-9, in one embodiment, the present invention provides a technical solution: a multi-target prawn tracking method based on industrial breeding, comprising the following steps:
S101, acquiring a plurality of continuous video images of targets through a camera, the continuous video images containing multiple targets in the culture environment;
S102, determining feature information in the continuous video images and determining video sub-images, wherein the sub-images comprise key frames of all targets;
S103, performing target detection and extraction with a Yolov3 detector based on the feature information of the key frames of the sub-images;
S104, matching the extracted targets and completing target motion trajectory tracking by feeding the detections to a Deepsort tracker in real time.
The data sets come from the Tangshan Ruida aquaculture plant. The test videos were shot from 5 to 30 December 2020, comprising 137 video segments with a total length of 3120 min in MP4 format; invalid segments such as night scenes, segments with no targets and segments with lens contamination were removed by screening, and 3 data sets were constructed, used respectively for training the target detection model, training the re-identification model and verifying the multi-target tracking effect. Key frames of the target detection data set were extracted with ffmpeg, and the 6024 acquired prawn images (1920 pixels × 1080 pixels) were labeled with the LabelImg labeling tool to produce data in the PASCAL VOC standard data set format; the data were divided into a training set and a test set at a ratio of 4:1. In order to improve the accuracy of the re-identification results, the video data were first screened manually to ensure that each prawn appears in only one video and that the same prawn does not appear in different videos; the videos were then labeled with DarkLabel, distinguishing different individuals by different labels, and the re-identification data were finally constructed in the Market-1501 data set format;
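A minimal sketch of this data preparation, assuming a folder of screened MP4 segments (all paths, the key-frame extraction recipe and the random seed are illustrative assumptions, not values from the patent):

```python
# Hypothetical sketch: key-frame extraction with ffmpeg and a 4:1 train/test
# split. Paths and naming conventions are assumptions for illustration.
import random
import subprocess
from pathlib import Path

VIDEO_DIR = Path("screened_videos")   # MP4 segments left after screening
FRAME_DIR = Path("frames")            # extracted 1920x1080 key frames

def extract_keyframes(video: Path, out_dir: Path) -> None:
    """Keep only I-frames (key frames) from one video."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-i", str(video),
         "-vf", "select=eq(pict_type\\,I)", "-vsync", "vfr",
         str(out_dir / f"{video.stem}_%05d.jpg")],
        check=True)

for video in sorted(VIDEO_DIR.glob("*.mp4")):
    extract_keyframes(video, FRAME_DIR)

# divide the labeled images into training and test sets at a ratio of 4:1
images = sorted(FRAME_DIR.glob("*.jpg"))
random.seed(0)
random.shuffle(images)
cut = int(len(images) * 0.8)
Path("train.txt").write_text("\n".join(str(p) for p in images[:cut]))
Path("test.txt").write_text("\n".join(str(p) for p in images[cut:]))
```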
firstly, a camera video image sequence is sent to an improved Yolov3 detector for training, the trained Yolov3 detector is used for aquaculture prawn detection in a complex environment, a Yolov3 trained prawn detection model is used, prawn detection results in prawn videos are used as real-time input of a Deepsort tracker, and the self defects of the Deepsort are overcome by using a high-precision target detection algorithm.
Referring to fig. 1, in an embodiment, the target detection algorithm of the present invention is as follows. Tracking-By-Detection is a detector-plus-tracker style of multi-target tracking, and selecting a suitable, high-quality detector has a great influence on the tracking effect. The Yolo target detection algorithm converts the target detection problem into a regression problem and performs detection with a single deep convolutional neural network model, achieving fast target detection and recognition with high accuracy. Yolov3 is a classical algorithm of the Yolo target detection series, consisting of two parts: the Darknet-53 feature extraction network and a multi-scale prediction network;
specifically, the three basic components of Yolov 3: 1) the CBL is the minimum component in the Yolov3 network structure and consists of a Conv + Bn + Leaky _ relu activation function. 2) Res unit is used for reference of a residual error structure in the Resnet network [10], so that the network can be constructed more deeply. 3) ResX: consisting of one CBL and X residual components, is the large component in Yolov3. The CBL preceding each Res module functions as a down-sample. Concat may implement tensor stitching, which expands the dimensions of two tensors, e.g., 26 × 26 × 256 and 26 × 26 × 512 two tensor stitching, resulting in 26 × 26 × 768;
the Yolov3 algorithm adopts a DarkNet53 network as a backbone network, uses 52 convolution layers to extract a main structure of characteristics, processes medium-low-layer matrixes and bottom-layer matrixes through convolution and matrix splicing operations to generate output of 3 scales, and predicts the types and positions of the target by processing characteristic graphs of different sizes through a prediction network.
Considering that the prawn targets in the videos acquired by the method are small and often far from the camera, and that Yolov3 performs well on small-target detection, Yolov3 is adopted as the target detection module of the prawn multi-target tracking model. Meanwhile, in order to improve the detection precision and running speed of the model, the method is improved to obtain the system of the multi-target tracking method: the system adopts the improved Yolov3 as the target detection module of the prawn multi-target tracking model, the specific improvements being the addition of a Mosaic data enhancement module, a Focus module, a CSP module and an FPN + PAN module, the introduction of the Mish function to enhance the generalization capability of the model, and the introduction of the GIOU loss function to optimize the intersection-over-union loss;
like many neural network models, the target detection model works best when training large amounts of data. Typically, the data available is limited and many researchers around the world are researching augmentation strategies to increase the amount of data available. The method for enhancing the prawn training data set by using the Mosaic data enhancement refers to a CutMix data enhancement mode proposed by 2019, however, CutMix only uses two pictures for splicing, and Mosaic data enhancement adopts 4 pictures for splicing in a mode of random zooming, random clipping and random arrangement.
As shown in fig. 2, the Focus module enhancement comprises a slicing operation on the picture in which a value is taken at every other pixel of the image: an original 608 × 608 × 3 image input into the Focus module becomes a 304 × 304 × 12 feature map after the slicing operation, and a 304 × 304 × 32 feature map after a convolution with 32 convolution kernels, yielding a downsampled feature map without information loss.
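The slicing can be written in a few lines of PyTorch; the sketch below reproduces the 608 × 608 × 3 → 304 × 304 × 12 → 304 × 304 × 32 shapes described above (kernel size and padding are illustrative choices):

```python
# Focus slicing: take every other pixel in each spatial direction, stack the
# four slices on the channel axis, then apply a 32-kernel convolution.
import torch
import torch.nn as nn

class Focus(nn.Module):
    def __init__(self, c_in=3, c_out=32):
        super().__init__()
        self.conv = nn.Conv2d(c_in * 4, c_out, kernel_size=3, padding=1)

    def forward(self, x):
        # four interleaved sub-images, each half the height and width
        x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                       x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.conv(x)

y = Focus()(torch.randn(1, 3, 608, 608))
print(y.shape)  # torch.Size([1, 32, 304, 304])
```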
As shown in fig. 3, the CSP module enhancement comprises a CSP module consisting of convolution, batch normalization, a Leaky relu activation function and X residual units; it divides the feature map of the base layer into two parts and then merges them through a cross-stage hierarchical structure, reducing the amount of computation while maintaining accuracy. The CSP module is an empirical improvement based on CSPNet (2019); the convolution kernel in front of each CSP module has size 3 × 3 and stride 2, so it also performs down-sampling. CSPNet, short for Cross Stage Partial Network, mainly addresses the heavy computation of inference from the perspective of network structure design; the authors of CSPNet attribute the excessive inference computation to the duplication of gradient information during network optimization.
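A condensed sketch of the cross-stage idea, assuming an equal channel split and 1 × 1 transition convolutions (the exact layer layout of the patent's model is not specified here):

```python
# CSP block sketch: split the input channels into two paths, run one path
# through X residual units, then merge the paths by concatenation.
import torch
import torch.nn as nn

class ResUnit(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c, c, 1, bias=False), nn.BatchNorm2d(c), nn.LeakyReLU(0.1),
            nn.Conv2d(c, c, 3, padding=1, bias=False), nn.BatchNorm2d(c),
            nn.LeakyReLU(0.1))

    def forward(self, x):
        return x + self.block(x)

class CSPBlock(nn.Module):
    def __init__(self, c, n=1):
        super().__init__()
        self.split1 = nn.Conv2d(c, c // 2, 1)
        self.split2 = nn.Conv2d(c, c // 2, 1)
        self.blocks = nn.Sequential(*[ResUnit(c // 2) for _ in range(n)])
        self.merge = nn.Conv2d(c, c, 1)

    def forward(self, x):
        y1 = self.blocks(self.split1(x))   # main path through X residual units
        y2 = self.split2(x)                # cross-stage shortcut path
        return self.merge(torch.cat([y1, y2], dim=1))

print(CSPBlock(64, n=2)(torch.randn(1, 64, 76, 76)).shape)
```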
As shown in fig. 4, the spatial pyramid pooling module performs multi-scale fusion: the spatial pyramid pooling SPP module applies three max-pooling operations to the input feature F_in ∈ R^(C×H×W), with kernel sizes 5 × 5, 9 × 9 and 13 × 13 respectively; the feature map size is kept by zero-padding around the feature map, and the three pooled feature maps are then concatenated in the channel dimension to complete the feature fusion. The calculation process of the SPP is shown in formula (1):

F_out = Concat(Maxpool_5×5(F_in), Maxpool_9×9(F_in), Maxpool_13×13(F_in))   (1)

where Maxpool_n×n(·) denotes a max-pooling operation with kernel size n × n and Concat(·) denotes the channel-wise concatenation operation.
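Formula (1) translates almost directly into code; the sketch below assumes stride-1 pooling with zero padding of k // 2 so the spatial size is preserved:

```python
# SPP module: three parallel max-poolings (5x5, 9x9, 13x13) with stride 1 and
# zero padding, concatenated along the channel dimension.
import torch
import torch.nn as nn

class SPP(nn.Module):
    def __init__(self, kernels=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernels)

    def forward(self, x):
        # zero padding keeps H x W; channels grow from C to 3C
        return torch.cat([pool(x) for pool in self.pools], dim=1)

x = torch.randn(1, 512, 19, 19)
print(SPP()(x).shape)  # torch.Size([1, 1536, 19, 19])
```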
As shown in fig. 5, the FPN + PAN comprises a bottom-up feature pyramid added behind the FPN layer, the bottom-up pyramid containing two PAN structures: the FPN layer conveys strong semantic features from top to bottom, while the bottom-up pyramid conveys strong localization features from bottom to top, aggregating parameters for different detection layers from different backbone layers.
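A much-simplified sketch of the two paths follows; real implementations insert convolution blocks at every fusion step, and the additive fusion here is purely for brevity:

```python
# FPN top-down pass (semantics) followed by a PAN bottom-up pass (localization).
import torch
import torch.nn.functional as F

def fpn_pan(c3, c4, c5):
    """c3, c4, c5: backbone features at strides 8, 16, 32 with equal channels."""
    # FPN: upsample deeper maps and fuse strong semantic features downward
    p5 = c5
    p4 = c4 + F.interpolate(p5, scale_factor=2, mode="nearest")
    p3 = c3 + F.interpolate(p4, scale_factor=2, mode="nearest")
    # PAN: downsample shallow maps and fuse strong localization features upward
    n3 = p3
    n4 = p4 + F.max_pool2d(n3, kernel_size=2)
    n5 = p5 + F.max_pool2d(n4, kernel_size=2)
    return n3, n4, n5  # prediction heads operate on the three scales

feats = [torch.randn(1, 256, s, s) for s in (76, 38, 19)]
for f_out in fpn_pan(*feats):
    print(f_out.shape)
```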
Referring to fig. 6, in an embodiment, the technical scheme further comprises matching the extracted targets with a multi-target tracking algorithm for trajectory tracking and improving the multi-target tracking effect by extracting deep appearance features. Based on an existing accurate detection result, the detections (bounding box, confidence and feature) are taken as input: the confidence is mainly used for screening the detection boxes, while the bounding box and the appearance feature (ReID) are used for matching calculation with the tracker; the prediction module uses a Kalman filter, and the update module uses IOU-based matching with the Hungarian algorithm.
Specifically, matching in Deepsort refers to matching between the currently valid trajectories and the current detections, and a linear weighting of two metrics, motion matching and appearance matching, is used as the final metric.
The degree of motion matching is characterized by the Mahalanobis distance between the detection and the track position predicted by the Kalman filter, as shown in equation (2):

d^(1)(i, j) = (d_j - y_i)^T S_i^(-1) (d_j - y_i)   (2)

where d^(1)(i, j) denotes the motion matching degree between the j-th detection and the i-th track, S_i is the covariance matrix of the observation space of the track at the current moment as predicted by the Kalman filter, y_i is the predicted observation of the track at the current moment, and d_j is the state of the j-th detection.
Using the Mahalanobis distance alone as the matching metric may cause excessive target label switching, especially when the camera moves, in which case the Mahalanobis distance metric fails; this is remedied by the appearance matching degree. The minimum cosine distance between the feature vector of the current frame's detection box and the feature vectors of all detection boxes contained in the track is used as the appearance matching degree between the detection and the track, as shown in equation (3):

d^(2)(i, j) = min{ 1 - r_j^T r_k^(i) | r_k^(i) ∈ R_i }   (3)

where R_i is the set of the last k feature vectors successfully associated with the i-th tracker, and r_j is the feature vector of the j-th detection result of the current frame.
The two metrics are fused linearly:

c_(i,j) = λ d^(1)(i, j) + (1 - λ) d^(2)(i, j)

where λ is a hyperparameter used to adjust the weights of the two terms.
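A small sketch of this fused cost for one detection-track pair (equations (2) and (3)), assuming the track stores its last k associated Re-ID feature vectors row-wise:

```python
# Fused Deepsort matching cost: squared Mahalanobis distance (motion) plus
# minimum cosine distance (appearance), weighted by the hyperparameter lambda.
import numpy as np

def matching_cost(det_state, det_feat, track_mean, track_cov, track_feats,
                  lam=0.5):
    # motion term: Mahalanobis distance between detection and Kalman prediction
    diff = det_state - track_mean
    d1 = float(diff.T @ np.linalg.inv(track_cov) @ diff)
    # appearance term: min cosine distance to the track's stored Re-ID features
    feats = track_feats / np.linalg.norm(track_feats, axis=1, keepdims=True)
    f = det_feat / np.linalg.norm(det_feat)
    d2 = float(np.min(1.0 - feats @ f))
    return lam * d1 + (1.0 - lam) * d2
```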
To compare Deepsort with other multi-target tracking models, the following evaluation indexes are selected: the multi-target tracking accuracy MOTA, which combines missed targets, false alarms and label switches; the average frame overlap rate MOTP; the number of target label switches ID along the tracking trajectories; and the running speed Runtime. The tracking results of Deepsort and other methods on the Mot16 dataset are shown in Table 1.
TABLE 1 Deepsort vs. other Multi-target tracking computation Performance
(Table 1 appears as an image in the original publication.)
As an online tracking method, Deepsort performs well and meets the requirements of video multi-target detection and tracking;
furthermore, an Leaky Relu activation function used by the original DarkNet53 is replaced by a Mish activation function, the Mish activation function is a smooth, continuous and non-monotonic function, the Mish activation function has no upper bound and no lower bound, and the gradient conduction of the model can be smoother by using the Mish activation function, so that the gradient descending effect is better than that of Relu, more effective information is reserved, the generalization capability of the model is enhanced, the detection capability of the model on the target can be improved by using the Mish activation function, the detection and tracking of the overlapped target are facilitated, and the occurrence of ID switching is effectively reduced.
In one embodiment, in order to verify the feasibility and accuracy of multi-target tracking of industrially cultured prawns with the combined Yolov3 and Deepsort framework, the algorithm is implemented under the Pytorch 1.7 deep learning framework and the Python 3.7 programming language platform. The experimental environment is a GTX1080ti GPU with 11 GB of video memory and an Intel Xeon E5-2630 v4 CPU with a main frequency of 2.2 GHz. Development is in the Python language, with tracking visualization based on OpenCV;
An intersection over union (IoU) is introduced to quantify the degree of fit between the predicted box and the real box; the threshold is set to 0.5, and a detection with IoU greater than 0.5 is considered correct, otherwise it is considered a false detection.
During target detection network training, the identification class count in the yolov3.cfg file is changed to 1, the weight attenuation factor is set to 0.0005, the initial learning rate to 0.001, the momentum parameter to 0.9 and the training batch size to 64, and the network is trained for 1000 iterations. Training took 51.4 hours, and testing is performed with the trained detector.
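In the standard darknet cfg format these settings correspond roughly to the excerpt below; the subdivisions line and the filters rule are shown only for context and are assumptions, not values stated in the patent:

```
[net]
batch=64                # training batch size
subdivisions=16         # assumed; splits a batch across GPU memory
momentum=0.9            # momentum parameter
decay=0.0005            # weight attenuation factor
learning_rate=0.001     # initial learning rate
max_batches=1000        # training iterations

[convolutional]
filters=18              # (classes + 5) * 3 anchors per detection scale

[yolo]
classes=1               # single identification class: prawn
```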
Further, target detection differs from a classification task in that the output of a detection model is unstructured: the number, positions and sizes of the detected objects are not known in advance. The detection task therefore introduces the intersection over union (IoU) to quantify the degree of fit between the predicted box A and the real box B. The threshold is set to 0.5; if IoU is greater than 0.5 the detection is considered correct, otherwise it is a false detection. The intersection over union is calculated as shown in formula (4):

IoU = S_(A∩B) / S_(A∪B)   (4)

where S_(A∩B) denotes the area of the overlapping region of the predicted box and the real box, S_(A∪B) denotes the area of the region covered by the predicted box and the real box together, and IoU denotes their intersection over union.
The GIOU is proposed to relieve the gradient problem of the IOU loss when the detection boxes do not overlap; a penalty term is added on the basis of the original IOU loss:

L_GIOU = 1 - IOU(A, B) + |C - A∪B| / |C|   (5)

where A is the predicted box, B is the real box, and C is the minimum bounding box of A and B.
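Formulas (4) and (5) combine into a few lines for axis-aligned boxes given as (x1, y1, x2, y2); the example values are illustrative:

```python
# IoU (formula 4) and GIoU loss (formula 5) for two axis-aligned boxes.
def giou_loss(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # intersection and union areas (formula 4)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # smallest enclosing box C and its penalty term (formula 5)
    c_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return 1.0 - iou + (c_area - union) / c_area

print(giou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # overlapping-box example
```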
In a typical target image there are both background and targets, and predicted boxes are likewise divided into correct and incorrect ones, giving 4 kinds of samples: a true positive (TP) means the predicted box is correctly matched to a real box; a false positive (FP) means the model misclassifies a negative sample as positive, generally a detection with IoU smaller than the threshold; a false negative (FN) means the model fails to detect a real box; and a true negative (TN) means a background region that is correctly left undetected. Based on these four kinds of samples, precision (the proportion of true positives among all samples classified as positive), recall (the proportion of positive samples that are found), the PR curve (recall on the abscissa, precision on the ordinate), the harmonic mean F1 (F1-score), the number of model parameters (Params) and mAP (the area enclosed by the PR curve and the coordinate axes) are used as detection performance indexes to compare the Yolov3 model trained on the training set with the improved Yolov3 model.
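The scalar indexes follow directly from these sample counts; a small sketch with illustrative counts:

```python
# Precision, recall and F1 computed from TP/FP/FN counts as defined above.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

print(f1(tp=90, fp=10, fn=20))  # illustrative counts, not patent data
```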
The experimental results comparing the Yolov3 model with the improved Yolov3 model on the prawn target detection dataset are shown in Table 2:
TABLE 2 comparison of target detection results for different algorithms
(Table 2 appears as an image in the original publication.)
As can be seen from Table 2, the mAP of the present invention is 93.4%, 1.7 percentage points higher than Yolov3, while the model parameters amount to only 11.6% of those of Yolov3. The algorithm of the invention not only improves the overall performance compared with Yolov3 but also reduces the parameter scale. The prawn target detection results of the invention are shown in fig. 7.
In one embodiment, in order to verify the multi-target prawn tracking performance, tests are carried out on the prawn multi-target tracking dataset and the results are compared with the Yolov3-Deepsort algorithm, as shown in Table 3:
TABLE 3 comparison of the results of the inventive experiment with Yolov3-Deepsort
(Table 3 appears as an image in the original publication.)
As can be seen from Table 3, the multi-target tracking accuracy of the invention is 47.6%, an improvement of 19.1% over the Yolov3-Deepsort algorithm; the multi-target tracking precision is improved by 16.1%, the number of target label changes is reduced by 69%, and the video processing speed rises from 17 f/s to 24 f/s.
The same video segment was tracked with the two algorithms; the results are shown in fig. 8. The boxed regions mark targets missed by Yolov3-Deepsort: compared with the Yolov3-Deepsort algorithm, the invention misses noticeably fewer targets, changes target labels less often, and tracks heavily occluded prawns better.
The above description covers only preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto; any equivalent substitution or change of the technical solutions and inventive concepts of the present invention made by a person skilled in the art within the technical scope disclosed herein shall fall within the protection scope of the present invention.

Claims (10)

1. A multi-target prawn tracking method based on industrial aquaculture is characterized by comprising the following steps:
S101, acquiring a plurality of continuous video images of targets through a camera, the continuous video images containing multiple targets in the culture environment;
S102, determining feature information in the continuous video images and determining video sub-images, wherein the sub-images comprise key frames of all targets;
S103, performing target detection and extraction with a Yolov3 detector based on the feature information of the key frames of the sub-images;
S104, matching the extracted targets and completing target motion trajectory tracking by feeding the detections to a Deepsort tracker in real time.
2. The multi-target prawn tracking method based on industrial aquaculture of claim 1, characterized by further comprising establishing a Yolov3 model system to improve the detection and tracking of small-scale and occluded targets, wherein the training method specifically comprises the following steps:
S201, screening the videos and removing invalid segments such as night scenes, segments with no targets and segments with lens contamination;
S202, constructing 3 data sets, used respectively for training the target detection model, training the re-identification model and verifying the multi-target tracking effect;
S203, extracting key frames of the target detection data set with ffmpeg, and labeling the 6024 acquired prawn images (1920 pixels × 1080 pixels) with the LabelImg labeling tool to produce data in the PASCAL VOC standard data set format;
S204, dividing the data into a training set and a testing set at a ratio of 4:1;
S205, in order to improve the accuracy of the re-identification results, manually screening the video data to ensure that each prawn individual appears in only one video, then labeling the videos with DarkLabel, distinguishing different individuals by different labels, and finally constructing the re-identification data set in the format of the Market-1501 data set.
3. The multi-target prawn tracking method based on industrial aquaculture of claim 1, characterized by further comprising matching the extracted targets with a multi-target tracking algorithm for trajectory tracking and improving the multi-target tracking effect by extracting deep appearance features; based on an existing accurate detection result, the detections (bounding box, confidence and feature) are taken as input, the confidence being mainly used for screening the detection boxes and the bounding box and appearance feature (ReID) being used for matching calculation with the tracker; the prediction module uses a Kalman filter, and the update module uses IOU-based matching with the Hungarian algorithm.
4. The multi-target prawn tracking method based on industrial aquaculture as claimed in any one of claims 1-3, further comprising a system using the multi-target tracking method, wherein the system adopts the improved Yolov3 as the target detection module of the prawn multi-target tracking model, the specific improvements comprising adding a Mosaic data enhancement module, a Focus module, a CSP module and an FPN + PAN module, introducing the Mish function to enhance the generalization capability of the model, and introducing the GIOU loss function to optimize the intersection-over-union loss function.
5. The multi-target prawn tracking system based on industrial aquaculture as claimed in claim 4, wherein the specific method of the Mosaic data enhancement comprises splicing 4 pictures by random scaling, random cropping and random arrangement.
6. The multi-target prawn tracking system based on industrial aquaculture as claimed in claim 4, wherein the Focus module enhancement comprises a picture slicing operation in which a value is taken at every other pixel of the image; the original 608 × 608 × 3 image input into the Focus module becomes a 304 × 304 × 12 feature map after the slicing operation and a 304 × 304 × 32 feature map after a convolution with 32 convolution kernels, yielding a downsampled feature map without information loss.
7. The multi-target prawn tracking system based on industrial aquaculture as claimed in claim 4, wherein the CSP module enhancement comprises a CSP module consisting of convolution, batch normalization, a Leaky relu activation function and X residual units, which divides the feature map of the base layer into two parts and then merges them through a cross-stage hierarchical structure, reducing the amount of computation while maintaining accuracy.
8. The multi-target prawn tracking system based on industrial aquaculture of claim 4, wherein the spatial pyramid pooling module performs multi-scale fusion: the spatial pyramid pooling SPP module applies three max-pooling operations to the input feature F_in ∈ R^(C×H×W) with kernel sizes 5 × 5, 9 × 9 and 13 × 13 respectively, the feature map size being kept by zero-padding around the feature map, and the three pooled feature maps are then concatenated in the channel dimension to complete the feature fusion.
9. The multi-target prawn tracking system based on industrial aquaculture as claimed in claim 4, wherein the FPN + PAN comprises a bottom-up feature pyramid added behind the FPN layer, the bottom-up feature pyramid containing two PAN structures; the FPN layer conveys strong semantic features from top to bottom, while the feature pyramid conveys strong localization features from bottom to top, aggregating parameters for different detection layers from different backbone layers.
10. The multi-target prawn tracking system based on industrial aquaculture of claim 4, further comprising introducing an intersection over union (IoU) to quantify the degree of fit between the predicted box and the real box in the detection task, wherein the threshold is set to 0.5 and a detection with IoU greater than 0.5 is considered correct, otherwise it is considered a false detection.
CN202110909169.9A 2021-08-09 2021-08-09 Prawn multi-target tracking system and method based on industrial culture Pending CN113706579A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110909169.9A CN113706579A (en) 2021-08-09 2021-08-09 Prawn multi-target tracking system and method based on industrial culture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110909169.9A CN113706579A (en) 2021-08-09 2021-08-09 Prawn multi-target tracking system and method based on industrial culture

Publications (1)

Publication Number Publication Date
CN113706579A true CN113706579A (en) 2021-11-26

Family

ID=78651930

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110909169.9A Pending CN113706579A (en) 2021-08-09 2021-08-09 Prawn multi-target tracking system and method based on industrial culture

Country Status (1)

Country Link
CN (1) CN113706579A (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111914664A (en) * 2020-07-06 2020-11-10 同济大学 Vehicle multi-target detection and track tracking method based on re-identification
CN112347943A (en) * 2020-11-09 2021-02-09 哈尔滨理工大学 Anchor optimization safety helmet detection method based on YOLOV4

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TOM HARDY: "YOLOv4/v5的创新点汇总!", 《HTTPS://BLOG.CSDN.NET/QQ_29462849/ARTICLE/DETAILS/118561934》 *
朱健: "基于YOLOv3和DeepSort的太阳活动区检测与跟踪", 《中国优秀硕士学位论文全文数据库 基础科学辑》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115690565A (en) * 2022-09-28 2023-02-03 大连海洋大学 Target detection method for cultivated fugu rubripes by fusing knowledge and improving YOLOv5
CN115690565B (en) * 2022-09-28 2024-02-20 大连海洋大学 Method for detecting cultivated takifugu rubripes target by fusing knowledge and improving YOLOv5
CN115690546A (en) * 2022-12-30 2023-02-03 正大农业科学研究有限公司 Shrimp length measuring method and device, electronic equipment and storage medium
CN116721132A (en) * 2023-06-20 2023-09-08 中国农业大学 Multi-target tracking method, system and equipment for industrially cultivated fishes
CN116721132B (en) * 2023-06-20 2023-11-24 中国农业大学 Multi-target tracking method, system and equipment for industrially cultivated fishes

Similar Documents

Publication Publication Date Title
WO2020253629A1 (en) Detection model training method and apparatus, computer device, and storage medium
CN113706579A (en) Prawn multi-target tracking system and method based on industrial culture
CN111160269A (en) Face key point detection method and device
US20130251246A1 (en) Method and a device for training a pose classifier and an object classifier, a method and a device for object detection
CN112598713A (en) Offshore submarine fish detection and tracking statistical method based on deep learning
CN110765865B (en) Underwater target detection method based on improved YOLO algorithm
CN110647802A (en) Remote sensing image ship target detection method based on deep learning
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN113269737B (en) Fundus retina artery and vein vessel diameter calculation method and system
CN112801047B (en) Defect detection method and device, electronic equipment and readable storage medium
CN112634369A (en) Space and or graph model generation method and device, electronic equipment and storage medium
CN111260628A (en) Large nursery stock number counting method based on video image and electronic equipment
CN116229052A (en) Method for detecting state change of substation equipment based on twin network
CN116152928A (en) Drowning prevention early warning method and system based on lightweight human body posture estimation model
CN115375737A (en) Target tracking method and system based on adaptive time and serialized space-time characteristics
CN112991281B (en) Visual detection method, system, electronic equipment and medium
CN114283326A (en) Underwater target re-identification method combining local perception and high-order feature reconstruction
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN111531546B (en) Robot pose estimation method, device, equipment and storage medium
CN115393252A (en) Defect detection method and device for display panel, electronic equipment and storage medium
Dong et al. A detection-regression based framework for fish keypoints detection
CN115862130B (en) Behavior recognition method based on human body posture and trunk sports field thereof
CN115937991A (en) Human body tumbling identification method and device, computer equipment and storage medium
CN112991280B (en) Visual detection method, visual detection system and electronic equipment
CN116912670A (en) Deep sea fish identification method based on improved YOLO model

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20211126)