CN116311001A - Method, device, system, equipment and medium for identifying fish swarm behavior - Google Patents
Method, device, system, equipment and medium for identifying fish swarm behavior
- Publication number
- CN116311001A (application CN202310561907.4A)
- Authority
- CN
- China
- Prior art keywords
- target
- video
- fish
- feature
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A40/00—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
- Y02A40/80—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in fisheries management
- Y02A40/81—Aquaculture, e.g. of fish
Abstract
The invention provides a fish swarm behavior recognition method, device, system, equipment and medium, relating to the field of image recognition. The method comprises: acquiring target image features, target audio features and target water quality features of a target video; and inputting the target image features, target audio features and target water quality features into a multi-modal fish swarm behavior recognition model to obtain the target fish swarm behavior corresponding to the target video. The multi-modal fish swarm behavior recognition model is trained on the sample image features, sample audio features and sample water quality features of each sample video, labeled with the sample fish swarm behavior of each sample video. By adopting multi-modal fusion to combine the features of image, audio and water quality data, the invention improves the anti-interference capability of feeding behavior recognition, analyzes feeding behavior from multiple directions and angles, accurately identifies the feeding state of the fish swarm, realizes precise feeding of the fish swarm, and reduces feed waste.
Description
Technical Field
The present invention relates to the field of image recognition, and in particular, to a method, apparatus, system, device, and medium for identifying fish school behaviors.
Background
In the prior art, fish swarm feeding behavior is judged from visual features alone, which are disturbed by illumination, water turbidity, water surface reflection, and aerator or artificial noise. As a result, the feeding behavior cannot be identified accurately, which in turn impairs feeding decisions.
Disclosure of Invention
The invention provides a method, device, system, equipment and medium for identifying fish swarm behaviors, to solve the technical problem of inaccurate identification of fish swarm feeding behavior in the prior art, and provides a multi-modal fusion algorithm that fuses video, audio and water quality parameters, realizing accurate prediction of fish swarm feeding behavior.
In a first aspect, the present invention provides a method for identifying fish school behaviors, including:
acquiring target image characteristics, target audio characteristics and target water quality characteristics of a target video;
inputting the target image characteristics, the target audio characteristics and the target water quality characteristics into a multi-modal fish swarm behavior recognition model, and obtaining target fish swarm behaviors corresponding to the target video, wherein the target fish swarm behaviors are output by the multi-modal fish swarm behavior recognition model;
The multi-modal fish swarm behavior recognition model is obtained by training on the sample image features, sample audio features and sample water quality features of each sample video, with the sample fish swarm behavior of each sample video as the label.
According to the fish swarm behavior recognition method provided by the invention, the method for acquiring the target image characteristics, the target audio characteristics and the target water quality characteristics of the target video comprises the following steps:
cutting the original fish school video according to a preset duration to obtain all target videos;
for each target video, extracting image features of the target video based on a dual-stream model and on a video encoder respectively to obtain a first image feature and a second image feature, and splicing the first image feature and the second image feature to obtain the target image feature of the target video;
performing audio feature extraction on the target video based on a pre-training audio neural network model to obtain target audio features of the target video;
text feature extraction is carried out on water quality data corresponding to a target video based on a text encoder, and target water quality features of the target video are obtained;
the water quality data includes a pH value, a dissolved oxygen value, and a temperature.
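The cutting step above can be sketched as follows. This is a minimal illustration with timestamps in seconds, not the patent's actual implementation; the assumption that a trailing remainder is dropped is ours:

```python
def split_into_clips(total_seconds, clip_seconds=150):
    """Cut a recording into consecutive fixed-length clips.

    A trailing remainder shorter than `clip_seconds` is dropped, on the
    assumption that every target video must have a uniform duration.
    """
    starts = range(0, total_seconds - clip_seconds + 1, clip_seconds)
    return [(s, s + clip_seconds) for s in starts]
```

For example, a 400-second recording yields the clips (0, 150) and (150, 300).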
According to the fish-swarm behavior recognition method provided by the invention, the target image feature, the target audio feature and the target water quality feature are input into a multi-modal fish-swarm behavior recognition model, and the target fish-swarm behavior corresponding to the target video is obtained and output by the multi-modal fish-swarm behavior recognition model, comprising:
inputting the target image features and the target audio features to a first submodule in the multi-mode fish school behavior recognition model, and obtaining a first video fusion feature output by the first submodule; inputting the target image features and the target audio features to a second submodule in the multi-mode fish school behavior recognition model, and obtaining second video fusion features output by the second submodule;
performing feature fusion on the first video fusion feature and the second video fusion feature according to preset weights to obtain target fusion features;
inputting the query embedding feature and the target fusion feature to a query decoder of the first submodule, and acquiring the target fish swarm behavior corresponding to the target video output by the query decoder;
the query embedding feature is generated by embedding a target water quality feature into the target fusion feature.
According to the fish school behavior recognition method provided by the invention, the steps of inputting the target image features and the target audio features into the first submodule in the multi-mode fish school behavior recognition model, and obtaining the first video fusion features output by the first submodule include:
inputting the target image features and the target audio features to a feature enhancement layer of the first sub-module, and acquiring the image enhancement features and the audio enhancement features output by the feature enhancement layer;
and inputting the image enhancement features and the audio enhancement features to a bottleneck attention layer of the first sub-module, and obtaining a first video fusion feature output by the bottleneck attention layer.
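The bottleneck-attention idea above can be sketched as a toy numpy example: cross-modal information must pass through a small set of shared bottleneck tokens. Learned projections and multi-head structure are omitted, and the token counts and dimensions are assumptions, not the patent's architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, kv):
    """Single-head scaled dot-product attention with no learned weights."""
    return softmax(q @ kv.T / np.sqrt(q.shape[-1])) @ kv

def bottleneck_fuse(img_tokens, aud_tokens, n_bottleneck=4, seed=0):
    """Fuse two modalities through shared bottleneck tokens, so that
    cross-modal information can only flow through the small bottleneck."""
    rng = np.random.default_rng(seed)
    b = rng.standard_normal((n_bottleneck, img_tokens.shape[-1]))
    b = attend(b, img_tokens)     # bottleneck collects visual context
    b = attend(b, aud_tokens)     # ...then audio context
    return attend(img_tokens, b)  # tokens read back the fused bottleneck
```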
According to the fish swarm behavior recognition method provided by the invention, inputting the target image features and the target audio features to the second submodule in the multi-modal fish swarm behavior recognition model, and obtaining the second video fusion feature output by the second submodule, includes:
inputting the target image features and the target audio features to a multi-level modular co-attention layer in the second sub-module, and obtaining the second video fusion feature output by the multi-level modular co-attention layer;
the multi-level modular co-attention layer is formed by connecting the modular co-attention layer of each level in series;
the modular co-attention layer of each level is composed of a self-attention unit and a guided-attention unit.
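The composition of each level (a self-attention unit followed by a guided, or "leading", attention unit), with the levels connected in series, can be sketched as a weight-free toy example; the structure shown, not the exact parameterization, is the point:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, kv):
    """Single-head scaled dot-product attention (illustration only)."""
    return softmax(q @ kv.T / np.sqrt(q.shape[-1])) @ kv

def mca_layer(x, y):
    """One modular co-attention level: a self-attention unit on x, then a
    guided-attention unit where x attends to the other modality y."""
    x = attend(x, x)     # self-attention unit
    return attend(x, y)  # guided-attention unit

def mca_stack(img_tokens, aud_tokens, levels=3):
    """Levels in series: each level refines the image features under
    guidance from the audio features (number of levels is an assumption)."""
    x = img_tokens
    for _ in range(levels):
        x = mca_layer(x, aud_tokens)
    return x
```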
According to the fish swarm behavior recognition method provided by the invention, the feature fusion is carried out on the first video fusion feature and the second video fusion feature according to the preset weight, and the target fusion feature is obtained, which comprises the following steps:
determining a first weight characteristic according to the first weight parameter and the first video fusion characteristic;
determining a second weight characteristic according to the second weight parameter and the second video fusion characteristic;
determining the target fusion feature according to the first weight feature and the second weight feature;
the preset weight comprises the first weight parameter and the second weight parameter;
the first weight parameter and the second weight parameter are determined according to the influence degree of the first sub-module and the second sub-module on the fish swarm behavior recognition result.
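The weighted fusion described above amounts to a weighted sum of the two sub-module outputs. The weight values below are placeholders; per the text they would be determined by each sub-module's influence on the recognition result:

```python
import numpy as np

def weighted_fusion(f1, f2, w1=0.6, w2=0.4):
    """Target fusion feature = w1 * first video fusion feature
                             + w2 * second video fusion feature.

    w1 and w2 are the preset weights; the values here are illustrative.
    """
    return w1 * f1 + w2 * f2
```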
In a second aspect, there is provided a fish school behavior recognition apparatus, comprising:
a first acquisition unit, configured to acquire target image features, target audio features and target water quality features of a target video;
a second acquisition unit, configured to input the target image features, the target audio features and the target water quality features into a multi-modal fish swarm behavior recognition model, and to obtain the target fish swarm behavior corresponding to the target video output by the multi-modal fish swarm behavior recognition model;
the multi-modal fish swarm behavior recognition model is obtained by training on the sample image features, sample audio features and sample water quality features of each sample video, with the sample fish swarm behavior of each sample video as the label.
In a third aspect, there is provided a fish school behavior recognition system, comprising:
the video acquisition equipment is used for acquiring an original fish school video;
the water quality acquisition equipment is used for acquiring water quality data;
the illuminance transmitter is used for acquiring illumination intensity;
the light source is used for supplementing light for the video acquisition equipment;
the system further comprises the fish swarm behavior recognition apparatus described above.
In a fourth aspect, the present invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a method for identifying fish school behavior as described in any one of the above when executing the program.
In a fifth aspect, the invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which when executed by a processor implements a method of fish school behavior identification as described in any of the above.
According to the fish swarm behavior recognition method, device, system, equipment and medium, the target image features, target audio features and target water quality features of the target video are obtained through feature extraction and input into the multi-modal fish swarm behavior recognition model, and the target fish swarm behavior corresponding to the target video is obtained. By fusing the features of image, audio and water quality data, the anti-interference capability of feeding behavior identification is improved and the feeding state of the fish swarm is accurately identified.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a fish school behavior recognition method according to the present invention;
FIG. 2 is a schematic flow chart of acquiring target image characteristics, target audio characteristics and target water quality characteristics of a target video according to the present invention;
fig. 3 is a schematic flow chart of acquiring the target fish swarm behavior corresponding to a target video according to the present invention;
fig. 4 is a schematic flow chart of acquiring a first video fusion feature according to the present invention;
FIG. 5 is a schematic flow chart of the method for acquiring the target fusion feature;
FIG. 6 is a second flow chart of the fish school behavior recognition method according to the present invention;
FIG. 7 is a third flow chart of the fish school behavior recognition method according to the present invention;
FIG. 8 is a schematic structural view of the multi-level modular co-attention layer provided by the present invention;
FIG. 9 is a schematic diagram of a fish school behavior recognition system according to the present invention;
FIG. 10 is a schematic diagram of a connection structure of an illuminance transmitter provided by the present invention;
fig. 11 is a schematic structural diagram of a fish school behavior recognition device provided by the invention;
fig. 12 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Bait is one of the most important variable costs in aquaculture, accounting for more than 50% of the total cost, so bait feeding is critical; overfeeding and underfeeding are common problems in the feeding link of the aquaculture process. Real-time analysis and monitoring of changes in the feeding behavior of fish swarms in the aquaculture water body is an important basis for formulating a scientific bait casting strategy, and can effectively reduce bait waste and avoid water pollution.
Machine vision, combined with specific image preprocessing and enhancement algorithms, is widely applied in fields such as image classification and target recognition owing to its wide applicability and reliable data acquisition. By acquiring pictures of fish swarm feeding, the feeding behavior can be judged through image processing, feature extraction and quantification, and a feeding decision can be made. However, machine-vision-based methods are generally limited to culture environments with clear water and good illumination; they place high demands on the environment and are affected by factors such as water turbidity and water surface reflection. The sound signals of fish feeding are strong and change markedly, so they can provide a basic basis for feeding research; however, acoustic techniques are easily disturbed by aerator and artificial noise, which limits their application in actual production. Sensors for temperature, dissolved oxygen and pH can acquire the water quality parameters of the culture water body and provide necessary information for precise feeding, since changes in water quality directly influence the appetite of fish; however, water quality changes over a long time scale while fish swarm feeding is a short process, so it is very difficult to identify feeding behavior from water quality alone.
In summary, image, audio and water quality parameters can each provide a reference for fish swarm feeding behavior, but no single source of information can identify the feeding behavior accurately, resulting in wasted bait and polluted water.
Fig. 1 is a schematic flow chart of the fish swarm behavior recognition method provided by the present invention. The method includes:
101, acquiring target image features, target audio features and target water quality features of a target video;
102, inputting the target image features, the target audio features and the target water quality features into a multi-modal fish swarm behavior recognition model, and obtaining the target fish swarm behavior corresponding to the target video output by the multi-modal fish swarm behavior recognition model;
The multi-mode fish school behavior recognition model is determined by training with the sample fish school behavior of each sample video according to the sample image characteristics, the sample audio characteristics and the sample water quality characteristics of each sample video.
In step 101, the target video may be a video stream acquired in real time in the underwater environment, cut to a uniform duration so that each determined video is rich in information produced by the feeding state of the fish swarm. Visually, fish swarms cause changes in the water body through constant swimming and feeding; acoustically, feeding produces distinctive sounds such as splashing at the water surface and collisions between fish; and feeding also produces small changes in environmental factors such as the dissolved oxygen value, pH value and temperature of the water body. These environmental changes can likewise serve as features of feeding behavior, so comprehensively using all of these kinds of information allows the feeding behavior of the fish swarm to be reflected, and thus identified, comprehensively and accurately.
Specifically, the target image features describe only the dynamic image content of the target video, with no audio component; the target audio features describe only the sound of the underwater environment, which may include the feeding sounds of the fish swarm, the working noise of the oxygenator, and splashing and collision sounds produced during feeding; and the target water quality features mainly include the dissolved oxygen value, pH value and temperature during the time period of the target video.
In step 102, the multi-modal fish swarm behavior recognition model is obtained by training on the sample image features, sample audio features and sample water quality features of each sample video, labeled with the sample fish swarm behavior of each sample video. After the target image features, target audio features and target water quality features are input into the model, the target fish swarm behavior corresponding to the target video is obtained as the model output; a bait casting strategy can then be determined according to the target fish swarm behavior, effectively reducing bait waste and avoiding water pollution.
The fish swarm feeding behavior recognition model can be built in Python on the PyTorch deep learning framework. The batch size is set to 32, the number of training epochs to 1000, and the learning rate to 0.001; network parameters are optimized with the Adam optimizer, and the weight decay is set to 0.0001 to prevent overfitting. During training, the extracted image features, audio features and water quality features serve as the inputs of the model.
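The hyper-parameters stated above can be collected in a configuration fragment. The patent gives only the values, not the training code, so this is a plain record of the stated settings:

```python
# Training hyper-parameters as stated in the text; the surrounding
# training-loop code is not given in the patent.
TRAIN_CFG = {
    "framework": "PyTorch",
    "batch_size": 32,
    "epochs": 1000,
    "learning_rate": 1e-3,
    "optimizer": "Adam",
    "weight_decay": 1e-4,  # regularization to prevent overfitting
}
```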
The invention discloses a high-precision multi-modal-fusion fish swarm feeding behavior recognition algorithm (Adaptive DMCA-UMT), which improves upon the Unified Multimodal Transformer (UMT) model and markedly raises recognition accuracy.
Compared with prior methods that use only a single video or audio stream, the invention can correlate multiple data types so that their features complement one another; even when the features of some data type are not distinctive enough, good results can still be obtained. The invention therefore offers better recognition performance than single-modality fish swarm behavior recognition.
The invention also uses a waterproof camera device and a water quality probe, under the control of an operation processor, to collect the video stream, audio stream and water quality data of fish swarm feeding. A light source supplements the light for the waterproof camera device: when the underwater light measured by the illuminance transmitter is insufficient, the light source fills in, and the operation processor then identifies the fish swarm feeding behavior with the trained model. The invention can be applied effectively in the aquaculture environment, providing a reliable and accurate technical means for studying and monitoring fish swarm feeding behavior.
According to the fish swarm behavior recognition method, device, system, equipment and medium, the target image features, target audio features and target water quality features of the target video are obtained through feature extraction and input into the multi-modal fish swarm behavior recognition model, and the target fish swarm behavior corresponding to the target video is obtained.
Fig. 2 is a schematic flow chart of acquiring the target image features, target audio features and target water quality features of a target video according to the present invention; the acquisition includes:
1011, cutting the original fish swarm video according to a preset duration to obtain all target videos;
1012, for each target video, extracting image features of the target video based on a dual-stream model and on a video encoder respectively to obtain a first image feature and a second image feature, and splicing the two to obtain the target image feature of the target video;
1013, performing audio feature extraction on the target video based on a pre-trained audio neural network model to obtain the target audio features of the target video;
1014, performing text feature extraction on the water quality data corresponding to the target video based on a text encoder to obtain the target water quality features of the target video;
the water quality data includes a pH value, a dissolved oxygen value, and a temperature.
In step 1011, a waterproof camera device can be used to capture video of fish swarm feeding behavior in the underwater target area, determining the original fish swarm video. Optionally, the original fish swarm video is first preprocessed and uniformly cut into videos of a preset duration, for example 150 seconds; each 150-second video is then a target video. The cutting operation is performed over the whole original fish swarm video so as to obtain all target videos.
In step 1012, for each target video, image features are extracted based on the dual-stream model SlowFast and based on the video encoder, obtaining a first image feature and a second image feature. As an optional embodiment, with the target video set to 150 seconds, the image features corresponding to the dual-stream model SlowFast and to the video encoder are extracted every 2 seconds, and the two features are normalized and spliced into one video vector. All time steps are then traversed, all video vectors are determined, and all video vectors are spliced to obtain the target image feature of the target video.
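The per-step normalize-and-splice operation in step 1012 can be sketched as follows. The feature widths and the use of L2 normalization are assumptions (the text says only "normalization"), not the patent's exact implementation:

```python
import numpy as np

def build_target_image_feature(slowfast_steps, encoder_steps):
    """Normalize each backbone's per-2-second feature, splice the pair
    into one video vector per step, then concatenate all steps into the
    clip-level target image feature.

    slowfast_steps / encoder_steps: lists of 1-D arrays, one per step.
    """
    def l2norm(v):
        return v / (np.linalg.norm(v) + 1e-8)
    per_step = [np.concatenate([l2norm(a), l2norm(b)])
                for a, b in zip(slowfast_steps, encoder_steps)]
    return np.concatenate(per_step)
```

A 150-second clip sampled every 2 seconds yields 75 steps, so with step widths d1 and d2 the result has length 75 * (d1 + d2).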
In step 1013, optionally, the present invention performs audio feature extraction on the target video based on the pre-trained audio neural network model (Pretrained Audio Neural Networks, PANN) to obtain an audio vector of the target video, where the audio vector is the target audio feature.
In step 1014, optionally, the text encoder of the contrastive language-image pre-training model (Contrastive Language-Image Pre-Training, CLIP) performs text feature extraction on the water quality data corresponding to the target video to obtain the target water quality feature of the target video, where the water quality data includes a pH value, a dissolved oxygen value and a temperature. Those skilled in the art understand that, while the target video is obtained, the water quality data in the time period of the target video may be collected with a temperature sensor, a dissolved oxygen sensor and a pH measurement device, so that the target water quality feature of the target video can be extracted.
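Before a CLIP-style text encoder can consume the three readings, they must be rendered as text. Only that prompt construction is sketched here; passing the rendered string through an actual CLIP text encoder (for example via the open_clip or transformers libraries) is left out, and the prompt wording, units and one-decimal formatting are assumptions not taken from the patent:

```python
def water_quality_prompt(ph, dissolved_oxygen, temperature):
    """Render the three water-quality readings as a sentence suitable for a
    CLIP-style text encoder. Wording and units are illustrative assumptions."""
    return (f"water quality: pH {ph:.1f}, "
            f"dissolved oxygen {dissolved_oxygen:.1f} mg/L, "
            f"temperature {temperature:.1f} C")
```

The resulting string would then be tokenized and encoded into the target water quality feature vector.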
Fig. 3 is a schematic flow chart of obtaining the target fish school behavior corresponding to a target video, where the inputting of the target image feature, the target audio feature and the target water quality feature into the multi-modal fish school behavior recognition model to obtain the target fish school behavior corresponding to the target video output by the model includes:
The query embedding feature is generated by embedding a target water quality feature into the target fusion feature.
In step 1021, the first sub-module may be a Unified Multimodal Transformer (UMT). Since the UMT multi-modal model performs strongly in video recognition, with recognition accuracy exceeding that of a single modality, the target image feature and the target audio feature are input to the first sub-module of the multi-modal fish school behavior recognition model, and the first video-audio fusion feature output by the first sub-module is obtained.
The second sub-module comprises a multi-level basic component modularized common attention layer (DMCA). To better fuse the image features and audio features, the DMCA module is introduced in the second sub-module to further refine the video-audio joint feature: the target image feature and the target audio feature are input to the second sub-module of the multi-modal fish school behavior recognition model, and the second video-audio fusion feature output by the second sub-module is obtained.
In step 1022, in order to improve the accuracy of fish feeding behavior recognition, the image features and audio features are further fused: DMCA is introduced on the basis of the UMT model, giving the model DMCA-UMT. However, so that the model can weigh important against unimportant data in the two fused modal streams, increasing the weight of important data and decreasing that of unimportant data, adaptive weights need to be added to the two modal fusions respectively; this improves the anti-interference capability and recognition accuracy of the model for fish feeding behavior. That is, feature fusion is performed on the first video-audio fusion feature and the second video-audio fusion feature according to preset video-audio weights to obtain the target fusion feature.
In step 1023, the target water quality feature is first embedded into the target fusion feature to generate a query embedding feature, and then the query embedding feature and the target fusion feature are input into the query decoder of the first sub-module to obtain the target fish school behavior corresponding to the target video output by the query decoder.
The invention takes the target water quality feature and the target fusion feature as input. A Query Generator computes attention weights between video clips and the query text, determines whether each video clip contains the information described by the text, and predicts a query embedding. Further, a Query Decoder takes the video-audio joint feature and the text-guided moment query as input; that is, the query embedding feature and the target fusion feature are input to the query decoder and decoded, and a prediction head produces the final joint moment retrieval. Moment retrieval is defined as a keypoint detection problem: each moment can be represented by a temporal center and a duration window. The center point can be estimated by predicting a temporal heatmap and extracting its local maxima; the window can then be regressed from the features at the center.
For each real moment retrieval target, its center is $c$ and its window is $d$. The center point is quantized to $\tilde{c} = \lfloor c \rfloor$, and a one-dimensional Gaussian kernel is used to fill the heatmap $H_{t} = \exp\!\left(-\frac{(t-\tilde{c})^{2}}{2\sigma^{2}}\right)$, where $t$ is the time coordinate and $\sigma$ is the window-adaptive standard deviation.
The present invention uses a Gaussian focal loss function to optimize the center point prediction:
$$L_{center} = -\frac{1}{N}\sum_{t}\begin{cases}(1-\hat{H}_{t})^{\alpha}\log\hat{H}_{t}, & H_{t}=1\\ (1-H_{t})^{\beta}\hat{H}_{t}^{\alpha}\log(1-\hat{H}_{t}), & \text{otherwise}\end{cases}\tag{1}$$
In formula (1), $\hat{H}_{t}$ is the predicted heatmap, $N$ is the number of ground-truth center points, and $\alpha$ and $\beta$ denote the weight and penalty exponents, set in practice to 2.0 and 4.0. An L1 loss is optimized for window and offset regression:
$$L_{window} = \left|d - \hat{d}\right|\tag{2}$$
$$L_{offset} = \left|o - \hat{o}\right|\tag{3}$$
In formula (2), $d$ is the window ground-truth value and $\hat{d}$ the window prediction; in formula (3), $o$ is the ground-truth offset and $\hat{o}$ the predicted offset.
The total training loss is a weighted sum of all the losses described above:
$$L = \lambda_{1}L_{center} + \lambda_{2}L_{window} + \lambda_{3}L_{offset}\tag{4}$$
where the coefficients $\lambda_{1}$, $\lambda_{2}$ and $\lambda_{3}$ balance the three terms.
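The heatmap construction, the losses of formulas (1)-(4), and the local-maximum decoding described above can be sketched numerically as follows. The CenterNet-style penalty-reduced focal form matches the text's description; the peak threshold and the equal default loss weights are illustrative assumptions:

```python
import numpy as np

def gaussian_heatmap(length, center, sigma):
    """Fill a 1-D heatmap with a Gaussian kernel around the quantized center."""
    t = np.arange(length, dtype=np.float64)
    return np.exp(-((t - center) ** 2) / (2.0 * sigma ** 2))

def gaussian_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-6):
    """Penalty-reduced Gaussian focal loss over the temporal heatmap (Eq. 1)."""
    pred = np.clip(pred, eps, 1.0 - eps)
    pos = target == 1.0
    pos_loss = -((1.0 - pred) ** alpha) * np.log(pred) * pos
    neg_loss = -((1.0 - target) ** beta) * (pred ** alpha) * np.log(1.0 - pred) * ~pos
    return (pos_loss.sum() + neg_loss.sum()) / max(pos.sum(), 1)

def l1_loss(truth, pred):
    """L1 regression loss used for the window (Eq. 2) and offset (Eq. 3)."""
    return np.abs(np.asarray(truth) - np.asarray(pred)).mean()

def total_loss(l_center, l_window, l_offset, lambdas=(1.0, 1.0, 1.0)):
    """Weighted sum of the three losses (Eq. 4); equal weights are an assumption."""
    return sum(w * l for w, l in zip(lambdas, (l_center, l_window, l_offset)))

def decode_centers(heatmap, threshold=0.3):
    """Estimate moment centers as thresholded local maxima of the heatmap."""
    padded = np.pad(heatmap, 1, constant_values=-np.inf)
    is_peak = (padded[1:-1] >= padded[:-2]) & (padded[1:-1] >= padded[2:])
    return np.flatnonzero(is_peak & (heatmap > threshold))
```

A predicted heatmap identical to the ground truth yields a much smaller focal loss than an uninformative uniform prediction, which is the behavior the optimization relies on.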
Fig. 4 is a schematic flow chart of acquiring a first video and audio fusion feature provided by the present invention, where the inputting the target image feature and the target audio feature to a first sub-module in the multi-mode fish school behavior recognition model acquires the first video and audio fusion feature output by the first sub-module, and includes:
In step 10211, the target image features and target audio features are input to the feature enhancement layer of the first sub-module. The feature enhancement layer comprises a uni-modal encoder for enhancing the global-context features of each modality; it is formed by stacking several encoding layers, each composed of multi-head self-attention and a feed-forward network. In each attention head, for an input $X$ of the image-feature or audio-feature modality, the self-attention is computed as:
$$X' = \mathrm{softmax}\!\left(\frac{(XW_{Q})(XW_{K})^{\top}}{\sqrt{d}}\right)(XW_{V})\,W_{O}\tag{5}$$
In formula (5), $X$ represents the input feature and $X'$ the output feature; $W_{Q}$, $W_{K}$, $W_{V}$ and $W_{O}$ represent the query, key, value and output-matrix linear transformation weights, respectively.
In step 10212, the present invention employs multi-modal learning to achieve holistic feature capture, so after step 10211 a cross-modal encoder captures cross-modal global dependencies. Optionally, the cross-modal encoder is a bottleneck attention layer (Attention Bottlenecks), which can be divided into two stages: feature compression and feature expansion. In the invention there are only two modalities, image features and audio features, and the feature compression process can be expressed as:
$$B' = \mathrm{MHA}\!\left(B,\,[X_{v};X_{a}]\right)\tag{6}$$
In formula (6), $B$ and $B'$ denote the input and output of the bottleneck attention layer, and $X_{v}$ and $X_{a}$ are the image and audio features; the purpose of feature compression is to refine and compress the multi-modal information into the bottleneck attention layer.
After compressing the multi-modal information, the compressed bottleneck feature is further expanded: the invention uses a multi-head attention to transmit the compressed feature back to the image-feature and audio-feature modalities, and the feature expansion process can be expressed as:
$$X_{v}' = \mathrm{MHA}(X_{v}, B')\tag{7}$$
$$X_{a}' = \mathrm{MHA}(X_{a}, B')\tag{8}$$
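The compression-then-expansion flow can be illustrated with a toy single-head attention without learned projections; a real bottleneck attention layer uses full multi-head attention with projection matrices, so this is a structural sketch under that simplifying assumption:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries, context, dim):
    """Single-head scaled dot-product attention; the learned projection
    matrices of a real multi-head layer are omitted for brevity."""
    weights = softmax(queries @ context.T / np.sqrt(dim))
    return weights @ context

def bottleneck_fusion(x_img, x_aud, bottleneck, dim):
    # Compression: bottleneck tokens attend over both modalities (cf. Eq. 6)
    b = attend(bottleneck, np.concatenate([x_img, x_aud], axis=0), dim)
    # Expansion: each modality attends back to the compressed tokens (cf. Eqs. 7-8)
    return attend(x_img, b, dim), attend(x_aud, b, dim)
```

Because all cross-modal exchange must pass through the few bottleneck tokens, the layer forces the model to distill shared video-audio information before redistributing it.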
optionally, the inputting the target image feature and the target audio feature to the second sub-module in the multi-mode fish school behavior recognition model, and obtaining the second video fusion feature output by the second sub-module includes:
inputting the target image features and the target audio features to a multi-level basic component modularized common attention layer in the second sub-module, and obtaining a second video fusion feature output by the multi-level basic component modularized common attention layer;
the multi-level basic component modularization common attention layer is formed by connecting basic component modularization common attention layers of each level in series;
the basic component modularized common attention layer of each hierarchy is formed by a self-attention unit and a guided attention unit.
Those skilled in the art understand that the basic component modularized common attention layer (MCA) is composed of two attention units: a Self-Attention unit (SA) for intra-modal interaction, and a Guided-Attention unit (GA) for inter-modal interaction. The design principle of both the self-attention unit and the guided-attention unit comes from scaled dot-product attention.
Further, let the input of the scaled dot-product attention be a query $Q$, a key $K$ and a value $V$, all of the same dimension $d$. The dot product of $Q$ and $K$ is computed, divided by $\sqrt{d}$, and passed through a softmax function to obtain the attention weights; the feature $F$ obtained as the weighted sum of $V$ can be expressed as:
$$F = \mathrm{Att}(Q,K,V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right)V\tag{9}$$
In order to further improve the representational capacity of the attended features, the invention introduces multi-head attention, which consists of $h$ parallel heads, each corresponding to an independent scaled dot-product attention function. The output feature $F$ is expressed as:
$$F = \mathrm{MHA}(Q,K,V) = [\mathrm{head}_{1};\dots;\mathrm{head}_{h}]\,W_{O},\qquad \mathrm{head}_{i} = \mathrm{Att}\!\left(QW_{i}^{Q},\,KW_{i}^{K},\,VW_{i}^{V}\right)\tag{10}$$
In formula (10), $W_{i}^{Q}, W_{i}^{K}, W_{i}^{V} \in \mathbb{R}^{d\times d_{h}}$ are the projection matrices of the $i$-th head and $d_{h}$ is the output dimension of each head; to prevent the multi-head attention model from becoming too large, $d_{h} = d/h$ is usually set.
Fig. 8 is a schematic structural diagram of the multi-level basic component modularized common attention layer provided by the present invention, which builds two attention units, a self-attention unit (SA) and a guided-attention unit (GA), on the basis of multi-head attention. The self-attention unit consists of a multi-head attention layer and a feed-forward layer, with residual connection and layer normalization applied to the output of both layers; the guided-attention unit models the inter-modal relation between the inputs X and Y. On this basis, the self-attention unit and the guided-attention unit are combined in a modular fashion to obtain the basic component modularized common attention layer (MCA), and finally several MCA layers are connected in series to form the multi-level basic component modularized common attention layer (DMCA).
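The SA/GA units and their serial stacking can be sketched as follows. The feed-forward layers, residual connections and layer normalization from Fig. 8 are omitted, and single-head attention without learned projections is assumed, so this shows only the connectivity pattern rather than the full block:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def att(q, k, v, dim):
    """Scaled dot-product attention (Eq. 9), single head, no projections."""
    return softmax(q @ k.T / np.sqrt(dim)) @ v

def mca_layer(x, y, dim):
    """One MCA block: a self-attention (SA) unit for intra-modal interaction,
    then a guided-attention (GA) unit in which y guides attention over x."""
    x = att(x, x, x, dim)   # SA unit
    x = att(x, y, y, dim)   # GA unit
    return x

def dmca(x, y, dim, depth=3):
    """DMCA: several MCA layers connected in series."""
    for _ in range(depth):
        x = mca_layer(x, y, dim)
    return x
```

Here x would carry one modality (e.g. image features) while y supplies the guiding modality (e.g. audio features); a symmetric stack with the roles swapped yields the other direction of cross-modal interaction.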
Fig. 5 is a schematic flow chart of obtaining a target fusion feature according to the present invention, wherein the feature fusion is performed on the first audio-video fusion feature and the second audio-video fusion feature according to a preset weight, so as to obtain the target fusion feature, which includes:
the preset weight comprises the first weight parameter and the second weight parameter;
the first weight parameter and the second weight parameter are determined according to the influence degree of the first sub-module and the second sub-module on the fish swarm behavior recognition result.
In step 10221, a first weight feature is determined according to the product of the first weight parameter and the first video fusion feature.
In step 10222, a second weighting characteristic is determined according to the product of the second weighting parameter and the second video fusion characteristic.
In step 10223, the target fusion feature is determined from the sum of the first weight feature and the second weight feature, referring to the following formula:
$$q = w_{1}f_{1} + w_{2}f_{2}\tag{11}$$
In formula (11), $q$ is the target fusion feature, $w_{1}$ the first weight parameter, $w_{2}$ the second weight parameter, $f_{1}$ the first video-audio fusion feature and $f_{2}$ the second video-audio fusion feature.
The invention automatically adjusts the fusion proportion through adaptive weights. The sum of the first weight parameter and the second weight parameter is 1, and both parameters change automatically with model training and optimizer adjustment, so that the proportion of data with great influence on the result is increased and the proportion of data with little influence is decreased. The output features of the multi-level basic component modularized common attention layer and of the cross-modal encoder are thus feature-fused to obtain the target fusion feature.
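The constrained weighted sum of formula (11) can be sketched as follows. Deriving the weight pair from a sigmoid of one trainable logit keeps $w_{1} + w_{2} = 1$ satisfied throughout optimization; this particular parameterization is an illustrative assumption, not prescribed by the text:

```python
import numpy as np

def adaptive_fuse(f1, f2, logit):
    """Adaptive fusion q = w1*f1 + w2*f2 with w1 + w2 = 1 (Eq. 11).
    w1 is the sigmoid of a single trainable logit, so the optimizer can
    freely shift weight between the two video-audio fusion features."""
    w1 = 1.0 / (1.0 + np.exp(-logit))
    return w1 * f1 + (1.0 - w1) * f2
```

A logit of 0 gives an even 0.5/0.5 split; driving the logit up or down lets training emphasize whichever sub-module's features influence the recognition result more.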
FIG. 6 is a second schematic flow chart of the fish school behavior recognition method according to the present invention. Optionally, the multi-modal fish school behavior recognition model is determined by training on the sample image features, sample audio features and sample water quality features of each sample video together with the sample fish school behavior of each sample video. The invention sets the waterproof camera 15 cm below the water surface of the farm and places it at the side of the glass tank in order to obtain a larger field of view of fish school movement; the original fish school feeding videos shot by the waterproof camera are processed to obtain the image and audio data. An electrochemical water quality probe provided by a fully automatic Internet-of-Things recirculating aquaculture system is used to collect the changes of the water quality data over the time period corresponding to each feeding video, determining three kinds of water quality data: pH value, dissolved oxygen value and temperature. The obtained image, audio and water quality data are preprocessed, and data features are extracted from them as the sample image features, sample audio features and sample water quality features of each sample video. The sample fish school behavior corresponding to each sample video is labeled, and the preprocessed data are divided into a training set, a validation set and a test set: 60% of the total data is randomly taken as the training set, 20% as the validation set, and the remaining 20% as the test set. Combining the complex feeding behaviors of fish schools in a recirculating culture pond, a multi-information-fusion fish school feeding behavior recognition algorithm is proposed, and the loss function is optimized with this algorithm.
Initial network parameters are set, the training set and validation set are taken as algorithm input, and the model is trained to generate a trained algorithm model; finally, the trained model is used to recognize and detect the fish school feeding behavior.
Fig. 7 is a third schematic flow chart of the fish school behavior recognition method provided by the invention. Fig. 7 shows that after feature extraction by the video feature extraction module, the audio feature extraction module and the water quality feature extraction module, the image features, audio features and text-represented water quality features are obtained. The image features and audio features are processed by the uni-modal encoder, and the first video-audio fusion feature is output through the cross-modal encoder; the second video-audio fusion feature is output by the multi-level basic component modularized common attention layer. A first weight feature is determined from the product of the first weight parameter $w_{1}$ and the first video-audio fusion feature; a second weight feature is determined from the product of the second weight parameter $w_{2}$ and the second video-audio fusion feature; and the target fusion feature is determined from the sum of the first weight feature and the second weight feature. The target fusion feature and the text-represented water quality feature then pass through the query generator and the query decoder, finally obtaining the target fish school behavior corresponding to the target video output by the multi-modal fish school behavior recognition model.
Fig. 9 is a schematic structural diagram of a fish school behavior recognition system provided by the present invention, and the present invention discloses a fish school behavior recognition system, including:
The video acquisition equipment is used for acquiring an original fish school video;
the water quality acquisition equipment is used for acquiring water quality data;
the illumination transmitter is used for acquiring illumination intensity;
the light source is used for supplementing light for the video acquisition equipment;
the fish school behavior recognition device is also included.
The invention also provides a fish school behavior recognition system, which is a device for recognizing fish school feeding behavior, comprising: a video acquisition device, a water quality acquisition device, a light source, an illuminance transmitter, a memory, a processor, a fish school behavior recognition device, and a fish school feeding behavior recognition program stored on the memory and runnable on the processor; the fish school behavior recognition device is connected to the waterproof camera, the water quality probe, the light source and the illuminance transmitter, respectively.
The video acquisition device can acquire the fish school feeding video in real time under the control of the fish school behavior recognition device and extract the feeding video into a video stream and an audio stream. The water quality acquisition device is a water quality probe, connected to the computer through a communication interface to transmit water quality data to the computer. The light source supplements light for the waterproof camera; the illuminance transmitter senses the ambient light intensity and transmits the light intensity information to the fish school behavior recognition device, which controls the light source switch and the illumination intensity according to that information. The computer sends the video, audio and water quality data to the fish school behavior recognition device, which can perform fish school feeding behavior recognition according to the trained model.
Fig. 10 is a schematic diagram of the connection structure of the illuminance transmitter provided by the invention. The illuminance transmitter comprises an illuminance sensor, a microcontroller and a communication interface; the microcontroller is connected to the illuminance sensor and the communication interface respectively, controls the illuminance sensor to collect data, and transmits the data collected by the illuminance sensor through the communication interface to the processor, which here is the fish school behavior recognition device.
Fig. 11 is a schematic structural diagram of a fish school behavior recognition device according to the present invention. The device includes a first obtaining unit 1 for obtaining the target image feature, the target audio feature and the target water quality feature of the target video; the working principle of the first obtaining unit 1 may refer to the foregoing step 101 and is not repeated herein.
The fish school behavior recognition device further comprises a second acquisition unit 2: the second obtaining unit 2 is configured to obtain the target shoal behavior corresponding to the target video, and the working principle of the second obtaining unit 2 may refer to the foregoing step 102, which is not repeated herein.
The multi-mode fish school behavior recognition model is determined by training with the sample fish school behavior of each sample video according to the sample image characteristics, the sample audio characteristics and the sample water quality characteristics of each sample video.
Fig. 12 is a schematic structural diagram of an electronic device provided by the present invention. As shown in fig. 12, the electronic device may include: processor 110, communication interface (Communications Interface) 120, memory 130, and communication bus 140, wherein processor 110, communication interface 120, memory 130 communicate with each other via communication bus 140. The processor 110 may invoke logic instructions in the memory 130 to perform a fish school behavior recognition method comprising: acquiring target image characteristics, target audio characteristics and target water quality characteristics of a target video; inputting the target image characteristics, the target audio characteristics and the target water quality characteristics into a multi-modal fish swarm behavior recognition model, and obtaining target fish swarm behaviors corresponding to the target video, wherein the target fish swarm behaviors are output by the multi-modal fish swarm behavior recognition model; the multi-mode fish school behavior recognition model is determined by training with the sample fish school behavior of each sample video according to the sample image characteristics, the sample audio characteristics and the sample water quality characteristics of each sample video.
In addition, the logic instructions in the memory 130 may be implemented in the form of software functional units and stored in a computer readable storage medium when sold or used as a stand alone product. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program, the computer program being storable on a non-transitory computer readable storage medium, the computer program, when executed by a processor, being capable of performing a method of fish school behavior recognition provided by the methods described above, the method comprising: acquiring target image characteristics, target audio characteristics and target water quality characteristics of a target video; inputting the target image characteristics, the target audio characteristics and the target water quality characteristics into a multi-modal fish swarm behavior recognition model, and obtaining target fish swarm behaviors corresponding to the target video, wherein the target fish swarm behaviors are output by the multi-modal fish swarm behavior recognition model; the multi-mode fish school behavior recognition model is determined by training with the sample fish school behavior of each sample video according to the sample image characteristics, the sample audio characteristics and the sample water quality characteristics of each sample video.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which when executed by a processor is implemented to perform the method of fish school behavior identification provided by the above methods, the method comprising: acquiring target image characteristics, target audio characteristics and target water quality characteristics of a target video; inputting the target image characteristics, the target audio characteristics and the target water quality characteristics into a multi-modal fish swarm behavior recognition model, and obtaining target fish swarm behaviors corresponding to the target video, wherein the target fish swarm behaviors are output by the multi-modal fish swarm behavior recognition model; the multi-mode fish school behavior recognition model is determined by training with the sample fish school behavior of each sample video according to the sample image characteristics, the sample audio characteristics and the sample water quality characteristics of each sample video.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, being located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A method for identifying fish school behaviors, comprising:
acquiring target image characteristics, target audio characteristics and target water quality characteristics of a target video;
inputting the target image characteristics, the target audio characteristics and the target water quality characteristics into a multi-modal fish swarm behavior recognition model, and obtaining target fish swarm behaviors corresponding to the target video, wherein the target fish swarm behaviors are output by the multi-modal fish swarm behavior recognition model;
the multi-mode fish school behavior recognition model is determined by training with the sample fish school behavior of each sample video according to the sample image characteristics, the sample audio characteristics and the sample water quality characteristics of each sample video.
2. The fish school behavior recognition method according to claim 1, wherein said acquiring the target image feature, the target audio feature, and the target water quality feature of the target video comprises:
cutting the original fish school video according to a preset duration to obtain all target videos;
for each target video, respectively extracting image features of the target video based on a double-flow model and a video encoder to obtain a first image feature and a second image feature, and splicing the first image feature and the second image feature to obtain target image features of the target video;
performing audio feature extraction on the target video based on a pre-training audio neural network model to obtain target audio features of the target video;
text feature extraction is carried out on water quality data corresponding to a target video based on a text encoder, and target water quality features of the target video are obtained;
the water quality data includes a pH value, a dissolved oxygen value, and a temperature.
3. The method for identifying fish-shoal behaviors according to claim 1, wherein the inputting the target image feature, the target audio feature, and the target water quality feature into the multi-modal fish-shoal behavior identification model, obtaining the target fish-shoal behaviors corresponding to the target video output by the multi-modal fish-shoal behavior identification model, includes:
Inputting the target image features and the target audio features to a first submodule in the multi-mode fish school behavior recognition model, and obtaining a first video fusion feature output by the first submodule; inputting the target image features and the target audio features to a second submodule in the multi-mode fish school behavior recognition model, and obtaining second video fusion features output by the second submodule;
performing feature fusion on the first video fusion feature and the second video fusion feature according to preset weights to obtain target fusion features;
inputting the query embedded feature and the target fusion feature to a query decoder of the first submodule, and acquiring target shoal behaviors corresponding to target videos output by the query decoder;
the query embedding feature is generated by embedding a target water quality feature into the target fusion feature.
4. The fish school behavior recognition method according to claim 3, wherein inputting the target image feature and the target audio feature into the first sub-module of the multi-modal fish school behavior recognition model and obtaining the first audio-visual fusion feature output by the first sub-module comprises:
inputting the target image feature and the target audio feature into a feature enhancement layer of the first sub-module, and obtaining the image enhancement feature and the audio enhancement feature output by the feature enhancement layer;
and inputting the image enhancement feature and the audio enhancement feature into a bottleneck attention layer of the first sub-module, and obtaining the first audio-visual fusion feature output by the bottleneck attention layer.
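A bottleneck attention layer, in the commonly used sense, exchanges information between modalities only through a small set of shared bottleneck tokens. The sketch below is a simplified single-head, unparameterized version of that idea (the patent does not disclose its exact layer, so all shapes and the two-pass update order are assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attend(tokens):
    """Unparameterized single-head self-attention over a token stack."""
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[-1])
    return softmax(scores) @ tokens

def bottleneck_fusion(img_tokens, aud_tokens, n_bottleneck=4):
    """Fuse two modalities through a small set of shared bottleneck tokens."""
    d = img_tokens.shape[-1]
    bottleneck = np.zeros((n_bottleneck, d))
    # pass 1: the image stream writes into the bottleneck tokens
    out = self_attend(np.vstack([img_tokens, bottleneck]))
    img_out, bottleneck = out[:len(img_tokens)], out[len(img_tokens):]
    # pass 2: the bottleneck carries image information into the audio stream
    out = self_attend(np.vstack([aud_tokens, bottleneck]))
    aud_out, bottleneck = out[:len(aud_tokens)], out[len(aud_tokens):]
    return np.vstack([img_out, aud_out]), bottleneck

rng = np.random.default_rng(1)
img = rng.standard_normal((6, 8))   # enhanced image tokens
aud = rng.standard_normal((5, 8))   # enhanced audio tokens
fused, bn = bottleneck_fusion(img, aud, n_bottleneck=2)
```

The bottleneck keeps the cross-modal exchange cheap: the two modalities never attend to each other directly, only to the few shared tokens.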
5. The fish school behavior recognition method according to claim 3, wherein inputting the target image feature and the target audio feature into the second sub-module of the multi-modal fish school behavior recognition model and obtaining the second audio-visual fusion feature output by the second sub-module comprises:
inputting the target image feature and the target audio feature into a multi-level modular co-attention layer in the second sub-module, and obtaining the second audio-visual fusion feature output by the multi-level modular co-attention layer;
wherein the multi-level modular co-attention layer is formed by connecting the modular co-attention layer of each level in series;
and the modular co-attention layer of each level consists of a self-attention unit and a guided-attention unit.
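Modular co-attention layers of this kind typically apply self-attention within each modality and then guided attention across modalities, with the levels stacked in series. The sketch below is a bare-bones, unparameterized illustration of that structure (depth, the residual connections, and the guidance direction are assumptions; the patent does not disclose these details):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attend(x):
    """Self-attention unit: tokens attend within their own modality."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    return softmax(scores) @ x

def guided_attend(x, y):
    """Guided-attention unit: x queries attend over guiding features y."""
    scores = x @ y.T / np.sqrt(x.shape[-1])
    return softmax(scores) @ y

def mca_level(img, aud):
    """One modular co-attention level: self-attention, then cross-modal guidance."""
    img, aud = self_attend(img), self_attend(aud)
    img = img + guided_attend(img, aud)   # audio guides image
    aud = aud + guided_attend(aud, img)   # image guides audio
    return img, aud

def stacked_mca(img, aud, depth=3):
    """Levels connected in series; output plays the role of the second fusion feature."""
    for _ in range(depth):
        img, aud = mca_level(img, aud)
    return np.vstack([img, aud])

rng = np.random.default_rng(2)
out = stacked_mca(rng.standard_normal((4, 8)), rng.standard_normal((3, 8)), depth=2)
```

Stacking the levels in series lets each modality refine itself before being re-guided by the other, which is the structural point the claim is making.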
6. The fish school behavior recognition method according to claim 3, wherein performing feature fusion on the first audio-visual fusion feature and the second audio-visual fusion feature according to the preset weights to obtain the target fusion feature comprises:
determining a first weighted feature according to a first weight parameter and the first audio-visual fusion feature;
determining a second weighted feature according to a second weight parameter and the second audio-visual fusion feature;
determining the target fusion feature according to the first weighted feature and the second weighted feature;
wherein the preset weights comprise the first weight parameter and the second weight parameter;
and the first weight parameter and the second weight parameter are determined according to the degree of influence of the first sub-module and the second sub-module on the fish school behavior recognition result.
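Claim 6 reduces to simple arithmetic: scale each sub-module's fusion feature by its preset weight and combine. The weight values below are hypothetical placeholders; the patent only says they reflect each sub-module's influence on the recognition result.

```python
import numpy as np

def weighted_fusion(first_feat, second_feat, w1=0.6, w2=0.4):
    """Target fusion feature = w1 * first + w2 * second.
    w1 and w2 (the preset weights) are hypothetical example values."""
    return w1 * first_feat + w2 * second_feat

f1 = np.ones((3, 4))            # first audio-visual fusion feature
f2 = np.full((3, 4), 2.0)       # second audio-visual fusion feature
target = weighted_fusion(f1, f2)  # every entry: 0.6 * 1.0 + 0.4 * 2.0 = 1.4
```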
7. A fish school behavior recognition device, comprising:
a first acquisition unit, configured to acquire the target image feature, the target audio feature, and the target water quality feature of a target video;
a second acquisition unit, configured to input the target image feature, the target audio feature, and the target water quality feature into a multi-modal fish school behavior recognition model, and obtain the target fish school behavior corresponding to the target video output by the multi-modal fish school behavior recognition model;
wherein the multi-modal fish school behavior recognition model is determined by training on the sample image features, sample audio features, and sample water quality features of each sample video, labeled with the sample fish school behavior of each sample video.
8. A fish school behavior recognition system, comprising:
a video acquisition device for acquiring original fish school video;
a water quality acquisition device for acquiring water quality data;
an illumination sensor for acquiring illumination intensity;
a light source for supplementing light for the video acquisition device;
and the fish school behavior recognition device according to claim 7.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the fish school behavior recognition method of any one of claims 1-6.
10. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the fish school behavior recognition method of any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310561907.4A CN116311001B (en) | 2023-05-18 | 2023-05-18 | Method, device, system, equipment and medium for identifying fish swarm behavior |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310561907.4A CN116311001B (en) | 2023-05-18 | 2023-05-18 | Method, device, system, equipment and medium for identifying fish swarm behavior |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116311001A true CN116311001A (en) | 2023-06-23 |
CN116311001B CN116311001B (en) | 2023-09-12 |
Family
ID=86781913
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310561907.4A Active CN116311001B (en) | 2023-05-18 | 2023-05-18 | Method, device, system, equipment and medium for identifying fish swarm behavior |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116311001B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116843085A (en) * | 2023-08-29 | 2023-10-03 | 深圳市明心数智科技有限公司 | Freshwater fish growth monitoring method, device, equipment and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112634202A (en) * | 2020-12-04 | 2021-04-09 | 浙江省农业科学院 | Method, device and system for detecting behavior of polyculture fish shoal based on YOLOv3-Lite |
CN112883861A (en) * | 2021-02-07 | 2021-06-01 | 同济大学 | Feedback type bait casting control method based on fine-grained classification of fish school feeding state |
CN113537106A (en) * | 2021-07-23 | 2021-10-22 | 仲恺农业工程学院 | Fish feeding behavior identification method based on YOLOv5 |
US20210368748A1 (en) * | 2020-05-28 | 2021-12-02 | X Development Llc | Analysis and sorting in aquaculture |
CN115861906A (en) * | 2023-03-01 | 2023-03-28 | 北京市农林科学院信息技术研究中心 | Fish school feeding intensity identification method, device and system and feeding machine |
CN116052064A (en) * | 2023-04-03 | 2023-05-02 | 北京市农林科学院智能装备技术研究中心 | Method and device for identifying feeding strength of fish shoal, electronic equipment and bait casting machine |
- 2023-05-18: application CN202310561907.4A granted as patent CN116311001B (status: Active)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210368748A1 (en) * | 2020-05-28 | 2021-12-02 | X Development Llc | Analysis and sorting in aquaculture |
CN112634202A (en) * | 2020-12-04 | 2021-04-09 | 浙江省农业科学院 | Method, device and system for detecting behavior of polyculture fish shoal based on YOLOv3-Lite |
CN112883861A (en) * | 2021-02-07 | 2021-06-01 | 同济大学 | Feedback type bait casting control method based on fine-grained classification of fish school feeding state |
CN113537106A (en) * | 2021-07-23 | 2021-10-22 | 仲恺农业工程学院 | Fish feeding behavior identification method based on YOLOv5 |
CN115861906A (en) * | 2023-03-01 | 2023-03-28 | 北京市农林科学院信息技术研究中心 | Fish school feeding intensity identification method, device and system and feeding machine |
CN116052064A (en) * | 2023-04-03 | 2023-05-02 | 北京市农林科学院智能装备技术研究中心 | Method and device for identifying feeding strength of fish shoal, electronic equipment and bait casting machine |
Non-Patent Citations (7)
Title |
---|
GUO QIANG: "Fish feeding behavior detection method based on shape and texture features", Journal of Shanghai Ocean University, 27(2), pages 181 - 189 *
YANG, XINTING: "Deep learning for smart fish farming: applications, opportunities and challenges", Reviews in Aquaculture, pages 66 - 90 *
YUHAO ZENG: "Fish school feeding behavior quantification using acoustic signal and improved Swin Transformer", Computers and Electronics in Agriculture, Volume 204, January 2023 *
HE Xiao; Tuerhongjiang Abudukelimu; HE Huan: "Underwater fish school image enhancement algorithm based on wavelet transform", Computer Technology and Development, No. 09, pages 234 - 237 *
JIANG Wei: "Research on a precise feeding method based on fish behavior in recirculating aquaculture", China Master's Theses Full-text Database (Agricultural Science and Technology), pages 052 - 47 *
CHEN Caiwen; DU Yonggui; ZHOU Chao; SUN Chuanheng: "Fish school feeding behavior recognition technology based on support vector machines", Jiangsu Agricultural Sciences, No. 07, pages 83 - 87 *
CHEN Ming; ZHANG Chongyang; FENG Guofu; CHEN Xi; CHEN Guanqi; WANG Dan: "Evaluation method for fish feeding activity intensity based on feature-weighted fusion", Transactions of the Chinese Society for Agricultural Machinery, No. 02, pages 245 - 253 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116843085A (en) * | 2023-08-29 | 2023-10-03 | 深圳市明心数智科技有限公司 | Freshwater fish growth monitoring method, device, equipment and storage medium |
CN116843085B (en) * | 2023-08-29 | 2023-12-01 | 深圳市明心数智科技有限公司 | Freshwater fish growth monitoring method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN116311001B (en) | 2023-09-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhou et al. | Evaluation of fish feeding intensity in aquaculture using a convolutional neural network and machine vision | |
CN116311001B (en) | Method, device, system, equipment and medium for identifying fish swarm behavior | |
CN115861906B (en) | Method, device and system for identifying feeding strength of fish shoal and bait casting machine | |
CN113592896B (en) | Fish feeding method, system, equipment and storage medium based on image processing | |
JP7006776B2 (en) | Analytical instruments, analytical methods, programs and aquatic organism monitoring systems | |
Zeng et al. | Fish school feeding behavior quantification using acoustic signal and improved Swin Transformer | |
CN113349111A (en) | Dynamic feeding method, system and storage medium for aquaculture | |
CN115115830A (en) | Improved Transformer-based livestock image instance segmentation method | |
Li et al. | Cow individual identification based on convolutional neural network | |
CN115546622A (en) | Fish shoal detection method and system, electronic device and storage medium | |
CN116052064B (en) | Method and device for identifying feeding strength of fish shoal, electronic equipment and bait casting machine | |
CN115578678A (en) | Fish feeding intensity classification method and system | |
Zhou et al. | Deep images enhancement for turbid underwater images based on unsupervised learning | |
Du et al. | Feeding intensity assessment of aquaculture fish using Mel Spectrogram and deep learning algorithms | |
Zhang et al. | A high-precision facial recognition method for small-tailed Han sheep based on an optimised Vision Transformer | |
CN116630080B (en) | Method and system for determining capacity of aquatic product intensive culture feed based on image recognition | |
McLeay et al. | Deep convolutional neural networks with transfer learning for waterline detection in mussel farms | |
CN116895012A (en) | Underwater image abnormal target identification method, system and equipment | |
Jovanović et al. | Splash detection in fish Plants surveillance videos using deep learning | |
CN116206195A (en) | Offshore culture object detection method, system, storage medium and computer equipment | |
CN116798066A (en) | Sheep individual identity recognition method and system based on deep measurement learning | |
CN112749687B (en) | Picture quality and silence living body detection multitasking training method and device | |
CN115170942A (en) | Fish behavior identification method with multilevel fusion of sound and vision | |
Yang et al. | Fish feeding behavior recognition using adaptive dmca-umt algorithm | |
Zhua | A Deep Learning-Based Embedding Framework for Object Detection and Recognition in Underwater Marine Organisms |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||