CN115359511A - Pig abnormal behavior detection method - Google Patents

Pig abnormal behavior detection method

Info

Publication number
CN115359511A
Authority
CN
China
Prior art keywords
image
abnormal
representing
pigs
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210934696.XA
Other languages
Chinese (zh)
Inventor
杨秋妹
陈淼彬
肖德琴
刘啸虎
康俊琪
黄一桂
周家鑫
刘克坚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Agricultural University
Original Assignee
South China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Agricultural University
Priority to CN202210934696.XA
Publication of CN115359511A
Legal status: Pending

Classifications

    • G06V40/10 — Recognition of human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06N3/02, G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
    • G06V10/762 — Image or video recognition or understanding using clustering, e.g. of similar faces in social networks
    • G06V10/764 — Image or video recognition or understanding using classification, e.g. of video objects
    • G06V10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/806 — Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level, of extracted features
    • G06V10/82 — Image or video recognition or understanding using neural networks
    • G06V20/40 — Scenes; scene-specific elements in video content
    • G06V2201/07 — Indexing scheme relating to image or video recognition or understanding; target detection


Abstract

The invention provides a method for detecting abnormal behaviors of pigs, which comprises the following steps: S1: extracting images frame by frame from live videos of pigs acquired in real time; S2: carrying out target detection and cutting on each extracted frame image by adopting an improved Yolov5n model to obtain a target screenshot of each pig in each frame image; S3: extracting feature vectors through a double-flow convolution automatic encoder; S4: clustering and classifying the feature vectors by adopting K-means and a classification algorithm; S5: obtaining classification scores of all targets of the current frame through a classifier, and combining all the classification scores to form an abnormal prediction image; S6: performing Gaussian-filtered temporal smoothing on the abnormal prediction image, and recording the obtained highest classification score as the abnormal score of the current frame image; S7: judging whether the abnormal score of the current frame image is a positive number; if yes, there is no abnormal behavior, otherwise there is abnormal behavior. The method solves the problem that conventional anomaly detection methods cannot realize universal detection of abnormal pig behaviors.

Description

Method for detecting abnormal behaviors of pigs
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a pig abnormal behavior detection method.
Background
In animal husbandry, particularly in pig farms with enclosed breeding environments, infectious diseases spreading between animals severely damage animal welfare, easily lead to fatal infections, and cause significant economic losses to farmers. Realizing welfare breeding of live pigs requires not only providing a good living environment for the herd, but also continuously monitoring animal behavior so that abnormalities can be discovered as early as possible for timely diagnosis and treatment, thereby maximizing benefit.
The behavior of pigs reflects the welfare condition and social interaction of the animals, and is an important basis for analyzing pig health and managing healthy breeding. Close interactions between pigs may have a negative impact on pig health and reduce animal welfare. For example, mounting can occur in both boars and sows, especially during estrus, and typically consists of a pig placing both forepaws on the body or head of another pig that is lying down or rapidly dodging; this causes bruising, limping and leg fractures, which result in severe economic losses to the animal industry. Therefore, by monitoring the different abnormal behaviors of live pigs in time, the abnormal condition of the pigs can be evaluated, so that diseases are prevented or kept from spreading, and the welfare level of pig breeding is improved.
In recent years, owing to the remarkable performance of deep learning in the field of anomaly detection, a great deal of research has been conducted using neural-network-based methods. However, implementing pig behavior monitoring in a closed pig farm breeding environment presents formidable challenges to computer vision, such as confusion between different pigs due to visual similarity, sudden movements caused by aggressive behavior, frequent occlusion, and pigs crowding against each other. The training effect of abnormal behavior detection under supervised learning is susceptible to the distribution imbalance of video surveillance data sets, and its performance depends to a great extent on the availability and quality of manually annotated training data, which makes it poorly suited to video-based abnormal behavior detection. Therefore, from the perspective of unsupervised learning, many researchers have proposed abnormal behavior detection methods that adapt to the video data without requiring training on labeled data sets, and have achieved good results on various data sets. However, existing unsupervised methods mainly design targeted algorithms to identify specific abnormal behaviors such as attacking, tail biting and climbing; the drawback is that each specially designed algorithm detects only one abnormal behavior, so universal detection of abnormal pig behavior cannot be realized.
Disclosure of Invention
The invention provides a pig abnormal behavior detection method for overcoming the technical defect that the conventional abnormal detection method cannot realize the universal pig abnormal behavior detection.
In order to solve the technical problems, the technical scheme of the invention is as follows:
a method for detecting abnormal behaviors of pigs comprises the following steps:
s1: acquiring live videos of pigs in real time, and extracting images from the live videos of the pigs frame by frame;
s2: carrying out target detection and cutting on each extracted frame image by adopting an improved Yolov5n network model to respectively obtain a target screenshot of each pig in each frame image;
the improved Yolov5n network model is as follows: a channel attention module is added after the 4th, 6th and 8th layers of the backbone feature extraction network of the existing Yolov5n network model and is spliced with the upsampling layers at the 18th, 22nd and 26th layers of the neck network, and a C3 layer and a channel attention module are added after the 11th layer of the backbone feature extraction network;
s3: constructing an end-to-end trainable double-flow convolution automatic encoder network based on an object as a center, extracting appearance characteristic vectors and motion characteristic vectors of all pigs in a target screenshot, and performing characteristic fusion to form characteristic vectors of corresponding frames;
the double-current convolution automatic encoder network only adopts images of normal behaviors of pigs for training;
s4: clustering the fusion characteristic vectors by adopting a K-means clustering algorithm, and inputting the result into a binary classifier for training to obtain a trained classifier;
s5: in each frame of image, obtaining classification scores of all target screenshots in a current frame of image through a classifier, and combining all classification scores to form an abnormal prediction image of the current frame of image;
s6: performing Gaussian filtering time sequence smoothing on the abnormal prediction image of the current frame image, and recording the obtained highest classification score as the abnormal score of the current frame image;
s7: judging whether the abnormal score of the current frame image is a positive number;
if yes, the pigs in the current frame image have no abnormal behaviors;
if not, the pigs in the current frame image have abnormal behaviors.
According to the scheme, the improved Yolov5n network model is adopted to perform target detection and cutting on each frame image to obtain the target screenshot of each pig, so that all pigs can be effectively detected in an actual pig farm environment with complex occlusion; the target screenshots are then classified by a classifier trained only on images of normal pig behavior to obtain classification scores, from which the abnormal score of each frame image is obtained, and abnormal behavior detection of the pigs is finally realized according to the abnormal score. This compensates for the lack of training data of actual abnormal behaviors and allows abnormal pig behaviors to be accurately identified.
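For orientation, the following is a minimal end-to-end sketch of steps S1-S7 in Python. The detector, feature encoder and classifier objects and their method names are illustrative assumptions introduced here for readability; they are not interfaces defined in this disclosure.

```python
# Hedged sketch of the detection pipeline (S1-S7); all object interfaces are assumed.
import cv2
import numpy as np
from scipy.ndimage import gaussian_filter1d

def detect_abnormal_frames(video_path, detector, encoder, classifiers):
    """detector: improved Yolov5n wrapper; encoder: dual-stream autoencoder wrapper;
    classifiers: the k trained binary classifiers (trained as in step S4)."""
    cap = cv2.VideoCapture(video_path)
    frame_scores, prev = [], None
    while True:
        ok, frame = cap.read()                        # S1: extract images frame by frame
        if not ok:
            break
        crops = detector.detect_and_crop(frame)       # S2: one target screenshot per pig
        target_scores = []
        for crop in crops:
            feat = encoder.extract(crop, prev)        # S3: fused appearance + motion features
            scores = [c.decision_function([feat])[0] for c in classifiers]  # S5: k scores
            target_scores.append(max(scores))         # highest score for this target
        # the per-target scores of a frame form its abnormal prediction map (S5)
        frame_scores.append(max(target_scores) if target_scores else 0.0)
        prev = frame
    cap.release()
    smoothed = gaussian_filter1d(np.array(frame_scores), sigma=2)  # S6: temporal smoothing
    return smoothed <= 0                              # S7: non-positive score -> abnormal frame
```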
Preferably, the channel attention module comprises compression, excitation and scaling operations; wherein,
the compression operation is: compressing the dimension H × W × C of the original feature layer to 1 × 1 × C using global average pooling;
the excitation operation is: fusing the feature map information of each feature channel by using two fully connected layers, and then normalizing the weights with a Sigmoid function;
the scaling operation is: mapping the weights output by the excitation operation into a set of per-channel weights, and multiplying them with the features of the original feature map, thereby recalibrating the original features in the channel dimension.
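As an illustration of the compression, excitation and scaling operations just described, a minimal SE-style channel attention block can be sketched in PyTorch as follows; the reduction ratio and layer sizes are assumptions made for the example, not values specified here.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-excitation-scale block in the style described above (sizes assumed)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)        # compression: H x W x C -> 1 x 1 x C
        self.excite = nn.Sequential(                  # excitation: two fully connected layers
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                             # normalise the channel weights
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.squeeze(x).view(b, c)
        w = self.excite(w).view(b, c, 1, 1)
        return x * w                                  # scaling: recalibrate features per channel
```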
Preferably, the improvement of the Yolov5n network model further comprises adding a 64-fold down-sampling detection layer, so that the scale of the output feature map is 20 × 20.
Preferably, in step S5, a target screenshot is selected from the current frame image, the feature vectors of the selected target screenshot are extracted and clustered into k clusters through step S3, then the clustering results are respectively input into k classifiers to obtain k classification scores, the highest classification score is selected as the abnormal score of the selected target screenshot, and the step is repeated until the abnormal classification scores of all target screenshots in the current frame image are obtained.
Preferably, the classifier is a binary classifier, and the ith binary classifier is defined as follows:
$$g_i(x)=\sum_{j=1}^{m} w_j\,x_j + b$$
wherein $w_j$ represents the weight vector, b represents the bias value, x represents a sample input to the binary classifier, which can be classified as a normal sample or an abnormal sample, $x_j$ represents the j-th element of the sample, and m represents the dimension of x.
Preferably, the k binary classifiers are trained by:
a1: selecting images of normal pig behaviors from the live videos of the pigs as training images;
a2: carrying out target detection and cutting on the training image by adopting an improved Yolov5n network model to respectively obtain target screenshots of all pigs in the training image;
a3: converting the target screenshot into a gray image, and subtracting a pixel value of an adjacent frame image of the training image to obtain a corresponding frame difference image;
a4: respectively taking the gray frame image and the gray frame difference image obtained in the step A3 as the input of the appearance sub-network and the action sub-network in the object-centric convolution automatic encoder network for abnormal pig behaviors, and extracting the appearance characteristic vector and the action characteristic vector of each pig in the target screenshot through the network;
the auto-encoder network comprises an appearance sub-network for extracting appearance feature vectors from the target screenshots and an action sub-network for extracting action feature vectors from the frame difference images;
a5: fusing the appearance characteristic vector and the action characteristic vector to obtain a fused characteristic vector of the training image;
a6: performing k-means clustering on the fusion characteristic vectors to obtain clustering results cluster_i, i = 1, 2, …, k;
a7: inputting the clustering results into k binary classifiers to obtain k trained binary classifiers (a training sketch follows below).
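A hedged sketch of steps A3-A7 is given below. The use of OpenCV, scikit-learn's KMeans and a linear SVM as the binary classifier, as well as the encoder interfaces, are illustrative assumptions; the text above only requires grayscale and frame-difference inputs, a dual-stream encoder, k-means clustering and k binary classifiers.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def frame_difference(crop_t, crop_prev):
    """A3: grayscale target screenshot and its frame-difference image (absdiff assumed)."""
    gray_t = cv2.cvtColor(crop_t, cv2.COLOR_BGR2GRAY)
    gray_prev = cv2.cvtColor(crop_prev, cv2.COLOR_BGR2GRAY)
    return gray_t, cv2.absdiff(gray_t, gray_prev)

def train_classifiers(normal_crop_pairs, appearance_net, motion_net, k=10):
    """normal_crop_pairs: (current, previous) crops of normal behaviour only;
    *_net.encode(...) is assumed to return a 1-D feature vector."""
    feats = []
    for crop_t, crop_prev in normal_crop_pairs:
        gray, diff = frame_difference(crop_t, crop_prev)
        f_app = appearance_net.encode(gray)             # A4: appearance feature vector
        f_mot = motion_net.encode(diff)                 # A4: action (motion) feature vector
        feats.append(np.concatenate([f_app, f_mot]))    # A5: feature fusion
    feats = np.stack(feats)
    labels = KMeans(n_clusters=k).fit_predict(feats)    # A6: k clusters of normal behaviour
    classifiers = []
    for i in range(k):                                  # A7: one binary classifier per cluster;
        y = np.where(labels == i, 1, -1)                # other clusters act as pseudo-abnormal samples
        classifiers.append(LinearSVC().fit(feats, y))
    return classifiers
```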
Preferably, the appearance sub-network and the action sub-network both comprise an attention module and a memory module; wherein,
the calculation formula of the attention module is as follows:
$$c_t=\sum_{t'=1}^{T}\alpha_{t,t'}\,h_{t'}$$
$$\alpha_{t,t'}=\frac{\exp(u_{t,t'})}{\sum_{k=1}^{T}\exp(u_{t,k})}$$
$$u_{t,t'}=a(s_{t-1},h_{t'})$$
wherein $c_t$ represents the context vector at time t, T represents the total time length, $\alpha_{t,t'}$ represents the attention weight of the t'-th neighborhood at time t, $h_{t'}$ represents the hidden unit output at time t', $u_{t,t'}$ represents the output score of the t'-th neighborhood at time t, $u_{t,k}$ represents the output score of the k-th neighborhood at time t, $s_{t-1}$ represents the hidden state at time t-1, and a(·,·) is the scoring function;
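The formulas above describe a standard additive (score-then-softmax) attention over the hidden outputs of the recurrent layers; a compact PyTorch sketch with illustrative dimensions could look as follows.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Computes the context vector c_t from hidden outputs h_{t'} and state s_{t-1}."""
    def __init__(self, hidden_dim: int, state_dim: int, attn_dim: int = 128):
        super().__init__()
        self.w_h = nn.Linear(hidden_dim, attn_dim)   # projects the hidden outputs h_{t'}
        self.w_s = nn.Linear(state_dim, attn_dim)    # projects the previous state s_{t-1}
        self.v = nn.Linear(attn_dim, 1)              # scoring function a(., .)

    def forward(self, h: torch.Tensor, s_prev: torch.Tensor) -> torch.Tensor:
        # h: (T, hidden_dim), s_prev: (state_dim,)
        u = self.v(torch.tanh(self.w_h(h) + self.w_s(s_prev))).squeeze(-1)  # scores u_{t,t'}
        alpha = torch.softmax(u, dim=0)                                     # weights alpha_{t,t'}
        return (alpha.unsqueeze(-1) * h).sum(dim=0)                         # context vector c_t
```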
the memory storage module comprises M memory items p m M =1, \ 8230;, M, various prototype feature patterns for recording normal behavior data of pigs;
mapping for each query
Figure BDA0003783024140000043
By having corresponding weights to pairs
Figure BDA0003783024140000044
Memory term p of m Performing weighted average to read memory item and obtain characteristics
Figure BDA0003783024140000045
Figure BDA0003783024140000046
Figure BDA0003783024140000047
Wherein, the first and the second end of the pipe are connected with each other,
Figure BDA0003783024140000048
representing memory items p m′ Weight of (b), p m′ Represents the m' thA memory item;
the update formula of the memory term is as follows:
Figure BDA0003783024140000049
Figure BDA0003783024140000051
Figure BDA0003783024140000052
where ← denotes update operation, f denotes L2 norm, v t ′k,m Representing a match probability value
Figure BDA0003783024140000053
The reconstruction of (a) is performed,
Figure BDA0003783024140000054
and representing the query index set of the memory storage module.
Preferably, when the memory items are updated, if the weighted score $\varepsilon_t$ of the t-th frame image is larger than a preset threshold, the t-th frame image is regarded as an abnormal frame, and the abnormal frame is not used for updating the memory items;
the weighted score $\varepsilon_t$ is calculated by the following formulas:
$$\varepsilon_t=\sum_{i,j} W_{ij}(\hat{I}_t,I_t)\,\big\|\hat{I}_t^{\,ij}-I_t^{\,ij}\big\|_2$$
$$W_{ij}(\hat{I}_t,I_t)=\frac{1-\exp\!\big(-\|\hat{I}_t^{\,ij}-I_t^{\,ij}\|_2\big)}{\sum_{i,j}\Big(1-\exp\!\big(-\|\hat{I}_t^{\,ij}-I_t^{\,ij}\|_2\big)\Big)}$$
wherein $W_{ij}$ represents the weight value of the feature at spatial location (i, j), $\hat{I}_t$ represents the reconstructed feature in the neighborhood of time t, $I_t$ represents the feature at the t-th time, and i and j represent spatial indexes.
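A hedged sketch of the memory read, the memory update and the weighted-score gating described above is shown below; tensor shapes, the threshold value and the normalization details are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def read_memory(queries: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
    """queries: (K, C) query maps q_t^k; memory: (M, C) memory items p_m."""
    sim = F.cosine_similarity(queries.unsqueeze(1), memory.unsqueeze(0), dim=-1)  # (K, M)
    weights = torch.softmax(sim, dim=1)            # reading matching probabilities v_t^{k,m}
    return weights @ memory                        # weighted-average read features, (K, C)

def update_memory(queries, memory, weighted_score, threshold=0.01):
    """Skips the update when the frame's weighted score marks it as abnormal."""
    if weighted_score > threshold:                 # epsilon_t above the preset threshold
        return memory
    sim = F.cosine_similarity(memory.unsqueeze(1), queries.unsqueeze(0), dim=-1)  # (M, K)
    match = torch.softmax(sim, dim=1)              # write matching probabilities over queries
    nearest = sim.argmax(dim=0)                    # nearest memory item per query (defines U_t^m)
    new_memory = memory.clone()
    for m in range(memory.size(0)):
        idx = (nearest == m).nonzero(as_tuple=True)[0]
        if idx.numel() == 0:
            continue
        v = match[m, idx]
        v = v / v.max()                            # max-normalised weights v'_t^{k,m}
        new_memory[m] = F.normalize(memory[m] + (v.unsqueeze(-1) * queries[idx]).sum(0), dim=0)
    return new_memory
```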
Preferably, the loss function $\mathcal{L}$ of the automatic encoder is:
$$\mathcal{L}=\mathcal{L}_{rec}+\lambda_c\,\mathcal{L}_{compact}+\lambda_s\,\mathcal{L}_{separate}$$
wherein $\mathcal{L}_{rec}$ is the reconstruction error, $\mathcal{L}_{compact}$ is the feature compactness loss function, $\mathcal{L}_{separate}$ is the feature separateness loss function, and $\lambda_c$ and $\lambda_s$ are hyper-parameters.
Preferably,
the reconstruction error is:
$$\mathcal{L}_{rec}=\sum_{t=1}^{T}\big\|\hat{I}_t-I_t\big\|_2$$
the feature compactness loss function is:
$$\mathcal{L}_{compact}=\sum_{t=1}^{T}\sum_{k=1}^{K}\big\|q_t^k-p_p\big\|_2,\qquad p=\arg\max_{m\in\{1,\dots,M\}} v_t^{k,m}$$
the feature separateness loss function is:
$$\mathcal{L}_{separate}=\sum_{t=1}^{T}\sum_{k=1}^{K}\Big[\big\|q_t^k-p_p\big\|_2-\big\|q_t^k-p_n\big\|_2+\alpha\Big]_+,\qquad n=\arg\max_{m\in\{1,\dots,M\},\,m\neq p} v_t^{k,m}$$
where T represents the total time, t represents the time index, k represents the index of the query map, K represents the total number of query maps, $\hat{I}_t$ represents the reconstructed feature at time t, $I_t$ represents the feature at time t, $q_t^k$ represents a query map at time t, $p_p$ represents the memory item nearest to the query map $q_t^k$ and p is its index, $v_t^{k,m}$ represents the weight of the m-th memory item, m represents the index of a memory item, M represents the total number of memory items, $p_n$ represents the second-nearest memory item to the query map $q_t^k$, and α is a margin.
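A compact sketch of this training objective (reconstruction error plus feature compactness and feature separateness terms) is given below; the loss weights, the margin and the use of mean squared error for the reconstruction term are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def autoencoder_loss(recon, target, queries, memory,
                     lambda_compact=0.1, lambda_separate=0.1, margin=1.0):
    """recon/target: reconstructed and original inputs; queries: (K, C); memory: (M, C), M >= 2."""
    l_rec = F.mse_loss(recon, target)                      # reconstruction error

    dist = torch.cdist(queries, memory)                    # (K, M) L2 distances
    two_nearest, _ = dist.topk(2, dim=1, largest=False)    # nearest (p_p) and second-nearest (p_n)
    l_compact = two_nearest[:, 0].mean()                   # pull each query towards p_p

    # push the distances to p_p and p_n apart with a triplet-style margin
    l_separate = F.relu(two_nearest[:, 0] - two_nearest[:, 1] + margin).mean()

    return l_rec + lambda_compact * l_compact + lambda_separate * l_separate
```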
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the invention provides a method for detecting abnormal behaviors of pigs, which is characterized in that target detection and cutting are carried out on each frame of image by adopting an improved Yolov5n network model to obtain a target screenshot of each pig in each frame of image, and all pigs can be effectively detected under the actual pig farm environment with complex shielding; and then constructing an end-to-end trainable double-current convolution automatic encoder network based on an object as a center, only training by adopting a video of the normal behavior of the pig, only paying attention to the pig object existing in the scene, not needing to manually extract image characteristics, accurately positioning the abnormality in each frame, and judging the size of the occurrence scale and the duration time of the abnormal behavior of the pig. Simultaneously, a memory module with the characteristics of learning and storing the prototype of the normal pig behavior and a memory updating strategy are provided; and then, solving the problem of detecting abnormal behaviors of the pigs by adopting an unsupervised two-classification method, clustering the characteristic vectors obtained by network learning of the automatic encoder, and using the obtained clustering result for training a classifier. The classification scores are obtained by classifying the target screenshots through a trained classifier, the abnormal score of each frame of image is further obtained, and finally the abnormal behavior detection of the pigs is realized according to the abnormal scores, so that the defect of training data of the actual abnormal behaviors is made up, and the abnormal behaviors of the pigs can be accurately identified.
Drawings
FIG. 1 is a flow chart of the steps for implementing the technical solution of the present invention;
FIG. 2 is a schematic structural diagram of an improved Yolov5n network model in the present invention;
FIG. 3 is a schematic flow chart of obtaining a frame difference map according to the present invention;
FIG. 4 is a schematic diagram of an object-centric pig abnormal behavior detection network according to the present invention;
FIG. 5 is a schematic diagram of a sub-network of the autoencoder network of the present invention;
FIG. 6 is a diagram illustrating the reading of memory items according to the present invention;
FIG. 7 is a diagram illustrating memory entry updating according to the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
As shown in fig. 1, a method for detecting abnormal behavior of pigs comprises the following steps:
s1: acquiring live videos of pigs in real time, and extracting images from the live videos of the pigs frame by frame;
s2: carrying out target detection and cutting on each extracted frame image by adopting an improved Yolov5n network model to respectively obtain a target screenshot of each pig in each frame image;
the improved Yolov5n network model is as follows: a channel attention module is added after the 4th, 6th and 8th layers of the backbone feature extraction network of the existing Yolov5n network model and is spliced with the upsampling layers at the 18th, 22nd and 26th layers of the neck network, and a C3 layer and a channel attention module are added after the 11th layer of the backbone feature extraction network;
s3: constructing an end-to-end trainable double-flow convolution automatic encoder network based on an object as a center, extracting appearance characteristic vectors and motion characteristic vectors of all pigs in a target screenshot, and performing characteristic fusion to form characteristic vectors of corresponding frames;
the double-current convolution automatic encoder network only adopts images of normal behaviors of pigs for training;
s4: clustering the fusion characteristic vectors by adopting a K-means clustering algorithm, and inputting the result into a binary classifier for training to obtain a trained classifier;
s5: in each frame of image, obtaining classification scores of all target screenshots in the current frame of image through a classifier, and combining all the classification scores to form an abnormal prediction image of the current frame of image;
s6: performing Gaussian filtering time sequence smoothing on the abnormal prediction image of the current frame image, and recording the obtained highest classification score as the abnormal score of the current frame image;
s7: judging whether the abnormal score of the current frame image is a positive number or not;
if yes, the pigs in the current frame image have no abnormal behaviors;
if not, the pigs in the current frame image have abnormal behaviors.
In the specific implementation process, the improved Yolov5n network model is adopted to carry out target detection and cutting on each frame of image to obtain a target screenshot of each pig in each frame of image, all pigs can be effectively detected under the actual pig farm environment with complex shielding, then the target screenshot is input into a double-current convolution automatic encoder network to extract the appearance and motion characteristic vectors of each pig in the target screenshot, the fused characteristic vectors are clustered, and the obtained clustering result is used for training the classifier. The classification scores are obtained by classifying the target screenshots through the trained classifier, the abnormal scores of all frames of images are further obtained, and finally the abnormal behavior detection of the pigs is realized according to the abnormal scores, so that the lack of training data of the actual abnormal behaviors is made up, and the abnormal behaviors of the pigs can be accurately identified.
Example 2
A method for detecting abnormal behaviors of pigs comprises the following steps:
s1: acquiring live videos of pigs in real time, and extracting images from the live videos of the pigs frame by frame;
s2: carrying out target detection and cutting on each extracted frame image by adopting an improved Yolov5n network model to respectively obtain a target screenshot of each pig in each frame image;
as shown in fig. 2, the improved Yolov5n network model is: a channel attention module is added after the 4th, 6th and 8th layers of the backbone feature extraction network of the existing Yolov5n network model and is spliced with the upsampling layers at the 18th, 22nd and 26th layers of the neck network, and a C3 layer and a channel attention module are added after the 11th layer of the backbone feature extraction network;
in practical implementation, aiming at the problems that the boundary frame positioning is not accurate enough, so that overlapped objects are difficult to distinguish, the robustness is poor and the like in the existing Yolov5n network model, the embodiment adds an attention module of an SE-Net channel in a Backbone feature extraction network backhaul for improvement, establishes feature mapping in the interaction between convolution network channels, enables the network model to automatically learn global feature information and highlight useful feature information, and inhibits other less important feature information at the same time, so that the network model is more focused on the purpose of training a shielding object.
More specifically, the channel attention module comprises compression, excitation and scaling operations; wherein,
the compression operation is: compressing the dimension H × W × C of the original feature layer to 1 × 1 × C using global average pooling;
the excitation operation is: fusing the feature map information of each feature channel by using two fully connected layers, and then normalizing the weights with a Sigmoid function;
the scaling operation is: mapping the weights output by the excitation operation into a set of per-channel weights, and multiplying them with the features of the original feature map, thereby recalibrating the original features in the channel dimension.
More specifically, the improvement of the Yolov5n network model further comprises adding a 64-fold down-sampling detection layer, so that the scale of the output feature map is 20 × 20.
In the specific implementation process, on the basis of the original 3 detection layers of different scales (40 × 40, 80 × 80 and 160 × 160) of the Yolov5n network model, a detection layer with an ultra-small scale (20 × 20) is added; that is, after the 8-fold, 16-fold and 32-fold down-sampling, a 64-fold down-sampling detection layer is added to obtain a detection-layer feature map at the 20 × 20 scale. This further deepens the network, enables the network model to extract higher-level and richer semantic information, strengthens the multi-scale learning ability of the model in complex scenes, and improves the detection performance of the model.
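For reference, the detection-layer scales quoted above relate to the network input size by simple division; the 1280 × 1280 input resolution used in the snippet below is inferred from those scales and is an assumption, not a value stated in this paragraph.

```python
# Relationship between downsampling factor and detection-layer scale (input size assumed).
input_size = 1280
for stride in (8, 16, 32, 64):
    side = input_size // stride
    print(f"{stride}-fold downsampling -> {side} x {side} feature map")
# 8 -> 160 x 160, 16 -> 80 x 80, 32 -> 40 x 40, 64 -> 20 x 20 (the added ultra-small-scale layer)
```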
S3: constructing an end-to-end trainable double-flow convolution automatic encoder network based on an object as a center, extracting appearance characteristic vectors and motion characteristic vectors of all pigs in a target screenshot, and performing characteristic fusion to form characteristic vectors of corresponding frames;
the double-current convolution automatic encoder network only adopts images of normal behaviors of pigs for training;
s4: clustering the fusion characteristic vectors by adopting a K-means clustering algorithm, and inputting the result into a binary classifier for training to obtain a trained classifier;
s5: in each frame of image, obtaining classification scores of all target screenshots in the current frame of image through a classifier, and combining all the classification scores to form an abnormal prediction image of the current frame of image;
more specifically, in step S5, a target screenshot is selected from the current frame image, the feature vectors of the selected target screenshot are extracted and clustered into k clusters through step S3, then the clustering results are respectively input into k classifiers to obtain k classification scores, the highest classification score is selected as the abnormal score of the selected target screenshot, and the steps are repeated until the abnormal classification scores of all target screenshots in the current frame image are obtained.
More specifically, the classifier is a binary classifier, and the ith binary classifier is defined as follows:
$$g_i(x)=\sum_{j=1}^{m} w_j\,x_j + b$$
wherein $w_j$ represents the weight vector, b represents the bias value, x represents a sample input to the binary classifier, which can be classified as a normal sample or an abnormal sample, $x_j$ represents the j-th element of the sample, and m represents the dimension of x.
More specifically, k binary classifiers are trained by:
a1: selecting images of normal pig behaviors from the live videos of the pigs as training images;
in practical implementation, the image containing a plurality of swineries is subjected to mask processing, a mask layer is added on the original image by taking the check swinery as a boundary, pigs of other columns are covered, a training data set of multiple scenes (different pigsties, different numbers of pigs, different shielding degrees, different illumination, different shapes of pigs and the like) is constructed according to the mask layer, and the pigs on the image are manually marked;
a2: carrying out target detection and cutting on the training image by adopting an improved Yolov5n network model to respectively obtain target screenshots of all pigs in the training image;
in practical implementation, because the default of the preset anchor frame of the existing Yolov5n network model is mainly for the coco data set (microsoft provides a public data set), which is completely different from the length-width ratio of the label frame of the training data set in the embodiment (the maximum length-width ratio of the label frame of the coco data set reaches 1.
A3: converting the target screenshot into a gray image, and subtracting the pixel value of the target screenshot from the pixel value of an adjacent frame image of the training image to obtain a corresponding frame difference image, as shown in fig. 3;
a4: respectively taking the gray frame image and the gray frame difference image obtained in the step A3 as the input of an appearance sub-network and an action sub-network in the convolution automatic encoder network for abnormal behaviors of the pigs taking the object as the center, and extracting appearance characteristic vectors and action characteristic vectors through the network, as shown in a figure 4;
the auto-encoder network comprises an appearance sub-network for extracting appearance feature vectors from the target screenshot and an action sub-network for extracting action feature vectors from the frame difference image;
more specifically, the appearance sub-network and the action sub-network both comprise an attention module and a memory module; wherein,
the calculation formula of the attention module is as follows:
$$c_t=\sum_{t'=1}^{T}\alpha_{t,t'}\,h_{t'}$$
$$\alpha_{t,t'}=\frac{\exp(u_{t,t'})}{\sum_{k=1}^{T}\exp(u_{t,k})}$$
$$u_{t,t'}=a(s_{t-1},h_{t'})$$
wherein $c_t$ represents the context vector at time t, T represents the total time length, $\alpha_{t,t'}$ represents the attention weight of the t'-th neighborhood at time t, $h_{t'}$ represents the hidden unit output at time t', $u_{t,t'}$ represents the output score of the t'-th neighborhood at time t, $u_{t,k}$ represents the output score of the k-th neighborhood at time t, $s_{t-1}$ represents the hidden state at time t-1, and a(·,·) is the scoring function;
in a specific implementation process, the automatic encoder network has a dual-flow structure composed of two sub-networks, namely an appearance sub-network and an action sub-network, wherein the two sub-networks include an encoder, a memory storage module and a decoder, and the encoder and the decoder include a space convolution layer, three convolution LSTM layers (ConvLSTM), three attention modules and two maximum pooling layers (MaxPool), as shown in fig. 5.
By constructing an end-to-end trainable object-centric double-flow convolution automatic encoder network, only the pig objects present in the scene are attended to and image features do not need to be extracted manually; at the same time, the anomaly in each frame can be accurately located and the scale and duration of the abnormal pig behavior can be judged, giving the method the technical advantages of time saving, high efficiency, high accuracy and high robustness.
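A structural sketch of one such sub-network encoder is given below. It shows one spatial convolution, one ConvLSTM layer and one max-pooling stage of the pattern described above (the full encoder stacks three ConvLSTM layers with attention modules and two pooling layers); the minimal ConvLSTM cell, channel counts and kernel sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal convolutional LSTM cell (gate layout assumed)."""
    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c = f * c + i * g
        h = o * torch.tanh(c)
        return h, c

class SubNetworkEncoder(nn.Module):
    """Encodes a short sequence of grayscale crops (or frame-difference maps)."""
    def __init__(self, hid_ch: int = 32):
        super().__init__()
        self.conv = nn.Conv2d(1, hid_ch, 3, padding=1)   # spatial convolution layer
        self.lstm = ConvLSTMCell(hid_ch, hid_ch)         # one of the three ConvLSTM layers
        self.pool = nn.MaxPool2d(2)                      # one of the two max-pooling layers

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (T, 1, H, W) sequence of single-channel inputs
        T, _, H, W = seq.shape
        h = seq.new_zeros(1, self.lstm.hid_ch, H, W)
        c = torch.zeros_like(h)
        for t in range(T):                               # recurrent pass over the sequence
            x = torch.relu(self.conv(seq[t:t + 1]))
            h, c = self.lstm(x, (h, c))                  # an attention module could reweight h here
        return self.pool(h)                              # pooled feature map of the last step
```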
The memory module comprises M memory items $p_m$, m = 1, …, M, which record various prototype feature patterns of normal pig behavior data.
As shown in fig. 6 (where C denotes computing the cosine similarity of the two inputs, S denotes the softmax function, and W denotes the weighted average), a memory item is read by computing the cosine similarity between each query map $q_t^k$ and all memory items $p_m$, which yields a two-dimensional correlation map of size M × K; a softmax function is then applied along the vertical direction to obtain the reading matching probability $v_t^{k,m}$:
$$v_t^{k,m}=\frac{\exp\!\big(d(p_m,q_t^k)\big)}{\sum_{m'=1}^{M}\exp\!\big(d(p_{m'},q_t^k)\big)}$$
For each query map $q_t^k$, the memory items $p_m$ are then weighted-averaged with the corresponding weights $v_t^{k,m'}$ to read the memory and obtain the feature $\hat{p}_t^k$:
$$\hat{p}_t^k=\sum_{m'=1}^{M}v_t^{k,m'}\,p_{m'}$$
wherein $v_t^{k,m'}$ represents the weight of memory item $p_{m'}$, $p_{m'}$ represents the m'-th memory item, and d(·,·) denotes cosine similarity;
As shown in fig. 7 (where C denotes computing the cosine similarity of the two inputs, S denotes the softmax function, W denotes the weighted average, and n denotes maximum normalization), for the update operation, for each memory item $p_m$ the write matching probability $v'^{\,k,m}_t$ is computed, the query maps nearest to $p_m$ are selected, and the resulting query index set $U_t^m$ is used to update the memory item, where the update formula of the memory items is as follows:
$$p_m\leftarrow f\!\left(p_m+\sum_{k\in U_t^m} v'^{\,k,m}_t\,q_t^k\right)$$
where ← denotes the update operation, f denotes L2 normalization, and $v'^{\,k,m}_t$ represents the re-normalized write matching probability value.
A5: fusing the appearance characteristic vector and the action characteristic vector to obtain a fusion characteristic vector of the training image;
a6: performing k-means clustering on the fusion characteristic vectors to obtain a clustering result cluster i, i =1,2,.., k;
a7: and inputting the clustering result into k binary classifiers to obtain k trained binary classifiers.
In the specific implementation process, a context is constructed through K-means clustering in which one subset of the normal samples acts as pseudo-abnormal samples relative to another subset, thereby solving the problem of lacking real abnormal samples. K-means clustering groups the normal samples into k clusters, each cluster representing some normal pig behavior different from the behaviors represented by the other clusters; that is, from the perspective of a given cluster i, the samples belonging to the other clusters (those in {1, 2, …, k} \ {i}) can be regarded as abnormal samples.
S6: performing Gaussian filtering time sequence smoothing on the abnormal prediction image of the current frame image, and recording the obtained highest classification score as the abnormal score of the current frame image;
s7: judging whether the abnormal score of the current frame image is a positive number;
if yes, the pigs in the current frame image have no abnormal behaviors;
if not, the pigs in the current frame image have abnormal behaviors.
Example 3
A method for detecting abnormal behaviors of pigs comprises the following steps:
s1: acquiring live videos of pigs in real time, and extracting images from the live videos of the pigs frame by frame;
s2: performing target detection and cutting on each extracted frame image by adopting an improved Yolov5n network model to respectively obtain a target screenshot of each pig in each frame image;
the improved Yolov5n network model is as follows: a channel attention module is added after the 4th, 6th and 8th layers of the backbone feature extraction network of the existing Yolov5n network model and is spliced with the upsampling layers at the 18th, 22nd and 26th layers of the neck network, and a C3 layer and a channel attention module are added after the 11th layer of the backbone feature extraction network;
more specifically, the channel attention module comprises compression, excitation and scaling operations; wherein,
the compression operation is: compressing the dimension H × W × C of the original feature layer to 1 × 1 × C using global average pooling;
the excitation operation is: fusing the feature map information of each feature channel by using two fully connected layers, and then normalizing the weights with a Sigmoid function;
the scaling operation is: mapping the weights output by the excitation operation into a set of per-channel weights, and multiplying them with the features of the original feature map, thereby recalibrating the original features in the channel dimension.
More specifically, the improvement of the Yolov5n network model further comprises adding a 64-fold down-sampling detection layer, so that the scale of the output feature map is 20 × 20.
S3: constructing an end-to-end trainable double-flow convolution automatic encoder network based on an object as a center, extracting appearance characteristic vectors and motion characteristic vectors of all pigs in a target screenshot, and performing characteristic fusion to form characteristic vectors of corresponding frames;
the double-current convolution automatic encoder network only adopts images of normal behaviors of pigs for training;
s4: clustering the fusion feature vectors by adopting a K-means clustering algorithm, and inputting the result into a binary classifier for training to obtain a trained classifier;
s5: in each frame of image, obtaining classification scores of all target screenshots in the current frame image through the classifier, and combining all the classification scores to form an abnormal prediction image of the current frame image;
more specifically, in step S5, a target screenshot is selected from the current frame image, the feature vectors of the selected target screenshot are extracted and clustered into k clusters through step S3, then the clustering results are respectively input into k classifiers to obtain k classification scores, the highest classification score is selected as the abnormal score of the selected target screenshot, and the step is repeated until the abnormal classification scores of all target screenshots in the current frame image are obtained.
More specifically, the classifier is a binary classifier, and the ith binary classifier is defined as follows:
$$g_i(x)=\sum_{j=1}^{m} w_j\,x_j + b$$
wherein $w_j$ represents the weight vector, b represents the bias value, x represents a sample input to the binary classifier, which can be classified as a normal sample or an abnormal sample, $x_j$ represents the j-th element of the sample, and m represents the dimension of x.
More specifically, k binary classifiers are trained by:
a1: selecting images of normal pig behaviors from the live videos of the pigs as training images;
a2: carrying out target detection and cutting on the training image by adopting an improved Yolov5n network model to respectively obtain target screenshots of all pigs in the training image;
a3: converting the target screenshot into a gray image, and subtracting the pixel value of the gray image from the adjacent frame image of the training image to obtain a corresponding frame difference image;
a4: respectively taking the gray frame image and the gray frame difference image obtained in the step A3 as the input of the appearance sub-network and the action sub-network in the object-centric convolution automatic encoder network for abnormal pig behaviors, and extracting the appearance feature vector and the action feature vector of each pig in the target screenshot through the network;
the auto-encoder network comprises an appearance sub-network for extracting appearance feature vectors from the target screenshots and an action sub-network for extracting action feature vectors from the frame difference images;
more specifically, the appearance sub-network and the action sub-network both comprise an attention module and a memory module; wherein,
the calculation formula of the attention module is as follows:
$$c_t=\sum_{t'=1}^{T}\alpha_{t,t'}\,h_{t'}$$
$$\alpha_{t,t'}=\frac{\exp(u_{t,t'})}{\sum_{k=1}^{T}\exp(u_{t,k})}$$
$$u_{t,t'}=a(s_{t-1},h_{t'})$$
wherein $c_t$ represents the context vector at time t, T represents the total time length, $\alpha_{t,t'}$ represents the attention weight of the t'-th neighborhood at time t, $h_{t'}$ represents the hidden unit output at time t', $u_{t,t'}$ represents the output score of the t'-th neighborhood at time t, $u_{t,k}$ represents the output score of the k-th neighborhood at time t, $s_{t-1}$ represents the hidden state at time t-1, and a(·,·) is the scoring function;
the memory storage module comprises M memory items p m M =1, \ 8230;, M, various prototype feature patterns for recording normal behavior data of pigs;
mapping for each query
Figure BDA0003783024140000143
By passingFor having corresponding weight
Figure BDA0003783024140000144
Memory term p of m Performing weighted average to read memory item and obtain characteristics
Figure BDA0003783024140000145
Figure BDA0003783024140000146
Figure BDA0003783024140000147
Wherein, the first and the second end of the pipe are connected with each other,
Figure BDA0003783024140000148
representing memory items p m′ Weight of p m′ Represents the m' th memory item;
the update formula of the memory term is as follows:
Figure BDA0003783024140000149
Figure BDA00037830241400001410
Figure BDA00037830241400001411
where ← denotes update operation, f denotes L2 norm, v t ′k,m Representing the probability value of a match
Figure BDA00037830241400001412
The reconstruction of (a) is performed,
Figure BDA00037830241400001413
representing the query index set of the memory storage module.
More specifically, when the memory items are updated, if the weighted score $\varepsilon_t$ of the t-th frame image is larger than a preset threshold, the t-th frame image is regarded as an abnormal frame, and the abnormal frame is not used for updating the memory items;
the weighted score $\varepsilon_t$ is calculated by the following formulas:
$$\varepsilon_t=\sum_{i,j} W_{ij}(\hat{I}_t,I_t)\,\big\|\hat{I}_t^{\,ij}-I_t^{\,ij}\big\|_2$$
$$W_{ij}(\hat{I}_t,I_t)=\frac{1-\exp\!\big(-\|\hat{I}_t^{\,ij}-I_t^{\,ij}\|_2\big)}{\sum_{i,j}\Big(1-\exp\!\big(-\|\hat{I}_t^{\,ij}-I_t^{\,ij}\|_2\big)\Big)}$$
wherein $W_{ij}$ represents the weight value of the feature at spatial location (i, j), $\hat{I}_t$ represents the reconstructed feature in the neighborhood of time t, $I_t$ represents the feature at the t-th time, and i and j represent spatial indexes.
In the specific implementation process, when normal samples and abnormal samples exist simultaneously, in order to prevent the memory from recording the features of abnormal pig samples, the abnormality of each video frame is measured with the weighted score, and the memory items are updated only when the frame is determined to be normal.
More particularly, the loss function $\mathcal{L}$ of the automatic encoder is:
$$\mathcal{L}=\mathcal{L}_{rec}+\lambda_c\,\mathcal{L}_{compact}+\lambda_s\,\mathcal{L}_{separate}$$
wherein $\mathcal{L}_{rec}$ is the reconstruction error, $\mathcal{L}_{compact}$ is the feature compactness loss function, $\mathcal{L}_{separate}$ is the feature separateness loss function, and $\lambda_c$ and $\lambda_s$ are hyper-parameters.
More specifically,
the reconstruction error is:
$$\mathcal{L}_{rec}=\sum_{t=1}^{T}\big\|\hat{I}_t-I_t\big\|_2$$
the feature compactness loss function is:
$$\mathcal{L}_{compact}=\sum_{t=1}^{T}\sum_{k=1}^{K}\big\|q_t^k-p_p\big\|_2,\qquad p=\arg\max_{m\in\{1,\dots,M\}} v_t^{k,m}$$
the feature separateness loss function is:
$$\mathcal{L}_{separate}=\sum_{t=1}^{T}\sum_{k=1}^{K}\Big[\big\|q_t^k-p_p\big\|_2-\big\|q_t^k-p_n\big\|_2+\alpha\Big]_+,\qquad n=\arg\max_{m\in\{1,\dots,M\},\,m\neq p} v_t^{k,m}$$
where T represents the total time, t represents the time index, k represents the index of the query map, K represents the total number of query maps, $\hat{I}_t$ represents the reconstructed feature at time t, $I_t$ represents the feature at time t, $q_t^k$ represents a query map at time t, $p_p$ represents the memory item nearest to the query map $q_t^k$ and p is its index, $v_t^{k,m}$ represents the weight of the m-th memory item, m represents the index of a memory item, M represents the total number of memory items, $p_n$ represents the second-nearest memory item to the query map $q_t^k$, and α is a margin.
In a specific implementation process, the memory module of the automatic encoder is trained through the feature compactness loss function and the feature separateness loss function, so that the diversity and discrimination of the memory items are ensured.
A5: fusing the appearance characteristic vector and the action characteristic vector to obtain a fused characteristic vector of the training image;
a6: performing k-means clustering on the fusion characteristic vectors to obtain clustering results cluster_i, i = 1, 2, …, k;
a7: and inputting the clustering result into k binary classifiers to obtain k trained binary classifiers.
S6: performing Gaussian filtering time sequence smoothing on the abnormal prediction image of the current frame image, and recording the obtained highest classification score as the abnormal score of the current frame image;
s7: judging whether the abnormal score of the current frame image is a positive number or not;
if yes, the pigs in the current frame image have no abnormal behaviors;
if not, the pigs in the current frame image have abnormal behaviors.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaustively list all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the claims of the present invention.

Claims (10)

1. A method for detecting abnormal behaviors of pigs is characterized by comprising the following steps:
s1: acquiring live videos of pigs in real time, and extracting images from the live videos of the pigs frame by frame;
s2: carrying out target detection and cutting on each extracted frame image by adopting an improved Yolov5n network model to respectively obtain a target screenshot of each pig in each frame image;
the improved Yolov5n network model is as follows: a channel attention module is added after the 4th, 6th and 8th layers of the backbone feature extraction network of the existing Yolov5n network model and is spliced with the upsampling layers at the 18th, 22nd and 26th layers of the neck network, and a C3 layer and a channel attention module are added after the 11th layer of the backbone feature extraction network;
s3: constructing an end-to-end trainable double-flow convolution automatic encoder network based on an object as a center, extracting appearance characteristic vectors and motion characteristic vectors of all pigs in a target screenshot, and performing characteristic fusion to form characteristic vectors of corresponding frames;
the double-current convolution automatic encoder network only adopts images of normal behaviors of pigs for training;
s4: clustering the fusion characteristic vectors by adopting a K-means clustering algorithm, and inputting the result into a binary classifier for training to obtain a trained classifier;
s5: in each frame of image, obtaining classification scores of all target screenshots in the current frame image through the classifier, and combining all the classification scores to form an abnormal prediction image of the current frame image;
s6: performing Gaussian filtering time sequence smoothing on the abnormal prediction image of the current frame image, and recording the obtained highest classification score as the abnormal score of the current frame image;
s7: judging whether the abnormal score of the current frame image is a positive number or not;
if yes, the pigs in the current frame image have no abnormal behaviors;
if not, the pigs in the current frame image have abnormal behaviors.
2. The method of claim 1, wherein the channel attention module comprises compression, excitation and scaling operations; wherein,
the compression operation is: compressing the dimension H × W × C of the original feature layer to 1 × 1 × C using global average pooling;
the excitation operation is: fusing the feature map information of each feature channel by using two fully connected layers, and then normalizing the weights with a Sigmoid function;
the scaling operation is: mapping the weights output by the excitation operation into a set of per-channel weights, and multiplying them with the features of the original feature map, thereby recalibrating the original features in the channel dimension.
3. The method of claim 1, wherein the improvement of the Yolov5n network model further comprises adding a 64-fold down-sampling detection layer to make the feature map scale of the output 20 × 20.
4. The method of claim 1, wherein in step S5, a target screenshot is selected from the current frame image, the feature vectors of the selected target screenshot are extracted and clustered into k clusters through step S3, then the clustering results are respectively input into k classifiers to obtain k classification scores, the highest classification score is selected as the abnormal score of the selected target screenshot, and the steps are repeated until the abnormal classification scores of all target screenshots in the current frame image are obtained.
5. The method of claim 4, wherein the classifier is a binary classifier, and the ith binary classifier is defined as follows:
$$g_i(x)=\sum_{j=1}^{m} w_j\,x_j + b$$
wherein $w_j$ represents the weight vector, b represents the bias value, x represents a sample input to the binary classifier, which can be classified as a normal sample or an abnormal sample, $x_j$ represents the j-th element of the sample, and m represents the dimension of x.
6. The method of claim 5, wherein k binary classifiers are trained by:
a1: selecting images of normal pig behaviors from the live videos of the pigs as training images;
a2: carrying out target detection and cutting on the training image by adopting an improved Yolov5n network model to respectively obtain target screenshots of all pigs in the training image;
a3: converting the target screenshot into a gray image, and subtracting the pixel value of the gray image from the adjacent frame image of the training image to obtain a corresponding frame difference image;
a4: b, respectively taking the gray frame image and the gray frame difference image obtained in the step A3 as the input of an appearance sub-network and an action sub-network in the convolution automatic encoder network for abnormal behaviors of the pigs taking the object as the center, and extracting the appearance feature vector and the action feature vector of each pig in the target screenshot through the network;
the auto-encoder network comprises a look sub-network for extracting look feature vectors from the target screenshots and an action sub-network for extracting action feature vectors from the frame difference images;
a5: fusing the appearance characteristic vector and the action characteristic vector to obtain a fused characteristic vector of the training image;
a6: performing k-means clustering on the fusion characteristic vectors to obtain a clustering result cluster i, i =1, 2.. Times, k;
a7: and inputting the clustering result into k binary classifiers to obtain k trained binary classifiers.
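A compressed illustrative sketch of steps A6 and A7, assuming the fused feature vectors of the normal-behavior screenshots have already been extracted (steps A1 to A5) and assuming, since the claim does not spell it out, that each binary classifier is trained one-versus-rest with its own cluster as the positive class; scikit-learn is used here purely for convenience.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import LinearSVC

    def train_cluster_classifiers(fused_features, k=10):
        """fused_features: (N, D) array of fused appearance+action vectors
        extracted from normal-behaviour screenshots (steps A1-A5).
        Returns the k-means model and k trained binary classifiers (steps A6-A7)."""
        kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(fused_features)
        classifiers = []
        for i in range(k):
            labels = (kmeans.labels_ == i).astype(int)   # cluster i vs. the other clusters
            clf = LinearSVC(C=1.0).fit(fused_features, labels)
            classifiers.append(clf)
        return kmeans, classifiers

    # Example with random stand-in features (the real ones come from the auto-encoder).
    km, clfs = train_cluster_classifiers(np.random.randn(200, 64), k=4)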
7. The method of claim 6, wherein the object-centric dual-stream convolutional auto-encoder network trained on motion and appearance comprises an appearance sub-network and an action sub-network, both sub-networks comprising an attention-based convolutional LSTM module and a memory module; wherein:
the calculation formula of the attention module is as follows:
c_t = Σ_{t'=1}^{T} α_{t,t'} · h_{t'}
α_{t,t'} = exp(u_{t,t'}) / Σ_{k=1}^{T} exp(u_{t,k})
u_{t,t'} = a(s_{t-1}, h_{t'})
wherein c_t represents the context vector at time t, T represents the total time length, α_{t,t'} represents the attention weight of time t' with respect to time t, h_{t'} represents the hidden-unit output at time t', a represents the score function, u_{t,t'} represents the output score of time t' with respect to time t, u_{t,k} represents the output score of time k with respect to time t, and s_{t-1} represents the hidden state at time t-1;
the memory module comprises M memory items p_m, m = 1, ..., M, which record various prototype feature patterns of normal pig behavior data;
for each query map q_t^k, the memory items p_m are read by a weighted average with the corresponding weights, giving the read feature p̂_t^k:
p̂_t^k = Σ_{m'=1}^{M} w_t^{(k,m')} · p_{m'}
w_t^{(k,m)} = exp((p_m)^T · q_t^k) / Σ_{m'=1}^{M} exp((p_{m'})^T · q_t^k)
wherein w_t^{(k,m')} represents the weight of memory item p_{m'}, and p_{m'} represents the m'-th memory item;
the update formula of the memory items is as follows:
p_m ← f( p_m + Σ_{k∈U_t^m} v'_t^{(k,m)} · q_t^k )
wherein ← denotes the update operation, f denotes the L2 norm, v'_t^{(k,m)} represents the matching probability value, a renormalized form of the matching probability v_t^{(k,m)} of query map q_t^k, and U_t^m represents the query index set of the memory module.
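An illustrative NumPy sketch of the memory read and update described above; the dot-product similarity, the softmax renormalization of the matching probabilities over the assigned queries, and the array shapes are simplifying assumptions rather than the exact claimed formulas.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def read_memory(query, memory):
        """query: (D,) query map q_t^k; memory: (M, D) memory items p_m.
        Weights are a softmax over the query-item similarities; the read feature
        is the weighted average of the memory items."""
        w = softmax(memory @ query)          # one weight per memory item
        return w @ memory, w                 # read feature and its weights

    def update_memory(memory, queries):
        """queries: (K, D) query maps of a normal frame.
        Each memory item is moved towards the queries whose nearest item it is,
        then L2-normalised (the function f in the claim)."""
        nearest = np.array([np.argmax(softmax(memory @ q)) for q in queries])
        for m in range(memory.shape[0]):
            idx = np.where(nearest == m)[0]              # queries assigned to item m
            if idx.size:
                v = softmax(np.array([memory[m] @ queries[i] for i in idx]))
                memory[m] += v @ queries[idx]            # weighted sum of matched queries
            memory[m] /= np.linalg.norm(memory[m])       # L2 normalisation
        return memory

    # Example with made-up memory items and query maps.
    mem = update_memory(np.random.randn(10, 64), np.random.randn(5, 64))
    feat, weights = read_memory(np.random.randn(64), mem)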
8. The method of claim 7, wherein, when updating the memory items, if the weighted score ε_t of the t-th frame is larger than a preset threshold, the t-th frame image is regarded as an abnormal frame, and the abnormal frame is not used to update the memory items;
the weighted score ε_t is calculated by the following formulas:
W_{ij}(Î_t, I_t) = (1 − exp(−||Î_t^{ij} − I_t^{ij}||_2)) / Σ_{i,j} (1 − exp(−||Î_t^{ij} − I_t^{ij}||_2))
ε_t = Σ_{i,j} W_{ij}(Î_t, I_t) · ||Î_t^{ij} − I_t^{ij}||_2
wherein W_{ij} represents the weight value of the image feature at spatial location (i, j), Î_t represents the reconstructed feature at time t, I_t represents the feature at time t, and i and j represent spatial indexes.
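A simplified NumPy sketch of the weighted score ε_t above; the absolute-value error on a 2-D feature map and the small constant added before normalization are simplifying assumptions.

    import numpy as np

    def weighted_score(recon, frame):
        """recon, frame: (H, W) reconstructed and input features at time t.
        Per-location weights grow with the reconstruction error and are normalised
        to sum to one; the weighted sum of the errors is the score epsilon_t."""
        err = np.abs(recon - frame)              # per-location reconstruction error
        w = 1.0 - np.exp(-err)                   # weight of each spatial location (i, j)
        w /= (w.sum() + 1e-8)
        return float((w * err).sum())

    eps_t = weighted_score(np.random.rand(8, 8), np.random.rand(8, 8))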
9. The method of claim 6, wherein the loss function L of the object-centric convolutional auto-encoder for pig abnormal behavior is:
L = L_rec + λ_c · L_compact + λ_s · L_separate
wherein L_rec is the reconstruction error, L_compact is the feature compactness loss function, L_separate is the feature separateness loss function, and λ_c and λ_s are hyper-parameters.
10. The method of claim 9, wherein
the reconstruction error is:
L_rec = Σ_{t=1}^{T} ||Î_t − I_t||_2
the feature compactness loss function is:
L_compact = Σ_{t=1}^{T} Σ_{k=1}^{K} ||q_t^k − p_p||_2, with p = argmax_{m∈[1,M]} w_t^{(k,m)}
the feature separateness loss function is:
L_separate = Σ_{t=1}^{T} Σ_{k=1}^{K} max( ||q_t^k − p_p||_2 − ||q_t^k − p_n||_2 + α, 0 ), with n = argmax_{m∈[1,M], m≠p} w_t^{(k,m)}
wherein T represents the total time, t represents the time index, k represents the index of the query map, K represents the total number of query maps, Î_t represents the reconstructed feature at time t, I_t represents the feature at time t, p_p represents the memory item nearest to the query map q_t^k, p is the index of that nearest item, w_t^{(k,m)} represents the weight of the m-th memory item, m represents the index of the memory item, M represents the total number of memory items, p_n represents the memory item second nearest to the query map q_t^k, and α is a margin.
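An illustrative NumPy sketch of the feature compactness and separateness losses of claim 10; picking the nearest and second-nearest memory items by Euclidean distance (instead of the weight w_t^(k,m)) and the margin value are simplifying assumptions.

    import numpy as np

    def compact_and_separate_losses(queries, memory, margin=1.0):
        """queries: (K, D) query maps q_t^k of one frame; memory: (M, D) items p_m.
        The compactness loss pulls each query towards its nearest memory item;
        the separateness loss pushes it away from the second-nearest item by a margin."""
        compact, separate = 0.0, 0.0
        for q in queries:
            d = np.linalg.norm(memory - q, axis=1)       # distance to every memory item
            p, n = np.argsort(d)[:2]                     # nearest and second-nearest items
            compact += d[p]
            separate += max(d[p] - d[n] + margin, 0.0)   # hinge on the distance gap
        return compact, separate

    lc, ls = compact_and_separate_losses(np.random.randn(5, 64), np.random.randn(10, 64))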
CN202210934696.XA 2022-08-04 2022-08-04 Pig abnormal behavior detection method Pending CN115359511A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210934696.XA CN115359511A (en) 2022-08-04 2022-08-04 Pig abnormal behavior detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210934696.XA CN115359511A (en) 2022-08-04 2022-08-04 Pig abnormal behavior detection method

Publications (1)

Publication Number Publication Date
CN115359511A true CN115359511A (en) 2022-11-18

Family

ID=84033479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210934696.XA Pending CN115359511A (en) 2022-08-04 2022-08-04 Pig abnormal behavior detection method

Country Status (1)

Country Link
CN (1) CN115359511A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102646871B1 (en) * 2023-01-31 2024-03-13 한국축산데이터 주식회사 Apparatus and method for detecting environmental change using barn monitoring camera



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination