CN115048984A - Sow oestrus recognition method based on deep learning - Google Patents
Sow oestrus recognition method based on deep learning
- Publication number
- CN115048984A CN115048984A CN202210536083.0A CN202210536083A CN115048984A CN 115048984 A CN115048984 A CN 115048984A CN 202210536083 A CN202210536083 A CN 202210536083A CN 115048984 A CN115048984 A CN 115048984A
- Authority
- CN
- China
- Prior art keywords
- oestrus
- sow
- deep learning
- sound
- sound signals
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- A—HUMAN NECESSITIES
- A01—AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
- A01K—ANIMAL HUSBANDRY; AVICULTURE; APICULTURE; PISCICULTURE; FISHING; REARING OR BREEDING ANIMALS, NOT OTHERWISE PROVIDED FOR; NEW BREEDS OF ANIMALS
- A01K67/00—Rearing or breeding animals, not otherwise provided for; New or modified breeds of animals
- A01K67/02—Breeding vertebrates
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Human Computer Interaction (AREA)
- Life Sciences & Earth Sciences (AREA)
- Environmental Sciences (AREA)
- Zoology (AREA)
- Animal Husbandry (AREA)
- Animal Behavior & Ethology (AREA)
- Biodiversity & Conservation Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
Abstract
The invention discloses a deep-learning-based method for recognizing sow oestrus sounds. First, sound signals of oestrous sows are collected and preprocessed: the noise-reduced signals are cut, sampled and labelled according to their vocalization characteristics, and the data labels are divided into sow oestrus sounds and non-oestrus sounds. The preprocessed sound data are then converted into a log-Mel spectrogram dataset. Next, a deep learning model is built, trained and tested with the log-Mel spectrograms as input, finally yielding a model that can classify sow oestrus and non-oestrus sounds. The trained model then judges sow sound signals and outputs a classification result. By using log-Mel spectrograms to capture sow oestrus sound information and an improved deep learning algorithm for training, the method effectively improves oestrus recognition accuracy, enables early warning of the sow oestrus state, and raises the level of sow oestrus monitoring.
Description
Technical Field
The invention relates to the technical field of intelligent acoustic monitoring, and in particular to a deep-learning-based method for recognizing sow oestrus sounds.
Background
Pig breeding is one of the main industries in China's agricultural sector. With the rapid development of science and technology and the adjustment of pig-industry policies, traditional husbandry has gradually evolved toward welfare-oriented, increasingly intelligent and large-scale production. Timely monitoring of sow oestrus is one of the key measures for ensuring pig yield and improving the number of piglets weaned per sow per year (PSY), and is a critical part of pig-farm management. Performing artificial insemination while the sow is in oestrus improves both the conception rate and the reproductive performance of the sow.
At present, sow oestrus monitoring falls into two categories. The first, used in most Chinese pig farms, relies on traditional manual observation; this approach is labor-intensive and time-consuming, makes it difficult to grasp the oestrus state promptly and accurately, and is unsuited to large-scale farming. The second is based on digital monitoring technology: Wang Kai et al. attached a posture sensor to the sow's neck and identified oestrus from mounting behavior and activity level; Sykes et al. used a digital thermal infrared imager to distinguish the oestrus and non-oestrus phases of the pig's oestrous cycle; Ostersen et al. recorded, via RFID, how long sows visited the boar-contact window to detect oestrus automatically; and Freson et al. converted behavioral changes of the sow into temperature changes based on infrared images, reflected them as voltages, and judged oestrus using the sow's average daily activity as an index.
Animals produce different sound signals in different states such as hunger, seeking companionship, coughing and fighting. These signals are not only an important mode of communication between animals but also directly reflect their internal health, emotions and behavior. With the rapid development of digital signal processing, deep learning and related technologies, monitoring sow oestrus with sound recognition technology has become feasible.
Traditional sound classification methods mainly extract partial features from the sound signal, such as power spectral density (PSD), short-time energy and Mel-frequency cepstral coefficients (MFCC); such feature extraction is slow and of limited accuracy. The Mel spectrogram is a two-dimensional spectral analysis image: the abscissa shows time, the ordinate shows frequency, and the value at each coordinate point shows the energy of the sound data, so it expresses three-dimensional information. Different vocalization states can be read from the relationships among time, frequency and sound energy. Combined with the rapid progress of deep learning in image processing and related fields, classifying sounds from spectrograms has become an effective recognition means that improves both classification accuracy and efficiency.
Disclosure of Invention
To overcome the shortcomings of existing oestrus monitoring and to monitor the oestrus state of sows accurately, the invention provides a sow oestrus sound recognition method based on deep learning.
The technical scheme adopted by the invention is as follows:
a sow oestrus voice recognition method based on deep learning comprises the following steps:
step 1, collecting sound signals of oestrous sows, including oestrus and non-oestrus sound signals, through data collection equipment;
step 2, preprocessing the sound signals collected in step 1: cutting, sampling and labelling the noise-reduced signals according to their vocalization characteristics, with data labels divided into sow oestrus and non-oestrus;
step 3, converting the sound signal data processed in step 2 into a log-Mel spectrogram dataset;
step 4, building a deep learning model, training and testing it with the log-Mel spectrograms as input to learn the sow oestrus features, finally obtaining a deep learning model that can classify sow oestrus and non-oestrus sounds;
and step 5, judging the sound signals of the sows with the deep learning model trained in step 4 and outputting a classification result.
Further, in step 1, the non-oestrus sound signals comprise screaming, eating, ear-flapping and other sounds made by the sow.
Further, the step 2 comprises the following steps:
s1: uniformly converting the collected sound signals into a WAV format, and selecting a 44100Hz sampling rate for conversion;
s2: carrying out noise reduction processing on the converted sound signal by using a spectral subtraction method;
s3: the whole noise-reduced sound signal is manually cut, sampled and labelled according to its vocalization characteristics, the data labels are divided into sow oestrus sounds and non-oestrus sounds by listening, and the sampling interval is 1-2 s.
Further, the step 3 of converting the sound signal data into a log mel-frequency spectrum data set comprises the following steps:
s1: first, pre-emphasis is applied to the sound signal, implemented with a first-order FIR high-pass digital filter whose transfer function is H(z) = 1 − αz⁻¹, where the pre-emphasis coefficient α is in the range 0.9 < α < 1.0;
S2: selecting a Hanning window to perform time domain windowing on the filtered signals, performing Fourier transform on each frame of signal, selecting a logarithm operation result, generating a logarithm Mel language spectrogram according to a time sequence, and labeling a sound label to form a logarithm Mel spectrogram data set.
Further, in step 3, the optimal parameters for generating the log-Mel spectrogram are determined through tests with different spectrogram parameters, and the dataset is constructed on that basis. Grouped experiments are run over different window lengths and window shifts, combining 256-point, 512-point and 1024-point FFTs with 1/2 and 1/4 window shifts; each group is repeated in 5 independent experiments. The best-performing combination, a 256-point FFT with a 1/2 window shift, is selected as the spectrogram conversion parameters, and the log-Mel spectrogram dataset is built with these parameters.
Further, the log-Mel spectrogram dataset constructed in step 3 is split into a training set and a test set at a ratio of 8:2, with both sets containing oestrus and non-oestrus signal types.
Further, the deep learning model in step 4 is an improved MobileNetV3 network that takes a 224 × 224 × 3 log-Mel spectrogram as input. The spectrogram first undergoes standard convolution with 16 3 × 3 convolution kernels, h-swish is selected as the activation function, and an ECA module is used as the attention mechanism. The resulting 112 × 112 × 16 features are passed through the 3 × 3 convolution kernels and 8 × 5 block units to obtain 7 × 7 × 96 features, which are raised to 7 × 7 × 576 by 1 × 1 convolution kernels, converted to a one-dimensional vector by global average pooling, and finally passed through 2 1 × 1 convolution kernels of standard convolution to produce the two predicted classification results; a prediction matching the input label indicates the corresponding defined vocalization class. The ECA module, an improvement on the original SE module, replaces it and substitutes a 1D convolution for the SE module's fully connected layer.
The beneficial effects of the invention are as follows: by collecting the oestrus signal of sows in the pig house and extracting features with a log-Mel spectrogram that displays three-dimensional characteristics, the method preserves the completeness of the oestrus information; by establishing an improved MobileNetV3 recognition model, deep features of the oestrus sound signal are obtained automatically through multilayer convolution and an introduced attention-mechanism module. Compared with conventional deep learning algorithms, the method monitors sow oestrus sounds non-destructively and more effectively, and improves recognition and classification accuracy.
Drawings
Fig. 1 is an overall architecture diagram of the estrus classification method.
Fig. 2 is a time domain diagram of a segment of a sound signal collected in a pigsty before noise reduction.
FIG. 3 is a time domain plot of a noise-reduced sound signal collected in a pig house.
Fig. 4 is a log mel-frequency spectrum of different sound signals.
Fig. 5 is a modified mobrienet v3 network model structure.
Other relevant figures can be derived from the above figures by a person skilled in the art without inventive effort.
Detailed Description
In order to make the technical solution of the present invention better understood, the technical solution of the present invention is further described below with reference to specific examples.
A sow oestrus recognition method based on deep learning comprises the following steps:
the sound collection place of the sows is in Derun pig farm in Anping county of Hebei province, China, the experiment collection object is 4-5-fetus-plus-system Changbai and white-pored sow oestrus in a colony pigsty and a limiting fence, and the data collection is carried out by utilizing a Sony ICD-UX560F digital recording pen for manual collection and a WIFI intelligent network sound pickup for uninterrupted suspension collection for 24 hours. The sampling rate of the collected data is 44.1KHz double-channel audio, the audio is stored in equipment in a wav format, and the test equipment is installed in the middle of the swinery in a hoisting mode.
(1) uniformly converting the collected sound signals of the sows into a WAV format, and selecting 44100Hz sampling rate for conversion;
(2) carrying out noise reduction on the converted sound signal by spectral subtraction. For the i-th frame, let xᵢ(n) be the clean sound, yᵢ(n) the noisy signal and dᵢ(n) the noise, where n is the sound sample index; let Yᵢ(w) be the Fourier transform (FFT) of yᵢ(n) and Dᵢ(w) the Fourier transform (FFT) of dᵢ(n).
The power spectrum of the clean sound estimated by spectral subtraction is

|X̂ᵢ(w)|² = |Yᵢ(w)|² − a·E[|Dᵢ(w)|²],  if |Yᵢ(w)|² > (a + b)·E[|Dᵢ(w)|²];  otherwise b·E[|Dᵢ(w)|²]   (1)

where X̂ᵢ(w) is the estimate of the Fourier transform (FFT) value Xᵢ(w) of xᵢ(n), |Yᵢ(w)| is the amplitude spectrum of the noisy signal, E[|Dᵢ(w)|²] is the estimated average power spectrum of the noise signal, the over-subtraction factor a is set to 3 and the gain compensation factor b is set to 0.002. Each frame of the estimated clean spectrum is then inverse-transformed and the frames are overlap-added to obtain the denoised sound signal. Fig. 2 and Fig. 3 compare the original sow oestrus sound with the denoised sound.
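As a rough illustration, the spectral subtraction of equation (1) can be sketched in NumPy as below. Only the over-subtraction factor (3) and gain compensation factor (0.002) come from the text; the frame length, hop size and the single-frame noise estimate are assumptions made for the sketch.

```python
import numpy as np

def spectral_subtract(noisy, noise_est, over_sub=3.0, gain_comp=0.002,
                      frame_len=1024, hop=512):
    """Frame-by-frame spectral subtraction with overlap-add reconstruction."""
    window = np.hanning(frame_len)
    # crude noise power estimate from a single noise-only frame (assumption)
    noise_mag2 = np.abs(np.fft.rfft(noise_est[:frame_len] * window)) ** 2
    out = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame_len + 1, hop):
        frame = noisy[start:start + frame_len] * window
        spec = np.fft.rfft(frame)
        mag2 = np.abs(spec) ** 2
        # subtract the over-scaled noise power, floored by the gain compensation
        clean_mag2 = mag2 - over_sub * noise_mag2
        clean_mag2 = np.maximum(clean_mag2, gain_comp * noise_mag2)
        # keep the noisy phase, inverse-transform, overlap-add
        clean = np.sqrt(clean_mag2) * np.exp(1j * np.angle(spec))
        out[start:start + frame_len] += np.fft.irfft(clean, n=frame_len)
    return out
```

In practice the noise power would be averaged over many noise-only frames rather than taken from a single one.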
(3) Cutting, sampling and labelling the whole noise-reduced sound signal according to its vocalization characteristics, dividing the data labels into sow oestrus sounds and non-oestrus sounds by listening, with a sampling interval of 1-2 s. A single recording may contain sounds from various states as well as invalid segments, and the audio clips differ in length, so manual labelling and batch segmentation were needed to build the experimental dataset. Manual labelling was done in Adobe audio-editing software: by repeated playback and experienced judgment, segments whose sound type could be clearly identified were clipped, classified and marked. After segmentation, 3500 sow oestrus samples and 2500 non-oestrus samples were obtained; the non-oestrus samples comprise 200 ear-flapping, 500 eating, 1000 screaming and 800 humming sound samples.
(1) First, pre-emphasis is applied to the sound signal, implemented with a first-order FIR high-pass digital filter whose transfer function is:

H(z) = 1 − αz⁻¹    (2)

where the pre-emphasis coefficient α is in the range 0.9 < α < 1.0.
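The pre-emphasis filter of equation (2) amounts to the difference y[n] = x[n] − α·x[n−1], which is one line of NumPy; the default α = 0.97 below is an assumption, the text only requires 0.9 < α < 1.0.

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    """First-order FIR high-pass filter H(z) = 1 - alpha * z^-1,
    i.e. y[n] = x[n] - alpha * x[n-1], with y[0] = x[0]."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])
```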
(2) A Hanning window is selected for time-domain windowing of the filtered signal; each frame is Fourier-transformed and the logarithm of the result is taken, log-Mel spectrograms are generated in time order, and sound labels are attached to form the sow-oestrus log-Mel spectrogram dataset. The log-Mel spectrogram is a spectral analysis image that expresses three-dimensional information on a two-dimensional plane: its abscissa is time, its ordinate is frequency, and the value at each coordinate point is the energy of the sound data.
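A minimal from-scratch NumPy sketch of the windowing, FFT and log-Mel conversion described above. The 256-point FFT and 1/2 window shift match the parameters selected in this method; the number of mel bands (64) and the simple triangular filterbank are illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters mapping an FFT power spectrum to mel bands."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):          # rising slope
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(signal, sr=44100, n_fft=256, hop=128, n_mels=64):
    """Hann-windowed STFT -> mel filterbank -> log power."""
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        spec = np.fft.rfft(signal[start:start + n_fft] * window)
        frames.append(np.abs(spec) ** 2)
    power = np.array(frames).T                       # (freq, time)
    mel = mel_filterbank(n_mels, n_fft, sr) @ power  # (mel, time)
    return np.log(mel + 1e-10)
```

Production code would typically use a library routine such as librosa's mel spectrogram instead of a hand-rolled filterbank.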
(3) Different types of spectrogram influence model performance such as recognition accuracy, so the optimal spectrogram parameters, namely the number of FFT points and the window shift, were selected as follows: for each parameter setting, 5 independent experiments were run, the dataset was randomly split into a training set and a test set at a ratio of 8:2, spectrograms of size 224 × 224 were input into the improved MobileNetV3 network model, and the average of the 5 test results was used for comparison. The specific parameters and recognition performance are shown in Table 1. The tests show that a 256-point FFT with a 1/2 window shift performs best as the log-Mel spectrogram parameters, and the log-Mel spectrogram dataset was constructed with these parameters.
TABLE 1 Recognition rate (%) for different FFT point numbers and window shifts
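The random 8:2 train/test split described above can be sketched as below; the fixed seed is an assumption added for reproducibility, and the sample counts follow the 3500 oestrus + 2500 non-oestrus samples reported earlier.

```python
import numpy as np

def split_dataset(samples, labels, train_ratio=0.8, seed=42):
    """Shuffle the dataset and split it into train/test at train_ratio."""
    idx = np.random.default_rng(seed).permutation(len(samples))
    cut = int(len(samples) * train_ratio)
    tr, te = idx[:cut], idx[cut:]
    return samples[tr], labels[tr], samples[te], labels[te]
```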
Step 4: an improved lightweight MobileNetV3 network is constructed to train, learn and test the sow oestrus features. The improved model takes a 224 × 224 × 3 RGB log-Mel spectrogram as input. The spectrogram first undergoes standard convolution with 16 3 × 3 convolution kernels, h-swish is selected as the activation function, and an ECA module is used as the attention mechanism. The resulting 112 × 112 × 16 features are passed through the 3 × 3 convolution kernels and 8 × 5 block units to obtain 7 × 7 × 96 features, which are raised to 7 × 7 × 576 by 1 × 1 convolution kernels, converted to a one-dimensional vector by global average pooling, and finally passed through 2 1 × 1 convolution kernels of standard convolution to produce the two predicted classification results; a prediction matching the input label indicates the corresponding defined vocalization class. The ECA module, an improvement on the original SE module, replaces it and substitutes a 1D convolution for the SE module's fully connected layer, which avoids dimensionality reduction of the data, effectively captures cross-channel interaction information, and improves the overall accuracy of the recognition model.
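A NumPy sketch of the ECA channel-attention idea described above, operating on a single (C, H, W) feature map: global average pooling, a 1-D convolution across the channel dimension (the part that replaces SE's fully connected layers), a sigmoid gate, then channel-wise rescaling. The averaging kernel stands in for the learned 1-D convolution weights, and the kernel size of 3 is an assumption.

```python
import numpy as np

def eca_attention(features, kernel_size=3):
    """Efficient Channel Attention (ECA) sketch on a (C, H, W) feature map."""
    c = features.shape[0]
    pooled = features.mean(axis=(1, 2))               # global average pool -> (C,)
    pad = kernel_size // 2
    padded = np.pad(pooled, pad)
    kernel = np.full(kernel_size, 1.0 / kernel_size)  # placeholder for learned weights
    conv = np.array([np.dot(padded[i:i + kernel_size], kernel) for i in range(c)])
    gate = 1.0 / (1.0 + np.exp(-conv))                # sigmoid gate per channel
    return features * gate[:, None, None]             # rescale each channel
```

In the actual network this would be a learned `Conv1d` layer inside a PyTorch module; the sketch only shows the data flow that distinguishes ECA from SE (no dimensionality-reducing fully connected layers).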
The training set of the constructed sow log-Mel spectrogram dataset is used as input to the improved MobileNetV3 deep learning network. The improved model is trained and tested with the PyTorch deep learning framework and the Python language, with a batch size of 64 and 50 iterations; each iteration comprises a training and a testing phase, and during training the model is optimized with the Adam algorithm and stochastic gradient descent. The learning rate is set to 0.001 and kept constant across iterations, and the accuracy and loss of each class are recorded at each round to monitor model performance.
And 5, judging the oestrus sound signals of the sows by using the deep learning model trained in the step 4 and outputting a classification result.
To verify the effectiveness of the improved model, comparison experiments were run on the constructed log-Mel spectrogram dataset with the original MobileNetV3, the improved MobileNetV3, ResNet34 and ShuffleNetV2 network models. The recognition rates for oestrus sounds, non-oestrus sounds and overall under the different models are shown in Table 2. The improved MobileNetV3 model obtained better predictions than the other models, with the accuracy on oestrus sounds reaching up to 96.4%. The test results show that the deep-learning-based sow oestrus sound classification method can effectively monitor and classify sow oestrus.
TABLE 2 comparison of different network model identification Performance
The above description is a preferred embodiment of the invention and should not be construed as limiting it in any way. Although preferred embodiments and related techniques have been described, the invention is not limited to them: those skilled in the art can make modifications or substitutions using the technical principles described above without departing from their scope, and such modifications and equivalent changes according to the core content of the scheme fall within the protection scope of the invention.
Claims (7)
1. A sow oestrus recognition method based on deep learning is characterized by comprising the following steps:
step 1, collecting sound signals of an estrus sow through data collection equipment, wherein the sound signals comprise an estrus sound signal and a non-estrus sound signal;
step 2, preprocessing the sound signals acquired in the step 1, cutting, sampling and marking the noise-reduced sound signals according to sounding characteristics, and dividing data labels into sow oestrus and non-oestrus;
step 3, converting the sound signal data processed in the step 2 into a logarithmic Mel spectrum data set;
step 4, building a deep learning model, inputting a logarithmic Mel spectrogram serving as a model, training and learning the oestrus characteristics of the sows, and testing to finally obtain the deep learning model capable of classifying oestrus and non-oestrus of the sows;
and 5, judging the sound signals of the sows by using the deep learning model trained in the step 4 and outputting a classification result.
2. The sow oestrus recognition method based on deep learning of claim 1, wherein: in step 1, the non-oestrus sound signals comprise screaming, eating, ear-flapping and other sound signals made by the sow.
3. The sow oestrus recognition method based on deep learning of claim 1, wherein: the step 2 comprises the following steps:
s1: uniformly converting the collected sound signals into a WAV format, and selecting a 44100Hz sampling rate for conversion;
s2: carrying out noise reduction processing on the converted sound signal by using a spectral subtraction method;
s3: the whole noise-reduced sound signal is manually cut, sampled and labelled according to its vocalization characteristics, the data labels are divided into sow oestrus sounds and non-oestrus sounds by listening, and the sampling interval is 1-2 s.
4. The sow oestrus recognition method based on deep learning of claim 1, wherein: the step 3 of converting the acoustic signal data into a log mel-frequency spectrum data set comprises the following steps:
s1: first, pre-emphasis is applied to the sound signal, implemented with a first-order FIR high-pass digital filter whose transfer function is H(z) = 1 − αz⁻¹, where the pre-emphasis coefficient α is in the range 0.9 < α < 1.0;
S2: selecting a Hanning window to perform time domain windowing on the filtered signals, performing Fourier transform on each frame of signal, selecting a logarithm operation result, generating a logarithm Mel spectrogram according to a time sequence, and labeling a sound label to form a logarithm Mel spectrogram data set.
5. The sow oestrus recognition method based on deep learning of claim 4, wherein: in step 3, the optimal parameters for generating the log-Mel spectrogram are determined through tests with different spectrogram parameters, and the log-Mel spectrogram dataset is constructed on that basis; the tests run grouped experiments over different window lengths and window shifts, combining 256-point, 512-point and 1024-point FFTs with 1/2 and 1/4 window shifts, with 5 independent experiments per group; the best-performing combination, a 256-point FFT with a 1/2 window shift, is selected as the spectrogram conversion parameters, and the log-Mel spectrogram dataset is constructed with these parameters.
6. The sow oestrus recognition method based on deep learning of claim 1, wherein: the log-Mel spectrogram dataset constructed in step 3 is split into a training set and a test set at a ratio of 8:2, with both sets containing oestrus and non-oestrus signal types.
7. The sow oestrus recognition method based on deep learning of claim 1, wherein: the deep learning model in step 4 is an improved MobileNetV3 network that takes a 224 × 224 × 3 log-Mel spectrogram as input; the spectrogram first undergoes standard convolution with 16 3 × 3 convolution kernels, h-swish is selected as the activation function, and an ECA module is used as the attention mechanism; the resulting 112 × 112 × 16 features are passed through the 3 × 3 convolution kernels and 8 × 5 block units to obtain 7 × 7 × 96 features, which are raised to 7 × 7 × 576 by 1 × 1 convolution kernels, converted to a one-dimensional vector by global average pooling, and finally passed through 2 1 × 1 convolution kernels of standard convolution to obtain the two predicted classification results, which are compared with the input labels; a match indicates the corresponding defined vocalization class.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210536083.0A CN115048984A (en) | 2022-05-17 | 2022-05-17 | Sow oestrus recognition method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115048984A true CN115048984A (en) | 2022-09-13 |
Family
ID=83159539
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210536083.0A Pending CN115048984A (en) | 2022-05-17 | 2022-05-17 | Sow oestrus recognition method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115048984A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115413605A (en) * | 2022-09-19 | 2022-12-02 | 仲恺农业工程学院 | System for distinguishing sex of meat pigeon by integrating weight, sound and struggling force information |
CN116181287A (en) * | 2023-02-09 | 2023-05-30 | 成都理工大学 | Shale gas well production abnormal condition early warning system and method |
CN116439158A (en) * | 2023-06-20 | 2023-07-18 | 厦门农芯数字科技有限公司 | Sow oestrus checking method, system, equipment and storage medium based on infrared identification |
CN116439158B (en) * | 2023-06-20 | 2023-09-12 | 厦门农芯数字科技有限公司 | Sow oestrus checking method, system, equipment and storage medium based on infrared identification |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Priyadarshani et al. | Automated birdsong recognition in complex acoustic environments: a review | |
CN115048984A (en) | Sow oestrus recognition method based on deep learning | |
Cai et al. | Sensor network for the monitoring of ecosystem: Bird species recognition | |
Clemins et al. | Automatic classification and speaker identification of African elephant (Loxodonta africana) vocalizations | |
Pace et al. | Subunit definition and analysis for humpback whale call classification | |
WO2005024782B1 (en) | Method and apparatus for automatically identifying animal species from their vocalizations | |
CN109817227B (en) | Abnormal sound monitoring method and system for farm | |
CN109243470A (en) | Broiler chicken cough monitoring method based on Audiotechnica | |
CN112820275A (en) | Automatic monitoring method for analyzing abnormality of suckling piglets based on sound signals | |
CN112164408A (en) | Pig coughing sound monitoring and early warning system based on deep learning | |
Kvsn et al. | Bioacoustics data analysis–A taxonomy, survey and open challenges | |
Ibrahim et al. | Transfer learning for efficient classification of grouper sound | |
Liu et al. | Bowel sound detection based on MFCC feature and LSTM neural network | |
Wang et al. | A lightweight CNN-based model for early warning in sow oestrus sound monitoring | |
CN112331231A (en) | Broiler feed intake detection system based on audio technology | |
Cuan et al. | Gender determination of domestic chicks based on vocalization signals | |
CN112331220A (en) | Bird real-time identification method based on deep learning | |
CN115830436A (en) | Marine organism intelligent detection method based on deep learning | |
Ramli et al. | Peak finding algorithm to improve syllable segmentation for noisy bioacoustic sound signal | |
Jung et al. | Classification of vocalization recordings of laying hens and cattle using convolutional neural network models | |
Duan et al. | Short-term feeding behaviour sound classification method for sheep using LSTM networks | |
CN110136746B (en) | Method for identifying mobile phone source in additive noise environment based on fusion features | |
CN115578678A (en) | Fish feeding intensity classification method and system | |
Kharamat et al. | Durian ripeness classification from the knocking sounds using convolutional neural network | |
Xiao et al. | AMResNet: An automatic recognition model of bird sounds in real environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||