CN115424639A - Dolphin sound endpoint detection method under environmental noise based on time-frequency characteristics - Google Patents

Dolphin sound endpoint detection method under environmental noise based on time-frequency characteristics

Info

Publication number
CN115424639A
CN115424639A (application number CN202210522575.4A)
Authority
CN
China
Prior art keywords
sound
signal
dolphin
centroid
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210522575.4A
Other languages
Chinese (zh)
Other versions
CN115424639B (en)
Inventor
戴阳
杨昱皞
何瑞麟
王书献
伍玉梅
韦波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Ocean University
East China Sea Fishery Research Institute Chinese Academy of Fishery Sciences
Original Assignee
Dalian Ocean University
East China Sea Fishery Research Institute Chinese Academy of Fishery Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Ocean University, East China Sea Fishery Research Institute Chinese Academy of Fishery Sciences filed Critical Dalian Ocean University
Priority to CN202210522575.4A priority Critical patent/CN115424639B/en
Publication of CN115424639A publication Critical patent/CN115424639A/en
Application granted granted Critical
Publication of CN115424639B publication Critical patent/CN115424639B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L25/87 Detection of discrete points within a voice signal
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/45 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The invention relates to a dolphin sound endpoint detection method under environmental noise based on time-frequency characteristics, which comprises the following steps: acquiring a digital sound signal in the sea; cutting the acquired signal into segments of fixed length and normalizing them; framing each normalized segment; calculating the short-time energy, the weighted spectral centroid and the second-order spectral centroid shift rate of each frame; setting a short-time energy threshold, a spectral centroid threshold and a second-order spectral centroid shift rate threshold; extracting the frames that exceed both the short-time energy threshold and the spectral centroid threshold as candidate valid sound segments; and comparing the second-order spectral centroid shift rates at the two ends of each candidate segment, taking the frames whose shift rate exceeds the threshold as the start and end points of the dolphin sound to generate the valid sound segments. The invention can quickly and accurately extract dolphin sound signals in the marine environment and reduce the interference of burst noise.

Description

Dolphin sound endpoint detection method under environmental noise based on time-frequency characteristics
Technical Field
The invention relates to the technical field of passive acoustics, in particular to a dolphin sound endpoint detection method under environmental noise based on time-frequency characteristics.
Background
Dolphins are cetacean mammals that live widely in all major sea areas of the world, including inshore waters and the brackish and fresh water near river mouths, and they are among the key protected wild animals in China. Dolphin sound signals fall into three main categories: echolocation signals (clicks), pulse signals (burst-pulses) and communication signals (whistles). Whistle signals are mainly concentrated at 8-15 kHz, burst-pulse signals at 15-30 kHz, and click signals at 100-150 kHz. Echolocation signals accompany activities such as predation and positioning, and are therefore important for protecting dolphin populations and for related resource surveys.
The marine ambient noise field is the ubiquitous, unwanted background sound field of the marine environment. It is produced by wind and waves, rainfall, ships, marine life, human industrial activity and other factors, and is the main acoustic background interference for both active and passive sonar. The acoustic composition of the marine environment is complex, covers many frequency bands and carries considerable energy, so the presence of the noise field makes it difficult to identify the sound of a target animal in the sea. Many studies convert the acoustic problem into an image problem by drawing a spectrogram of the signal and then applying deep learning or machine learning; however, whether the signal is processed acoustically or converted into an image, a large amount of noise is involved, so data preprocessing becomes important. Besides the noise interference of the marine environment, another difficulty is that dolphins produce three kinds of sound signals: the frequency of the echolocation signal often exceeds the hearing range of the human ear, and the propagation characteristics of sound cause high-frequency components to attenuate faster, so manually processing the signals to build data sets is difficult, which in turn makes recognition with deep learning methods difficult.
Common endpoint detection methods include the following. Double-threshold endpoint detection uses the short-time energy and the short-time zero-crossing rate; the short-time energy is calculated as:
$$E_i = \sum_{m=1}^{N} x_i^2(m)$$
where E_i is the short-time energy of the i-th frame, N is the number of sampling points in a single frame, and x_i(m) is the amplitude of each sampling point. Two thresholds set on the short-time energy and the short-time zero-crossing rate can distinguish voiced, unvoiced and silent segments well, but the zero-crossing rate is altered by the noise field and by other biological sounds in the marine environment, so it cannot be used as a parameter for detecting dolphin sounds;
the spectral centroid method uses the following formula:
$$C = \frac{\sum_{n} f(n)\,E(n)}{\sum_{n} E(n)}$$
where f(n) is the signal frequency and E(n) is the spectral energy of the corresponding frequency after the short-time Fourier transform of the continuous signal. The spectral centroid is the center of the frequency components and describes how the sound is distributed in frequency, but many animals with high-frequency calls and much high-frequency noise exist in the marine environment, so the centroid alone cannot fully characterize dolphin vocal behaviour. The spectral entropy method expresses how ordered a signal is; since all sound signals in the marine environment are disordered, entropy cannot distinguish the sound of a target organism in such a complex environment. These methods are therefore not applicable in a marine environment that is rich in species, wide in frequency range, high in energy and disordered.
Disclosure of Invention
The invention aims to solve the technical problem of providing a dolphin sound endpoint detection method under environmental noise based on time-frequency characteristics that can detect dolphin sound signals in real time in the marine environment.
The technical scheme adopted by the invention for solving the technical problems is as follows: the method for detecting the dolphin sound endpoint under the environmental noise based on the time-frequency characteristics comprises the following steps:
a data acquisition step: acquiring a digital sound signal in the sea;
a data preprocessing step: cutting the acquired digital sound signal into segments of fixed length and normalizing them;
a signal framing step: framing each normalized digital signal;
a feature extraction step: calculating the time-domain feature of each frame, namely its short-time energy; calculating the frequency-domain features of each frame, namely its weighted spectral centroid and, once the weighted spectral centroid is obtained, the second-order spectral centroid shift rate;
a threshold setting step: setting a short-time energy threshold, a spectral centroid threshold and a second-order spectral centroid shift rate threshold according to the time-frequency characteristics of the marine ambient noise field and of dolphin sounds;
a candidate valid sound segment extraction step: extracting the frames that exceed both the short-time energy threshold and the spectral centroid threshold as candidate valid sound segments;
a valid sound segment generation step: obtaining the second-order spectral centroid shift rates at the two ends of each candidate valid sound segment, and when the shift rate at an end exceeds the second-order spectral centroid shift rate threshold, taking that frame as the start or end point of the dolphin sound to generate the valid sound segment.
In the data preprocessing step, the input sound signal is normalized so that its maximum amplitude is 1.
In the signal framing step, a rectangular (square) window function is selected for each normalized digital signal, and each signal is framed with a step length equal to the window length, i.e., without overlap between frames.
When the dolphin sound to be detected is the echolocation signal emitted by the dolphin, a window length of 10 ms is selected; when it is the burst-pulse or whistle (communication) signal emitted by the dolphin, a window length of 20 ms to 50 ms is selected.
When the weighted spectral centroid of each frame is calculated in the feature extraction step, a short-time Fourier transform is first applied to each frame, the weight of the dolphin sound spectral range and the weights of the other spectral ranges are set, and the weighted spectral centroid of each frame is then calculated by
$$C_i = \frac{Q_1\sum_{k_1=1}^{N_1} F_{k_1} X_{k_1} + Q_2\sum_{k_2=1}^{N_2} F_{k_2} X_{k_2}}{Q_1\sum_{k_1=1}^{N_1} X_{k_1} + Q_2\sum_{k_2=1}^{N_2} X_{k_2}}$$
where C_i represents the weighted spectral centroid of the i-th frame, Q_1 the weight of the dolphin sound spectral range, Q_2 the weights of the other spectral ranges, F_{k1} the frequencies of dolphin sounds, X_{k1} the spectral energy at those frequencies after the short-time Fourier transform of the continuous signal, N1 the dolphin sound frequency range, F_{k2} the frequencies of other sounds, X_{k2} their spectral energy, and N2 the frequency range of the other sounds.
In the feature extraction step, the second-order spectral centroid shift rate is calculated by
$$B_i = S_i - S_{i-1}$$
where B_i represents the second-order spectral centroid shift rate of the i-th frame, S_i the spectral centroid shift rate of the i-th frame,
$$S_i = C_i - C_{i-1}$$
and C_i the weighted spectral centroid of the i-th frame.
In the threshold setting step, the sum of the 2/3 quantile and the mean of the short-time energy data is divided by 2 and used as the short-time energy threshold, the 3/4 quantile of the spectral centroid data is used as the spectral centroid threshold, and the 3/4 quantile of the second-order spectral centroid shift rate data is used as the second-order spectral centroid shift rate threshold.
In the candidate valid sound segment extraction step, when the interval between adjacent candidate valid sound segments does not exceed N frames, the adjacent candidate segments are merged into one candidate valid sound segment, where N = 4-6.
Advantageous effects
Due to the adoption of the above technical scheme, the invention has the following advantages and positive effects compared with the prior art. The invention applies the second-order spectral centroid shift rate, combined with the short-time energy and the spectral centroid, to endpoint detection of aquatic organisms; being based on passive acoustics, it does not disturb dolphins or other marine organisms. The method uses three features: the short-time energy and the weighted spectral centroid respectively detect signals whose energy matches vocalization behaviour and signals that fall in the dolphin sound frequency range in the marine environment, and the two are cross-checked to find vocalizations that match dolphin sound frequencies; the start and end points of the vocalization are then detected with the second-order spectral centroid shift rate, and the complete dolphin sound is extracted according to the typical duration of dolphin vocalizations. The three features complement one another and reduce the limitations of detection. The method continuously checks the threshold of each feature during detection and can correct abnormal values in time; as the detection time increases, the thresholds fit the boundary of the noise time-frequency characteristics in the marine environment better and better, so the detection improves. When reading environmental samples the method adapts well, can detect dolphin sound endpoints under different signal-to-noise ratios, and reaches a detection accuracy of more than 95%. The method can help protect dolphins and analyze their vocalization habits, and lays a foundation for building dolphin vocalization data sets for deep learning research.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
FIG. 2 is a spectrogram of the three dolphin sound signals;
FIG. 3 is a graph of the short-time energy and spectral centroid changes for a segment of dolphin sound;
FIG. 4 shows the detection results of three different sets of thresholds on the same click signal segment;
FIG. 5 shows the detection performance of the method under different signal-to-noise ratio conditions;
FIG. 6 is a graph comparing the first-order and second-order shift rates of the spectral centroid.
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are only for illustrating the present invention and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention can be made by those skilled in the art after reading the teaching of the present invention, and these equivalents also fall within the scope of the appended claims of the present application.
The embodiment of the invention relates to a dolphin sound endpoint detection method under environmental noise based on time-frequency characteristics, which comprises the following steps as shown in figure 1:
a data acquisition step: acquiring a sound digital signal in the sea; the sound digital signal includes a dolphin echo positioning signal (click), a pulse-pulse signal (burst-pulse), and a communication signal (whistle), and the spectrograms of these three signals are shown in fig. 2.
A data preprocessing step: cutting the acquired digital sound signal into segments of fixed length and normalizing them. Because differences in the duration of the sound data affect the spectrogram of the observed signal, this embodiment stores the sound as digital signals of a fixed length, for example 1 s, to prevent the data from going out of range, and normalizes each signal so that the maximum amplitude of the input sound signal is 1.
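As a concrete illustration of this preprocessing step, the sketch below cuts a recording into 1 s segments and peak-normalizes each one. It assumes the recording is already available as a NumPy array with a known sample rate; the function and variable names are illustrative, not part of the patent.

```python
import numpy as np

def preprocess(signal: np.ndarray, sample_rate: int, segment_seconds: float = 1.0):
    """Cut a 1-D sound signal into fixed-length segments and peak-normalize each one."""
    segment_len = int(segment_seconds * sample_rate)
    segments = []
    for start in range(0, len(signal) - segment_len + 1, segment_len):
        seg = signal[start:start + segment_len].astype(np.float64)
        peak = np.max(np.abs(seg))
        if peak > 0:                      # normalize so the maximum amplitude is 1
            seg = seg / peak
        segments.append(seg)
    return segments
```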
A signal framing step: framing each normalized digital signal. In this step a rectangular (square) window function is selected for each normalized digital signal, and each signal is framed with a step length equal to the window length. Framing a digital signal is essentially windowing in the time domain, after which a frequency-domain analysis is performed on the windowed signal; the longer the frame, the higher the frequency resolution and the lower the time resolution. Because the click signal in dolphin sounds lasts 10 ms to 183.5 ms with its frequency concentrated at 100-150 kHz, while the burst-pulse and whistle signals last 0.3-0.9 s with their frequencies concentrated at 8-20 kHz, extracting dolphin signals places a higher demand on time resolution than on frequency resolution, so the frame length should be chosen as small as possible. To adapt to the different dolphin sound signals, a window length of 10 ms is chosen when extracting click signals and 20 ms to 50 ms when extracting burst-pulse and whistle signals.
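The framing step can be sketched as follows, assuming the non-overlapping rectangular window described above (hop size equal to window length) and the 10 ms default used for click detection; the helper name is an assumption.

```python
import numpy as np

def frame_signal(segment: np.ndarray, sample_rate: int, window_seconds: float = 0.010):
    """Split a normalized segment into non-overlapping rectangular-window frames."""
    frame_len = int(window_seconds * sample_rate)
    n_frames = len(segment) // frame_len          # step length equals window length
    return segment[:n_frames * frame_len].reshape(n_frames, frame_len)
```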
A feature extraction step: calculating the time-domain feature of each frame, namely its short-time energy, and its frequency-domain features, namely the weighted spectral centroid and, from it, the second-order spectral centroid shift rate. The short-time energy method and the spectral centroid method are commonly applied to speech endpoint detection; they cannot adapt to endpoint detection in the marine environment and are imprecise about the start and end points of a sound. This method uses the second-order spectral centroid shift rate together with the short-time energy and the spectral centroid to detect dolphin sound endpoints: the short-time energy and the spectral centroid are cross-checked to find signals whose energy rises and whose frequency content matches the dolphin sound range, and the second-order spectral centroid shift rate then detects the moments at which the spectral centroid is about to change, i.e., the start and end points of the dolphin sound. When calculating the spectral centroid, the concentration of the dolphin sound ranges is taken into account and different frequency ranges are weighted: the weight of the dolphin sound spectral range is set to 0.6 and the weight of the non-dolphin frequency ranges to 0.4, and the weighted spectral centroid is computed to increase the discrimination.
The short-time energy, the weighted spectral centroid and the second-order spectral centroid shift rate are calculated as follows.
Short-time energy:
$$E_i = \sum_{m=1}^{N} x_i^2(m)$$
where E_i is the short-time energy of the i-th frame, N is the number of sampling points of a single-frame signal, and x_i(m) is the amplitude of each sampling point.
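A minimal sketch of the per-frame short-time energy under this formula, assuming the frames array (frames × samples) produced by a framing step like the one above:

```python
import numpy as np

def short_time_energy(frames: np.ndarray) -> np.ndarray:
    """Sum of squared amplitudes of each frame (one value per frame)."""
    return np.sum(frames.astype(np.float64) ** 2, axis=1)
```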
Weighted spectral centroid:
$$C_i = \frac{Q_1\sum_{k_1=1}^{N_1} F_{k_1} X_{k_1} + Q_2\sum_{k_2=1}^{N_2} F_{k_2} X_{k_2}}{Q_1\sum_{k_1=1}^{N_1} X_{k_1} + Q_2\sum_{k_2=1}^{N_2} X_{k_2}}$$
where C_i is the weighted spectral centroid of the i-th frame, Q_1 is the weight of the dolphin sound spectral range (set to 0.6 in this embodiment), Q_2 is the weight of the other spectral ranges (set to 0.4 in this embodiment), F_{k1} are the frequencies of dolphin sounds and X_{k1} is the spectral energy at those frequencies after the short-time Fourier transform of the continuous signal, N1 represents the dolphin sound frequency range, F_{k2} are the frequencies of other sounds and X_{k2} is their spectral energy, and N2 represents the frequency range of the other sounds.
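A sketch of the weighted spectral centroid as reconstructed above: each frame's spectrum is taken with an FFT, bins inside the assumed dolphin bands (8-30 kHz and 100-150 kHz, the bands used later in the click example) receive weight 0.6 and all other bins 0.4. The band list and function name are illustrative assumptions.

```python
import numpy as np

DOLPHIN_BANDS_HZ = [(8_000, 30_000), (100_000, 150_000)]   # bands used in the click example

def weighted_spectral_centroid(frame: np.ndarray, sample_rate: int,
                               q_dolphin: float = 0.6, q_other: float = 0.4) -> float:
    """Frequency centroid with heavier weight on the dolphin sound bands."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2            # spectral energy per bin
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    in_band = np.zeros_like(freqs, dtype=bool)
    for lo, hi in DOLPHIN_BANDS_HZ:
        in_band |= (freqs >= lo) & (freqs <= hi)
    weights = np.where(in_band, q_dolphin, q_other)
    return float(np.sum(weights * freqs * spectrum) / (np.sum(weights * spectrum) + 1e-12))
```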
Because the signal components are complex, the waveform function of the signal is unknown, and the time interval between frames in the time domain is small, the second-order shift rate can be approximated directly from the spectral centroid of each frame. Second-order spectral centroid shift rate:
$$B_i = S_i - S_{i-1}$$
where B_i is the second-order spectral centroid shift rate of the i-th frame, S_i is the spectral centroid shift rate of the i-th frame,
$$S_i = C_i - C_{i-1}$$
and C_i is the weighted spectral centroid of the i-th frame.
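A sketch of the first- and second-order shift rates as frame-to-frame differences of the centroid sequence, matching the approximation above; padding the first element so the arrays stay frame-aligned is an assumption.

```python
import numpy as np

def centroid_shift_rates(centroids: np.ndarray):
    """First-order (S) and second-order (B) shift rates of the per-frame spectral centroid."""
    s = np.diff(centroids, prepend=centroids[0])   # S_i = C_i - C_{i-1}
    b = np.diff(s, prepend=s[0])                   # B_i = S_i - S_{i-1}
    return s, b
```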
A threshold setting step: setting the short-time energy threshold, the spectral centroid threshold and the second-order spectral centroid shift rate threshold according to the time-frequency characteristics of the marine ambient noise field and of dolphin sounds. Observation of the noise field in the marine environment shows that it is generally evenly distributed along the time axis, has high energy, and has a wide and stable spectral distribution. Dolphin sounds, by contrast, are usually short bursts that occupy only a small part of the time axis and are more concentrated in frequency than the noise. The thresholds can therefore be determined from the noise distribution characteristics by comparing the fluctuation ranges of the short-time energy and the spectral centroid. Considering the propagation characteristics of sound (high-frequency components attenuate faster), the short-time energy threshold adopted in this step is half the sum of the mean of the whole signal's energy and the 2/3 quantile of the short-time energy, while the spectral centroid threshold and the second-order spectral centroid shift rate threshold are the 3/4 quantile of the whole signal's spectral centroid and the 3/4 quantile of the second-order spectral centroid shift rate, respectively.
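A sketch of these threshold rules, using np.quantile for the 2/3 and 3/4 quantiles; the function name and return order are assumptions.

```python
import numpy as np

def set_thresholds(energy: np.ndarray, centroid: np.ndarray, b_rate: np.ndarray):
    """Energy: (mean + 2/3 quantile) / 2; centroid and second-order rate: 3/4 quantile."""
    energy_thr = (np.mean(energy) + np.quantile(energy, 2 / 3)) / 2
    centroid_thr = np.quantile(centroid, 3 / 4)
    b_rate_thr = np.quantile(b_rate, 3 / 4)
    return energy_thr, centroid_thr, b_rate_thr
```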
A candidate valid sound segment extraction step: extracting the frames that exceed both the short-time energy threshold and the spectral centroid threshold as candidate valid sound segments. In this step the frames whose short-time energy and spectral centroid both exceed their thresholds are extracted as candidate valid sound segments, and the maximum silence length is set to 5 frames, i.e., if the interval between adjacent candidate segments does not exceed 5 frames they are merged into one candidate valid sound segment.
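A sketch of the candidate extraction and merging rule just described: frames above both thresholds are grouped into runs, and runs separated by at most max_gap frames (5 here) are merged. The data structures are illustrative.

```python
import numpy as np

def candidate_segments(energy, centroid, energy_thr, centroid_thr, max_gap: int = 5):
    """Return (start_frame, end_frame) pairs of candidate valid sound segments."""
    active = (np.asarray(energy) > energy_thr) & (np.asarray(centroid) > centroid_thr)
    segments = []
    for i in np.flatnonzero(active):
        if segments and (i - segments[-1][1] - 1) <= max_gap:
            segments[-1][1] = i               # gap of at most max_gap silent frames: merge
        else:
            segments.append([i, i])
    return [(int(s), int(e)) for s, e in segments]
```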
A valid sound segment generation step: obtaining the second-order spectral centroid shift rates at the two ends of each candidate valid sound segment, and when the shift rate at an end exceeds the second-order spectral centroid shift rate threshold, taking that frame as the start or end point of the dolphin sound to generate the valid sound segment. The second-order spectral centroid shift rates at the two ends of each candidate segment are compared: a sudden increase in the second-order shift rate means the spectral centroid is about to rise sharply, i.e., the dolphin is about to start vocalizing, while a sudden decrease means the spectral centroid is about to fall sharply, i.e., the vocalization is about to end. In this way the start and end points of the dolphin sound can be detected.
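One possible way to refine the candidate boundaries with the second-order shift rate is sketched below: it searches outward from each end of a candidate segment for the first frame whose shift rate magnitude exceeds the threshold. The outward search window and the use of the absolute value are assumptions; the patent only states that frames whose second-order shift rate exceeds the threshold at the two ends become the start and end points.

```python
import numpy as np

def refine_endpoints(segments, b_rate, b_thr, search: int = 10):
    """Extend each candidate segment to nearby frames where |B| exceeds its threshold."""
    b = np.asarray(b_rate)
    refined = []
    for start, end in segments:
        for i in range(start, max(start - search, 0) - 1, -1):   # search backwards for the start
            if abs(b[i]) > b_thr:
                start = i
                break
        for j in range(end, min(end + search, len(b) - 1) + 1):  # search forwards for the end
            if abs(b[j]) > b_thr:
                end = j
                break
        refined.append((start, end))
    return refined
```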
The invention is further illustrated by the following specific example.
1. Obtaining data
The experimental data come from foreign data sets such as "Voice in the sea" (https://voicesite.ucsd.edu), "whoi" (https://cis.whoi.edu/science/B/whalesouns/index.cfm) and the "Dolphins Underwater Sounds Database" (https://ie-data.org/). The experimental platform of this embodiment is Python, and the three dolphin sound signals were selected for the experiment. Considering that the dolphin echolocation signal lasts 10.5-183.5 ms, the audio was uniformly cut into 1 s segments. To reduce the amount of computation and prevent the data from going out of range, normalization was performed so that the maximum amplitude of the input sound signal is 1.
2. Data analysis and processing
To compare the effect of different window lengths on the detection results: according to the spectrogram of dolphin sounds, the syllable duration of dolphin sounds is 10 ms to 1 s. Combined with the rule that the longer the window, the lower the time-domain resolution and the higher the frequency-domain resolution, and given that this method demands more of time-domain resolution than of frequency-domain resolution, windows of 10 ms, 20 ms and 50 ms were tested; the resulting short-time energy and spectral centroid changes are shown in FIG. 3. The results show that the shorter the window, the more abrupt the changes in short-time energy and spectral centroid and the more sensitive they are to changes in the sound; the longer the window, the smoother these changes. Detecting the dolphin echolocation signal requires higher time-domain resolution and sensitivity, so a smaller window length should be chosen; a slightly longer window can be used when detecting dolphin burst-pulse and whistle (communication) signals.
In this embodiment the echolocation signal of the dolphin sound is detected, so the signal is framed with a frame length of 10 ms, the short-time energy of each frame is calculated, and a short-time Fourier transform is applied to each frame to obtain the energy of its frequency components; the bands 8-30 kHz and 100-150 kHz are given weight 0.6 and the other frequency ranges weight 0.4, the weighted spectral centroid is calculated, and then its second-order shift rate is calculated.
Next the thresholds are determined. Detection of dolphin click signals is easily disturbed by bursts of high-frequency noise, so the influence of different thresholds on endpoint detection accuracy was examined experimentally. Three sets of thresholds were compared: (1) the 55% quantile of the short-time energy, the 2/3 quantile of the spectral centroid and the 2/3 quantile of the second-order spectral centroid shift rate; (2) half the sum of the mean and the 2/3 quantile of the short-time energy, the 3/4 quantile of the spectral centroid, and the 3/4 quantile of the second-order spectral centroid shift rate; (3) the 2/3 quantile of the short-time energy, the average of the 2/3 and 3/4 quantiles of the spectral centroid, and the average of the 2/3 and 3/4 quantiles of the second-order spectral centroid shift rate. Each set was used for endpoint detection on the same dolphin click signal in the same marine environment; the results are shown in FIG. 4. The first set of thresholds is too sensitive to signal variations and cuts the signal into many fragments, the third set misses part of the click signal segments, and the second set was therefore selected as the detection threshold.
To test the adaptability of the endpoint detection, sounds recorded at signal-to-noise ratios of 25 dB, -10 dB and -24 dB were selected for the experiment. The experimental data were screened with audio software; the recordings used were 6102500Q from "Voice in the sea" and 61025008 and 61025004 from "whoi". The results are shown in FIG. 5, where (a) is the detection result at a signal-to-noise ratio of 25 dB, (b) at -10 dB and (c) at -24 dB. The results show that the time-frequency-characteristic-based dolphin sound endpoint detection method under environmental noise used in this embodiment reduces the interference of burst noise and still performs well even at low signal-to-noise ratios.
Candidate valid sound segments are then extracted: the frames that exceed the short-time energy and spectral centroid thresholds are added to the candidate valid sound segments, and adjacent candidate segments are merged into one when the interval between them does not exceed 5 frames.
To find the start and end points of the dolphin's vocalization, the first-order spectral centroid shift rate (the change of each frame's spectral centroid relative to the previous frame) and the second-order spectral centroid shift rate (the trend of that change) are introduced, as shown in FIG. 6. It can be seen that when the spectral centroid changes at some moment, the first-order shift rate has not yet changed, so its expression of the change lags; the second-order shift rate, however, peaks at that point and reveals the trend of the spectral centroid change, so the start and end points of the dolphin's vocalization can be detected.
At both ends of each candidate valid sound segment the second-order spectral centroid shift rate is compared outward, and when it exceeds the threshold the corresponding frame is added to the segment as the start or end point, generating the complete valid sound segment.
3. Checking threshold value in detection process
The short-time energy, spectral centroid and second-order spectral centroid shift rate thresholds of each frame are recorded and checked once every 10 s; when an abnormal threshold value occurs, the average of the thresholds over the previous 10 s is used as the detection threshold for that frame and as the initial value for the next check.
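A sketch of this periodic threshold check; the patent does not define what counts as an abnormal value, so the simple outlier test below (more than three times the recent average) and the 10 s history window expressed in frames are assumptions.

```python
import numpy as np

def check_threshold(history: list, current: float, window: int = 1000) -> float:
    """Sanity-check a threshold against its recent history (1000 frames of 10 ms is about 10 s)."""
    history.append(current)
    recent = np.asarray(history[-window:], dtype=np.float64)
    baseline = float(np.mean(recent))
    # Assumed definition of an "abnormal value": far above the recent average.
    if baseline > 0 and current > 3 * baseline:
        history[-1] = baseline        # correct it; it also seeds the next check
        return baseline
    return current
```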
4. Generating valid segments
The processed digital signal cache is cut according to the detected valid sound segments, the cut files are moved to a valid-sound-segment folder, the corresponding cache files are deleted and removed from the cache file list, which speeds up the next read of the cache folder. The statistical results after this processing are shown in Table 1.
TABLE 1. Endpoint detection results
(The table is reproduced as an image in the original publication; its numerical values are not available in this text.)
The method of this embodiment is simple and does not disturb the dolphins; it is suitable for dolphin sound endpoint detection in marine environments with different signal-to-noise ratios; and it avoids interference from some bursts of high-frequency noise, is fast, has a small error, requires no manual intervention and improves the efficiency of dolphin sound detection.

Claims (8)

1. A dolphin sound endpoint detection method under environmental noise based on time-frequency characteristics is characterized by comprising the following steps:
a data acquisition step: acquiring a sound digital signal in the ocean;
a data preprocessing step: cutting the acquired digital sound signal into segments of fixed length and normalizing them;
a signal framing step: performing framing processing on each normalized digital signal;
a feature extraction step: calculating the time-domain feature of each frame, namely its short-time energy; calculating the frequency-domain features of each frame, namely its weighted spectral centroid and, once the weighted spectral centroid is obtained, the second-order spectral centroid shift rate;
a threshold setting step: setting a short-time energy threshold, a spectral centroid threshold and a second-order spectral centroid shift rate threshold according to the time-frequency characteristics of the marine ambient noise field and of dolphin sounds;
a candidate valid sound segment extraction step: extracting the frames that exceed both the short-time energy threshold and the spectral centroid threshold as candidate valid sound segments;
a valid sound segment generation step: obtaining the second-order spectral centroid shift rates at the two ends of each candidate valid sound segment, and when the shift rate at an end exceeds the second-order spectral centroid shift rate threshold, taking that frame as the start or end point of the dolphin sound to generate the valid sound segment.
2. The method for detecting dolphin's sound endpoint under environmental noise based on time-frequency characteristics as claimed in claim 1, wherein in said step of data preprocessing, the maximum amplitude of the input sound signal is normalized to 1.
3. The method according to claim 1, wherein in the signal framing step, a rectangular (square) window function is selected for each normalized digital signal, and each digital signal is framed with a step length equal to the window length.
4. The method as claimed in claim 3, wherein the window length is selected to be 10 ms when the dolphin sound to be detected is an echolocation signal emitted by a dolphin, and 20 ms to 50 ms when the dolphin sound is a pulse (burst-pulse) signal or a communication (whistle) signal emitted by a dolphin.
5. The method as claimed in claim 1, wherein calculating the weighted spectral centroid of each frame in the feature extraction step comprises performing a short-time Fourier transform on each frame, setting the weight of the dolphin sound spectral range and the weights of the other spectral ranges, and calculating the weighted spectral centroid of each frame by
$$C_i = \frac{Q_1\sum_{k_1=1}^{N_1} F_{k_1} X_{k_1} + Q_2\sum_{k_2=1}^{N_2} F_{k_2} X_{k_2}}{Q_1\sum_{k_1=1}^{N_1} X_{k_1} + Q_2\sum_{k_2=1}^{N_2} X_{k_2}}$$
where C_i represents the weighted spectral centroid of the i-th frame, Q_1 the weight of the dolphin sound spectral range, Q_2 the weights of the other spectral ranges, F_{k1} the frequencies of dolphin sounds, X_{k1} the spectral energy at those frequencies after the short-time Fourier transform of the continuous signal, N1 the dolphin sound frequency range, F_{k2} the frequencies of other sounds, X_{k2} their spectral energy, and N2 the frequency range of the other sounds.
6. The method for detecting the dolphin sound endpoint under environmental noise based on time-frequency characteristics as claimed in claim 1, wherein in the feature extraction step the second-order spectral centroid shift rate is calculated by
$$B_i = S_i - S_{i-1}$$
where B_i represents the second-order spectral centroid shift rate of the i-th frame, S_i the spectral centroid shift rate of the i-th frame,
$$S_i = C_i - C_{i-1}$$
and C_i the weighted spectral centroid of the i-th frame.
7. The method for detecting the dolphin sound endpoint under environmental noise based on time-frequency characteristics as claimed in claim 1, wherein in the threshold setting step, the sum of the 2/3 quantile and the mean of the short-time energy data is divided by 2 and used as the short-time energy threshold, the 3/4 quantile of the spectral centroid data is used as the spectral centroid threshold, and the 3/4 quantile of the second-order spectral centroid shift rate data is used as the second-order spectral centroid shift rate threshold.
8. The method for detecting the dolphin sound endpoint under ambient noise based on time-frequency characteristics as claimed in claim 1, wherein in the candidate valid sound segment extraction step, when the interval between adjacent candidate valid sound segments does not exceed N frames, the adjacent candidate valid sound segments are merged into one candidate valid sound segment, where N = 4-6.
CN202210522575.4A 2022-05-13 2022-05-13 Dolphin sound endpoint detection method under environment noise based on time-frequency characteristics Active CN115424639B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210522575.4A CN115424639B (en) 2022-05-13 2022-05-13 Dolphin sound endpoint detection method under environment noise based on time-frequency characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210522575.4A CN115424639B (en) 2022-05-13 2022-05-13 Dolphin sound endpoint detection method under environment noise based on time-frequency characteristics

Publications (2)

Publication Number Publication Date
CN115424639A true CN115424639A (en) 2022-12-02
CN115424639B CN115424639B (en) 2024-07-16

Family

ID=84196890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210522575.4A Active CN115424639B (en) 2022-05-13 2022-05-13 Dolphin sound endpoint detection method under environment noise based on time-frequency characteristics

Country Status (1)

Country Link
CN (1) CN115424639B (en)

Citations (5)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170004840A1 (en) * 2015-06-30 2017-01-05 Zte Corporation Voice Activity Detection Method and Method Used for Voice Activity Detection and Apparatus Thereof
US20180158463A1 (en) * 2016-12-07 2018-06-07 Interactive Intelligence Group, Inc. System and method for neural network based speaker classification
CN110800053A (en) * 2017-06-13 2020-02-14 米纳特有限公司 Method and apparatus for obtaining event indications based on audio data
US10602270B1 (en) * 2018-11-30 2020-03-24 Microsoft Technology Licensing, Llc Similarity measure assisted adaptation control
CN110415729A (en) * 2019-07-30 2019-11-05 安谋科技(中国)有限公司 Voice activity detection method, device, medium and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
NAN JIANG,等: "An Improved Speech Segmentation and Clustering Algorithm Based on SOM and K-Means", 《MATHEMATICAL PROBLEMS IN ENGINEERING》, 12 September 2020 (2020-09-12) *
杨昱皞, et al.: "Research on a dolphin vocalization endpoint detection method based on time-frequency features" (时频特征的海豚发声端点检测方法研究), Applied Acoustics (《应用声学》), 12 August 2022 (2022-08-12)
杨昱皞: "Detection and analysis of Chinese white dolphin sound signals based on passive acoustics" (基于被动声学的中华白海豚声音信号检测与分析), China Masters' Theses Full-text Database (《中国优秀硕士学位论文全文数据库》), 15 July 2023 (2023-07-15)

Also Published As

Publication number Publication date
CN115424639B (en) 2024-07-16

Similar Documents

Publication Publication Date Title
CN105611477B (en) The voice enhancement algorithm that depth and range neutral net are combined in digital deaf-aid
CN108447495B (en) Deep learning voice enhancement method based on comprehensive feature set
CN105118522B (en) Noise detection method and device
CN110120225A (en) A kind of audio defeat system and method for the structure based on GRU network
Wang et al. ia-PNCC: Noise Processing Method for Underwater Target Recognition Convolutional Neural Network.
CN113567969B (en) Illegal sand dredger automatic monitoring method and system based on underwater acoustic signals
CN110376575B (en) Low-frequency line spectrum detection method based on damping parameter matching stochastic resonance
CN112951259B (en) Audio noise reduction method and device, electronic equipment and computer readable storage medium
CN112786059A (en) Voiceprint feature extraction method and device based on artificial intelligence
WO2019232833A1 (en) Speech differentiating method and device, computer device and storage medium
CN111489763B (en) GMM model-based speaker recognition self-adaption method in complex environment
CN112394324A (en) Microphone array-based remote sound source positioning method and system
CN111524520A (en) Voiceprint recognition method based on error reverse propagation neural network
Castro et al. Automatic manatee count using passive acoustics
Qiao et al. Spectral entropy based dolphin whistle detection algorithm and its possible application for biologically inspired communication
CN115424639B (en) Dolphin sound endpoint detection method under environment noise based on time-frequency characteristics
CN111261192A (en) Audio detection method based on LSTM network, electronic equipment and storage medium
CN115932808A (en) Passive sonar intelligent detection method based on multi-feature fusion
Cournapeau et al. Evaluation of real-time voice activity detection based on high order statistics.
Li et al. Robust speech endpoint detection based on improved adaptive band-partitioning spectral entropy
CN110610724A (en) Voice endpoint detection method and device based on non-uniform sub-band separation variance
Pham et al. Performance analysis of wavelet subband based voice activity detection in cocktail party environment
Roch et al. Detection, classification, and localization of cetaceans by groups at the scripps institution of oceanography and San Diego state university (2003-2013)
CN118173107B (en) Bird sound quality analysis method based on multi-mode depth feature level fusion
TWI841271B (en) A method for detecting blue whale acoustic signals based on energy spectrum entropy of intrinsic mode function concentration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant